
Tuesday, December 18, 2012

Data-related sessions at ALA-Midwinter

ALA-Midwinter is coming up in Seattle, January 25-29, 2013, and there are several sessions related to data curation, management, literacy, and other issues. This list no doubt leaves something out, so please post any additions in the comments.

OCLC Americas Member Meeting and Symposium
Friday, January 25
11:00 am – 2:00 pm | Red Lion Hotel Fifth Avenue, Emerald Ballroom

GIS Discussion Group & Map Collection Management Discussion Group
Saturday, January 26, 8:00a-11:30a
W Seattle Hotel, Great Room 2BC

ACRL ULS Campus Administration & Leadership Discussion Group
Topic: Planning & Partnerships for Research Data Services
Saturday, January 26, 2013, 3:00p-4:00p
Westin Seattle (Denny/Mercer)

OCLC Linked Data Roundtable
Saturday, January 26, 10:30a-12:00p
Washington State Convention Center, Room 213

Digital Literacy Forum: Setting the Agenda
Saturday, January 26, 3:00p-4:00p
Washington State Convention Center, TCC304

Top Technology Trends
Topic: "If Data I Created Resides in a Cloud Environment, Is It Still Mine?"
Sunday, January 27, 2013, 10:30a to 11:30a
Washington State Convention Center, Room 606-607

Linked Library Data
Sunday, Jan 27, 10:30-11:30
Washington State Convention Center, Room 205

ACRL WGSS Section is sponsoring a digital humanities discussion group.
Sunday, January 27, 2013 - 10:30a to 11:30a
Westin Seattle Hotel, Cascade

ACRL Digital Curation Interest Group
Sunday, Jan. 27, 1-2:30 PM
Westin Seattle Hotel, Denny/Mercer

Digital Literacy Task Force Meeting
Sunday, Jan. 27, 1-2:30 PM
Washington State Convention Center, Room 307-308

Library Technology Challenges: Woes and Wows
Sunday, Jan. 27, 1-2:30 PM
Washington State Convention Center, TCC301

ACRL Digital Humanities Discussion Group
Sunday, Jan 27, 4:30-5:30p
Westin Seattle Hotel, Fifth Ave Room

Tuesday, December 11, 2012

Big data vs data mining vs statistics vs etc. FAQ

Love this link from Flowing Data to an article by William Briggs (he bills himself on his blog as Statistician to the Stars!). In the article, Briggs answers questions FAQ-style to explain the differences between big data, data mining, statistics, probability, etc. He's got a good sense of humor and is clear about what he sees as the distinguishing characteristics of each field.

Given that the hype about "big data" lately seems about ready to jump the shark, I love his definition. While he acknowledges that vast amounts of data are interesting for the facts they contain, he argues that Big Data is not likely to save us from ourselves:
What is big data?
Whatever the labeler wants it to be; data that is not small; a faddish buzz word; a recognition that it’s difficult to store and access massive databases; a false (but with occasional, and temporary, bright truths) hope that if characteristics down to the microsecond are known and stored we can predict everything about that most unpredictable species, human beings. See this Guardian article. See also false hope (itself contained in the hubris entry in any encyclopedia).
Big data is a legitimate computer science topic, where timely access to tidbits buried under mountains of facts is a major concern. It is also of interest to programmers who must take and use these data in the models spoken of above, all in finite time. But more data rather than less does not imply a new or different philosophy of modeling or uncertainty.

Jer Thorp had similar things to say in a Harvard Business Review blog post recently; he would like to see people gain a better understanding of data ownership, along with more conversations about data and ethics. Oh, and he'd like to bring artists into the mix so that data comes to be understood as an entirely new societal resource. Let the conversation begin.

Thursday, December 6, 2012

Interview with Stephanie Wright #IDCC13

UW's Data Services Coordinator Stephanie Wright will be speaking at the upcoming 8th International Digital Curation Conference in Amsterdam in January 2013. The conference brings together folks who create, manage, and use information, and those who research and teach about the curation process. This year's conference, themed "Infrastructure, Intelligence, Innovation: driving the Data Science Agenda," has invited speakers from around the globe to discuss issues such as stewardship in the marine sciences, data stories from the business world, supporting data-intensive research, and many others.

Ahead of her participation in the symposium "What is a Data Scientist?", the conference organizers interviewed Wright and asked her opinion on pressing issues, her thoughts on types of data, and more. Follow the conference January 14-16 via #idcc13.

Wednesday, December 5, 2012

UW Data Management Guide gets recognized!

Our Data Management Guide got a nice write-up from Kevin the Librarian, an NLM librarian and archivist who has been compiling a list of libraries that have data management guides. He lists ours among his top five, along with those from the University of Minnesota, MIT, the California Digital Library, and Purdue. Each one offers something a little different, from examples of data management plans to data planning checklists to information on the importance of data sharing.

If you want to keep reading, he has another excellent post from July 2012 on data curation and where librarians fit in. He talks about our skill set and why we're ready-made to help researchers take care of their data. He also summarizes some nice tidbits from the data curation lifecycle that will be familiar to anyone who's helped someone archive their research.

Tuesday, November 20, 2012

Data sharing and journals

Nice post from Carly Strasser at the California Digital Library about data sharing policies in journals, including a list of which journals follow the Joint Data Archiving Policy from Dryad. Although the journals requiring data archiving began with those in the field of evolution, journals in other disciplines are coming on board.

Wednesday, October 24, 2012

Webinar on Text Mining

The Center for Research Libraries is hosting a webinar on text mining in a few weeks. It will explore trends in text mining and how publishers and libraries are responding to the challenges that come with it. Topics to be addressed (from the announcement):
  • the types of resources being “mined”, including e-journal databases and digitized newspapers and archives
  • recent text-mining projects
  • the challenges and issues these present for database publishers
  • what role, if any, libraries can play to support these activities
  • what new services are envisioned and what is in the pipeline
The webinar will take place online Tuesday, Nov. 13th, 11a-12:30p Pacific Time. You can register here (you may need to sign up for a free account first):

Friday, October 5, 2012

Special Report About Missing Data in Clinical Trials

The New England Journal of Medicine just released a special report titled "The Prevention and Treatment of Missing Data in Clinical Trials". The report indicates that careful trial design, flexible treatment regimens, better follow-up, and more principled statistical methods for adjusting for missing data could improve the results of clinical trials.

To read a review of the article: 

Here's the citation for the original report: N Engl J Med 2012; 367:1355-1360; October 4, 2012 DOI: 10.1056/NEJMsr1203730
UW subscribes to NEJM, and you can access the report from a campus computer or through the Libraries proxy server.

(Thanks, Cynthia)

Thursday, October 4, 2012

New Tool to Help Manage Data: DataUp

Microsoft announced the release of a new open-source tool to help researchers "document, manage and archive" tabular data. The best part is that it can be used either as a web app or as an add-in for Microsoft Excel.

Per the release, whether you go for the Excel extension or the online app, DataUp can help you with four main tasks:
  1. Perform a best-practices check to ensure good data organization
  2. Guide users through creation of metadata for their Excel file
  3. Help users obtain a unique identifier for their dataset
  4. Connect users to a major repository, where their data can be deposited and shared with others
Read more about the tool and check it out for yourself here:

Tuesday, September 25, 2012

Data Scientists are Sexy

Okay, so that's not exactly what the article says, but close enough.

The most recent issue of Harvard Business Review published an article titled "Data Scientist: The Sexiest Job of the 21st Century". It's well worth the read: it explains why data scientists are the hot new job, with high demand and not many of them out there. It also discusses the combination of skills they need, the companies recruiting them, and how they're being used. There are some interesting comments, as well.

Tuesday, September 11, 2012

ICPSR Data Fair On Election Data

ICPSR will be hosting its online 2012 Data Fair on October 1-3. This year's theme is "Analyzing Election Data with ICPSR" and "the series of webcasts will focus on election data held in ICPSR's archives, and how to use them for analysis and teaching."

From the announcement:
The event is designed for the social sciences data community at large including researchers, librarians, teaching faculty, students, and policymakers from around the world who are interested in the use of social science data.
The first day will provide an orientation to ICPSR's services, including a tutorial on navigating our newly redesigned Web site. Other topics will include the American National Election Studies, minority voting behavior, and using election data in classroom instruction.
The event is free and open to everyone. The tentative schedule of sessions, with links to register for the webcasts, is available here (NOTE: all times are Eastern):

Tuesday, August 21, 2012

Teaching Secondary Analysis of Qualitative Data

If you're going to be in the vicinity of Essex, UK in September, the Economic and Social Data Service (ESDS) is hosting a free half-day workshop at the UK Data Archive "aimed at best practices in teaching qualitative analysis of secondary data."

For more information, check out their webpage:

Wednesday, August 15, 2012

Symposium on Global Scientific Data Infrastructure

If you plan to be in the Washington, DC area on August 29th, the National Academy of Sciences is hosting a symposium on Global Scientific Data Infrastructure. For those of us for whom that's too far a jaunt for a meeting of less than three hours (3:00p-5:45p Eastern), there is the option to participate in a real-time, audio-only webcast of the proceedings. They will also archive the webcast and make it available on the Board on Research Data and Information's website afterward.

From the announcement:
The Forum will examine potential near-term actions and outcomes that can serve as a focus for community efforts toward a global organization for the exchange of scientific data, initially referred to as the Data Web Forum (DWF). The BRDI Forum will facilitate discussion of the following questions:
1. What useful short-term efforts and deliverables could a global scientific community organization take on that would facilitate data-driven interoperability? Are there any low-hanging fruits or some common elements or approaches that should be addressed early in the process?
2. What stakeholder communities are essential to success or to the implementation of the deliverables raised in #1? (see [1] below) What could be done in the near term by such an organization to promote effective participation by these communities?

On the day of the event, a link will be posted on the National Academies website. Webcast listeners will be able to listen through either Windows Media Player or RealPlayer.

For more information including a detailed agenda:

Monday, August 13, 2012

Big Data Got A Mighty Voice

I've made no secret that I'm not overly fond of the term "Big Data" but like it or not, it's a term that has stuck and seems to be making folks aware of the issues surrounding data management.

A large part of my annoyance with the term comes from how vague it is. Thankfully, a colleague sent me the following article, and it's the best description of "Big Data" I've read yet.

How Big Data Became So Big - New York Times (August 11, 2012)

Thursday, August 9, 2012

NISO Forum on Managing & Citing Research Data

Early bird registration is now open for the upcoming NISO Forum: Tracking it Back to the Source: Managing and Citing Research Data being held in Denver, CO on September 24, 2012.

There's a great lineup of presenters including Joan Starr from CDL, Jim Mullins from Purdue, Mark Parsons from NSIDC and a keynote from Allen Renear from U Ill-UC.  That's just looking at the first few sessions.

Early registration is available until Sept 10, with discounts for NISO members and students. Check out the website for more information:

New CLIR Report: The Problem of Data

The Council on Library and Information Resources has released a new report on data titled "The Problem of Data".

From the abstract:
Jahnke and Asher explore workflows and methodologies at a variety of academic data curation sites, and Keralis delves into the academic milieu of library and information schools that offer instruction in data curation. Their conclusions point to the urgent need for a reliable and increasingly sophisticated professional cohort to support data-intensive research in our colleges, universities, and research centers.
For more info:
If you want to go straight to reading it, here's the PDF download link (1.3 MB):

Thursday, August 2, 2012

2nd Draft of Creative Commons License 4.0

The 2nd draft of the latest Creative Commons license version (4.0) is now available for public comment.

Here's a blog post about it:
You can view the license text and a comparison chart between the first draft and this one here: Drafts and related documentation
And you can find a summary of changes and explanations here: Draft 2 Public Discussion Page

Monday, July 30, 2012

Databib: A Registry of Data Repositories

Thanks to funding from the Institute of Museum and Library Services and Purdue University Libraries, there is now a tool to help you identify and locate online repositories for research data.

From the announcement:

Over 200 data repositories have been cataloged in Databib, with more being added every week. Users and bibliographers create and curate records that describe data repositories that can be browsed and searched.
  • What repositories are appropriate for a researcher to submit his or her data to?
  • How do users find appropriate data repositories and discover datasets to meet their needs?
  • How can librarians help patrons locate and integrate data into their research or learning?
For more information or to check the listing out for yourself, go to:

Friday, July 20, 2012

Pew Report on Big Data

The Pew Research Center's Internet and American Life Project released a report on "Big Data".  The report is based on a survey of 1,021 "Internet experts and other Internet users".

You can find the report and information about the survey here:

Thursday, July 19, 2012

Lectures from Digital Humanities 2012

The Digital Humanities 2012 conference was held in Hamburg, Germany this week. From the conference website:
Digital Humanities is the annual international conference of the Alliance of Digital Humanities Organizations (ADHO). ADHO is an umbrella organization whose goals are to promote and support digital research and teaching across arts and humanities disciplines, drawing together humanists engaged in digital and computer-assisted research, teaching, creation, dissemination, and beyond, in all areas reflected by its diverse membership.
The conference program is located here:
It includes links to abstracts, and wherever you see a "Lecture2Go" link, there is a recorded presentation you can watch. A few titles that caught my eye include:

"The potential of using crowd-sourced data to re-explore the demography of Victorian Britain"
"Social Network Analysis and Visualization in The Papers of Thomas Jefferson"
and "Uncertain Date, Uncertain Place: Interpreting the History of Jewish Communities in the Byzantine Empire using GIS"

Those were just a few.

Tuesday, July 17, 2012

Presentations from Research Data Access & Preservation Summit

Thanks to Tina Jayroe from U of Wisc-Milwaukee, the first batch of video presentations from the 2012 Research Data Access & Preservation Summit (RDAP) is now available on YouTube. The conference website has more information about the presentations listed below:

Bill Anderson, University of Texas, Austin [RDAP Welcome & Introductions]:

Data Management Plans and Policies panel, moderated by Reagan Moore

Suzanne Allard, DataOne:
Dave Fellinger, Data Direct Networks:
Carol Beaton Meyer, Earth Science Information Partners (ESIP):
Reagan Moore, DataNet Federation Consortium:
Aletia Morgan, Rutgers University Community Repository:
Ryan Stearns, Texas Digital Libraries:
Peter Wittenburg, European Data Infrastructure (EUDAT):

Data Citation panel, moderated by Joe Hourcle
Joe Hourcle, Solar Data Analysis Center, NASA:
Paul Uhlir, National Academy of Sciences (part 1)
Paul Uhlir, National Academy of Sciences (part 2)

More videos to be posted as time permits.

*UPDATE 7/19/12* More videos posted:

Curation Service Models panel, moderated by Matt Mayernik
David Minor, University of California, San Diego:
Barbara Pralle, Johns Hopkins University:
Michael Witt, Purdue University:

SIG-DL Sustainability panel, moderated by Gail Steinhart & Susan Wells Parham
Robert McDonald, Indiana University Libraries:
Oya Rieger, Cornell University Library, arXiv:
Peggy Schaeffer, Dryad Digital Repository:

Training Data Management Practitioners Panel, moderated by Bill Anderson
Kirk Borne, George Mason University:
Peter Fox, Rensselaer Polytechnic Institute:
Jian Qin, Syracuse University:

Reagan Moore [RDAP Summary and Wrap-up]:

Monday, July 16, 2012

New Geoscience Data Journal

The Royal Meteorological Society has partnered with Wiley to publish an open access journal which "will publish short, earth science data papers cross-linked to datasets that have been deposited in approved data centres and awarded DOIs."

From the announcement:
Geoscience Data Journal is online-only and will publish short data papers (articles describing a dataset, giving details including collection, processing, software and file formats) covering topics ranging from weather and climate, to oceanography, atmospheric chemistry and geology. All published data papers will be linked to datasets, which provide details of the collection, processing and file formatting of data.

“Issues around provenance, curation, recognition and discovery of data have always been important, but never as much as over recent years,” said Professor Paul Hardaker, Chief Executive of the Royal Meteorological Society. “Being able to publish data in a peer-reviewed journal not only helps to address many of these challenges, but for the first time will help to recognise the contribution that data and those scientists that work with data make to the wider community.” 
The journal is now accepting papers. You can find out more information here: http://onlinelibrary.wiley.

Monday, July 9, 2012

Latest Readings Related to Data

Several data-related readings have been made available recently:

UK Cabinet Office published an Open Data White Paper:
"sets out how we’re putting data and transparency at the heart of government and public services."

UK Information Commissioner released a "consultation on the draft Anonymisation code of practice"
"The code of practice will provide guidance on how to assess the risks of identification and how information can be successfully anonymised."

The Royal Society released a report "Science as an Open Enterprise"
"highlights the need to grapple with the huge deluge of data created by modern technologies in order to preserve the principle of openness and to exploit data in ways that have the potential to create a second open science revolution."