Monday, December 2, 2013

UW Collaborating on $37M Data Science Initiative

Big news for the University of Washington: UW, along with NYU and Berkeley, has been given a 5-year, $37.8M award from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation to advance the growth of data-intensive discovery across a broad range of fields. It's a huge, cross-institutional and multi-disciplinary effort that will build and explore new data science challenges and environments.

The UW team includes more than a dozen faculty, and is led by Ed Lazowska, Director of the UW eScience Institute. Berkeley's team is led by Saul Perlmutter and NYU's by Yann LeCun.

Fernando Perez from Berkeley has written a good description of his hopes for the project, what he thinks it means and what he thinks it might help solve (hint: it involves more than just data science). 

Monday, September 16, 2013

New Reports on Data Archiving and Citation

Two new reports have been published that deal with data issues in research, from proper documentation and archiving, through use of data in research and publication, down to citation. The first is the brief Lost Science: Protecting Data Through Improved Archiving by Karen E. Simmons. This short but on-point article uses concrete examples from NASA data to show what can happen when digital data isn't properly documented, when documentation and formatting standards aren't followed or change rapidly, and the potential loss to science and society at large when bountiful, important, and historic information is lost.

The second report is from the U.S. CODATA and the Board on Research Data and Information (BRDI): Out of Cite, Out of Mind: The Current State of Practice, Policy and Technology for the Citation of Data. From the abstract: "This report discusses the current state of data citation practices, its supporting infrastructure, a set of guiding principles for implementing data citation, challenges to implementation of good data citation practices, and open research questions." This is the second report on data citation issues from this group: the first, For Attribution-Developing Data Attribution and Citation Practices and Standards (2012), is available online from the National Academies Press.

Wednesday, September 4, 2013

Upcoming Data-related Webinars

There are three upcoming webinars that may be of interest to data-minded folks:

  • DuraSpace is hosting Stewarding Research Data with Fedora and Islandora, September 10, 2013, 11am-12pm Eastern. Mark Leggott from the University of Prince Edward Island and founder of the open source Islandora project will be speaking. From the blurb: “In one example at UPEI, Islandora tools are being built to sync data from systems like DropBox and Google Drive to Fedora, providing immediate preservation services for any arbitrary collection of data. This Physical Data Model is intended to provide a quick and seamless integration with Islandora where the researchers can subsequently enrich and optionally choose to share their data with others. In another example the Smithsonian is applying a set of Intellectual Data Models to steward research output from a variety of projects. In this case data is ingested into Islandora against a domain-specific data model that applies specific metadata forms, data transformations and data viewers to make the data more accessible immediately on ingest.”
  • NISO is hosting a two-part webinar on Research Data Curation. Part 1 is on E-Science Librarianship (September 11, 1-2:30pm Eastern), and will discuss “new librarian strategies, tools, and technologies developed to support the lifecycle of scholarly production and data curation. Specific challenges that face research libraries will be described and potential responses will be explored, along with a discussion of the types of skills and services that will be required for librarians to effectively curate research output.” Part 2 is on Libraries and Big Data (September 18, 1-2:30pm Eastern), and will explore librarians and their role in data curation: “There are many challenges to effectively manage and curate this data—challenges that are both similar and different to managing document archives. Libraries can and are assuming a key role in making this information more useful, visible, and accessible, such as creating taxonomies, designing metadata schemes, and systematizing retrieval methods. Our panelists will talk about their experience with big data curation, best practices for research data management, and the tools used by libraries as they take on this evolving role.”
  • The National Research Council's Board on Research Data and Information will be hosting a public symposium titled Privacy in a Big Data World, September 23, 3-5:30pm Eastern. The symposium will discuss such issues as providing adequate privacy protection for individuals without impeding research and innovation, how different regulatory approaches to privacy impact national and transnational research, and how society’s perspective on privacy is evolving.

Wednesday, July 24, 2013

Reports from Open Repositories 2013 and IASSIST2013

Two UW Libraries Data Services Team members were able to attend recent conferences related to data services: Open Repositories 2013, and the International Association for Social Science Information Services and Technology 2013. 

OR2013 was held on Prince Edward Island from July 8th-11th. Stephanie Wright attended two workshops looking at the future of institutional repositories and how institutional repositories deal with data. Plenaries by Victoria Stodden and Jean-Claude Guedon were inspiring, and both focused (in different ways) on research reproducibility, scholarly communications, and altmetrics. The tweets were flowing, so if you'd like to read the thoughts of attendees, check out the Twitter hashtag #OR2013, the summarized version on Storify, or a summary of the conference from an IASSIST perspective.

Jennifer Muilenburg attended IASSIST2013 in Cologne, Germany, from May 28-31. Presentations included several on training researchers on research data management, different approaches to institutional repositories, issues around data collection in university libraries, access to restricted data, and lots more. A brief summary of the conference is available online. Due to spotty wifi, the tweets were lacking, but some of what made it through includes links to presentations.

Wednesday, July 10, 2013

Upcoming Conferences and Sessions and Workshops, Oh My!

Since beginning to follow groups and conferences relevant to data management issues for information professionals in November 2012, I've known that there is always something upcoming in the not-too-distant future that looks fascinating and informative. There is a panoply happening right now, though, that could have us all booked out of the office for the bulk of 2013 (given inexhaustible travel budgets, that is). Here are a few upcoming events that have caught my eye:

  • Happening right now is Open Repositories 2013 (#or2013) in Prince Edward Island, CA, July 8-12. I've been following the Twitter feed via the hashtag and Storify; lots of interesting talk going on around data policies, data curation methods and technologies, the research lifecycle...
  • The University of East London has been training various types of staff on research data management over the last year. They're summarizing some of their work at a daylong workshop, "Support for support: training those in RDM support roles," July 16, London, UK. I'm currently working my way through some of UEL's online curriculum offerings for librarians, and very much wish I could be there for this session.
  • For those interested in the metadata side of scientific data, Camp-4-Data in Lisbon, Portugal, on September 6, will be exploring many facets of metadata standards used to manage scientific data. This is being held just before iPres, the 10th International Conference on the Preservation of Digital Objects, and DCMI, the International Conference on Dublin Core and Metadata Applications. Head in a whirl yet?
  • The HathiTrust Research Center UnCamp 2013, September 8-9, Urbana, IL, is targeted to digital humanities tool developers, researchers and librarians of HathiTrust institutions, and will include hands-on coding and demonstration, use cases, and community building in an un-conference programming format. Register early to help form the program.
  • Data Information Literacy Symposium, West Lafayette, IN, September 23-24. This workshop will "explore roles for practicing librarians in teaching competencies in data management and curation to graduate students." Registration for this is currently full, but following via Twitter should be interesting.
  • The Digital Humanities Data Curation Workshop is being held in College Park, MD, October 16-18. Their resource guide is a great place to start if you can't attend one of their workshops.
  • The 2013 Digital Library Foundation Forum, November 4-6 in Austin, TX. Proposed sessions include one on using a CRM tool to track data management services in an academic library, one on the influence of faculty rank on attitudes toward research data management, several presentations on encouraging better and more specific use of metadata, fostering a culture of data sharing among researchers, data management education for librarians and researchers...

I'm sure there are others out there that I missed; if you have a suggestion, please add it in the comments below.

Tuesday, June 11, 2013

Data-related Sessions at ALA

Many thanks to Lynda Kellam at UNC-Greensboro for compiling a list of data-related sessions on the schedule for #ALA2013. These cover data curation, GIS, scholarly impact, information literacy and more! If you have anything else to add, please do so in the comments.

Data, E-Data, Data Curation: Our New Frontier
Saturday, June 29, 2013 - 8:30am to 10:00am
McCormick Place Convention Center S501bcd
Moderator: Abigail Goben, Asst Information Services Librarian, University of Illinois Chicago
Moderator: Sarah Sheehan
Speaker: Dorothea Salo
Speaker: James Mullins
Speaker: Joan Starr
Speaker: Robert Sandusky
Data management and curation may be a great new opportunity but how are libraries tackling it? We already know how to archive traditional materials but what do we do with terabytes of faculty research data? How do we manage that data set for our students' research? Join us for a big picture view of the issues surrounding e-data collection and access from Joan Starr, Jim Mullins, Dorothea Salo, and Robert Sandusky. Bring questions to help you identify opportunities and challenges already happening on your campus.

Map and Geospatial Information Round Table (MAGIRT)/GODORT GIS Discussion Group
Saturday, June 29, 2013 - 8:30am to 10:00am
McCormick Place Convention Center S504a
Moderator: Angela Lee, Libraries and Museums Manager, ESRI
Moderator: Tracey Hughes, Librarian, CMC Alpine Campus Library
The GIS Discussion Group begins at 8:30 am and focuses on topics related to geographic information systems (GIS). In addition to discussion topics brought forward by session attendees, there will also be a focused topic discussion on education for geospatial librarians. This is a co-sponsored meeting with GODORT.

The Research Footprint: Libraries Tracking and Enhancing Scholarly and Scientific Impact
Saturday, June 29, 2013 - 8:30am to 10:00am
McCormick Place Convention Center N427bc
Speaker: Cathy Sarli, Scholarly Communications Specialist, Becker Medical Library
Speaker: Jason Priem, Impact Story, UNC Royster Scholar, University of North Carolina at Chapel Hill
Speaker: Kristi Holmes, Bioinformaticist, Becker Medical Library
Speaker: Rush G. Miller, The Hillman University Librarian, Director of the University Library System & Professor, University of Pittsburgh
Increasingly, libraries are building services designed to assess and improve the impact of their institutions' research activities. This is an increasingly important, but complex task as more and more scholarship is digitally shared and accessed through traditional and non-traditional pathways.
This program will offer knowledge about:
*The data and expertise libraries are using to track and enhance research dissemination.
*Library programs built upon this data and expertise.

The Census, your patrons and the DataFerrett
Saturday, June 29, 2013 - 3:00pm to 4:00pm
Hyatt Regency McCormick Place Burnham 23A-C
Speaker: Kendra Morgan
Speaker: Stephen R Laue
A hands-on workshop on Accessing Census Statistics (American Community Survey) with officials from the US Census Bureau to learn how to use the DataFerrett, which is an analytical and visualization tool that searches and retrieves data across federal datasets, and creates complex tabulations, business graphics and thematic maps. This workshop will demonstrate how to: browse the datasets accessible via The Data Web; select variables from datasets; create new variables from existing ones; and produce customized analyses using tables, graphs, and maps.

Numeric and Geospatial Data Services in Academic Libraries Interest Group (ACRL)
Saturday, June 29, 2013 - 4:30pm to 5:30pm
Hyatt Regency Chicago Skyway 269
Moderator: Lynda Kellam
This is the annual meeting of the Numeric and Geospatial Data Services in Academic Libraries Interest Group (Association of College and Research Libraries)
Hashtag: #DIGdata

Digital Curation Interest Group (ACRL)
Sunday, June 30, 2013 - 1:00pm to 2:30pm
McCormick Place Convention Center N135
The Digital Curation Interest Group is a group of mainly library- and archives-based practitioners who meet to discuss challenges, tools, user needs, use and reuse, etc., related to lifecycle management of digital data and content.

Building Financial Literacy Reference Skills
Sunday, June 30, 2013 - 2:30pm to 4:00pm
McCormick Place Convention Center Hall A, Exhibit Floor
Poster session
Author: Kristin McDonough, Director, New York Public Library
Author: Marzena Ermler, Manager of Professional Development, New York Public Library
Author: TJ Woods, Learning & Development Specialist, New York Public Library
This poster session will demonstrate the training methodology NYPL has developed in conjunction with financial experts to increase the financial education awareness among library staff and users. We will present the programmatic goals of the Money Matters Training Program that educate staff on core concepts of financial education and related reference sources. The program launched in February 2012 and the evaluative data collected indicates that staff members who participated in the program increased their comfort level in providing reference services in financial education, focusing on areas such as banking, credit, identity theft, and investing. The program also equips our staff with information about how to conduct financial education training in their communities. Visitors to our poster session will leave with access to a complete staff training curriculum including trainer’s guides, participant worksheets, ppt. presentations, and online e-learning modules that can be used to seamlessly replicate the Money Matters Pro financial literacy training at their own libraries. During the poster session we will present the Money Matters Pro website, clips of training classes, success stories, and samples of the training materials.

Connecting the Dots: Defining Scholarly Services in a Research Lifecycle Model
Sunday, June 30, 2013 - 2:30pm to 4:00pm
McCormick Place Convention Center Hall A, Exhibit Floor
Author: Andrew Todd, Regional Campus Librarian, University of Central Florida
Author: Sai Deng, Metadata Librarian, University of Central Florida
Librarians at the University of Central Florida (UCF) Libraries have created a visual model depicting the cycle of research at an institutional level while embedding scholarly services into the flow. The Research Lifecycle is used as a basis to build a framework for the faculty research process and gain both support and funding for new infrastructure and services.
The UCF Research Lifecycle includes four sub-cycles: a planning cycle, a project cycle, a publication cycle and the 21st century digital scholarship cycle. In each cycle, supporting services are added to the research flow. Amid the existing services provided by different university units, potential services are added to bridge the missing links in the lifecycle. Some of these missing services include data hosting, research computing and an institutional repository. The center of each cycle shows corresponding activities related to sponsored or grant-funded research, which form an important part of the institution’s scholarly research activities.
This project was initiated by the university library’s Scholarly Communication Task Force. During its development, librarians collaborated with other campus departments, including the Office of Research and Commercialization, the Institute for Simulation and Training, and the Faculty Center for Teaching and Learning. The lifecycle graphic and its related services are available online. Task force members also solicited feedback from teaching faculty, which yielded a wide variety of constructive comments that were incorporated into the model. The lifecycle at its current state has received widespread campus attention and generated interest from University administrators.

How to Teach and Assess Discipline-Specific Information Literacy (ACRL)
Monday, July 1, 2013 - 10:30am to 11:30am
McCormick Place Convention Center N427a
Speaker: Christina Connor, Instruction and Emerging Technologies Librarian, Ramapo College of New Jersey
Speaker: Nicholas Salter, Assistant Professor of Psychology, Ramapo College of New Jersey
Discipline-specific information literacy is an essential topic for all students to understand. Using Psychology courses as an example, this presentation will discuss faculty-librarian collaborative teaching approaches based on skill and age-level. This approach uses the new ALA/ACRL Psychology Information Literacy Standards. Quantitative assessment data will be discussed. Suggestions from this presentation will help all fields teach discipline-specific information literacy.

Wednesday, April 17, 2013

Public Comment on Access to Federally Supported Research and Development Data and Publications

In response to both the recent OSTP memorandum and the proposed bill (FASTR) that call for increased public access to data and publications resulting from federally funded research, a group of cooperating agencies and the National Research Council have organized two planning meetings, to be held May 14-17, to gather stakeholder input (also included are "brief introductory addresses by a select few experts and summarizing commentary by equally few rapporteurs").

Two meetings will be held, one focused on publications (May 14-15) and the other on data (May 16-17). The public is invited to attend in person (at the National Academy of Sciences in DC) or via webcast, but registration is required. Attendees may also request time to present a verbal or written statement.

Sponsoring Agencies:
Department of Agriculture
Department of Commerce
  National Institute of Standards and Technology
  National Technical Information Service
  National Oceanic and Atmospheric Administration
Department of Defense
Department of Education
Department of Energy
Department of Health and Human Services
  Agency for Healthcare Research and Quality
  Centers for Disease Control and Prevention
Department of the Interior
  United States Geological Survey
Department of Transportation
Environmental Protection Agency
Institute of Museum and Library Services
National Aeronautics and Space Administration
National Endowment for the Humanities
National Science Foundation
Smithsonian Institution

Sunday, April 14, 2013

Thomson Reuters' Web of Science Presents: The Data Citation Index!

Last October, Web of Science (WoS) launched a new service known as the Data Citation Index (DCI). It allows users to track and discover data from tons of research projects. The DCI covers info for datasets from multidisciplinary and international repositories, and is housed within the WoS we already know and love. Researchers comfortable looking up papers in WoS can find related datasets, or vice versa. Collocating research data and papers is a great step toward making datasets more discoverable.

Data discoverability through the DCI is good for researchers and librarians. Tracking citations of datasets gives researchers better and more consistent context of work relevant to their own. That enlarges their scope of understanding and reach of influence, and reduces the chances of performing redundant research. By tossing tons of data citations in together, the DCI also creates a context of metadata and attribution facets used for datasets. That helps data librarians create better data management plans for their repositories.

Citation tracking is a great thing for data management, but some crinkles still need to be smoothed out. An evaluative report by the University of Minnesota on their trial of the DCI highlighted some pros and cons. The metadata for different but apparently similar datasets contained equally ambiguous terms, for example. That musses up the sense of proper context and standardization the DCI intends to provide. Datasets themselves, however, are eminently discoverable thanks to WoS’s preexisting search and results refinement functions.

The DCI is a notable step for data management and exposes some of its major challenges, such as: reining in variation between metadata, aggregating interdisciplinary repositories, and discoverability.

Guest Bloggers

You will see a few new names attached to the blog posts in the near future.  Jenny and I have opened up blog posts to interested bloggers from the University of Washington iSchool.  The first such post will be by Tal Noznisky posting a review on the Data Citation Index by Thomson Reuters.  Welcome, Tal!

Wednesday, April 3, 2013

Global Health Data Exchange Updates

The Global Health Data Exchange (GHDx) has announced a relaunch with additional features, including improved navigation and background information on countries, data series, and organizations. The advanced search allows users to narrow queries by geography, time, data type, keyword, and data source. Data can also be sorted on whether the actual research data is publicly and freely available. An announcement about the new features in version 2 is available online.

In addition, search results can now be sorted by the year that data collection started, and can be easily exported. Geographies now have a time component to reflect historical changes to country names or boundaries over time (e.g., the Soviet Union).

Launched in March 2011 by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington, the GHDx is the world’s largest catalog and repository of health-related data. It currently contains more than 8000 records with carefully researched information about data, including a standardized English title, local-language title, geography and time covered by the data, a suggested citation and information about current data providers. In addition, detailed keywords provide information about the topics that are covered by the data. Many datasets can be downloaded directly from the GHDx.

Monday, April 1, 2013

The Library "Reboot" in Nature

The current issue of Nature (Volume 495 Number 7442) is focused on changes in publishing; one article in particular highlights some of the current challenges to academic libraries, and what some organizations are doing both to remain at the core of the university's research and to transform the way they deliver and store information.

"Publishing Frontiers: The Library Reboot" shares examples from some US, Australian and UK libraries. It covers how some libraries are focused on offering non-traditional ways to use and visualize the data and information housed by the library. It also discusses research data management, and how many libraries around the world either have or are planning to offer RDM to their campuses, in part as an extension of the information storage and retrieval that libraries have always been doing, and in part to stay central to the academic research mission. “I see us moving up the food chain and being co-contributors to the creation of new knowledge,” says Sarah Thomas, the head of libraries at the University of Oxford, UK.

Monday, March 4, 2013

The New OSTP Policy and Data

The OSTP policy changes last month that mandate greater access to federally funded research left us wondering what, exactly, it'll mean for federally funded research data. Several good blog posts summarize what the policy states and interpret what the changes might mean in the realm of data.

The Scholarly Kitchen gives a history of how the policy came to be, deciphers what it means, and provides a list of agencies covered (NIH, CDC, FDA, AHRQ, NSF, NASA, DOE, USDA, FAA, FHWA, NIST, NOAA, USGS, EPA, DOD, VA, USAID, Dept. of Education, and the Smithsonian), and some first impressions of what it means for public access to funded research papers.

Carly Strasser from California Digital Library takes a look at the policy from both a scholarly article and data perspective, providing a short-and-sweet summary in plain English about potential changes from the policy.

And last, Kristin Briney spends time looking at what the policy means for data in particular.

Basically, what all this means is that data management plans will now be required of researchers on federal grants, and these plans should be supported by the various agencies. There is no particular mandate for sharing, just the "maximizing of access to research data." There is a lot of potential there for increases in data management plan creation and support, open repositories and greater access to the content therein. Agencies have 6 months from the announcement to create a policy; come August, there will be some interesting things to discuss.

Monday, February 25, 2013

DataONE Summer Internship Program

Data folks interested in spending part of their summer working on intensive data-and-science-related topics can now apply for one of eight open DataONE internship positions. The internships are available to undergraduates, graduates and post-graduates who have received their master's in the last five years, and there are no restrictions on field of study as long as the prospective intern's qualifications and interests match a project. Each intern will be paired with a mentor, who does not have to be in the same location or institution. 2013 project titles include:

- Next Generation Data Environment: Semantically-Enabling the DataONE Metadata Environment
- Ontology Mappings in the Earth and Environmental Sciences
- Evaluation of Ontology Coverage for Curation
- Integrating Data Stories into DataONE Education and Community Engagement Products
- Data Policies for Public Participation in Scientific Research
- Bi-level Metadata Registry Development
- PBase: Provenance as a First-class Citizen in DataONE
- Build Fundamental Components for Provenance-aware Model Exploration, Evaluation, and Benchmarking Cyber-infrastructure Prototype
- A Visualization Tool for Provenance in DataONE

More detail about these topics and the internships is available online.

The application deadline is March 17, 2013. The internship runs from May 27 to July 26, and interns will receive a stipend of $5,000.

The Data Observation Network for Earth (DataONE) is "a virtual organization dedicated to providing open, persistent, robust, and secure access to biodiversity and environmental data, supported by the U.S. National Science Foundation."

Finding the Needle in the Haystack: Discovering Research Data Online

On Tuesday February 26 (that's tomorrow, folks!) from 3-5:30PM EST/12-2:30PST, the Board on Research Data and Information will be hosting a symposium called "Finding the Needle in the Haystack: Discovering Research Data Online." As research becomes more data intensive and the amount of data produced continues to grow, knowing how to find relevant data is increasing in importance. Acknowledging the variety of solutions in use for storing and locating data, the symposium will also be looking at the issues of "pervasive infrastructure, standardization of approaches, and the usual questions of who does what, where, and how?" Speakers will include Clifford Lynch from CNI and Francine Berman from Rensselaer Polytechnic Institute.

The 2.5-hour discussion will be audiocast live; the address will be posted about an hour before the event.

Monday, February 4, 2013

Digital Curation Conference videos online

For those of us who weren't in Amsterdam last month for the 8th International Digital Curation Conference, video for many of the presentations has been posted online.

Included is UW's Stephanie Wright participating in a panel, "What is a Data Scientist?" She spoke as the Data Librarian, along with Data Steward Louise Corti from the UK Data Archive, Data Publisher Scott Edmunds from Gigascience, and Data Analyst Francine Bennett from Mastodon C. Slides of the presentations are also available, though not from Wright, who spoke without slides. 

Thursday, January 31, 2013

Praxis Talk: Demystifying the Digital Humanities

Paige Morgan and Sarah Kremen-Hicks will be talking about their initiative "Demystifying the Digital Humanities" at the next Praxis conversation on Feb. 12th at 12:30 in the Research Commons. The talk will focus on the concept of agency in the digital humanities, and will discuss teaching and modeling agency in an academic context. 

The talk is part of a series of lectures called "Praxis: Doing Scholarship Digitally," designed to create opportunities for researchers from a variety of disciplines to discuss digital tools in their work. The series is co-sponsored by the Simpson Center for the Humanities and the University Libraries.