Data @ Libs: January 2015

Monday, January 12, 2015

Special Journal Issue Focuses on Data Literacy and Librarians

Time for some reading: the latest issue of the Journal of eScience Librarianship focuses on the role of librarians in data literacy. Included are articles on data management education initiatives and on designing a research data management (RDM) curriculum for librarians and graduate students, as well as case studies from institutions that used the New England Collaborative Data Management Curriculum to teach RDM to various constituencies.

Also featured is an "eScience in Action" piece titled Lessons Learned from a Research Data Management Pilot Course at an Academic Library, from the UW's own Mahria Lebow, Jennifer Muilenburg, and Joanne Rich, detailing their experience teaching a research data management course to graduate students in early 2014.

We're hoping to set aside some time to read through these articles in the next few weeks, and we hope to share some reactions here. Stay tuned!

Friday, January 9, 2015

DRUW: a glance under the hood

As promised, here is the blog post about the technologies we are going to be playing with to build our data repository. When we decided we wanted to pursue developing an institutional data repository, we evaluated different pieces of software, weighing variables like the maturity of the system, the presence and type of community behind it, flexibility for handling different object types, and general future-proofing. There isn’t much of a dramatic pause for me to insert here, as we’ve already written in previous posts that the outcome of this analysis was going with Hydra.

But what is Hydra? Hydra isn’t a single, out-of-the-box solution (though the community around it has set this as a future goal). Rather, it’s a framework of different pieces of software that come together to create an institutional repository. A Hydra installation can be used as a single interface to many different repositories, if we wanted to expand beyond the current scope of research data. Hydra is based on Fedora, the repository platform from DuraSpace, a nonprofit that supports a number of open source technologies related to digital assets (like DSpace and VIVO). Fedora is shorthand for Flexible Extensible Digital Object Repository Architecture and, as its long-form name implies, Fedora is a digital asset management system capable of handling content regardless of type (GIS, A/V, images, text, data, etc.). Of note, DuraSpace recently released Fedora 4, which has some significant changes from Fedora 3, including being happier about ingesting larger files and providing an RDF representation of content and relationships by default. The Hydra community is energetically working away at getting all of the pieces of the Hydra environment to play nicely with Fedora 4, and has advised new adopters of Hydra to plan on using Fedora 4 from the get-go, rather than create a situation that requires migration at a later date. So, we’ve had a bit of good luck here on our timing for jumping in!

So, Fedora is in charge of managing the objects. The other core components of a Hydra build are Solr and Blacklight. Solr is an open source search platform from Apache that indexes the repository content. Blacklight is the discovery interface that plugs into Solr and provides features like (customizable) faceted browsing, exporting results and saving search history. Now, those are just the core technologies. There are many other packages of code (referred to as gems in the world of Ruby - the programming language behind Hydra) necessary to get an instance of Hydra up and running. The community has developed several different flavors of Hydra that leverage this framework of technologies in deployable web applications (technically, Rails engines). The one we’ve elected to go with is Sufia.
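To make the Solr piece a little more concrete, here is a minimal sketch of the kind of faceted search request a discovery interface sends to Solr. It only builds the request URL and doesn't contact a server. The Solr address, the core name (`repository`) and the facet field names (`resource_type`, `creator`) are made up for illustration; a real Hydra or Sufia install defines its own, and Blacklight assembles these requests for you.

```ruby
require "uri"

# Hypothetical Solr endpoint and core name, for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/repository/select"

# Build the URL for a Solr search that also asks for facet counts.
# query        - the user's search terms
# facet_fields - index fields to facet on (hypothetical names in this sketch)
# rows         - how many results to return
def faceted_search_url(query, facet_fields, rows: 10)
  params = [
    ["q", query],        # main query string
    ["rows", rows.to_s], # number of results per page
    ["wt", "json"],      # ask for a JSON response
    ["facet", "true"]    # turn faceting on
  ]
  # Solr accepts one facet.field parameter per field to facet on.
  facet_fields.each { |field| params << ["facet.field", field] }

  uri = URI(SOLR_SELECT_URL)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts faceted_search_url("ocean temperature", ["resource_type", "creator"])
```

Each `facet.field` in the request asks Solr to return counts of matching items per value of that field, and those counts are what a faceted browsing sidebar displays.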

We’ve been working on use cases for our repository. Our next steps are to define project phases with realistic timelines and to set milestones for each of these phases.

Tuesday, January 6, 2015

Data Librarianship Workshop for UW Libraries staff: Archives & Repositories

There are so many archives and repositories out there that it can be difficult to know where to start looking to help someone in your field (or especially a field you’re not familiar with). This workshop, to be held Wednesday, January 28 from 2-3:30pm in the Allen Auditorium, will look at some of the categories of archives and repositories, and we’ll have time to share some of the similarities and differences across disciplines. We’ll also talk about some of the usage and ethics considerations that come into play when researchers share their data.

The workshop is open to all Libraries staff. Prior to the workshop, please identify 1-2 repositories in your subject area. Take 5-10 minutes and explore:

How easy it is to search for data
How easy it is to deposit data
What the depositor policies are
What kind of metadata the repository collects
Other general impressions

A good place to start (other than Google) is www.databib.org.

This workshop is the second of three workshops on data librarianship. The third will be held Wednesday, April 29th from 2-3:30pm in the Allen Auditorium, and will focus on data management plans.

Questions can be left in the comments below.
