Monday, February 23, 2015

Research Data Management & Physics/Astronomy Librarian Office Hours

The eScience Institute and the UW Libraries are pleased to announce Research Data Management & Physics/Astronomy Librarian Office Hours in the WRF Data Science Studio.

Physics/Astronomy Librarian Hours: 1-3p Mondays and 10a-12p Thursdays
Research Data Management Hours: 11a-1p Tuesdays and 1-3p Thursdays

Location: WRF Data Science Studio, 6th floor Physics/Astronomy Tower (map)

During each two-hour slot, librarians will be on hand to provide support and guidance in their respective areas of expertise. This includes support for finding and accessing data, data management planning, data organization, reuse of data, data sharing and storage, data citation, instruction, literature review, publications, citation management tools, physics / astronomy / mathematics research, and more.

Some representative questions we have helped with in the past:
  • The funding agency for my grant requires me to share my data.  What are my options?
  • Can you help me prepare a data management plan for a grant proposal?
  • Are there standards in my field I should be using to describe my data?
  • I’d like to get a DOI for my dataset to include in a journal publication.  Can you help?
  • What can I do to keep track of my HEP citations? I need to keep projects separated.
  • I’m looking for a cosmology paper presented at a conference last year. Does the library have it?
  • How can I access Journal of Physics G: Nuclear and Particle Physics from home?

Tuesday, February 3, 2015

Data Librarianship Educational Resources

Last year I had the opportunity to take several online training courses related to data librarianship and data science, some of which are being repeated this year or are ongoing. For those looking for beginner-level information, these resources can be very helpful in understanding what data management is, how the library can and should be involved, and what it means to be a data librarian (a difficult-to-define term at best). I've also included a few non-course resources that may be of interest. If you have additional resources you'd like to see on this list, let me know in the comments.

So, to kick off 2015 with some educational resources, here's what's covered below:

  • Research Data Management, Library Juice Academy
  • What You Need to Know About Writing Data Management Plans, ACRL
  • Essentials 4 Data Support, Research Data Netherlands
  • The Data Scientist's Toolbox, Coursera
  • Data Information Literacy, book by Carlson and Johnston
  • The Mendeley group Data Management for Librarians

Class: Research Data Management
Source: Library Juice Academy
Instructor(s): Jillian Wallis, UCLA
Format: scheduled online class
When: March 2-27, 2015
Cost: $175

According to the course description, the purpose of this class is to "explore the processes of data production and data management, and the role of LIS professional and institutions in supporting data producers." The class is geared toward academic librarians, but is open to anyone. It covers the following topics:
  • The role and lifecycle of research data
  • Stakeholders and stakes in data management
  • Data sharing and data reuse
  • Data selection and appraisal
  • Repositories and registries
  • Data management standards
  • Tools for writing funder-required data management plans
  • The role of institutions and institutional libraries
Participants read up on current policy and research, and prepare a DMP, a data policy, or something similar as a final project. There are a lot of readings, and although the Library Juice website says there are approximately 15 hours of work for a four-week course, the instructor's introductory email said to expect each week's work to take at least 8 hours, taking the expected workload over the four weeks from 15 to at least 32 hours. I found that I was indeed spending 8-10 hours a week to complete the readings and assignments and stay on top of the course forums. It's possible they've lightened the reading load for this year, but be prepared.

The 2014 technology was a bit buggy: sometimes readings were popups, sometimes a download, sometimes you were taken to a new page. Technical support is iffy -- when I asked for assistance locating two PDFs that were referenced but not linked, I was told I should be able to find them online. I think a class geared toward working professionals should have all the readings immediately available.

The class gives a strong background in workflows and the data lifecycle and all its variations, much of it from the academic side of things (the instructor has a PhD in information science and teaches in the Information Department at UCLA) rather than from the perspective of a practicing researcher. Some of this may be old news to a practicing data librarian, and the theoretical underpinnings of data management may matter less to a librarian who is trying to help develop DMP consultations for researchers, but the background is helpful for understanding current policy and practice at various funding agencies and archives. Working on the final project with peers and the instructor available for help is also very useful to someone new to DMPs.

Class: What You Need to Know About Writing Data Management Plans
Source: ACRL
Instructor(s): Dee Ann Allison, Professor, University of Nebraska-Lincoln; Kiyomi Deards, Assistant Professor, University of Nebraska-Lincoln
Format: scheduled online class
When: April 27 - May 15, 2015
Cost: varies, $60 for a student, up to $195 for non-members 

This course is focused specifically on DMPs, with a little background on data management concepts in general. Learning outcomes from ACRL: 
  • List specific data depository resources in order to formulate recommendations for researchers to securely deposit and share their data.
  • Learn about how different funding agencies, and departments within those agencies, have different requirements for data management plans in order to determine how to effectively advise each researcher according to the requirements for their specific plan.
  • Analyze sample data management plans in order to develop an understanding of what constitutes a thorough data management plan.

Topics covered include data and metadata definitions, open data formats, dark archives, repositories, long-term preservation of data, and sharing strategies. The course forums for this class were active and strongly relevant to the weekly readings and assignments. The final project (for my group) was to develop a DMP for a project one of us had been working on or with, and it was very useful to be able to see a real-life example rather than a case study. We also evaluated sample DMPs from various disciplines, which gave some good examples of the variety across fields.

This class is much more aligned with the needs of practicing librarians who need education on what a DMP is and how to construct one. Most in my cohort were other academic librarians with varying levels of experience; this was helpful when we were put in groups for our final project, as each student brought different skills to the table, and we could all benefit from each other's expertise.

There were again some bugs: lots of typos throughout materials, PDFs that opened but disrupted the navigation of the class, a few problems the first week with assignments that couldn't be uploaded. As 2014 was (I believe) the first year this particular course was offered, I would hope that some of these issues have been worked out.

The final group chat for the course was a good place for last thoughts, as well as for shared resources either discovered during the class, or information people use in their own work. This final chat was shared out via email to students, which was great for those who couldn't attend the last virtual class meeting.

Class: Essentials 4 Data Support
Source: Research Data Netherlands
Format: self-paced online class
When: anytime/ongoing
Cost: three levels: free for online class, free with registration for class + forums, $ for in-person workshops (if you're close to Delft)

This class is perfect for those who need to know more about supporting those working with research data, but don't necessarily need or want a class with readings and homework (which, btw, can be necessary sometimes to make yourself do something!). It's particularly geared toward data librarians, IT staff, and researchers -- anyone with responsibility for data management. A list of competencies the course is meant to address is available here, but in general, the class was developed to teach "the basic knowledge and skills (essentials) to enable a data supporter to take the first steps toward supporting researchers in storing, managing, archiving and sharing their research data." 

For practicing librarians who need to get up to speed on data management, this is the place to go. It assumes a common background knowledge, yet presents information on data management in a simple and direct way, with additional resources and readings if needed. No special software is needed: it's a very simple and well-designed website, and it's easy to dip into the topics you need to know, leaving the rest for later.

There are six sections to the course, each of which provides an overview and objectives, the content of the section, and additional resources and readings. If you provide your email address and register, you're also able to participate in the forums (though most of the comments are in Dutch). Activities are included for some sections, all reading links are provided in the text, and no single page is too long, meaning students can come in and out of the course as time allows. It's a great source to provide information for librarians new to RDM and/or DMPs, and is useful as background before additional in-person discussion or instruction at your local institution. 

Class: The Data Scientist's Toolbox
Source: Coursera
Instructor(s): various, Johns Hopkins Bloomberg School of Public Health
Format: scheduled online class, part of the Data Science Specialization series
When: many start dates, usually monthly 
Cost: free unless you want a certificate ($29)

This class is the first of 9 classes (plus a capstone) that are part of the Coursera/Johns Hopkins Data Science Specialization. This first class is a good introduction to the "data, questions and tools that data analysts and data scientists work with." The class is divided into two parts, the first of which is a basic introduction to what a data scientist does. The second is an introduction to some of the tools of the trade, including markdown, git, R, GitHub, etc. If you're new to data librarianship and need to be able to understand what your researchers are doing, this will give you a broad understanding of what data scientists do, and will help you understand a bit more about data sharing and open science.

A few additional educational resources: 

  • Data Information Literacy: Librarians, Data, and the Education of a New Generation of Researchers, by Carlson and Johnston. Published in late 2014, this book looks at what role librarians can play in helping a new generation of graduate students in STEM disciplines develop the competencies needed to manage research data. Material in the book comes from the work done by the authors and others for the IMLS-funded Data Information Literacy project.
  • The Mendeley group Data Management for Librarians, owned by Kevin Read, is a place to share literature and resources about data management, curation, citation, sharing, etc.
  • A collaborative blog started in late 2014 aims to share "resources, tips, conversations and strategies so that all of us can more effectively bring data resources to the people in our library communities." It's had a handful of posts from data librarians at different stages of their careers, and is hoping to draw a larger audience via contributed posts. With the right participation, this could become a useful resource.

UW All-Campus Reproducibility Seminar: 2/10 @ 1:30pm

Ben Marwick, Assistant Professor of Archaeology, joins the All-Campus Reproducibility Seminar Series on February 10 at 1:30pm in the WRF Data Science Studio Meeting Room, 6th floor Physics/Astronomy Tower. He will give a talk titled: 

"Doing Reproducible Research with Docker"


A key obstacle to reproducible research that I frequently encounter when working with students and collaborators is keeping the toolkit simple, with managing dependencies being an especially time-consuming challenge. Virtual machines are one solution to these problems, but remain less than ideal because of relatively long start-up and shut-down times, their large size and performance demands, limited portability, and the need for the user to be familiar with a different desktop environment, amongst other concerns. In this talk I introduce Docker, a free and open source Linux container tool recently popular amongst commercial DevOps workers that provides lightweight virtual environments on Windows/OSX/Linux systems and has several advantages over regular virtual machines. I will describe the key elements of doing reproducible research with Docker and demonstrate dockerfiles, containers, images and registries (bring your laptop and follow along! If you're using Windows/OSX, be sure to install Docker in advance). I will show how these help with dependencies and keeping things simple, especially when working with R or Python.

Please join us!