Tuesday, May 19, 2015

IASSIST2015 Research Data Management Sessions

The upcoming International Association for Social Science Information Services and Technology conference, IASSIST2015 (Bridging the Data Divide: Data in the International Context), is highly focused on data, with tracks on research data management, data services professional development, and data infrastructure and applications. It draws an international crowd that skews social science, but other types of data librarians/curators/managers also attend in large numbers. There are typically multiple sessions in each block that a data librarian might want to attend, and choosing based simply on track might mean missing some excellent offerings in another area.

That said, my job is "data curriculum and communications librarian," so my interests lie primarily in data services, policy, marketing and communications, and teaching and education. I've picked through the schedule and am sharing here the sessions I plan on attending. And though there's a large number of posters and pecha kucha talks I'm interested in, I highlighted the ones that best matched my primary interests.

Check out the list here: goo.gl/BQ5Zvw, and feel free to share your own list in the comments below.

Monday, May 18, 2015

ALA 2015 Research Data Management/Curation Programming

ALA hasn't historically been known for a heavy dose of data librarian-related sessions or presentations, but it's worth taking a look at this year's annual conference (#alaac15). There's a small but topical group of sessions that will be of interest to data librarians/curators, including a data management plan preconference, a two-part session on data visualization in the library, and a panel presentation from DCIG titled "Conversations with Digital Curation Practitioners," with talks from three speakers and a chance for Q&A.

If you're heading to San Francisco, check out the list of data-related sessions here https://goo.gl/PStgkZ. If I've left anything off, please let me know in the comments. And make sure to tweet as you conference! We'll be watching the hashtag for data-related comments.

Monday, April 27, 2015

Upcoming data speakers on UW Campus

Several upcoming speakers on the UW campus will be of interest to local data folks:
  • Tuesday, April 28 @ 1:30, Data Science Studio, 6th floor Physics/Astronomy Tower. Tony Hey will speak on Physics and Computing: Open Science Decoded. Abstract: "The talk will start with the OSTP memo on open access, and then go on to discuss executable papers and best practice for reproducibility of computational physics research. After looking at computing for Big Physics (e.g. the ATLAS collaboration at the CERN LHC), for Medium-scale Physics (with the UK's Collaborative Computational Projects), and for Long Tail Physics, the paper ends with some comments about open source, scientific software quality and career paths for scientific software developers."
  • Wednesday, April 29 @ 2pm, Allen Auditorium, Allen Library. Data Librarian Jenny Muilenburg will address Data Management Plans: Reading, Writing, and Sharing. Topics will include some of the different agency requirements around DMPs, some local resources to help create DMPs, and some examples from different disciplines.
  • Wednesday, April 29 @ 3:30pm, Data Science Studio. Cesar Hidalgo from the MIT Media Lab, Why Information Grows: The Evolution of Order, from Atoms to Economies. Abstract: "The universe is made of energy, matter and information; but information is what makes the universe interesting. Without information, the universe would lack the shapes, structures, and order that gives the universe both its beauty and complexity. But where does information comes from and what are the natural, social, and economic mechanisms that help information grow? In this talk I will describe the growth of physical order—or information—from atoms to economies by explaining the physical mechanisms that allow order to exist, and the social and economic mechanisms that allow order to prevail in our society and economy."
  • Tuesday, May 5 @ 4pm, Data Science Studio. There will be an Integrative Graduate Education and Research Traineeship (IGERT) Info Session and Reception, useful for those wanting to know more about the Big Data IGERT PhD fellowship or the PhD program. Brief presentations from several IGERT students on current research will be featured, as well as a Q&A session.

Monday, April 13, 2015

Data Management Plan Learning Session: 4/29, 2-3:30pm

On Wednesday, April 29, data librarian Jenny Muilenburg will lead a learning session titled "Data Management Plans: Reading, Writing and Sharing," from 2-3:30pm in the Allen Auditorium (Allen Library, University of Washington). During these 90 minutes, attendees will learn about the different disciplinary and/or agency requirements for data management plans (DMPs), and will look at some examples from different disciplines. Tools and resources available to UW patrons will also be introduced, including DMP consults by librarians and DMPTool.

If you're unfamiliar with data management plans (or research data management in general), these very short videos from the University of Minnesota are a great introduction, and will be good preparation for the session.

This workshop is the last in a series that began last fall. The first two were "Data Librarianship: Skills and Definitions," and "Archives & Repositories." See the workshop links here: http://staffweb.lib.washington.edu/units/Research-data-services/news/monday-april-20-2-3-30pm-data-management-plans-reading-writing-and-sharing.

Thursday, April 2, 2015

Responses to NSF's (and Other Agencies') OSTP Response (Got That?)

Last month, the National Science Foundation released a report titled "Today's Data, Tomorrow's Discoveries: Increasing Access to the Results of Research Funded by the National Science Foundation." (We summarized the highlights in our blog post on March 18, 2015.) This plan is the first piece of the NSF goal to provide increased public access to NSF research outputs; more from NSF is expected this month.

There was a flurry of twitter-tivity following the report's release; you can follow the continued discussion via the hashtag #OSTPresp. You'll also be able to follow a discussion about other agencies' OSTP responses that were released in the few weeks prior to the NSF report. After several agencies updated or released new policies in close succession, Amanda Whitmire at Oregon State University updated her libguide describing Federal Public Access Plans, and also created a crowd-sourced document to keep track of agencies and their plans, available here: http://bit.ly/FedOASummary. Primarily maintained by academic library-based data specialists, the document is open to additions and edits, which will help everyone stay current as new plans come out (and old ones are edited). The document includes whether an agency's policy covers data as well as traditional research outputs, embargoes, data management plan (DMP) details, preferred repositories (if stated), and more.

The UW Libraries has created a simplified version of this document that is geared toward whether agencies have DMP requirements and what their preferred repository is, with an intended audience that is not in the librarian world. Here you'll find a list of agency requirements, where data and articles are made available, and whether or not there's a DMP requirement. It's a little easier for the layperson to digest than the full list, and can be helpful in presenting this information to faculty and researchers.

Continue to watch twitter and this blog for further information about OSTP responses and what they mean for researchers. And feel free to add anything we've missed in the comments.

Monday, March 23, 2015

UW Data Librarians to Present at ACRL

UW data librarians will be presenting in both a panel and poster session at ACRL 2015, both of which will be on the topic of research data management instruction.

At poster session 2 (Thursday, 3/26, 2-3pm in the Convention Center Exhibit Hall), Mahria Lebow and Jenny Muilenburg will be presenting results from their data management-focused session at 2014's Science Boot Camp West. "Using Active Learning Techniques to Engage Academic Librarians in Research Data Management" will illustrate the techniques they used to engage librarians in a non-introductory, 200-level research data management workshop meant to introduce attendees to RDM concepts in a hands-on way. Live polling and group work were used to generate questions, conversations and learning about various RDM topics.

The poll questions were a great way to both engage attendees and spark conversation, letting audience members respond anonymously while seeing how others in the audience were responding. Workshop attendees were quite positive in their feedback on the techniques used in the session, and the polling section in particular was effective. Poll questions are online at tinyurl.com/m9hvrue.

On Friday morning (3/27, 8:30-9:30am, Room A105 in the Convention Center), Jenny Muilenburg, Amanda Whitmire and Heather Coates will present on a panel titled "Promoting Sustainable Research Practices Through Effective Data Management Curricula." This session will detail how each librarian developed a strategy for teaching research data management in different contexts. Each will address how they created their content, how they assessed its effectiveness, and what they plan for future directions.

And in case you missed it in a previous post, a full list of data management planning programs at ACRL2015 is available at http://goo.gl/KUlI6y.

Wednesday, March 18, 2015

Today's Data, Tomorrow's Discoveries: NSF's OSTP response released today

The NSF OSTP response came out today. Here are a few choice tidbits from a quick reading:

"All data resulting from the research funded by the award, whether or not the data support a publication, should be deposited at the appropriate repository as explained in the DMP. Metadata associated with the data should conform to community standards and the requirements of the host repository. At a minimum, data elements should include acknowledgement of NSF support as well as the award number and appropriate attribution." pg 7

"NSF investigators typically have multiple funding sources. Since a given item may be based on funding from more than one agency, NSF expects to allow submissions of articles and papers to public access repositories operated by other Federal agencies that meet the standards of the OSTP February 22, 2013, memorandum and for which the investigator can provide a persistent identifier as an element in annual or final reports." pg 13

"In collaboration with other Federal agencies and interested parties, NSF will develop criteria for eligible repositories, based on the criteria set forth in the OSTP memorandum, and will provide appropriate guidance for awardees and investigators on the website. NSF may initiate these discussions as early as FY 2016." pg 14

"Rarely does NSF expect that retention of all data that are streamed from an instrument or created in the course of an experiment or survey will be required." pg. 15

"Over the next three years, NSF will consult with the community and with other Federal agencies and facilitate the establishment of standards for metadata and repository systems." pg 16

"NSF is aware that individual publishers and library systems are experimenting with new approaches to presenting information, linking publications to data, and providing pointers to repository systems. NSF proposes to foster these developments and their use by ensuring consistent and predictable access to the underlying information, thus providing a platform for creativity and innovation." pg 18.

There's much more in the full text, available here:  http://www.nsf.gov/pubs/2015/nsf15052/nsf15052.pdf. It deserves a read if you have time!  NSF's Executive Summary of the plan, which is only two pages, is here: http://www.nsf.gov/pubs/2015/nsf15051/nsf15051.pdf.

Tuesday, March 10, 2015

ACRL 2015 Research Data Management Programming

ACRL 2015 is coming up fast, and it's never too early to plan out your conference schedule. While research data management is not a heavy focus of ACRL (as compared to, say, Teaching & Learning), there are still several panels, poster sessions, and roundtable discussions on RDM and related issues, as well as a full-day preconference on setting up data management services. Unfortunately, of the four panel sessions on RDM, two are concurrent, but there is definitely enough to keep you busy.

Items here fall under several topical categories, including Scholarly Communication, Teaching & Learning, Assessment, Technology, and others. We tried to capture all data-management related items here, but if you notice something missing, please let us know in the comments.

The full list is available here: http://goo.gl/KUlI6y.

Monday, February 23, 2015

Research Data Management & Physics/Astronomy Librarian Office Hours

The eScience Institute and the UW Libraries are pleased to announce Research Data Management & Physics/Astronomy Librarian Office Hours in the WRF Data Science Studio.

Physics/Astronomy Librarian Hours: 1-3p Mondays and 10a-12p Thursdays
Research Data Management Hours: 11a-1p Tuesdays and 1-3p Thursdays

Location: WRF Data Science Studio, 6th floor Physics/Astronomy Tower (map)

During each two-hour slot, librarians will be on hand to provide support and guidance in their respective areas of expertise. This includes support for finding and accessing data, data management planning, data organization, reuse of data, data sharing and storage, data citation, instruction, literature review, publications, citation management tools, physics / astronomy / mathematics research, and more.

Some representative questions we have helped with in the past:
  • The funding agency for my grant requires me to share my data.  What are my options?
  • Can you help me prepare a data management plan for a grant proposal?
  • Are there standards in my field I should be using to describe my data?
  • I’d like to get a DOI for my dataset to include in a journal publication.  Can you help?
  • What can I do to keep track of my HEP citations? I need to keep projects separated.
  • I’m looking for a cosmology paper presented at a conference last year. Does the library have it?
  • How can I access Journal of Physics G: Nuclear and Particle Physics from home?

Tuesday, February 3, 2015

Data Librarianship Educational Resources

Last year I had the opportunity to take several online training courses related to data librarianship and data science, several of which are being repeated this year or are ongoing. For those looking for beginner-level information, these resources can be very helpful in understanding what data management is, how the library can and should be involved, and what it means to be a data librarian (a difficult-to-define term at best). I've also included a few non-course resources that may be of interest. If you have additional resources you'd like to see on this list, let me know in the comments.

So, to kick off 2015 with some educational resources, here's what's covered below:

  • Research Data Management, Library Juice Academy
  • What You Need to Know About Writing Data Management Plans, ACRL
  • Essentials 4 Data Support, Research Data Netherlands
  • The Data Scientist's Toolbox, Coursera
  • Data Information Literacy, book by Carlson and Johnston
  • The Mendeley group Data Management for Librarians
  • Databrarians.org

Class: Research Data Management
Source: Library Juice Academy
Instructor(s): Jillian Wallis, UCLA
Format: scheduled online class
When: March 2-27, 2015
Cost: $175
Website: http://libraryjuiceacademy.com/082-data-management.php

Taken from the course description, the purpose of this class is to "explore the processes of data production and data management, and the role of LIS professional and institutions in supporting data producers." The class is geared toward academic librarians, but is open to anyone. It covers the following topics:
  • The role and lifecycle of research data
  • Stakeholders and stakes in data management
  • Data sharing and data reuse
  • Data selection and appraisal
  • Repositories and registries
  • Data management standards
  • Tools for writing funder-required data management plans
  • The role of institutions and institutional libraries
Participants read up on current policy and research, and prepare a DMP or data policy or something similar as a final project. There are a lot of readings, and although the Library Juice website says there is approximately 15 hours of work for a four-week course, the instructor's introductory email said to expect each week's work to take at least 8 hours, taking the expected workload from 15 to 32 hours. I found that I was indeed spending 8-10 hours a week to complete the readings and assignments, and stay on top of the course forums. It's possible they've lightened the reading load for this year, but be prepared.

The 2014 technology was a bit buggy: sometimes readings were popups, sometimes a download, sometimes you were taken to a new page. Technical support is iffy -- when I asked for assistance locating two PDFs that were referenced but not linked, I was told I should be able to find them online. I think a class geared toward working professionals should have all the readings immediately available.

The class gives a strong background in technical information about workflows and the data lifecycle and all its variations, much of which is looked at from the academic side of things (the instructor has a PhD in information science and teaches in the Information Department at UCLA), rather than that of a practicing researcher. Some of this may be old news to a practicing data librarian, or it may be that the theoretical underpinnings of data management are of lesser importance to a librarian who is trying to help develop DMP consultations for researchers, but the background is helpful for understanding current policy and practice at various funding agencies and archives. And working on the final project with peers and the instructor available for help is very useful to someone new to DMPs.

Class: What You Need to Know About Writing Data Management Plans
Source: ACRL
Instructor(s): Dee Ann Allison, Professor, University of Nebraska-Lincoln; Kiyomi Deards, Assistant Professor, University of Nebraska-Lincoln
Format: scheduled online class
When: April 27 - May 15, 2015
Cost: varies, $60 for a student, up to $195 for non-members 
Website: http://libraryjuiceacademy.com/082-data-management.php

This course is focused specifically on DMPs, with a little background on data management concepts in general. Learning outcomes from ACRL: 
  • List specific data depository resources in order to formulate recommendations for researchers to securely deposit and share their data.
  • Learn about how different funding agencies, and departments within those agencies, have different requirements for data management plans in order to determine how to effectively advise each researcher according to the requirements for their specific plan.
  • Analyze sample data management plans in order to develop an understanding of what constitutes a thorough data management plan.

Topics covered include data and metadata definitions, open data formats, dark archives, repositories, long-term preservation for data and sharing strategies. The course forums for this class were active, and strongly relevant to the weekly readings and assignments. The final project (for my group) was to develop a DMP for a project one of us had been working on or with, and it was very useful to be able to see a real-life example, rather than a case study. Sample DMPs were also evaluated from various disciplines, giving some good examples of variety across fields.

This class is much more aligned with the needs of practicing librarians who need education on what a DMP is and how to construct one. Most in my cohort were other academic librarians with varying levels of experience; this was helpful when we were put in groups for our final project, as each student brought different skills to the table, and we could all benefit from each other's expertise.

There were again some bugs: lots of typos throughout materials, PDFs that opened but disrupted the navigation of the class, a few problems the first week with assignments that couldn't be uploaded. As 2014 was (I believe) the first year this particular course was offered, I would hope that some of these issues have been worked out.

The final group chat for the course was a good place for last thoughts, as well as for shared resources either discovered during the class, or information people use in their own work. This final chat was shared out via email to students, which was great for those who couldn't attend the last virtual class meeting.

Class: Essentials 4 Data Support
Source: Research Data Netherlands
Format: self-paced online class
When: anytime/ongoing
Cost: three levels: free for online class, free with registration for class + forums, $ for in-person workshops (if you're close to Delft)
Website: http://datasupport.researchdata.nl/en  

This class is perfect for those who need to know more about supporting those working with research data, but don't necessarily need or want a class with readings and homework (which, btw, can be necessary sometimes to make yourself do something!). It's particularly geared toward data librarians, IT staff, and researchers -- anyone with responsibility for data management. A list of competencies the course is meant to address is available here, but in general, the class was developed to teach "the basic knowledge and skills (essentials) to enable a data supporter to take the first steps toward supporting researchers in storing, managing, archiving and sharing their research data." 

For practicing librarians who need to get up to speed on data management, this is the place to go. It assumes a common background knowledge, yet presents information on data management in a simple and direct way, with additional resources and readings if needed. No special software is needed, it's a very simple and well-designed website, and it's easy to dip into the topics you need to know, leaving the rest for later.

There are six sections to the course, each of which provides an overview and objectives, the content of the section, and additional resources and readings. If you provide your email address and register, you're also able to participate in the forums (though most of the comments are in Dutch). Activities are included for some sections, all reading links are provided in the text, and no single page is too long, meaning students can come in and out of the course as time allows. It's a great source to provide information for librarians new to RDM and/or DMPs, and is useful as background before additional in-person discussion or instruction at your local institution. 

Class: The Data Scientist's Toolbox
Source: Coursera
Instructor(s): various, Johns Hopkins Bloomberg School of Public Health
Format: scheduled online class, part of the Data Science Specialization series
When: many start dates, usually monthly 
Cost: free unless you want a certificate ($29)
Website: http://libraryjuiceacademy.com/082-data-management.php

This class is the first of 9 classes (plus a capstone) that are part of the Coursera/Johns Hopkins Data Science Specialization. This first class is a good introduction to the "data, questions and tools that data analysts and data scientists work with." The class is divided into two parts, the first of which is a basic introduction to what a data scientist does. The second is an introduction to some of the tools of the trade, including markdown, git, R, GitHub, etc. If you're new to data librarianship and need to be able to understand what your researchers are doing, this will give you a broad understanding of what data scientists do, and will help you understand a bit more about data sharing and open science.

A few additional educational resources: 

  • Data Information Literacy: Librarians, Data, and the Education of a New Generation of Researchers, by Carlson and Johnston. Published in late 2014, this book looks at what role librarians can play in helping a new generation of graduate students in STEM disciplines develop the competencies needed to manage research data. Material in the book comes from the work done by the authors and others for an IMLS-funded Data Information Literacy project.
  • The Mendeley group Data Management for Librarians, owned by Kevin Read, is a place to share literature and resources about data management, curation, citation, sharing, etc.
  • Databrarians.org is a collaborative blog started in late 2014 aimed at sharing "resources, tips, conversations and strategies so that all of us can more effectively bring data resources to the people in our library communities." It's had a handful of posts from data librarians at different stages of their career, and is hoping to draw a larger audience via contributed posts. With the right participation, this could become a useful resource.