Monday, November 30, 2015

A Day of Library Conversation on Open Public Data


[Note: this article appears concurrently in the ACRL-WA/OR newsletter for Fall 2015, n76]

On October 13, Seattle Public Library hosted a day of library conversation on open public data. Attendees came from around the Puget Sound area and included libraries such as Seattle Public Library, Pierce County Libraries, King County Libraries, Sno-Isle Libraries, the University of Washington and the UW Libraries, the City of Seattle, and OCLC. Representatives from Socrata, a company that provides solutions for governments to put their data online, were also in attendance. The idea behind the workshop was to facilitate a discussion regarding the role of the library in open and government data initiatives. The hope was to brainstorm ways in which public libraries can partner with local data initiatives, how to provide library staff with the skills and resources they need to participate in local data, and how to support the library's community of users.

The session included folks from Socrata presenting examples of some of the government data they provide solutions for, as well as a brainstorming session on what a library's potential role is regarding open data. In the Puget Sound there is already a bounty of online government data available (for example, http://data.kingcounty.gov/, https://data.seattle.gov/, https://data.wa.gov/, http://www.census.gov/data.html, and many, many more), and library user communities are accessing that data from library computers. The question is: could or should libraries be doing more to support what users are doing?

Though no magic-bullet solutions were found, everyone involved agreed it was a good initial conversation -- it was the first time all of us had collected together to learn about how we are or would like to be supporting open data, what our staff needs to be able to continue or begin supporting open data initiatives, and what the future might look like as far as library support for these endeavors goes.

I'm sure this was the first of many meetings on the topic, and everyone looked forward to potential collaboration on future projects, and/or to more discussion on particular concepts.

Some additional open data-related resources that were highlighted at the event included OpenSeattle (a civic technology group, including weekly civic hacking nights), Municipal Research Services Center (has a data request service for municipalities), Puget Sound Regional Council (also has a data request service), and NextDoor (private social networks connecting neighborhoods).

Following the afternoon workshop, an evening event at SPL was held. Titled "From Data to Action: Open Data and You," the event included a panel presentation and audience discussion. Panelists were:
  • Ryan Biava, Senior Policy Advisor, Mayor's Office of Policy & Innovation
  • Abe Diaz, Mobile Program Manager at NBC-Universal, Inc. and Independent Developer
  • Amy Laurent, Assessment, Policy Development and Evaluation Unit, Public Health, Seattle & King County
  • Domonique Meeks, Master of Science in Information Management graduate student at the University of Washington and the co-organizer of Hack The CD
  • Jenny Muilenburg, Data Curriculum and Communications Librarian, University of Washington Libraries Research Commons
Facilitated by Jim Loter, Director of Information Technology at SPL, discussion topics included resources for open data (with a focus on data.seattle.gov), best ways to learn about data if you're starting out on a project, where to acquire data-related skills, and examples of the creative ways people have used open data. The best part of the night was the Q&A, with questions covering Seattle policy decisions, skills training, ideas for how to use open data, and more.

The presentation was recorded, and will be available online via SPL soon.

Tuesday, November 10, 2015

November 18 = UW GIS Day

Campus GIS Users:

Wednesday, November 18th is GIS Day and the University of Washington will highlight and celebrate the transformational role of Geographic Information Science (GIS) by hosting a day-long event in the UW Libraries' Research Commons.


UW GIS Day Agenda:

10:15 a.m. - 10:30 a.m.: Welcome & Light Refreshments (coffee and doughnuts)
10:30 a.m. - 11:15 a.m.: Lightning Talks 1: Non-Students - Submit a Proposal!
11:15 a.m. - 12:15 p.m.: Speaker Session #1: Mapping for Social Good
·         Marty Schnure, mapsforgood.org
·         Andrew Powers & Christy Heaton, MaptimeSEA
12:15 p.m. - 1:15 p.m.: Lunch
1:15 p.m. - 2:00 p.m.: Lightning Talks 2: Students - Submit a Proposal!
2:00 p.m. - 3:00 p.m.: Speaker Session #2: The State of GIS Education Today
·         Nick Chrisman, RMIT University
·         Michael Goodchild, UC Santa Barbara/UW

Interested in presenting at GIS Day? We could still use a few more people to give lightning talks. There are going to be two sessions: one for student presenters and one for non-students. Submit your proposal for a lightning talk here: http://www.lib.washington.edu/commons/events/gisday/2015/proposal

UW Students: Win a $75 UW Bookstore gift card at this year’s UW GIS Day! All you need to do is give a 5-minute lightning talk at UW GIS Day. Audience members will vote on their favorite based on style and content. The winner will receive a $75 UW Bookstore gift card!

Contact Matt Parsons at parsonsm@uw.edu for more information.

We hope you will join us!

-Matt Parsons, on behalf of the GIS Day Planning Group

Friday, October 30, 2015

UW Students: Win a $75 UW Bookstore gift card at this year’s UW GIS Day!

UW GIS Day is coming up fast, with a full day of events planned for Wednesday, November 18. GIS Day is an international event that showcases how GIS can be used to make a difference in society. UW students have the chance to win a $75 UBookstore gift card by participating in the student presentation session.

All you need to do is give a 5-minute lightning talk at UW GIS Day. We're looking for 8 students who have used GIS in a project or as part of their undergraduate research to give lightning talks. Audience members will vote on their favorite based on style and content. The winner will receive a $75 UW Bookstore gift card!

Submit your proposal for a lightning talk here: www.bitly.com/uwgisdaytalks.

Wednesday, November 18th is GIS Day and the University of Washington will highlight and celebrate the transformational role of Geographic Information Science (GIS) by hosting a day-long event in the UW Libraries' Research Commons. Updated event information is posted here: www.bitly.com/2015uwgisday. It's a fun and fast-paced day of GIS goodness that covers a wide variety of disciplines.

Contact Matt Parsons at parsonsm@uw.edu for more information.

We hope you will join us!

Wednesday, September 23, 2015

Data Sharing Issue of Journal of Librarianship & Scholarly Communication

Hot off the presses: a new special issue of the Journal of Librarianship and Scholarly Communication (Vol 3, Iss 2, September 2015) is dedicated to research data sharing. Articles include topics such as accessible research data, institutional data policies, data sharing practices, data citation and data management plans, as well as Amanda Whitmire's write-up "Implementing a Graduate-Level Research Data Management Course: Approach, Outcomes and Lessons Learned." In addition, data from the article is included in the JLSC's dataverse. Looking forward to checking out many, if not most, of these articles in the very near future.

Wednesday, August 12, 2015

Data Webinars, Live and Recorded

Need a refresher on your data reference skills? Looking for background information on a particular type of data? Check out the following webinars focused on data and data reference:

On August 18, data librarians Hailey Mooney and Jen Darragh will present a webinar aimed at helping you answer patrons' data and statistics questions. "Data for the Non-Data Librarian" will be held at 11 a.m. PST/2 p.m. EST, and will hit topics such as the difference between data and statistics, search strategies for both, ways of finding local area data, and how to leverage free and paid data resources. Register at http://ow.ly/QMAOR.

Earlier today, the Government Resources Section of the North Carolina Library Association hosted the latest in their "Help, I'm an Accidental Government Information Librarian" webinars. Kristin Partlo presented "Accessing Datasets for the Data Curious," which included information on helping patrons download data, exploring the relevance of a dataset, and alerting patrons to common pitfalls and patterns. This series of webinars is archived online: other webinars covered court records, environmental data, the National Archives, the Bureau of Labor Statistics, geocoding, and many (many!) more, most of which have the slides and recorded session available.


Tuesday, May 19, 2015

IASSIST2015 Research Data Management Sessions

The upcoming conference of the International Association for Social Science Information Services and Technology (IASSIST2015), themed "Bridging the Data Divide: Data in the International Context," is highly focused on data, with tracks on research data management, data services professional development, and data infrastructure and applications. It draws an international crowd that skews social science, but other types of data librarians/curators/managers are also in attendance in large numbers. There are typically multiple sessions in each block that a data librarian might want to attend, and picking sessions based simply on track might have you missing some excellent offerings in another area.



That said, my job is "data curriculum and communications librarian," so my interests lie primarily in data services, policy, marketing and communications, and teaching and education. I've picked through the schedule and am sharing here the sessions I plan on attending. And though there's a large number of posters and pecha kucha talks I'm interested in, I highlighted the ones that best matched my primary interests.

Check out the list here: goo.gl/BQ5Zvw, and feel free to share your own list in the comments below.

Monday, May 18, 2015

ALA 2015 Research Data Management/Curation Programming

ALA hasn't historically been known for a heavy dose of data librarian-related sessions or presentations, but it's worth taking a look at this year's annual conference (#alaac15). There's a small but topical group of sessions that will be of interest to data librarians/curators, including a data management plan preconference, a two-part session on data visualization in the library, and a panel presentation from DCIG titled "Conversations with Digital Curation Practitioners," with talks from three speakers and a chance for Q&A.

If you're heading to San Francisco, check out the list of data-related sessions here: https://goo.gl/PStgkZ. If I've left anything off, please let me know in the comments. And make sure to tweet as you conference! We'll be watching the hashtag for data-related comments.

Monday, April 27, 2015

Upcoming data speakers on UW Campus

Several upcoming speakers on the UW campus will be of interest to local data folks:
  • Tuesday, April 28 @ 1:30, Data Science Studio, 6th floor Physics/Astronomy Tower. Tony Hey will speak on Physics and Computing: Open Science Decoded. Abstract: "The talk will start with the OSTP memo on open access, and then go on to discuss executable papers and best practice for reproducibility of computational physics research. After looking at computing for Big Physics (e.g. the ATLAS collaboration at the CERN LHC), for Medium-scale Physics (with the UK's Collaborative Computational Projects), and for Long Tail Physics, the paper ends with some comments about open source, scientific software quality and career paths for scientific software developers."
  • Wednesday, April 29 @ 2pm, Allen Auditorium, Allen Library. Data Librarian Jenny Muilenburg will address Data Management Plans: Reading, Writing, and Sharing. Topics will include some of the different agency requirements around DMPs, some local resources to help create DMPs, and some examples from different disciplines.
  • Wednesday, April 29 @ 3:30pm, Data Science Studio. Cesar Hidalgo from the MIT Media Lab, Why Information Grows: The Evolution of Order, from Atoms to Economies. Abstract: "The universe is made of energy, matter and information; but information is what makes the universe interesting. Without information, the universe would lack the shapes, structures, and order that gives the universe both its beauty and complexity. But where does information comes from and what are the natural, social, and economic mechanisms that help information grow? In this talk I will describe the growth of physical order—or information—from atoms to economies by explaining the physical mechanisms that allow order to exist, and the social and economic mechanisms that allow order to prevail in our society and economy."
  • Tuesday, May 5 @ 4pm, Data Science Studio. There will be an Integrative Graduate Education and Research Traineeship (IGERT) Info Session and Reception, useful for those wanting to know more about the Big Data IGERT PhD fellowship or the PhD program. Brief presentations from several IGERT students on current research will be featured, as well as a Q&A session.



Monday, April 13, 2015

Data Management Plan Learning Session: 4/29, 2-3:30pm

On Wednesday, April 29, data librarian Jenny Muilenburg will lead a learning session titled "Data Management Plans: Reading, Writing and Sharing," from 2-3:30pm in the Allen Auditorium (Allen Library, University of Washington). During these 90 minutes, attendees will spend time learning about the different disciplinary and/or agency requirements for data management plans (DMPs), and will look at some examples from different disciplines. Tools and resources available to UW patrons will also be introduced, including DMP consults by librarians and DMPTool.

If you're unfamiliar with data management plans (or research data management in general), these very short videos from the University of Minnesota are a great introduction, and will be good preparation for the session.

This workshop is the last in a series that began last fall. The first two were "Data Librarianship: Skills and Definitions," and "Archives & Repositories." See the workshop links here: http://staffweb.lib.washington.edu/units/Research-data-services/news/monday-april-20-2-3-30pm-data-management-plans-reading-writing-and-sharing.

Thursday, April 2, 2015

Responses to NSF's (and Other Agencies') OSTP Response (Got That?)

Last month, the National Science Foundation released a report titled "Today's Data, Tomorrow's Discoveries: Increasing Access to the Results of Research Funded by the National Science Foundation." (We summarized the highlights in our blog post on March 18, 2015.) This plan is the first piece of the NSF goal to provide increased public access to NSF research outputs; more from NSF is expected this month.

There was a flurry of twitter-tivity following the report's release; you can follow the continued discussion via the hashtag #OSTPresp. You'll also be able to follow a discussion about other agencies' OSTP responses that were released in the few weeks prior to the NSF report. After several agencies updated or released new policies in close succession, Amanda Whitmire at Oregon State University updated her libguide describing Federal Public Access Plans, and also created a crowd-sourced document to keep track of agencies and their plans, available here: http://bit.ly/FedOASummary. Primarily maintained by academic library-based data specialists, the document is open to additions and edits, which will help everyone stay current as new plans come out (and old ones are edited). The document includes whether an agency's policy covers data as well as traditional research outputs, embargoes, data management plan (DMP) details, preferred repositories (if stated), and more.

The UW Libraries has created a simplified version of this document that is geared toward whether agencies have DMP requirements and what their preferred repository is, with an intended audience that is not in the librarian world. Here you'll find a list of agency requirements, where data and articles are made available, and whether or not there's a DMP requirement. It's a little easier for the layperson to digest than the full list, and can be helpful in presenting this information to faculty and researchers.

Continue to watch twitter and this blog for further information about OSTP responses and what they mean for researchers. And feel free to add anything we've missed in the comments.



Monday, March 23, 2015

UW Data Librarians to Present at ACRL

UW data librarians will be presenting in both a panel and poster session at ACRL 2015, both of which will be on the topic of research data management instruction.

At poster session 2 (Thursday, 3/26, 2-3pm in the Convention Center Exhibit Hall), Mahria Lebow and Jenny Muilenburg will be presenting results from their data management-focused session at 2014's Science Boot Camp West. "Using Active Learning Techniques to Engage Academic Librarians in Research Data Management" will illustrate the techniques they used to engage librarians in a non-introductory, 200-level research data management workshop meant to introduce attendees to RDM concepts in a hands-on way. Live polling and group work were used to generate questions, conversations and learning about various RDM topics.

The poll questions were a great way to both engage attendees and spark conversation, by letting audience members respond anonymously while at the same time seeing how others in the audience were responding. Workshop attendees were quite positive in their feedback on the techniques used in the session, and found the polling section particularly effective. Poll questions are online at tinyurl.com/m9hvrue.

On Friday morning (3/27, 8:30-9:30am, Room A105 in the Convention Center), Jenny Muilenburg, Amanda Whitmire and Heather Coates will present on a panel titled "Promoting Sustainable Research Practices Through Effective Data Management Curricula." This session will detail how each librarian developed a strategy for teaching research data management in different contexts. Each will address how they created their content, how they assessed its effectiveness, and what they plan for future directions.

And in case you missed it in a previous post, a full list of data management planning programs at ACRL2015 is available at http://goo.gl/KUlI6y.


Wednesday, March 18, 2015

Today's Data, Tomorrow's Discoveries: NSF's OSTP response released today

The NSF OSTP response came out today. Here are a few choice tidbits from a quick reading:

"All data resulting from the research funded by the award, whether or not the data support a publication, should be deposited at the appropriate repository as explained in the DMP. Metadata associated with the data should conform to community standards and the requirements of the host repository. At a minimum, data elements should include acknowledgement of NSF support as well as the award number and appropriate attribution." pg 7

"NSF investigators typically have multiple funding sources. Since a given item may be based on funding from more than one agency, NSF expects to allow submissions of articles and papers to public access repositories operated by other Federal agencies that meet the standards of the OSTP February 22, 2013, memorandum and for which the investigator can provide a persistent identifier as an element in annual or final reports." pg 13

"In collaboration with other Federal agencies and interested parties, NSF will develop criteria for eligible repositories, based on the criteria set forth in the OSTP memorandum, and will provide appropriate guidance for awardees and investigators on the website. NSF may initiate these discussions as early as FY 2016." pg 14.

"Rarely does NSF expect that retention of all data that are streamed from an instrument or created in the course of an experiment or survey will be required." pg. 15

"Over the next three years, NSF will consult with the community and with other Federal agencies and facilitate the establishment of standards for metadata and repository systems." pg 16

"NSF is aware that individual publishers and library systems are experimenting with new approaches to presenting information, linking publications to data, and providing pointers to repository systems. NSF proposes to foster these developments and their use by ensuring consistent and predictable access to the underlying information, thus providing a platform for creativity and innovation." pg 18.

There's much more in the full text, available here:  http://www.nsf.gov/pubs/2015/nsf15052/nsf15052.pdf. It deserves a read if you have time!  NSF's Executive Summary of the plan, which is only two pages, is here: http://www.nsf.gov/pubs/2015/nsf15051/nsf15051.pdf.

Tuesday, March 10, 2015

ACRL 2015 Research Data Management Programming

ACRL 2015 is coming up fast, and it's never too early to plan out your conference schedule. While research data management is not a heavy focus of ACRL (as compared to, say, Teaching & Learning), there are still several panels, poster sessions, and roundtable discussions on RDM and related issues, as well as a full-day preconference on setting up data management services. Unfortunately, of the four panel sessions on RDM, two are concurrent, but there is definitely enough to keep you busy.

Items here fall under several topical categories, including Scholarly Communication, Teaching & Learning, Assessment, Technology, and others. We tried to capture all data-management related items here, but if you notice something missing, please let us know in the comments.

The full list is available here: http://goo.gl/KUlI6y.

Monday, February 23, 2015

Research Data Management & Physics/Astronomy Librarian Office Hours

The eScience Institute and the UW Libraries are pleased to announce Research Data Management & Physics/Astronomy Librarian Office Hours in the WRF Data Science Studio.

Hours:
Physics/Astronomy Librarian Hours: 1-3p Mondays and 10a-12p Thursdays
Research Data Management Hours: 11a-1p Tuesdays and 1-3p Thursdays

Location: WRF Data Science Studio, 6th floor Physics/Astronomy Tower (map)

During each two-hour slot, librarians will be on hand to provide support and guidance in their respective areas of expertise. This includes support for finding and accessing data, data management planning, data organization, reuse of data, data sharing and storage, data citation, instruction, literature review, publications, citation management tools, physics / astronomy / mathematics research, and more.

Some representative questions we have helped with in the past:
  • The funding agency for my grant requires me to share my data.  What are my options?
  • Can you help me prepare a data management plan for a grant proposal?
  • Are there standards in my field I should be using to describe my data?
  • I’d like to get a DOI for my dataset to include in a journal publication.  Can you help?
  • What can I do to keep track of my HEP citations? I need to keep projects separated.
  • I’m looking for a cosmology paper presented at a conference last year. Does the library have it?
  • How can I access Journal of Physics G: Nuclear and Particle Physics from home?


Tuesday, February 3, 2015

Data Librarianship Educational Resources

Last year I had the opportunity to take several online training courses related to data librarianship and data science, several of which are being repeated this year or are ongoing. For those looking for beginner-level information, these resources can be very helpful in understanding what data management is, how the library can and should be involved, and what it means to be a data librarian (a difficult-to-define term at best). I've also included a few non-course resources that may be of interest. If you have additional resources you'd like to see on this list, let me know in the comments.

So, to kick off 2015 with some educational resources, here's what's covered below:

  • Research Data Management, Library Juice Academy
  • What You Need to Know About Writing Data Management Plans, ACRL
  • Essentials 4 Data Support, Research Data Netherlands
  • The Data Scientist's Toolbox, Coursera
  • Data Information Literacy, book by Carlson and Johnston
  • The Mendeley group Data Management for Librarians
  • Databrarians.org


Class: Research Data Management
Source: Library Juice Academy
Instructor(s): Jillian Wallis, UCLA
Format: scheduled online class
When: March 2-27, 2015
Cost: $175
Website: http://libraryjuiceacademy.com/082-data-management.php

Taken from the course description, the purpose of this class is to "explore the processes of data production and data management, and the role of LIS professional and institutions in supporting data producers." The class is geared toward academic librarians, but is open to anyone. It covers the following topics:
  • The role and lifecycle of research data
  • Stakeholders and stakes in data management
  • Data sharing and data reuse
  • Data selection and appraisal
  • Repositories and registries
  • Data management standards
  • Tools for writing funder-required data management plans
  • The role of institutions and institutional libraries
Participants read up on current policy and research, and prepare a DMP or data policy or something similar as a final project. There are a lot of readings, and although the Library Juice website says there is approximately 15 hours of work for a four-week course, the instructor's introductory email said to expect each week's work to take at least 8 hours, taking the expected workload from 15 to 32 hours. I found that I was indeed spending 8-10 hours a week to complete the readings and assignments, and stay on top of the course forums. It's possible they've lightened the reading load for this year, but be prepared.

The 2014 technology was a bit buggy: sometimes readings were popups, sometimes a download, sometimes you were taken to a new page. Technical support is iffy -- when I asked for assistance locating two PDFs that were referenced but not linked, I was told I should be able to find them online. I think a class geared toward working professionals should have all the readings immediately available.


There is a strong background given in technical information about workflows and the data lifecycle and all its variations, much of which is looked at from the academic side of things (the instructor has a PhD in information science and teaches in the Information Department at UCLA), rather than that of a practicing researcher. Some of this may be old to a practicing data librarian, or it may be that the theoretical underpinnings of data management are of lesser importance to a librarian who is trying to help develop DMP consultations for researchers, but the background is helpful to understand current policy and practice for various funding agencies and archives. And working on the final project with peers and the instructor available for help is very useful to someone new to DMPs.




Class: What You Need to Know About Writing Data Management Plans
Source: ACRL
Instructor(s): Dee Ann Allison, Professor, University of Nebraska-Lincoln; Kiyomi Deards, Assistant Professor, University of Nebraska-Lincoln
Format: scheduled online class
When: April 27 - May 15, 2015
Cost: varies, $60 for a student, up to $195 for non-members 
Website: http://libraryjuiceacademy.com/082-data-management.php

This course is focused specifically on DMPs, with a little background on data management concepts in general. Learning outcomes from ACRL: 
  • List specific data depository resources in order to formulate recommendations for researchers to securely deposit and share their data.
  • Learn about how different funding agencies, and departments within those agencies, have different requirements for data management plans in order to determine how to effectively advise each researcher according to the requirements for their specific plan.
  • Analyze sample data management plans in order to develop an understanding of what constitutes a thorough data management plan.


Topics covered include data and metadata definitions, open data formats, dark archives, repositories, long-term preservation for data and sharing strategies. The course forums for this class were active, and strongly relevant to the weekly readings and assignments. The final project (for my group) was to develop a DMP for a project one of us had been working on or with, and it was very useful to be able to see a real-life example, rather than a case study. Sample DMPs were also evaluated from various disciplines, giving some good examples of variety across fields.

This class is much more aligned with the needs of practicing librarians who need education on what a DMP is and how to construct one. Most in my cohort were other academic librarians with varying levels of experience; this was helpful when we were put in groups for our final project, as each student brought different skills to the table, and we could all benefit from each other's expertise.

There were again some bugs: lots of typos throughout materials, PDFs that opened but disrupted the navigation of the class, a few problems the first week with assignments that couldn't be uploaded. As 2014 was (I believe) the first year this particular course was offered, I would hope that some of these issues have been worked out.

The final group chat for the course was a good place for last thoughts, as well as for shared resources either discovered during the class, or information people use in their own work. This final chat was shared out via email to students, which was great for those who couldn't attend the last virtual class meeting.

Class: Essentials 4 Data Support
Source: Research Data Netherlands
Format: self-paced online class
When: anytime/ongoing
Cost: three levels: free for online class, free with registration for class + forums, $ for in-person workshops (if you're close to Delft)
Website: http://datasupport.researchdata.nl/en  

This class is perfect for those who need to know more about supporting those working with research data, but don't necessarily need or want a class with readings and homework (which, btw, can be necessary sometimes to make yourself do something!). It's particularly geared toward data librarians, IT staff, and researchers -- anyone with responsibility for data management. A list of competencies the course is meant to address is available here, but in general, the class was developed to teach "the basic knowledge and skills (essentials) to enable a data supporter to take the first steps toward supporting researchers in storing, managing, archiving and sharing their research data." 

For practicing librarians who need to get up to speed on data management, this is the place to go. It assumes a common background knowledge, yet presents information on data management in a simple and direct way, with additional resources and readings if needed. No special software is needed, it's a very simple and well-designed website, and it's easy to dip into the topics you need to know, leaving the rest for later.

There are six sections to the course, each of which provides an overview and objectives, the content of the section, and additional resources and readings. If you provide your email address and register, you're also able to participate in the forums (though most of the comments are in Dutch). Activities are included for some sections, all reading links are provided in the text, and no single page is too long, meaning students can come in and out of the course as time allows. It's a great source to provide information for librarians new to RDM and/or DMPs, and is useful as background before additional in-person discussion or instruction at your local institution. 

Class: The Data Scientist's Toolbox
Source: Coursera
Instructor(s): various, Johns Hopkins Bloomberg School of Public Health
Format: scheduled online class, part of the Data Science Specialization series
When: many start dates, usually monthly 
Cost: free unless you want a certificate ($29)
Website: http://libraryjuiceacademy.com/082-data-management.php

This class is the first of 9 classes (plus a capstone) that are part of the Coursera/Johns Hopkins Data Science Specialization. This first class is a good introduction to the "data, questions and tools that data analysts and data scientists work with." The class is divided into two parts, the first of which is a basic introduction to what a data scientist does. The second is an introduction to some of the tools of the trade, including markdown, git, R, GitHub, etc. If you're new to data librarianship or need to be able to understand what your researchers are doing, this will give you a broad understanding of what data scientists do, and will help you understand a bit more about data sharing and open science.

A few additional educational resources: 

  • Data Information Literacy: Librarians, Data, and the Education of a New Generation of Researchers, by Carlson and Johnston. Published in late 2014, this book looks at what role librarians can play in helping a new generation of graduate students in STEM disciplines develop the competencies needed to manage research data. Material in the book comes from the work done by the authors and others for an IMLS-funded Data Information Literacy project.
  • The Mendeley group Data Management for Librarians, owned by Kevin Read, is a place to share literature and resources about data management, curation, citation, sharing, etc.
  • Databrarians.org is a collaborative blog started in late 2014 aimed at sharing "resources, tips, conversations and strategies so that all of us can more effectively bring data resources to the people in our library communities." It's had a handful of posts from data librarians at different stages of their career, and is hoping to draw a larger audience via contributed posts. With the right participation, this could become a useful resource.




UW All-Campus Reproducibility Seminar: 2/10 @ 1:30pm

Ben Marwick, Assistant Professor of Archaeology, joins the All-Campus Reproducibility Seminar Series on February 10 at 1:30pm in the WRF Data Science Studio Meeting Room, 6th floor Physics/Astronomy Tower. He will give a talk titled: 

"Doing Reproducible Research with Docker"

Abstract:

A key obstacle to reproducible research that I frequently encounter when working with students and collaborators is keeping the toolkit simple, with managing dependencies being an especially time-consuming challenge. Virtual machines are one solution to these problems, but remain less than ideal because of relatively long start-up and shut-down times, their large size and performance demands, limited portability, and the need for the user to be familiar with a different desktop environment, amongst other concerns. In this talk I introduce Docker, a free and open source Linux container tool recently popular amongst commercial DevOps workers that provides lightweight virtual environments on Windows/OSX/Linux systems and has several advantages over regular virtual machines. I will describe the key elements of doing reproducible research with Docker and demonstrate dockerfiles, containers, images and registries (bring your laptop and follow along! If you're using Windows/OSX then be sure to install http://boot2docker.io/ in advance). I will show how these help with dependencies and keeping things simple, especially when working with R or Python.

Please join us!

Monday, January 12, 2015

Special Journal Issue Focuses on Data Literacy and Librarians

Time for some reading: the latest issue of the Journal of eScience Librarianship focuses on the role of librarians in data literacy. Included are articles on data management education initiatives, designing RDM curriculum for librarians and graduate students, as well as some case studies from different institutions that used the New England Collaborative Data Management Curriculum in order to teach RDM to various constituencies.

Also featured is an "eScience in Action" piece titled Lessons Learned from a Research Data Management Pilot Course at an Academic Library, from the UW's own Mahria Lebow, Jennifer Muilenburg, and Joanne Rich, detailing their experience teaching a research data management course to graduate students in early 2014.

We're hoping to set aside some time to read through these articles in the next few weeks, and hope to include some reactions here. Stay tuned!

Friday, January 9, 2015

DRUW: a glance under the hood

As promised, here is the blog post about the technologies we are going to be playing with to build our data repository.  When we decided we wanted to pursue developing an institutional data repository we evaluated different pieces of software, weighing variables like maturity of system, the presence and type of community behind the system, flexibility for handling different object types and general future-proofedness.  There isn’t much of a dramatic pause for me to insert here, as we’ve already written in previous posts that the outcome of this analysis was going with Hydra.

But what is Hydra? Hydra isn’t a single thing - an out of the box solution (though the community around it has set this as a future goal) - rather it’s a framework of different pieces of software that come together to create an institutional repository. A Hydra installation can be used as a single interface to many different repositories, if we wanted to expand beyond the current scope of research data. Hydra is based on Fedora, the repository platform from DuraSpace, a nonprofit that supports a number of open source technologies related to digital assets (like DSpace and VIVO). Fedora is short-hand for Flexible Extensible Digital Object Repository Architecture and, as its long-form name implies, Fedora is a digital asset management system capable of handling content regardless of type (GIS, A/V, images, text, data, etc). Of note, DuraSpace has recently released Fedora 4, which has some significant changes from Fedora 3, including being happier about ingesting larger files and by default providing RDF representation of content and relationships. The Hydra community is energetically working away at getting all of the pieces of the Hydra environment to play nicely with Fedora 4, and has advised new adopters of Hydra to plan on using Fedora 4 from the get go, rather than create a situation that requires migration at a later date. So, we’ve had a bit of good luck here on our timing for jumping in!

So, Fedora is in charge of managing the objects; the other core components of a Hydra build include Solr and Blacklight. Solr is an open source search platform from Apache that indexes the repository content. Blacklight is the discovery interface that plugs into Solr and provides features like (customizable) faceted browsing, exporting results and saving search history. Now, those are just the core technologies; there are many other packages of code (referred to as gems in the world of Ruby - the programming language behind Hydra) necessary to get an instance of Hydra up and running. The community has developed several different flavors of Hydra that leverage this framework of technologies in deployable web applications (technically, Rails engines); the one we’ve elected to go with is Sufia.
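To make the Solr piece a little more concrete, here is a rough sketch of the kind of faceted search request a discovery layer like Blacklight sends to Solr behind the scenes. The parameters (q, rows, wt, facet, facet.field) are standard Solr query parameters, but the server address, the core name "druw", and the facet field names are made up for illustration and are not our actual configuration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_facet_query(base_url, core, text, facet_fields, rows=10):
    """Build a Solr /select URL that searches for `text` and asks for facet counts."""
    params = [
        ("q", text),        # the user's search terms
        ("rows", rows),     # how many results to return
        ("wt", "json"),     # ask Solr for a JSON response
        ("facet", "true"),  # turn on faceting
    ]
    # One facet.field parameter per field we want counts for
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical server, core, and field names, for illustration only
url = build_facet_query(
    "http://localhost:8983/solr",
    "druw",
    "salmon",
    ["resource_type", "department"],
)
print(url)
```

Solr answers a request like this with the matching records plus a count of records for each facet value, which is what an interface like Blacklight turns into the clickable "limit your search" sidebar.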

We’ve been working on use cases for our repository and our next steps are to define project phases, with realistic timelines and set milestones for each of these phases.

Tuesday, January 6, 2015

Data Librarianship Workshop for UW Libraries staff: Archives & Repositories

There are so many archives and repositories out there it can be difficult to know where to start looking to help someone in your field (or especially a field you’re not familiar with). This workshop, to be held Wednesday, January 28 from 2-3:30pm in the Allen Auditorium, will look at some of the categories of archives and repositories, and we’ll have time to share some of the similarities and differences across disciplines. We’ll also talk about some of the usage and ethics considerations that come into play when researchers share their data.

The workshop is open to all Libraries staff. Prior to the workshop, please identify 1-2 repositories in your subject area. Take 5-10 minutes and explore:
  • How easy it is to search for data
  • How easy it is to deposit data
  • What the depositor policies are
  • What kind of metadata the repository collects
  • Other general impressions

A good place to start (other than Google) is www.databib.org.

This workshop is the second of three workshops on data librarianship. The third will be held Wednesday, April 29th from 2-3:30pm in Allen Auditorium, and will focus on data management plans.


Questions can be left in the comments below.