Wednesday, August 2, 2017

Summer Quarter Research Data Management Workshop @ UW Libraries

Do you create or use data in your research? Looking for tips and tools to help you better manage your research data and preserve it for long-term use?

From August 14-17, the UW Libraries is offering Data Management Planning, an asynchronous online workshop for UW community members engaged in research with data. Topics will include getting started with data management planning, funder requirements for data sharing, metadata, tips to help keep you organized, sharing, archiving and preservation, and an introduction to tools and on-campus support to aid researchers.

Note: we will be offering two concurrent versions of this class, one for health sciences researchers, the other for multiple disciplines outside of health sciences. When you register for the class, please indicate which version of the class you’d like to attend. Thank you!

Full course information and a registration link are below. Contact us with any questions.

Data Management Planning Workshop
A free, tutor-supported online workshop
August 14-17, 2017

Duration: Monday, August 14 - Thursday, August 17 (4 days)
Time Commitment: Approximately 30 minutes to 1 hour per day, for 4 straight days
Target audience: UW community members engaged in research with data.
Prerequisites: Access to the internet for each of the 4 days identified. A valid UW NetID is also required.

Description:
  • This module-based workshop consists of activities and peer discussion forums that will provide tips on how to effectively plan for data management over the lifecycle of your research project.
  • By asking students to share experiences with one another, this workshop gives you the opportunity to reflect on your research workflow and to see how various techniques and tools can be employed to most effectively manage, share and preserve your data.

Participation Process:
  • This workshop will take place in Canvas over 4 days, with no fixed participation times (asynchronous).
  • Each day corresponds to one online module, which includes a topic overview, resources, activity, and peer discussion forum.
  • Discussion forums are the workshop's primary means of 'assessment,' so expect to post to forums daily.
  • You will be guided through the course by a team of friendly librarian tutors, who will answer questions and provide feedback.

How to Join:
  • If interested, please register via this Catalyst link no later than Friday, August 11, 2017.
  • Space in the workshop is limited, and participants will be accepted on a first-come-first-served basis. Students who register after capacity is reached may be placed on a wait list.

Comments from previous class participants:
  • This is a great workshop -- exposing me to a lot of considerations about data management that I did not know about. The tutor responses have been really helpful. I was unaware of the data librarians on campus and will definitely reach out to them for more resources. Thank you!
  • Very interactive with tutors appearing to be enthusiastic and eager to pitch in their ideas for any questions the workshop participants had. Thank you!
  • Very helpful at all levels of experience
  • Very helpful and important for anyone working with data.
  •  I was really impressed with this workshop. It had so many wonderful resources and I learned a lot. The tutors were fantastic. ... The materials were great and easy to understand as well. It was good to know I'm heading in the right direction with data management and know how to really improve my data management. I come from an interpretive/qualitative background and often this type of research activity is learned on the job or through learning what not to do the next time around, so having this type of workshop can really help people like me prepare a lot better for the next big project. Thank you for all your hard work!
If you have any questions, please feel free to contact the Data Services Team.

Wednesday, July 12, 2017

Upcoming Workshop 7/24: Exploring Statistical Datasets with Data-Planet

Join us for an exploration of the UW Libraries' latest online research tool, Data-Planet Statistical Datasets. Data-Planet is an interdisciplinary e-resource geared exclusively toward helping users find, display, manipulate, and cite statistical data. We are already seeing the benefit to users as they search for education, labor, transportation, environmental, health, immigration, and poverty data. In this training, we'll cover how to search for, analyze, and output data, and you'll get hands-on practice answering statistical reference questions and generating charts, maps, rankings, and other visualizations. Comparisons to other tools such as Social Explorer, SimplyMap, and American FactFinder will also be included. The session will be held Monday, July 24 from 10-11am in OUGL 102. Please RSVP here.

Monday, May 15, 2017

Software Carpentry Classes: R and Python

The University of Washington eScience Institute will be holding Software Carpentry classes for Python and R, June 13-16, 2017. Classes run from 9:00am to 12:00pm for four days. The cost is $10 and space is limited.

"Software Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems." Learn more and register on the class website: https://uwescience.github.io/2017-06-13-uw/.

Monday, April 24, 2017

UW Data Science Seminar: Nathan Baker

Wednesday, April 26, 2017, 3:30 p.m. in Johnson Hall 102

Nathan Baker, Director of the Advanced Computing, Mathematics, and Data Division at Pacific Northwest National Laboratory (PNNL) and a Visiting Faculty member at Brown University, will be presenting "Uncertainty in Biomolecular Solvation" at this week's Data Science Seminar. The Data Science Seminar is free and open to the public.

Abstract

Solvation-related interactions strongly influence a wide range of biomolecular processes. However, both our models and our information for parameterizing those models are imperfect. This talk will describe strategies for quantifying the uncertainty in biomolecular solvation models and optimizing those models to provide the best possible accuracy and performance. The first part of the talk will describe generalized polynomial chaos methods for quantifying solvation energy uncertainty due to conformational noise and errors in atomic charge and radius parameters. The second part of the talk will outline Bayesian methods for addressing model uncertainty through statistical aggregation of predictions from multiple models.

Monday, April 17, 2017

Guest Seminar: Andrew Hufton

Tuesday, Apr. 18, 1:30 p.m., Smith Hall 105

The UW eScience Institute's Reproducibility and Open Science Group is hosting Andrew Hufton, managing editor of Scientific Data, a Nature Research journal. Hufton's talk is entitled "Beyond supplementary material: Sharing data effectively through repositories and data journals."

ABSTRACT
The Nature Research journals understand that effective data sharing supports reproducibility and can increase the impact of published works. Indeed, our policies have long recognized that data sharing is a fundamental part of research publication. The increasing complexity and size of research datasets, however, poses challenges for scientists who wish to share their data in a reusable and transparent manner. Based on my experience at Scientific Data, an open-access data-focused journal from Nature Research, I will provide tips on how researchers can share their data in an effective manner that promotes reuse, supports the credibility of their research, and ensures they get proper credit. This will include advice on writing better data-rich papers, the basics of presenting datasets in a useful manner, and tips on how to find the right repository for your data. I will also explain Scientific Data's editorial policies and share some of our experiences peer-reviewing and publishing data so far.

BIO
Andrew is responsible for the editorial policies of Scientific Data, in consultation with the Honorary Editor and Advisory Panel, and works with the Editorial Board to ensure a fair and thorough peer-review process for all submissions. Andrew received his PhD from Stanford University in 2006, and did postdoctoral work at the Max Planck Institute for Molecular Genetics in Berlin. His research included topics in developmental genetics, computational biology and genome evolution. Before joining Scientific Data, Andrew worked as an Editor at Molecular Systems Biology.

Wednesday, March 29, 2017

UW Data Science Seminar: Sir Philip Campbell

Wednesday, April 5, 3:30 p.m. in Physics/Astronomy Auditorium A118


Sir Philip Campbell, editor-in-chief of Nature, will be presenting “Pressures on principal investigators and their need of support: A consultation” at next week's Data Science Seminar. The Data Science Seminar is free and open to the public.


The role of PIs in sustaining the progress and robustness of research is critically important, and yet the pressures on them - some well advised, some not - seem to keep growing. To inform Nature's future coverage of these issues, I will present an overview of some of the key pressures on PIs and invite insights and proposals on how funders, universities, and journals might best mitigate them.

Wednesday, March 22, 2017

New Tools for Data Exploration

Data-Planet


The University of Washington Libraries now subscribe to Data-Planet, an "interactive database [that] allows you to create tables, maps, and figures from a variety of licensed and public data sources" (find it anytime in the A-Z list of databases here). Access the database on-campus or log-in with your NetID for off-campus access. For information about the datasets included in the repository or for an introductory video, visit the Data-Planet libguide.

PolicyMap


The UW Libraries are in the trial phase of PolicyMap, an online U.S. national data and mapping tool and analytics platform that does not require any software download. Users can interact with data available on PolicyMap or upload their own spreadsheets to map data using just their browser. The trial period ends April 24, 2017 (find it in the A-Z list of databases here). Currently access is only available on-campus.

Tuesday, February 21, 2017

UW Data Science Seminar: Kelsey Jordahl

Wednesday, February 22, 3:30 p.m. in Johnson Hall 102


Kelsey Jordahl, Mosaics Team Lead at Planet Labs, will be presenting “Mosaicking the Earth Every Day” at tomorrow's Data Science Seminar. The Data Science Seminar is free and open to the public.

Planet Labs currently operates about 60 Earth observation satellites imaging 50 million square kilometers of land area per day. We plan on tripling those figures in coming months, fulfilling our Mission 1 to image the surface of the Earth every day. Global mosaics are created from these images at regular intervals (quarterly, monthly, and weekly) by selecting the best quality scenes (e.g. cloud- and haze-free), color balancing, and seamlessly compositing millions of scenes to create continuous maps of the Earth for each time slice. As our data rate increases, we plan on scaling up the cadence of our mosaics, including building a continuously updated "dynamic" mosaic of the most recent cloud-free images of the Earth. Daily data at 5 meter spatial resolution will open up new analysis techniques previously limited by the temporal or spatial resolution of existing instruments.

Friday, February 17, 2017

Love Your Data Week, Day 5: Rescuing Unloved Data

How do data become unloved? We data users don’t love data that are messy, poorly documented, incomplete, or unwieldy, to name just a few frustrations. However, one important way that data become unloved is that they are just plain old. Older data tend not to be machine-readable, which can pretty much be the kiss of death. Digitization, while it’s improving, is still somewhat labor-intensive and costly, so unless a data set is obviously worth the trouble, it may languish.
However, researchers are starting to explore whether there may be some hidden gems worth rescuing. One area in which this is happening is climate data, and a great example is the Glacier Photograph Collection from the National Snow and Ice Data Center (NSIDC).  Before this collection was digitized, users had to travel to the NSIDC in Colorado, ask staff to find physical images or microfilm for them in the collection, and then deal with those physical artefacts. Not surprisingly, the collection had few users. However, digitizing these photographs -- which can be considered data sources, as they contain information that can be analyzed -- has made them not only accessible, but an important resource for documenting changes in glacier size and coverage. Digitizing some of the old photographs also suggests locations for repeat photographs from the same vantage point, which can indicate changes across time periods.  
PHOTO: Left: William O. Field, 1941; Right: Bruce F. Molnia, 2004. Muir Glacier: From the Glacier Photograph Collection. Boulder, Colorado USA: National Snow and Ice Data Center. Digital media.
But using the above example is cheating a little bit; these photographs were unloved because they were undigitized, but it was clear that they were worth digitizing. In fact, it was so clear that NSIDC was able to get funding and enter into partnerships to get that work done. So what if a researcher has a great idea, but needs sheer person-power to bring it to fruition? These days, crowd-sourcing may do the trick! Check out the Swiss project Data Rescue @ Home, in which citizen-volunteers are entering German climate data collected during WWII, and also have completed entering data from a weather station in the Solomon Islands collected in the early to mid-1900s. By January 2014, they reported having digitized 1.3 million values! They note: “The old data are expected to be very useful for different international research and reanalysis projects…[for example,] historical weather data from the Azores Islands are particularly valuable since the islands are located at the southern node of the most important climatic variability mode in the North Atlantic-European region, the so-called North Atlantic Oscillation (NAO), and there are not much other historical data available from the larger region.”
PHOTO: Example of data collected in the Solomon Islands, entered electronically by citizen-volunteers of the Data Rescue @ Home project (Accessed 2-13-17).
Interested in getting involved in a citizen-science project yourself? Here’s a list of possibilities!  And if you really get hooked, you may want to dive into some collections of older non-digitized data and consider starting your own project, to rescue the unloved data and give them new life.  
OK, I’m off now to figure out how to get on the project where I can hang out on the beach in New Jersey and count horseshoe crabs!
Ann Glusker PhD MPH MLIS
Research and Data Coordinator

National Network of Libraries of Medicine, Pacific NW Region 
University of Washington Health Sciences Library

Thursday, February 16, 2017

Love Your Data Week, Day 4: Finding the Right Data

Welcome to Love Your Data Week, Day 4: Finding the Right Data. Today's theme is asking the right questions, finding the right sources, and citing accordingly -- all of which will enable you to locate the right data and let your audience see why you chose the data you did.

Our friends at the National Network of Libraries of Medicine/Pacific Northwest Region have taken today to highlight the new DataLumos initiative from ICPSR at the University of Michigan. This project aims to archive government datasets to ensure their preservation into the future. Check out their post on the Dragonfly blog describing this and other data archiving work happening around the country.



Wednesday, February 15, 2017

Love Your Data Week, Day 3: All’s FAIR in Love and Data Management

Welcome to day three of Love Your Data Week 2017! Today’s topic is Good Data Examples. What makes data “good” or “well managed”? The FAIR Data Principles (Findability, Accessibility, Interoperability, and Reusability) are a good place to start. Published by Mark Wilkinson and his colleagues in 2016, these principles “put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.”1 A brief description of the principles, excerpted from Wilkinson’s article, explains:

“To be Findable:
  • F1. (meta)data are assigned a globally unique and persistent identifier
  • F2. data are described with rich metadata (defined by R1 below)
  • F3. metadata clearly and explicitly include the identifier of the data it describes
  • F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:
  • A1. (meta)data are retrievable by their identifier using a standardized communications protocol 
  • A1.1 the protocol is open, free, and universally implementable
  • A1.2 the protocol allows for an authentication and authorization procedure, where necessary 
  • A2. metadata are accessible, even when the data are no longer available

To be Interoperable:
  • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
  • I2. (meta)data use vocabularies that follow FAIR principles 
  • I3. (meta)data include qualified references to other (meta)data 

To be Reusable:

  • R1. meta(data) are richly described with a plurality of accurate and relevant attributes 
  • R1.1. (meta)data are released with a clear and accessible data usage license 
  • R1.2. (meta)data are associated with detailed provenance
  • R1.3. (meta)data meet domain-relevant community standards”2
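The principles above describe properties of metadata rather than a specific schema. As a purely hypothetical illustration (not from Wilkinson's article, and not following any particular metadata standard), a minimal dataset record might carry fields like these, each annotated with the principle it touches:

```python
# Sketch of a minimal FAIR-style metadata record. All field names,
# DOIs, and values below are invented for illustration only.
record = {
    # F1: a globally unique, persistent identifier (here, a made-up DOI)
    "identifier": "https://doi.org/10.0000/example.12345",
    # F2: rich, descriptive metadata about the dataset
    "title": "Glacier extent measurements, 1941-2004",
    "creators": ["Researcher, A.", "Researcher, B."],
    # A1: data retrievable by identifier over an open, standard protocol
    "access_protocol": "https",
    # I2: a shared, broadly used vocabulary for the metadata terms
    "vocabulary": "http://purl.org/dc/terms/",
    # I3: qualified references to related (meta)data
    "related_identifiers": ["https://doi.org/10.0000/example.67890"],
    # R1.1: a clear, accessible data usage license
    "license": "https://creativecommons.org/licenses/by/4.0/",
    # R1.2: detailed provenance of the data
    "provenance": "Digitized from an archival glacier photograph collection",
}

# F1 in miniature: without a persistent identifier, nothing else is findable.
assert record["identifier"].startswith("https://doi.org/")
```

Real-world equivalents of this sketch include records in repositories like Dataverse, where the repository software fills in much of this structure automatically.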

These guiding principles benefit all stakeholders, including, as Wilkinson states, “researchers wanting to share, get credit, and reuse each other’s data and interpretations; professional data publishers offering their services; software and tool-builders providing data analysis and processing services such as reusable workflows; funding agencies (private and public) increasingly concerned with long-term data stewardship; and a data science community mining, integrating and analyzing new and existing data to advance discovery.”3

Wilkinson identifies several examples of FAIRness, including Dataverse, FAIRDOM, and Open PHACTS, and notes that the FAIR Guiding Principles have been adopted by a wide range of data management organizations across the globe.


1-3 Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. doi: 10.1038/sdata.2016.18. PubMed PMID: 26978244; PubMed Central PMCID: PMC4792175.

Tuesday, February 14, 2017

Love Your Data Week: Day 2 - Documenting, Describing, Defining

Today’s topic is “Documenting, Describing, Defining”, and so we’re taking the opportunity to highlight a platform that can help streamline those processes.

We are happy to announce that UW has become an Affiliate of the Open Science Framework! UW staff, students, and researchers can now create OSF accounts using their NetID through the “Login through your institution” option on the Sign Up page. Not only is OSF a fantastic tool for data management and data sharing, it’s also a tremendous resource for keeping organized throughout the research process.

In a nutshell, OSF is like Github for workflows, except it can also serve as command central for all of the bits and pieces of your work that you’ve spread over Amazon S3, Github, Google Drive, Mendeley, and elsewhere.  It is an open source, cloud-based project management platform, designed to help teams collaborate in one centralized location. Teams can connect third-party services that they already use for both storage and reference management directly to the OSF workspace. With version control, persistent URLs, and DOI registration, OSF is a powerful tool for enabling reproducible research practices.

Anyone can create an OSF account, so collaborating with people outside your institution is easy. You have fine-grained control over who has access to your project – or even individual components of your project. So OSF can serve both as the sharing platform you use for externally focused materials like data sets and preprints and as the secure workspace you use to keep track of internal materials like analysis protocols and manuscript drafts. (A caveat: OSF is not HIPAA compliant, so you shouldn’t upload or link to your sensitive data.)
OSF is also a great tool for teaching reproducibility, allowing instructors to not only guide the shape of their students’ projects, but also to keep tabs on how successful students are in their workflow and data management efforts. If you’d like more information on how to use OSF in the classroom, this is an excellent presentation.
We are big fans of OSF here in Research Data Services, and we encourage you to check it out. This only scratches the surface of OSF’s capabilities, so if you’d like to learn more you can visit their extensive Help section, or contact us at libdata@uw.edu.