Data @ Libs: 2016

Monday, December 12, 2016

ORCID

Do you have an ORCID iD? ORCID stands for Open Researcher and Contributor ID, and an ORCID iD is used to uniquely identify you and your research.

Video made available by ORCID Support.

ORCID iDs make it possible to distinguish your work from that of researchers with similar names and initials. PLOS now requires ORCID iDs for researchers, and ORCID iDs allow you to track the impact of your work through a variety of online tools.

Registering for an ORCID iD is simple and you do not need to be a published author to sign up for an account. In fact, the sooner you get your ORCID iD, the easier it will be to track your work. To register for your ORCID iD:

Visit ORCID's Registration Page and provide the required information
Receive an e-mail from ORCID and verify your e-mail address
Add more information to your ORCID profile (optional)
Use your ORCID iD!
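
Once you have an iD, it helps to know that the last character is a check digit (computed with the ISO 7064 MOD 11-2 algorithm), so a mistyped iD can often be caught before it ends up in a manuscript or a spreadsheet. Here is a minimal sketch in Python, assuming the iD is written in the usual hyphenated form. The function names are our own, and the sample iD is the fictitious one ORCID uses in its documentation:

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, characters, and check digit of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == compact[15]


# Sample iD from ORCID's own documentation
print(is_valid_orcid("0000-0002-1825-0097"))
```

A check like this only tells you the iD is well formed. It does not confirm that the iD is registered or that it belongs to a particular person.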

Monday, November 14, 2016

UW Data Science Seminar: Matthew Salganik

November 16, 2016 3:30 in Johnson 075

Matthew Salganik, Professor of Sociology at Princeton University, will be presenting “Social Research in the Age of Big Data” at this week’s Data Science Seminar. The Data Science Seminar is free and open to the public.

Abstract

The digital age has transformed how researchers are able to study social behavior. These new opportunities mean that the future of social research will involve blending together insights from two communities: social scientists and data scientists. In this talk, I'll begin by describing what I think each community has to contribute and what each community has to learn. Then, I'll focus on this social science/data science hybrid in one particular domain where I see a lot of opportunities: survey research. The talk will conclude with some predictions about the future of social research.

Tuesday, November 1, 2016

UW Data Science Seminar: Rob Axtell

November 2, 2016 3:30 in Johnson 075

The UW Data Science Seminar, organized by the eScience Institute, iSchool DataLab, and CSE Interactive Data Lab, is a “university-wide effort bringing together thought-leading speakers and researchers across campus to discuss topics related to data analysis, visualization and applications to domain sciences.” Rob Axtell, Department Chair of the Krasnow Institute for Advanced Study at George Mason University, presents this week’s seminar, entitled “Computationally-Enabled Public Policy Using Comprehensive Data.”

Abstract

The social sciences are being revolutionized today by two distinct forces, data and computing. The ability to perform controlled experiments, both in laboratory (small scale) and web-facilitated (large scale) settings, combines with natural experiments and digital exhaust type click-stream data to provide an unprecedented window into human behavior in a wide variety of social contexts. But just as significant is the increasing availability of administratively-complete micro-data that offer nearly comprehensive portraits of important social phenomena. Computational techniques and tools are essential for managing such data, and for creating models capable of explaining the data. Specifically, agent-based computing is an emerging technology for representing individuals engaged in social behavior and grounding them in micro-data. In this talk I will start with some background material on agent computing, discussing how the approach has been utilized for abstract models of social processes. I will then go on to describe two large-scale agent models that utilize individual-level data. A model of the U.S. housing market bubble that burst c. 2006-7 will be described for the Washington, D.C. area. It involves some 2 million housing units overall with more than a million homeowners and some 500K mortgages. The model combines data on the housing stock (county sources), borrowers (Census), and mortgages (from mortgage service providers), and the model output is compared to MLS transactional data. We have investigated alternative policies for attenuating the size of the bubble. Then a model of the U.S. private sector, 120 million employees organized into 6 million firms, will be presented. This model uses data on the entire population of tax-paying firms in the U.S. and closely reproduces firm sizes, ages, growth rates, job tenure, wage distributions, and so on.
In these models, aggregate phenomena emerge from the interactions of the agents without any pre-specification of what might happen. That is, social phenomena grow from the bottom up.

Thursday, October 13, 2016

Hacking the Academy: Open in Action

Come celebrate Open Access Week by learning how UW faculty and staff are working to keep their work open. The Hacking the Academy: Open in Action program will begin with four short talks, followed by time for discussion around the theme "Open in Action." Speakers include Rachel Arteaga (public scholarship), Steven Roberts (open science/open data), Dan Berger (public scholarship), and Justin Marlowe (open textbooks). Please join us October 26, 4-5pm in the Research Commons Green A!

Wednesday, October 12, 2016

Big Data to Knowledge Webinars and Discussion Groups

The UW Health Sciences Library and Research Data Services are collaborating with the National Network of Libraries of Medicine/Pacific Northwest Region to provide a monthly discussion group on issues in data science, with a focus on biomedical science. The discussion group will provide a venue for those interested in the National Institutes of Health’s Big Data to Knowledge “Guide to the Fundamentals of Data Science,” a series of online lectures given by experts from across the country covering a diverse range of topics in data science.

The online lecture series is an introductory overview that assumes no prior knowledge or understanding of data science, and will run all year, once per week, from 9-10am Pacific Time. The list of speakers through the beginning of 2017 is available online. Upcoming topics include Ontologies, Metadata, Provenance, Databases, Social Networking Data, Exploratory Data Analysis, and lots more.

Academic librarians and others interested in biomedical big data from around the Puget Sound are invited to join a monthly Friday discussion group on October 14, November 18 and December 16. The group will meet at 8:45am to watch the week’s BD2K online lecture, and then from 10-11am share insights or questions about that week’s topic, and previous lectures in the series. All discussion groups will be held in The Health Sciences Pacific Room.

Questions? Email Emily Patridge at ep001 (at) uw.edu.

Tuesday, October 11, 2016

UW Hosting Trial of Data-Planet Statistical Datasets

UW librarians, faculty, and researchers are invited to learn more about the power of Data-Planet Statistical Datasets, the largest repository of standardized and structured data. We have trial access to this database along with the others listed at http://guides.lib.uw.edu/research/db-trial.

Data-Planet founder Richard Landry will highlight subjects and sources covered, along with functionality, features, and visualization tools. You’ll leave with tips for searching, manipulating, and exporting data from more than 70 government and private sources, covering 35 billion data points in 4.9 billion datasets.

Please register for the UW Libraries Data-Planet Statistical Datasets Webinar on Thursday, Oct 13, 2016 12:00 PM PDT at: https://attendee.gotowebinar.com/register/1191767628182916610

Participate remotely or join a group viewing of the webinar: Suzzallo Library, RAD. Thunderbird Conference Rm. (if you don’t work in the UW Libraries, contact cass@uw.edu for info about this location).

Please visit online for additional information:

http://data-planet.com

http://data-planet.libguides.com

Please contact Cass Hartnett at cass@uw.edu or Marcy Rothman at mrothman@data-planet.com with any questions or special requests!

After registering, you will receive a confirmation email containing information about joining the webinar. We look forward to hearing your feedback!

Thursday, September 15, 2016

Software Carpentry Workshop: Oct 10-13 @UWescience

Software Carpentry is a non-profit volunteer organization whose members teach researchers computing skills.

On October 10th-13th, we will hold a four-day (mornings-only) Software Carpentry workshop at the UW eScience Data Science Studio. The workshop is focused on software tools that make researchers more effective, allowing them to automate research tasks, automatically track their research over time, and use programming to accelerate their research and make it more reproducible.

In the workshop, we will have two parallel tracks: one in which we will focus on the programming language R, and the other in which we will focus on Python.

For details, and to register for the upcoming workshop, please refer to the following web page: https://uwescience.github.io/2016-10-10-uw/

Wednesday, August 3, 2016

Upcoming Data Management Planning Workshop

Do you create or use data in your research? Looking for tips and tools to better help you manage your research data, and preserve it for long-term use?

On August 22, the UW Libraries is offering Data Management Planning, an asynchronous online workshop for UW community members engaged in research with data. Topics will include getting started with data management planning, funder requirements for data sharing, metadata, tips to help keep you organized, sharing, archiving and preservation, and an introduction to tools and on-campus support to aid researchers.

Full course information and a link to registration are below. Contact us with any questions.

Data Management Planning Workshop

A free, tutor-supported online workshop

August 22 - 25, 2016

Duration: Monday, August 22, 2016 - Thursday, August 25, 2016 (4 days)

Time Commitment: Approximately 30 minutes to 1 hour per day, for 4 straight days

Target audience: UW community members engaged in research with data.

Prerequisites: Access to the internet for each of the 4 days identified. A valid UW NetID is also required.

Description:

This module-based workshop consists of activities and peer discussion forums that will provide tips on how to effectively plan for data management over the lifecycle of your research project.
By asking students to share experiences with one another, this workshop gives you the opportunity to reflect on your research workflow and to see how various techniques and tools can be employed to most effectively manage, share and preserve your data.

Participation Process:

This workshop will take place in Canvas over 4 days, with no fixed participation times (asynchronous).
Each day corresponds to one online module, which includes a topic overview, resources, activity, and peer discussion forum.
Discussion forums are the workshop's primary means of 'assessment,' so expect to post to forums daily.
You will be guided through the course by a team of friendly librarian tutors, who will answer questions and provide feedback.

How to Join:

If interested, please register via this Catalyst link no later than Friday, August 19, 2016.
Space in the workshop is limited, and participants will be accepted on a first-come-first-served basis. Students who register after capacity is reached may be placed on a wait list.

If you have any questions, please feel free to contact the Data Services Team.

Tuesday, July 5, 2016

Society of American Archivists to discuss research data management

Over the next month, the Society of American Archivists' Records Management Roundtable will publish a series of blog posts to foster discussion on research data management. The roundtable's blog, The Schedule, will "feature posts describing collaborative efforts to address research data management, resources and outreach initiatives, incorporating research records into a retention schedule, and the question of faculty research as a public record."

Comments are encouraged, so make sure to follow the blog, watch the discussion, and participate!

Tuesday, April 5, 2016

Data Science Studio Office Hours for Spring Quarter

As a reminder, the WRF Data Science Studio offers several types of drop-in office hours to meet the needs of those working in data-intensive science. The program brings together expertise from the eScience Data Scientists, UW Libraries, UW-IT, and the Center for Statistics and the Social Sciences (CSSS) to help triage challenges in data-intensive science – including cloud computing – and steer people towards appropriate solutions. Assistance may be in the form of immediate help, a longer meeting with our team to understand the problem more deeply, or a referral to faculty on campus with relevant expertise.

For more information, see http://escience.washington.edu/dss-hours.

Tuesday, March 29, 2016

Upcoming classes: Community Data Science Workshop, R + Stata

Several upcoming workshops and classes will be held Spring Quarter at the University of Washington, focusing on students needing R or Stata introductions, as well as another round of the popular Community Data Science Workshops. Details are below.

Classes

The Center for Social Science Computation and Research has posted its Spring Quarter classes, which include Introduction to Stata, Introduction to R with R Studio, and Introduction to R with Commander. Students will learn basic software organization, where to find help, and how to get started with basic analyses. No previous experience in statistical programming is necessary, but a basic understanding of statistics will be helpful.

Workshops

The Spring 2016 round of the Community Data Science Workshops is for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, free and open source software, Twitter, civic media, etc. The Spring 2016 series consists of one Friday evening and three Saturday sessions in April and May. The workshops are for people with no previous programming experience and, thanks to sponsorship from eScience and the Department of Communication, are free of charge and open to anyone.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

- Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?

- Who are the most active or influential users of a particular Twitter hashtag?

- Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people that joined the project outside of the event?

Details and dates are online here:

http://wiki.communitydata.cc/CDSW_Spring_2016

If you are interested in participating, please fill out our registration form at the link above before Saturday, April 2. Register soon!

If you already know how to program in Python, it would be really awesome if you would volunteer as a mentor! Being a mentor involves working with participants and talking them through the challenges they encounter in programming. No special preparation is required. If you’re interested, there’s a link on the page above, or you can send me an email. If you have mentored before, it’s still easier for us if you fill out our form again. Thanks!

Regards,

Mako (On behalf of Jonathan, Tommy, Dharma, Ben, Mika, and all the CDSW mentors.)

Wednesday, March 23, 2016

Next Week! Digital Scholarship Focus Groups

In an effort to develop our digital scholarship program in the Libraries, we will be holding a series of focus groups with faculty and graduate students working in the sciences. The goals of the focus groups are to determine what types of digital scholarship research and teaching are currently being done in departments across campus and what barriers (if any) stand in the way of completing digital scholarship work. If you are working on digital projects or data visualization, we would love to hear from you! Faculty focus groups are March 29, 12:30-1:15pm, and March 30, 10:30-11:15am. Graduate student focus groups are March 29, 10:30-11:15am, and March 29, 2:30-3:15pm. You may sign up for focus groups here. We'll confirm your participation and send you the location and a short list of questions we'll cover to help start the conversation. Light refreshments will be provided for participants.

Thank you for your participation! Questions can be directed to Verletta Kern, our Digital Scholarship Librarian.

Tuesday, March 15, 2016

STEM Journal Publishing: What’s an Editor to Do?

Join the UW Libraries for a panel discussion with four UW faculty members who are also journal editors. Geared toward graduate students, post-docs and librarians, the panel will address a variety of issues of interest to current and future authors, as well as librarians. Possible questions for discussion include:

 What do you do as an editor?
 How did you become one?
 Where do you fit in the hierarchy of your journal?
 What does it take to get published in your field today?
 What is the impact of the increase in manuscripts being submitted today?
 How is peer review handled with your journal?
 Have you run into ethical issues, and, if so, how did you deal with them?
 What are some of the most common mistakes made by authors?
 What advice would you give an author preparing to submit her/his first paper?
 How is digital accessibility attained?
 How do you manage traditional papers augmented with other content, such as video or audio?

Our panelists include:
Valerie Daggett: Professor, Bioengineering
Jody Deming: Professor, Oceanography and Professor, Astrobiology
Richard Ladner: Professor, Computer Science and Engineering
Randy LeVeque: Professor, Applied Mathematics

Session moderator:
Kelly Edwards: Associate Dean for Student and Postdoctoral Affairs, Graduate School, and Associate Professor, Department of Bioethics and Humanities, School of Medicine

Tuesday, April 12, 4:00-5:00PM; Reception, 5:00-5:30PM
Research Commons, Presentation Place, Allen Library South

Friday, February 12, 2016

Love Your Data Week, Day 5: Transform, Extend, Reuse

Today we're wrapping up Love Your Data Week by addressing open data and data sharing. This blog post from the University of Michigan Libraries includes a list of ways to share your data. Also worth a look are stories about how data are shared and reused by others.

You can also check out Nine Simple Ways to Make it Easier to (re)Use Your Data. And as always, if you're looking for ways to make your data more sharable, contact the UW Libraries Data Services Team!

Thursday, February 11, 2016

Love Your Data Week, Day 4: Data Citation

As stated on the Love Your Data site: "Data are becoming valued scholarly products instead of a byproduct of the research process. Federal funding agencies and publishers are encouraging, and sometimes requiring, researchers to share data that have been created with public funds. The benefit to researchers is that sharing your data can increase the impact of your work, lead to new collaborations or projects, enables verification of your published results, provides credit to you as the creator, and provides great resources for education and training. Data sharing also benefits the greater scientific community, funders, the public by encouraging scientific inquiry and debate, increases transparency, reduces the cost of duplicating data, and enables informed public policy."

Looking for some pointers on how to share your data? If you're at the University of Washington, you may already be sharing papers in the ResearchWorks Archive. The Libraries is in the process of updating the archive and working out how best to support data archiving on campus, so if you have data you want to preserve for the long term, contact us to see if we can use your data as a test case as we build a new data repository.

If you're interested in learning more about how data citation impacts research reputation, Robin Chin-Roemer has a new book called Meaningful Metrics that serves as a guide to impact, bibliometrics, and altmetrics, as well as a few other topics.

For today's activity, consider these "good practice" tips:

share your data upon publication
share your data in an open, accessible and machine-readable format
deposit your data in your institution's repository to enable long-term preservation
license your data so people know what they can do with it
tell people how to cite your data
when choosing a repository, ask about its support for tracking use. Is a handle or DOI provided? Can the depositor see how many views and downloads the data has? Is the site indexed by Google, Google Scholar, or the Data Citation Index?
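
One way to act on the "tell people how to cite your data" tip is to generate a suggested citation from a few metadata fields and paste it into your readme. Here is a minimal sketch in Python, assuming the common "creator (year). title. publisher. identifier" pattern. The field names and the sample values, including the DOI, are made up for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, without the https://doi.org/ prefix


def suggested_citation(ds: Dataset) -> str:
    """Build a citation string: Creators (Year). Title. Publisher. DOI link."""
    authors = "; ".join(ds.creators)
    return f"{authors} ({ds.year}). {ds.title}. {ds.publisher}. https://doi.org/{ds.doi}"


# Hypothetical dataset record, for illustration only
example = Dataset(
    creators=["Husky, H.", "Dawg, D."],
    year=2016,
    title="Tide pool temperature readings",
    publisher="Example Data Repository",
    doi="10.1234/example",
)
print(suggested_citation(example))
```

Your repository or publisher may prefer a different order or extra fields (such as a version number), so treat this as a starting point.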

Wednesday, February 10, 2016

Love Your Data Week, Day 3: Help Your Future Self

By Help Your Future Self, we mean Write It Down: document, document, document! Your documentation provides crucial context for your data. So whatever your preferred method of record keeping is, today is the day to make it a little bit better! Some general strategies that work for any format:

Be clear, concise, and consistent.
Write legibly.
Number pages.
Date everything, using a standard format (e.g., YYYYMMDD).
Try to organize information in a logical and consistent way.
Define your assumptions, parameters, codes, abbreviations, etc.
If documentation is scattered across more than one place or file (e.g., protocols & lab notebook), remind yourself of the file names and where those files are located.
Review your notes regularly and keep them current.
Keep all of your notes for at least 7 years after the project is completed.
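
If your notes and data live in files, the YYYYMMDD convention from the list above is easy to apply automatically. Here is a minimal sketch in Python. The helper name and the example file name are hypothetical:

```python
from datetime import date
from typing import Optional


def dated_name(filename: str, day: Optional[date] = None) -> str:
    """Prefix a file name with a YYYYMMDD date stamp (today's date by default)."""
    day = day or date.today()
    return f"{day.strftime('%Y%m%d')}_{filename}"


print(dated_name("field_notes.txt", date(2016, 2, 10)))  # 20160210_field_notes.txt
```

Because the year comes first, files named this way sort chronologically in any file browser.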

Things to avoid:

Writing illegibly.
Using abbreviations or codes that aren’t defined.
Using abbreviations or codes inconsistently.
Forgetting to jot down what was unusual or what went wrong. This is usually the most important type of information when it comes to analysis and write up!

Today's Activity: If your documentation could be better, try out some of these strategies and tools:

Readme files are a simple and low-tech way to start documenting your data better. Check out the sample readme.txt (filename = readme_template.txt) from IU.
Cornell University RDMSG also has a guide with tips for using readme files.
Check out Kristin Briney’s post on taking better notes
Cornell University RDMSG has some tips for writing metadata
Data dictionaries are an easy way to document spreadsheets. Check out some examples on the Pinterest resource board.
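
If you'd like a head start on a data dictionary, you can generate a blank one from a spreadsheet's column headers and then fill in the descriptions by hand. Here is a minimal sketch using only the Python standard library. The sample columns and the layout of the dictionary (column, example, description, units) are our own assumptions, not a standard:

```python
import csv
import io
from typing import Dict, List


def data_dictionary_skeleton(csv_text: str) -> List[Dict[str, str]]:
    """Return one entry per column, with the first non-empty value as an example."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        example = next((row[name] for row in rows if row[name]), "")
        entries.append(
            {"column": name, "example": example, "description": "", "units": ""}
        )
    return entries


# Made-up sample data, for illustration only
sample = "site_id,temp_c,collected\nA1,,20160210\nA2,12.5,20160211\n"
for entry in data_dictionary_skeleton(sample):
    print(entry)
```

The description and units fields are left empty on purpose: that is the part only you can write down.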

Take a few minutes to think about how you document your data. What’s missing? Where are the gaps? Can you set up some processes to make this part of the work easier?
