Today's Topic: Defining Data Quality
Loma, Kaeli, and Jorge from the Avian Conservation Laboratory in the UW's School of Environmental and Forest Sciences kindly agreed to answer a few questions about data quality in their field of research. Let us know your experiences with data quality by tweeting with the hashtag #LYD17 to @UWLibsData.
Provide a brief introduction to yourself and your lab/team:
Kaeli: "I study the behavior of crows around dead crows (ethology/thanatology). Most other people in my lab also work on birds, but our individual studies, areas of research and methodologies vary greatly."
Jorge: "I'm an international student from Chile working in the Avian Conservation Lab of John Marzluff at the School of Environmental and Forest Sciences."
What does data look like in your area of research?
Kaeli: "My data are generally measurements of time (x seconds spent doing a particular thing or in a particular place), binary measurements (whether or not something occurred), and count data, such as the number of birds present or the number of times an action occurred."
Jorge: "I have many different kinds of data. I have spatial data that include locations and attributes of certain aspects of what the individual animals I studied did at those places. I also have data on the abundance of different bird species in the greater Seattle area."
The message for today is: "Data quality is the degree to which data meets the purposes and requirements of its use. Depending on the uses, good quality data may refer to complete, accurate, credible, consistent or “good enough” data." How would you define quality data in your field? Are there any standards for assuring data quality? How do you and your fellow researchers distinguish between quality data and questionable data?
Loma: "I've never thought of this before. I would assume that directly observable quantitative data would be considered better quality than qualitative data."
Kaeli: "This is actually a really hard question. It would probably be really difficult for me to just look at someone's data and determine whether it was of poor quality. Perhaps if I was looking at their raw data sheets and noticed a lot of missing information, but otherwise the devil is in the methodological approach, not necessarily in the data itself. So I would question the data if, say, all but two data points were collected at a very specific time of day. Any standards for collecting quality data really come from both your field of study and the statistical methods you plan to use."
Jorge: "For me, quality data is representative and unbiased. The typical standards have to do with the quantity of data to be able to perform relevant statistical tests, and the training of the people that collected the data."
"For me, it's not intuitive to detect bad data. Sometimes you see patterns emerge that don't match what is expected, and that may help, but otherwise it is not that easy."
How did you decide what to measure and how to gather the data in your research?
Loma: "I created a hypothesis for the question I was trying to answer, then thought about what I could measure that would allow me to refute or fail to refute that hypothesis. For example, I'm currently trying to figure out what certain vocalizations mean to a crow, so I measured a number of behaviors that are indicative of agitation, fear, aggression, and curiosity. That way, I can compare how often a crow gives those behaviors both before and after I play a certain call through a loudspeaker."
Kaeli: "I mostly make it up as I go along. Which is kind of a joke and kind of not. Often I design an experiment based on what I think the most meaningful or robust measure of my question will be, but then once I get into the field I find out that doing it that way is actually impractical or impossible, so I need to change it. So often in wildlife studies the answer to that question is that we try our best to guess what will work, but ultimately we're at the mercy of our study animal and the elements."
Jorge: "I asked what would be relevant to measure for the biological questions I was going to ask, and what was feasible to collect, given my logistical and budgetary constraints."
Loma: "I have my data backed up on two hard drives and the department cloud storage, and I'm willing to share it with anyone who asks, so long as they convince me that I'd be included as an author/contributor on whatever they're working on."
Kaeli: "Yes, but I don't really use them, [to be honest]. I back up all my data on my computer, on three hard drives, and in Dropbox. We're also supposed to back them up on our lab's server, but I hardly ever do this!"
Jorge: "I keep my data in several places (like the data sheets where I collected it, and different hard drives) to ensure its safety. I'm not planning on sharing my data at this time."
Thank you Loma, Kaeli, and Jorge for sharing your experience with data quality. We wish you the best of luck in your research!