Tuesday, December 11, 2012

Big data vs data mining vs statistics vs etc. FAQ

Love this link from Flowing Data to an article by William Briggs (he bills himself on his blog as Statistician to the Stars!). In the article, Briggs answers questions FAQ-style to talk about the differences between big data, data mining, statistics, probability, etc. He's got a good sense of humor, and is clear about what he sees as the distinguishing characteristics of each field.

Given that the hype about "big data" lately seems about ready to jump the shark, I love his definition. While he acknowledges that vast amounts of data are interesting for the facts contained within, he argues that Big Data is not likely to save us from ourselves:
What is big data?
Whatever the labeler wants it to be; data that is not small; a faddish buzz word; a recognition that it’s difficult to store and access massive databases; a false (but with occasional, and temporary, bright truths) hope that if characteristics down to the microsecond are known and stored we can predict everything about that most unpredictable species, human beings. See this Guardian article. See also false hope (itself contained in the hubris entry in any encyclopedia).
Big data is a legitimate computer science topic, where timely access to tidbits buried under mountains of facts is a major concern. It is also of interest to programmers who must take and use these data in the models spoken of above, all in finite time. But more data rather than less does not imply a new or different philosophy of modeling or uncertainty.

Jer Thorp had similar things to say in a Harvard Business Review blog recently; he would like to see people have a better understanding of data ownership, along with more conversations about data and ethics. Oh, and he'd like to bring artists into the mix, so that data comes to be understood as an entirely new societal resource. Let the conversation begin.
