The limits of big data analysis

Big Data (Photo credit: Kevin Krejci)

Big data is seen by many as a panacea, while many others doubt its ability to solve our most pressing problems. What is the real problem with big data? Blind empiricism. All science worth its salt is semi-inductive or, better said, a mix of deduction and induction. Without prior deductive models, interpretation becomes a fishing expedition for tidbits of “interesting” results. Those results are usually filtered through common sense, which is typically rife with prejudice and fallacious thinking. This New York Times editorial makes the point precisely:

Is big data really all it’s cracked up to be? There is no doubt that big data is a valuable tool that has already had a critical impact in certain areas. For instance, almost every successful artificial intelligence computer program in the last 20 years, from Google’s search engine to the I.B.M. “Jeopardy!” champion Watson, has involved the substantial crunching of large bodies of data. But precisely because of its newfound popularity and growing use, we need to be levelheaded about what big data can — and can’t — do.

The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two. Likewise, from 1998 to 2007 the number of new cases of autism diagnosed was extremely well correlated with sales of organic food (both went up sharply), but identifying the correlation won’t by itself tell us whether diet has anything to do with autism.
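The editorial's point about trend-driven correlations can be illustrated with a small sketch. The two series below are synthetic and entirely made up for illustration (they are not the murder-rate, browser, autism, or organic-food figures). They share nothing except a downward trend, yet their raw levels correlate strongly. Correlating the year-over-year changes instead, one simple and by no means definitive check, removes the shared trend and typically leaves little behind.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def differences(xs):
    """Year-over-year changes: x[t+1] - x[t]."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Synthetic data (assumption): two unrelated quantities that both decline
# over ten "years", each with its own independent random noise.
rng = random.Random(0)
years = range(10)
series_a = [100 - 5 * t + rng.gauss(0, 1) for t in years]
series_b = [60 - 4 * t + rng.gauss(0, 1) for t in years]

# Correlation of the raw levels is dominated by the shared trend.
r_levels = pearson(series_a, series_b)

# Correlation of the changes ignores the trend and compares only the noise.
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

A high correlation of levels alongside a weak correlation of changes suggests that the two series merely trend in the same direction. Neither number says anything about causation; that still requires a prior model of how one quantity could affect the other.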

Sorin Adam Matei

Sorin Adam Matei, Professor of Communication at Purdue University, studies the relationship between information technology and social groups. He has published papers and articles in the Journal of Communication, Communication Research, Information Society, and Foreign Policy. He is the author or co-editor of several books. The most recent is Structural Differentiation in Social Media. He also co-edited Ethical Reasoning in Big Data, Transparency in Social Media, and Roles, Trust, and Reputation in Social Media Knowledge Markets: Theory and Methods (Computational Social Sciences), all three the product of the NSF-funded KredibleNet project. Dr. Matei's teaching portfolio includes classes on online interaction and on online community analytics and development. His teaching makes use of a number of software platforms he has co-developed, such as Visible Effort. Dr. Matei is also known for his media work. He is a former BBC World Service journalist whose contributions have been published in Esquire and several leading Romanian newspapers. In Romania, he is known for his books Boierii Mintii (The Mind Boyars), Idolii forului (Idols of the Forum), and Idei de schimb (Spare Ideas).
