Some interesting points, kleinbl00, though I have a different take on some of them. First, what I agree with... Yes, knowing your data absolutely yields better and more predictable/useful results. This is particularly true of understanding data integrity, without which no analysis can be definitive. Also, if you want your analysis to be understood by you and your audience, I'll bang the less-is-more drum all day. However, assuming one is analyzing data they don't know much about other than that it has integrity, this is where I diverge. In this scenario, untold and surprising discoveries can occur: discoveries that would not have been found if the analysis had been undertaken under the curse of knowledge (read: analyzed under the constructs of what you know, or assume to be absolute). This to me is what is so exciting about the world of big data. While scary for sure, and unmanageable for many, lots of data can transform industries.
Not a quibble - if you have a dataset and you explore it, you're likely to find interesting correlations. That's what large-dataset survey experiments are about - you collect all the data you can, then you comb through it afterward to see what you can see. It's a useful and valid method of scientific discovery. The failure of "big data" happens when that "scientific" step - the part where you go "hmmm - I see a correlation between A and B, let's investigate further" - gets bypassed in favor of expediency. Correlation does not imply causation, and it's important to discover whether or not you have causation at hand. Silver's argument is that much "big data" analysis presumes causation every time you see correlation, a presumption this article is full of. For example, an AFC win in the Super Bowl has an 80% correlation with a stock market decline. The scientific method calls for investigating whether or not the Miami Dolphins actually have the power to bring down the DJIA. "Big Data" simply says "eighty percent! Woo hoo!"
Definitely valid points. There are certainly misleading correlations, and I absolutely agree with you about the laziness of some pop psychology writers who make this blind leap too often. But when facing a decision where there is a strong correlation at a decent confidence level, I'd be more inclined to let that correlation inform some of my decisions. Of course, this assumes that mitigating factors in the study are accounted for and my gut is not actively steering me in another direction... PS: The stat about the stock market falling when the AFC wins the Super Bowl is awesome!