Sweet jesus. Where to start? Low-hanging fruit:
"In 2003, COMMA, thanks to Michael Lewis and his best seller Moneyball, COMMA, the general manager of the Oakland A’s, COMMA, Billy Beane, COMMA, became a star."
That's not a topic sentence, that's a remix. Less the drama and obfuscating language, we have "Billy Beane, the general manager of the Oakland A's, became a star in 2003 thanks to Michael Lewis and his best seller Moneyball." Which follows throughout the rest of the essay - it's basically "Malcolm Gladwell, COMMA, William Whyte, COMMA, big data, COMMA and human resources departments are all conspiring to damage your hiring prospects, COMMA, leisure time, COMMA, pursuit of happiness, COMMA, and general well-being." A few things with that.
1) 80% of hires beyond first jobs are referrals from friends and colleagues. In other words, choices HR supervises but has no input into.
2) Nobody cares about Malcolm Gladwell except Malcolm Gladwell.
3) The Oakland A's use scouts now, just like everyone else.
Nate Silver's book, The Signal and the Noise, is about exactly this. Silver's argument is, in four words, "garbage in, garbage out." He's the anti-Taleb and, unlike the other hand-wavey "visionaries" talking about "big data," he makes the point that his predictions are as accurate as they are not through magic but through dispassionate evaluation of the metrics that matter. Datasets matter in weather prediction because finite element analysis requires a geometric increase in data points for a decimal point increase in accuracy. Datasets do not matter in baseball scouting because there's no reasonable way to determine which data matters and which doesn't, and simply throwing more at it increases the workload without increasing the salary. Silver ran his own predictive software because of Moneyball and managed to get some useful (and sellable) results, but the good scouts beat his predictions then and still do. He points out that Billy Beane uses scouts now because a scout has a better idea when to pay attention to the model and when to ignore it.
It really comes down to how well you know the data and what you can do with it - the conclusion has a progression of analysis that's worth reprinting in full, to illustrate the nature of "data-driven" prediction:
a) No investor can beat the stock market.
b) No investor can beat the stock market over the long run.
c) No investor can beat the stock market over the long run relative to his level of risk.
d) No investor can beat the stock market over the long run relative to his level of risk and accounting for transaction costs.
e) No investor can beat the stock market over the long run relative to his level of risk and accounting for his transaction costs, unless he has inside information.
f) Few investors can beat the stock market over the long run relative to their level of risk and accounting for their transaction costs, unless they have inside information.
g) It is hard to tell how many investors beat the stock market over the long run, because the data is very noisy, but we know that most cannot relative to their level of risk, since trading produces no net excess return but entails transaction costs, so unless you have inside information, you are probably better off investing in an index fund.
The first approximation – the unqualified statement that no investor can beat the stock market – seems to be extremely powerful. By the time we get to the last one, which is full of expressions of uncertainty, we have nothing that would fit on a bumper sticker. But it is also a more complete description of the objective world.
What you're left with is the truism that large analytical approaches haven't taken over the world not because no one has ever tried them before, but because once you refine them to the point they're useful you're generally left with something that is of marginal utility unless you already had a pretty goddamn good idea what you were doing in the first place.
So. The article boils down to "HR departments are resorting to phrenology to keep you out of a job" (+9000 other words). Thing is, HR departments have long resorted to phrenology to keep you out of a job. If your hiring is reliant on the HR department, you aren't getting the job anyway. That's been true since HR was invented. Back then, however, they knew how to use commas.
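To make statement (g) in Silver's progression concrete, here is a rough simulation sketch, not anything from the book. It assumes trading adds zero-mean noise to the market return and subtracts a fixed yearly cost, and it simply adds yearly excess returns together. The function name and every parameter value (`noise_sd`, `annual_cost`, and so on) are made up for illustration.

```python
import random


def fraction_beating_market(n_investors=10_000, years=20,
                            noise_sd=0.05, annual_cost=0.01, seed=1):
    """Share of simulated investors whose summed excess return is positive.

    Assumption (hypothetical model): each year an investor earns the market
    return plus zero-mean Gaussian noise, minus a fixed transaction cost.
    Nobody in this model has skill or inside information.
    """
    rng = random.Random(seed)
    winners = 0
    for _ in range(n_investors):
        # Excess return over the market, summed across the whole period.
        excess = sum(rng.gauss(0.0, noise_sd) - annual_cost
                     for _ in range(years))
        if excess > 0:
            winners += 1
    return winners / n_investors


if __name__ == "__main__":
    print("no costs:  ", fraction_beating_market(annual_cost=0.0))
    print("with costs:", fraction_beating_market(annual_cost=0.01))
```

Under these assumptions some investors still come out ahead through luck alone, which is why "no investor" has to soften to "few" and then to "it is hard to tell." The costs only shift the odds against the average trader.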
Some interesting points, kleinbl00, though I have a different take on some of them. First, what I agree with... Yes, knowing your data absolutely yields better and more predictable/useful results. This is particularly true of understanding the data's integrity, without which no good analysis can be definitive. Also, if you want your analysis to be understood by you and your audience, I'll bang the less-is-more drum all day. However, assuming one is analyzing data they don't know much about other than that it has integrity, it's here I diverge. In this scenario, untold and surprising discoveries can occur. Discoveries that would not have been found if the analysis were done under the curse of knowledge (read: analyzed under the constructs of what you know, or assume to be absolute). This to me is what is so exciting about the world of big data. While scary for sure, and unmanageable for many, lots of data can transform industries.
"However, assuming one is analyzing data they don't know much about other than that it has integrity, it's here I diverge. In this scenario, untold and surprising discoveries can occur. Discoveries that would not have been found if the analysis were done under the curse of knowledge (read: analyzed under the constructs of what you know, or assume to be absolute)."
Not a quibble - if you have a dataset and you explore it, you're likely to find interesting correlations. That's what large-dataset survey experiments are about - you collect all the data you can, then you comb through it afterward to see what you can see. It's a useful and valid method of scientific discovery. The failure of "big data" happens when that "scientific" step - the part where you go "hmmm - I see a correlation between A and B, let's investigate further" - gets bypassed in favor of expediency. Correlation does not imply causation, and it's important to find out whether you actually have causation at hand. Silver's argument is that much "big data" analysis presumes causation every time it sees correlation, and this article is full of that. For example, an AFC win in the Super Bowl has an 80% correlation with a stock market decline. Scientific method calls for investigating whether or not the Miami Dolphins actually have the power to bring down the DJIA. "Big Data" simply says "eighty percent! Woo hoo!"
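A quick way to see why the "investigate further" step matters is to comb through pure noise and see what turns up. The sketch below is only an illustration under made-up assumptions: the function names, the 2,000 random series, and the 20 data points per series are all hypothetical. It compares one random "target" series against many unrelated random series and keeps the strongest correlation found.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def best_spurious_correlation(n_series=2000, n_points=20, seed=0):
    """Strongest correlation between a random target and unrelated noise.

    Every series is independent Gaussian noise, so any correlation found
    here is coincidence by construction (a hypothetical setup).
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print("strongest correlation found in noise:", best_spurious_correlation())
```

With only 20 points per series and thousands of candidates, some series will line up with the target well by luck alone. That is the Super Bowl indicator in miniature: the correlation is real in the sample and meaningless as a cause.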
Definitely valid points. There are certainly misleading correlations, and I absolutely agree with you about the laziness of some pop-psychology writers who make this blind leap too often. But when facing a decision where there's a strong correlation at a decent confidence level, I'd be inclined to let that correlation inform at least some of my choices. Of course, this assumes that mitigating factors in the study are accounted for and my gut is not actively steering me in another direction... PS: The stat about the stock market falling when the AFC wins the Super Bowl is awesome!
As an aside, the more I hear the term 'big data' used in the media, the more it seems to have drifted from its original meaning and connotation. I could be wrong, but I doubt baseball scouts and HR are analyzing petabytes of data. Everyone wants to think they have big data; so few do.