The hottest buzzword of the decade is “analytics” – the ability to gain heretofore unattainable intelligence and insight by mining piles of data. As with the Internet in the ’90s, what was once the sole domain of geekdom is becoming mainstream. Whereas once we employed a lone statistician somewhere in a broom closet to “do the numbers” for us, now we have our own dashboards at our disposal to tell us our business status up to the second. You can see the seductiveness of this promise, can’t you? Imagine the control you could exert with perfect intelligence. There is a story in Charles Duhigg’s The Power of Habit about a coupon-customizing system so insightful that it inadvertently revealed to the family of a 17-year-old girl that she was pregnant before she had a chance to break the news herself. Imagine what you could do with that insight…
The increased quantification of our lives and our businesses will bring us many benefits: intelligence, insight, better forecasts, and a proper understanding of the effects of randomness. Services will be better able to customize their offerings to suit our needs. But these tools carry a dangerous mystique, and we are adopting them faster than we are learning to understand their implications.
Analytics and statistics, as well as being bearers of social insight, are also instruments of institutional dysfunction. They are the primary refuge of those looking to justify their own incumbency and promotions. How often have you heard one political party tout its effect on, say, levels of unemployment? And then how often do you hear the other party say, “Well, those figures don’t take into account the workers who have stopped looking for jobs”? And then how often do you hear the first party say, “Well, these figures have been the basis of our economic appraisals for decades”? Not only can people not agree on the interpretation of a given statistic; they can’t even agree on which statistics and assumptions are legitimate.
Much has already been made of the ethical use of statistics. The HBO drama The Wire coined the outstanding phrase “juking the stats” – tweaking the underlying observations so that the statistics tell the story you want to tell. Statistics-playing strategies occupy a moral gray area: for example, teachers who teach test questions to their students so that their school won’t lose funding under the No Child Left Behind program. Any time you distribute resources using quantitative measures, people have every incentive to figure out how to work the system.
What I’m talking about, however, comes before any ethical questions are asked. Analytics have a deceptive certainty to them: a way of appealing to our cognitive biases while seeming concrete. People tend to believe that statistics denote certainty. They generally have no idea how much subjectivity they introduce the moment they interpret a chart or graph. “Of course this is what this means…it’s obvious!” Increasing the amount of analytics does not provide more certainty; as the volume of analyzed data grows, so do the opportunities for cognitive bias. This is especially true when statistical understanding does not keep pace with one’s daily involvement in data analysis…we’ll increasingly be called upon to give interpretations that seem certain to us, but that we’re ultimately untrained to give.
Here are a few principles of human reasoning (in no particular order) which may cause a problem:
1. As soon as one commits a statistic or forecast to writing, its underlying assumptions will be promptly forgotten.
This is seen all the time in financial forecasting. Someone comes up with a number…let’s say a projected revenue figure for the next year. They then hand that projection to someone else, who, let’s say, uses it as the basis for their own budgeting. In doing so, they have assumed the first person’s revenue projection to be sound, without understanding why it might or might not be. Most likely they never asked how the number was derived; they don’t care – they just want to complete their budget. It doesn’t matter that the revenue figure was based on a model whose assumptions might as well have been guesses or coin flips. Someone came up with a number. Once the number arrives, it is clung to, and all the underlying processes and assumptions are forgotten.
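To make this concrete, here’s a minimal sketch in Python – every figure in it is a hypothetical stand-in for the guesses that quietly underlie a real projection – showing how a single “projected revenue” number hides the spread of outcomes its own assumptions imply:

```python
import random

# Hypothetical assumptions behind a "next year's revenue" projection.
# None of these numbers are real; they stand in for the guesses that
# quietly underlie a forecast.
current_revenue = 10_000_000   # this year's revenue, in dollars
assumed_growth = 0.08          # the modeler's best-guess growth rate
growth_uncertainty = 0.10      # how far off that guess could plausibly be

# The single number that gets handed to the budget owner:
point_projection = current_revenue * (1 + assumed_growth)
print(f"Projected revenue: ${point_projection:,.0f}")

# The distribution that the point estimate hides:
simulated = sorted(
    current_revenue * (1 + random.gauss(assumed_growth, growth_uncertainty))
    for _ in range(10_000)
)
low, high = simulated[250], simulated[9750]   # middle 95% of outcomes
print(f"95% of simulated outcomes fall between ${low:,.0f} and ${high:,.0f}")
```

The budget owner sees only the first line of output; the second line – the part that depends entirely on the assumptions – never travels with the number. This brings us to: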
2. Statistical evidence promotes reductionist thought.
We crave the simplest answers, and analytics – with the “concrete” answers it provides – gives us something to cling to. How tempted are we to avoid nuance when we can? How tempted are we to say, “Just give me the bottom line…I want a number!” Statistics and analytics play into our desire to strip nuance from concepts and situations.
To this day, many politicians advocate returning to the policies of a bygone era because that era saw incredible economic growth. The comparisons we hear most frequently are to the Reagan administration (real GDP per capita grew nearly 23% between 1981 and 1989). We feel that, if we can just do that bit of history over again, the results will be the same. Here’s the reality: there have been 13 presidential administrations in the modern era (since WWII), which means we are trying to draw conclusions from a sample size of 13. Additionally, the economic and social dynamics of the U.S. are constantly changing, so we now live in a much different economic reality than past administrations did. Think how much has changed since even the Clinton administration. Add in the outside factors that also affect economic activity during an administration – stock market activity, housing prices, debt levels, military activity, etc. – and there’s simply no way to draw many useful economic prescriptions from the results of past policies.
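For a sense of how little thirteen noisy observations can tell you, here’s a small simulation. The premise – that administrations have no real economic effect at all – and the growth figures are assumed purely for illustration:

```python
import random

# Assume, purely for illustration, that per-term economic growth is
# random noise around a common mean: no administration has any real effect.
random.seed(1)
TERMS = 13        # modern-era administrations, per the post
TRIALS = 10_000

spreads = []
for _ in range(TRIALS):
    growth = [random.gauss(2.0, 1.5) for _ in range(TERMS)]  # invented % rates
    spreads.append(max(growth) - min(growth))

print(f"Average gap between the 'best' and 'worst' of {TERMS} "
      f"pure-noise administrations: {sum(spreads) / TRIALS:.1f} points")
```

Even when every administration is statistically identical, the best of thirteen draws looks dramatically better than the worst – plenty of raw material for a narrative.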
None of that subtlety matters. Once people saw the boom numbers from the eighties, they drew their own reductionist conclusions about what caused the boom and whether it could be repeated.
3. Many will see patterns where none exist.
Nassim Taleb addresses this problem in the excellent Fooled By Randomness. Humans are often unaware of the effect of randomness on a certain cause-and-effect structure. At the most basic level, our brains function by ascribing meaning to observations. We have an animal craving to see patterns (see the earlier post on The Personal Myth).
This problem is not confined to those ignorant of statistical significance and how to calculate it. Even professional scientists at times describe patterns where none actually exist. Think about the pressure a professional scientist feels to publish results – something they can only do if those results are shown to be statistically significant (i.e., unlikely to have been caused by purely random effects). In 2005, John P. A. Ioannidis of Tufts University published a now-famous paper called Why Most Published Research Findings Are False. He calls into question the validity of many research findings without necessarily assailing the ethics of the researchers. Most of the cited flaws are due to human bias entering the design of experiments, the inclusion or exclusion of certain observations, and the fact that the standard measure of significance – the p-value – is ultimately an arbitrary number that doesn’t provide the certainty we hope for.
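That arbitrariness is easy to demonstrate. Here’s a minimal sketch using numpy and scipy – the sample size of 30 and the 0.05 cutoff are just the conventional choices – that runs a thousand experiments in which there is, by construction, nothing to find:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
experiments = 1_000
false_positives = 0

for _ in range(experiments):
    # Two groups drawn from the SAME distribution: any "effect" is pure noise.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {experiments} null experiments came out "
      f"'statistically significant' at p < 0.05")
```

Roughly fifty of them clear the bar anyway. A field that publishes significant results will publish a steady stream of findings that are pure noise – no misconduct required.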
This phenomenon of seeing patterns where none exist will only increase as Business Intelligence starts playing a bigger role in society. Imagine: you have your marketing team send out a communication, and on that same day your dashboard shows an uptick in the number of sales leads. Was the marketing effective? Ninety-nine out of a hundred people will say yes. The correct answer is that there is no way to tell without more information and deeper investigation.
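In fact, an “uptick” on the day of any given event is close to a coin flip. A minimal sketch, assuming (the numbers are invented) that daily lead counts are just Poisson noise around a flat average of 20 per day:

```python
import math
import random

random.seed(7)

def poisson(lam):
    # Knuth's algorithm -- dependency-free and fine for a sketch.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

# Daily lead counts with NO campaign effect whatsoever.
days = 10_000
upticks = 0
prev = poisson(20)
for _ in range(days):
    today = poisson(20)
    if today > prev:
        upticks += 1
    prev = today

print(f"With no campaign at all, {upticks / days:.0%} of days show an 'uptick'")
```

A bit under half of all days “improve” on the day before for no reason whatsoever. This brings me to my final point: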
4. People fail to realize the subjectivity and bias that they bring to analysis, because conclusions seem self-evident.
Cognitive biases don’t go away once you have a consolidated dataset that you can easily slice and dice. The fact that we are doing more analysis, and especially more untrained analysis, creates more opportunities for subjectivity and bias to distort findings – and almost no one recognizes the moment they themselves are introducing bias. Here are some examples (for more information see this previous post).
Extrapolation Bias (“Hot Hand” Bias): the assumption that the good conditions I’m experiencing now will continue indefinitely. For example, positive revenue trends and market conditions are more likely than negative trends to be projected forward without justification.
Confirmation Bias: This is the tendency to overweight information which confirms prior views, and underweight information which disconfirms those views.
Illusory Patterns: We discussed this in point three, but my favorite visual representation of this bias is Mike Urbonas’s description of the Texas Sharpshooter Fallacy: “A Texas cowboy fires several rounds at a barn. Looking at the holes riddled across the barn wall, he observes there are lots of holes clustered together. He excitedly paints a bulls-eye centered over the biggest cluster of holes and proudly shows it off as proof he is a sharpshooter.”
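Randomness reliably produces clusters for the sharpshooter to find. A toy simulation – wall size, shot count, and patch size all chosen arbitrarily:

```python
import random

random.seed(42)
# Fire 50 rounds at random across a 10x10 wall.
holes = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(50)]

# Then hunt for the densest 2x2 patch -- the spot where the bulls-eye
# gets painted after the fact.
best_count = 0
for x0 in range(9):
    for y0 in range(9):
        count = sum(1 for x, y in holes
                    if x0 <= x < x0 + 2 and y0 <= y < y0 + 2)
        best_count = max(best_count, count)

expected = 50 * (4 / 100)  # holes expected in a patch chosen IN ADVANCE
print(f"Densest 2x2 patch holds {best_count} holes vs. {expected:.0f} expected")
```

Pick the cluster after looking, and you will always find “evidence” of marksmanship.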
All books are cooked. Finance people understand this. Reported earnings statements, though their certainty is taken for granted by market analysts and the like, are always an interpretation based on assumptions and framing techniques. The same is true of the bar charts and trend lines that result from BI analysis. At the end of the day, we want data analysis to tell our story, and we shape it, even unconsciously, to fit our preexisting narratives. We therefore run the risk of creating great tools that compile and synthesize immense amounts of data, only to reach conclusions that are just as flawed as they were 20 years ago. It is not enough to improve analytical skills and understanding of randomness and bias. We must fundamentally understand that analytics and statistics do not carry the certainty we so desperately believe we are seeing.