A couple of months ago I was at an American Marketing Association event on predictive marketing analytics. One of the sponsors was demonstrating the powerful capabilities of their prediction software in a rather creative way: they set up a fortune-telling booth. I gave them my name and zip code, and the fortune-teller accurately predicted, amongst other things, that I had just bought or was about to buy a new car.
It’s stunning how predictable our behavior as consumers is, and many marketing companies and retailers are becoming much more efficient at grouping us not only by who we are and what we like but by what we’re about to do. This is how they do it.
By now, everybody’s heard the term “data mining” and its kissin’ cousin “predictive analytics” thrown around. But they still have a black-box quality to them. I give you my customer transaction data, you run it through your “predictive analytics”, say the magic words, and the computer spits out exactly how to manipulate each customer into spending twice what they were originally going to spend. Right? Well, not quite. So it might not be a bad idea, without getting too bogged down in the math, to demystify this process a bit and pull back the curtain on mathematical mind-reading.
Let’s start with simple business dashboards. Most people are familiar with the concept because dashboards have been around for a while. Business intelligence tools like Tableau, QlikView, Hyperion and Cubeware essentially extract data from a company’s transactional systems and pull it together into neat little displays of trend lines and bar charts. It’s easy to see the value in knowing, in real time, exactly how your company’s sales are trending, whether its operations are optimized, where any bottlenecks might be, and so on. This is the level of inquiry at which most operations run: nothing more sophisticated than summing and counting units and displaying the results as simple charts.
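To make the dashboard level concrete, here is a minimal Python sketch of the kind of aggregation that sits behind such charts: grouping transactions by month, then summing units and counting orders. The transaction rows and the `monthly_summary` helper are hypothetical illustrations, not any BI product’s API.

```python
from collections import defaultdict

# Hypothetical transaction rows: (ISO date, product, units sold).
TRANSACTIONS = [
    ("2024-01-05", "road bike", 3),
    ("2024-01-19", "helmet", 5),
    ("2024-02-02", "road bike", 4),
    ("2024-02-23", "helmet", 6),
]


def monthly_summary(transactions):
    """Sum units and count orders per month: the raw numbers behind a trend line."""
    units = defaultdict(int)
    orders = defaultdict(int)
    for date, _product, quantity in transactions:
        month = date[:7]  # "YYYY-MM"
        units[month] += quantity
        orders[month] += 1
    return {month: (units[month], orders[month]) for month in sorted(units)}


print(monthly_summary(TRANSACTIONS))
# {'2024-01': (8, 2), '2024-02': (10, 2)}
```

Everything here is addition and counting; no statistical inference is involved, which is exactly the limitation the next paragraph points to.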
But a surface-level look at data only leads to surface-level understanding. Thomas Redman, in his blog article for the Harvard Business Review, tells us that we need to dig deeper and think more analytically. For example, a simple plot can tell you if two things are correlated, but it takes judgment to determine which thing caused the other. Or there could be a missing third variable causing both.
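A small simulation shows how a missing third variable can produce a strong correlation entirely on its own. In this sketch all names and numbers are hypothetical: a hidden “seasonal demand” value drives both ad clicks and store sales, and neither of those causes the other. Once the hidden variable is regressed out of each series, the correlation between them drops to roughly zero.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after an ordinary least-squares fit on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(42)
# Hidden driver (hypothetical seasonal demand); values are standardized, so they can be negative.
demand = [rng.gauss(0, 1) for _ in range(2000)]
# Both observed series depend on demand plus independent noise; neither causes the other.
ad_clicks = [d + rng.gauss(0, 0.5) for d in demand]
store_sales = [d + rng.gauss(0, 0.5) for d in demand]

raw = pearson(ad_clicks, store_sales)
controlled = pearson(residuals(ad_clicks, demand), residuals(store_sales, demand))
print(f"raw correlation: {raw:.2f}, after controlling for demand: {controlled:.2f}")
```

For this setup the theoretical raw correlation is 1 / 1.25 = 0.8, so a plot of clicks against sales would look convincing even though the link runs entirely through the hidden variable.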
Here’s a more specific example: you’re managing a marketing campaign on Google, and you’re doing A/B testing with ads. The first ad gets you 50 clicks in a month. You try the second ad next month, and you get 55 clicks. A simple trend line would seem to indicate that the second ad is slightly better. But were those five extra clicks caused by the quality of the ad, or did they happen randomly?
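One common way to ask whether 55 clicks versus 50 is more than noise is a conditional test on the two counts. Assume each ad received the same number of impressions and that clicks behave like Poisson counts. Under the hypothesis that both ads share one click rate, the second ad’s share of the 105 total clicks then follows a Binomial(105, 0.5) distribution. The sketch below computes an exact two-sided p-value using only the standard library. The function name is hypothetical.

```python
from math import comb


def two_sided_click_test(clicks_a, clicks_b):
    """Exact two-sided p-value for 'both ads share one click rate'.

    Assumes equal impressions and Poisson-distributed clicks, so that given the
    total, either ad's clicks follow Binomial(total, 0.5) under the null hypothesis.
    """
    total = clicks_a + clicks_b
    larger = max(clicks_a, clicks_b)
    # P(X >= larger) for X ~ Binomial(total, 0.5); the distribution is symmetric,
    # so doubling the upper tail gives the two-sided p-value.
    upper_tail = sum(comb(total, k) for k in range(larger, total + 1)) / 2 ** total
    return min(1.0, 2 * upper_tail)


print(round(two_sided_click_test(50, 55), 2))
```

For 50 versus 55 the p-value comes out at roughly 0.7, far from any conventional significance threshold. Those five extra clicks are entirely consistent with chance.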
This is where drawing conclusions from data gets more interesting. We move from simple aggregation into data mining: using statistical techniques to find insights that can’t be seen with simple sums or counts. The work moves from the realm of a straightforward engineering project with simple, correct answers into the grey area of data science, with its complex mixture of factors and its stronger or weaker conclusions.
Data mining, at its core, is about grouping things more effectively using statistical modeling. It is useful across all functions of a company, but there is a specific interest in it from marketers. Among other applications, data mining can be used to group or segment customers so that they can be more effectively targeted. Here is a quote from The Modeling Agency on what data mining means:
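Segmentation can be done in many ways. One widely used technique, not named in the source, is k-means clustering. It repeatedly assigns each customer to the nearest group center, then moves each center to the average of its members. The sketch below is a bare-bones pure-Python version run on hypothetical data: orders per year and average basket size in tens of dollars. It uses the first k customers as starting centers for simplicity. Practical implementations use smarter initialization and scale the features first.

```python
import math


def kmeans(points, k, iterations=20):
    """Bare-bones k-means: assign to nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # naive start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the segment with the closest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical customers: (orders per year, average basket in tens of dollars).
CUSTOMERS = [(2, 3), (12, 9), (1, 2), (11, 10), (3, 2), (13, 8)]
labels, centers = kmeans(CUSTOMERS, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The two segments that emerge (occasional small-basket buyers and frequent large-basket buyers) are the kind of groups a marketer would then allocate resources between.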
…the process of developing mathematical models that identify groups of individuals who display [a] behavior of interest at differing rates. This group identification allows us to discriminate in the allocation of our resources more effectively. In short, we are looking for a better way to break our relationships into groups so that we can allocate more resources to the groups that benefit us, and minimize the resources to those groups that have a negative impact [on] our specific performance metrics.
Even though the means get more complex and nebulous at this point, the fact remains that our behavior as consumers is still highly predictable. Loyalty Builders is a marketing services company specializing in remarketing and loyalty programs, and it makes heavy use of analytics. In a newspaper interview, CEO Mark Klein talks about using customer transaction data to predict which customers are going to make near-term repeat purchases and which are going to defect.
Klein makes use of predictive analytics, which is a subset of statistical data-mining techniques that speak to future events instead of past events. For example, what if I notice that your purchases from my company are becoming less frequent? My data shows me that customers who slow their purchases are likely to defect. So when you call in, I’m going to make sure the person talking to you knows that this is a risk with you, and acts to prevent that defection.
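The “slowing purchases” signal can be sketched as a simple rule. The function below is a hypothetical illustration, not Loyalty Builders’ method. It flags a customer when the time since their last purchase is more than 1.5 times their usual gap between purchases. A real predictive model would learn thresholds like this from historical defection data rather than fixing them by hand.

```python
def is_slowing_down(purchase_days, today, ratio=1.5):
    """Hypothetical churn-risk rule based on purchase frequency.

    purchase_days: day numbers of a customer's past purchases, in ascending order.
    today: the current day number (e.g. when the customer calls in).
    ratio: how many times the usual gap counts as "slowing down".
    """
    if len(purchase_days) < 2:
        return False  # not enough history to know what "usual" looks like
    gaps = [later - earlier for earlier, later in zip(purchase_days, purchase_days[1:])]
    usual_gap = sum(gaps) / len(gaps)
    return today - purchase_days[-1] > ratio * usual_gap


print(is_slowing_down([0, 30, 60, 90], today=150))  # True: 60 days since a monthly buyer's last order
print(is_slowing_down([0, 30, 60, 90], today=100))  # False: still on a normal schedule
```

A flag like this is what would be surfaced to the person taking the customer’s call.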
Let me give you a quick idea of how this is done. One technique used very frequently in this field is called scoring. Let’s say that I own a bike company and I want to predict whether you will buy a bike. I have a bunch of data from past customers, including variables like marital status, family status, commute distance, and geographic region. I want to find out which of those factors will help me predict your purchase, and how strong each factor is relative to the others.
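One way to measure how strong each factor is uses logistic regression, which estimates how much each factor shifts the log-odds of a purchase. Here is a minimal sketch that fits such a model with plain batch gradient descent. It assumes just two hypothetical yes/no factors (short commute, has kids) and made-up counts. The counts were chosen so that the exact answer is known: a short commute triples the odds of buying, and having kids cuts them to a third. The fitted weights should therefore approach ln 3 ≈ 1.10 and -1.10.

```python
import math

# Hypothetical counts per cell: (short_commute, has_kids, buyers, non_buyers).
CELLS = [(0, 0, 2, 2), (1, 0, 3, 1), (0, 1, 1, 3), (1, 1, 2, 2)]

# Expand the counts into one row per past customer: ([short_commute, has_kids], bought).
DATA = []
for short_commute, has_kids, buyers, non_buyers in CELLS:
    DATA += [([short_commute, has_kids], 1)] * buyers
    DATA += [([short_commute, has_kids], 0)] * non_buyers


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, learning_rate=0.5, epochs=5000):
    """Fit intercept and weights by batch gradient descent on the average log-loss."""
    n_features = len(data[0][0])
    bias, weights = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_bias, grad_weights = 0.0, [0.0] * n_features
        for features, label in data:
            # Prediction error for this customer: predicted probability minus outcome.
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, features))) - label
            grad_bias += error
            for j, x in enumerate(features):
                grad_weights[j] += error * x
        bias -= learning_rate * grad_bias / len(data)
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_weights)]
    return bias, weights


bias, (w_short_commute, w_has_kids) = fit_logistic(DATA)
print(f"short_commute={w_short_commute:.2f} has_kids={w_has_kids:.2f}")
```

The sign of each weight shows the direction of a factor’s effect, and its size shows the strength. With real data you would normally use a statistics library rather than hand-rolled gradient descent.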
I use a statistical technique called logistic regression and assign a point value to each possible value of each variable, based on the strength of its predictive power. A commute distance under 2 miles might be worth 75 points, for example, while a commute distance of over 10 miles might be worth only 10 points. Then I add up your total score. A married male from the Pacific Northwest with no kids and a one-mile commute might score 380 out of 400 points, indicating that he is a very likely purchaser. But a single female from the Midwest with three kids and a 10-mile commute might add up to only 150 out of 400 points, indicating that a purchase is less likely. I then concentrate my resources on converting the married male into a customer. This technique lets me group people by a very complex mix of factors rather than relying on traditional demographic demarcations.
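The sketch below shows one way to turn fitted coefficients into a points table. The coefficients, the intercept, and the scale of 20 points per unit of log-odds are all hypothetical. Shifting each variable so that its weakest value scores 0 is one common scorecard convention, not the only one. Because points are a linear rescaling of log-odds, ranking customers by total points gives (up to rounding) the same order as ranking them by predicted purchase probability.

```python
import math

# Hypothetical fitted coefficients: the log-odds contribution of each value of each variable.
COEFFICIENTS = {
    "commute": {"under_2_miles": 1.5, "2_to_10_miles": 0.6, "over_10_miles": -0.8},
    "kids": {"none": 0.7, "one_or_two": 0.0, "three_plus": -0.9},
    "region": {"pacific_northwest": 0.8, "midwest": -0.2, "other": 0.0},
}
INTERCEPT = -1.0          # hypothetical baseline log-odds
POINTS_PER_LOG_ODDS = 20  # hypothetical scaling factor


def build_points_table(coefficients, factor):
    """Shift each variable so its weakest value is worth 0 points, then scale to points."""
    table = {}
    for variable, values in coefficients.items():
        floor = min(values.values())
        table[variable] = {value: round((coef - floor) * factor) for value, coef in values.items()}
    return table


def total_score(customer, table):
    """Add up the points for each of the customer's attribute values."""
    return sum(table[variable][value] for variable, value in customer.items())


def purchase_probability(customer, coefficients, intercept):
    """Logistic regression prediction: sigmoid of the intercept plus summed contributions."""
    log_odds = intercept + sum(coefficients[var][val] for var, val in customer.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


POINTS = build_points_table(COEFFICIENTS, POINTS_PER_LOG_ODDS)
likely_buyer = {"commute": "under_2_miles", "kids": "none", "region": "pacific_northwest"}
less_likely = {"commute": "2_to_10_miles", "kids": "three_plus", "region": "midwest"}

for name, customer in [("likely_buyer", likely_buyer), ("less_likely", less_likely)]:
    print(name, total_score(customer, POINTS),
          round(purchase_probability(customer, COEFFICIENTS, INTERCEPT), 2))
```

With these made-up numbers, the two simplified customers score 98 and 28 points, with purchase probabilities of about 0.88 and 0.18. The scale differs from the 400-point example above, but the logic is the same.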
The article from The Modeling Agency goes on to warn us about the potential pitfalls of using techniques like these. Because they are so powerful and yet not well understood by business stakeholders, they’re spoken about with a magical quality. It’s tempting to get some data-mining software, pop numbers into it like a calculator, and then build assumptions on top of the output without really even understanding it.
It’s important to remember that these techniques do represent a grey area in between right answers and wrong answers; getting closer to accuracy depends on the quality and amount of data as well as the strategic thinking of the data scientists who set up these analytics. They are only effective in the context of a strategically aligned analytics project with clear questions and goals. Models have to be optimized and re-optimized, and might not even be effective until more or different data is collected.
We have only seen the tip of the iceberg as to the predictive power and financial windfall involved in predictive analytics. Therefore whether you’re a business owner, marketer, department head, or another stakeholder, it’s important to get past the intimidating, black-box mystique surrounding analytics-based behavior prediction and start becoming educated. This knowledge represents the future of doing business, and our future market leaders will all have mastered these techniques.
Interesting. However, it’s all very well finding things out about customers in such an indirect manner, but I do wonder why the more obvious method of simply asking is not used more. Is it too basic, too simple, too old-fashioned?
As a consumer I prefer targeted ads so that I am not bombarded with products in which I have no interest, but I do sometimes wish there were some kind of survey actually asking me my preferences rather than just making assumptions based on demographics and stats. For example, I hate moving ads on the internet. I’m trying to read something and this thing in the corner of my eye keeps moving to attract my attention. It’s annoying and does not make me want to click on it; it makes me want to cover it with something so I don’t even see it. The only thing worse is ads that make noise. Nothing will drive me away from a site faster than some ad shouting at me. It makes me panic. I detest it. I would keep the PC muted, but I share it and my husband often leaves it on loud. I don’t want to be forced to use an ad blocker because, for one thing, that is how some websites support their existence, and for another, I have no objection to being informed of products I may wish to purchase. But I do wish there were a way to control how those ads appear.
Also, I would hope that trying harder to get the man with the shorter commute to buy a bicycle would not mean the mother with the long commute was discouraged from doing so, as she might be looking for a means of exercise rather than a way to get to work, or buying it as a gift. It might be good to enable eager purchasers and not pressurise those who don’t need the thing into buying something they have no use for, but making assumptions that can make a customer feel unwelcome is problematic. I suppose those are the pitfalls you mentioned.
Thank you for the in-depth reply! This kind of application for customer and purchase information is not without controversy, and there are many strong feelings about how it should and should not be used.
A quick thought on your specific point about simply asking customers about their preferences. Survey research is, and will remain for some time, a crucial part of marketing analytics. Asking customers what they like and don’t like is not going anywhere. But interestingly, customers’ actions and purchase information are proving to be a much more accurate predictor of preference than their survey responses. For a variety of reasons, people don’t answer surveys in ways that reflect reality.
One of the interesting benefits of data analytics is that we can get closer to predicting what people actually like or want than we could by simply asking them.