Media companies need these 3 things to get started with predictive analytics
Smart Data Initiative Blog | 26 September 2022
INMA set up a few “Ask Me Anything” sessions for me and a few great publishers attending INMA’s Media Innovation Week Copenhagen earlier this month, and this had an unexpected benefit for me. While these were pretty different profiles of publishers, some topics reliably cropped up, a kind of Venn diagram of shared interests, if you will.
The topic that surprised me most by how often it came up, not because it’s wild but because it’s so specific, is predictive analytics.
And, in these conversations, one thing was clear: This means very different things to different people. So, just so we are talking about the same thing here: Predictive analytics is an area of analytics that produces not just insights from the data at your disposal but also models this data (which, by definition, describes the past) to make, errr, predictions about the future.
So when we talk about predictive analytics, we are really talking about machine learning applied to analytics.
What do you need to get started?
Good data: clean, accessible, consistent. You know, good data.
A good problem (in the mathematical sense; your topic of interest may not be a “problem” but could also be about recognising an opportunity). A good problem is one that’s well defined: defined in its goals, but also well understood in terms of how it presents itself from a data perspective.
The ability to train models, and to do so with enough speed of execution that the problem you are trying to identify is identified at the earliest possible moment. It’s not much of a useful prediction if the fire alarm can only anticipate the fire about one minute before it breaks out, even if, technically, the alarm was a prediction of a fire about to break out.
Good data
I reread myself here and I think: INMA’s editor is going to want to delete a few instances of the word “good,” which I repeat a good few times too many here. But besides lazy writing, the reason I keep repeating “good” is that as you set your sights on this next stage of your analytics journey, from descriptive analytics to predictive, the quality of what you can do will markedly suffer from any one of these three things being just OK.
Have messy, inconsistent data? You won’t be able to train on it.
Pick an ill-defined problem to work on? You won’t be able to reliably model it.
Have only so-so technical resources to process your data and train models? You won’t make it to the finish line.
I don’t mean to sound discouraging, but this is all to say that as you advance in your analytics journey, the requirements become more rigid. And when you consider that all of this takes time and investment, there is certainly good reason to wait until there is maturity in the data team, the data itself, and the infrastructure before getting going on this predictive journey.
Good problems
Good problems to consider for your predictive analytics programme are those that have these characteristics:
A late-stage funnel problem, where the attached data is unambiguous. For example, a good candidate problem is one where you will be observing conversions (which are, in analytics terms, “goals”) rather than one where you observe engagement. The reason these make for better problems to sink your teeth into isn’t actually that conversions are tied to revenue and engagement is soft n’ squishy. It’s because, generally, engagement is triggered in very high volume by many different levers in your product experience, which makes it much harder to tell what signal your model is actually learning.
A problem where the data is mostly from logged-in users. This is somewhat connected to “good, clean data.” It’s also because, when you get to running models, you want to be able to run A/B tests where you have good control of who gets into the test, who is in your control pool, and, ideally, follow these tests over a significant period of time. It’s difficult to do this with logged-out users.
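If it helps to make that concrete, here is a minimal sketch of one way to get that control with logged-in users: deterministic test/control assignment keyed on a stable user ID. The function name, salt, and split are illustrative assumptions on my part, not a description of any particular publisher’s setup.

```python
import hashlib

def assign_group(user_id: str, test_fraction: float = 0.5, salt: str = "churn-test-1") -> str:
    """Deterministically assign a logged-in user to 'test' or 'control'.

    Hashing a stable user ID (plus a per-experiment salt) means the same user
    always lands in the same group, which lets you follow the experiment over
    a long period of time without the assignment drifting.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    return "test" if bucket < test_fraction else "control"

# Example: the same subscriber ID always gets the same assignment.
for uid in ["u-1001", "u-1002", "u-1003"]:
    print(uid, assign_group(uid))
```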
Does this mean you could never endeavour to get into predictive analytics for top-of-the-funnel problems? Sure you could. You could, for example, look at how coming from Google predicts certain kinds of sessions or engagement patterns, and try to predict whether a user looks like they could become a loyal user or not, and what seems to increase the likelihood of such an outcome.
But everything else being equal, I wouldn’t choose this kind of problem for an early foray into this space. Instead, I’d look to a great first candidate: Predicting the likelihood to churn.
This topic is a late-stage funnel problem and, by definition, involves only logged-in users. If you want to get inspired, here are some examples from INMA’s archives:
- Schibsted, as early as 2015.
- The Los Angeles Times, Corriere della Sera, and The Times of India earlier this year.
- The Economic Times (India).
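To make the churn example a little more concrete, here is a minimal sketch of what a first model could look like. The feature names, the made-up numbers, and the use of a scikit-learn logistic regression are illustrative assumptions on my part, not how the publishers above actually built theirs.

```python
# A minimal churn-model sketch: assumes a table of logged-in subscribers with a
# few engagement features and a churn label observed over a past window (all invented).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "visits_last_30d": [2, 18, 0, 25, 5, 1, 30, 3],
    "articles_read":   [4, 60, 1, 90, 10, 2, 120, 6],
    "tenure_days":     [40, 800, 15, 1200, 90, 20, 1500, 60],
    "churned":         [1, 0, 1, 0, 0, 1, 0, 1],  # 1 = cancelled in the following month
})

features = ["visits_last_30d", "articles_read", "tenure_days"]
model = LogisticRegression(max_iter=1000)
model.fit(df[features], df["churned"])  # in practice you would hold out data to evaluate

# Score every subscriber with a probability of churning and surface the riskiest.
df["churn_probability"] = model.predict_proba(df[features])[:, 1]
print(df.sort_values("churn_probability", ascending=False).head())
```

The output is simply a ranked list of subscribers by churn risk, which is exactly the kind of thing you can then A/B test interventions against.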
Good modeling
I don’t like futurists because there’s little accountability built into the job. “In 50 years, we will be able to watch Netflix from the inside of our eyelids” — that futurist on stage didn’t stake her speaking fee against this one, so, you know, cool stuff.
But for the models we train, there’s a bit more accountability: observable real life.
So if we take the example of a predictive model for churn, part of evaluating the model is checking how a group originally predicted as likely to churn did in fact behave in real life. There is some probability work that goes into this (confidence intervals). But in general, with a problem whose data set is clean and known, and users who are logged in, we should be in a good place to assess the quality of the model.
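As a sketch of what that check could look like: compare the observed churn rate in the group the model flagged against the group it didn’t, with a confidence interval around each rate. I use a Wilson interval below; the counts are invented for illustration.

```python
import math

def churn_rate_with_ci(churned: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Observed churn rate and an approximate 95% Wilson confidence interval."""
    p_hat = churned / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2))
    return p_hat, centre - half_width, centre + half_width

# Hypothetical follow-up: of 500 users the model flagged as likely to churn,
# 140 actually cancelled during the observation window; of 2,000 users it did
# not flag, 60 cancelled.
for label, churned, total in [("Flagged", 140, 500), ("Not flagged", 60, 2000)]:
    rate, low, high = churn_rate_with_ci(churned, total)
    print(f"{label}: observed churn {rate:.1%}, ~95% CI [{low:.1%}, {high:.1%}]")
```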
In general, this points to a very important part of choosing a good candidate problem for your predictive analytics work: the ability to evaluate the model against observable data gathered over time.
INMA members can subscribe to my bi-weekly newsletter here.