Der Spiegel uses data to predict long-term subscribers
Smart Data Initiative Blog | 26 April 2023
When Der Spiegel wanted to grow its subscription base, it decided to focus on how to predict what would drive those new subscribers.
During Wednesday’s Webinar, Alex Held, a data scientist at the German news media company, walked INMA members through the steps and processes his team used to make such predictions.
Held explained the journey began a couple of years ago when the company realised it was getting a large number of daily active users on its Web site, but they weren’t subscribing:
“They were coming to us every day, reading a lot of content, spending multiple hours a week or even sometimes on a day, but not [subscribing],” he said. “If you think about other platform-oriented businesses, like Netflix or Spotify, it would be crazy to have such a huge amount of users on the Web site but not having subscribed them.”
Der Spiegel saw “huge potential” and started brainstorming how to reach the more than 1 million unsubscribed visitors. It also decided to use machine learning to help solve the problem.
First, it was important to identify possible reasons people weren’t subscribing. These included not wanting to pay a higher price after an introductory offer, not receiving the right offer, or not understanding what the product offers them.
The team set a goal to identify the most interesting potential subscribers amongst the huge base of users, so it used data to train the machine learning model to find them. But first, they had to determine what kind of data to use to predict subscriptions.
The team created four categories to look at, Held said:
- Engagement: This went beyond clicks, he said, using time-based features such as average time spent, number of articles read, paywall loads, and whether or not they visited the subscription page.
- Location: “In Germany we have federal states, so our country’s split up in federal states and we use that information to feed it into our machine learning approach.”
- Referral: This helped determine where users were coming from (Google, Bing, push notification, etc.).
- Editorial: This was important to identify what kind of departments someone is interested in: “Is it more like politics or is it more sports someone is consuming? Is it more text-based content? Is it audio? Is it video content?” Held explained.
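As a rough illustration of how these four categories could be combined into one model input, here is a minimal Python sketch. The field names, example values, and the one-hot encoding are assumptions made for illustration; the webinar did not describe Der Spiegel's actual schema.

```python
from dataclasses import dataclass


@dataclass
class UserFeatures:
    """One user's features, grouped by the four categories (hypothetical names)."""
    # Engagement
    avg_time_spent_sec: float
    articles_read: int
    paywall_loads: int
    visited_subscription_page: bool
    # Location
    federal_state: str
    # Referral
    referrer: str
    # Editorial
    top_department: str
    top_content_type: str


def to_model_row(f: UserFeatures) -> dict:
    """Flatten the record into numeric columns; categories become one-hot flags."""
    row = {
        "avg_time_spent_sec": float(f.avg_time_spent_sec),
        "articles_read": float(f.articles_read),
        "paywall_loads": float(f.paywall_loads),
        "visited_subscription_page": 1.0 if f.visited_subscription_page else 0.0,
    }
    categorical = (
        ("state", f.federal_state),
        ("referrer", f.referrer),
        ("department", f.top_department),
        ("content_type", f.top_content_type),
    )
    for prefix, value in categorical:
        row[f"{prefix}={value}"] = 1.0
    return row


# Made-up example user, not real data
example = UserFeatures(
    avg_time_spent_sec=310.0,
    articles_read=42,
    paywall_loads=5,
    visited_subscription_page=True,
    federal_state="Hamburg",
    referrer="push",
    top_department="politics",
    top_content_type="text",
)
row = to_model_row(example)
```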
It’s not always easy to take all the data and train a machine learning model because some of the data won’t be of good enough quality and may need to be discarded, he pointed out. In fact, he noted, at least 70% of the journeys that led to conversion were not included in the model training because of poor data quality.
“That’s really important to understand,” Held said, noting that it led to Der Spiegel’s most important learning: “Our model’s not performing well and predicting the total sum of daily subscriptions, which was expected and is not the goal for our model. But on the other end it becomes really accurate, accurate in predicting explicitly lasting subscriptions.”
Knowing what to look for
That is an important distinction; the team was only looking to find people who would become long-term customers. Training the model with that goal meant finding and using only data that meets those criteria.
Der Spiegel created a “look ahead” window to measure what a user does in the 40 days following a purchase and a “lookback” window that uses data from the past to predict whether someone will purchase. The data is also used to evaluate how long someone will stay after they purchase.
“With that data at hand, we train a machine learning model,” Held said. The team used a random forest model that assigns every user a score between zero and 100. The higher the value, the more likely the user is to subscribe.
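The two windows can be sketched in a few lines of Python. Only the 40-day look-ahead comes from the webinar; the 30-day look-back length and the rule that a subscription counts as "lasting" if it survives the whole look-ahead window are assumptions for illustration.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 30   # assumption: the look-back length was not stated
LOOKAHEAD_DAYS = 40  # stated in the webinar


def lookback_window(reference_day: date) -> tuple:
    """Return (start, end) of the past days whose behaviour feeds the model."""
    start = reference_day - timedelta(days=LOOKBACK_DAYS)
    end = reference_day - timedelta(days=1)
    return start, end


def is_lasting_subscription(purchase_day: date, cancel_day) -> bool:
    """Label a purchase as lasting if it is not cancelled within the look-ahead window.

    cancel_day is None for subscribers who have not cancelled.
    """
    window_end = purchase_day + timedelta(days=LOOKAHEAD_DAYS)
    return cancel_day is None or cancel_day > window_end


start, end = lookback_window(date(2023, 4, 26))
kept = is_lasting_subscription(date(2023, 1, 1), None)
churned = is_lasting_subscription(date(2023, 1, 1), date(2023, 1, 20))
```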
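To show how an ensemble of trees can yield a score between zero and 100, here is a minimal sketch of the scoring step only. It assumes the score is the share of trees voting "will subscribe", scaled to 0 to 100; the webinar did not say how Der Spiegel derives its score. The hand-written rules stand in for trees that a real random forest would learn from bootstrapped training data, and every threshold is made up.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Tree = Callable[[Features], int]  # returns 1 ("will subscribe") or 0


def forest_score(trees: List[Tree], features: Features) -> int:
    """Share of trees voting 1, scaled to an integer between 0 and 100."""
    if not trees:
        raise ValueError("need at least one tree")
    votes = sum(tree(features) for tree in trees)
    return round(100 * votes / len(trees))


# Hypothetical stand-ins for learned trees (thresholds are invented)
toy_trees: List[Tree] = [
    lambda f: 1 if f["articles_read"] >= 20 else 0,
    lambda f: 1 if f["paywall_loads"] >= 3 else 0,
    lambda f: 1 if f["visited_subscription_page"] == 1.0 else 0,
    lambda f: 1 if f["avg_time_spent_sec"] >= 300 else 0,
]

heavy_reader = {
    "articles_read": 35.0,
    "paywall_loads": 4.0,
    "visited_subscription_page": 1.0,
    "avg_time_spent_sec": 120.0,
}
casual_reader = {
    "articles_read": 2.0,
    "paywall_loads": 0.0,
    "visited_subscription_page": 0.0,
    "avg_time_spent_sec": 45.0,
}

heavy_score = forest_score(toy_trees, heavy_reader)    # 3 of 4 trees vote 1
casual_score = forest_score(toy_trees, casual_reader)  # no tree votes 1
```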
Continuing education
Der Spiegel retrains its machine learning model monthly, Held said.
“We have a complete running production server … and we use that model to serve predictions every day. So every user gets classified every day, and with each [retraining], with each [retrained] model, the model is going to be versioned. It’s going to be saved alongside training data sets and feature selection,” he explained, adding that it’s important to be able to go back to an older version when a newer one doesn’t work well.
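One simple way to version a model alongside its training data and feature selection, and to roll back when needed, is a directory per version. The sketch below is an assumption about how this could look, not a description of Der Spiegel's production setup; the function names, file names, and the dictionary used as a stand-in model are all hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_version(registry: Path, version: str, model, feature_names, training_rows) -> Path:
    """Store the model, its feature list, and its training data under one version."""
    target = registry / version
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a version
    (target / "model.pkl").write_bytes(pickle.dumps(model))
    (target / "features.json").write_text(json.dumps(feature_names))
    (target / "training_data.json").write_text(json.dumps(training_rows))
    return target


def load_version(registry: Path, version: str):
    """Load a stored model together with the feature list it was trained on."""
    target = registry / version
    model = pickle.loads((target / "model.pkl").read_bytes())
    feature_names = json.loads((target / "features.json").read_text())
    return model, feature_names


def list_versions(registry: Path) -> list:
    """Return version names in sorted order (year-month names sort by date)."""
    return sorted(p.name for p in registry.iterdir() if p.is_dir())


# Two monthly retrains with made-up stand-in models and data
registry = Path(tempfile.mkdtemp())
save_version(registry, "2023-03", {"threshold": 20}, ["articles_read"],
             [{"articles_read": 12, "subscribed": 0}])
save_version(registry, "2023-04", {"threshold": 25}, ["articles_read", "paywall_loads"],
             [{"articles_read": 30, "paywall_loads": 4, "subscribed": 1}])

# Roll back to the version before the newest one
versions = list_versions(registry)
previous_model, previous_features = load_version(registry, versions[-2])
```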
Der Spiegel uses Adobe Analytics daily and sends segments for users with high scores to Adobe Target, a tool it uses to run A/B tests or personalise the Web site experience for the user. It also has created a survey for targeted users, relying on data from the survey combined with its model scores.
“We actually use our model scores and we send out the survey, and along with the survey answers we have the model scores,” Held said. “This is quite interesting because now you can ask people some different questions about, for example, how likely it is [you will] subscribe with us in the future? Or you can ask someone if he is familiar with the product or if he’s all right with the price point or you think the product is overpriced or underpriced. And you can merge that with our model score.”
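The merge Held describes amounts to joining survey answers to model scores on a user identifier and then comparing scores across answers. A minimal sketch follows; the user IDs, the question name, the answers, and the scores are all invented for illustration.

```python
from statistics import mean

# Made-up model scores keyed by a hypothetical user ID
model_scores = {"u1": 82, "u2": 15, "u3": 64, "u4": 91}

# Made-up survey answers
survey_answers = [
    {"user_id": "u1", "price_opinion": "fair"},
    {"user_id": "u2", "price_opinion": "overpriced"},
    {"user_id": "u3", "price_opinion": "overpriced"},
    {"user_id": "u4", "price_opinion": "fair"},
    {"user_id": "u9", "price_opinion": "fair"},  # no model score, so dropped
]


def merge_survey_with_scores(answers, scores):
    """Attach each respondent's model score; skip respondents without a score."""
    return [
        {**answer, "score": scores[answer["user_id"]]}
        for answer in answers
        if answer["user_id"] in scores
    ]


def mean_score_by_answer(merged, question):
    """Average model score for each answer given to one survey question."""
    groups = {}
    for row in merged:
        groups.setdefault(row[question], []).append(row["score"])
    return {answer: mean(values) for answer, values in groups.items()}


merged = merge_survey_with_scores(survey_answers, model_scores)
summary = mean_score_by_answer(merged, "price_opinion")
```

A summary like this would show, for example, whether users who call the product overpriced tend to have lower scores than those who find the price fair.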
Then, the data team can analyse the survey results and create offers accordingly. These may even include a higher annual discount, which could be worthwhile to attract and retain long-term subscribers: “They’re going to stay along with you because the model already identifies them to be people that wanted to stay along with you.”
A/B testing is also giving insights into the product information that is being offered. Some people, he found, might just need different information on Der Spiegel’s product. “Maybe they’re really early in their journey to a subscription, and maybe they need just basic information on the opportunity to install our app or get our latest newsletter or whatever,” he said. “People [who are] really close to subscription maybe just need different information on our product banners.”
To explore that, Der Spiegel is testing the personalisation of its ads and banners using the machine learning model.
What’s next
Moving forward, he said the company is looking at analysing cookie lifetime and long-term visit count:
“We want to incorporate that into our machine learning model because we see that people, if they started using our product and use it more extensively for a couple of weeks, are really getting a higher propensity to subscribe.”
Incorporating that information into the machine learning model will help make more precise predictions, he said. Der Spiegel also uses machine learning to look at churn prediction and RFV engagement scoring: “There’s a lot of buzz around [machine learning] at the moment.”