When readers are on our platforms, what do they read? What keeps them coming back, and what inspires them to subscribe? We wanted to answer these questions to get to know our readers better, to find out what is important to them and what they need. We also wanted to find out what we can do to achieve greater reach, win more subscriptions, or strengthen subscriber loyalty to our products.
We developed our own analysis, built on full-text topic modelling of our articles, to identify which topics work well or less well for reach, paid content, and retention, and where we should increase or review our output. The methodology is based on various text clustering techniques and finds finer topic groups than internal departments, sections, or tags allow.
We then evaluate the performance of these topic groups in reach, paid content, and retention relative to their cost, measured as the number of articles published on the topic. In this way, we can identify topics for which we generate a lot of content that is not well received by our readers, as well as topics on which we publish little but that are of great interest to our users.
Analysing stories for future coverage
Which topics work well for reach, which ones are users willing to pay for, and which ones do subscribers read regularly? One way to analyse this is to look at the page views and subscriptions for each category.
Each article is assigned to a category by the journalist who wrote it, for example sports or politics. The problem with these categories is that they are very broad. Is a reader who reads an article about Angela Merkel interested in politics in general? Or is she interested in the topic of rent, on which Angela Merkel is quoted in the article? Without more detailed tagging of the texts by sub-topic, or even by the emotions reflected in the article, the challenge is to identify the specific topics of the articles. We used different topic modelling and text clustering algorithms to achieve this goal.
The figure shows a simple example of a network of marker words that often appear together in articles and can therefore be combined to form topic clusters. For example, many articles that contained the word “goal” also contained the word “soccer” or “team.” There are occasional cross-references between the clusters because there are marker words that occur in several clusters, such as the word “fan,” which belongs to both the soccer and music clusters.
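The co-occurrence idea behind such a network can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production pipeline: the function name `cooccurrence_counts`, the whitespace tokenisation, the hand-picked marker words, and the four toy articles are all invented for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(articles, marker_words):
    """Count in how many articles each pair of marker words appears together."""
    counts = Counter()
    for text in articles:
        tokens = set(text.lower().split())       # naive tokenisation (assumption)
        present = sorted(tokens & marker_words)  # marker words found in this article
        counts.update(combinations(present, 2))  # one edge per co-occurring pair
    return counts

# Toy articles, invented for illustration only
articles = [
    "the team scored a late goal in the soccer match",
    "every fan cheered the goal of the soccer team",
    "the band played a concert and each fan sang along",
    "a new album and a concert tour for the band",
]
marker_words = {"goal", "soccer", "team", "fan", "band", "concert", "album"}

edges = cooccurrence_counts(articles, marker_words)
print(edges[("goal", "soccer")])  # 2: both soccer articles contain this pair
print(edges[("band", "fan")])     # 1: "fan" also links into the music cluster
```

Each counted pair corresponds to one edge in the network, and the count serves as the edge weight. The word "fan" produces edges into both groups, which is the kind of cross-reference described above.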
Text clustering algorithms can be used to identify these marker word clusters. The program evaluates all articles and divides them into groups so that all articles in a group have as much in common as possible (frequently occurring marker words), while the groups differ from each other as much as possible (have as few links to other clusters as possible).
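As a minimal sketch of this grouping step, the following Python code links two articles whenever their marker-word sets overlap strongly and then treats each connected group as one cluster. It is a deliberately simple stand-in, not the algorithms we actually used: the Jaccard similarity measure, the threshold of 0.3, the function names, and the toy marker-word sets are all assumptions made for the example.

```python
def jaccard(a, b):
    """Share of marker words two articles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_articles(marker_sets, threshold=0.3):
    """Link articles whose marker-word overlap reaches `threshold`,
    then return the connected groups as lists of article indices."""
    n = len(marker_sets)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(marker_sets[i], marker_sets[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Marker words per article, invented for illustration only
marker_sets = [
    {"goal", "soccer", "team"},
    {"fan", "goal", "soccer", "team"},
    {"band", "concert", "fan"},
    {"album", "band", "concert"},
]
clusters = cluster_articles(marker_sets)
print(clusters)  # [[0, 1], [2, 3]]
```

With a threshold of 0.3, the single shared word "fan" is too weak a link to merge the soccer and music groups. Lowering the threshold to 0.1 merges all four articles into one cluster, which shows how strongly the result depends on this choice.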
Refining the search
We applied this topic modelling in several steps by searching again for clusters within the found topic groups. Instead of sorting our articles only according to whether they deal with the topic of soccer, we found, for example, groups of articles that explicitly dealt with emotional events on the soccer field instead of just reporting on matches.
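One way to picture such a second pass is sketched below in Python. Inside a topic group that has already been found, the words shared by every article (here "soccer") no longer discriminate, so the sketch drops them and regroups the articles by their remaining marker words. The greedy grouping rule, the threshold of 0.3, the function names, and the sample marker words are assumptions for illustration and not the procedure we ran.

```python
def split_by_overlap(marker_sets, ids, threshold):
    """Greedy grouping: an article joins the first group whose first article
    shares enough marker words with it (Jaccard), else it opens a new group."""
    groups = []
    for i in ids:
        for group in groups:
            seed = marker_sets[group[0]]
            union = seed | marker_sets[i]
            if union and len(seed & marker_sets[i]) / len(union) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no existing group matched
    return groups

def refine(marker_sets, ids, threshold=0.3):
    """Second pass inside one topic group: ignore marker words that every
    article in the group shares, then regroup on the remaining words."""
    if not ids:
        return []
    common = set.intersection(*(marker_sets[i] for i in ids))
    reduced = {i: marker_sets[i] - common for i in ids}
    return split_by_overlap(reduced, ids, threshold)

# Articles already assigned to a "soccer" group (invented marker words)
soccer_sets = [
    {"soccer", "goal", "match", "result"},
    {"soccer", "match", "result", "league"},
    {"soccer", "tears", "farewell", "fans"},
    {"soccer", "farewell", "fans", "tribute"},
]
print(refine(soccer_sets, [0, 1, 2, 3]))  # [[0, 1], [2, 3]]
```

In this toy data, the first sub-group corresponds to match reporting and the second to emotional events on the field.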
Being able to assign the articles to the most detailed topic groups possible was the basis for analysing our topic performance. We compared the output (the number of published articles per topic cluster) with the number of page views, subscriptions, or subscriber page views. To interpret the results, we used a grid that divides the topic groups into successful and less successful ones.
We were inspired by a Content Portfolio Framework from Amedia. This allows us to identify topics on which we publish few articles, but that are read a lot, thus showing untapped potential. An increase in output could also mean an increase in page impressions. Likewise, topics can be identified where the interest of the readers remains very low despite high outputs. A closer look is worthwhile here: Should the output be reduced or the type of reporting changed?
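The logic of such a grid can be sketched as a simple two-by-two classification in Python. This is a hedged illustration rather than the Amedia framework itself or our own implementation: splitting at the median, the four quadrant labels, the function name `portfolio_grid`, and all numbers are assumptions made up for the example.

```python
from statistics import median

def portfolio_grid(topics):
    """topics maps a topic name to (articles_published, page_views).
    Returns a quadrant label per topic, split at the median of each axis."""
    output_cut = median(articles for articles, _ in topics.values())
    demand_cut = median(views for _, views in topics.values())
    labels = {}
    for name, (articles, views) in topics.items():
        high_output = articles > output_cut
        high_demand = views > demand_cut
        if high_demand and not high_output:
            labels[name] = "untapped potential"         # little content, read a lot
        elif high_demand and high_output:
            labels[name] = "working well"
        elif high_output:
            labels[name] = "review output or approach"  # much content, little interest
        else:
            labels[name] = "low priority"
    return labels

# Invented figures for illustration only
topics = {
    "match reports": (120, 30_000),
    "emotional soccer moments": (15, 90_000),
    "local politics": (80, 100_000),
    "council minutes": (10, 5_000),
}
grid = portfolio_grid(topics)
print(grid["emotional soccer moments"])  # untapped potential
print(grid["match reports"])             # review output or approach
```

The same function can be run with subscriptions or subscriber page views in place of page views to produce one grid per target area.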
Applying one standardised analysis to the digital content portfolio of each newspaper title made it possible to draw up concrete instructions and tips for adapting the portfolio, tailored to each news platform and separated by target area (more reach, more subscriptions, better retention). The visualisations and on-site presentations helped communicate these instructions and tips to the newsroom. In combination with our daily newsletter and paid content trainings, we believe they have had a big impact on how newsrooms now evaluate topics before deciding how to cover them.