
Relative rankings can equal missed opportunities when assessing article performance

By Ariane Bernard

INMA

New York City, Paris


Hi everyone.

Three things on deck:

  • I just turned in my INMA report on personalisation, months in the making. Look for it shortly on INMA.org. I’m excited for you to read it.

  • We’re just three weeks away from the start of our Product and Data for Media Summit that INMA Product Initiative Lead Jodie Hopperton and I are curating. Check it out, and I hope you’ll join us.

  • A new Taylor Swift album comes out on Friday, so I’m getting mentally ready for this. OK, this has nothing to do with our topic here, but Rules of Three said I needed a third — I’m excited for me to listen to it.

All my best, Ariane

The problem with relative rankings

In my previous newsletter, I wrote about the concern of whether we were chasing the “right” thing when we chase simple engagement KPIs (pageviews, scroll depth, etc.). Whether you’re a product manager looking to understand habituation (though, of course, there’s no single metric for habituation) or a newsroom editor trying to gauge interest in your organisation’s coverage, looking at your data raises an ambiguous question: is that even the right data?

But then, there is the question of how we analyse the data itself, which is what this week’s newsletter is about.

Ranking the most-read articles may not be the best way to judge performance.

To explain: We spend a lot of time looking at rankings when it comes to analytics. Top 10 lists, Most Read/Best Performing, etc. 

Two reasons: 

  • Lists pack a lot of information into a simple format. Understanding things relatively is a lot easier than placing the importance of things against an absolute. Anyone understands “A is bigger than B,” and you need to understand neither A nor B to understand this sentence. However, “A got a score of 75” requires that we understand what 75 means. 
  • We all have an interest in leveraging the Pareto principle, which you may know as the 80/20 rule. We know that focusing a fraction of our effort on just a few things can have maximum impact because not everything has the potential of equal impact in the first place. So we look at a Top 10 list, and we think “that’s really where most of our money is, anyway.” And that’s true.

Yet there are false equivalencies and missed opportunities in assessing things through that relative lens: we often overlook how the data was handicapped in the first place.

Handicaps can be:

  • The amount of promotion your content received. Did it make the homepage? Did it get heavily promoted in social?

  • The clarity of the headline. For a news-y article, the headline may contain a significant amount of the information. Readers may just be satisfied with scanning this headline rather than giving you a click to further explore.

  • How trendy the article is: is it flash-in-the-pan traffic, or a long-term contributor that will be part of your long tail?

“I no longer want to show Top 10s,” said Janis Kitzhofer, head of editorial analysis at Axel Springer in Germany. “I show people individual articles and I ask them, ‘Do you think this is a good performance or not?’ and we look at individual metrics.” 

What Janis is doing in shifting the conversation to the individual performance of an article is that he is really relying on baselines. When we start to recognise “this is good,” it’s because we’ve internalised what good even is. And we are also going to factor in what we know about the handicaps of this article: “Yes, this is good performance, but considering this article sat high on the homepage for 24 hours, maybe it’s only just OK.”

A love letter to baselines

The opposite of relative rankings is evaluating performance against a baseline.

Of course, there is a bit more work in baselines because baselines have to be thoughtfully designed. 

There are several factors in building an appropriate baseline.

And thoughtfully designed really covers two different exercises: 

1. The factors we consider in building an appropriate baseline:

  • The type of article (news, non-news) or even the topic/section of publication (technology vs. dining).

  • Factoring in what kind of external promotion the article received, and measuring performance relative to the common distribution of promoted articles.

  • Factoring in what kind of on-platform promotion the article received: roughly speaking, how much homepage time and in what position relative to your usual scroll depth? Homepage promotion “downpage” isn’t at all the same as being ranked in a prized spotlight position that’s reliably in view of all homepage pageviews.

2. Paying attention to how your articles are distributed against this baseline. 

You would usually think of this in terms of percentiles. This is particularly important because, depending on the type of content you measure, you may find that most articles perform within a small spread of performance levels — whereas for other baselines, content may be more evenly spread across the scale. 
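To make the idea concrete, here is a minimal sketch of scoring one article against its baseline group as a percentile. The pageview numbers and the grouping are illustrative assumptions, not real figures from any publisher.

```python
# Hypothetical pageview counts for articles in one baseline group
# (e.g. lifestyle articles that got homepage promotion); numbers are made up.
baseline_pageviews = [1200, 1500, 1800, 2100, 2300, 2600, 3000, 3400, 4100, 9800]

def percentile_rank(value, group):
    """Share of the baseline group that this article's pageviews meet or beat."""
    return 100 * sum(v <= value for v in group) / len(group)

article_pv = 3200
print(f"{percentile_rank(article_pv, baseline_pageviews):.0f}th percentile")  # 70th percentile
```

The point of the baseline group is that 3,200 pageviews means something different for lifestyle content with homepage promotion than it does for an unpromoted news brief; the percentile only becomes meaningful once the comparison set is homogeneous.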

Say you’re a general news publisher with a strong Facebook presence, and you promote five articles via your page per day. Your mix of Facebook content generally leans toward lifestyle content (which does better on Facebook), but there’s occasionally a more news-related item in there.

The lifestyle content tends to perform across a smaller performance range: Basically, everything tends to perform well, and outliers (blockbusters and utter crashes) are rare. Your social teams know what they are doing.

With news, interest can vary a lot. Think about the early days of COVID or the death of Queen Elizabeth II: high peaks of interest. So reporting on the performance of an article on Facebook should have its own baselines, with a factor looking at the profile of the article to further subset that baseline. 

Then, the referring platform itself — Facebook, here — can have its own baselines or become the baseline for another calculation. That’s what Axel Springer does with their alerts that inform the newsroom about certain traffic patterns. 

“We push alerts when we see articles coming strong from Google Discover but with low social,” Janis Kitzhofer told me. 

The calculation here is that if an article is able to get a certain amount of traffic from Google, it should be able to get a proportionate amount of traffic from social — and that proportion may be different depending on the type of content or the organisation. In this case, the baseline is actually one trend line of performance (Google). 

You can think of it as “articles that get at least N PVs from Google” — and calculate other baselines like “number of PVs from social.” So an article may find itself in the group of Google Discover high performers, and its performance across other dimensions is then evaluated against baselines for that group.
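The alert logic described above can be sketched roughly as follows. The thresholds, the expected Google-to-social ratio, and the field names are all illustrative assumptions on my part, not Axel Springer’s actual implementation.

```python
# Hypothetical alert: flag articles that perform strongly on Google Discover
# but lag on social, relative to an expected proportion. All thresholds and
# field names are illustrative assumptions.
DISCOVER_THRESHOLD = 10_000   # PVs qualifying as a "Discover high performer"
EXPECTED_SOCIAL_RATIO = 0.3   # social PVs expected per Discover PV for this content type

def social_lag_alerts(articles):
    """Return IDs of Discover high performers whose social traffic lags expectation."""
    alerts = []
    for a in articles:
        if a["discover_pv"] >= DISCOVER_THRESHOLD:
            expected_social = a["discover_pv"] * EXPECTED_SOCIAL_RATIO
            if a["social_pv"] < expected_social:
                alerts.append(a["id"])
    return alerts

articles = [
    {"id": "a1", "discover_pv": 25_000, "social_pv": 2_000},   # strong Discover, weak social
    {"id": "a2", "discover_pv": 25_000, "social_pv": 12_000},  # proportionate social
    {"id": "a3", "discover_pv": 3_000,  "social_pv": 100},     # below Discover threshold
]
print(social_lag_alerts(articles))  # ['a1']
```

In practice, the expected ratio would itself be a baseline, derived per content type or per organisation from historical data rather than hard-coded.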

Think of baselines as the reason you have sports leagues. Is it that interesting to only force-rank items? Usain Bolt is always going to be running at the front of any race, but that doesn’t mean there is not a lot of potential in a kid who is running laps around their local park.

Baselines are about comparing things on a realistic basis of homogeneous items. And, if used properly, they should help us identify the undiscovered supermodels in our midst.

Further afield on the wide, wide Web

“Why your company needs data-product managers” in the Harvard Business Review is worth a read.

An argument for the kind of person who makes a good data-product manager: someone who has the traversing qualities of a good PM (cares about usage, cares about business) but understands enough about data that they can conceive of a product that’s useful and valuable. 

Not the right candidate: the data scientist who cares more about the perfection of the product itself rather than how the product performs in the world. 

To be honest, this is a distinction that, in general, applies to software engineers versus product managers — I should know.

About this newsletter

Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.

This newsletter is part of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel. 

