Relevo uses ad hoc metrics to measure product impact on users
Product and Tech Blog | 21 February 2024
In February 2022, I met with 10 colleagues for five consecutive days in a meeting room in Madrid. These colleagues (journalists, designers, developers, and salespeople) and I sought to define the differentiating feature that would make Relevo, a Spanish sports news website we would launch six months later, known to the world.
The decision was that the product’s home page would have a user interface inspired by Instagram Reels and TikTok video posts.
It was a bold decision but not a capricious one. We had conducted extensive user research and observed that swipe-by-swipe mobile interfaces were quickly adopted by users, especially Gen Z and senior Millennials, who were our strategic beachheads.
We had no idea then about the impact our choice would have on various fronts, like video production and editor training. Arguably, the biggest challenge was deciding how to measure this new interface.
The right stuff
The digital news sector has been dominated for years by metrics like “uniques” and “visits.” The issue with them is that they only reflect volume and not necessarily quality.
Even with the recent progress of engagement metrics, the absolutist rule of “commodity indicators” has been a deterrent to innovation because they can’t capture the specific impact of what makes each product unique and valuable.
The issue was apparent in the discussion the team has had since launch: What is the right length of a headline for a home page such as this one? The newsroom gravitated toward longer titles. The product team felt this put too much cognitive load on users, in contradiction with the fast-swiping appeal of the interface.
A counterargument emerged: Fast swiping could discourage clicking on headlines, and this could damage the most sacred metric of all, which was pageviews.
How could we break this argument loop? A famous adage attributed to British economist Charles Goodhart says that when a measurement becomes a target, it ceases to be a good measurement.
We stopped arguing about the measurement and turned our attention to the target: What was the ideal use case we envisioned for our new product? Only then could we meaningfully discuss metrics.
Slicks: Mixing swipes and clicks
The use case we agreed on was that of a user who navigates to our home page on her mobile phone every day, swipes until she finds a headline that grabs her attention, clicks to read the article, goes back, swipes again, clicks again, and so on. In doing this exercise, we quickly understood that “swipes vs. clicks” was a false dilemma. They go hand in hand. One leads to the other.
There is no out-of-the-box unit to measure both swiping and clicking, so we crafted our own hybrid metric by combining swipes and clicks. We called it slick.
We modeled the formula with our editors in mind, so it had more to do with cognitive psychology than hard math. To keep things simple to recall, we multiply each of the two figures by a factor, which avoids floating-point numbers. The factor for clicks is bigger because they are less frequent than swipes, and we wanted the two components to be roughly similar in magnitude.
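As a rough illustration of how a hybrid metric of this kind could be computed, here is a minimal Python sketch. The factors SWIPE_FACTOR and CLICK_FACTOR, the choice to add the two weighted components, and the per-visit inputs are all assumptions made for this example. They are not Relevo's actual values or formula.

```python
# Hypothetical factors: placeholders for illustration, not Relevo's real values.
SWIPE_FACTOR = 10
CLICK_FACTOR = 100  # larger, because clicks are rarer than swipes


def slick_index(swipes_per_visit: float, clicks_per_visit: float) -> int:
    """Combine swipes and clicks into one whole-number score.

    Assumption: the two weighted components are summed and then rounded,
    so editors only ever see an integer.
    """
    swipe_part = swipes_per_visit * SWIPE_FACTOR
    click_part = clicks_per_visit * CLICK_FACTOR
    return round(swipe_part + click_part)


# Made-up inputs: 6 swipes and 0.5 clicks per visit.
# The two components (60 and 50) end up similar in magnitude.
print(slick_index(6, 0.5))  # 6*10 + 0.5*100 = 110
```

With factors like these, neither behavior dominates the score, which matches the idea that swipes and clicks go hand in hand.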
Putting our new metric to work
Data was key to settling the debate, so in July 2023 we started measuring the average character length of headlines. With some help from ChatGPT, we developed a Python script that scrapes the home page every 15 minutes, measures headlines, and saves the data to a file.
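The measuring step of a script like this could look roughly like the sketch below. It is not our production code: the HEADLINE_TAG value, the function names, and the CSV layout are assumptions for illustration, and the real home page markup may differ. It uses only the Python standard library.

```python
import csv
from datetime import datetime, timezone
from html.parser import HTMLParser

HEADLINE_TAG = "h2"  # assumption: headlines live in <h2> elements


class HeadlineParser(HTMLParser):
    """Collect the text found inside every HEADLINE_TAG element."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == HEADLINE_TAG:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == HEADLINE_TAG and self._inside:
            # Collapse runs of whitespace so layout does not inflate the count.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)
            self._inside = False


def average_headline_length(html: str) -> float:
    """Return the mean character length of the headlines in the HTML."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    if not parser.headlines:
        return 0.0
    return sum(len(h) for h in parser.headlines) / len(parser.headlines)


def append_measurement(path: str, html: str) -> None:
    """Append a UTC timestamp and the rounded average to a CSV file."""
    average = round(average_headline_length(html), 1)
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([datetime.now(timezone.utc).isoformat(), average])
```

Fetching the page (for example with urllib.request.urlopen) and running the job every 15 minutes (for example with cron) are left out of the sketch on purpose.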
The typical value for the first five months of measuring was 90 characters per headline. The trend was increasing by about one or two characters per month, with a minimum of 87 in July and a peak of 93 in November. (Words tend to be longer in Spanish than English.)
Meanwhile, the slick index for the period was 119. The best mark (132) was recorded in September, when Relevo jumped to international relevance because of scoops in the coverage of the Rubiales case.
We retroactively calculated the slick index back to the launch date and discovered it had been even lower: only 93, on average, during the first half of 2023. And although we didn’t have headline length data prior to July, we all had the impression that headlines had been even longer then.
Here, a sensible hypothesis emerged: the shorter the headline, the more slicks.
Nudging the newsroom with Pringles
Data needs to be visible to be acted upon, so we coded (again, with a little ChatGPT help) a Chrome extension and installed it in our editors’ browsers. The extension (code-named “Pringles” because we wanted our headlines to be small and uniform in shape, like the famous potato chips) injects a purple box into the page with current data about headlines, including character length.
We also created a mobile dashboard in Adobe Analytics to put the data on slicks at hand for everyone.
With both tools up and running, an editorial policy to post shorter headlines was enacted in December. Almost three months have passed since then, and the figures seem to confirm the hypothesis so far: The slick index improved as headlines became shorter each month.
The Pearson correlation coefficient between headline length and slicks is -0.7, on a scale from -1 to 1. Although statisticians always remind us that correlation does not imply causation, with no other evident variables at play, the trend strongly suggests that the shorter the headline, the better the engagement.
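For readers who want to run the same kind of check on their own data, here is a minimal sketch of the Pearson calculation in plain Python. The function name and the monthly figures are made up for illustration. They are not Relevo's data, so the result will not match the coefficient reported above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up monthly averages, for illustration only.
headline_lengths = [93, 90, 86, 82]   # characters per headline
slick_values = [118, 121, 127, 131]   # slick index

r = pearson(headline_lengths, slick_values)
print(round(r, 2))  # negative: shorter headlines go with more slicks
```

A value close to -1 means the two series move in opposite directions almost in lockstep, and a value near 0 means no linear relationship. On Python 3.10 or later, statistics.correlation from the standard library computes the same coefficient.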
Standard metrics have also improved significantly, with a 19% increase in visits to the home page per visitor and 26% more home page views per visitor.
By focusing our discussion on the metrics instead of the goals, we were inadvertently making Goodhart’s statement true. Metrics are not an end but just a proxy to verify human interaction with digital products. It’s OK to craft your own measurement to make sure you are hitting your target.