Media companies should understand the benefits and limits of analytics tools
Smart Data Initiative Newsletter Blog | 21 April 2022
First, I should say there is a particularly fun link in the “Further afield on the Internet” section at the bottom of this newsletter. I wouldn’t be offended if you started there, and it’s worth sticking around until the end of the clip.
But then, when you’re back here — hello. This week, we continue looking at analytics tools, focusing on a set of opinionated ones: content analytics tools, the likes of which are likely in your newsroom, well used and loved.
Are you working with data-curious folks in your organisation, working to democratise usage and access? If so, please reach out. I’d love to hear these stories.
And, of course, if there’s anything on your mind or if you have recently done something interesting at your organisation and want to tell me about it, my e-mail is email@example.com, just a tap of a button away.
On with the rest …
Recapping the last newsletter
We took a high-level look at what differentiates an unopinionated analytics tool (think: Google Analytics) from an opinionated one.
In a nutshell, unopinionated tools:
Are useful across the whole organisation.
Don’t come with many biases about which metric is useful where, to answer which question.
Require technical and analytical know-how to use and leverage.
Have a steeper learning curve, as no two installs are alike.
Only answer a particular question if someone has spent time configuring for that goal.
By contrast, opinionated tools:
Are useful for a smaller, well-known group.
Are highly biased towards certain use cases or certain types of questions, such as content tools for news teams.
Have a gentle learning curve because their user interface (UI) and user experience (UX) are conceived for a discrete group of users.
Come more or less ready to use and are consistent across companies.
Trends, baselines, and references in analytics
Today we look more closely at the approaches of many opinionated analytics tools, specifically content analytics tools. These are popular with a group of data-curious but usually not data-specialist professionals in your organisation: the newsroom. What I’d love for you to take away is this: The next time you look at your favourite dashboard, you’ll see shades of grey you didn’t see before.
First, let’s talk about referentials. In analytics tools, they often fall along these lines:
Averages or moving means.
And baseline isn’t itself a clear referential. In data, the word “baseline” doesn’t mean a fixed, static statistic. Baseline designates a value that the designers of the data framework feel “means” something about normal operations and performance. This can be a direct value or a computed one.
So, for one product, baseline may be the mean. For another, it may be the mean absolute deviation. For another still, the median absolute deviation.
Whatever it is for a given product, it’s rare that the referential does not contain some sort of inherent gotcha.
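To make the difference between those referentials concrete, here is a minimal Python sketch of three candidate baselines over a week of hypothetical daily pageview counts. The numbers are invented for illustration and none of this is any vendor’s actual formula; notice how a single outlier day drags the mean and the mean absolute deviation far more than the median-based figure:

```python
import statistics

def baselines(values):
    """Three common 'baseline' candidates for a series of daily
    pageview counts (illustrative only, not any vendor's formula)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Mean absolute deviation: average distance from the mean.
    mean_abs_dev = statistics.mean(abs(v - mean) for v in values)
    # Median absolute deviation: median distance from the median,
    # far less sensitive to a single outlier day.
    median_abs_dev = statistics.median(abs(v - median) for v in values)
    return {"mean": mean, "mean_abs_dev": mean_abs_dev,
            "median_abs_dev": median_abs_dev}

# Six ordinary days and one viral outlier:
pageviews = [1200, 1150, 1300, 5400, 1250, 1180, 1220]
print(baselines(pageviews))
```

Two products can both honestly say they report against “baseline” and still be reporting against quite different numbers.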
1. Know the frame of reference of the top baselines of your analytics tool
If I take some of the more popular content analytics tools you’ll find at media publishers, one of the top baselines is going to be an average of the same day of the week over the past N weeks. So some of the key takeaways of your dashboard will speak to your performance being higher or lower than [a standard Tuesday].
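As a sketch of what that baseline computation looks like (the data shape, the four-week window, and the figures are my assumptions for illustration; each tool has its own window and weighting):

```python
from datetime import date, timedelta
from statistics import mean

def weekday_baseline(daily_views, today, n_weeks=4):
    """Baseline = average of the same weekday over the past n_weeks.
    daily_views maps a date to that day's pageviews (hypothetical
    data shape; real tools compute this internally)."""
    same_weekdays = [today - timedelta(weeks=k) for k in range(1, n_weeks + 1)]
    return mean(daily_views[d] for d in same_weekdays)

# Was this Tuesday above or below "a standard Tuesday"?
views = {date(2022, 4, 19) - timedelta(weeks=k): v
         for k, v in enumerate([1500, 1100, 1250, 1050, 1400])}
print(weekday_baseline(views, date(2022, 4, 19)))  # average of the four prior Tuesdays
```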
Who should really think about the adequacy of this baseline?
Smaller publishers with uneven publishing volumes or … slower publishers, by which I mean non-news publishers — magazines, in particular. What does a day of the week mean to you as a stable referential?
Organisations with less consistent publishing schedules.
Any publisher where the social promotion calendar isn’t particularly standardised because, if anything, your promotion will have more impact on distribution than just the act of publishing.
For any organisation where publishing and/or promotion of your content isn’t at a high enough, consistent volume week-on-week to smooth out variance, or where there is flexibility in how much is published or promoted when … the notion that [day of the week] is a consistent sample is actually not particularly helpful.
In the general world of news, that’s a meaningful unit of time to baseline. We all recognise there is more news in the world at 10:00 a.m. on a Monday than at 7:00 p.m. on a Friday. But is that something your organisation is actually pegged against?
2. Know how your favourite tool determines what an active session is
Most of the popular content analytics tools aim to help you understand “what’s popular right now.” This falls into two categories of information:
How many entries into the content come from where. This is a standard content analytics tally.
For tools with a real-time component, a notion of which pages have the actual attention of users.
The grey area of this latter paradigm isn’t in the baseline this time around — it’s in the measuring of the article being highly trafficked “right now.” And it ties to details you would hardly have looked into unless you have a fairly technical background.
First, a quick look at how your Web browser works. There are not a lot of efficient ways to constantly measure the activity of a user on a page. There are inefficient ways to do this — for example, some analytics tools used in product analytics will record entire sessions of a user’s activity. But this is done in very specific cases and usually sparingly because of the high resources involved: Tools that make heatmaps do this, for instance. These tools are expensive in terms of computing resources and your publisher’s wallet, and they slow down Web browsers. So they are not a good option for keeping regular tabs on how your users are interacting with your pages.
So now, where your preferred content analytics tools tell you “this many users on this page right now,” they will also readily explain to you that the proxy of “users on the page right now” is derived from what are called “events” — an analytics concept that takes certain kinds of user interactions, such as clicks, scrolls, and certain pieces of your page coming into view, and tracks them. Think of it a bit like a “proof of life.”
Varying a bit across some of the more popular tools, then:
Tool A will consider that an event recorded every N seconds means a continuous session.
Tool B will have a different number of seconds.
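In pseudocode terms, the “proof of life” logic boils down to something like this minimal Python sketch. The 15-second timeout is an invented number for illustration; as noted above, each tool picks its own:

```python
def is_session_active(event_timestamps, now, timeout_seconds=15):
    """A page counts as having a user's attention 'right now' if an
    engagement event (click, scroll, element coming into view) fired
    within the last timeout_seconds. The window is vendor-specific."""
    if not event_timestamps:
        return False
    return (now - max(event_timestamps)) <= timeout_seconds

# Events fired at t = 0, 8, and 20 seconds into the visit:
print(is_session_active([0, 8, 20], now=30))  # last event 10s ago -> True
print(is_session_active([0, 8, 20], now=40))  # last event 20s ago -> False
```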
The problem isn’t that different tools have decided on a different number of seconds. It’s potentially a problem if you try to compare the numbers of both these tools, but that’s hardly what regular analytics users do. See Chartbeat’s very clear explanation of how its approach and Google Real Time’s vary.
In fact, the problem is the content configuration of pages will greatly affect how certain methods will classify a session as active or not. The biggest place where this occurs is around videos.
3. How your content configuration can handicap certain pages from looking like winners in your favourite content analytics tools
Does your page contain embedded clips? Your analytics tools may be able to track the video plays (if it is your own player and your player is supported). But when Video ABC is being played on Page 123 right now, that’s not necessarily helping Page 123.
As the user clicks into the video, and quietly watches, she may actually appear to be unengaged with the page. Chartbeat explains its view on this one — and it’s quite defensible. I’m picking Chartbeat here because it’s popular, but all of your favourite analytics tools in this group have a version of this quandary.
Part of it is technical, but part of it also ties to precisely what Chartbeat’s opinion is on what “engaged” means. As Chartbeat explains, it had to make a decision on how to treat video plays — and no approach was going to be reliably correct all the time. The way events are measured is not continuous (unlike the heat-mapping tools), so there’s a bit of a blind spot on whether that video play is really “active” or not. Your favourite analytics tool has to decide how to classify this play.
So you can easily see what this may mean for your article with a great 20-minute embed: This article is now competing with others on your site and looks, everything else being equal, less engaged. It may very well be that the video is actively being watched! But your analytics tool may not classify this as an engaged session.
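Continuing in sketch form: if engagement is inferred only from clicks and scrolls, a quietly watched long embed looks dead, even though the video player knows better. The timestamps and the 15-second timeout here are invented for illustration:

```python
# A 20-minute embed: the viewer clicks play at t=0s, scrolls once at t=3s,
# then watches quietly. With an event-timeout model (15s here, invented),
# the session reads as inactive ten minutes in despite an attentive viewer.
event_timestamps = [0, 3]   # play click, one early scroll
watching_video = True       # known to the video player, not to the analytics tag
now = 600                   # ten minutes into the clip
active = (now - max(event_timestamps)) <= 15
print(active)  # False: the page looks unengaged
```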
If yours is a site with infrequent videos but the occasional blockbuster, this can affect you. Your blockbuster clip, particularly if it’s longer, may create a weird edge case for the page. If yours is a site with frequent use of videos, this can affect you even more. But you’ll also have baselines that are likely to smooth out outliers: namely, your engaged-session baselines will consistently be a bit lower.
Not exactly breaking news, but analytics tools are imperfect, like all of us. And the more opinionated they are, the more their users tend to be non-specialist and non-technical, and the harder their edge cases are to recognise.
Should you forgo this type of analytics product just because there are built-in assumptions that are hard to parse, assumptions you may not agree with, or situations where the paradigm of the product will actively be derailed by what happens on your property?
These types of tools have their place. Some great conversations have been started thanks to these tools giving data to professionals who were never going to put your big analytics suite at the centre of their lives. They are happy with their opinionated content analytics tools, built with a good understanding of their specific needs and non-specialist background. And this is worth a lot.
But I would argue there are enough “gotchas” that these tools should ideally be combined with other referentials that are closer to your specific business. Home-brewed metrics can bridge the gaps in your out-of-the-box analytics tools and provide additional context that off-the-shelf solutions can’t.
We will take a look at this in the next Smart Data newsletter.
Further afield on the wide, wide Web
One good read from the wider world of data. Actually a video this time, and what a video. I considered writing no newsletter at all and just embedding that player and calling it a day.
If you’ve already seen it, it stands to be watched again. And if you haven’t, watch John Oliver’s deep dive into data brokers above. I suspect the clip has some North America limitation on it but nothing a VPN can’t solve. Otherwise, it is <chef’s kiss>.
Date for the diary: May 10
Our next programme will be the Smart Data Initiative module at INMA World Congress of News Media on May 10.
Meet the community
For each installment of this newsletter, I am hoping to introduce one member of the community in this space. Want to be featured here? A few questions to get to know you better here. Thanks!
About this newsletter
Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.
This newsletter is a public face of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel.