Homebrewed analytics may be the missing piece for media companies
Smart Data Initiative Newsletter Blog | 05 May 2022
Hello everyone. In the past two installments of this newsletter, we looked at the difference between un-opinionated analytics and the tools that take a more specialised (or opinionated) look at analytics. We have quite a few of these tools in audience analytics. They do certain things very well and, like a lot of things in life, have a few gotchas that are sometimes hard to detect precisely because the tools come with their filters and world views built in.
We left the previous newsletter looking ahead to the third class of analytics tools: the homebrewed kind.
Now, I know that the label “homebrewed” may feel disparaging to some (there’s a bit of moonshine vibe to the affair). But in this case, think of homebrewed more as “tailor-made.” Homebrewed really means, “Done at home and brewed to the taste of the locals.” So, there’s lots to appreciate: We know what goes into it; and we know it’s going to be highly relevant to what we’re looking to learn.
In this post, we’re going to look at two angles of homebrewed analytics.
First, identifying the place where homebrewed analytics are attractive and what you can get from a homebrewed tool that you generally cannot get from any commercial tool.
And then we’re going to look at some discrete questions you can take on with some homebrewed analytics tools.
As always, my e-mail is open, so tell me if there are things on your mind. Future topics for this newsletter maybe? Or topics for community meetings? See you in my inbox!
Ariane
Augmenting and reframing out-of-the-box analytics products with your own tools
The art (and science) of analytics is measuring things. And the art of contextualising them lies in statistics — understanding what the numbers mean against other numbers that we understand and control.
Cassie Kozyrkov, the chief decision officer at Google (yes, that’s a title) — who has an awesome series on data science, analytics, and statistical concepts on YouTube (sidebar: I really recommend it and it’s very accessible) — says: “Analytics is what helps you ask better questions, whereas statistics is what gets you better answers.”
So look at your current crop of tools and ask yourself: What are the things you either don’t measure at all, or where the statistical understanding of what you observe is lacking?
I’ll give you a couple of ideas that may be relevant for your organisation (because these things aren’t well covered by commercial tools):
What is the value of visiting a particular piece of content in a user’s journey (does visiting article A do more than usual in moving our visitor toward their next funnel event)?
Is an article a sleeper hit? (Which translates to: “Given the amount of promotion the article has been given, does it manage to get more clicks than you’d expect?”)
What these ideas have in common (and I’m not even scratching the surface of the questions you could throw there) is that answering these questions relies on tying up data that’s going to come from at least two of your publishing systems. In other words, the reason you cannot usually get answers to these questions from commercial tools is that commercial tools live in one dimension (your Web site). And to answer the questions above, you need to cross at least two systems together.
The question, “What is the value of visiting a particular piece of content in a user’s journey?” (ie, does visiting article A do more than usual in moving our visitor toward their next funnel event?), can be answered by bringing together data from your paywall and your CRM. Because, for metered models, the paywall may not have been triggered on a particular article, yet you may find that a given article is over-represented, looking back, in the sessions of folks who converted over a recent period. You still need data from your regular content analytics tool because you need a statistical sense of the chances of any random URL being present in the session of a converting user.
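To make that concrete, here is a minimal sketch of the cross-referencing, assuming you can export page views per session from your analytics tool and a list of converting session IDs from your paywall/CRM. The file and column names are hypothetical, and the lift ratio is just one reasonable way to frame the comparison:

```python
import pandas as pd

# Hypothetical exports: `all_views` has one row per (session_id, article_url)
# page view from your content analytics tool; `conversions.csv` lists the
# session IDs your paywall/CRM ties to a recent conversion.
all_views = pd.read_csv("pageviews_last_90_days.csv")   # columns: session_id, article_url
converting_ids = set(pd.read_csv("conversions.csv")["session_id"])

all_views["converted"] = all_views["session_id"].isin(converting_ids)

n_converting = all_views.loc[all_views["converted"], "session_id"].nunique()
n_total = all_views["session_id"].nunique()

# Count each article once per session, then compare how often it shows up
# in converting sessions vs. in all sessions.
per_article = (
    all_views.drop_duplicates(["session_id", "article_url"])
    .groupby("article_url")
    .agg(sessions=("session_id", "nunique"), converting=("converted", "sum"))
)
per_article["rate_in_converting"] = per_article["converting"] / n_converting
per_article["rate_overall"] = per_article["sessions"] / n_total

# Lift > 1 means the article is over-represented in converting journeys.
per_article["lift"] = per_article["rate_in_converting"] / per_article["rate_overall"]
print(per_article.sort_values("lift", ascending=False).head(20))
```

The interesting part is the baseline: without `rate_overall`, a heavily trafficked article would look valuable simply because it shows up everywhere, converting journeys or not.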
The question, “Is an article a sleeper hit?” (which translates to: “Given the amount of promotion the article has been given, does it manage to get more clicks than you’d expect?”), can be answered by crossing the data from your social publishing tool (or your CMS, if social posting happens directly from your CMS) with your regular content analytics. In addition, if you can pull ranking information from your CMS about any on-site promotion (like homepage play), this further refines the estimate.
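Here, too, a rough sketch shows the shape of the idea: Fit a simple baseline of expected traffic from promotion signals, then flag the articles that beat it. The input file, its columns, and the linear model below are assumptions for illustration, not a prescription:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical merge of your social publishing tool / CMS promotion data
# with your content analytics: one row per article.
# columns: article_id, social_posts, best_homepage_rank (NaN if never on homepage), pageviews
df = pd.read_csv("articles_promotion_and_traffic.csv")

# Turn homepage placement into a usable feature (lower rank = better play).
df["homepage_weight"] = 1.0 / df["best_homepage_rank"].fillna(np.inf)

X = df[["social_posts", "homepage_weight"]]
y = np.log1p(df["pageviews"])  # log scale keeps viral outliers from dominating the fit

model = LinearRegression().fit(X, y)
df["expected_log_views"] = model.predict(X)
df["surprise"] = y - df["expected_log_views"]

# "Sleeper hits": articles that out-perform what their promotion would predict.
print(df.sort_values("surprise", ascending=False)[["article_id", "pageviews", "surprise"]].head(10))
```

Anything with a large positive “surprise” earned its audience beyond what its promotion would predict, which is exactly the sleeper hit you want the newsroom to notice.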
This can get pretty fancy.
The Telegraph in the UK created a score called STARS (Simple Telegraph Attraction and Retention Score), which tracks articles and scores them by tying up engagement metrics and commercial performance (subscriptions).
Over at The Times in London, where some similar worries about the lack of context for raw metrics led the company down the path to build its own data stack altogether, Dan Gilbert, then-director of data for News UK (now their SVP), provided background in a blog post on their homebrewed tool, which helped augment that contextual understanding of performance.
One place where The Times looked to smooth out comparables was around the play (placement) or length of articles. This data doesn’t come from your analytics per se (it would come from your CMS). The Times’ tools lean on both aspects of the quote from Cassie Kozyrkov: It’s better analytics but also better statistics, because it cares deeply about meaningful reference points for the analysis.
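The general idea, stripped of The Times’ actual methodology (its tooling is far richer than this), is to compare an article to its peers rather than to a site-wide average. A hedged sketch, with entirely hypothetical CMS fields:

```python
import pandas as pd

# Hypothetical join of CMS metadata (placement, word count) with analytics (pageviews).
# columns: article_id, placement_tier, word_count, pageviews
df = pd.read_csv("articles_with_cms_metadata.csv")

# Bucket articles by length so long reads are compared with long reads.
df["length_bucket"] = pd.qcut(df["word_count"], q=4, labels=["short", "medium", "long", "very_long"])

# Score each article against the peers that received comparable play and length.
cohort = df.groupby(["placement_tier", "length_bucket"], observed=True)["pageviews"]
df["cohort_mean"] = cohort.transform("mean")
df["cohort_std"] = cohort.transform("std")
df["relative_performance"] = (df["pageviews"] - df["cohort_mean"]) / df["cohort_std"]

# A score of +2 now means "two standard deviations above similar articles,"
# which is a far more meaningful reference point than a raw pageview count.
print(df.sort_values("relative_performance", ascending=False).head(10))
```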
Identifying discrete opportunities for homebrewed analytics
So you may not have the vast resources of News UK to go build yourself an INCA (their analytics tool suite) or a STELA (the NYT’s own tool suite). But a smart publisher can let commercial tools provide good, everyday utility and focus its own effort on the money questions (literally and figuratively).
For example, La Nación in Argentina presented at our last master class series this spring how it created a score that speaks to the quality of an article, used to understand how that article contributes to further habituating users on their journey to a subscription.
Victoria Riese, the company’s chief data officer, said that beyond helping the newsroom understand the immediate value of articles, it also helped the newsroom identify new opportunities in areas where they could tell appetite was untapped.
Now, I love when analytics provide good insight on what happened, but I like them even more when they point the way forward.
At Groupe Les Échos-Le Parisien in Paris (disclosure: I was once the chief digital officer of Le Parisien), Violette Chomier, their chief data officer, explained what led them to focus on building a score for their subscribers — not so much focusing on content analytics, but rather, bringing content analytics back to their CRM data to identify users who needed to be sustained in building healthy habits so they would stay on as subscribers.
Violette identified one important characteristic of a good opportunity for a homebrewed metric or tool: working on something where enough of the data is known. The reason Violette and her team went after churn first, she explains, is that all the users in question are known and in the CRM.
Of course, a similar score for users who may just be on their way to subscribing would be interesting, but many such users are not logged in, and working with anonymous cookies is full of pitfalls. CRM and content analytics, by contrast, were two reasonably clean inputs to cross-reference and a good base to build on.
This type of effort can come together in a reasonable amount of time.
For example, Gazeta Do Povo in Brazil worked on a propensity-to-churn score as part of the Meta-supported audience analytics accelerator that INMA ran for a good part of 2021 with publishers based in Latin America. Much like Les Échos-Le Parisien, they used their Salesforce data together with their content analytics to model what may signal a future churning user. The whole project, once well defined and the team assembled, took four weeks to build end to end.
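For a sense of what such a project can look like technically, here is a generic propensity-to-churn sketch, not Gazeta Do Povo’s actual model: a classifier trained on a table that joins CRM fields with content-analytics behaviour, using entirely hypothetical feature names:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical training table: one row per subscriber, built by joining
# CRM/Salesforce data with content analytics over a trailing window.
# columns: days_since_last_visit, visits_last_30d, sections_read_last_30d,
#          newsletter_opens_last_30d, tenure_months, churned_next_30d
df = pd.read_csv("subscriber_features.csv")

features = [
    "days_since_last_visit",
    "visits_last_30d",
    "sections_read_last_30d",
    "newsletter_opens_last_30d",
    "tenure_months",
]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned_next_30d"], test_size=0.2, random_state=42
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Score the current subscriber base so retention teams can prioritise outreach.
df["churn_risk"] = model.predict_proba(df[features])[:, 1]
```

In practice, the feature importances of such a model are often as valuable as the score itself, because they point at which behaviours the retention team should try to shore up.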
Of course, as with any new useful insight acquired, your “problems” begin once you’ve eaten the Fruit of Knowledge. It’s likely going to be a good deal harder to put in place the various remedies and plans to try and buttress these flagging users. But Gazeta Do Povo now has a way to identify what behaviours tend to be associated with a future churning user.
These types of metrics are, of course, the most opinionated of all. The specific blend of signals that worked for La Nación to determine quality is specific to their content offering, voice, and audience. Both the design of the blend (what goes into it) and its calibration (how the models are refined) are going to be hard to put into a box that could be sold commercially for all to use.
So if you are thinking about how to orient your data resources, the places where homebrewed analytics could be most useful to you sit at the intersection of:
[where you most want to have an impact] For example, subscriptions; or, for a publisher with a significant ad business, the trade-off between ads and subscriptions.
and
[where your own qualities as a publisher are the most unique or the most tied to your brand] If you’re a long-form publisher, you know that is hardly the most common type of publisher out there. What are the metrics that would speak to how users relate to your flagship content? What are the edge cases of behaviour that are uniquely prevalent in long-form content? (Likely: Article reading completion looks quite different than it does for general news publishers.)
and
[where you know multiple systems in your company have partial information on the question, but you need to see the intersection of this information]
There are a million questions one can ask when you start to look at the specifics of any one publisher. But your understanding of your own product will necessarily lead you to see that where you are the most unique is where you will have the strongest opinions about how analytics should be obtained, parsed, and contextualised.
As it happens, where you are most unique is probably your competitive advantage, too. This is the place to look for discrete questions you not only want to run data analysis on, but where you can imagine what it would mean if such information were readily available in an always-on tool.
If you keep such questions finite enough for a first pass, I’m sure you’ll find some great opportunities for some simple homebrewed tools to shine light where commercial tools couldn’t go.
Further afield on the wide, wide Web
One good read from the wider world of data: This week, we look outside of publishing to one of our cousins: e-commerce. Allow me, if you will, to drag a little chair and tell you a little tale.
When media publishers look for adjacent industries for inspiration, they usually look at other media-related industries. Right now, often, that’s streaming. Sometimes, it’s book publishing (not too often though).
But I think our closest cousin is actually e-commerce. Now, I think this because I have deep expertise building CMSs, and I can tell you that e-commerce CMSs and publishing CMSs are incredibly similar. Which kind of tells you something about the presenting problems of both industries.
But so — and the rest of this will be a topic for another day — I’m always keen to look at developments in the tech that powers e-commerce because so much of this could apply to us.
Which is why I took a nerdy interest in the deep dive that Shopify gave into its new Machine Learning stack. This came via the newsletter TheSequence, which will give you the highlights, but do follow the link to Shopify’s own deeper look.
(Feedback time: I was hesitant to share this because I realise it leans toward engineering and may feel off the mark. If you feel strongly about this, for or against, please drop me a note so I can better calibrate. Feedback is a gift, as they say …)
Date for the diary
Our next programme will be the Smart Data Initiative module at INMA World Congress of News Media on May 10. The World Congress started today and continues on Tuesdays and Thursdays throughout May.
About this newsletter
Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.
This newsletter is a public face of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel.