3 themes emerge as news media leaders share their data journey
Smart Data Initiative Newsletter Blog | 01 June 2023
Hi everyone.
Last week was the INMA World Congress of News Media in New York. It was super fun to catch up with many of you and to sit in on so many excellent presentations from no less excellent speakers. To be clear, the speakers of my own data workshop were the best ones, of course. #proudmom.
It’s a bit like a marathon, though, in that you sort of have to suspend most of your everyday life in order to do all the things. All this to say: I’m ready to talk about all things data with you again, so drop by my inbox for a chat.
Ariane
Swift data takeaways
I thought I’d look back on some of the takeaways from our Smart Data Workshop at INMA’s World Congress in New York in the last week of May. I also offered to write up for INMA my extremely detailed review of the Taylor Swift concert at MetLife Stadium, but somehow, INMA’s editor turned me down. Remember that you were shortchanged.
At any rate, the data workshop did offer plenty of highlights.
1. As organisations mature in the complexity of their data organisation and infrastructure, strategic decisions about these things require increased discipline in how we make investment decisions and how we use our internal resources.
To explain, I’ll draw on what two of our speakers, Evan Sandhaus of The Atlantic and June Dershewitz of the Digital Analytics Association, spoke about.
Evan Sandhaus, vice president of engineering at The Atlantic (USA), told us about his team’s journey building their customer revenue data stack.
Aggregating your data in your warehouse is a long-term project but necessary for building a “single customer view.” Your approach to consolidation may change over time as new tools become available.
There are switching costs in data when we change vendors. These costs have to play a part in the calculation of whether to switch, and they have to be something the engineering side underscores to the stakeholders who may be driving that vendor change.
Buy vs. build: The task is large, and it’s worth challenging the assumption that you should build a custom data platform; you’ll be maintaining it for a long time, too. Sometimes, the maturity of an organisation lies in recognising that the circumstances that once led to a strong strategic decision in one direction have changed enough to require reexamining the strategy in the first place. In the case of The Atlantic, that decision was building its own data platform for customer data. As we rebalance our approach in the face of new information and new options in the marketplace, the updated strategy may look very different — possibly even like a contradiction of past decisions. Maturity in the organisation is appreciating that “correct” is a moment in time.
June Dershewitz, board member of the Digital Analytics Association (USA), explained her framework to help analytics teams better prove and explain the actual value they deliver to the business.
Analytics teams can make a stronger case for their usefulness to the business by demonstrating how they deliver on three angles: adoption and usage, satisfaction, and actionability.
Proving value for each angle isn’t straightforward, but in combination with each other, they paint a clearer picture of value being delivered, how much, and for whom. This can be done by looking at usage of tools, user surveys, and examining what business outcomes map to recommendations made by the team.
2. Tackling large, valuable problems with data science is a long game. Even the best-resourced publishers take time and need multiple rounds of iteration to land on algorithmic methods to solve some of their largest, most complex problems.
This is another way to say that if you’re the CEO of a publisher who is making a commitment to data and data science, you have to understand that this is a long play. There will be a high investment before there is a return.
It sounds obvious, but if it were easy, everyone would be doing it. And the more complex the problem, the more extensive the testing and iteration: The solution design may be comparatively less important, but the testing phase may have to account for many edge cases or effects that only show up in tests over time.
Rohit Supekar, senior data scientist at The New York Times (USA), talked about his organisation’s work building an algorithm to personalise the number of free articles a user may receive from the paywall, looking to maximise both free engagement and paid conversions.
In a metered paywall, data can help model where the trade-off point between engagement and conversion may lie for specific cohorts of users.
The model allows overall adjustments to serve business goals, but also differentiation among users who aren’t at the same place in their journey. It identifies the causal effect of the free-article allowance.
Because there is no perfect way to replay the data, this work requires a large user base to diminish the effects of randomness in the sample of each user group. This phase alone is complex and takes a long time to analyse.
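As a toy illustration of the trade-off being modelled here (all cohort numbers, the scoring function, and the business weighting below are invented for illustration, not The Times’ actual model): randomised cohorts each receive a different free-article allowance, and each allowance is scored against a weighting that says how much a paid conversion is worth in free pageviews.

```python
# Toy sketch of a metered-paywall trade-off analysis.
# Cohorts were randomly assigned a free-article allowance;
# all numbers below are invented for illustration only.
cohorts = {
    # allowance: (avg free pageviews per user, conversion rate)
    1:  (2.1, 0.012),
    3:  (4.8, 0.010),
    5:  (7.5, 0.008),
    10: (11.2, 0.005),
}

def score(allowance, weight_conversion=500.0):
    """Combine engagement and conversion into one number.

    weight_conversion expresses how many free pageviews one paid
    conversion is worth to the business (a made-up assumption).
    """
    engagement, conversion = cohorts[allowance]
    return engagement + weight_conversion * conversion

best = max(cohorts, key=score)
for allowance in sorted(cohorts):
    print(allowance, round(score(allowance), 2))
print("best allowance under this weighting:", best)
```

Changing `weight_conversion` shifts the optimum, which is the point: the data surfaces the trade-off curve, but the business weighting decides where on it to sit.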
Christian Leschinski, data science lead at Axel Springer National Media and Tech (Germany), spoke about his organisation’s work on an algorithm to clarify actual good performance versus performance of an article that got boosted by external promotion.
Disambiguating what is genuinely performing versus what is merely boosted by promotion is a valuable part of identifying candidates for automated promotion (personalisation included).
This is another way to describe what a baseline is: a number that accounts for elements of variance in the environment so that metrics can be fairly compared with one another.
Fine-tuning a model to do this uncovers complexities: The numbers involved can be very large or very small, which tends to skew the model, or, on the other hand, make it very intensive in computing power.
3. Proceeding with automation and generative AI in your content creation is a great opportunity — if you proceed with thoughtfulness and caution
Two speakers, from Ippen Digital in Germany and Gannett in the U.S., emphasised the need for feedback loops from their newsrooms into the technical buildout, transparency to readers, and tooling to make it easy (and sane!) for newsroom teams to manage the automated content.
Alessandro Alviani, natural language processing product lead at Ippen Digital, shared the work his team has done bringing generative AI assistance into their CMS to support their newsrooms in various discrete tasks.
Generative AI tools can deliver value while staying within certain bounds, so that values, editorial guidelines, and transparency can be maintained.
To keep the AI from hallucinating (too much), it is useful to provide it with system prompts that limit its scope to narrow bounds. Even so, finding ways to help humans check facts and figures remains necessary.
This human-in-the-loop workflow can be built into the CMS so that it is repeatable — and ends with specific markers explaining to users that the piece was computer-generated and checked by a human.
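To make the pattern concrete, here is a hypothetical sketch of the two pieces described above: a system prompt scoped to a narrow task and supplied source text, and a publish record that carries the transparency marker once a human has checked the draft. The function names, prompt wording, and record fields are my own invention, not Ippen Digital’s implementation (the actual LLM call is omitted).

```python
# Hypothetical sketch of a scoped system prompt and a
# human-in-the-loop publish record; illustration only.

def build_system_prompt(task: str, source_text: str) -> str:
    """Constrain the model to one narrow task and the supplied source."""
    return (
        f"You are an editorial assistant. Task: {task}.\n"
        "Use ONLY the source text below. If a fact or figure is not "
        "in the source, reply 'NOT IN SOURCE' instead of guessing.\n"
        f"--- SOURCE ---\n{source_text}"
    )

def publish_record(draft: str, checked_by: str) -> dict:
    """Attach the transparency marker after a human verifies the draft."""
    return {
        "body": draft,
        "disclosure": f"Generated with AI assistance; checked by {checked_by}.",
    }

prompt = build_system_prompt(
    "write a two-sentence summary", "Quarterly revenue rose 4%."
)
record = publish_record("Revenue rose 4% last quarter.", "J. Editor")
print(record["disclosure"])
```

The narrow scope does not eliminate hallucination, which is why the human check and the disclosure marker remain part of the workflow rather than optional extras.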
Jessica Davis, senior director for news automation and AI product at Gannett (USA), spoke about the work her team has been doing bringing automation to weather stories, looking to connect them to their overall coverage about the climate crisis.
Opportunities for automation are at the intersection of high-volume content with a good feed of quality data … on topics where there is user interest, like the weather.
But even with a good opportunity to innovate, the cultural changes have to be taken into account in rolling out the new capabilities.
Starting small and gradually adding complexity (from automation to generative AI) helps reduce doubt and minimise risk.
A favourite quote from World News Congress, outside of the data workshop
There were several data-related or data-adjacent talks on the big stage at World Congress, but I’m sort of running out of steam. So I’ll turn you over to the blog coverage; in fact, AI helped us create all the transcripts and summaries for attendees. While lacking in relentless Swiftie propaganda, and therefore less useful than my own commentary here, it is a fantastic resource for the rest of the conference.
But I’ll leave you with this fun quote about personalisation experiments from Thomas Schultz-Homberg, CEO of KStA Media (Germany):
“We did a test where we put the editor’s picks in front of 50% of the users and the personalised headlines in front of 50%. The results were that the personalised results received 80% more click-through rate and 13% more fully read articles.
“The conversation was over after that.”
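For anyone who wants to reproduce the arithmetic behind a quote like that, relative uplift from an A/B test is a one-liner. The counts below are invented, chosen so the click-through uplift lands on the 80% figure; they are not KStA Media’s data.

```python
# Invented A/B counts illustrating relative uplift; not real data.
control = {"impressions": 100_000, "clicks": 5_000}  # editor's picks
variant = {"impressions": 100_000, "clicks": 9_000}  # personalised

def ctr(group: dict) -> float:
    """Click-through rate: clicks divided by impressions."""
    return group["clicks"] / group["impressions"]

# "80% more click-through rate" means the ratio of the two CTRs
# minus one, not an 80-percentage-point difference.
uplift = ctr(variant) / ctr(control) - 1
print(f"relative CTR uplift: {uplift:.0%}")
```

The distinction matters: a CTR moving from 5% to 9% is an 80% relative uplift but only a 4-percentage-point absolute one, and headlines routinely blur the two.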
Further afield on the wide, wide Web
A few good reads from the wider world of data this week:
• The editor of the Financial Times, Roula Khalaf, wrote a letter about her current perspective on using generative AI in her own organisation (paywall link). She said she wants her organisation to be able to leverage these tools where they can be useful: “It has the potential to increase productivity and liberate reporters and editors’ time to focus on generating and reporting original content.”
On the other hand, she emphasises the need for transparency where AI plays a part in content creation, as well as the oversight of humans in the use of these tools, which is necessary to maintain quality and editorial standards.
My take here: I have read countless such memos and charters in the past few weeks and months (I wrote a report on generative AI, if you haven’t downloaded it already). I’ll bow to this memo: It’s clear, to the point, and summarises the view around which I have seen most responsible publishers coalescing. It might just be the only one you need to read!
• In somewhat related news, The Washington Post announced it formed an AI task force and an AI Hub to federate its various AI-related projects. I’ll end this bullet by noting it may be the last time I share news about these AI task forces being formed, because they are becoming more and more common at this point. But they will be appreciated in these parts all the same!
• I don’t usually link to press releases (because press releases), but this seemed harmless enough. The community app Nextdoor announced it is launching a new generative AI feature to encourage more civil discourse in the comment section. Nextdoor has post-moderated comments, handled by group admins, so the level of civility will, shall we say, vary.
Various sociology studies have been done on comment sections over the years, and factors like anonymity and language register have been shown to play a big part in how online communities develop. Until generative AI, the only way to handle the issue of language was the binary approach of moderation: comment allowed or comment banned. It’s also not realistic to ask moderators to provide long, thoughtful feedback on why a comment was banned or how to make it better.
Auto-moderation tools can sometimes surface the cause for banning a comment if they are built with groups of rules for certain triggers. But generative AI represents a potential third way with comments: The machine proposes alternative language that not only attempts to “save” a comment from being moderated by bringing it up to the standards accepted by the community, but also gently teaches the user what those standards are.
Obviously, a user who writes a comment laced with bad language is aware that they are doing so, but if comments get moderated out, it does them no good since no one reads their (strongly worded) view. So it will be interesting to see how this type of usage for generative AI may affect these community spaces.
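The rule-group approach described above can be sketched in a few lines (the rule groups and trigger words here are invented for illustration; a real system would be far richer, and the generative “third way” would add an LLM call to propose alternative wording, omitted here):

```python
# Toy sketch of rule-group moderation that surfaces *why* a
# comment was blocked. Rules and words are invented examples.
RULE_GROUPS = {
    "personal attacks": {"idiot", "moron"},
    "profanity": {"damn", "hell"},
}

def moderate(comment: str):
    """Return ('blocked', reasons) or ('allowed', []) for a comment."""
    words = set(comment.lower().split())
    triggered = [
        group for group, terms in RULE_GROUPS.items() if words & terms
    ]
    if triggered:
        # The triggered group names can be shown to the user,
        # instead of a silent, unexplained ban.
        return ("blocked", triggered)
    return ("allowed", [])

print(moderate("you absolute idiot"))
```

Even this crude version improves on a bare yes/no gate: the user learns which community standard they tripped, which is the teaching step generative rewriting would then build on.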
• In its typically efficient style, Axios takes a look at the future of media in the age of AI. It feels a bit recursive to summarise Axios, but if I should try to out-Axios Axios, then let’s say the reason you’ll find this interesting is that it makes several good points converging on one idea: With generative AI likely to power an enormous amount of bad content, a new opportunity opens for media brands that focus on quality and trust. In data, we have names for these two tendencies, by the way: the signal and the noise.
About this newsletter
Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.
This newsletter is part of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel.