Tried-and-true uses of AI in media shouldn’t be forgotten amid ChatGPT obsession

By Ariane Bernard

INMA

New York City, Paris


Hi everyone.

I’m back from lovely Copenhagen and back in sticky New York City. And beside the upsides of a northern European city with beautiful weather, limited car traffic, and tasty salmon dishes, there was also a very interesting conference, so I’m coming back with some useful and interesting stuff for you all. 

Before I get into this, just a reminder that the INMA World Congress of News Media is next week in New York. The data programme got upgraded to a bigger room so if you were on the waitlist, excellent news: I’ll see you there. 

Until then, all my best,

Ariane

In Copenhagen, a reminder of where machine learning is reliably delivering value

I made a little trip to Copenhagen last week, where the Nordic AI Alliance was holding its first conference. They are a young community-based association of publisher and publisher-adjacent folks who work with and on AI in northern European countries. And though I am not Scandinavian, nor do I, in fact, work for a publisher, they generously allowed me to attend.

I’ll dip into some of the cases in future newsletters (we will probably also host some of the folks I met there, stay tuned for this), but I wanted to bring you a bit of a synthesis of what went on.

While not making headlines like ChatGPT, recommenders are an important use of AI by media companies.

The first is that while generative AI grabs all the headlines (including in this newsletter, if I’m being honest), a lot of the muscle for technologists working with AI and machine learning in news publishing is still directed at tried-and-true value plays, like recommenders. 

We are all variously affected by the hype cycle that says all the value of Artificial Intelligence is buried in ChatGPT, but the kind of upsides you can get from a well-built personalisation engine just cannot be argued with.

Ekstra Bladet, for example, explored several different use cases through its recommendation project, PIN, finding it could increase the consumption of non-premium articles by 2.7 times while still respecting the DNA of the brand and a strong editorially led proposition. Even for paid news, its experiments increased the number of articles read by 2.4 to 3.1 times. 

Other topics from the tried-and-true pile included using machine learning to extract entities and metadata. The teams at SVT, the Swedish national broadcaster, have set about leveraging their vast video archive, looking to improve discoverability for clips that lack most metadata. They use lower-thirds, credits, and any manner of on-screen captions to add new metadata to clips that don’t have any. 
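SVT’s approach can be sketched in miniature. This is a hypothetical illustration, not SVT’s actual pipeline: assume an OCR step has already pulled raw text from lower-thirds and credits, and the remaining job is to turn those strings into searchable metadata fields for clips that have none. (A real pipeline would use a trained named-entity model rather than a regex, and the function and field names here are invented for the example.)

```python
import re

def enrich_clip_metadata(clip, ocr_text):
    """Derive minimal metadata for a clip from OCR'd on-screen text.

    `clip` is a dict that may lack metadata; `ocr_text` is raw text
    recovered from lower-thirds, credits, or captions. Hypothetical
    sketch -- real pipelines would use NER models, not regexes.
    """
    # Naive entity guess: runs of two or more capitalised words
    # (names, places), the kind of thing lower-thirds tend to show.
    entities = re.findall(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b", ocr_text)
    # Only fill fields that are missing; never overwrite curated data.
    enriched = dict(clip)
    enriched.setdefault("entities", sorted(set(entities)))
    enriched.setdefault("searchable_text", ocr_text.strip())
    return enriched

clip = {"id": "svt-1987-0042"}  # a clip with no metadata at all
text = "Studio interview with Olof Palme in Stockholm City Hall"
print(enrich_clip_metadata(clip, text)["entities"])
# → ['Olof Palme', 'Stockholm City Hall']
```

Even this crude pass turns a completely dark clip into something a search index can surface, which is the point of the incremental strategy.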

If anything, this reminds us that there is a lot of value in making incremental improvements to very large problems. The decades-deep archive of SVT has many clips that aren’t leverageable at all because no data is associated with them.

But if machine learning can help augment the data of at least a fraction of these, then the overall problem has been made smaller. Because our technical capabilities continue to advance, there’s likely another round of technical improvements in the next few months or years, which will give SVT another shot at rescuing more clips from the bottom of the archive. 
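That compounding effect of successive technical generations can be made concrete with a toy model (the numbers are hypothetical, not SVT’s): if each new round of tooling recovers metadata for some fraction of the still-dark clips, the unindexed remainder shrinks with every pass.

```python
def remaining_unindexed(total_clips, recovery_rates):
    """Toy model: each pass recovers a fraction of the *remaining*
    unindexed clips. Rates are hypothetical, for illustration only."""
    remaining = total_clips
    for rate in recovery_rates:
        remaining -= int(remaining * rate)
    return remaining

# 1,000,000 dark clips; three passes recovering 20%, 30%, then 30%
# of whatever is left each time.
print(remaining_unindexed(1_000_000, [0.20, 0.30, 0.30]))  # → 392000
```

No single pass solves the archive, but three modest ones index more than 60% of it, which is why another shot every few months is worth planning for.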

Now, this type of work isn’t filled with buzzwords, and I worry that folks with budget planning responsibilities who may — with the best of intentions — want to direct capital at the possible riches of generative AI would do so at the expense of other programmes. This doesn’t mean you shouldn’t explore the value you can unlock with large language models supplementing or scaling up the work of your teams, but we shouldn’t get so high from the fumes of the hype cycle that we forget where we are also certain to find value for our data teams.

Previewing INMA World Congress of News Media for the Smart Data Initiative

Because I’d like to convince you to join our excellent programme in New York next week, I thought I’d share some of the thinking behind the programming.

Now, most of this programming came together early in the year, but some folks had been “on my list” for quite some time. 

Data and its applications in news media are key parts of this year's INMA World Congress of News Media.

There were three different areas I wanted for us to tackle:

  • Data and its application in supporting performance.
  • Data and its application in how we manage our business.
  • And generative AI. 

The programme balances somewhat equally among these three topics. 

Data and its application in supporting performance

The bleeding edge of the conversation in data may be generative AI, but I don’t want us to lose track of the fact that we’ve been using data for years in increasingly useful and concrete ways to support revenue and performance. This, until proven otherwise, is where we have value in data today.

  • In this respect, using data in how we understand and improve paywall systems is definitely one of the top-of-mind applications. But, of course, it’s not so simple. So having Rohit Supekar, a senior data scientist at The New York Times, to share with us some of the work he and the team have done to make their paywall more efficient was just perfect for this programme.

  • Also under performance, I wanted us to look at how we improve our understanding of performance. In analytics, there are many types of ways our numbers get muddied up by the rather complex user journeys in play in our properties. Christian Leschinski, the data science lead at Axel Springer National Media and Tech in Germany, will share with us how his team clarifies some of these issues to get to a purer “truth” of what performance looks like.

Data and its application in how we manage our business

The judicious use of data in the organisation says something about the culture of these organisations and our ability to understand and analyse the picture that data paints for us. But even when a healthy culture exists around data, we still have to make data available to the various teams that want to use it. This one is an engineering challenge, and we’ll talk about this too.

  • Somewhere, there are still folks who look at the data team and say, “Well, they don’t build the product, they don’t make money, and they don’t make journalism” — and are not so sure that the data team is worth the expense. Hopefully, there are fewer such folks than there used to be, but you can also argue that it’s up to analytics experts to make numbers useful and the value of having these numbers inarguable. June Dershewitz, a top data strategist and board member at the Digital Analytics Association, will focus on the essential question of connecting analytics with improved outcomes. 

  • As the need and use of data grow around the whole organisation, so do the feeds of data that pour into our data infrastructure. But to make this data truly useful, and fully leverage it, means being able to surface it to many different parts of the business. It means that data about content performance is available in relation to a user’s lifecycle, or that page performance is available to serve segmented advertising. The business we manage is very dependent on our engineering infrastructure, and having a vision for how to build it — and being able to manage it and afford it over time — is a long-term effort and commitment. Evan Sandhaus, the VP of engineering at The Atlantic in the U.S., will present on the work to bring a unified view of the user’s lifecycle, so every team in the organisation is able to benefit from a multifaceted understanding of the user.

Applications of generative AI

There’s no shortage of headlines (or, in fact, a whole report I wrote for you), but I wanted us to focus on cases where generative AI was actually being implemented in our organisations. I also wanted to specifically focus on cases that took generative AI from small experiments to being scalable for large parts of the organisation.

There are many projects of the former kind in flight — understandably, we should all be reasonable in how we roll out this young technology — and fewer of the latter kind. But we will have two speakers who can tell us more about larger generative AI projects in their organisations.

  • Covering small communities is a challenge in finding scale in what is inherently small-scale. At Gannett, whose large regional network of local organisations covers communities coast-to-coast, the team is exploring how to give weather stories their due while leveraging different kinds of automation. Jessica Davis, the senior director for News Automation and AI Product, will share with us some of the insights from building these new capabilities.

  • When we talk about generative AI, we quickly head to very abstract issues around accuracy or usefulness. But the real scale of generative AI is also in how we may integrate it in our set of daily tools. So I was very curious to hear about cases that focused on the toolbox angle. Alessandro Alvani, the product lead for natural language processing at Ippen in Germany, will tell us about their work to bring generative AI tools to the fingertips of their newsroom, in their CMS. 

There’s still time to catch that plane to New York. You’ll have access to the presentations/material online shortly after the conference, so you could also join virtually that way. Though, well, that would be less fun. And so for folks joining us, I can’t wait to see you soon!

Further afield on the wide, wide Web

A few good reads from the wider world of data this week: 

• This one came a couple of weeks ago via Ioana Sträter, INMA’s director of events: Reid Hoffman, co-founder and former CEO of LinkedIn, wrote an e-book about GPT-4, with GPT-4. Now, I was a bit wary of the premise — which looked like it had a high potential for self-indulgence in the first place — but my prejudice wasn’t warranted. This is actually fun and easy to read! 

In particular, I enjoyed the chapters on education. I’m a geriatric Millennial, and I always felt there was a dissonance to how we were taught math (for example) when computing had already advanced enough to really recast what kind of math skills we should have been taught over, simply, algebra or geometry. And there’s a similar question for today’s youth: We can see a future in which learning may change again, so what does it mean to equip a young person with education? 

Having said this, the chapter on journalism is, ehhhh, not all that and a bag of chips. Nothing wrong with it, but it’s not a highlight. So, I’d say read this for the other chapters.

• Axios looked at recent academic research into the accuracy of AI-powered search tools: Bing Chat, Neeva AI, perplexity.ai, and YouChat. “These tools generally delivered fluent and useful answers — but roughly half contained ‘unsupported statements or inaccurate citations.’ Of the citations that were provided, an average of 1 in 4 did not support their associated sentence.” This isn’t terribly good news, except maybe to say that human intelligence isn’t totally outdated yet.  

• I’ve been trying to keep up with the news re: intellectual property and AI (or licensing and AI because these are connected). In the context of the labour actions by the Writers Guild of America, which represents Hollywood writers, The New York Times reported on how these creatives are trying to stay ahead of possible intellectual property challenges brought about by generative AI. (Gift link). 

Selected from this article: SAG-AFTRA, the actors’ union, says more of its members are flagging contracts for individual jobs in which studios appear to claim the right to use their voices to generate new performances. A recent Netflix contract sought to grant the company free use of a simulation of an actor’s voice “by all technologies and processes now known or hereafter developed, throughout the universe and in perpetuity.”

Uh-oh.

• In Quanta Magazine, a very interesting look at how and why large language models don’t seem to do well with concepts of negation. Because a significant amount of their training filtered out the word “not” as a “zero word,” the models seem to stumble on prompts that require an understanding of negation. 

But I mean, I get it, ChatGPT. One of the complex nuances of my native French, when I have to explain this to English speakers, is how “pas mal” in French can express a whole range of positive emotions — far beyond what “not bad” in English actually means. And it’s entirely contextual, so, you know, good luck with that one, robots.

About this newsletter

Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.

This newsletter is part of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel.
