3 GenAI use cases and their learnings for news companies
Generative AI Initiative Newsletter Blog | 24 April 2025
What are some of the most widely respected names in journalism doing with GenAI these days?
Today’s newsletter brings you three examples. None of the use cases is particularly flashy. All of them have been undertaken carefully and thoughtfully, as you will see, and one of them is actually an experiment that didn’t work out.
They also address a common question many INMA members ask: How does one go about evaluating and integrating GenAI products into newsrooms? Read on for advice that is working for others.
Sonali
Advice from The New York Times: Listen more than you talk
Rubina Madan Fillion, associate editorial director of AI initiatives at The New York Times, has clear advice: Ensure humans are part of the process from the start, and make sure those humans are journalists.
She directly asked about 100 reporters and editors how they would use AI through focus groups and demonstrations, and also engaged them through a “very active” Slack channel — all part of a pilot that ran for two months.
“We listen more than we talk,” she said. “Two-thirds of the requests and frustrations that we were hearing about had to do with summarisation.”
By summarisation, she does not necessarily mean bullet-point summaries but the many tasks an editor must do before a story can be published or promoted: formulating an SEO headline or deck, writing blurbs for newsletters, or scripting the last few minutes of a podcast where they say, “These are the news headlines you’re missing today, and here is a very brief summary of other articles.”
The NYT built a tool called Echo, which takes their articles and summarises them in any way a journalist wants. Its interface includes a list of articles they are looking at. Journalists can use preset prompts or write their own. Echo then takes the information from articles and writes it in another form.
Evaluation was tricky, Fillion said. How does one create quantitative metrics for writing, an activity that is inherently qualitative?
She also found “the outputs are often quite mediocre because they were not trained on high-quality journalism.” How could they get the output closer to what a Times editor would actually write? That, too, proved difficult to answer, because even two editors sitting next to each other could have different opinions about what made a summary good. It was hard to articulate.
The team undertook an iterative process to understand quality. What were the editors’ requirements? They found many wanted to avoid jargon, acronyms and long sentences, which helped the machine understand what made a good summary.
But even asking for feedback was a delicate process. Fillion wanted to make sure it felt like editors were saving time, so she could not make any evaluation process too onerous.
The Guardian tries to teach the machine news sense
Chris Moran, head of editorial innovation at The Guardian, and his team wanted to give AI an ambitious task: live blog summaries.
“Most people who run live blogs have the same thing: Every four hours or so, your live blogger has to stop for a 20-minute period where they have to write a summary of what has happened over the previous four hours” because there are new people landing on the live blog who need to catch up on a developing situation, Moran said.
Successful live blog summaries offered two possibilities: If the summary was good, it could be offered to the live blog editor to cast an eye over and insert into the coverage. And if the summary proved reliable, The Guardian could offer it to readers directly: a reader could land on the live blog, press a button, and generate a real-time summary of what had already happened.
The Guardian selected 3,700 live blogs spanning 25 “very different tones and topics” to train the system. To evaluate its success, they asked:

- Does it read in the style of a Guardian live blogger?
- Is it fundamentally accurate?
- Are the points that are included important enough to be included?
“We were trying to teach the machine news sense” because it wasn’t summarising an inverted-pyramid-style news article, which typically contains many different signifiers of importance. Instead, it was trying to pick out what was important from a reverse-chronological list of events, “which is a radically different challenge,” Moran pointed out.
The team had to find people to undertake this evaluation but did not want it to be too time-consuming.
They found that, initially, almost half the bullet points were marked as inaccurate and about a quarter were considered unimportant. With fine-tuning, this improved to one inaccurate point and one unimportant point.
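The evaluation loop described above boils down to humans labelling each generated bullet point and the team tracking the error rates. A minimal sketch of that tally, assuming a simple two-flag label scheme (this is not The Guardian’s actual tooling):

```python
# Illustrative tally of human evaluations of generated bullet points.
# The label scheme is an assumption: each evaluator marks a bullet as
# accurate/inaccurate and important/unimportant.

def evaluation_rates(labels: list[dict]) -> dict:
    """Aggregate per-bullet human labels into the two rates the team
    tracked: the share of inaccurate points and of unimportant points."""
    total = len(labels)
    inaccurate = sum(1 for label in labels if not label["accurate"])
    unimportant = sum(1 for label in labels if not label["important"])
    return {
        "inaccurate_rate": inaccurate / total,
        "unimportant_rate": unimportant / total,
    }
```

Keeping the rubric to two binary flags per bullet is one way to honour the team’s constraint that evaluation not become too time-consuming for journalists.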
This was a radical improvement from the starting point. But difficulties remained.
“It is super hard to spot a single error in 400 words when you are live blogging and be confident about catching it,” Moran said. “Ultimately, we came to the conclusion that providing live bloggers with this was not as efficient as their writing it themselves.
“We were trying to teach it good news judgment. Journalists have a sense of this that is rooted in real-world judgment.”
His advice? Educating staff about how LLMs work, and about their probabilistic nature, is extremely important. Journalists can get better outputs if they phrase their questions to LLMs in a certain way, so training on prompts matters.
Note that The New York Times’ example is purely a language task, whereas The Guardian’s includes an element of knowledge or judgment. That is what makes it trickier. GenAI is generally quite competent at language tasks, whereas its track record on anything requiring judgment is spottier.
Wall Street Journal taxbot answers questions (while readers try to break it)
But the technology is already working better than it did several months ago. For example, The Wall Street Journal built a chat product, Lars, the AI taxbot, to answer readers’ questions on filing U.S. taxes.
The Journal’s reporters write guides to how to file taxes “and every time they write one of these articles, they get hundreds and hundreds of questions with very specific and very personal details. We obviously can’t answer every one of them. But AI can. This is a great application to get really personal, custom answers,” said Tess Jeffers, director of newsroom data and AI.
“So the motivation is to better leverage our archives and super serve our audience in this lane of coverage that we absolutely want to own.”
Lars is a RAG (retrieval-augmented generation) model, grounded in WSJ content as well as publicly available content from the tax authorities. It went through two phases of evaluation: first internal, then with the audience to assess whether readers found it useful.
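The core of a RAG setup like this is retrieving relevant passages first, then instructing the model to answer only from them. Here is a minimal sketch of those two steps, using naive word overlap in place of a real embedding index; the function names, corpus shape, and prompt wording are assumptions, not the Journal’s implementation:

```python
# Minimal sketch of the retrieve-then-ground pattern behind a RAG chatbot.
# A production system would use vector embeddings and a proper index;
# word overlap stands in here to keep the example self-contained.

def retrieve(question: str, corpus: list[dict], k: int = 2) -> list[dict]:
    """Rank documents by how many question words they share, best first."""
    q_words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(question: str, corpus: list[dict]) -> str:
    """Instruct the model to answer only from the retrieved passages,
    which is what keeps a bot like Lars on topic."""
    context = "\n".join(doc["text"] for doc in retrieve(question, corpus))
    return (
        "Answer using only the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\n"
        f"Question: {question}"
    )
```

Grounding is also why such a bot is harder to bait into off-topic or inflammatory replies: if a prompt about movies retrieves nothing useful from tax content, the model is instructed to decline rather than improvise.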
“The first thing all your readers are going to do is try to break your chatbot” to embarrass you, Jeffers said.
How easy was it to break the bot?
“This was our second chatbot in six months, and the RAG models are getting so good that we were really impressed,” she said, pointing out that it was much easier to keep Lars on topic and pull in pertinent sources. (The Journal’s previous chat product, Joannabot, was aimed at answering questions about the new iPhone last fall, but it could be tricked into making inflammatory statements or talking about movies or writing code.)
The team will soon need to deploy a new model to prevent drift because the old one is reaching the end of its life.
What does Jeffers find genuinely useful? “I’m happy we have a workflow editor,” she said, pointing out that this role turned out to be essential to operationalising GenAI.
Date for the calendar: Friday, May 23
The GenAI seminar at the INMA World Congress of News Media in New York features insightful speakers on topics that matter to us. I hope to see you there.
Worthwhile links
- GenAI and video: Thomson Reuters includes advanced transcription and translation services for clients. Scene detection and synthetic voice are next.
- GenAI and analysis: Bloomberg launches AI-powered analysis and insights for corporate statements.
- GenAI and newsletters: Fully automated local news aggregation in newsletters.
- GenAI and adoption: Employees must prove why they “cannot get what they want done using AI” before asking for more headcount and resources, Shopify says.
- GenAI and fakes: Some experts quoted in reputable news brands are not real.
- GenAI and scraping: Wikipedia says AI bots have stepped up content scraping.
- GenAI and search: Traffic to Web sites tanks after the introduction of AI overviews.
- GenAI and product design: A process that used to take six months now takes less than six weeks.
- GenAI and agents: Your company risks becoming invisible to these new decision-makers, but if you do it right, you’ll create brand new, potentially multibillion-dollar markets.
About this newsletter
Today’s newsletter is written by Sonali Verma, based in Toronto, and lead for the INMA Generative AI Initiative. Sonali will share research, case studies, and thought leadership on the topic of generative AI and how it relates to all areas of news media.
This newsletter is a public face of the Generative AI Initiative by INMA, outlined here. E-mail Sonali at sonali.verma@inma.org or connect with her on INMA’s Slack channel with thoughts, suggestions, and questions.