The Guardian tries to teach the machine news sense

By Sonali Verma

INMA

Toronto, Ontario, Canada


Chris Moran, head of editorial innovation at The Guardian, and his team wanted to give AI an ambitious task: live blog summaries.

“Most people who run live blogs have the same thing: Every four hours or so, your live blogger has to stop for a 20-minute period where they have to write a summary of what has happened over the previous four hours” because there are new people landing on the live blog who need to catch up on a developing situation, Moran said.

Chris Moran, The Guardian. Photo by Alexa Cano, provided by Creative Commons.

A successful live blog summary offered two possibilities: If the summary was good, it could be offered to the live blog editor to cast an eye over and insert into the coverage. And if the summary proved reliable, The Guardian could offer it to readers directly — a reader could land on the live blog, press a button, and generate a real-time summary of what had already happened.

The Guardian selected 3,700 live blogs spanning 25 “very different tones and topics” to train the system. To evaluate its success, the team asked:

  • Does it read in the style of a Guardian live blogger? 

  • Is it fundamentally accurate?

  • Are the points it includes important enough? 

“We were trying to teach the machine news sense” because it wasn’t trying to summarise from an inverted-pyramid-style news article, which typically contains many different signifiers of importance. Instead, it was trying to pick out what was important from a reverse-chronological list of events, “which is a radically different challenge,” Moran pointed out.
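The shape of that challenge can be sketched in a few lines. This is a hypothetical illustration, not The Guardian’s system: the function name, entry text, and prompt wording are all invented. It shows the one structural step a live blog forces on a summariser — restoring chronological order before asking a model to judge importance.

```python
# Hypothetical sketch of preparing a live blog for summarisation.
# A real system would send the resulting prompt to an LLM; here we
# only build it, since the ordering step is the point.

def build_summary_prompt(entries):
    """entries: list of (timestamp, text) tuples, newest first,
    as they appear on a live blog page."""
    # Live blogs run newest-first, so reverse to get events
    # in the order they actually unfolded.
    chronological = list(reversed(entries))
    lines = [f"{ts} - {text}" for ts, text in chronological]
    return (
        "Summarise the key developments below as bullet points, "
        "in the style of a Guardian live blogger. Include only "
        "the most important events.\n\n" + "\n".join(lines)
    )

# Invented example entries, newest first:
entries = [
    ("14:00", "Minister resigns following the vote"),
    ("12:30", "Opposition calls for a confidence vote"),
    ("09:15", "Parliament session opens"),
]
prompt = build_summary_prompt(entries)
```

Even with the order restored, nothing in the prompt tells the model *which* events matter — that judgment is exactly the “news sense” the team was trying to teach.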

The team had to find people to undertake this evaluation but did not want it to be too time-consuming.

They found that, initially, almost half the bullet points were marked as inaccurate and about a quarter were considered unimportant. With fine-tuning, this improved to one inaccurate point and one unimportant point per summary.

This was a radical improvement from the starting point. But difficulties remained.

“It is super hard to spot a single error in 400 words when you are live blogging and be confident about catching it,” Moran said. “Ultimately, we came to the conclusion that providing live bloggers with this was not as efficient as their writing it themselves.

“We were trying to teach it good news judgment. Journalists have a sense of this that is rooted in real-world judgment.” 

His advice? Educating staff about how LLMs work and about their probabilistic nature is essential. Journalists can get better outputs if they phrase their questions to LLMs in a certain way, so training on prompts matters as well.

Note: The New York Times’ example is purely a language task, whereas The Guardian’s includes an element of knowledge or judgment. That is what makes it trickier. GenAI is generally quite competent at language tasks, whereas its track record on anything requiring judgment is spottier.

If you’d like to subscribe to my bi-weekly newsletter, INMA members can do so here.

