How newsrooms can leverage generative AI — and common pitfalls to avoid
Newsroom Transformation Initiative Blog | 28 June 2023
Generative AI is one of the hottest topics across industries right now as companies look at how to leverage its capabilities and turn them into a competitive advantage. During Wednesday’s webinar, INMA members heard from Nicholas Diakopoulos, director of the Computational Journalism Lab at Northwestern University in Evanston, Illinois, who offered insights into the challenges, opportunities, and limitations of using AI.
During the webinar, “AI in the newsroom: A practical guide from Northwestern University,” Diakopoulos explained some of the terminology associated with the technology and illustrated different ways newsrooms can interact with AI models. He also emphasised the importance of understanding what they can and cannot do.
To get everyone on the same page, Diakopoulos clarified that “generative AI” refers to technology that can create new content. That covers a wide range of possibilities, from text, images, audio, and video to 3D models. He told INMA members that the output is based on what the technology has been trained on: “These models are just statistical models,” he said.
“We talk about AI, but these are fundamentally just statistical text generators. They learned from lots and lots of data, including books and Wikipedia and online news, and they’re able to generate one next word at a time based on all of the previous words that they’ve seen.”
Understanding generative AI for newsrooms
Diakopoulos gave an overview of the different types of generative AI model providers already available, noting that several companies are starting to produce their own generative AI systems. Open-source options exist, but he said they “aren’t quite at the level of quality as the large commercial offerings in the space right now.” Regardless of which model is used, the process begins with writing a prompt: the text you input into the model to explain what you want it to do. He demonstrated how simple it is to write a prompt but also showed that the AI could easily produce text containing incorrect information.
“The big issue here that I want to illustrate is this idea of information fabrication — also sometimes referred to as hallucination,” Diakopoulos said. “It’s important to keep in mind that this model has only been trained on information up to September 2021, so it doesn’t actually know anything since that date. So it’s just … pulling words out of a hat in a sort of statistical generation process.”
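To make the mechanics concrete, here is a minimal sketch of sending a prompt programmatically, using the OpenAI Python client as it existed at the time of the webinar (the model choice and prompt text are illustrative; other providers follow the same prompt-in, text-out pattern):

```python
import openai  # pip install openai; assumes OPENAI_API_KEY is set in the environment

# The prompt is simply the text you hand the model to explain what you want.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": "Write a short news brief about local election results."}],
)

# Caution: a knowledge request like this one is exactly where the model may
# fabricate details, since it is only predicting plausible next words.
print(response.choices[0].message.content)
```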
That is one reason getting quality results depends on writing good prompts. A better use of the current technology could be generating headline options for a staff-written news article. In that case, the AI is far less likely to hallucinate because it is responding to text it has been given.
“You’re asking it to manipulate the language in that article, summarise it, and come up with other language that looks like a headline for that article,” he said. “So this would be an example of a safer use case for generative AI, where you’re not asking for knowledge, but you’re using it as a language machine.”
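A hypothetical sketch of that safer pattern: the article text goes into the prompt itself, so the model is manipulating language it has been handed rather than reciting “knowledge” (again, the model and wording are illustrative):

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

article_text = "..."  # the staff-written article goes here

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": "Suggest five headline options for the following article:\n\n" + article_text,
    }],
)

# The suggestions are options for a human editor, not copy to publish as-is.
print(response.choices[0].message.content)
```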
The art of the prompt
Writing prompts is a skill in itself: the wording of a prompt directly affects the quality of the generated content.
“It’s the main way of controlling the AI. You can provide your intent, so a keyword like ‘summarise’ or ‘extract’ or ‘write’ or ‘ideate.’ Then you can also provide context for your text,” Diakopoulos said. “In my experience, coming up with good prompts is time-consuming. It can be hard to come up with prompts that are good in general, and it can be variably time-consuming depending on the task complexity.”
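In practice, that intent-plus-context structure might look like the following hypothetical template (the keyword and instruction are illustrative):

```python
# A hypothetical prompt template following the intent-plus-context pattern:
# an action keyword up front, then the material the model should work from.
intent = "Summarise"   # could also be "Extract", "Write", or "Ideate"
context = "..."        # the article or document text

prompt = f"{intent} the following article in three sentences:\n\n{context}"
```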
Newsrooms shouldn’t limit their thinking to tasks such as generating text or images, he noted. Generative AI can wear many hats and perform a range of tasks.
“Some capabilities are more analytic, things like classifying data, extracting data, scoring documents, that kind of thing. And other capabilities are more synthetic, more generative like rewriting, summarising, and translating.”
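An analytic task might look like this hypothetical extraction prompt, which asks for structured data rather than prose (the fields are illustrative):

```python
# A hypothetical "analytic" use: extraction rather than generation.
document = "..."  # e.g., a press release or court filing

prompt = (
    "Extract every person named in the following document, along with "
    "their role, as a JSON list of {\"name\": ..., \"role\": ...} objects.\n\n"
    + document
)
# The prompt is then sent to the model exactly as in the earlier sketches.
```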
He encouraged newsrooms to think about various tasks that could be handled by AI and said “back office” tasks like content discovery, document analysis, tips, and SEO generation could be great starting points because the output won’t be consumed directly by audiences: “The back office is the safer spot to start out in thinking about how to use generative AI,” he said.
For more forward-facing uses, in which audiences see the written product, Diakopoulos emphasised the importance of having a human touch: “You really need to have a person in the publishing loop to check the content before publication. You just can’t risk that there might be a hallucination, that there might be an inaccuracy included in forward-facing material.”
In addition to checking for accuracy, humans should check for plagiarism, as well as risks such as copyright violations, defamation and libel, and the unintentional use of private or sensitive information.
Responsible use of AI
A growing number of newsrooms are creating guidelines for how to use generative AI safely and responsibly. That includes being transparent with audiences about how and when it is being used and ensuring journalists and editors are trained on the tools and know the capabilities and limitations.
“Many newsrooms are signalling that they’re responsible for the content that they publish. They’re accountable for that content and having a person that’s responsible for the output of generative AI,” he said. “And I think that’s actually a very important and good thing to include in guidelines that, at the end of the day, it’s people that need to take responsibility for the content.”
Among the parting lessons he shared with INMA members:
- Expect to iterate many times, and develop criteria to know if you’re moving in the right direction with your prompts: “You’re not going to get it right the first time,” he said. “You need to think about the criteria for whether or not the result that you’re getting is good or not.”
- Recognise that you’ll need training data or data sets to test prompts, not only during prompt development but also once you have a prompt that works (see the sketch after this list). “You probably want to test that prompt on a whole new set of data that you haven’t tried before just to see if your prompt is still able to perform well.”
- Consider breaking down prompts into chunks the model can handle.
- Think about model parameters, such as temperature, for different use cases.
- Consider creating larger workflows for editing and checking. This means going beyond checking stories for plagiarism to also reviewing AI-generated output such as summaries.
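Pulling several of those lessons together, here is a hypothetical sketch that tests a classification prompt against a small held-out set with known labels, so there is an explicit criterion for whether the prompt performs well (the model, data, and labels are all illustrative):

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical held-out test set: items the prompt was NOT developed on,
# paired with the label a human editor assigned to each.
test_set = [
    ("Please fix the spelling of my name in yesterday's story.", "correction request"),
    ("I have documents showing the council misspent funds.", "news tip"),
]

PROMPT = ("Classify the following reader email as one of: news tip, complaint, "
          "correction request, other. Reply with the label only.\n\n{email}")

correct = 0
for email, expected in test_set:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT.format(email=email)}],
        temperature=0,          # a near-deterministic setting suits evaluation
    )
    answer = response.choices[0].message.content.strip().lower()
    correct += int(answer == expected)

print(f"Prompt got {correct} of {len(test_set)} test items right.")
```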