The great content shortage and AI model collapse
Generative AI Initiative Newsletter Blog | 09 May 2024
The past few weeks have brought a flurry of news that gave new meaning to the phrase “content strategy.” Generative AI platforms are desperately trying to get their hands on as much quality content as possible to train and improve their models rather than risk losing the race to their competitors.
First, The Wall Street Journal wrote about how the supply of publicly available high-quality data on the Internet will be overtaken by demand for content by 2028.
Then, Reuters wrote about “a bustling data market that’s arising in the rush to dominate generative AI technology,” where tech companies are quietly paying for content locked behind paywalls and login screens, giving rise to a hidden trade in everything from chat logs to personal photos from faded social media apps. The market for this data could be worth US$30 billion within a decade.
Then, The New York Times wrote about how Meta even considered buying book publisher Simon & Schuster and how Google and OpenAI transcribed YouTube videos to obtain more content.
Then, Bloomberg wrote about Adobe being willing to pay US$3 per minute of video for AI training.
The race for domination is on, particularly as generative AI platforms face rising costs. According to Dario Amodei, who helped build GPT-3 at OpenAI and then left to start Anthropic, costs could escalate exponentially. (And that is without considering the upward pressure on demand for electricity.)
What does this mean for our industry, as news companies that produce credible, high-quality content?
We are sitting on a gold mine. We have something that deep-pocketed technology companies desperately need and, if you look at the lawsuits being filed, eagerly use. (You will recall that some news brands, such as Axel Springer, the Associated Press, Prisa, and Le Monde, have already signed licensing agreements with OpenAI. The Financial Times just joined them, as did Dotdash Meredith.)
If we are thinking of deals rather than lawsuits, we should negotiate hard when it comes to the rights to use our content. We should be talking to multiple GenAI platforms or marketplaces, and we should take our time to understand how they will work.
We should not sign any agreements that limit our ability to sell to more than one generative AI provider unless the price is right. Microsoft and OpenAI have both said they need authoritative, trustworthy, reliable data, and observers have pointed out that there are trillions of dollars at stake.
There obviously is a question mark over how sustainable and stable this revenue is. OpenAI appears to have a rather vague business plan (and here’s a look at what it was offering news companies a few months ago).
What happens when the Internet runs out of content that models can be trained on? Well, GenAI platforms could try generating their own content and then training their models on that. “Feeding a model text that is itself generated by AI is considered the computer-science version of inbreeding. Such a model tends to produce nonsense, which some researchers call model collapse,” The Wall Street Journal pointed out.
This suggests there is a limit to GenAI’s capabilities and that new content has to be continuously churned out by humans.
It is also easy to envision an arguably dystopian future because of this. Will the future owners of media companies be GenAI platforms, which can use them to produce content as needed for training? Or, at a time when news executives spend every waking hour thinking about how to make money, will GenAI providers find a new way to harness our verbal powers?
I don’t consider this line of thought to be particularly constructive at this point, but tell me if you see ideas in there that we can channel to build a thriving news industry.
Every news company I have spoken to over the past four months is using GenAI tools and is counting on them to deliver efficiencies or better user experiences. But at this moment in time, it looks like GenAI needs news companies even more than news companies need GenAI.
GenAI tools: We get a lot of questions on them and want to write more about them. Could you please help us by filling in a 60-second survey?
GenAI that saves €500,000 annually
How often do you develop a tool within 10 weeks that saves half a million euros annually?
That is the story of Wortwandler, an editing tool developed by OVB Media in Germany and a finalist in the INMA Global Media Awards.
OVB noticed that about 60% of the locally produced content in its local newspapers covers events, such as hyperlocal news about sports teams. This is what its readers value. The content is mostly generated by about 300 local freelancers, and it was a huge task to edit all of it for grammar and style and to ensure it was the correct length.
“It was the biggest pain point when we looked at our processes. It was also the biggest opportunity,” said Managing Director Florian Schiller, who is responsible for all digital business areas at OVB.
To solve the problem, OVB outsourced this editing work to a team of 15 editors so its own staff could focus on quality journalism.
“We thought we had solved the issue, but it was only half the truth because it created new processes back and forth between the editing team and the print desk. It could be that an article went back and forth two or three times,” Schiller said.
OVB decided to try generative AI technology and built a tool called Wortwandler. It defined 12 use cases for the 12 types of articles that freelancers filed, covering 99% of the content it wanted to edit. It then plugged these into Wortwandler and let it edit the text.
After about two weeks of fine-tuning under the hood, OVB was satisfied with the quality of the editing. The results were consistently good, Schiller said. Its in-house editors were also pleased with the product, which saved them the back and forth with the external team and reliably produced edited copy within 10 minutes.
Wortwandler started handling copy for all local editions within 10 weeks. OVB dissolved the external editing team, resulting in savings of €500,000 annually.
Schiller also loves another GenAI feature at OVB, where the technology was set up to handle a wide range of use cases. He calls it The Diplomat.
“On our reach news portals, where there is lots of traffic and engagement and comments, people attack each other all the time and you have to calm them down. From time to time, people start attacking you, the newsroom.
“So the audience team said, this is really exhausting. Can you do something?”
All editors need to do is copy the insult from the commenter into The Diplomat, and “it will write the most polite, nice, charming answer. It is really astonishing how it calms down the audience. The response is also direct and takes up the point that they are making,” Schiller said.
The Diplomat is now used for dealing with critical e-mails to the editor of the print newsroom, as well as certain special cases of customer service, he said: “It is now the most popular, beloved feature within the company.”
You can learn more about this and other clever uses of GenAI at our master class, where Schiller will be speaking.
Dates for the calendar
Thursday, May 16, when we kick off the dynamite GenAI Master Class. Come learn about innovative use cases of GenAI being used all over the world and hear from some of the biggest names in the business.
Worthwhile links
- GenAI for marketing: It is as good as humans at getting you to change your mind.
- GenAI for advertising: Axel Springer expands ad tech, content partnership with Microsoft.
- Accuracy in GenAI: Fine-tuning LLMs to reduce error rates.
- GenAI: What is it good for? Some use cases are better than others.
- GenAI for personalisation: eBay will curate outfits for you.
- GenAI for fact-checking visuals and misinformation, and its limitations.
- GenAI and trust: The U.S. military halts GenAI adoption over trust issues.
- GenAI and licensing: The Financial Times signs a deal with OpenAI.
- GenAI and lawsuits: Eight Alden newspapers sue OpenAI over copyright infringement.
- GenAI for Elon Musk: He plans to use AI to combine breaking news and social commentary around big stories, present the compilation live, and allow the reader to go deeper via chat.
- GenAI inspiration: 101 use cases, compiled by Google.
- 10 key takeaways from Stanford’s 502-page AI Index Report 2024.
- Don’t do this, folks: Meta’s chatbot in a parents’ group said it had a gifted, disabled child.
- Don’t do this II: This bot pretended to be a Catholic priest and said it was OK to baptise a baby in Gatorade.
Diversion
I read this article and learned a lot about how psychics con us. The writer’s point that the illusion is similar when interacting with chatbots was interesting.
About this newsletter
Today’s newsletter is written by Sonali Verma, who is based in Toronto and leads the INMA Generative AI Initiative. Sonali will share research, case studies, and thought leadership on the topic of generative AI and how it relates to all areas of news media.
This newsletter is a public face of the Generative AI Initiative by INMA, outlined here. E-mail Sonali at sonali.verma@inma.org or connect with her on INMA’s Slack channel with thoughts, suggestions, and questions.