News companies are sitting on a “gold mine” during great content shortage, AI model collapse

By Sonali Verma

INMA

Toronto, Ontario, Canada

The past few weeks have brought a flurry of news that gave new meaning to the phrase “content strategy.” Generative AI platforms are desperately trying to get their hands on as much quality content as possible to train and improve their models rather than risk losing the race to their competitors.

First, The Wall Street Journal wrote about how the supply of publicly available high-quality data on the Internet will be overtaken by demand for content by 2028.

Then, Reuters wrote about “a bustling data market that’s arising in the rush to dominate generative AI technology,” where tech companies are quietly paying for content locked behind paywalls and login screens, giving rise to a hidden trade in everything from chat logs to personal photos from faded social media apps. The market for this data could be worth US$30 billion within a decade. 

Headlines on AI companies and their need for content as training data.

Then, The New York Times wrote about how Meta even considered buying book publisher Simon & Schuster and how Google and OpenAI transcribed YouTube videos to obtain more content. 

Then, Bloomberg wrote about Adobe being willing to pay US$3 per minute of video for AI training.

The race for domination is on, particularly as generative AI platforms face rising costs. According to Dario Amodei, who built GPT-3 and then left to start Anthropic, costs could escalate exponentially. (And that is without considering the upward pressure on demand for electricity.)

What does this mean for our industry, as producers of credible, high-quality content?

We are sitting on a gold mine. We have something that deep-pocketed technology companies very desperately need and, if you look at the lawsuits being filed, very eagerly use. (You will recall that some news brands, such as Axel Springer, the Associated Press, Prisa, and Le Monde, have already signed licensing agreements with OpenAI. The Financial Times just joined them, as did Dotdash Meredith.)

If we are thinking of deals rather than lawsuits, we should negotiate hard when it comes to the rights to use our content. We should be talking to multiple GenAI platforms or marketplaces, and we should take our time to understand how they will work.

We should not sign any agreements that limit our ability to sell to more than one generative AI provider unless the price is right. Microsoft and OpenAI have both said they need authoritative, trustworthy, reliable data, and observers have pointed out that there are trillions of dollars at stake. 

There obviously is a question mark over how sustainable and stable this revenue is. OpenAI appears to have a rather vague business plan (and here’s a look at what it was offering news companies a few months ago).

What happens when the Internet runs out of content that models can be trained on? Well, GenAI platforms could try generating their own content and then training their models on that. “Feeding a model text that is itself generated by AI is considered the computer-science version of inbreeding. Such a model tends to produce nonsense, which some researchers call model collapse,” The Wall Street Journal pointed out.

This suggests there is a limit to GenAI’s capabilities and that new content has to be continuously churned out by humans. 

It is also easy to envision an arguably dystopian future because of this. Will the future owners of media companies be GenAI platforms, which can use them to produce content as needed for training? Or, at a time when news executives spend every waking hour thinking about how to make money, will GenAI providers find a new way to harness our verbal powers?

I don’t consider this line of thought to be particularly constructive at this point, but tell me if you see ideas in there that we can channel to build a thriving news industry.

Every news company I have spoken to over the past four months is using GenAI tools and is counting on them to deliver efficiencies or better user experiences. But at this moment in time, it looks like GenAI needs news companies even more than news companies need GenAI.

GenAI tools: We get a lot of questions on them and want to write more about them. Could you please help us by filling in a 60-second survey?

If you’d like to subscribe to my bi-weekly newsletter, INMA members can do so here.
