An “LLM for journalism” has interesting benefits

By Ariane Bernard


New York, Paris


I recently attended the American edition of Newsgeist, the unconference Google convenes for journalists and technologists. Several of the discussions I attended touched on AI, of course, and in one of them, Aimee Rinehart, the senior product manager for AI strategy at the Associated Press, mentioned the idea of an “LLM for journalism.”

Now, colour me intrigued, of course!

If you’re even an occasional reader of this newsletter, you know the news industry (and, in fact, not just our industry) faces some recurring challenges in using generative AI to create new content, even if much of it seems very promising.

By now you have probably heard about the issue of so-called hallucinations from large language models, the content engines that underpin tools like ChatGPT. “Hallucination” is the name given to content output that, in human terms, seems to be “made-up” knowledge from the LLM.

LLM training

The other issue is how and where the LLM was trained. What did it read for breakfast, lunch, and dinner? This discussion often veers into debates about “truth,” “bias,” or, on the copyright and ethical end of things, about whether every party who contributed to the training was fairly compensated for this work.

So after a very short night (because Werewolf, iykyk), I caught up early the next day with Aimee to chat about her intriguing idea. And in the interest of making this as clear as I can, this is an idea that Aimee floated, not a project she is working on or one that has received the green light from her employer. This is shared here in the spirit of the Newsgeist conference: a chat among folks with similar goals but often different expertise and perspectives.

“One of the biggest frustrations among the journalism community about large language models is we don’t know the training data that’s gone into it. So my thinking is, let’s build on trusted, licensed, more representative content than what we have currently,” Aimee said, pointing out that she saw this more as a foundational tool than an area of competition for news organisations.

The approach, in Aimee’s mind, would lead to building a licensable LLM to which organisations could then add their own archives as an element of reinforcement for the LLM. 

Deep learning

It’s worth noting what reinforcement means in the context of deep learning: basically, various highly focused methods to train in a desirable outcome or train out an undesirable one. In a regular, non-deep-learning system, you remove bad outcomes by looking for the lines of code that cause them. While the bad outcome may be caused by several disparate parts of your code, perhaps only with specific triggers and conditions, all the code is legible. So fixing it is only a matter of understanding the issue well enough and making the targeted, direct adjustment necessary.

Deep learning, on the other hand, is a black box by nature. So if you think of the way you get a child to behave differently on something you don’t agree with, you can’t really reach into their brain and take out what causes the problem. What you do instead is present the child with competing alternatives to their behaviour, you provide additional reasons, you repeat yourself, etc. That’s reinforcement training. 

One basic way to do reinforcement training is copying the same data a large number of times. This is really like repeating to your kid “Say, thank you” over and over until they do it on their own. If you copy the content of Wikipedia into your LLM 1,000 times, this LLM will be reinforced for Wikipedia relative to another source that’s just copied over one time. The LLM doesn’t “know” more in the sense of diversity, but because of the probabilistic nature of LLMs, it is now slightly tipped toward the content of Wikipedia. 
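
To make the repetition idea concrete, here is a minimal sketch in Python. It is a toy word-counting model, not how a real LLM is trained (real models adjust neural network weights rather than tally counts), and the two example sentences and the function name next_word_probs are made up for illustration. It only shows how copying one source many times tips the odds toward that source’s wording.

```python
from collections import Counter

def next_word_probs(corpus, context):
    """Estimate P(next word | context word) from raw word-pair counts."""
    counts = Counter()
    for doc in corpus:
        words = doc.lower().split()
        # Walk through consecutive word pairs and tally what follows `context`.
        for prev, nxt in zip(words, words[1:]):
            if prev == context:
                counts[nxt] += 1
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Hypothetical stand-ins for two sources that continue the same phrase differently.
source_a = "the mayor announced a budget"   # the source we will repeat
source_b = "the mayor denied the report"    # the source seen only once

# One copy of each source: the two continuations are equally likely.
balanced = next_word_probs([source_a, source_b], "mayor")

# 1,000 copies of source_a against one copy of source_b.
tipped = next_word_probs([source_a] * 1000 + [source_b], "mayor")

print(balanced)
print(tipped)
```

With one copy of each sentence, what follows “mayor” is a coin flip. With 1,000 copies of the first sentence, the toy model picks “announced” almost every time, even though it has not learned anything new.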

It will not have escaped you that the amount of reinforcement necessary for anything is largely commensurate with the underlying size of the model. Adding one pound of something to a scale weighted with 98,320 elephants will not have the same effect as that same extra pound added to a scale weighted with a walnut. 
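
The elephants-and-walnut point is really just arithmetic, and a rough sketch makes it visible. The token counts below are hypothetical round numbers, not figures for any real model or archive, and the share of training data is only a crude stand-in for how much influence the added material would have.

```python
def share_of_training_data(base_tokens, added_tokens):
    """Fraction of the combined training data that comes from the added material."""
    return added_tokens / (base_tokens + added_tokens)

# Hypothetical sizes, measured in tokens (roughly, word pieces).
archive = 10_000_000              # an imagined news archive
small_base = 100_000_000          # an imagined small base model's training data
large_base = 1_000_000_000_000    # an imagined large base model's training data

share_small = share_of_training_data(small_base, archive)
share_large = share_of_training_data(large_base, archive)

print(f"Small base: the archive is {share_small:.2%} of the data")
print(f"Large base: the archive is {share_large:.4%} of the data")
```

The same archive is a meaningful slice of the small base and a rounding error in the large one, which is why the amount of repetition needed grows with the size of what you are trying to tip.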


So, Aimee’s idea of using a well-licensed “LLM for journalism” as a base, and customising/reinforcing it with your own licensed archive to get your personal version of it, has a few interesting consequences.

One of them is the ability to leverage an archive.

“There are a number of archival opportunities for an AI co-pilot, especially as newsrooms have such a high turnover, there’s no more institutional knowledge,” Aimee said. She identified being able to query an LLM for an abstract of how a story may have evolved over time as one such example. But there is also a possible end-user application.

“If I’m an audience member and I’m interested in this news story, and I go to your Web site and ask, ‘What is the mayor doing about power?’ and then I get the summary.”

Aimee also sees the ability to reinforce the base LLM with the archive as a way to have an output with a stronger tone fidelity to your own writing style. (As she reads this, INMA’s editor is dreaming of a day when she can run my copy through a robot, press the “fix this mess” button, and get something that removes all my valuable hot takes, Taylor Swift references nobody asked for, and generally agrees with INMA’s house style.)

Something that’s particularly attractive in this approach is that it also potentially gives an interesting edge to industry participants that aren’t in the scale game, namely regional publications. The more niche the content, the trickier it is to imagine that an LLM will be able to reflect your specific content back at you. And local content, in many ways, is niche, drowned out by the Big Internet of general-interest news and content. So fine-tuning an LLM for the needs of specific local actors feels like a necessary step if there’s any hope of having an LLM generate content with this vantage point.

Local organisations, particularly those that have a long history and therefore deep archives, are uniquely valuable.

Fine-tuning

Any “LLM for journalism” would still have to contend with a necessary condition for any LLM to work, which is the enormous amount of content necessary to understand language. And LLMs need this for both sides of what happens when they get used: on the input side, when an end user passes a prompt to the LLM, and on the generation side. Just grabbing the contents of the AP and your entire archive is not a fraction of a fraction of what is needed for a basic LLM. So this raises the question of what the base model is built on.

“Before anyone gets into the millions of dollars of investment necessary for the foundational model, I think a logical first step is to look around the open-source models and see which one is the best and why,” Aimee said. “I think the best version would be the one that has the most transparency on what it has ingested and has made an effort to either use Creative Commons information, or licensed, though that one is not likely with open source models.”

Now, I found Aimee’s thought experiment super interesting because I do think that fine-tuning, in general, is one of the logical approaches to looking for improvements when a use case is well defined and specific.

Whether “journalism” as a concern is well defined enough that fine-tuning for our use cases is possible, or would show improvements, that’s a big question. Bloomberg, famously, trained its own LLM, BloombergGPT, but Bloomberg’s use case is specialised journalism. General news is perhaps more niche than “all the content on the Internet,” but it’s definitely closer to the middle of the bell curve of general than financial news is on that same curve. (What could be at the top of that curve, one could wonder? I think it’s cats. Cat videos are really the regression to the mean of the Internet. But this one is a question for another day.)

There is also another dimension — and this one is a drum I will beat given any opportunity to do so — which is that purely LLM-based approaches to knowledge extraction are perhaps too antithetical to how LLMs work in the first place.

Fine-tuning tips odds in a probabilistic system, but there is still not a notion of “knowledge” in the way that human brains apply certain attributes to information that effectively create different classes for the information in our brains. LLM hallucinations, from a system design perspective, aren’t really an error from the LLM. They are not desirable from our perspective, but an LLM doesn’t have a notion of “fact” to distinguish it from everything else. It makes a statistical call on what is likely the most correct language (i.e., “string of words”) to generate for a given query.
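
A small sketch of what that “statistical call” looks like, under heavy simplification. The prompt, the candidate years, and their scores are invented for illustration; a real model scores tens of thousands of possible word pieces at every step. The softmax function here is the standard way of turning raw scores into probabilities.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps the exponentials numerically stable.
    exps = {word: math.exp(score - highest) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

# Hypothetical scores for continuing "The mayor was first elected in ..."
scores = {"2019": 2.1, "2015": 1.7, "2021": 0.4}

probs = softmax(scores)

# The "call": pick whichever continuation has the highest probability.
best = max(probs, key=probs.get)

print(probs)
print(best)
```

Nothing in this calculation records which year is actually correct. If the highest-scoring continuation happens to be wrong, the output is what we would call a hallucination, but from the system’s side it did exactly what it was built to do.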

Where we humans differ, in treating Knowledge (capital K) in a deterministic way, is this: We treat facts separately, as being impervious to other manipulations. So even if I am told a piece of information only once by a party I trust (or learn it from my own observation), no amount of random other information in my model will make me change that fact. But LLMs, like GPT-4, are non-deterministic (though it is a puzzling bit for generative AI specialists), so the notion of a protected “class” for facts would actually be defeated by that non-determinism even if it could somehow be modelled.
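
One source of non-determinism that is easy to illustrate is sampling: instead of always taking the single most likely continuation, a system can draw from the probabilities, with a “temperature” setting controlling how adventurous the draw is. The sketch below is an assumption-laden toy (invented probabilities, a hypothetical sample function) and does not claim to explain the puzzling behaviour of GPT-4 specifically.

```python
import random

def sample(probs, temperature, rng):
    """Draw one continuation; a lower temperature favours the top choice more."""
    # Raising each probability to the power 1/T sharpens (T < 1) the distribution.
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return rng.choices(list(probs.keys()), weights=weights, k=1)[0]

# Hypothetical probabilities for continuing "The mayor was first elected in ..."
probs = {"2019": 0.55, "2015": 0.35, "2021": 0.10}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Ask the same "question" 200 times at an ordinary temperature.
warm_answers = {sample(probs, 1.0, rng) for _ in range(200)}

# Ask again at a very low temperature.
cold_answers = {sample(probs, 0.01, rng) for _ in range(200)}

print(warm_answers)
print(cold_answers)
```

At an ordinary temperature, the same question gets different answers on different runs, so a “fact” is only ever the most frequent answer. Turning the temperature down makes the top choice win essentially every time, but that is still the most probable string of words, not a protected fact.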

I am sharing all of this because this is a dimension of LLMs that may make them ultimately unsuitable for tasks where high factual accuracy on complex information is an important part of the requirements. The idea of an LLM for journalism is one that treats our specific context and use cases as exceptional and proposes that we explore an exceptionalist approach and see how far this gets us.

In the spirit in which Aimee shared this, it is the definition of how technical systems are often improved: identifying unique use cases and trying to build a system that is specifically interested in solving for them. You can think of it as an effort to subtract in parts and add in others. But unlike other types of simpler systems where you can make some informed projection for how you think the system will likely improve the outcome, the very murky nature of generative AI introduces a lot of uncertainty in these calculations in the first place.

Which means if this were an idea that moved forward (and folks should reach out to Aimee!), accepting a certain amount of uncertainty, the chances of failure, and the fact that, well, it’s just hard, are basically part of the requirements. But then again, there are the words of JFK about choosing to go to the moon: “We choose to go to the moon in this decade and do the other things — not because they are easy, but because they are hard.”

