Journalism and generative AI differ in one big way: auditability
Smart Data Initiative Blog | 29 January 2023
You probably recently encountered some headlines related to the news that CNET was using generative AI tools to create articles — and the tension, questioning, and backlash that followed (this great article from The Verge can be your guide into this story).
I haven’t dug into how much was known about this project, or for how long — I refer you to the article in The Verge — but the reality is that there are three big areas of tension in this story:
One has to do with disclaimers.
One has to do with whether AI-generated stories have the potential to further reconfigure the economics of newsrooms — i.e., will there be fewer journalists if more content is created by AI.
One has to do with the quality of the content being created.
1. Disclaimers

The first issue — disclaimers — isn’t particularly the purview of the Smart Data Initiative. Essentially, it’s analogous to how newsrooms wanted to address native advertising running in close proximity to traditional (non-sponsored) editorial content, with a look and feel close to, if not indistinguishable from, that editorial content.
2. AI tools vs human journalists
The second issue — would AI tools replace journalists? — is closer to our interests here.
Wherever there is automation, there is likely a reconfiguration of the roles humans take. This is true of technological advances of every kind. There are no longer folks who are paid to light street lamps. AI-driven content generation tools raise the same kinds of questions as other parts of our business where robots and humans already collaborate.
For example, we might use personalisation algorithms to create curated feeds rather than human newsroom personnel to build such feeds. In the same way, AI-created content may mean human journalists do different work (where AI can’t do the job as well as we do).
3. Content quality
Now, the third issue — the quality of the content being created — this one is new and different.
One of the core value propositions of journalism is sourcing. Journalists, of course, report and record the source of the information they uncover. Even when they don’t name these sources (either deliberately or because they deem this level of detail to not materially add to the story), they are in fact able to explain how they acquired the various elements that make up the article they wrote.
A piece of journalism is, in other words, auditable. It is the possibility of an audit that, in itself, justifies the trust you may put in it. I am not asking for the journalist’s notes, but I could.
Meanwhile, the work of current generative AI is essentially not auditable. Only if I ask ChatGPT about something where I have some measure of pre-existing expertise can I estimate the quality of its output. If I asked ChatGPT about the content of Prince Harry’s book but hadn’t usefully wasted 20 minutes watching a fierce YouTube takedown, ChatGPT could tell me that the book is all about Prince Harry’s love of crochet and, well, I guess I wouldn’t really know any better, so why not.
In a very insightful and clear way, the excellent folks at Data & Society have gone over the three broad methods (first-, second- and third-party audits) that can be used to audit algorithms. But as you’ll see, each of these three methods has its own blind spots.
And, in the case of large language models like GPT-3 (on which ChatGPT is built), the issue is that the AI’s “understanding” of language (that is to say, its model) is purely statistical. ChatGPT “understands” your question because it has encountered similar sequences of words so many times that it can propose an answer (the text it generates) based on the statistical patterns it has learned.
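To make the “purely statistical” point concrete, here is a minimal sketch — a toy bigram model, vastly simpler than the neural networks behind GPT-3, and not how ChatGPT is actually implemented. The corpus, the prompt word, and all variable names are illustrative assumptions. What it does share with a large language model is the core idea: the “model” is nothing but counts of which words followed which, and the text it generates carries no source, citation, or audit trail.

```python
import random
from collections import defaultdict

# A tiny, made-up corpus; real models train on vastly more text.
corpus = (
    "the book is about the prince . "
    "the book is about the royal family . "
    "the prince wrote the book ."
).split()

# Count how often each word follows each preceding word.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prev, rng=random.Random(0)):
    """Sample the next word in proportion to how often it followed `prev`."""
    options = counts[prev]
    words = list(options)
    weights = [options[w] for w in words]
    return rng.choices(words, weights=weights)[0]

# Generate a short continuation from a prompt word: each step is just
# a weighted draw from observed frequencies, nothing more.
word, output = "the", ["the"]
for _ in range(5):
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```

The generated sentence may be fluent, but nothing in `counts` records *why* any word follows another — which is exactly the auditability gap the article describes, magnified billions of times over in a real model.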
If you’d like to subscribe to my bi-weekly newsletter, INMA members can do so here.