Generative AI and human journalism differ in one big way: auditability
Smart Data Initiative Newsletter Blog | 26 January 2023
Hi everybody.
This week, I read the smart report from my colleague Greg Piechota, who went deep into the capabilities of various AI tools to examine how they could support marketers in their goals. I also watched a YouTuber reviewing Prince Harry’s book so I didn’t have to.
These two overviews do have something in common: They distill some of the most buzzy topics of our days, on which most of us feel like we should have an opinion, but many of us (or maybe just me) are a little bit daunted by the prospect of having to do all the work ourselves.
But only Greg’s report will make you want to learn more.
And as we start the new year, do drop me a line if you’re interested in presenting/speaking at our events this year. I particularly want to encourage folks who have not yet participated in this capacity. You won’t find a more engaged group of peers to be your first audience.
On this, on with today’s newsletter.
All my best, Ariane
Auditing an algorithm or providing a feedback loop (and why it matters re: generative AI)
Last week, you probably encountered some headlines related to the news that CNET was using generative AI tools to create articles — and the tension, questioning, and backlash that followed (this great article from The Verge can be your guide into this story).
I haven’t dug into how much was known about this project, for how long, and so on — I refer you to the article in The Verge — but the reality is that there are really three big areas of tension in this story:
One has to do with disclaimers.
One has to do with whether AI-generated stories have the potential to further reconfigure the economics of newsrooms — i.e., will there be fewer journalists if more content is created by AI.
One has to do with the quality of the content being created.
The first issue — disclaimers — isn’t particularly the purview of the Smart Data Initiative. Essentially, it’s pretty analogous to, for example, how newsrooms wanted to address native advertising running in close proximity to traditional (non-sponsored) editorial content, with a look and feel close to, if not indistinguishable from, that editorial content.
The second issue — would AI tools replace journalists? — is closer to our interests here.
Wherever there is automation, there is likely a reconfiguration of what roles humans take. This is true of every manner of technical and technological advance. There are no longer folks who are paid to light street lamps. AI-driven content generation tools raise the same kinds of questions as other parts of our business where robots and humans already collaborate.
For example, we might use personalisation algorithms to create curated feeds rather than have human newsroom personnel build such feeds. In the same way, AI-created content may mean human journalists do different work (where AI can’t do the job as well as we do).
Now, the third issue — the quality of the content being created — this one is new and different.
One of the core value propositions of journalism is sourcing. Journalists, of course, report and record the source of the information they uncover. Even when they don’t name these sources (either deliberately or because they deem this level of detail to not materially add to the story), they are in fact able to explain how they acquired the various elements that make up the article they wrote.
A piece of journalism is, in other words, auditable. It is the possibility of an audit that, in itself, justifies the trust you may put in it. I am not asking for the journalist’s notes, but I could.
Meanwhile, the work of current generative AI is essentially not auditable. Only if I ask ChatGPT about something where I have some measure of pre-existing expertise can I estimate the quality of ChatGPT’s output. If I asked ChatGPT about the content of Prince Harry’s book but hadn’t usefully wasted 20 minutes watching a fierce YouTube takedown, ChatGPT could tell me that the book is all about Prince Harry’s love of crochet and, well, I guess I don’t really know any better so why not.
In a very insightful and clear way, the excellent folks at Data & Society have gone over the three broad methods (first-, second- and third-party audits) that can be used to audit algorithms. But as you’ll see, these three methods each have their own blindspots.
And, in the case of large language models like GPT-3 (on which ChatGPT is built), the issue is that the AI’s “understanding” of language (that is to say, its model) is purely statistical. ChatGPT “understands” your question because it has encountered so many similar sequences of words so many times that it has a statistical model of those sequences, and it proposes an answer (the text it generates) based on the patterns it has learned.
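To make that “purely statistical” point concrete, here is a deliberately tiny sketch in Python. It is a toy bigram model, not the transformer network GPT-3 actually uses, and the corpus and output are invented for illustration; but the principle is the same: the next word is chosen because it often followed the previous one in the training text, not because anything in the process checked whether the result is true.

```python
import random
from collections import defaultdict

# Toy corpus standing in for the web-scale text a large language model trains on.
corpus = (
    "the journalist checks the source "
    "the model predicts the next word "
    "the journalist reports the fact"
).split()

# Count which words follow which other words (bigram statistics).
following = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word].append(next_word)

def generate(start_word, length=8):
    """Generate text by repeatedly sampling a statistically likely next word."""
    words = [start_word]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("the"))
# e.g. "the journalist checks the next word the model predicts"
# Fluent-looking output, but nothing here verifies it against a source.
```

Scale that idea up by hundreds of billions of words and parameters and you get fluent, confident prose — but the mechanism still gives you no trail back to a source you could audit.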
In journalism, the quality of the context is everything — but there is a famous algorithm that uses it too: Google search
For us coming from the practice of journalism, this is a very divergent approach to creating content: When a journalist is given a surprising “fact” from a politician, for example, the journalist uses the specific context in which they acquired the fact to decide how to handle this fact.
The journalist does not apply statistics to the fact itself. They draw on a broader understanding of the incentives of politicians (which are not always to tell the truth) to decide how to treat the new “fact” they received.
Meanwhile, an AI like GPT-3 would consider this fact differently if it got repeated over and over. The “fact” now has statistical significance: It shows up often, maybe in different contexts.
This leads us to a crucial question, crucial for us as citizens but also as members of the news industry: What is the quality of what GPT-3 ingests to build its model? And does it have a differentiated understanding of the quality of the content it ingests in the first place?
This differentiated understanding of the quality of source content is hugely important to good journalism, of course. In fact, it’s something an algorithm we’re all familiar with also uses: the Google search engine ranking algorithm(s).
Google Search magic is not fully known to us mere mortals, but a lot is out there. And we do know an important factor in search rankings is called page authority (it is not, in fact, domain authority, but some factors that are part of authority metrics are site-wide metrics; see this article from Search Engine Land for the full sidebar).
But this is all to say: The Google search ranking algorithm does differentiate on the source (page) itself — not just the number of instances in which a keyword shows up, not just the number of clicks or inbound links the article has. Margaret Mitchell, a researcher formerly with Google Brain, has an excellent thread over on Twitter to take you into this topic in detail.
Now, historians of the Internet (by which I mean 2011) may remember a dark period when some content farms seemed to edge ever higher in the search results every day. This was before the Google News carousel, so Search was everything. And we, in news, were accustomed to ranking high (we tend to have good domain authority). It seemed that all these weaksauce articles were crowding us out — bad for our business but also for us as people, since we’re also users of the Internet.
Google famously took on the content farms — and won. There are still content farms, but when was the last time you encountered one high up in Google?
You’re wondering: Well, I thought we were talking about algorithmic auditing.
But we are. Because we have to remember why we even want to audit the AI in the first place: We want to do this to have a way to assess the quality of what it gives us.
As it turns out, Google Search does this in a different way — perhaps a meaningful avenue to audit a generative AI.
With Search, an essential element of how Google informs changes to its search algorithm involves human reviewers — that is, humans who review results and essentially give feedback on whether a page is good or bad (and, by extension, whether the search engine did a good job returning this result).
This is not an audit. This is a feedback loop. But if Google’s reviewers reliably thumb down certain articles, analysing the statistics of these bad links allows for the creation of new algorithms that can refine and fix the final Search results. The same could happen for GPT-3.
By the way, we have a name for this role in our newsrooms: the folks who reread the material and say whether it’s good or bad.
They are called editors.
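To make that feedback loop a little more concrete, here is a minimal Python sketch (the source names and verdicts below are invented for illustration; real rating programmes are far more elaborate): human reviewers’ verdicts are collected, and their statistics become a quality signal a ranking system — or a generative model — could learn from.

```python
from collections import defaultdict

# Hypothetical reviewer feedback: (source, verdict) pairs, where the verdict
# is a human saying whether a given result or answer was good or bad.
feedback = [
    ("content-farm.example", "bad"),
    ("content-farm.example", "bad"),
    ("newsroom.example", "good"),
    ("newsroom.example", "good"),
    ("newsroom.example", "bad"),
]

# Aggregate the statistics of the human verdicts per source.
counts = defaultdict(lambda: {"good": 0, "bad": 0})
for source, verdict in feedback:
    counts[source][verdict] += 1

def quality_score(source):
    """Share of 'good' verdicts; a crude signal for demoting low-quality sources."""
    c = counts[source]
    total = c["good"] + c["bad"]
    return c["good"] / total if total else 0.5  # unseen sources get a neutral prior

for source in counts:
    print(source, round(quality_score(source), 2))
# content-farm.example 0.0
# newsroom.example 0.67
```

The point is not the arithmetic but the design: The humans are not auditing how the algorithm works internally. They are grading its output, and the system improves from the grades — which is exactly the job description of the editors above.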
About this newsletter
Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.
This newsletter is part of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel.