Here’s what generative AI and Google Search do — and do not — have in common
Smart Data Initiative Blog | 30 January 2023
For those of us who come from the practice of journalism, this is a very different approach to creating content: When a journalist is given a surprising “fact” by a politician, for example, the journalist uses the specific context in which they acquired the fact to decide how to handle it.
The journalist does not run statistics on the fact itself. Instead, the journalist draws on a broader understanding of the incentives of politicians (which are not always to tell the truth) to decide how to treat the new “fact” they received.
Meanwhile, an AI like GPT3 would consider this fact differently if it got repeated over and over. The “fact” now has statistical significance. It shows up often, maybe in different contexts.
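To make the contrast concrete, here is a toy sketch in Python. It is not how GPT3 or any newsroom actually works: the claims, the source types, and the trust scores are all invented for illustration, and simple counting is only a crude stand-in for statistical learning. The first function picks the claim that is repeated most often; the second picks the claim backed by the most trusted source, so repetition alone does not help.

```python
from collections import Counter

# Hypothetical corpus: (claim, source_type) pairs. All values are invented.
corpus = [
    ("unemployment fell 50%", "politician_speech"),
    ("unemployment fell 50%", "press_release"),
    ("unemployment fell 50%", "blog_repost"),
    ("unemployment fell 2%", "statistics_office"),
]

# Assumed trust scores per source type, standing in for a journalist's
# judgement about incentives. These numbers are illustrative only.
SOURCE_TRUST = {
    "politician_speech": 0.3,
    "press_release": 0.3,
    "blog_repost": 0.1,
    "statistics_office": 0.9,
}


def by_frequency(corpus):
    """Return the claim that appears most often, ignoring who said it."""
    counts = Counter(claim for claim, _ in corpus)
    return counts.most_common(1)[0][0]


def by_source_trust(corpus):
    """Return the claim whose best source is the most trusted.

    Using the maximum (not the sum) means repeating a claim through
    weak sources does not raise its score.
    """
    best = {}
    for claim, source in corpus:
        trust = SOURCE_TRUST.get(source, 0.0)
        best[claim] = max(best.get(claim, 0.0), trust)
    return max(best, key=best.get)


print(by_frequency(corpus))
print(by_source_trust(corpus))
```

With this invented data, the frequency view favours the repeated claim, while the source-aware view favours the one from the statistics office.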
Auditing AI
This leads us to a crucial question, crucial for us as citizens but also as members of the news industry: What is the quality of what GPT3 ingests to build its model? And does it have a differentiated understanding of the quality of the content it ingests in the first place?
This differentiated understanding of the quality of source content is hugely important to good journalism, of course. In fact, it is also something used by an algorithm we are all familiar with: the Google search engine ranking algorithm(s).
The magic behind Google Search is not fully known to us mere mortals, but a lot is out there. We do know that an important factor in search rankings is page authority (not, in fact, domain authority, although some of the factors that feed authority metrics are site-wide; see this article from Search Engine Land for the full sidebar).
All of this is to say: The Google search ranking algorithm does try to differentiate on the source (page) itself, not just on the number of times a keyword shows up, or on the number of clicks or inbound links the article has. Margaret Mitchell, a researcher formerly with Google Brain, has an excellent thread on Twitter that takes you into this topic in detail.
Now, historians of the Internet (by which I mean 2011) may remember a dark period when some content farms seemed to edge higher every day toward the top of the search results pages. This was before the Google News carousel, so Search was everything. And we in news were accustomed to ranking high (we tend to have good domain authority). It seemed that all these weaksauce articles were crowding us out, which was bad for our business but also for us as people, since we are also users of the Internet.
Google famously took on the content farms and won. There are still content farms, but when was the last time you encountered one high up in Google?
You’re wondering: Well, I thought we were talking about algorithmic auditing.
But we are. Because we have to remember why we even want to audit the AI in the first place: We want to do this to have a way to assess the quality of what it gives us.
What AI can learn from Google Search
As it turns out, Google Search does this in a different way, one that may offer a meaningful avenue for auditing a generative AI.
An essential part of how Google informs changes to its search algorithm is human reviewers: people who review results and essentially give feedback on whether a page is good or bad (and, by extension, whether the search engine did a good job in returning this result).
This is not an audit. It is a feedback loop. But if Google’s link reviewers reliably thumb down certain articles, analysing the statistics of these bad links allows for the creation of new algorithms that can refine and fix the final Search results. The same could happen for GPT3.
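As a rough illustration of what “analysing the statistics of these bad links” could look like, here is a minimal Python sketch. It is an assumption-laden toy, not Google’s process: the URLs, the thumbs-up/thumbs-down format, the 80% threshold, and the minimum of three reviews are all hypothetical. It groups reviewer verdicts by site and flags sites that reviewers reject reliably.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical reviewer feedback: (url, is_good). All entries are invented.
reviews = [
    ("https://contentfarm.example/10-tips", False),
    ("https://contentfarm.example/best-widgets", False),
    ("https://contentfarm.example/how-to", False),
    ("https://localnews.example/council-vote", True),
    ("https://localnews.example/budget", True),
    ("https://localnews.example/weather", False),
]


def thumbs_down_stats(reviews):
    """Count total reviews and thumbs-down verdicts per site."""
    stats = defaultdict(lambda: {"total": 0, "down": 0})
    for url, is_good in reviews:
        site = urlparse(url).netloc
        stats[site]["total"] += 1
        if not is_good:
            stats[site]["down"] += 1
    return dict(stats)


def flag_sites(reviews, threshold=0.8, min_reviews=3):
    """Return sites whose thumbs-down rate meets the (assumed) threshold.

    min_reviews avoids flagging a site on the basis of one or two verdicts.
    """
    flagged = []
    for site, s in thumbs_down_stats(reviews).items():
        if s["total"] >= min_reviews and s["down"] / s["total"] >= threshold:
            flagged.append(site)
    return sorted(flagged)


print(flag_sites(reviews))
```

The same shape of loop could, in principle, be applied to human ratings of a generative model’s answers, with the flagged patterns feeding back into how the model is refined.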
By the way, we have a name for this role in our newsrooms: the folks who reread the material and say whether it’s good or bad.
They are called editors.