AI-driven search could upend the search-media relationship

By Peter Bale

INMA

New Zealand and the U.K.


Welcome to the latest INMA Newsroom Initiative newsletter.

The implications of Artificial Intelligence — especially the new breed of “generative” AI tools with their spooky ability to give us answers that approximate journalism — reach across the newsroom and all departments in publishing.

This Newsroom Initiative newsletter focuses on generative AI. Future editions will try to help you stay current with these critical trends as they affect our business. There are fantastic opportunities to create better, more accurate information, more efficiently — yet they carry some risks.

This is just the start, but some of it is going to feel familiar and may expose gaps in the ways journalists, newsrooms, and publishers deal with rapid technological change, as well as how quickly and imaginatively we may have to respond to this fundamental upheaval in search.

The Newsroom Initiative will build on the December report How Newsrooms Succeed in Google Search, which is all about getting the most out of the existing paradigm of search and may give us clues as to how to embrace or manage the revolution happening right now.


Generative AI search challenges all news publishers

If publishers thought it was hard to extract value from, or even understand, Google Search, they now face an immense struggle with the new breed of search chatbots built on generative Artificial Intelligence, led by those based on the ChatGPT model.

Attribution of content and sources — already seen as a big problem by publishing content creators — is set to become a battleground between publishers, companies promoting AI-driven search, and regulators and politicians who generally lag in responding to technology upheaval.

“Search seems like a sort of Utopia when compared to what’s happening with AI and how we’re going to get attribution in AI. It already feels like that’s the real battleground over the next few months and years,” said a senior executive who handles search policy at an international publisher but preferred not to be named this early in the discussion about the implications.

As it stands, ChatGPT, created by OpenAI, offers almost no attribution or clarity about where the material in its often uncannily correct answers comes from.

The new OpenAI-powered Bing showcased by Microsoft appears to have some attribution and links back to content creators, but nothing like the funnels Google Search has historically shown and which drive billions of pageviews back to publisher Web sites — much as they might complain about an imbalance of power between Google and journalism. 

In the Bing demonstration, the results use numbered citations that let you drill down to the site where the actual answers reside. It is very much like Wikipedia's footnotes but, again, far less obvious than the links in traditional search results.

Screengrab from the Bing demonstration with OpenAI-assisted panel on the right.

When did you last drill down to the actual source from a Wikipedia page?

Perplexity.ai, which is evidently also built on a version of ChatGPT, gives a hint of what a generative search engine that understands the value of attributing the origin of a fact or piece of news might look like, with clear branding and paths to major sources such as Reuters, NPR, or a range of relatively authoritative publishers, as well as government and official sources.

Screengrab from Perplexity.ai.

But the vast range of material open to or digested by the new generative AI models itself opens up all sorts of issues of copyright, terms, accuracy, as well as the big question of payment.

Where does this stuff come from?

“How do we have any level of grip over what’s going on with the use of AI with news and information? There’s been very, very little discussion between the key platforms and publishers about how this is going to work,” the source told me, having been heavily involved in negotiations and unofficial product discussions, especially with Google.

That lack of consultation reflects how fast the field is moving, with some products perhaps rushed to market without adequate thought.

In its frequently asked questions, Microsoft’s Bing search engine acknowledges that Artificial Intelligence engines can give what you might call flaky answers, saying: “Bing aims to base all its responses on reliable sources — but AI can make mistakes, and third-party content on the Internet may not always be accurate or reliable. Bing will sometimes misrepresent the information it finds, and you may see responses that sound convincing but are incomplete, inaccurate, or inappropriate. Use your own judgment and double check (sic) the facts before making decisions or taking action based on Bing's responses.”

Does that sound like a search service based on reliable information and attributed sources?

Publishers worldwide have complained for years about what they say is the opacity of how the Google search algorithm works, which they see as part of an imbalance of power. They also increasingly object to Google search responses that may contain the entire answer a user seeks, which they believe makes it less likely that traffic flows from the search engine to publisher sites. Critics in the publishing industry call these “zero-click” answers.

(There’s much more on these gripes with the current Google Search — and Google’s answers to them — in the recent INMA report How Newsrooms Succeed in Google Search.)

Those questions over the current dominant search engine pale in comparison with the issues raised by the first generation of generative AI search devices coming to market right now.

For example, it is widely understood that OpenAI has scraped an immense corpus from the Internet (much of it copyrighted material from publishers globally), which may be fine when it is used for experimental or non-commercial purposes. But what happens when Bing presents answers derived from that content and profits from them commercially?

“That will be where the next wave of concern probably comes in for publishers,” my source said. “We need a sustainable revenue stream in order to be part of the service.”

Conventional search, they said, will still be important for publishers for a long time to come. We also know Google’s standards can help publishers who demonstrate the qualities of expertise, authoritativeness, and trustworthiness that Google seeks.

But there will now be a big focus on the implications of years of historic content as well as breaking news and fresh information being sucked into the maw of generative AI without attribution, linkage, or some form of adequate compensation for publishers — and much of it may also be plain wrong.

Because ChatGPT-driven answers are usually complete in themselves, it is fair to assume that very little traffic, and therefore little revenue, will flow back to publisher sites.

However, it is clear Microsoft has the margins Google makes from search in its sights, not the margins publishers make. Microsoft Chairman and Chief Executive Satya Nadella told the Financial Times the new paradigm of generative AI search would permanently trim margins from the search business, fundamentally changing the economics of the entire industry.

“From now on, the [gross margin] of search is going to drop forever,” Nadella said in an interview with the Financial Times, making clear he believed Google or Alphabet was more vulnerable given its narrower spread of revenue-earning products: “There is such margin in search, which for us is incremental. For Google it’s not, they have to defend it all.”

In some markets, such as Australia, legislation dragged platforms into negotiating payments with publishers; in others, the platforms chose to work with publishers. Either way, finding some means of creating a sustainable base of reliable content was in the interests of all parties.

“Without news sources, it becomes anarchy,” the source said.

Wikipedia is one way to think about generative AI

Journalists have reacted with horror tinged with disdain to the ability of even early generative AI models to produce more or less adequate, and sometimes very good, articles of various kinds. The technology seems particularly suited to data-led or repetitive journalistic forms, such as stock market reports, sports results, or weather updates.

But it is easy to forget that it is not original material.

Wikipedia, the crowd-sourced online encyclopedia, is perhaps a good proxy to understand what generative AI is doing with journalism created and theoretically owned by publishers.

Wikipedia is not a source in its own right, though it is often used that way by students and others. Wikipedia is, in fact, a curated distillation by human volunteers rather than Artificial Intelligence, with a supporting collection of sources, all identified, attributed, and linked back where possible to the original source or owner of the material.

I asked Wikipedia co-founder Jimmy Wales, for whom I once worked on a journalism start-up, how he was thinking about ChatGPT and how it compares with an encyclopedia built on sources.

“Since ChatGPT doesn’t really understand anything, it might not really be able to know where it learned something (or maybe it can),” Jimmy said in an e-mail. “I continue to be absolutely astonished by it and absolutely frustrated by how bad it is alongside how good it is.”

If you think about the last time you traced a link back from Wikipedia to its original source, you have some idea why publishers may have a coronary at the thought of losing traffic from search if generative AI search of the sort demonstrated in ChatGPT takes off.

“It’s actually in the interests of the people who are making those products to ensure that a sustainable ecosystem still exists beneath them,” my source said. “The fact is it seems like they have been scraping all publisher Web sites on the open Web, probably breaching the terms of service of those publishers, for months if not years. Nobody I’m aware of has ever been approached by these companies asking for a commercial license to do that.”

The voice of God problem, again

Politicians and regulators will inevitably take time to get up to speed, even as Microsoft launches its OpenAI pilot, Google evidently rushes its Bard generative AI tool to market, and millions of people try, and clearly enjoy, the answers provided by GPT and other tools.

The battle lines are being drawn and moving fast, and publishers need to get up to speed. After all, it’s not as though we haven’t been here before.

“If you then have a tool, which claims to be God, but that doesn't have any attribution in it, producing content at zero cost to the consumer and then free to distribute across the Web. It’s, it’s insane. It’s completely insane,” my source said. “It shouldn’t satisfy lawmakers. After all, we’ve only just come through a process where social media platforms were just spewing stuff.”

5 things you need to know about AI-driven search but were afraid to ask 

Q: What does “generative Artificial Intelligence” mean?

A: Generative AI systems fall under the umbrella of machine learning, in which practitioners develop Artificial Intelligence through models that can “learn” from patterns in data without explicit human direction. These models then generate answers using a Large Language Model built by digesting an enormous corpus of information and mapping the relationships within it; in ChatGPT’s case, that corpus is said to be 45 terabytes of text data. (Sources McKinsey.com/Wikipedia.)

Q: What is ChatGPT?

A: ChatGPT is a chatbot interface to a model created by OpenAI (GPT stands for generative pre-trained transformer) to showcase the potential of generative AI to produce coherent answers to complex questions or tasks, from a piece of journalism to a piece of software code. OpenAI has also released the DALL-E tool to produce images with generative AI. There are others available. (Sources McKinsey.com/Wikipedia/OpenAI.)

Q: Who owns OpenAI and what exactly is it?

A: OpenAI is a San Francisco-based Artificial Intelligence research and product-creation organisation, structured as a non-profit foundation with a for-profit company under its umbrella to monetise what it develops. It was founded in 2015 by a group including Sam Altman (now CEO of OpenAI and formerly president of Y Combinator), Reid Hoffman (co-founder of LinkedIn), Jessica Livingston (a founder of Y Combinator), Elon Musk (PayPal, Tesla, SpaceX, Twitter), Ilya Sutskever (computer scientist and chief scientist at OpenAI), and Peter Thiel (PayPal, Palantir Technologies, and Founders Fund). Microsoft has invested several billion dollars in OpenAI. (Sources Wikipedia/OpenAI/CNBC.)

Q: Should I be worried about using ChatGPT in journalism or about my journalists using it?

A: Transparency may be the key, especially if you are publishing anything derived from ChatGPT or another large language model purporting to act as AI-driven search. It might be smart to insist that journalists disclose to editors and readers when they are using it, even on an experimental basis. The best answer may lie in the warning from Microsoft in the Bing FAQ: “AI can make mistakes, and third-party content on the Internet may not always be accurate or reliable. Bing will sometimes misrepresent the information it finds, and you may see responses that sound convincing but are incomplete, inaccurate, or inappropriate. Use your own judgment and double check (sic) the facts…” CNET and Bankrate, for example, may not have been transparent enough about publishing AI-generated articles.

Q: We already use Artificial Intelligence in some of our reporting. Should we stop?

A: Artificial Intelligence has already proven immensely valuable in data journalism and some forms of rote journalism (like sports results and stock market reports), not to mention in analysing enormous data sets for investigative journalism or health reporting. It is a well-established process in many newsrooms and should not be confused with the “instant answers” or emulation of journalism offered by these early generative AI applications. However, it also seems that generative AI, as it develops, may be able to assist with or replace some reporting and coding tasks.

5 must reads/listens to help you stay on top of the generative AI discussion

  • Bing’s Revenge and Google’s AI face-plant is a sometimes frivolous but actually incredibly well-informed edition of Hard Fork, a New York Times podcast with Kevin Roose of The Times and Casey Newton of Platformer. In this episode, they interview OpenAI CEO Sam Altman, who gives genuine insight into where this technology could go, especially, to my ears, in coding, where it may democratise what until now have been specialised skills. The podcast also interviews Microsoft Chief Technology Officer Kevin Scott. I learned a lot.
  • ChatGPT and the Imagenet moment by Benedict Evans, on his must-read blog and newsletter, asks whether we are at an inflection point when everything changes. The short answer is maybe, but let’s also be real about what generative AI is and isn’t good at. I also recommend his excellent podcast on this topic, Another Podcast; in fact, the last three episodes with his co-host Toni Cowan-Brown are as good a primer as any out there.
  • JournalismAI.com is a Canadian site set up by former CBC journalist Andrew Cochrane and is a handy clearing house of curated articles on all aspects of Artificial Intelligence and journalism.
  • The Tow Center for Digital Journalism at Columbia Journalism School has an extensive set of resources on Artificial Intelligence, trends, tools, and ethics.
  • Journalism is lossy compression is a pretty good critique by journalism professor and author Jeff Jarvis, who questions what he sees as a dismissive attitude, inherent in much journalism, towards potentially groundbreaking technology. It was prompted by Ted Chiang’s analysis of ChatGPT in The New Yorker, ChatGPT Is a Blurry JPEG of the Web.

Handy INMA resources on AI and generative AI

Recommended follow

In keeping with the tone of this AI search special edition, here’s OpenAI CEO Sam Altman (@sama), a sort of philosopher-developer-investor-philanthropist who is definitely thinking big thoughts and attracting big investments in his company.

Talk back

Tell me what you want to read and what you like or don’t like in this newsletter, please. E-mail: peter.bale@inma.org. There’s also an INMA Newsroom Initiative Slack channel.

About this newsletter

Today’s newsletter is written by Peter Bale, based in New Zealand and the U.K. and lead for the INMA Newsroom Initiative. Peter will share research, case studies, and thought leadership on the topic of global newsrooms.

This newsletter is a public face of the Newsroom Initiative by INMA, outlined here. E-mail Peter at peter.bale@inma.org or newsroom@inma.org with thoughts, suggestions, and questions.
