The idea was simple: Take this new-fangled GPT thing and use it to summarise the news. How hard could it be?
In late 2022, AI took a massive leap forward, and OpenAI’s ChatGPT blew minds with its human-like responses to questions. Everyone wanted in on the hype, and news publishers, still suffering from FOMO and PTSD from the rise of the Internet, were not going to be left out.
Daily Maverick was no exception. Inspired by a talk from FT Strategies’ Tim Part on newsroom experimentation, I wanted to build something small to see what the technology could do. I started experimenting with OpenAI’s Playground, which offers more options than the usual ChatGPT interface.
I fed articles to GPT and asked it to summarise, create headlines, check for errors, and create Tweets. That taught us its shortcomings quickly: GPT-3 was good at summarisation, OK at headlines, useless at finding copy errors. For Tweets, it was unpredictable, creating a brilliant Tweet one second and an unusable one the next. But we could see the potential; it was surprisingly creative, generating hashtags and the occasional emoji for Tweets.
Its biggest struggle was counting. GPT-3 wouldn’t limit itself to 140 characters, or X number of sentences for a summary, or even summarise in four bullet points with any reliability. It’s the same reason GPT-3 is poor at rhyming: It cannot plan ahead. (This is what AI researchers call “thinking slow.” Large language models like GPT-3 are good at “thinking fast,” but planning ahead is their Achilles’ heel.)
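Since the model cannot be trusted to count, one workaround is to do the counting in ordinary code after generation and throw away anything that breaks the limits. What follows is a minimal sketch of that idea, not a description of how SummaryEngine works. The names `OutputLimits`, `countSentences`, `countBullets` and `checkOutput` are hypothetical, and the sentence counter is deliberately naive (an abbreviation such as "Dr." would be miscounted).

```typescript
// Hypothetical limits. The article mentions 140 characters for Tweets and
// four bullet points for summaries; everything else here is illustrative.
interface OutputLimits {
  maxChars?: number;
  maxSentences?: number;
  exactBullets?: number;
}

interface LimitCheck {
  ok: boolean;
  problems: string[];
}

// Naive sentence count: split on runs of . ! ? followed by whitespace or end of text.
function countSentences(text: string): number {
  return text
    .split(/[.!?]+(?:\s+|$)/)
    .filter((part) => part.trim().length > 0).length;
}

// Count lines that start with a bullet marker (-, * or •).
function countBullets(text: string): number {
  return text.split("\n").filter((line) => /^\s*[-*•]\s+/.test(line)).length;
}

// Check generated text against whichever limits are set.
function checkOutput(text: string, limits: OutputLimits): LimitCheck {
  const problems: string[] = [];
  // text.length counts UTF-16 code units, so some emoji count as two.
  if (limits.maxChars !== undefined && text.length > limits.maxChars) {
    problems.push(`too long: ${text.length} > ${limits.maxChars} characters`);
  }
  if (limits.maxSentences !== undefined) {
    const sentences = countSentences(text);
    if (sentences > limits.maxSentences) {
      problems.push(`too many sentences: ${sentences} > ${limits.maxSentences}`);
    }
  }
  if (limits.exactBullets !== undefined) {
    const bullets = countBullets(text);
    if (bullets !== limits.exactBullets) {
      problems.push(`expected ${limits.exactBullets} bullets, got ${bullets}`);
    }
  }
  return { ok: problems.length === 0, problems };
}

// Usage: a candidate Tweet is checked before anyone sees it.
const candidateTweet = "Load shedding is back tonight. Here is what you need to know. #Eskom";
console.log(checkOutput(candidateTweet, { maxChars: 140 }));
```

A failed check could trigger a fresh generation or simply flag the output for an editor. Either way the counting happens outside the model.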
Another problem is that OpenAI’s models want to summarise every major point in an article. This isn’t necessarily what a newspaper wants; depending on the use case, it might prefer a summary that gives a few top facts and entices readers to click through to the full article for more details. Summaries that covered everything didn’t complement the articles; they made them redundant.
And if an article wasn’t written in a strict pyramid style, the summaries were often wildly off the point. A flowery intro — common in opinion writing — would throw GPT-3 off completely.
Occasionally the summaries would miss the hook completely or get the article factually wrong.
Another issue was tone: The summaries were dry without careful prompting, lacking the stab-and-thrust of Daily Maverick’s typical jousting style. This was solved largely by adding: “... in the style of Daily Maverick.”
However, adding this occasionally resulted in summaries that began, “According to an article in Daily Maverick…” or “This article is about … .” Gouge my eyes out!
While GPT-3 was astonishingly good, it was also get-yourself-sued bad. Like a toddler, you wouldn’t want to leave it home alone, unsupervised, because, at some point, mischief would occur.
Enter the concept of human-in-the-loop, where a real live person assists the technology. We marked summaries as unapproved until an editor had checked them. The editor could accept a summary as is, reject it (which immediately generated a new option), or edit it.
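The review flow can be sketched as a small state machine. This is an illustration under stated assumptions rather than SummaryEngine's actual code: the type and function names (`DraftSummary`, `SummaryGenerator`, `newDraft`, `review`, `publishable`) are hypothetical, the model call is replaced by an injected function so the sketch runs offline, and treating an editor's own edit as approval is my assumption.

```typescript
type ReviewStatus = "unapproved" | "approved";

interface DraftSummary {
  articleId: number;
  text: string;
  status: ReviewStatus;
}

type EditorAction =
  | { kind: "accept" }
  | { kind: "reject" }
  | { kind: "edit"; text: string };

// Anything that returns fresh summary text for an article.
// In production this would call the language model.
type SummaryGenerator = (articleId: number) => string;

// Every machine-written summary starts life unapproved.
function newDraft(articleId: number, generate: SummaryGenerator): DraftSummary {
  return { articleId, text: generate(articleId), status: "unapproved" };
}

// Apply one editor decision and return the resulting summary.
function review(
  draft: DraftSummary,
  action: EditorAction,
  generate: SummaryGenerator
): DraftSummary {
  switch (action.kind) {
    case "accept":
      return { ...draft, status: "approved" };
    case "reject":
      // Rejecting immediately produces a new candidate, which awaits review again.
      return newDraft(draft.articleId, generate);
    case "edit":
      // Assumption: text rewritten by an editor counts as approved.
      return { ...draft, text: action.text, status: "approved" };
  }
}

// Only approved summaries should ever reach readers.
function publishable(drafts: DraftSummary[]): DraftSummary[] {
  return drafts.filter((d) => d.status === "approved");
}

// Usage with a stand-in generator instead of a real model call.
let draftCounter = 0;
const fakeModel: SummaryGenerator = (articleId) => {
  draftCounter += 1;
  return `Draft ${draftCounter} for article ${articleId}`;
};
let currentDraft = newDraft(42, fakeModel);
currentDraft = review(currentDraft, { kind: "reject" }, fakeModel);
currentDraft = review(currentDraft, { kind: "edit", text: "Tightened by a human." }, fakeModel);
console.log(currentDraft.status, currentDraft.text);
```

The useful property is that nothing can become publishable without passing through `review`, so the toddler is never left home alone.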
Creating the interface
I built two separate interfaces for what became the SummaryEngine WordPress plugin. One interface appeared while the editor was editing the article; the second was an overview of all summaries for all articles, which lets an editor work through multiple summaries quickly.
It was an inexpensive experiment to start generating summaries, see how editors use them, and build a foundation of knowledge, awareness, and acceptance of AI within the organisation.
Of the two interfaces, only the one on the article edit screen is really used, which gave me one of my big takeaways from this project: Editors are busy people, apparently. They want to do everything on one screen, and that screen is the one they work on the most. Just because it’s someone’s job to use a Web interface doesn’t mean the usual rules are suspended. The same rules that apply to a casual browser or potential customer apply here: “Don’t make me think” and definitely “Don’t make me load a new page.”
We went through two major design iterations with the user interface on the edit article page, with one of the smallest changes having one of the biggest impacts: changing “Unapprove” to “Reject.” (Thanks to Styli Charalambous for suggesting this.)
Too often, as software developers, our nomenclature is based on our data model or application model but doesn’t match the model in our users’ heads of what’s happening. Their perception of reality is more important than our coded reality because, at the end of the day, we need their buy-in more than they need extra work to do on every article.
Getting editors on board
To my delight, the editors started using the summaries, but we still didn’t know what we were going to do with them. (We didn’t tell the editors this, of course.)
Options included a summary newsletter, use in our mobile app, or even as a popup or sidebar while reading an article. And while I wouldn’t usually recommend starting a project without a firm design, in this case it worked out very well: Daily Maverick has quietly launched a completely new interface with summaries as another cheap experiment to see how our users interact with them. It uses two types of summaries — a short summary and bullet points.
It has quickly become my favourite way to read Daily Maverick and was visited by over 17,000 readers in just one week. The fact that we could quickly put out this product, and maintain it, even though we’re just starting to experiment with AI, shows the promise of the technology.
The project has also opened up the possibility of other use cases: experiments with GPT-4 show better results in terms of length and tone. It still cannot sub-edit or copy-edit worth a damn, but it can do translations.
TranslateEngine is on the cards. GPT-4 can also suggest headlines, a feature we will probably roll into the headline scorer I built. That scorer has changed how news editors structure headlines based on what our readers respond to, and it has helped Daily Maverick surpass 10 million unique readers a month.
Once again, these interventions will be built with our people in mind, to give them a tool and not try to replace them.
If you’re going to let AIs write headlines alone, expect the same result as giving a toddler access to a permanent marker, a bucket of paint, and your makeup drawer, and then leaving them alone for a few hours.
Daily Maverick runs on WordPress, so SummaryEngine was built in PHP on top of WordPress, using MySQL as the data store, with some of the user interface built in Svelte with TypeScript. We used the GPT-3 Completion interface, but will be trialling GPT-3.5 and GPT-4 using the Chat interface.
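For readers unfamiliar with the difference between the two interfaces, here is a rough sketch of how the same summarisation request is shaped for each. It only builds the request bodies and makes no network call. The field names (`model`, `prompt`, `messages`, `max_tokens`) follow OpenAI's published REST API as I understand it, so check the current reference before relying on them. The instruction wording, the token limit and the function names are hypothetical; only the phrase "in the style of Daily Maverick" comes from the experiments described above.

```typescript
// Body for the older Completion interface: one free-text prompt.
interface CompletionRequest {
  model: string;
  prompt: string;
  max_tokens: number;
}

// Body for the Chat interface: a list of role-tagged messages.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  max_tokens: number;
}

// Hypothetical instruction text and token budget.
const SUMMARY_INSTRUCTION =
  "Summarise the following article in the style of Daily Maverick.";
const SUMMARY_MAX_TOKENS = 200;

// Completion style: instruction and article are glued into a single prompt.
function completionBody(article: string, model: string): CompletionRequest {
  return {
    model,
    prompt: `${SUMMARY_INSTRUCTION}\n\n${article}\n\nSummary:`,
    max_tokens: SUMMARY_MAX_TOKENS,
  };
}

// Chat style: the instruction goes in a system message, the article in a user message.
function chatBody(article: string, model: string): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: SUMMARY_INSTRUCTION },
      { role: "user", content: article },
    ],
    max_tokens: SUMMARY_MAX_TOKENS,
  };
}

// Usage: the model names below are examples only.
const sampleArticle = "Full article text goes here.";
console.log(JSON.stringify(completionBody(sampleArticle, "text-davinci-003"), null, 2));
console.log(JSON.stringify(chatBody(sampleArticle, "gpt-4"), null, 2));
```

Keeping the instruction separate from the article, as the Chat shape does, makes it easier to adjust tone or length rules without touching the code that assembles the article text.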