Schibsted makes the most of its newsroom humans in new text-to-audio initiative
Newsroom Transformation Initiative Blog | 09 July 2023
Journalists worried that Artificial Intelligence might replace them could find a little reassurance that audiences want and prefer a human reader — even if it is synthesised from a real journalist.
In the second of two reports on innovation at the Norwegian publisher Schibsted, we look at how audiences and newsrooms alike embrace the human voice of journalists, even when that voice is a synthesised version of the real thing.
Publishers the world over are investing in audio to supplement their traditional skills in textual reporting in a trend resembling the infamous “pivot to video.” This time, however, it is backed by data and developments in text-to-voice technology that mean they can do it at scale at low cost.
Schibsted’s main newspaper and online titles in Norway and Sweden have enjoyed something of a natural defence against would-be competitors: language. Still, it knows that as Artificial Intelligence helps create better and better synthetic voices, it has to compete with all comers, whether global media brands or technology platforms, to stay relevant to audiences.
“We’re in this global competition for attention,” said Karl Oskar Teien, director of product for subscription news at Aftenposten. “Our biggest competitors are not just other media companies. It is anyone distributing their content on a screen.”
Norway and other Scandinavian countries may have an edge in audio storytelling, according to Teien. Without being overly romantic about it, there is a cultural legacy of oral storytelling: “There’s the intimacy of conversation … . We used to come from a world of storytelling and we’re going back to the instinctual way of telling stories to one another.”
Like many publishers, Schibsted has built up its podcasting muscle after much experimentation, and the results are impressive. It has also, as have others, experimented with various forms of text-to-audio, using artificial voices to read out text stories. Now it is stepping up those initiatives and in some senses combining them, bringing the data and experience of podcasting to the question of meeting the needs of users who want to listen to their stories.
Perhaps the most significant step Schibsted has taken in this area is to work with the British text-to-speech company Beyond Words (which it found through an INMA connection) to create a synthetic voice of the person who presents its most popular podcasts. The results suggest that a familiar voice in your language, even a synthetic one, increases engagement.
In Norway, it has now synthesised the voice of Anne Lindholm, its leading podcast narrator and the Aftenposten.no homepage editor, to become the “voice of Aftenposten.” She spent 34 hours in the studio, read 6,812 sentences, and supposedly consumed 21 espressos to manage it.
“Our brands in Sweden, like Svenska Dagbladet, will do the same and have after an internal competition decided on one person from the newsroom to be their cloned voice. Several other Schibsted brands are likely to follow, as we believe this is a bet on the future worth making,” Teien told me.
The critical point, of course, is scale. A journalist can narrate a special podcast or a news bulletin, but voicing 100 articles a day or more requires synthesised or artificial voices, which will clearly get better and better over time. Synthesis also opens the way to a level of personalisation, implicit and explicit, that should feel effortless to the user.
“If we can do this as close to zero marginal costs, and you can update it all the time with text-to-speech, then you can have this large content volume, which enables personalisation at scale,” he said. “Right now, if you can choose from 20 stories that are narrated (in audio), that gives you a relatively limited amount of choice …”
In the newsroom, journalists who write some of the landmark pieces of the week are encouraged to do the audio on their own stories — narrating them in person. That sort of bespoke work appeals to the journalists and on those special stories works with users. You can imagine in the future all individual journalists may have synthesised versions of their voices.
Personally, I am amazed at how well traditional media companies have taken to audio. Video always seemed like a stretch in cost and in skill or fit. Of course, many organisations have managed that shift and done well at it. But audio — especially text-to-voice — seems a logical extension of text and introduces a more conversational and explanatory tone.
“We believe there is still a significant gap in the experience of listening to a reporter reading their story and that of a synthetic voice, so we are likely to explore human-narrated articles in parallel with our text-to-speech efforts,” Teien said. “We’re exploring the potential for creating a voice purely for commercial content as well. The end goal here is really about making journalism available for consumption in more contexts throughout the day, and we have a particularly strong position on audio. It is more likely that we’ll win a significant share of users’ sonic attention than visual attention (where the competition is fierce).”
The data makes a compelling case that podcasting and listening to non-radio outlets are growing, and that newspaper companies can compete in that area if they do it right and understand how their customers’ habits are changing.
Some of the signals are surprisingly familiar: story completion, a critical benchmark often ignored in print, is higher in audio than in text. Audio may also help retain subscribers, because many of us feel guilty about not using our subscriptions to the full.
“If you go from reading two articles and then listening to five articles … we think that’s just gonna help you feel better about paying for the product. It’s like a gym membership. If you’re paying for it, you need to use it. This is going to help you use it more.”
It is also clearly important to distinguish between types of audio. A serial podcast is a different proposition, in cost and relevance terms, to a regular news update that is more like a top-of-the-hour radio bulletin, which in turn differs from a text-to-audio version of a story.
One idea that appealed to me in what Teien was saying was that journalists who had reported long-form work could extend that commitment to reading their work aloud, and that the data suggested listeners would recognise the added value that represented.
Another was that synthesised voices would allow audio news to be delivered at scale, with the audio version in effect seamlessly regenerated as a story is updated. That is only possible with the effective use of synthetic voices.
“Should we, or must we, become an audio-first product for the next generation of users?” is essentially the question he is hoping these projects will answer.
Audio is a huge area of progress and innovation and the INMA Newsroom Initiative and the INMA Product Initiative will try to stay on top of it in our respective lanes. The Product Initiative has a Webinar focused on the Schibsted work in August.
Schibsted’s project to create synthesised “voices” of its titles was a finalist in the INMA awards last year.
For another perspective, and some warnings, on the use of AI and synthetic voices, see INMA Smart Data Initiative Lead Ariane Bernard’s report from a Nordic AI conference, “Readers have mixed feelings about AI-generated voices reading the news.”