Text-to-speech technology is driving audio strategies

By Patrick O’Flaherty


London, England, UK


When you look at recent success stories in digital news subscriptions, one theme comes up time and again: audio.

The New York Times, The Washington Post, and The Wall Street Journal — top three in the PressGazette rankings for English-language digital news subscriptions — are just some of the publishers investing in robust audio strategies.

Audio has proven to be a powerful tool for newspapers including The Wall Street Journal, The New York Times, and The Washington Post.

The NYT’s flagship podcast, The Daily, attracts 4 million listeners a day — twice the newspaper’s peak circulation. CEO Meredith Kopit Levien says it “drops people into the core news subscription funnel” and boosts brand affinity among younger audiences.

Maggie Penman tells a similar story about The Washington Post’s flagship podcast, Post Reports. The newspaper also launched text-to-speech audio articles across its platforms, following a successful trial in its iOS and Android apps. The WSJ added ‘Listen to this article’ functionality using text-to-speech, and Jason Jedlinski, head of audience touchpoints, called it a “big success.”

Audio articles aren’t just a trend in the United States. In Denmark, “most news brands offer automated audio versions of articles,” according to Reuters Institute.

The rise of audio publishing

The rising popularity of podcasts, audiobooks, and, more recently, social audio platforms like Clubhouse no doubt made news publishers sit up and take notice of the format’s potential — especially when they were looking for ways to incentivise subscriptions and differentiate their packages from competitors.

But the shift to audio has also been motivated by a widespread problem in digital news subscriptions, something that Denise Law, then head of strategic development for The Economist, referred to as the “unread guilt factor.”

The theory is that when readers can’t find time to read articles, they feel guilty. They can no longer justify the subscription cost, so they cancel. This ties in with Northwestern University’s finding that frequency of consumption is the biggest predictor of retention in digital news.

When audio versions are available, subscribers find it easier to engage — they can listen while they’re driving, cooking, or exercising.

Studies also indicate audio can resonate more deeply: 62% of Americans listening to more spoken-word audio said it engages their minds in a more positive way than other media does, while text-to-speech audio article listeners spend 973% longer on-site than non-listeners. As a result, the publisher sees less subscriber churn.

It’s also reassuring that consumers have demonstrated increased willingness to pay for audio access, as well as for news. Today, 47% of Americans subscribe to an audio service, more than double the number subscribing six years ago. And 36% of U.S. podcast listeners said they have paid or were likely to pay money to listen to a podcast.

The accessibility of audio publishing

Audio production has become more accessible to publishers thanks to advancements in text-to-speech. It’s no longer necessary to rely on human recordings, which are time-consuming and expensive to produce.

With the right technology, production and distribution can be automated and seamlessly integrated into an existing workflow. This makes it possible to deliver audio versions almost instantly and at scale, and in a way that supports wider publishing goals.

Berlingske, one of Denmark’s dominant news brands, uses BeyondWords’ API to automatically process its articles into audio. These are auto-embedded into the corresponding pages using a JavaScript player, which the Berlingske team has customised to maximise subscriptions: Upon pressing play, the user is encouraged to subscribe for audio access.

Berlingske's audio article player.

Claus Danboe Poulsen, product owner at Berlingske, said the company decided to make the text-to-speech feature a subscription product to increase the number of subscribers for its brand, berlingske.dk.

“Since the launch, we’ve seen an increase in new subscribers and considerable growth in the number of listens,” he said. “We are also very satisfied with the completion rate, which reaches around 50% on average.”
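The play-to-subscribe flow described for Berlingske can be illustrated with a minimal sketch. This is not Berlingske's or BeyondWords' actual code: the `Listener` type, the `on_play` function, and the action names are hypothetical, and a real player would make this decision in its own JavaScript and subscription backend.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    """Hypothetical listener record; field names are assumptions."""
    user_id: str
    is_subscriber: bool


def on_play(listener: Listener, article_id: str) -> dict:
    """Decide what the player should do when play is pressed.

    Subscribers get the audio; everyone else is shown a subscription prompt.
    """
    if listener.is_subscriber:
        return {"action": "play", "article_id": article_id}
    return {"action": "prompt_subscribe", "article_id": article_id}
```

Keeping the decision in one small function makes it straightforward to experiment with variations, such as allowing a free preview before the prompt appears.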

Another benefit of text-to-speech audio is its adaptability. Journalists can update evolving stories and rely on the audio version to keep up. As Andy Webb of the BBC said: “You can’t have somebody producing a new audio version of one article every time it’s updated. But with [...] synthetic language, there’s hardly any additional cost to production at all.”
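One way to keep audio in step with an evolving story is to regenerate it only when the article text actually changes. The sketch below assumes a caller-supplied `synthesize` function that turns text into audio bytes; that function, the `AudioCache` class, and its method names are hypothetical placeholders rather than any vendor's API.

```python
import hashlib
from typing import Callable, Dict


class AudioCache:
    """Regenerates audio only when an article's text has changed."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        # `synthesize` stands in for a text-to-speech call (assumption).
        self._synthesize = synthesize
        self._hashes: Dict[str, str] = {}
        self._audio: Dict[str, bytes] = {}

    def get_audio(self, article_id: str, text: str) -> bytes:
        # Fingerprint the current text so edits can be detected cheaply.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._hashes.get(article_id) != digest:
            # New or updated article: synthesize again and remember the hash.
            self._audio[article_id] = self._synthesize(text)
            self._hashes[article_id] = digest
        return self._audio[article_id]
```

With this approach, a journalist's update triggers exactly one new synthesis call, while repeated requests for an unchanged article reuse the stored audio.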

Low-quality synthetic speech was once a barrier to adoption, but text-to-speech can now produce engaging and naturalistic audio. There is also a wider variety of AI voices available, making it easier for publishers to truly speak to their target audience.

Customising the audio experience

Publishers who invest in custom voices, created using voice cloning technology, are seeing some of the strongest engagement metrics. Media24, South Africa’s leading media company, commissioned an AI voice based on a South African voice artist.

News24's audio article player.

Kelly Anderson, deputy site editor at News24.com, said it was necessary to develop a custom synthetic voice: “Existing voices struggled to handle pronunciations unique to South African accents and it was imperative that the voice resonated with our readers.”

The customised voice is able to pronounce local names, towns, and places better than any previous solutions.

“It’s much more engaging to listen to a voice that sounds like our brand,” Anderson said.

Ultimately, as audio adoption and usage increase, a publisher’s voice or voices will play a key role in branding. There will also be scope for personalisation, with voices tailored to each listener’s location, demographics, and preferences.

Early adopters are likely to benefit most, as they build their listener base and adapt to their audience’s needs over time. But with the appetite for audio showing no signs of slowing, and text-to-speech continually evolving, expect audio to soon become a core aspect of every news media strategy. 

