What does GenAI mean for the ROI of audio?

By Sonali Verma


Toronto, Ontario, Canada


News publishers are becoming far more ambitious and innovative in how they use GenAI for audio, and they are often finding that audio-related tools deliver a return on investment that makes them worth the spend.

Take, for example, the INMA Global Media Award submission by Germany’s Medien Hub Bremen-Nordwest, a joint venture between the regional newspapers Nordwest-Zeitung and Weser-Kurier. Its customer service team was fielding calls from print subscribers about late newspaper deliveries.

It is a situation that will sound familiar to many of us: agents faced long queues, and customers faced long wait times. The biggest concern? Cancellations caused by poor customer service.

Medien Hub wanted to take 20% of cases off its agents’ hands and significantly increase customer service availability, especially at peak times. Working with the AI platform Parloa, it built a voicebot. Within six weeks, the bot was handling 30% of telephone complaints, and customer service availability had improved significantly.

Even more impressive: the AI voicebot paid for itself within six months, half the time Medien Hub had expected it to take.

“Our minimum viable product (MVP) goal was to achieve a return on investment on project costs plus ongoing costs within 12 months. However, the bot proved to be much more productive than expected from day one, leading to the ROI being reached in just six months,” Fabian Rosekeit, head of CRM and growth at Medien Hub, told me.
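The ROI target Rosekeit describes is essentially a payback calculation: the bot has paid for itself once cumulative savings cover the one-off project cost plus the ongoing costs accrued so far. The Python sketch below is a simplified illustration that assumes constant monthly savings and running costs. The function name and all figures are hypothetical and do not reflect Medien Hub's actual numbers.

```python
import math
from typing import Optional


def payback_month(project_cost: float, monthly_savings: float,
                  monthly_running_cost: float) -> Optional[int]:
    """Return the first month in which cumulative net savings cover the project cost.

    Simplifying assumption: savings and running costs are the same every month.
    Returns None if the monthly net benefit is zero or negative (no payback).
    """
    net_per_month = monthly_savings - monthly_running_cost
    if net_per_month <= 0:
        return None
    # Smallest whole month m such that m * net_per_month >= project_cost.
    return max(1, math.ceil(project_cost / net_per_month))


# Purely hypothetical figures, for illustration only.
print(payback_month(60_000, 15_000, 5_000))  # 6
print(payback_month(60_000, 10_000, 5_000))  # 12
```

In this toy example, doubling the monthly net benefit halves the payback period. That is the kind of gap that separates a 12-month target from a six-month result.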

(You can hear more about it from Rosekeit himself and ask him your questions at our Generative AI Master Class in May.)

Medien Hub has now expanded the AI voicebot’s service to include vacation holds as well and is looking into letting customers submit complaints about past service, rather than simply current issues.

Audio transcription is another use case that generates efficiencies, and many news publishers have told me they are building such tools. Norway’s Schibsted has developed one called JoJo (using OpenAI’s Whisper model), which has already saved its journalists more than 18,000 hours of work over the course of a year. Other publishers now use JoJo as well.

A screenshot of Schibsted’s JoJo transcription service.

Switzerland-based Ringier AG’s Blick built a tool that not only transcribes audio and video content and translates it into Swiss German but also exports subtitles as needed. Transcription time dropped from up to four hours to a few minutes. Journalists can now upload files immediately after an interview and have the transcript ready by the time they return to their desks, which speeds up publication.
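To illustrate the subtitle-export step, the sketch below converts timed transcript segments into the widely used SRT subtitle format. It assumes the transcription stage returns segments with start and end times in seconds plus text, as Whisper-style models typically do. The `Segment` type and the function names are hypothetical, and this is not Blick's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render numbered SRT cues, one per segment, separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


# Hypothetical segments, as a transcription step might produce them.
print(to_srt([Segment(0.0, 2.5, "Welcome."), Segment(2.5, 61.25, "Second line.")]))
```

The millisecond arithmetic is done on integers after a single rounding step. This keeps timestamps from drifting because of floating-point error.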

As for consumer-facing applications, Spain’s Prisa Media uses audio in Victoria, its personalised AI voice assistant (built in collaboration with Amazon, it runs on Alexa). Victoria lets football fans pick their favourite team and ask questions about it, giving the audience a new way to engage and interact with content from Prisa’s radio stations.

Prisa Media’s personalised Victoria football voice assistant.

A Gazeta in Brazil clones its reporters’ voices, so reporters only need to submit text to create audio voice-overs for videos.

Is consumer-facing audio monetisable? Well, The New York Times’ subscription-only app just passed a million downloads in about seven months.

The New York Times’ subscription-only audio app.

How about GenAI audio as a way to build trust or reach new audiences? It could be a play for younger consumers or for busy people who are multitasking, for example listening while cooking, driving their kids to sports practice or going for a run.

But Dutch public broadcaster NPO is looking at an entirely new, even surprising, end user for audio: a listener who has trouble hearing. 

“We are experimenting with visual enhancements of podcasts,” said Ezra Eeman, director of strategy and innovation at NPO. “You can not only experience a podcast as something you listen to, but you can also visually experience it.” 

In other words, automatically creating video content from audio content.

Eeman also envisions a future in which instead of listening to a podcast, you could have a conversation with a podcast — a natural extension of the text chatbots many publishers are already building.

ICYMI: For a deeper dive into audio trends, take a look at INMA’s recent report, Why Some Media Companies Are Betting Big on Audio.

