Svenska Dagbladet embraces audio to increase engagement
Generative AI Initiative Blog | 11 August 2024
Audio is becoming increasingly important to audiences, and according to Sonali Verma, lead of the INMA Generative AI Initiative, it is one of the most prevalent ways news media companies are experimenting with generative AI.
During this week’s webinar, “GenAI audio and internal tools: experiments from two innovative news brands,” Svenska Dagbladet’s Ebba Linde shared her company’s journey to embrace synthetic voice technology. As head of product and UX for the Schibsted-owned newspaper, Linde said the company had identified the growing importance of audio in users’ daily lives and saw an opportunity to make journalism more accessible and engaging.
The data showed that time spent in listening mode had doubled over the past decade: 71% of Swedes have access to a paid audio subscription (music and/or podcasts), and they spend about 33% of their day engaged with audio.
“Audio can give increased accessibility,” Linde said, adding that the format is good for people who have difficulty reading, and that it can make listeners feel closer to the story because voices convey emotion that doesn’t come through in print or on a screen.
“It also gives people who are busy the possibility to multitask; audio is a great function for that,” she said. “And we can, simply by delivering our content, be available for our users a larger part of the day.”
Even though podcasts have been around for several years, the market is continuing to grow, with 44% of users saying they listen to podcasts at least once a month. However, as SvD began looking at the feasibility of creating podcasts, it quickly ran into a wall: producing custom podcasts required more time and resources than the team had. So Linde said it was time to look at other ways to use audio.
She didn’t have to look far; in 2021, sister paper Aftenposten had jumped into the text-to-speech arena, having a journalist spend 34 hours in a studio recording more than 6,000 sentences from which an AI model could build a cloned voice. The results were impressive: Aftenposten found that users who listened to an article finished a larger part of it than those who read it.
“So we thought, let’s do the same thing, this will be easy.”
It wasn’t.
Turning text to audio
Six months into working on the audio project, SvD still didn’t have a usable product and was unhappy with the results, so Linde said it was time to pivot. The team decided to manually record some articles to see if audiences were receptive to them.
“We started with maybe three or four articles. We put the person in the studio, they just read the whole article,” she said. “Then we put the player in our article and the results blew us away. The user feedback was so positive, almost all users that gave us feedback for this were super positive.”
Users of all ages were asking for more audio content. The experiment showed that while users appreciated a human voice more than a robotic one, there was clear interest in audio. SvD is now experimenting with AI-powered personalised playlists, applying the algorithms it already uses for text content to text-to-speech articles. The approach combines AI with user intent data to deliver audio to both logged-in and non-logged-in users.
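Linde did not go into implementation details, but the general shape of such a pipeline is straightforward: score articles against a user’s intent data, pick the top few, and synthesise each one to audio. The sketch below is a minimal illustration only, not SvD’s stack; the Article fields, the topic-affinity score, and the use of the open-source gTTS library for Swedish synthesis are all assumptions made for the example.

```python
# Minimal sketch (not SvD's actual system): build a personalised audio playlist
# by scoring articles against a user's reading interests, then synthesising
# each selected article with an off-the-shelf TTS library (gTTS, illustrative only).
from dataclasses import dataclass
from gtts import gTTS


@dataclass
class Article:
    article_id: str
    title: str
    body: str
    topics: set[str]


def score(article: Article, user_topics: dict[str, float]) -> float:
    """Toy intent score: sum of the user's affinity for the article's topics."""
    return sum(user_topics.get(topic, 0.0) for topic in article.topics)


def build_playlist(articles: list[Article], user_topics: dict[str, float], k: int = 5) -> list[Article]:
    """Rank articles by the toy intent score and keep the top k for the playlist."""
    return sorted(articles, key=lambda a: score(a, user_topics), reverse=True)[:k]


def synthesise(article: Article, out_dir: str = ".") -> str:
    """Render one article body as a Swedish-language MP3 and return the file path."""
    path = f"{out_dir}/{article.article_id}.mp3"
    gTTS(text=article.body, lang="sv").save(path)
    return path


if __name__ == "__main__":
    articles = [
        Article("a1", "Riksbanken höjer räntan", "Riksbanken höjer styrräntan ...", {"economy"}),
        Article("a2", "Ny teknik i vården", "Sjukhusen testar ny teknik ...", {"health", "tech"}),
    ]
    # Assumed intent data for a logged-in user; in practice this would come from
    # the recommendation algorithms already used for text content.
    user_topics = {"economy": 0.9, "tech": 0.4}
    for article in build_playlist(articles, user_topics):
        print("queued:", synthesise(article))
```

The same ranking step could serve non-logged-in users by falling back to aggregate interest data instead of a personal profile, which is how the logged-in/non-logged-in distinction in the paragraph above would typically be handled.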
Linde said the decision to halt the voice-cloning experiment was the right one. As AI voice cloning becomes mainstream, providers are emerging that let companies submit less than five minutes of voice data and quickly receive a high-quality voice clone. SvD uses such a clone for some shorter articles while continuing to use “real voices” for others, which has allowed it to compare what audiences find most appealing.
“What we see is that the user feedback for our manually read in article is much higher than the AI voice,” Linde said. “That doesn’t mean that we should stop our AI experiments. It just says that users still appreciate a manual voice more than a robotic voice.”
She also pointed out that while English versions of AI voice clones are “almost as good as a human,” the intricacies of Swedish and Norwegian dialects are harder to master: “They are a quite complicated language to make voice clones. So we will get there, but the quality is still not good enough to put the long read in an AI voice for us right now. But soon, probably, it will be.”
The promise and pitfalls of AI voices
As exciting as the opportunities of AI are for news media companies, Linde reminded attendees that it’s “important to remember that we’re experimenting with something that, in a controlled environment, is adding user value.” But that doesn’t mean it can’t be misused in other ways. SvD is committed not only to using GenAI but also to using its journalistic power to write about it and educate users about the dangers of deep fakes. She said it’s also important to be transparent about where and how the company is using GenAI.
“It’s not something that will scare off our users. We should use it as something that helps us, but we should also be aware of the misuse that is out there,” she said.
She pointed to recent reminders of the technology’s imperfection in Aftenposten’s AI-generated voices, where the phrase “deep fake news” came out as “deep fuck news” and “Martin Luther King Jr.” became “Martin Ludder King Jr.” Everyone using the technology needs to keep in mind that it’s not perfect and requires human oversight.
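Linde did not describe how Aftenposten or SvD guard against such mispronunciations, but one common and purely illustrative safeguard is a pre-synthesis pass that rewrites known problem phrases before the text reaches the TTS engine, with a human reviewing any new additions to the list. The override table and function below are assumptions for the example, not either newsroom’s actual tooling.

```python
# Hypothetical sketch: rewrite known problem phrases before text is sent to a
# TTS engine. The override table is illustrative, not an actual newsroom list.
import re

OVERRIDES = {
    "deep fake news": "deepfake news",                      # keep the compound together
    "Martin Luther King Jr.": "Martin Luther King Junior",  # spell out the suffix
}


def apply_overrides(text: str, overrides: dict[str, str] = OVERRIDES) -> str:
    """Replace each known problem phrase, case-insensitively, before synthesis."""
    for phrase, replacement in overrides.items():
        text = re.sub(re.escape(phrase), replacement, text, flags=re.IGNORECASE)
    return text


print(apply_overrides("A story about deep fake news and Martin Luther King Jr."))
```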
As AI tools continue to proliferate, Linde advised that companies avoid being distracted by shiny new objects. Instead, they should focus on the problems they need to solve and then consider how AI can assist them. The tools are widely accessible to everyone, so what will make them unique to each company is how they’re used.
“You should focus on what problems that you are in a unique position to solve,” she advised. “Ask yourself what the problem is where you have a unique competence or unique input to deliver so you are in a better position than others to solve this. Focus on your unique competence.”