AI voice products are improving quickly and news companies must keep up

By Jodie Hopperton

INMA

Los Angeles, California, United States

A few months ago, I wrote about OpenAI’s voice assistant and its remarkable ability not just to answer questions but to do so in multiple languages, with natural intonation — basically, to act like a human. You can see the short example video I made here.

On the ever-insightful New York Times podcast Hard Fork, I recently heard about Sesame, which may be the most realistic AI voice out there today.

If you’ve followed my work, you’ll know I use the Ray-Ban Meta AI glasses daily, and yes, the voice is impressively lifelike. Amazon’s newly announced updates to Amazon Echo Plus will no doubt bring even more sophisticated voice capabilities.

But it’s not just about how these voices sound. It’s about what AI makes them capable of. You can now have intelligent, nuanced conversations with these tools. Their intonation, timing, and even breathing patterns are exceptionally realistic. 

And yet, there’s another layer to this shift — one that’s particularly important for us as news organisations.

It’s what we can now create, almost instantly, with these tools. We’ve moved beyond simple speech-to-text or text-to-speech. These systems can adapt language to context — and tailor tone and delivery to fit the moment.

One of my long-standing frustrations with narrated articles is that they’re written to be read, not heard. A media friend said the same thing to me last week: Listening to an article, they found it jarring that the narration dived straight in. Humans don’t do that. We say, “Good morning.” We set the scene. We ease into the conversation.

Now, we can do that. ElevenLabs, one of the leaders in this space, highlights these capabilities right on its homepage. Take a piece of text and instantly turn it into a podcast intro or a voiceover for video. It’s fast, flexible, and increasingly human-like.

If you haven’t already, take five minutes to try these tools. It’s like talking to a person — a friend or an assistant. As you play with them, notice how intuitive it starts to feel. You might begin to imagine a future with fewer screens. A future where instead of asking your device to remind you to do something later, you just ask it to get it done.

However you imagine using these tools, that’s how your readers will interact with them, too. Their habits, their expectations, and their media consumption patterns are going to change — fast.

We’re approaching a step change. We can now hold real dialogue with voices that are, for all intents and purposes, indistinguishable from human. It’s exciting. It’s unsettling. It’s full of possibility. And it’s coming.

The question is: Are we ready?

If you’d like to subscribe to my bi-weekly newsletter, INMA members can do so here.

