This pace of technological change is killing me. In September, we released an audio report packed full of useful insights on building audio products. We already need to add a new chapter.
Reporters have long had tape recorders so they can go back and double-check what someone has said. Now that capability is available on steroids.
Not only can you replay audio, but it can also be transcribed automatically in real time, annotated, and summarised. If it's a presentation, or someone is sharing a screen on a video call, the slides are automatically placed at the right points in the transcript. The transcript is searchable. In fact, all your transcripts are searchable, so you can also see who else was talking about the same topic or using the same phrases.
To be fair, I didn't need to lead a recent INMA Silicon Valley Study Tour to know that. As any of you who have spoken with me know, all my calls are transcribed by Otter.ai. I couldn't even begin to tell you how much time it has saved me, or how much it has improved my memory and accuracy.
Technically, you can attend meetings without actually joining them. Sometimes I'd love the TL;DR version of a one-hour call! It may not be long before the assistant knows your input so well that it can predict what you would say, which means you literally wouldn't have to join a meeting at all, although I'm not sure how much I would trust that until I see it in action. I also hate the fact that I am probably that predictable!
Right now, Otter.ai is available only in English, but the company tells us it is planning to expand into other languages. Competitor Trint, which was started by a veteran journalist (and which, for transparency, I helped launch in 2015), is already available in multiple languages and talks a lot about its collaboration tools.
We already collaborate in documents, but being able to do so in real time is useful. Having an AI assistant on a call can help, too, especially with acronyms (IYKYK).
These tools have a host of other benefits, too: transcripts and summaries of podcasts, for example, which in turn help SEO. Transcripts also make it easier to translate content into multiple languages, opening up different markets or serving the different languages spoken within a single market (hello, friends in Switzerland, South Africa, New Zealand, and others).
In fact, you don't even need a separate translation step to turn words into another language. Synthetic voice companies such as Resemble AI simply generate the audio in many different languages. But, you might ask, you can't always translate word for word, because analogies and references don't always make sense in another language. I know this as a Brit in America whenever I have to talk about the weather in Fahrenheit or hear the term "inside baseball." The large language models behind these tools are building libraries of alternatives to make each language version sound truly native.
Everything I have heard about narrated articles is positive (more engagement, longer listening times, and so on), but I have a bugbear. Articles are written to be read, not listened to. Personally, I find narrated articles hard to engage with and sometimes zone out. Good news, folks: with a simple prompt, you can have the text rewritten to be more script-like or chatty, which helps enormously. In fact, there are many prompts you can use to adapt the text so it suits an audio product better.
Maybe you create podcasts and have someone go through and edit out the ums and ers, or try to figure out how to get rid of background noise. Companies such as Resemble, and tools such as Adobe Podcast's Enhance suite, can take average audio and make it excellent. In fact, we used the latter in the audio version of the audio report. It cleared up a couple of awkward moments with the synthetic voice, as if by magic.
When we learned about Schibsted's synthetic voice, we talked a little about how the company went about identifying what its "brand" sounds like. Now you can experiment with that fairly easily. Maybe you give readers the choice? Or create a synthetic version of every journalist's voice. The time it takes to create a voice has come down from several months to just minutes.
All the companies I have spoken with offer CMS integration through an API. It takes a little configuration, but it is now realistic to offer a full audio product without having to set up studios and editing desks in your newsroom.
Another impressive service I recently came across is Deepgram, whose speech-to-text service is available only as an enterprise product through an API. It was developed mainly for call centers but also has uses in news, such as sentiment detection and finding the right places within an audio product to insert or track ads.
Many of these AI audio product enhancements have been announced in the last couple of months. Whatever we think about the speed of change, one thing is certain: they present serious opportunities for news media at a fraction of the cost of doing all this manually. We would be crazy not to evaluate them and the benefits they can bring to our readers. Or perhaps we should call them our (potential) listeners ;).
If you’d like to subscribe to my bi-weekly newsletter, INMA members can do so here.