Swiss publisher Neue Zürcher Zeitung (NZZ), which produces a German-language daily newspaper, recently launched a beta version of a Text-To-Speech (TTS) audio player in Web and mobile channels. This TTS service is aimed at on-the-go and audiophile users as an alternative method to make consuming news simple and convenient.
With voice technologies like Amazon Alexa and Google Home becoming increasingly prevalent in readers’ lives, media companies have begun implementing strategies for delivering news with audio. While podcasts are still on the rise, NZZ readers have been specifically asking for more audio content. The product development team at NZZ decided to start a pilot project that would offer NZZ’s articles in audio format as a way to fulfil the need.
However, NZZ publishes up to 200 text articles per day on its digital entities. Recording all of these with professional, native speakers was not an option, so NZZ began seeking an automated and scalable method of converting text to audio.
The product development team at NZZ looked into several TTS services, including IBM Watson, Amazon Polly, and Google WaveNet. Knowing that any TTS service on the market would develop rapidly over the coming months, the team had to choose an architecture flexible enough to accommodate change.
This need for flexibility led to a unique structure: The text first runs through a self-built middleware, where abbreviations like “z. B.” are replaced by “zum Beispiel” (German for “for example”) and author codes such as “boa” are replaced by the full name “Boas Ruh” (one of NZZ’s editors). The text is then converted into Speech Synthesis Markup Language (SSML) and sent to the TTS engine, which generates an MP3.
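The two middleware steps described above can be sketched roughly as follows. This is a minimal illustration and not NZZ’s actual code: the function names `expand_abbreviations` and `to_ssml`, the replacement table, and the choice to split paragraphs on blank lines are all assumptions made for the example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical replacement table; the two entries mirror the examples in the text.
REPLACEMENTS = {
    "z. B.": "zum Beispiel",
    "boa": "Boas Ruh",
}


def expand_abbreviations(text, replacements=REPLACEMENTS):
    """Replace each abbreviation with its spoken form, matching whole tokens only."""
    for short, full in replacements.items():
        # The lookarounds keep "boa" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(short) + r"(?!\w)"
        text = re.sub(pattern, lambda match, full=full: full, text)
    return text


def to_ssml(text):
    """Wrap plain text in a minimal SSML document, one <p> per paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # escape() protects characters such as "&" and "<" that are special in XML.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    article = "Viele Dienste, z. B. Polly, wurden getestet. (boa)"
    print(to_ssml(expand_abbreviations(article)))
```

Keeping the replacement table separate from the SSML step means new abbreviations can be added without touching the code that talks to the TTS engine.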
SSML provides a standardised way to control different aspects of speech synthesis output. For example, it is possible to alter rate, pitch, and volume; insert pauses of any length; change the speaking voice while reading; and control many other aspects of how the text is read by the synthetic voice. A major advantage of SSML is that essentially the same input can be fed to any TTS engine, because they all support the same core elements (with some small exceptions).
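To make these controls concrete, the sketch below builds a small SSML fragment using the standard `prosody`, `break`, and `voice` elements. The helper function names and the voice name are hypothetical, and individual TTS engines may support only a subset of the attribute values shown.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap text in a <prosody> element that sets rate, pitch, and volume."""
    return (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody>"
    )


def pause(milliseconds):
    """Insert a pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def voice(text, name):
    """Switch to another speaking voice for this text."""
    return f"<voice name={quoteattr(name)}>{escape(text)}</voice>"


def speak(*parts):
    """Combine fragments into a complete SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    ssml = speak(
        prosody("Guten Morgen.", rate="slow", volume="loud"),
        pause(500),
        voice("Hier sind die Nachrichten.", "hypothetical-german-voice"),
    )
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed XML
    print(ssml)
```

Because the output is plain XML, the same string can in principle be handed to different engines, subject to the small exceptions mentioned above.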
With state-of-the-art audio players such as those used by Spotify, Apple Music, Acast, and Sonos already heavily represented in everyday life as apps and services, the product development team at NZZ decided not to reinvent the audio player. An unfamiliar player would also likely have confused NZZ’s users.
Instead, a player design was chosen that would fit into NZZ’s existing product environment while remaining familiar to users of existing audio players in the market. The player was designed and co-created extensively with beta users, in order to land on a version that would only need tweaking, rather than a complete redesign, after launch.
Just a few weeks before the planned beta launch of the audio service, DeepMind’s WaveNet released its TTS service for the German language. Thanks to the flexibility of the chosen architecture, switching to this new service at short notice was a manageable effort. The pace of improvement is expected to accelerate further, and the naturalness of the synthetic voice should improve over the coming months, so a switch to yet another provider is a realistic scenario.
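One common way to keep a provider swap this cheap is to hide each TTS engine behind a small shared interface. The sketch below illustrates that idea only; it is not a description of NZZ’s architecture. The class names are hypothetical, and the two stand-in engines return placeholder bytes rather than calling any real service.

```python
from typing import Protocol


class TTSEngine(Protocol):
    """Anything that can turn SSML into audio bytes."""

    def synthesize(self, ssml: str) -> bytes: ...


class StandInEngineA:
    """Placeholder for one provider; a real adapter would call that provider's API."""

    def synthesize(self, ssml: str) -> bytes:
        return b"A:" + ssml.encode("utf-8")


class StandInEngineB:
    """Placeholder for a second provider with the same interface."""

    def synthesize(self, ssml: str) -> bytes:
        return b"B:" + ssml.encode("utf-8")


class AudioPipeline:
    """Depends only on the TTSEngine interface, never on a concrete provider."""

    def __init__(self, engine: TTSEngine) -> None:
        self.engine = engine

    def article_to_audio(self, ssml: str) -> bytes:
        return self.engine.synthesize(ssml)


if __name__ == "__main__":
    pipeline = AudioPipeline(StandInEngineA())
    print(pipeline.article_to_audio("<speak>Hallo</speak>"))
    # Switching provider is a one-line change; the rest of the pipeline is untouched.
    pipeline.engine = StandInEngineB()
    print(pipeline.article_to_audio("<speak>Hallo</speak>"))
```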
NZZ’s strategy is to stay profitable in the long term through paid journalism. The audio functionality will play an important role since it is incorporated into the user’s conversion funnel. The first results on its effect on conversion are expected in the summer of 2019, after the service has emerged from its beta stage.