During INMA’s recent Product and Data for Media Summit, I gave a short presentation on how United Robots builds automated content without any heavy lifting on the part of the publisher.
In this article, I want to follow up on that presentation with insight into the process of getting automated content off the ground. After all, as Gunnar Södergren, head of delivery for United Robots, has said, “This is, of course, a brand-new process for most of our clients, but it’s what we do every day. Our job is to make it as easy as possible.”
At the recent NYC Medialab AI & Local News event, moderator Matt MacVey asked the panel what tech expertise a publisher needs in-house to start using content automation. Cynthia DuBose, managing editor of audience engagement at McClatchy, which works with United Robots on automated real estate content and with Lede AI on automated high school sports content, said knowing where to find the right data is essential for building out the content you want.
She also emphasised that automation is absolutely possible even in a small newsroom: “Don’t feel discouraged,” DuBose said. “Find the right vendor or partner and start experimenting.”
While content automation is becoming increasingly commonplace in the news industry, almost everyone we work with is doing it for the first time. Letting robots produce some of the content published by the newsroom is a big step for many companies. The hesitation is partly editorial, but it also stems, significantly, from concern about the technical implications. In fact, the very first thing one of our U.S. publisher partners said to our team was, “Before we decide to start, we need to understand how much heavy lifting is involved.”
It is Södergren’s job to take publishers through the process of getting automated content live on sites and in apps. “Our language team and development team are involved as well, but I’m the go-to guy for the publisher,” he said.
There are three processes that happen before launching. These sometimes happen in this order, but more often there is some overlap.
- The robot is customised to the publisher’s product and specifications, such as what zip codes of real estate data to include, what sports to cover, and which roads to monitor for traffic updates.
- The language is iterated in a joint process between the newsroom and our language team. Dedicated journalists or editors in the newsroom are necessary for this to be an efficient process.
- The content feed is integrated with the publisher’s CMS.
The following are the key tech aspects, success factors, and possible blockers Södergren has identified based on his experience with client projects.
Know your CMS
According to Södergren, the CMS integration is potentially the most challenging part of the process for publishers. “It used to be the final step we did before launch, but we’ve come to realise it’s better to get integration started early on,” he said. “If a publisher is used to receiving external feeds into the CMS, it’s very easy. But if this is a new process, it can take some time. From my point of view, it’s key to have a contact person who knows the CMS.”
There are a number of ways the content can be delivered. By far the most common is for the publisher to receive JSON files at a specific endpoint (URL) in the CMS, which triggers an automatic workflow that converts each file into an article and publishes it. Some publishers require other formats, such as XML.
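To make the JSON route concrete, here is a minimal sketch of the conversion step a CMS workflow might run when a file arrives. The field names (`id`, `title`, `body`, `tags`) and the `auto_publish` switch are assumptions for illustration only; the actual feed format is agreed with each publisher and is not described here.

```python
import json

# Hypothetical field names; the real feed format is agreed per publisher.
REQUIRED_FIELDS = ("id", "title", "body")


def payload_to_article(raw: bytes, auto_publish: bool = True) -> dict:
    """Parse one JSON delivery and map it to a CMS-style article record."""
    data = json.loads(raw.decode("utf-8"))

    # Reject incomplete deliveries instead of publishing a broken article.
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"payload is missing fields: {missing}")

    return {
        "external_id": data["id"],
        "headline": data["title"].strip(),
        "body": data["body"],
        "tags": list(data.get("tags", [])),
        # Publish straight away, or hold as a draft for a manual publish step.
        "status": "published" if auto_publish else "draft",
    }
```

A publisher that prefers a manual publish step would call the function with `auto_publish=False` and release the resulting drafts from inside the CMS.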
If a CMS doesn’t support this type of receiving endpoint, files can be uploaded to an FTP server or sent via email. Another option is setting up an RSS feed. “The goal from our point of view is that it should be possible to send the content straight to readers on sites and apps, though some publishers do the publish step manually,” Södergren said.
Decide on a schedule and distribution system
To get the most out of the automated content, it needs to reach the right reader at the right time. In terms of timing for topics like sports and traffic, the sooner the content is published, the better.
However, for topics like real estate, United Robots tends to receive the data in batches, for example once a week. In these instances, Södergren’s team can deploy a delivery schedule so that, instead of receiving 100 texts once a week, the publisher receives about 14 a day. The schedule can also specify the hours of the day based on a given time zone, or weight the number of texts toward times or days when there is less other content.
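The arithmetic behind such a schedule can be sketched in a few lines. This is an illustration only, not United Robots' actual scheduler: the function names are hypothetical, and the weights are assumed to be non-negative numbers the publisher chooses, with a higher weight for days that have less other content.

```python
def spread_evenly(total: int, days: int = 7) -> list[int]:
    """Split a weekly batch into near-equal daily deliveries."""
    base, extra = divmod(total, days)
    # The first `extra` days carry one additional text each.
    return [base + 1 if day < extra else base for day in range(days)]


def spread_weighted(total: int, weights: list[float]) -> list[int]:
    """Split `total` in proportion to `weights` (largest-remainder rounding)."""
    weight_sum = sum(weights)
    exact = [total * weight / weight_sum for weight in weights]
    counts = [int(share) for share in exact]

    # Hand the texts lost to rounding down to the largest fractional shares.
    leftover = total - sum(counts)
    by_remainder = sorted(
        range(len(weights)),
        key=lambda i: exact[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts
```

With a batch of 100 texts, `spread_evenly(100)` gives two days of 15 and five days of 14, which is where the figure of roughly 14 a day comes from.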
In terms of geography, content can include any metadata tags, which is how publishers manage where it ends up, whether on local subsites or pushed to individual readers in specific locations or newsletters.
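As a rough sketch of how tag-based distribution might work on the publisher side, the snippet below maps metadata tags to destinations. The tag values, destination names, and the `route_by_tags` helper are all hypothetical; each publisher defines its own tags and routing rules.

```python
from collections import defaultdict

# Hypothetical mapping from metadata tags to publisher destinations.
ROUTES = {
    "area:north": ["north-subsite", "north-newsletter"],
    "area:south": ["south-subsite"],
    "topic:real-estate": ["property-newsletter"],
}


def route_by_tags(articles: list[dict], routes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a mapping of destination to the article ids that belong there."""
    destinations = defaultdict(list)
    for article in articles:
        for tag in article.get("tags", []):
            for destination in routes.get(tag, []):
                # Avoid listing an article twice when several tags match.
                if article["id"] not in destinations[destination]:
                    destinations[destination].append(article["id"])
    return dict(destinations)
```

Articles whose tags match no route simply do not appear in the result, so a publisher could fall back to a default section for them.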
Give someone ownership
While content automation might not be considered a core business for media companies, having a clear project or product owner generally means a publisher gets progress and value out of the effort in a much more efficient way. Current product owners include Jan Stian Vold at Bergens Tidende in Norway and Ard Boer at NDC in the Netherlands.
Content automation does not have to involve a lot of heavy lifting for publishers. As DuBose said, find the right vendor or partner and start experimenting.