After three and a half hours and six case studies at the Smart Data Summit, part of the INMA World Congress in Washington, D.C., all focused on the expanding deployment of data-crunching algorithms in the news media industry, one might conclude that data is potentially all things to all publishers: Data as a batch of technical initiatives. Data as editorial insight. Data as subscriber profiling. Data as robo-journalism. Data as the raw material for building publisher revenue. Data as content-aware artificial intelligence (AI).

Then again, the half dozen actual data scientists in the room of about 200 media executives might suggest we need more data to be sure.

Shailesh Prakash explained the many ways data is being integrated at The Washington Post.

“Those of us who are excellent in journalism have an easier shot at becoming excellent in technology than our friends who are excellent in technology have in trying to become excellent in journalism,” said Shailesh Prakash, executive vice president and chief information officer for The Washington Post. In light of that view, plus the fact that Prakash’s boss, the news media company’s owner, is Amazon’s Jeff Bezos, one might think The Post has plenty of data tech in its closets.

That calculation would be correct.

Prakash spent time describing that technology. For example, Clavis is the company’s recommendation engine for media content. Virality predicts which stories will be popular; this currently takes about 30 minutes of processing but should eventually work in real time. Bandito automatically varies and rates different versions of stories.

Additionally, Headliner, which is still in the lab, attempts to write headlines automatically, while Heliograf is designed to write entire articles, if only relatively simple ones at the moment. HelioVideo automatically generates video stories from stills and text. ModBot automatically trolls for trolls in comment sections. And, of course, there is Arc, The Post’s commercially available umbrella publishing platform for all this tech and more.
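
For readers curious what "varying and rating different versions of stories" can look like in practice, here is a minimal sketch. Prakash did not describe how Bandito works internally, so this should not be read as The Post's method: it assumes an epsilon-greedy strategy, one common way to test variants, and the class name `HeadlineTester`, the `simulate` helper, the sample headlines, and the click rates are all hypothetical.

```python
import random


class HeadlineTester:
    """Epsilon-greedy test over story variants (hypothetical illustration)."""

    def __init__(self, variants, epsilon=0.1, seed=0):
        self.variants = list(variants)
        self.epsilon = epsilon              # share of traffic used for exploring
        self.shows = [0] * len(self.variants)
        self.clicks = [0] * len(self.variants)
        self.rng = random.Random(seed)

    def click_rate(self, i):
        # Observed click-through rate; 0.0 until the variant has been shown.
        return self.clicks[i] / self.shows[i] if self.shows[i] else 0.0

    def choose(self):
        # Explore a random variant occasionally, otherwise show the current leader.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.variants))
        return max(range(len(self.variants)), key=self.click_rate)

    def record(self, i, clicked):
        self.shows[i] += 1
        if clicked:
            self.clicks[i] += 1

    def best(self):
        leader = max(range(len(self.variants)), key=self.click_rate)
        return self.variants[leader]


def simulate(tester, true_rates, rounds=5000, seed=1):
    """Feed the tester simulated readers with assumed (made-up) click rates."""
    readers = random.Random(seed)
    for _ in range(rounds):
        i = tester.choose()
        tester.record(i, readers.random() < true_rates[i])
    return tester


if __name__ == "__main__":
    headlines = ["Version A", "Version B", "Version C"]
    tester = simulate(HeadlineTester(headlines), true_rates=[0.02, 0.05, 0.03])
    for i, name in enumerate(headlines):
        print(name, tester.shows[i], round(tester.click_rate(i), 4))
    print("Current leader:", tester.best())
```

The design choice here is that most readers see whichever version is currently performing best, while a small share of traffic keeps testing the alternatives so a late-blooming version is not missed.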

Google Search and Facebook are not as dominant as they once were, according to Josh Schwartz of Chartbeat.

Turning data into editorial insight, Josh Schwartz, head of product engineering and data at content intelligence company Chartbeat, showed participants graphs that demonstrated a dramatic shift in the news landscape over just the past 10 months.

Dominant media discovery engines Google Search and Facebook, often pejoratively referred to as the duopoly, are becoming less dominant, Schwartz noted. Google Chrome Suggestions and Flipboard have both risen substantially in the rankings of how most people find and access news, mainly on their phones.

He also highlighted data signals pointing to huge opportunities in what he called “direct mobile,” and he urged publishers to invest more effort in that area to get the best performance out of their editorial resources.

“Data science techniques really can broadly affect editorial strategy,” Schwartz said. “I really encourage folks to think about applications like this and others to inform their editorial decision-making and not just their business decision-making.”

Gerard Brancato said propensity modeling of reader behaviour has had a positive effect at the Chicago Tribune.

In addition to content insights, data is also providing valuable information about readers. Propensity modeling of reader behaviour, using a wide range of digital indicators, has allowed the Chicago Tribune to increase its conversion rate for new subscribers by 36% while lowering the cost of acquiring those new paying subscribers by 27%.

That’s money in the bank, said Gerard Brancato, vice president of digital subscription marketing at tronc (formerly Tribune Publishing), which owns the Tribune.

Brancato urged participants to develop their own propensity-to-convert and propensity-to-churn models, connect them to digital marketing platforms such as Google and Facebook, institute frameworks to measure effectiveness of these efforts, and then use systems that automate the resulting marketing and communications processes.

Finally, he suggested participants employ a well-considered, but not overly complex or high-cost, integrated marketing stack.
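
The propensity-to-convert idea Brancato described can be sketched in a few lines. This is only an illustration under stated assumptions, not the Tribune's model: it uses a small hand-written logistic regression, and the three reader signals (weekly visits, newsletter sign-up, paywall hits), the sample data, and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_propensity_model(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y                      # gradient of log loss w.r.t. the logit
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def propensity(weights, bias, x):
    """Estimated probability (0 to 1) that a reader with signals x subscribes."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up training data. Each row is one reader:
# [visits per week (scaled 0-1), newsletter sign-up (0 or 1), paywall hits (scaled 0-1)]
READERS = [
    [0.9, 1, 0.8], [0.8, 1, 0.6], [0.7, 0, 0.9], [0.6, 1, 0.5],
    [0.2, 0, 0.1], [0.1, 0, 0.0], [0.3, 1, 0.1], [0.2, 0, 0.3],
]
SUBSCRIBED = [1, 1, 1, 1, 0, 0, 0, 0]        # 1 = later became a paying subscriber

weights, bias = train_propensity_model(READERS, SUBSCRIBED)

if __name__ == "__main__":
    for reader in ([0.9, 1, 0.8], [0.1, 0, 0.0]):
        print(reader, round(propensity(weights, bias, reader), 3))
```

In a workflow like the one Brancato outlined, scores of this kind would be passed to marketing platforms so that offers reach the readers most likely to convert; a propensity-to-churn model would follow the same pattern with cancellations as the label.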

Humanoid journalism has allowed MittMedia to produce a lot of additional content, said Robin Govik.

And when it comes to creating the content people read, there is data involved there, too. Robin Govik, chief digital officer at MittMedia in Stockholm, said he prefers the term “humanoid journalism” when talking about the use of digital robots to automate content creation and other editorial processes. It turns out Sweden employs more humanoid journalism than any other place in the world.

One particular project at MittMedia called the Homeowners Bot is actually the company’s most productive editorial employee, writing highly popular automated articles about every single home purchase in the region.

“Since more than 100,000 homes are sold within our geographical area every year, it wouldn’t make economic sense to let reporters write articles of every sale,” Govik said. “We decided to make automated articles. The Homeowners Bot was born.”

Addressing editorial skeptics, he added: “This isn’t a threat. It’s a possibility. If a journalist fears that he or she can be replaced by a robot, that particular person can probably be replaced by a robot. But a robot can never replace an ambitious, talented journalist. They can only make that person’s job better.”

Josh Siegel offered suggestions for participants on how to develop a data road map.

Of course, the question is always where to start when it comes to data adoption and integration. Josh Siegel, the new senior vice president and general manager for the Audience Platform Group at Gannett, walked audience members looking for a smart data project road map through the steps and requirements for success:

  • Pursue business goals at different stages rather than in just one big push over a multi-year data project.
  • Leadership is required to manage the necessary cultural changes brought on by the new technology.
  • Focus where business value and data align to take the clear wins in this area.
  • Then gradually scale up your data project aspirations and the talent required to meet them.
  • Start simply: Ask simple questions for early data science efforts.

Ricky Sutton of Oovvuu said his company has grand visions with what AI can do.

Ricky Sutton, founder and CEO at Oovvuu, is not new to the data integration space and has moved beyond thinking small. “Our mission is to embed a contextually relevant video in every article in the world,” Sutton said. “To do that, you have to read every article. We do it by reading articles, analysing videos, then matching them together using ground-breaking AI.”

Oovvuu originally tried to do this with people but found it wouldn’t scale. So the company started to develop its own algorithm to handle the task and was eventually approached to use IBM’s Watson AI. But that didn’t work out entirely well, either, because of the nuances involved in deciding which videos were most appropriate for which articles.

“We eventually cracked it by using journalists’ everyday editing behaviour to teach the AI and to improve the quality of the recommendations the system makes,” Sutton said.

Sutton closed the session by pitching Oovvuu’s efforts to create a consortium of 10 publishers to help improve its Compass platform and prove financial outcomes that it says could top US$17 billion and be the salvation of struggling legacy publishers.