Why news publishers should make automated journalism a core competency

By Kasper Lindskow

Ekstra Bladet

Copenhagen, Denmark


Automated journalism (or robot journalism) is starting to affect journalism in significant ways, marking the beginning of what will, in time, be a revolution in news publishing. The impact of this impending revolution on journalists is already being thoroughly discussed. However, less thought has been given to the strategic implications automated journalism will have for news publishers, including how it will affect competition between news publishers and tech firms such as Google and Facebook.

The discussion about automated journalism’s strategic implications is important because automated journalism has the potential to chip away at news publishers’ control over their value chain and further “debundle” their news offerings. Such dynamics are not new: technical innovations have already caused digital news publishers to lose control over the means of distributing content and advertising, for example, and classified advertising has effectively been “debundled” from the digital news offering.

However, this time, what is at stake is control of our core competency and core asset: journalism.

News media publishers must consider the strategic implications automated journalism will have on the industry.

Automated journalism today and tomorrow

To appreciate the strategic significance of automated journalism, it is necessary to understand its three distinct approaches, each with its own development path and potential. It gets a bit technical, so bear with me (or skip ahead to the summarising figure below).

Simple Natural Language Generation (simple NLG): The most basic form of automated journalism consists of automatically pairing structured data (e.g. sports results, traffic news, or crime statistics) with text bits in an article template via a set of logical rules. These text bits, and the rules governing their connection with specific data points, are coded into a rule engine via an editor that can normally be operated by a journalist with some technical flair and a little training. The result is fairly standardised news stories whose linguistic variety depends on how much work is put into setting rules and writing text bits in the editor. Examples can be seen at the LA Times, Aftonbladet, and in local Swedish news media.
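To make the mechanics of simple NLG concrete, here is a minimal sketch that pairs one row of structured data (a football result) with text bits chosen by ordered logical rules and slots them into a fixed template. It is an illustration under stated assumptions only: the names `MatchResult`, `MARGIN_RULES`, `choose_verb`, and `write_story`, the team names, and the wording of the text bits are all hypothetical and do not reflect the API of Automated Insights, United Robots, or any other vendor's rule engine.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """One row of structured input data (hypothetical sports-results feed)."""
    home: str
    away: str
    home_goals: int
    away_goals: int


# Hypothetical rules: each pairs a condition on the data with a text bit.
# The first matching rule wins, so the order of the list matters.
MARGIN_RULES = [
    (lambda margin: margin >= 3, "crushed"),
    (lambda margin: margin == 2, "beat"),
    (lambda margin: margin == 1, "edged past"),
]


def choose_verb(margin: int) -> str:
    """Return the text bit of the first rule whose condition holds."""
    for condition, text_bit in MARGIN_RULES:
        if condition(margin):
            return text_bit
    raise ValueError(f"no rule covers margin {margin}")


def write_story(r: MatchResult) -> str:
    """Fill a fixed article template from one data row."""
    if r.home_goals == r.away_goals:
        # Separate template for draws.
        return f"{r.home} and {r.away} drew {r.home_goals}-{r.away_goals}."
    # Normalise so the winner always fills the first slot of the template.
    if r.home_goals > r.away_goals:
        winner, loser = r.home, r.away
        win_goals, lose_goals = r.home_goals, r.away_goals
    else:
        winner, loser = r.away, r.home
        win_goals, lose_goals = r.away_goals, r.home_goals
    story = f"{winner} {choose_verb(win_goals - lose_goals)} {loser} {win_goals}-{lose_goals}."
    # Optional text bit, appended only when its rule fires.
    if lose_goals == 0:
        story += f" {loser} failed to score."
    return story


if __name__ == "__main__":
    print(write_story(MatchResult("Northside", "Southside", 3, 0)))
```

Running the script prints "Northside crushed Southside 3-0. Southside failed to score." Adding linguistic variety in this model means writing more rules and text bits, which is the editorial work described above; the output can never say anything the rules did not anticipate.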

Natural Language Processing (NLP): While automated journalism based on simple NLG has been around for a number of years, automated journalism based on NLP is about to reach its breakthrough. Unlike simple NLG, which rests on logic, NLP rests on Artificial Intelligence (machine-learning models such as BERT or ELMo) that enables the summarising and synthesising of large bodies of text such as industry reports, classical novels, or the news stories usually produced by news publishers.

Today NLP is able to produce text in a language that is just as rich and varied as articles from human writers or journalists. It is able to produce fake news stories of frightening quality, as was vividly demonstrated by OpenAI last month. Soon it will be able to curate real news stories by parsing and synthesising other news stories, reports, and online discussions, but it will not be able to produce original (as in “new to the world”) news stories by itself.

Advanced Natural Language Generation (“true NLG”): Unlike NLP, true NLG will be able to produce original news stories based on parsing and synthesising news stories, quantitative data, industry reports, and similar content. Therefore, when fully realised, true NLG would come close to being a fully autonomous robot journalist. However, to get there, true NLG must rely on a mix of advanced NLP and Natural Language Understanding (NLU), and possibly a number of other ingredients that do not yet exist. For that reason, even though intense research in these areas is being carried out by the likes of Google and Facebook, the realisation of true NLG is currently well beyond our “event horizon.”

While the development of simple NLG has been driven by news publishers in collaboration with producers of rule engines and editors (such as Automated Insights, Narrative Science, or United Robots), the development of NLP and true NLG has so far been driven almost exclusively by tech firms such as Google, Facebook, and OpenAI.

A large part of the reason for this is that simple NLG requires journalistic competencies and is less flexible and scalable than the other types of automated journalism. Conversely, NLP and true NLG are flexible and scalable (when they work) and do not require journalistic or other content-specific competencies, making them more attractive to tech firms.

The figure below summarises these three distinct approaches to automated journalism and provides a guesstimate as to when they will be widely available. The guesstimates should be taken with a grain of salt. This is because the emergence of true NLG is uncertain and — more importantly — the three approaches are likely to develop and merge as they progress, enabling still more relevant and usable automated journalism.

Three approaches to automated journalism and guesstimates as to when they will be widely available.

The strategic significance of automated journalism

The emergence of the three distinct types of automated journalism will change the strategic context that news publishers are embedded in. Even though the exact timing of the changes is uncertain and the severity of the consequences to some extent depends on the actions taken by news publishers, these dynamics must be expected:

Automated journalism will become a source of competitive advantage for news publishers. Specifically, it will become a source of both “quality” and “cost” advantages.

The quality advantages stem from automated journalism enabling the production of news stories in areas and quantities that are economically unfeasible for human journalists. The cost advantage stems from “robots” being able to produce news stories more cheaply than humans, which either frees up editorial resources for other purposes or offers outright cost savings. These advantages will increase as automated journalism progresses, as will the competitive pressures on news publishers to engage in robot-based competition.

Tech firms will chip away at news publishers’ control of the value chain. Even though news publishers will publish automated journalism, most publishers will choose not to develop and own the technologies (whether rule engines and editors or NLP) that power automated journalism.

Accordingly, just as news publishers have lost ownership of the devices on which news is accessed and of the methods by which content and advertising are delivered, they will increasingly lose control of some of the methods by which news stories are produced, and with that, risk capturing a smaller share of the revenues generated by journalism.

Tech firms will further erode news publishers’ near monopoly on journalism (“debundling”). As automated journalism progresses, more and more types of news stories can be produced via technology and with little human involvement. For that reason, automated journalism invites tech firms (including Google and Facebook) to begin automatically producing and publishing journalism. This brings them into direct competition with news publishers.

At first, competition is likely to centre on the curated or synthesised news stories that can be produced with NLP, which are not regulated by current copyright law. Such stories could be distributed via Google News or the Facebook News Feed, for example, as well as via voice assistants such as Alexa, Google Home, and Siri. Later, the scope and intensity of competition will increase as NLP improves and progresses towards true NLG.

What should we do about it?

If the above scenario sounds scary to us news publishers, at least we can take comfort in the fact that we are still in the early days of automated journalism and the future is not set in stone. However, if the scenario holds, we as news publishers must view automated journalism as an existential issue and decide what position to aim for in a future where a growing part of journalism will be facilitated by technology.

In responding to the rise of automated journalism, news publishers have at least four options:

  1. Wait and see. We are still in the early days of automated journalism, and all decisions are still made under a cloud of uncertainty regarding the quality and timing of future developments. For that reason, adopting a wait-and-see approach will allow us to evaluate the pace of technological improvement as well as the strategies chosen by other publishers and tech firms.
  2. License a rule engine and editor from a tech firm. Simple NLG has matured enough for any news publisher to license a rule engine and begin producing automated journalism in areas where highly structured data exists (e.g. sports or weather reports). This will allow journalists to develop competencies in automated journalism and create a “template style” that fits the publisher’s editorial tone of voice. Further, it positions news publishers to reap competitive benefits against other news publishers as the quality and scope of automated journalism progress.
  3. Build a rule engine and editor from scratch. To align automated journalism capabilities more flexibly with publisher-specific needs, we may opt to build the rule engine and editor ourselves. This path allows us to maintain control over this step in the value chain and avoid new tech fees associated with the production of journalism.
  4. Engage in developing NLP-based automated journalism. Finally, we can engage in the development of NLP-based curated news stories. This approach will (probably) not result in the addition of curated news stories at scale in the short term, as NLP is still only on the verge of this capability. However, it will allow us to build competencies and prepare for competition in the area of automated journalism, which is where most progress is likely to occur in the near future.

Most of the news publishers engaged in automated journalism today have chosen to license rule engines (sometimes even template production) from tech firms such as Automated Insights (e.g. the Associated Press) and United Robots (e.g. MittMedia). Other front-running news publishers have built their own rule engines and editors, and some are experimenting with integrating NLP into these tools (for example, The Washington Post’s Heliograf and Forbes’ Bertie).

However, even though things are changing as we speak, the development of NLP (and true NLG) has largely been left to tech firms. This pattern is natural, as the immediate benefits of automated journalism are limited and most news publishers have little experience with engaging in tech development.

Nevertheless, the key difference between the loss of control that publishers will incur from the rise of automated journalism and earlier losses is that this time our core asset, journalism, is challenged. At the same time, we’re still in the infancy of automated journalism, and the technologies powering it have not yet progressed so far that developing simple NLG or NLP capabilities in-house is out of reach.

It is thus perhaps both the right area and the right time for mainstream news publishers to make automated journalism (a journalistic technology) a core competency and capability. Whether or not we do, some tech firms will.

