El Comercio has just completed a pilot programme that lets the media company generate newsroom content in alternative ways.
Content generation has a tremendous opportunity to combine technology with semi-structured and unstructured data to grow readership and monetisation, Giulianna Carranza, chief data officer at El Comercio, said in an INMA members-only webinar this week.
El Comercio is part of a conglomerate of Peruvian media companies that manages eight brands in digital or print format, or both, each oriented to a specific audience. The same story may be written with a different language and pitch depending on the outlet that publishes it and the audience it addresses.
In this operation, content objects such as photos are optimised so they can be used across several media at once without spending additional resources.
“These eight brands are our daily and annual challenge for us to deliver much more relevant content to our target audiences,” Carranza said.
Shortly before the pandemic, the El Comercio Group launched three initiatives that use data scraping as their main input, giving newsrooms the ability to generate content in the form of either a special report or an article.
In 2020, digital transformation became more urgent, pushing the company to increase its revenues from digital media, Carranza said.
This meant producing content faster and more nimbly, as news was constantly coming in. As a result, 2020 was the year in which data scraping, and the relationship between the data and analytics team and the newsroom, were consolidated.
Text scraping is a way of writing specials and articles based on open sources.
Web scraping is a data science technique for automatically extracting and collecting structured data from websites. Programs written in languages such as Python, sometimes aided by machine learning, simulate how a human being would browse a site.
According to Carranza, the first step is to identify an unstructured or semi-structured open source from which to extract information; content is then generated from the analysis of that data.
The data then goes through a cleaning and design phase, in which the team decides whether it becomes an article with graphics, text, and photos, or a special feature. A special contains only statistical graphics and can omit text, so it can be published practically in real time.
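Carranza did not describe El Comercio's code, but the extract-then-decide flow can be illustrated with a minimal Python sketch. It assumes the open source publishes its data as an HTML table and parses it with the standard-library `html.parser` module; a real pipeline would first download the page (for example with `urllib.request`) while respecting the site's terms of use. The function names, the placeholder table, and the rule that all-numeric data becomes a chart-only special while anything containing text becomes an article are all hypothetical.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Whitespace between tags is ignored because no cell is open.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def is_number(text):
    try:
        float(text.replace(",", "").rstrip("%"))
        return True
    except ValueError:
        return False


def choose_format(rows):
    # Hypothetical editorial rule: if every data cell (skipping the header row
    # and the label column) is numeric, the result is chart-only material, i.e.
    # a "special"; otherwise it needs text and becomes an "article".
    body = [cell for row in rows[1:] for cell in row[1:]]
    return "special" if body and all(is_number(c) for c in body) else "article"


# Illustrative placeholder values, not real results.
SAMPLE_HTML = """
<table>
  <tr><th>Region</th><th>Share (%)</th></tr>
  <tr><td>Region A</td><td>40.0</td></tr>
  <tr><td>Region B</td><td>60.0</td></tr>
</table>
"""

if __name__ == "__main__":
    rows = scrape_table(SAMPLE_HTML)
    print(rows)
    print(choose_format(rows))  # special
```

Because the sample table holds only figures, the sketch routes it to the chart-only format, which matches the article's point that specials can skip text and go out almost immediately.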
A special on Peru's 2021 elections drew more than a million views, a result so positive that it opened the possibility of adding advertising to monetise it.
“Scraping is very simple, but it can make a single article achieve a large number of pageviews and traffic,” she said.
This format worked very well for the presidential elections, but it also suits topics such as the coronavirus or general breaking news.
El Comercio measures this content to track the daily evolution of each article's readership, as well as the traffic and engagement it generates among registered readers and subscribers.
A second metric is conversion: how many subscriptions a scraping article brings in. One such piece, for example, resulted in 12 subscriptions.
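The two measurements can be sketched in a few lines of Python. The event schema used here, `(date, article_id, kind)` tuples where `kind` is `"pageview"` or `"subscription"`, is an assumption for illustration; El Comercio's analytics stack was not described, and the sample events are placeholders, not real traffic.

```python
from collections import Counter

# Assumed schema: each event is (date, article_id, kind),
# where kind is "pageview" or "subscription".


def daily_pageviews(events, article_id):
    """Pageviews per day for one article, sorted by date."""
    counts = Counter(day for day, art, kind in events
                     if art == article_id and kind == "pageview")
    return sorted(counts.items())


def subscriptions_by_article(events):
    """Number of subscriptions attributed to each article."""
    return Counter(art for _, art, kind in events if kind == "subscription")


def conversion_rate(events, article_id):
    """Subscriptions divided by total pageviews for one article."""
    views = sum(n for _, n in daily_pageviews(events, article_id))
    subs = subscriptions_by_article(events)[article_id]  # 0 if absent
    return subs / views if views else 0.0


# Placeholder events for illustration only.
EVENTS = [
    ("2021-01-01", "a1", "pageview"),
    ("2021-01-01", "a1", "pageview"),
    ("2021-01-02", "a1", "pageview"),
    ("2021-01-02", "a1", "pageview"),
    ("2021-01-02", "a1", "subscription"),
    ("2021-01-02", "a2", "pageview"),
]

if __name__ == "__main__":
    print(daily_pageviews(EVENTS, "a1"))   # [('2021-01-01', 2), ('2021-01-02', 2)]
    print(subscriptions_by_article(EVENTS)["a1"])  # 1
    print(conversion_rate(EVENTS, "a1"))   # 0.25
```

Attributing a subscription to a single article is itself a simplification; a real system would need an explicit attribution rule, such as the last article read before subscribing.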
To guarantee quality, one person in the newsroom is responsible for giving each article its final touch. "There is no limit on the number of articles generated from scraping," Carranza said.
The open sources El Comercio scrapes to create articles include Forbes, The Economist, The New York Times, the International Monetary Fund, JP Morgan, and the Lima Stock Exchange.
Four human teams are involved in this new workflow:
- The statistical hub or data science team, which collects the data and presents it in a digestible way.
- The portfolio of brands and audiences, which dictates the editorial guidelines for each brand.
- The technology department, with whom the newsroom must interact when using the CMS.
- The editorial team, with its writers and editors, who are in charge of quality control.
In 2021, smart videos were added to data scraping, and in 2022 the team will create robotic articles from smart videos. Robotic content lets the newsroom incorporate face recognition and speech-to-text.
In the past three years, readers have dramatically changed their consumption habits: “This means that we have to ally ourselves with technology to make our work and that of the editorial staff much more relevant,” explained Carranza.
The addition of smart videos arose from the need to store video reports and interviews and keep them available, as well as to improve existing processes.
This led to the idea of searching offline videos by facial recognition rather than by text, so that archive material could be used to produce or complement an article.
This was achieved with cloud tools and deep learning, after a period of trial and error and after converting old videos to formats such as AVI and MP4 so they could be used across the different products.
Face recognition also serves to identify the main people in a given video so they can be tagged, stored, and indexed. Newsroom staff can then find them easily and securely, through access codes, with FaceSearch.
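Searching an archive by face usually comes down to comparing numeric face embeddings. The Python sketch below is a toy illustration of that idea and not FaceSearch's API, which the article does not describe. It assumes a deep-learning model (not shown) has already turned each detected face into a vector, stores those vectors with hypothetical `video_id`, `second`, and `tag` fields, and returns the entries whose cosine similarity to a query face exceeds an arbitrary threshold. Access control is not modelled.

```python
import math
from dataclasses import dataclass


@dataclass
class FaceEntry:
    video_id: str     # archive identifier (hypothetical field)
    second: float     # where in the video the face appears
    tag: str          # name assigned by newsroom staff
    embedding: list   # vector from a face-recognition model (assumed input)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class FaceIndex:
    """Brute-force index; large archives would use an approximate-NN library."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def search(self, query_embedding, threshold=0.8):
        """Return (video_id, second, tag, score) for matches, best first."""
        hits = [(cosine(query_embedding, e.embedding), e) for e in self.entries]
        hits = [(s, e) for s, e in hits if s >= threshold]
        hits.sort(key=lambda h: h[0], reverse=True)
        return [(e.video_id, e.second, e.tag, round(s, 3)) for s, e in hits]


if __name__ == "__main__":
    index = FaceIndex()
    # Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
    index.add(FaceEntry("v1", 12.0, "Person A", [1.0, 0.0, 0.0]))
    index.add(FaceEntry("v2", 40.0, "Person B", [0.0, 1.0, 0.0]))
    index.add(FaceEntry("v3", 5.0, "Person A", [0.9, 0.1, 0.0]))
    for hit in index.search([1.0, 0.0, 0.0]):
        print(hit)
```

Tagging a face once and letting similar embeddings surface in other videos is what allows archive footage to be found without relying on text descriptions.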
The next step for El Comercio will be applying speech-to-text technology, which lets reporters retrieve quotes or comments made by the people in the videos. These statements are automatically converted to text to create an article from scratch.
This saves a lot of time and effort, since manually transcribing what a person says in a video is time consuming. The software even writes a base article that needs only a quick check by an editor before publication.
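How a transcript might become a base article for an editor to check can be sketched in Python. The article does not say which speech-to-text service or drafting logic El Comercio uses, so this assumes the transcription step has already produced `(start_seconds, speaker, text)` segments; the function names, the five-word minimum for a usable quote, and the draft layout are hypothetical.

```python
def fmt_time(seconds):
    """Format a position in the video as MM:SS."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


def extract_quotes(segments, speaker, min_words=5):
    """Keep one speaker's segments that are long enough to quote."""
    return [(start, text.strip()) for start, who, text in segments
            if who == speaker and len(text.split()) >= min_words]


def draft_article(segments, speaker, video_id):
    """Build a plain-text draft; timestamps let the editor verify each quote."""
    quotes = extract_quotes(segments, speaker)
    lines = [f"[DRAFT - needs editor review] Statements by {speaker}", ""]
    for start, text in quotes:
        lines.append(f'"{text}" ({video_id} at {fmt_time(start)})')
    return "\n".join(lines)


# Placeholder transcript segments, not real quotes.
SEGMENTS = [
    (5, "Speaker A", "We will present the plan next week to the public."),
    (20, "Reporter", "When exactly?"),
    (75, "Speaker A", "Yes."),
]

if __name__ == "__main__":
    print(draft_article(SEGMENTS, "Speaker A", "v1"))
```

Keeping the timestamp next to every quote is the design choice that makes the editor's "quick check" quick: each sentence can be verified against the exact moment in the video.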
Another step, still being implemented, is an automatic check of each article before it goes to prepress and from there to the rotary press for printing.
Together, these initiatives produce what is called robotic content: the automation of daily routines to make them more efficient. This automated content is properly written and reasonably detailed, and it includes statistical graphics, unique tags, and references to social media.
It also frees editors from writing routine daily news articles so they can focus on specials or premium articles for subscribers.
The benefits of such content include:
- The system can create quick automated articles about major soccer leagues.
- It sends content to the platform, social networks, and the CMS.
- It increases the number of subscribers.
- The texts are enriched with images, maps, graphics, tags, and links.
- It allows editors to focus more on the writing of premium content.
- It favours greater audience reach and increased readership.
- It increases output speed and article reach.
- It generates more content for international audiences (in Colombia, Mexico, and the United States).
Additionally, it broadens reach, opens up possibilities for gaining new subscribers, increases writing speed, and grows the number of readers: the greater the volume of articles, the greater the readership.
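Automated soccer articles are typically produced by filling templates from structured match data. The Python sketch below illustrates that general technique under stated assumptions; it is not El Comercio's system. The input dictionary layout, the sentence templates, and the tag list are hypothetical, and own goals are ignored for simplicity. The sketch refuses to write a story when the goal list disagrees with the final score, a simple guard against publishing inconsistent data.

```python
def result_phrase(home, away, home_goals, away_goals):
    """Headline-style summary of the final score."""
    if home_goals > away_goals:
        return f"{home} beat {away} {home_goals}-{away_goals}"
    if away_goals > home_goals:
        return f"{away} won {away_goals}-{home_goals} away at {home}"
    return f"{home} and {away} drew {home_goals}-{away_goals}"


def match_report(match):
    """Return headline, body, and tags for one match (hypothetical schema)."""
    # Consistency check: goals listed per team must equal the final score.
    counts = {}
    for g in match["goals"]:
        counts[g["team"]] = counts.get(g["team"], 0) + 1
    if (counts.get(match["home"], 0) != match["home_goals"]
            or counts.get(match["away"], 0) != match["away_goals"]):
        raise ValueError("goal list does not match the final score")

    headline = result_phrase(match["home"], match["away"],
                             match["home_goals"], match["away_goals"])
    sentences = [f"{headline} in the {match['league']}."]
    for g in sorted(match["goals"], key=lambda g: g["minute"]):
        sentences.append(f"{g['player']} scored for {g['team']} in minute {g['minute']}.")
    tags = [match["league"], match["home"], match["away"]]
    return {"headline": headline, "body": " ".join(sentences), "tags": tags}


# Placeholder match data for illustration only.
MATCH = {
    "league": "Example League",
    "home": "Team A", "away": "Team B",
    "home_goals": 2, "away_goals": 1,
    "goals": [
        {"player": "Player Z", "team": "Team B", "minute": 60},
        {"player": "Player X", "team": "Team A", "minute": 10},
        {"player": "Player Y", "team": "Team A", "minute": 75},
    ],
}

if __name__ == "__main__":
    report = match_report(MATCH)
    print(report["headline"])
    print(report["body"])
```

The generated tags correspond to the "unique tags" the article mentions, and the output dictionary could be handed to a CMS, with an editor still reviewing it before publication.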