Hi! This is the Smart Data Initiative newsletter, a bi-weekly post for INMA members on creating value with data analytics for media companies and incorporating a data-positive culture. I am a researcher-in-residence at INMA. E-mail me at: email@example.com.
DATA AND ADVERTISING: As Google ditches user tracking and shocks the ad industry, publishers see opportunities for the fittest
Citing privacy concerns, Google said goodbye to individualised advertising. News publishers are urgently planning scenarios of their ad business future. Many are expecting an increase in value of their first-party user data and rich content data.
Google announced Wednesday it plans next year to stop using technologies that identify individual Web users and track them as they browse across multiple Web sites.
“Once third-party cookies are phased out, we will not build alternate identifiers,” wrote David Temkin, Google’s director of product management, ads privacy, and trust.
Considering Google’s market position — 30% share in the 2020 U.S. digital ad spend, 63% share in worldwide browser usage, and 85% share in smartphone operating systems — this move signals a profound change in the digital advertising ecosystem.
IAB Europe Chief Economist Daniel Knapp said: “It’s no surprise that Google abandoned the idea of preserving individual targeting. What threw the ad industry into disarray is that Google said explicitly that it wouldn’t allow others to use individual tracking technologies on Google’s ad tech infrastructure. A big part of the open Web is forced back to targeting by clusters of anonymous users and by context. It’s a profound change, full of uncertainties, and for which many have not prepared.”
INMA interviewed Knapp and executives of Amedia in Norway, JP/Politikens Hus in Denmark, Mediahuis in Belgium, and Tamedia in Switzerland to draft implications for news publishers.
First-party data play: Publishers that collect first-party user data, such as Amedia in Norway, will continue to offer targeting to individuals within their walled gardens.
They expect higher demand, see an opportunity to cut out middlemen in ad tech and media buying, and hope finally to be able to price quality inventory at a premium.
Amedia, a regional news publisher, reaches 40% of Norwegians, and 80% of its pageviews are generated by logged-in users.
The size of those gardens matters, so publishers might wish to form alliances with peers and, for instance, pool data to create targetable segments across media brands. Amedia, for example, teamed up with Aller Media in Norway.
Christian Thu, vice president/advertising sales at Amedia, explained: “You need reach, user data at scale, make ads easy-to-buy, provide insights, and document effects of campaigns. If publishers stand alone, they won’t provide sufficient value for ad buyers.”
The INMA interviewees don’t expect Google to entirely block personal data from flowing through its infrastructure, such as Google Ad Server, which many publishers use, as long as the use is privacy compliant. “This could flag up massive anti-trust concerns,” Knapp of IAB Europe said.
Yet uncertainties remain. Publishers await clarification of privacy rules from other tech giants, such as Apple, which recently forced mobile app publishers to ask users for permission to track them across apps.
Contextual targeting play: News publishers have sold ads based on context for decades, in print and online. Advances in data analytics have made this form of targeting quite sophisticated and yet fully privacy compliant. Content signals can be combined with insights based on first-party data or surveys.
For example, JP/Politikens Hus in Denmark built technology that lets advertisers target the same kinds of user segments across Web sites without tracking the individual users. Last year, it invited six peer publishers, such as TV2 and Berlingske Media, to join its Publisher Platform.
Thomas Lue Lytzen, head of product development and insights, ad sales and tech at JP/Politikens Hus, said: “Publishers need to come up with standards for contextual targeting to be effective. Currently, we’re using the IAB taxonomy of topics to classify content, but we need a true news-centric taxonomy that would represent the true value of our content.”
The shift towards contextual targeting likely benefits premium publishers, as access to their exclusive inventory becomes more valuable. In the era of audience-based ads, advertisers could follow users from premium sites to cheaper ones and reach the same people for less.
Unfortunately, as the INMA interviewees observed, advertisers and their agencies have been slow to shift, fearing lower effectiveness of contextual ads, and revenue from this kind of targeting has remained marginal.
Knapp estimated the total spend on online contextual ads in Europe in 2020 at €1.5 billion, or 5% of total display ad spend.
Google play: Publishers who haven’t amassed first-party data and those who haven’t broken free from the tech platforms’ infrastructure (read: most publishers in the world) will likely depend on whatever Google builds, and their dependence will grow. They will potentially surrender control of their audience value to Google’s black-box algorithms.
Christoph Zimmer, chief product officer at Tamedia, described the Swiss publishing industry’s push towards logging in users. In 2019, publishers formed an alliance that plans to build a single sign-on across media outlets.
Today’s reality is different: “In Switzerland, Google generates more ad revenue than all other media companies combined. Therefore, its decision to block third-party tracking entails a significant revenue risk for Swiss publishers and Tamedia.”
It’s not much different in Belgium, where most publishers, advertisers, and agencies use Google’s tech stack, while Mediahuis invested in a data platform and offered targeting based on its first-party user data. Interestingly, while in the United States 85% of ads were sold programmatically in 2020, in Belgium it might have been less than 30%.
“So, we are less stressed than stretched. We have several task forces on a local and group level, we talk with publishers internationally, and look at scenarios of how the ad market might evolve,” said Gert Desager of Mediahuis. “One thing is clear, since Darwin had observed: It’s not the strongest of species that survives, nor the most intelligent, but the one that is most adaptable to change.”
Google promises to replace individual identifiers with the Privacy Sandbox technology. In a nutshell, it clusters users based on interests and targets ads in a browser on the user’s device, keeping individuals anonymous to advertisers. It named the clusters flocks, like birds travelling together.
“Advertisers can expect to see at least 95% of the conversions per dollar spent when compared to cookie-based advertising,” declared Chetna Bindra, Google’s group product manager for user trust and privacy, after internal tests.
Google expects to begin testing the new tech with advertisers in the second quarter.
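The cohort idea described above can be illustrated with a toy sketch. It is not Google's implementation: it assumes a locality-sensitive hash (SimHash) computed over a hypothetical list of interest labels, and the function names and the cohort size are invented for illustration. The point it shows is that the grouping can be computed from data that stays on the device, with only a coarse cohort number exposed.

```python
import hashlib

# Hypothetical setting: 8 bits gives 2**8 = 256 possible cohorts.
COHORT_BITS = 8


def _feature_hash(feature: str) -> int:
    """Map an interest label to a stable 32-bit integer."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def cohort_id(interests, bits: int = COHORT_BITS) -> int:
    """Toy SimHash: every distinct interest votes on every bit of the ID.

    A bit is set when more interests vote 1 than 0, so users with largely
    overlapping interests tend to end up with similar bit patterns.
    """
    votes = [0] * bits
    for interest in set(interests):
        h = _feature_hash(interest)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, vote in enumerate(votes) if vote > 0)


# Only this number, not the interest list, would be shared with ad systems.
my_cohort = cohort_id(["football", "electric cars", "cooking"])
```

In this sketch an advertiser would see only `my_cohort`, a value shared by every user whose interests hash to the same pattern.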
- INMA Report: The Third-Party Cookie Trigger, INMA, June 2020.
- INMA Knows: Third-Party Cookies and Advertising curated by Dawn McMullan.
TEXT MINING: How Denmark’s Ekstra Bladet develops the tech behind contextual advertising
Availability of pre-trained algorithms and other advances in machine learning help news publishers make giant leaps in development of their own systems for contextual advertising.
“What required a lot of effort, time, and money spent on annotating data to train our own algorithms now can be achieved with surprisingly little data, short time, and low cost,” said Kasper Lindskow, head of research and innovation at Ekstra Bladet, part of JP/Politikens Hus in Denmark.
In an interview with INMA, Lindskow discussed practical applications of natural language processing, a method of extracting information from text.
Publishers use it, for example, to automatically analyse the content of thousands of articles and identify their topics. This classification is then used to match ads to relevant articles and to derive readers’ interests (based on the topics they have read about).
Text mining is a new key capability of news publishing, as it enables targeting ads contextually rather than based on individual user data. Other typical uses include assistance in editorial planning, suggesting headlines or summaries, automated content curation, and engaging audiences with recommendations.
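How topic labels feed both ad matching and reader interests can be shown with a minimal sketch. It assumes articles have already been labelled by a classifier; the article IDs, topics, campaign names, and function names are all hypothetical and do not describe any publisher's actual system.

```python
from collections import Counter

# Hypothetical articles, each already labelled with topics by a classifier.
ARTICLES = {
    "a1": {"football", "sport"},
    "a2": {"cars", "electric vehicles"},
    "a3": {"personal finance", "housing"},
}

# Hypothetical campaigns and the topics each advertiser wants to appear next to.
CAMPAIGNS = {
    "sports-shoes": {"football", "running"},
    "ev-dealer": {"electric vehicles", "cars"},
    "mortgage-bank": {"housing"},
}


def match_campaigns(article_topics, campaigns):
    """Return campaign names ranked by how many topics they share with the article."""
    scored = [
        (len(article_topics & wanted), name)
        for name, wanted in campaigns.items()
        if article_topics & wanted
    ]
    # Largest overlap first; ties broken by name so the order is deterministic.
    return [name for _, name in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


def reader_interests(read_article_ids, articles):
    """Count how often each topic appears among the articles a reader has read."""
    counts = Counter()
    for article_id in read_article_ids:
        counts.update(articles[article_id])
    return counts
```

Calling `match_campaigns(ARTICLES["a2"], CAMPAIGNS)` selects campaigns from the article's topics alone, with no user data involved, while `reader_interests` builds an interest profile purely from first-party reading history.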
Breakthroughs in text mining: Invented in the 1950s, natural language processing was revolutionised with the introduction of machine learning in the 1980s and further advances in its techniques, such as artificial neural networks, in later decades.
A big challenge of this type of analysis was that it required a lot:
- Large quantities of articles annotated by people to train the machines.
- A lot of storage.
- A lot of computing power.
This all changed in 2018 when Google, Stanford University, OpenAI, and others started publishing a new type of algorithm pre-trained on large datasets, such as the whole English Wikipedia.
“Publishers can today apply such algorithms to their articles, and after relatively easy and quick fine-tuning classify articles with high accuracy,” Lindskow explained.
Fine-tuning requires feeding the algorithm with examples of texts classified by people. According to Lindskow, one gets a decent result on topic classification with only 5,000 annotated articles. Ekstra Bladet decided to annotate twice as many to improve its model.
For this task, it hired linguistics students at US$25 per hour. Developing a basic model took two data scientists six weeks. This could have been shorter had they used open-source resources instead of developing the architecture themselves. The time-consuming and costly part has been integrating the new database with advertising systems to allow planning campaigns and targeting ads.
Road to data independence: Ekstra Bladet has been upgrading its data and advertising infrastructure for years, aiming to reduce its dependence on the systems of tech giants such as Google:
- In November 2019, it launched its own data platform, Relevance, which segments users based on first-party reader data and context derived from content.
- In October 2020, it launched a contextual advertising network, named the Publisher Platform, in collaboration with six peer publishers, such as TV2 and Berlingske Media, offering advertisers reach to 90% of Danes across all devices or browsers.
- In January 2021, Ekstra Bladet ditched Google Analytics for its own Web analytics software, Longboat, and became independent of third-party technologies in the entire data value chain.
Kasper Lindskow is now heading an ambitious project to develop systems for news personalisation in collaboration with Denmark’s leading universities.
Treasures hidden in articles: Ekstra Bladet wants to extract more information from articles, such as people, organisations, places, things, and sentiments to allow more granular segmentation. “We have a proof of concept and now we are waiting for integration into our data products,” Lindskow said.
One of his ambitions is to link the content metadata with information from other databases or Web sites, for example, connecting an article that mentions a person to her biography published elsewhere. Scientists call such databases knowledge graphs: They collect pieces of information from different sources and organise them by linking through keywords. Google and Facebook use their graphs to improve search results, news feeds, and more.
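A knowledge graph of this kind can be sketched as a set of subject-relation-object facts. The example below is a minimal illustration under stated assumptions: the class, the function, and all names and facts in it are hypothetical, and real knowledge graphs are far larger and handle ambiguity between entities with the same name.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Stores facts as (subject, relation, object) links, keyed by subject."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        """Record one fact about a subject."""
        self._edges[subject].add((relation, obj))

    def facts_about(self, subject):
        """Return all known (relation, object) pairs for a subject, sorted."""
        return sorted(self._edges.get(subject, set()))


def enrich_article(article_entities, graph):
    """Attach known facts to each entity extracted from an article."""
    return {entity: graph.facts_about(entity) for entity in article_entities}


# Hypothetical facts gathered from sources outside the article itself.
graph = KnowledgeGraph()
graph.add("Jane Doe", "occupation", "politician")
graph.add("Jane Doe", "biography", "https://example.org/jane-doe")
graph.add("Example Party", "type", "political party")

# Entities assumed to have been extracted from an article by text mining.
enriched = enrich_article(["Jane Doe", "Unknown Person"], graph)
```

Here `enriched["Jane Doe"]` carries the biography link and occupation, while an entity missing from the graph simply maps to an empty list.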
Ekstra Bladet’s Lindskow believes publishers need to develop machine learning and Artificial Intelligence systems themselves to offer competitive reader experiences and to ensure these systems reflect journalistic values and ethics — and not those of the tech giants.
How are you tying data analytics to your business objectives? E-mail firstname.lastname@example.org.
About this newsletter
Today’s newsletter is written by Greg Piechota, researcher-in-residence and Smart Data Initiative lead at INMA, based in Oxford, England. This newsletter shares insights and best practices on creating value with data analytics and incorporating a data-positive culture.