Vox Media, Le Monde have cookie best practices you should know about
Smart Data Initiative Newsletter Blog | 01 August 2022
Hi everyone.
I really didn’t want you to bask in the rest and relaxation you have earned. Sure, the pool seems nice, and that cold drink is attractive. But also, why don’t I drag you right back to your desk with tales of cookies, consent, analytics, and GDPR? Does it sound like I’m bitter because I’m not at the pool?
Yeah. I’m not at the pool.
Feel free to send me your best FOMO pix from your vacation or data-related thoughts from your deck chair. The former will be met with all the studied French indifference of my cultural heritage; the latter will be gratefully accepted: ariane.bernard@inma.org.
Ariane
Best practice for your analytics from a consent perspective
I am a technical optimist. And I believe in the incentives of capitalism to make certain problems (emphasis on “certain”) inherently likely to find solutions in the free market. “There’s always money in the banana stand” school of thought, for my fellow Arrested Development fans.
This is why I think our currently fairly young data privacy legislation will mature. The never-ending cycle of legal challenges to our privacy practices and tools will slow down, if not cease. We will have more clarity about what data we can acquire, with what tool, under what circumstances. It won’t always be as murky and shifting as things are today. (See last week’s newsletter for the legislative angle.)
But this constant whiplash of course worries various corners of the data world — from analytics practitioners to data bosses who have had a run-in with their legal department. This concern leads us to question whether we should drastically reconsider our practices: “Do we move to self-hosted analytics?” or “Do we completely flip our data acquisition to privacy by design?”
And there are several other connected angles to this big, big question: Should we really reassess how much data we’re gathering in the first place, and how we acquire and store it?
So, to be clear, these questions above: That’s not one newsletter, that’s a whole tome of research. And I’ll revisit some of these topics more specifically in the future, in particular privacy by design.
First things first: best practice to be “by the book”
Vox Media, come forward so I can bow at your respectful cookie drop. You are dropping almost nothing pre-consent (just a session identifier from your CMS, I believe). Even if I continue to navigate your site without accepting, you’re allowing me to navigate and still won’t drop any cookies.
You get my newly minted “Golden Cookie Award” (street value: worthless) for following the law to the letter, but I am sad you have to give up tracking anything about my reading behaviour, all because I didn’t touch your consent banner. Still, “A+, fast shipping, would do business again,” as they say on eBay.
I am spotlighting Vox Media because they are not using a vendor to manage their cookies, and theirs is a consent banner, not a CMP (consent management platform). So, something simple. There are other good citizens out there. I see you.
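If you want to picture the mechanics of that “nothing before consent” gate, here is a minimal sketch. To be clear, this is my illustration, not Vox Media’s actual implementation: the storage key, function names, and script URL are all placeholders of my own.

```ts
// A minimal sketch of the Vox-style pattern: drop nothing until the
// visitor explicitly opts in. Names, keys, and URLs are illustrative.
type ConsentChoice = "accepted" | "refused";

function readStoredConsent(): ConsentChoice | null {
  // The decision itself fits in localStorage (or one strictly-necessary
  // cookie) and records nothing about the visitor's behaviour.
  return localStorage.getItem("consent-choice") as ConsentChoice | null;
}

function loadAnalytics(): void {
  // Inject the analytics script only once consent exists.
  const s = document.createElement("script");
  s.src = "https://cdn.example.com/analytics.js"; // placeholder tracker
  s.async = true;
  document.head.appendChild(s);
}

// Wire this to the consent banner's buttons.
function onBannerChoice(choice: ConsentChoice): void {
  localStorage.setItem("consent-choice", choice);
  if (choice === "accepted") loadAnalytics();
  // On "refused": do nothing. The visitor keeps browsing, cookie-free.
}

// On page load: respect an earlier decision, drop nothing otherwise.
if (readStoredConsent() === "accepted") loadAnalytics();
```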
By the book, with a little bit more
Le Monde, please accept my congratulations and a Golden Cookie Award. You also do it with a lil’ bit of something extra, for which I must commend you.
The reason I’m picking Le Monde is because I didn’t want to turn this into a CMP vendor endorsement, and I know Le Monde is using a homebrewed CMP. There are other smart publishers out there. I see you, too.
In a nutshell:
Pre-CMP, Le Monde is dropping a couple of anonymous IDs to have a “whole audience” view. Whether I accept their CMP’s terms or not, the publisher has a view of its unique visitors.
If I accept the cookies, every cookie in the universe of cookies gets dropped.
If I refuse (there’s no “customise” on Lemonde.fr), a few additional session cookies get dropped, plus a cookie I am pretty sure is used for reporting audience measurement to the French audience measurement alliance, ACPM. This one carries very little inside it, so it presumably falls under “legitimate interest.”
So what Le Monde does here, that’s it. That’s the best practice. And they are going one step beyond — they are dropping a very skinny cookie to do some basic analytics. But note that Le Monde has opted to go the route of a cookie wall: You must accept cookies or buy a subscription to navigate the site. (Many articles on lemonde.fr require a premium subscription, but even to browse the homepage as a non-subscriber, you must either accept cookies or subscribe.)
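If you want to picture what such a skinny cookie might look like under the hood, here is a hedged sketch. Again, this is my illustration, not Le Monde’s actual code: the cookie name, 30-day lifetime, and API choices are all assumptions.

```ts
// A sketch of the "skinny pre-consent cookie" idea: a random identifier
// that supports whole-audience unique-visitor counts but carries no
// user data. Cookie name and lifetime are illustrative, not Le Monde's.
function getOrCreateAnonymousId(): string {
  const match = document.cookie.match(/(?:^|;\s*)anon_id=([^;]+)/);
  if (match) return match[1];
  // crypto.randomUUID() is pure randomness: nothing here fingerprints
  // the device or ties back to a real identity.
  const id = crypto.randomUUID();
  document.cookie = `anon_id=${id}; path=/; max-age=${60 * 60 * 24 * 30}; SameSite=Lax`;
  return id;
}
```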
But say you have not opted for a cookie wall — you’re willing to allow your non-cookie-having users to browse around. Le Monde’s strategy can be useful to you, too.
Consider adding a self-hosted analytics solution to your stack
Notice I’m not suggesting you remove your current provider if you’re using a SaaS solution like Google Analytics (my previous newsletter explains why I believe our current legal pickle will improve). But I am suggesting you add a brick to your stack so you can collect additional analytics — particularly from folks who do not accept your cookies. It also lets you divide and conquer between what you do with your primary analytics and what you do with your secondary set. (A sketch of the wiring follows the list below.)
This is a strategy that does two things:
It gives you a way to pick up basic analytics for the totality of your visitors (the cookie-havings and the not-cookie-havings).
It gives you a kind of “always there” data set. You don’t need to configure your self-hosted analytics with every kind of event you may already be tracking with your primary analytics suite (in fact, this has downsides, as we will see later).
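Here is what that added brick can look like in practice. I’m using Matomo purely as a familiar example of a self-hosted tool (this is not a vendor endorsement), and the host and site ID are placeholders for your own instance; the calls themselves are standard Matomo JavaScript tracker API.

```ts
// One way to bolt on the secondary, privacy-first brick: Matomo in
// cookieless mode, loaded for everyone, pre-consent.
const _paq: unknown[][] = ((window as any)._paq = (window as any)._paq || []);

// Cookieless mode: the pre-consent tracker can't fingerprint anyone.
_paq.push(["disableCookies"]);
_paq.push(["setTrackerUrl", "https://analytics.example.com/matomo.php"]); // your instance
_paq.push(["setSiteId", "1"]); // placeholder site ID
// Only the bare minimum: pageviews, not the full event catalogue.
_paq.push(["trackPageView"]);

const tracker = document.createElement("script");
tracker.src = "https://analytics.example.com/matomo.js"; // self-hosted script
tracker.async = true;
document.head.appendChild(tracker);
// The full-featured primary suite (e.g., Google Analytics) stays behind
// the CMP and only loads after the visitor accepts.
```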
I’m anonymising many publishers here because the spotlight is a bit harsh as you will see. But let’s say a hypothetical publisher (not you, of course) doesn’t have a CMP where they should have one, is dropping cookies alongside their consent banner, or is dropping their big analytics cookie before the cookie banner.
“Legitimate interest per GDPR,” says the hypothetical publisher.
At a high level, legitimate interest is meant to cover things like audience measurement for a publisher. But solutions like Google Analytics straddle a wide range of needs that go well beyond audience measurement. And jurisprudence (from Germany in 2019) has ruled that GA needed to be part of a tracked consent strategy.
In addition, you would have to give up a lot of GA’s features to pass it off under the banner of legitimate interest.
Sidebar: You got cheated, because I was aiming to give you something far more interesting than this general “give up lots of features.” Instead, I will share what I tried to do and didn’t succeed at, because it’s actually illustrative of the issue at hand. I originally went down a research journey to build a handy little table of what in GA would be “pure audience measurement” versus “marketing-y and requiring consent.” I read so much conflicting information that, dear reader, I’ll turn you back to your legal department instead. It’s a mess. But that’s the thing: The mess is why you don’t chance it. Put that GA cookie behind your CMP.
Enter your self-hosted analytics tool, with a privacy-by-design bent. This one is configured to collect only basic analytics, the kind that would pass legitimate interest (as in, it inherently wouldn’t allow you to fingerprint a user in any way). This cookie can sit very high up, pre-consent.
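On the “behind your CMP” part: if your primary suite is Google Analytics, Google’s Consent Mode is one real mechanism for this. A minimal sketch follows; the measurement ID and the CMP hook are placeholders, and whether Consent Mode’s denied-state behaviour satisfies your legal team is a question for, well, your legal team.

```ts
// Assumes the standard gtag.js <script> tag is already on the page.
(window as any).dataLayer = (window as any).dataLayer || [];
function gtag(..._args: unknown[]): void {
  // gtag.js expects the Arguments object itself on the dataLayer.
  (window as any).dataLayer.push(arguments);
}

// Default state, set before gtag.js runs: no analytics cookies at all.
gtag("consent", "default", { analytics_storage: "denied" });
gtag("js", new Date());
gtag("config", "G-XXXXXXX"); // placeholder measurement ID

// Called by your CMP when the visitor accepts:
function onCmpAccept(): void {
  gtag("consent", "update", { analytics_storage: "granted" });
}
// Note: even in the denied state, gtag may send cookieless pings.
// Whether that passes muster is, again, one for your legal department.
```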
Why you should consider a secondary analytics cookie
Upsides from a data building perspective
Besides giving you a good “whole audience” basic cookie and allowing you to keep your main analytics solution as full-featured as you want — but respectfully behind your CMP, naturally, and taking in any current caveats introduced by the lack of a Privacy Shield successor — you also get these other benefits from this approach:
It allows you to build up the minimum year of doubled-up data you would want to have if you ever had to switch analytics tools. Basically, whether you think you may switch in a year, two years, or five years, you’re beginning to build that doubled-up data. I wouldn’t suggest you go about creating double data for no reason (there are costs, as we will see). But since you are adding the privacy-first audience cookie anyway, consider that you get a data set you may also lean on should your strategy really change in the future.
Along the same lines, this secondary audience cookie gives you a path to a kind of slow replatforming — if that is a decision you want to explore. As you overhaul parts of your property for reasons unrelated to your analytics strategy — and have to retag these pages for analytics as you do so — that is when you may want to onboard double event reporting across your primary and secondary self-hosted analytics (a sketch of such a dual-write wrapper follows this list). It’s replatforming — but from a roadmapping perspective, it’s part of your other projects. It may take a few years to reach any kind of critical mass of useful analytics from the second self-hosted cookie, but you will have incurred minimal build cost for it.
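What does “double event reporting” look like when you retag a page? A hedged sketch, assuming a GA-style primary and a Matomo-style secondary; every name here is hypothetical except the two tracker APIs themselves.

```ts
// A dual-write wrapper: one call fans out to both trackers, so retagged
// pages feed both data sets from day one.
declare const gtag: (...args: unknown[]) => void; // primary (e.g., GA)
declare const _paq: unknown[][];                  // secondary (e.g., Matomo)

function trackEvent(category: string, action: string, name?: string): void {
  // Primary suite: sits behind the CMP, may be a no-op pre-consent.
  gtag("event", action, { event_category: category, event_label: name });
  // Secondary, self-hosted suite: cookieless, always on.
  _paq.push(["trackEvent", category, action, name]);
}

// Usage on a retagged page:
trackEvent("newsletter", "signup-click", "footer-module");
```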
Upsides from an organisation perspective
Your data science, marketing, and product teams get to learn how to use your new self-hosted platform — and what can be done with it — at a far more reasonable pace than if you were setting out a big replatforming project. If there are areas where your main platform and your new platform don’t have the same affordances, you’ll discover this organically. But it won’t be a five-alarm fire since you’re not leaning on this data at this point.
You also have some peace of mind that you can pull the fire alarm on your main analytics tool should some new legal challenge emerge. Your secondary cookie data wouldn’t be on par with your main analytics platform, but it can tide you over and give you a little bit of flexibility if a particular country mounts a legal challenge you couldn’t contend with quickly enough. Let’s hope things don’t come to that, though.
Downsides
Obviously, if this approach was only upsides, this newsletter would just be a paragraph long. So here are some downsides:
Loading another analytics tracker into your property would affect page speed.
The cost of running this additional tool.
On the first, analytics tools aren’t usually the most offending scripts you load on a page. That title reliably gets awarded to the third-party scripts of your advertising — I will never not take that shot. Sure, the whole concept of optimising your site for speed is to take a harsh look at anything that’s not truly delivering a lot of value and take it out. But the weight creep of trackers is more often a story of incremental lenience than of true mismanagement. Except bad ads. Have I mentioned the ads?
But this is why the secondary audience cookie shouldn’t try to be as full featured as your primary audience cookie. Don’t track every event. Take just what you need for that back-up. Over time, and as you expand the double tagging, the loading costs of your secondary analytics will increase. And it will be worth watching how these incremental events are affecting the script weight.
In the scenario where the secondary audience cookie eventually subsumes your primary platform, things change, of course. Then you’d be increasing the load of the secondary tracker, but with the goal of eventually getting rid of the first. In other words, if you’re headed to “perfect double-tagging” territory, you’ll take on more weight temporarily. But you may actually soon be in a single-tag universe again.
There’s no getting around the second downside. If you add a data pipeline on infrastructure you own, you’re going to pay for it. Consider that it’s probably quite a bit less than paying for unsampled analytics from a vendor-hosted solution, but still.
Here, the path to a realistic assessment of the operation is to drop this cookie on a sample of your pages for a period of time, with super-minimal work to calibrate and check the quality of the data. Basically, run the secondary cookie programme just to get a sense of what your bill could be.
There are ways to run the numbers from a hypothetical perspective, but running an experiment may be even faster and will be more accurate.
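To make that concrete, here is one way to wire up such a trial. The sample rate is an arbitrary example, and loadSecondaryTracker() is a hypothetical stand-in for whatever snippet loads your secondary tool.

```ts
// Enable the secondary tracker on a random slice of pageviews and
// project the bill from there.
declare function loadSecondaryTracker(): void; // hypothetical loader

const SAMPLE_RATE = 0.05; // arbitrary 5% slice for the trial

if (Math.random() < SAMPLE_RATE) {
  // A per-pageview coin flip: no ID is stored, so the sampling itself
  // stays consent-free.
  loadSecondaryTracker();
}
// If the sampled slice generates N events a day, a full rollout lands
// around N / SAMPLE_RATE events a day: size your infrastructure to that.
```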
Further afield on the wide, wide Web
For a change of pace and something light and refreshing, how about a bit of climate-crisis light reading? MIT Technology Review reports on a recent paper that looked at effective ways deep learning could be made to consume significantly less energy by carrying out some optimisation steps in some of the biggest commercial clouds in use today.
Did I say I was tapped out on cookies and analytics? Haha, that was a lie. This is a bottomless pit of doom. America is percolating a federal GDPR. An update about Congress’s work on a federal data privacy bill, from The Hill.
On deck for this summer
I am d.o.n.e. with cookies for the time being. Up next, AI-related projects for a bit of refreshing innovation: Natural language processing (Should we translate all the articles?) and Responsible AI, data science ethics. Write to me if you’d like to chat about these topics as I dig into them further.
Further up ahead: Jodie Hopperton, lead for the Product Initiative at INMA, and I have started working on programming our excellent Product & Data Summit in November. We’re digging into potential speakers, great tales of transformation and innovation. My inbox is open at ariane.bernard@inma.org if you have great ideas we should discuss!
About this newsletter
Today’s newsletter is written by Ariane Bernard, a Paris- and New York-based consultant who focuses on publishing utilities and data products, and is the CEO of a young incubated company, Helio.cloud.
This newsletter is part of the INMA Smart Data Initiative. You can e-mail me at Ariane.Bernard@inma.org with thoughts, suggestions, and questions. Also, sign up to our Slack channel.