How supervised learning of recommendation algorithms adds a human touch
Smart Data Initiative Blog | 14 March 2022
My recent deep dive into personalisation and discussion of unsupervised learning leads us to supervised learning. This approach works well for the areas of a publisher site where personalisation is an option, provided the recommendation algorithms behind it are trained on data that reflects how a human would judge each piece.
This additional structured data (tags, scores) opens up the option of so-called supervised learning. The recommendation algorithm can now train on an enriched data set that specifically includes human judgment of each article. The result will feel much closer to the "perfect outcome" of personalisation: the impression that humans actually produced every single personal feed for each individual user.
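As a concrete sketch, here is what training on such an enriched data set might look like, assuming scikit-learn and a toy set of editor-labelled headlines. The headlines, the label names, and the idea of predicting an editorial quality label from text are all illustrative assumptions, not any publisher's real taxonomy or pipeline:

```python
# A minimal sketch of supervised learning on an editorially enriched
# data set. Headlines and editor-assigned quality labels below are
# invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each article carries a human judgment: an editor-assigned quality label.
articles = [
    ("exclusive investigation into city budget", "super-important"),
    ("ten cosy cafes to visit this weekend", "cool-stuff"),
    ("minor traffic update for the ring road", "less-essential"),
    ("election results analysed in depth", "super-important"),
    ("quirky photo essay from the food festival", "cool-stuff"),
    ("weekly weather round-up", "less-essential"),
]
texts = [t for t, _ in articles]
labels = [l for _, l in articles]

# Train a classifier that learns the editors' notion of quality,
# so new articles can be labelled automatically at scale.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

print(model.predict(["in-depth report on the hospital funding crisis"]))
```

The point is not this particular model but the shape of the training data: every row pairs article features with a human judgment, which is what makes the learning "supervised."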
Human-like recommendations aren't as far-fetched as they may sound: The New York Times has a project in the works where editors are in the loop of the recommender system to better inform and corral it.
An Indian publisher recently told me about their current large effort to cohortise users, in part to support their personalisation effort. To be clear, this is hugely important because it's a dimension of personalisation that can support efforts well beyond content recommendation. Content tagging will, obviously, readily be useful for content recommendation and, to a lesser extent, for advertising on the page. But it cannot readily inform customer journeys, whereas audience cohorts can.
Audience cohorts also give you trends, and there's an algorithm you have no doubt encountered that's based on capturing these audience trends: Netflix's recommendation engine (this excellent talk from 2018 explains it). But audience cohorts are orthogonal to article quality, the thing the newsroom worries about when it hears a recommendation engine is moving in where manual curation used to live. And it will not have escaped you that, crucially, Netflix's recommendations make no attempt to rank content on your screen by editorial quality. At Netflix, the worst and best movies compete on an equal footing: what matters is the likelihood you will want to watch them.
For personalisation to be informed by a human-like understanding of quality, the algorithms used to produce these recommendations must be supervised on training data that includes quality as one of the training factors. If there are three bins of articles for the site, say "super important articles," "cool stuff," and "less essential," you can write rules stating that certain areas of the site will use a personalisation rule that only dips from the "super important articles."
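The rule layer described above can be sketched in a few lines. The bin names and the mapping of site areas to allowed bins are illustrative assumptions, not a real publisher's configuration:

```python
# A sketch of rule-based personalisation over quality bins.
# Site areas and bin names below are invented for illustration.
BIN_RULES = {
    "homepage_top": {"super-important"},                   # only the essential bin
    "mid_page_widget": {"super-important", "cool-stuff"},
    "footer_widget": {"cool-stuff", "less-essential"},
}

def candidates_for(area, articles):
    """Return only the articles whose quality bin this site area may use."""
    allowed = BIN_RULES[area]
    return [a for a in articles if a["bin"] in allowed]

articles = [
    {"id": 1, "bin": "super-important"},
    {"id": 2, "bin": "cool-stuff"},
    {"id": 3, "bin": "less-essential"},
]
print(candidates_for("homepage_top", articles))  # → [{'id': 1, 'bin': 'super-important'}]
```

Whatever ranking algorithm then runs inside each area, it can only ever surface articles the newsroom's quality rules allow there.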
Within these binned sets of content, you don't necessarily have to write rule-based personalisation. The same sort of unsupervised learning you may use to power a widget far down the page can be used. NPR took a version of this approach with its NPR One curated story app a few years ago, and the way human editors tipped the scales was quite straightforward: an "editorially conscious algorithm," as it was called then.
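To make the combination concrete, here is a toy sketch of unsupervised, similarity-based recommendation running inside one quality bin. The word-overlap (Jaccard) similarity and the headlines are simplifying assumptions; a real system would more likely use embeddings or clustering, but the structure is the same — editors control the bin, the algorithm ranks within it:

```python
# A sketch of unsupervised ranking restricted to one quality bin.
# The similarity measure (word overlap) and headlines are toy assumptions.
def jaccard(a, b):
    """Word-overlap similarity between two headlines."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

bin_articles = [  # already filtered to the "super important" bin by editors
    "city budget investigation reveals shortfall",
    "hospital funding crisis deepens",
    "election results analysed in depth",
]

def recommend(last_read, candidates):
    # Rank the bin's articles by similarity to what the user just read.
    return max(candidates, key=lambda c: jaccard(last_read, c))

print(recommend("new report on the city budget shortfall", bin_articles))
# → city budget investigation reveals shortfall
```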
In 2020, CNN bought Canopy, a company working on a personalisation tool that leverages human inputs, to integrate into its wider product offering.
It is that approach, where the newsroom has essentially infused its reading of quality into the set of articles the recommender works with, that will enable personalisation outcomes to align with the newsroom's appreciation of quality.
If you’d like to subscribe to my bi-weekly newsletter, INMA members can do so here.