SPH’s in-house brand safety classification AI helps expand ad inventory
Ideas Blog | 29 November 2021
In the world of programmatic advertising, impressions are sold and ads are delivered by algorithms within milliseconds. So how can publishers help advertisers ensure their digital ads are delivered in a brand-safe environment?
Right now, advertisers rely on topic keywords provided by publishers to ensure their advertisements are placed alongside content that is appropriate for their brand. Singapore Press Holdings, like most publishers, uses a variety of APIs to ensure these keywords are generated accurately and made accessible to our ad tech platforms.
For brand safety protection, brands can rely on SPH’s solution powered by machine learning models developed and maintained by our data science team and business stakeholders.
Making our site safer for brands
Our brand safety API generates two crucial pieces of metadata for each article:
- An overall brand safety score.
- The brand safety topics associated with the article.
SPH Data Scientist Lu Yongning explained the choice of metadata: “By decoupling the brand safety score with brand safety topics, we enable the best of both worlds. We can have the flexibility to fine-tune our standards for brand safety while at the same time provide more keywords for audience segmentation via targeting and exclusion.”
“The brand safety score also serves as a guardrail. No matter what keywords appear, as long as the brand safety AI determines an article is not brand-safe, it will indicate as such.”
Right now, the brand safety score is checked against an internal threshold that the SPH team sets for its articles. The customer experience will remain unchanged, as advertisers will still target ads by topic keywords. In the future, we may allow thresholds to be set dynamically and to vary across our customers. That would give customers with different risk appetites finer control over brand safety.
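As a rough illustration of how the score and the topics could work together, the sketch below models an article's metadata and a simple eligibility check. Everything in it is an assumption made for illustration: the names `ArticleMetadata` and `is_eligible`, the field names, the 0-to-1 scale on which a higher score means safer, and the threshold values. It is not SPH's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArticleMetadata:
    """Hypothetical shape of the per-article brand safety metadata."""
    brand_safety_score: float  # assumed scale: 0.0 (unsafe) to 1.0 (safe)
    brand_safety_topics: frozenset = field(default_factory=frozenset)


def is_eligible(article, excluded_topics, threshold=0.5):
    """Return True if an ad may be placed alongside the article."""
    # Guardrail: a score below the threshold blocks the article,
    # no matter which keywords are attached to it.
    if article.brand_safety_score < threshold:
        return False
    # Topic exclusion: block if any excluded topic is tagged on the article.
    return article.brand_safety_topics.isdisjoint(excluded_topics)


article = ArticleMetadata(0.8, frozenset({"crime"}))
print(is_eligible(article, excluded_topics={"gambling"}))          # True
print(is_eligible(article, excluded_topics={"crime"}))             # False
print(is_eligible(article, excluded_topics=set(), threshold=0.9))  # False
```

Passing a different `threshold` per customer is one way the per-customer thresholds described above might look: a more risk-averse advertiser would simply use a higher value.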
Lower latency, greater inventory
By hosting the Brand Safety Classification API in-house, we have reduced the average latency of our API calls, which led to an increase in ad inventory. In our post-deployment test, we found the in-house API was able to identify 14% more inventory than our vendor’s API. An added benefit of having lower latency is that it enables faster page loads, which in turn leads to more responsive sites and consequently improves the overall user experience.
Accuracy is an important consideration in brand safety classification. To minimise risks to our advertisers' brands, it is imperative to err on the side of caution: false negatives, meaning unsafe articles that are classified as brand-safe, need to be minimised. As with any classification task, a precision-recall trade-off has to be made when setting the AI model's threshold. High recall means fewer false negatives, while high precision means fewer false positives. In this case, we tuned our model to achieve high recall.
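The trade-off can be made concrete with a small sketch. It treats "not brand-safe" as the positive class and picks the highest threshold that still meets a recall target. The labels, the model outputs, and the recall target are made-up illustration values, and `precision_recall` and `pick_threshold` are hypothetical helpers rather than the team's actual tuning procedure.

```python
def precision_recall(labels, probs, threshold):
    """Precision and recall when articles with prob >= threshold are flagged unsafe."""
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pick_threshold(labels, probs, min_recall):
    """Highest candidate threshold whose recall still meets min_recall."""
    for t in sorted(set(probs), reverse=True):
        _, recall = precision_recall(labels, probs, t)
        if recall >= min_recall:
            return t
    raise ValueError("no threshold reaches the requested recall")


# Toy data (assumed): 1 = not brand-safe, 0 = brand-safe.
# unsafe_probs are hypothetical model probabilities that an article is unsafe.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
unsafe_probs = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1, 0.7, 0.3]

t = pick_threshold(labels, unsafe_probs, min_recall=1.0)
print(t, precision_recall(labels, unsafe_probs, t))  # 0.4 (0.8, 1.0)
```

On this toy data, demanding that every unsafe article is caught pushes the threshold down to 0.4, at which point one safe article is also flagged. That lost article is the inventory cost of favouring recall.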
To measure this empirically, we benchmarked our AI model against our vendor's API using human-labelled test data. The results show that our model was overly strict (high recall), whereas the vendor's API was overly lax in brand safety classification.
Being overly strict with our model does come at a cost: it reduces our total brand-safe inventory. However, with the added inventory brought by lower latency, the in-house solution was able to provide a net 5% increase in ad inventory over the existing solution.
Improved flexibility
The solution also gives us the flexibility to refine our content taxonomy to better serve the needs of our customers. Standard industry keywords can be complemented with ad hoc ones that better capture emerging topics. The solution will also enable us to create bespoke topics for clients when the need arises.
Embracing the “Day One” attitude has helped the SPH team overcome obstacles along the path of innovation to provide the best solution for our customers. Right now, we are working to extend our solution to include an automated feedback loop between AI-assisted topic discovery, human validation, and machine learning.