The New York Times is using various forms of experimentation to optimise product features and marketing, and to improve content recommendations.

Experimentation in the form of A/B, multivariate, and bandit testing can provide the most actionable data to support business decisions, especially on digital platforms where we can vary the user experience systematically; that is, assign visitors to a randomly selected experience and track their behaviour after exposure to the experiment.

But what should we do in situations where the experience cannot be systematically varied, or where the cost of doing so is significant?

One such situation arises when testing new audience development tactics on search and social platforms, where we are unlikely to have the same fine-grained control over the experience that we have on our own platforms.

In these experiments, we are often measuring how effectively pilot initiatives drive new visitors to The Times digital properties (including native app downloads). We also measure their impact on re-engaging existing Times readers and deepening their engagement on our news platforms.

As these pilot initiatives do not allow for randomised experiments, we’ve opted to use pre-/post-analysis methods, comparing performance to a matched control.

Here’s how it works:

Using a period of time prior to the experiment, we identify a matched control that is highly correlated with the test audience. In experiments on search or social platforms, we’ve found traffic patterns from internal and external referrals that correlate strongly with the audience of interest and can serve as the control.
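As a minimal sketch of this selection step, the Python below scores candidate control series against the test audience over a pre-period and picks the most correlated one. The use of Pearson correlation, the function names, the referral labels, and the traffic numbers are all assumptions made for illustration; they are not taken from The Times's actual analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("series must not be constant")
    return cov / sqrt(var_x * var_y)


def best_matched_control(test_pre, candidates_pre):
    """Return (name, correlation) of the candidate most correlated with the test series."""
    scores = {name: pearson(test_pre, series) for name, series in candidates_pre.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical daily visits over a 7-day pre-period (illustrative numbers only).
test_pre = [100, 120, 90, 150, 130, 110, 140]
candidates_pre = {
    "internal_referrals": [200, 240, 180, 300, 260, 220, 280],  # moves with the test series
    "external_referrals": [50, 40, 60, 45, 55, 50, 42],         # moves against it
}

name, r = best_matched_control(test_pre, candidates_pre)
print(f"Best matched control: {name} (r = {r:.2f})")
```

In practice the pre-period would be considerably longer than a week, and it is worth checking that the chosen control is not itself affected by the initiative being tested.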

Depending on the nature of the experiment, matched controls may be defined by geographic regions (e.g., one or more correlated cities or DMAs), user demographics, or the prior behaviours of the population.

The matched control allows us to separate the impact of the news cycle from the impact of the test initiative, and to ascertain the incremental benefit of the test more reliably than a straightforward pre-/post-comparison would.
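To make the difference concrete, here is a small Python sketch that contrasts a naive pre-/post-lift with one adjusted by a matched control. It assumes a simple ratio-scaling counterfactual (the test audience would have kept its pre-period share of control traffic); this is one of several possible estimators and not necessarily the one used at The Times. All names and numbers are hypothetical.

```python
def naive_lift(test_pre, test_post):
    """Pre/post lift that ignores the control: change in average daily traffic."""
    pre_avg = sum(test_pre) / len(test_pre)
    post_avg = sum(test_post) / len(test_post)
    return post_avg / pre_avg - 1


def controlled_lift(test_pre, test_post, control_pre, control_post):
    """Lift relative to a counterfactual scaled from the matched control.

    Assumes that, without the initiative, test traffic would have kept the
    same ratio to control traffic that it had in the pre-period.
    """
    ratio = sum(test_pre) / sum(control_pre)
    expected_post = ratio * sum(control_post)
    return sum(test_post) / expected_post - 1


# Hypothetical daily visits; a busy news cycle lifts the control by 50% in the post-period.
test_pre = [100, 110, 90, 100]
control_pre = [200, 220, 180, 200]
test_post = [180, 180, 180, 180]
control_post = [300, 300, 300, 300]

naive = naive_lift(test_pre, test_post)
adjusted = controlled_lift(test_pre, test_post, control_pre, control_post)
print(f"Naive pre/post lift: {naive:.0%}")      # about 80%
print(f"Control-adjusted lift: {adjusted:.0%}")  # about 20%
```

In this made-up example most of the apparent gain comes from the news cycle, which the control also experienced; only the remainder is attributed to the initiative.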

The chart below highlights a recent example, where we used this approach to estimate triple-digit lifts from a search initiative tested at The Times.

While this approach is by no means a replacement for randomised experiments, which are generally the preferred and more reliable way to measure the effect of changes to product and marketing, there are some advantages to a pre-/post-analysis approach, including:

  • Ease of setup, as it avoids the additional implementation cost and barriers that are often associated with randomisation.

  • Longitudinal analysis, as this approach can be appropriate for measurement of changes to long-term trends, while randomised experiments often measure only short-term effects.