2 approaches to forecasting pageviews offer insight to advertising targets
Big Data For News Publishers | 27 December 2020
Many of us could not have predicted the uplift in pageviews that most news media brands saw during 2020 because of the COVID-19 pandemic. Predicting future pageview levels can be difficult. Still, good predictions are important if, for example, you want to set pageview targets or base the advertising budget on predicted pageviews per month or week.
Here are two ways to make forecasts of pageview levels.
The open source software Prophet, released in 2017 by Facebook’s data science team, is a great way to make advanced and scalable forecasts of pageviews. The forecasting procedure is implemented both in R and Python, so you can pick your favourite. Its strength lies in being fully automated while providing good opportunities for you as an analyst with domain knowledge to tweak and tune the result according to your best guess.
The input to Prophet is a CSV file with historical dates and the corresponding number of pageviews. I usually use at least two years of historical data, preferably three. Prophet will give newer data higher impact on the forecast, so don’t worry about old data having too much impact on the end result.
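As a rough illustration of that workflow, the sketch below parses a small CSV into the two columns Prophet expects (`ds` for the date, `y` for the value) and wraps the fit-and-predict calls in a function. The CSV column names `date` and `pageviews`, the helper names, and the sample numbers are assumptions made for the example. It is not the setup used for the forecasts described in this article.

```python
import csv
import io
from datetime import date


def load_pageviews(csv_text):
    """Parse 'date,pageviews' CSV text into Prophet-style rows (keys ds and y)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"ds": date.fromisoformat(row["date"]), "y": float(row["pageviews"])}
        for row in reader
    ]


def forecast_pageviews(rows, days_ahead=365):
    """Fit Prophet on the historical rows and predict days_ahead days forward."""
    # Imported lazily so the parsing helper works without these packages installed.
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame(rows)
    history["ds"] = pd.to_datetime(history["ds"])
    model = Prophet()
    model.fit(history)
    future = model.make_future_dataframe(periods=days_ahead)
    forecast = model.predict(future)
    # yhat is the point forecast; yhat_lower and yhat_upper bound the uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Made-up sample data, only to show the expected shape of the input.
sample = "date,pageviews\n2020-01-01,120000\n2020-01-02,135500\n"
rows = load_pageviews(sample)
```

With two or three years of daily rows loaded the same way, `forecast_pageviews(rows)` would return one row per day, including the forecast period.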
Prophet is good at finding and applying trends and seasonality, handling outliers and missing data, and taking holiday effects into account. There are some manual adjustments I usually make to fit the procedure to pageview forecasting, though. Holidays are one of those. Some holidays, like the Swedish Easter and Midsummer celebrations, fall on different dates from year to year, so I need to help the model pinpoint them, since I expect these events to have some kind of impact on pageviews.
Prophet has built-in holidays for some countries, such as the United States, India, Indonesia, and Korea, but Sweden is not one of them. Special events that you might want to highlight for the model could be elections, the Olympic Games, or other large happenings that will probably affect pageviews positively or negatively.
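One way to pinpoint moving holidays is to compute their dates and pass them to Prophet as a holidays table with the columns `holiday`, `ds`, `lower_window`, and `upper_window`. The sketch below derives Easter with the well-known anonymous Gregorian algorithm and Midsummer's Eve as the Friday between 19 and 25 June. The function names and the window lengths are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def easter_sunday(year):
    """Gregorian Easter Sunday via the anonymous (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def midsummer_eve(year):
    """Swedish Midsummer's Eve: the Friday that falls between 19 and 25 June."""
    start = date(year, 6, 19)
    # weekday(): Monday is 0, so Friday is 4.
    return start + timedelta(days=(4 - start.weekday()) % 7)


def swedish_holiday_rows(years):
    """Build rows in the shape of Prophet's holidays table."""
    rows = []
    for year in years:
        good_friday = easter_sunday(year) - timedelta(days=2)
        # upper_window=3 stretches the effect from Good Friday to Easter Monday.
        rows.append({"holiday": "easter", "ds": good_friday,
                     "lower_window": 0, "upper_window": 3})
        # upper_window=1 also covers Midsummer Day, the Saturday after.
        rows.append({"holiday": "midsummer", "ds": midsummer_eve(year),
                     "lower_window": 0, "upper_window": 1})
    return rows


holiday_rows = swedish_holiday_rows(range(2018, 2022))
```

Turned into a pandas DataFrame, these rows can be handed to the model as `Prophet(holidays=...)`. Elections and other one-off events can be appended as extra rows in the same shape, and the table should include the dates that fall inside the forecast period as well as the historical ones.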
Several brands within Bonnier Corp. have been using Prophet for pageview forecasts during the last couple of years with great success. For example, the Dagens Nyheter pageview forecast for 2019 turned out to be just 0.9% away from the actual outcome. The 2020 forecast was also 0.9% off the actual outcome, up until COVID-19 hit us.
The procedure can, of course, also be used if you want to make forecasts of other metrics. I’ve used it to forecast the number of phone calls to our customer service centre (the more calls, the more staff needed and the higher costs), but it could also be used to forecast how many newspapers to print and distribute.
If you’re not comfortable enough to work in Python or R, you can find other ways to forecast pageviews. One less sophisticated but valid approach could be to do your forecast in Excel.
Start with historical data, preferably at least the last two or three years of pageviews. Outliers can be detected and handled, for example, by first calculating the quartiles and the interquartile range (IQR) and then deriving lower and upper bounds from them. Excel has a built-in function called FORECAST.ETS(), which can take seasonality into account. The function supports up to 30% missing data and can automatically detect seasonality.
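For readers who want to check the outlier step outside Excel, here is a minimal Python sketch of the IQR rule. The multiplier of 1.5 is the common convention rather than something specified in this article, and capping outliers at the bounds (instead of removing them) is just one possible way to handle them. The sample numbers are made up.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) bounds at k interquartile ranges outside Q1 and Q3."""
    # The inclusive method treats the smallest and largest values as the
    # 0th and 100th percentiles and interpolates linearly in between.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Cap values outside the IQR bounds at the nearest bound."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Illustrative daily pageviews (in thousands) with one obvious spike.
daily_pageviews = [100, 110, 120, 130, 500]
lower, upper = iqr_bounds(daily_pageviews)
cleaned = clip_outliers(daily_pageviews)
```

In this sample the quartiles are 110 and 130, so the bounds land at 80 and 160 and the spike of 500 is capped at 160.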
When the first draft of the forecast has been created by the FORECAST.ETS() function, there are still some manual adjustments you might want to make. Just as with Prophet, you need to adjust the forecast for planned events. For example, to take an election (or a similar event with a large impact on your pageviews) into account, you can calculate the percentage uplift in pageviews during the last election and apply that uplift to your forecast.
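The uplift adjustment is simple arithmetic. The sketch below shows it in Python under the assumption that you have the actual pageviews for the last election period and a baseline for what a comparable normal period looked like. All figures and function names are invented for illustration.

```python
def event_uplift(event_actual, event_baseline):
    """Relative uplift during a past event; 0.25 means 25% above baseline."""
    return event_actual / event_baseline - 1


def apply_uplift(forecast, uplift):
    """Scale a baseline forecast by the uplift observed during the past event."""
    return forecast * (1 + uplift)


# Hypothetical numbers: the last election week had 1.5M pageviews,
# against a baseline of 1.2M for comparable weeks.
last_election_uplift = event_uplift(event_actual=1_500_000, event_baseline=1_200_000)

# Apply the same uplift to the draft forecast for the upcoming election week.
adjusted_forecast = apply_uplift(forecast=1_300_000, uplift=last_election_uplift)
```

Here the past uplift works out to 25%, which lifts the draft forecast of 1.3 million pageviews to 1.625 million for the election week.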
Forecasting is iterative work that requires good domain knowledge and sound methods. I encourage you to start experimenting, dig into your trends and seasonality, and try to understand them. You can learn a lot from it, and after a while you’ll be able to make some on-point predictions.