2 analytics methodologies help shape media content

By Sebo Banerjee

HT Media

New Delhi Area, India


If you are in the editorial content business, particularly with a newspaper on the way to transforming itself into a successful digital publication, you may have faced the dilemma of choosing which stories to focus on.

You have the trending stories on juicy topics or celebs, often occupying the upper berths of the top-25 story list you look at every morning. On the other hand, you have the great political scoops, human interest stories, analyses, points-of-view pieces, and crime reports that often don't get as many readers as you would expect them to, unless they are related to a major event that touched many lives.

The first type doesn't require great skills or much investment. The second type, of course, is what good news media organisations stand for. These stories have made people buy your product for years on end.

But now, you tend to believe digital media readers don’t like these stories as much. An incisive analysis of your city’s growing level of pollution may never match the performance of a photo exposé of a top diva’s wardrobe malfunction.

But does this mean the second type of story needs more focus, if your brand stands for good journalism? Journalists have much more interesting things to expose, don't they?

Though most digital news publications still depend on pageviews to generate revenue, the journey to move behind paywalls has gained major momentum of late. Sooner or later, most publications will move more of their content behind paywalls. And that's when the second type of stories, the gritty ones that took considerable research and effort to produce, will be looked upon as the saviors again.

It will be more about getting readers who value good stories and retaining them. 

In most cases, such readers spend more time on pages, click on more "recommended links," and are mostly loyal (returning often since their first recorded session) to one or more publications. To get better and better at keeping them engaged, the first thing you need to do is tweak that list of top-25 stories and examine the performance of good journalistic stories on their own.

Method 1: This is the easier and more basic technique, one you may already be using, but let's revisit it. If Google Analytics is the tool you use:

  1. Create a custom report drawing up a list of top-100 stories sorted in descending order by pageviews (PVs) for any date range.
  2. Add the average time on page (ATOP) metric to that report as well, so each story shows the total pageviews it generated and a macro-level average of the time users have spent on it.
  3. Calculate the median for the ATOP range.
  4. Use the advanced filter to extract stories above the median ATOP.
  5. Next, calculate the median for the PV range, and filter for stories above that median using the advanced filter.

This will give you a list of stories that attracted more readers (compared to the rest of the stories you published) and were read and most probably engaged with too.
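
If you would rather do this filtering outside the Google Analytics interface, the same logic is easy to script. Here is a minimal sketch in Python with pandas; it assumes you have exported the top-100 custom report as a CSV, and the file name and column names are placeholders rather than anything Google Analytics produces by default.

```python
# Minimal sketch of Method 1's median filtering, run on an exported copy of the
# top-100 custom report. The file name and column names are assumptions.
import pandas as pd

report = pd.read_csv("top_100_stories.csv")  # columns: page_title, pageviews, avg_time_on_page

median_atop = report["avg_time_on_page"].median()  # step 3
median_pv = report["pageviews"].median()           # step 5

# Steps 4 and 5: keep only the stories above both medians.
engaged = report[
    (report["avg_time_on_page"] > median_atop)
    & (report["pageviews"] > median_pv)
].sort_values("pageviews", ascending=False)

print(engaged[["page_title", "pageviews", "avg_time_on_page"]])
```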

This chart shows advanced settings in Google Analytics where filters should be set.

However, in Google Analytics, you still can’t set dynamic median values for live data, not even in “Calculated Metrics.” That limitation doesn’t let you fully automate the custom report, which is a problem.

For quicker action on data, important reports should be both templated and automated. Not doing that can cost you dearly. If you have started using Google Data Studio, it offers a solution to the problem.

Method 2: This is a tad more complicated and requires knowledge of SQL queries and entry-level statistics.

Let me assume you are importing Google Analytics data into an online SQL server middleware (or you have a Google Analytics 360 subscription with Google BigQuery, but that's very costly), and your data visualization layer is connected to the middleware through automated SQL queries. Advanced users prefer NoSQL over SQL these days. However, for in-house report automation, SQL is still good enough and offers adequate flexibility in creating relationships among stored data sets:

  1. A SQL query can be written to pull stories above the median ATOP and median PVs for a time range of your choice. Be careful while writing queries for date ranges/timestamps on Google servers; only certain Unix formats work.
  2. If we now plot the stories on a normal distribution chart for PVs alone, they get distributed around three quartiles (Q1, median, Q3).
  3. The query used to pull data in step 1 above can be modified to exclude PV values below quartile 1 and extract the rest (a sketch follows below); Chart 2 explains the process.

(Technically speaking, this process extracts the interquartile range (quartile 1 to quartile 3) along with the outliers above quartile 3.)
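
For illustration, here is a rough sketch of that extraction in Python with pandas and SQLAlchemy rather than as a stored SQL procedure. The connection string, table name, and column names are assumptions about the middleware, and the Unix timestamps simply mark an example date range; the quartile maths is done in pandas after the pull so the SQL stays portable.

```python
# Sketch of Method 2: pull story metrics from the middleware, keep stories above
# the median ATOP and median PVs, then drop everything below Q1 of pageviews so
# that the IQR plus the upper outliers remain. All names here are assumptions.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@middleware-host/analytics")

query = """
    SELECT page_path, pageviews, avg_time_on_page
    FROM story_daily_metrics
    WHERE visit_start_time BETWEEN 1719792000 AND 1722470399  -- Unix timestamps
"""
stories = pd.read_sql(query, engine)

# Step 1: stories above median ATOP and median PVs.
stories = stories[
    (stories["avg_time_on_page"] > stories["avg_time_on_page"].median())
    & (stories["pageviews"] > stories["pageviews"].median())
]

# Steps 2-3: exclude pageview values below quartile 1; what remains is the IQR
# plus the outliers above quartile 3.
q1 = stories["pageviews"].quantile(0.25)
shortlist = stories[stories["pageviews"] >= q1].sort_values("pageviews", ascending=False)

print(shortlist.head(40))
```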

Chart 1: This chart lists an imaginary set of stories with corresponding pageviews. HT assumes the stories have already been filtered for above-median pageviews and "average time on page."

You've now got a list of about 30-40 stories. Except on days with major news breaks, more than 95% of the stories on this list will be the ones that attracted a large number of readers and were actually read. These are stories into which your newsroom has put serious effort.

This query can be automated by adding a trigger, and the results can be visualised.
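
If your middleware does not support scheduled events or triggers, the refresh can also be driven from the application side. Below is a small sketch using the APScheduler library; both the library choice and the refresh_shortlist() placeholder are assumptions, standing in for the query and filtering described above.

```python
# Sketch: re-run the Method 2 extraction every morning from the application side
# instead of a database trigger. APScheduler and refresh_shortlist() are assumptions.
from apscheduler.schedulers.blocking import BlockingScheduler

def refresh_shortlist():
    # Placeholder: run the SQL pull and quartile filtering shown earlier,
    # then write the shortlist to the table your dashboard reads from.
    pass

scheduler = BlockingScheduler()
scheduler.add_job(refresh_shortlist, "cron", hour=6)  # daily at 06:00
scheduler.start()
```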

Chart 2: Normal distribution of the pageview data range of Chart 1. Calculated Q1, median, and Q3 values are shown. Pageview bucket size: 1,000. Diagram shows stories that fall below Q1 are excluded from the final story list.

When you stretch the time range of the query to a fairly long period, and also pull information on author, publication time, Facebook reach, and Google search traffic for the stories that surfaced, you start to gain wonderful insights into their performance. You now need to ask two questions:

  1. What are the attributes that made the upper half of the stack so successful, and can they be recreated?
  2. Why didn’t the stories in the lower half perform at the same level?

And if you start acting on the correct answers to these two questions, you will definitely see a spike in your editorial content performance. That’s guaranteed.

