Some old debates die hard; the arduous types live longer.

One such reasonably old and sticky debate is whether the volume of stories a digital publication produces has a bearing on the pageviews it generates.

Let me qualify that. The topic may sound dated if you are from a publication with headquarters on Eighth Avenue in Midtown Manhattan, in the Mitte area of Berlin, or the like. The business metrics measuring editorial performance in those places have changed for the better.

For scores of others, pageviews are still a sacred battle to die for, owing mostly to the impression-based ad revenue generation culture.

This is not going to go away soon, as we expect no imminent change in how a lot of digital ad agencies deal with their news media clients. And so it trickles down: from ad impressions to pageviews to total volume of stories, there is a common belief that an increase in story count augments the “long-tail” traffic volume. That is, traffic from stories that are old toppers or “also ran” types.

And make no mistake, long-tail stories drive up to 65% of the traffic to many free news Web sites, mostly through search, social, and messaging distribution channels.

But is publishing more stories all you need to do to grow the larger part of your Web site traffic? Doesn’t this make the digital avatar of the haloed “fourth estate” look a shade paler? Is it that mechanical?

Wouldn’t it have a negative impact on newsroom ingenuity? Wouldn’t that give rise to a wave of uncritical publication styles feeding on raw agency copy, stubs, and junk? Wouldn’t that give voice to some brave hearts championing the cause of automated agency copy publication systems? Wouldn’t that ... well, you get the idea.

I have heard equally potent opinions both in favour of and against the motion. A practical approach would be to find ways to improve the quality of the long-tail and see how Big Data analytics can help.

The process would ideally start with capturing the publication time stamp for each story. This can be easily done in Google Analytics using either custom dimensions or data upload. Other analytics tools have similar processes.

Once you start getting the data, pull a single day’s stories with “pageviews,” “average time on page (ATOP),” and “average scroll depth” metrics sorted by the first.

Here’s an optional step: Depending on how much preference a publication accords to page engagements, one can create a weighted product of pageviews and ATOP or scroll depth for each story. That’s a better way to judge performances. However, to keep things less intimidating here, I’d just use the pageviews.
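For those who do want to try the optional step, here is a minimal sketch of one possible weighted product. The function name, the sample stories, and the choice of raising ATOP to a weight of 0.5 are all my assumptions for illustration; a publication would tune the weight (or swap in scroll depth) to match its own priorities.

```python
def engagement_score(pageviews, atop_seconds, atop_weight=0.5):
    """Weighted product: pageviews scaled by ATOP raised to a weight.

    atop_weight=0 ignores ATOP entirely; larger values reward time on page more.
    The weighting scheme is an assumption, not a standard formula.
    """
    return pageviews * (atop_seconds ** atop_weight)


# Hypothetical stories from a single day's report.
stories = [
    {"title": "story-a", "pageviews": 1000, "atop_seconds": 25},
    {"title": "story-b", "pageviews": 800, "atop_seconds": 100},
]

# Rank by the weighted score instead of raw pageviews.
ranked = sorted(
    stories,
    key=lambda story: engagement_score(story["pageviews"], story["atop_seconds"]),
    reverse=True,
)
```

In this made-up example, story-b outranks story-a despite fewer pageviews, because readers stayed on it longer.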

A single day’s list of stories of a regular news Web site has the top one-tenth part occupied by very high-traffic entries, followed by a sudden taper in the immediate lower part, which gradually wanes to almost nothing.

Let’s calculate the average and median pageviews from the data set.

For many news Web sites, this average would be much higher than the median or the middle value. In basic statistical terms, it’s a case of asymmetric distribution of data with a “right skew” (diagram 1). There’s a measure for it: a very basic and old formula for calculating “nonparametric skew” of a data distribution:

s = (μ-ν)/σ

Where, for the data set (and we are working with the pageviews for now), “s” is a measure of skewness, “μ” is the average (or mean), “ν” is the median, and “σ” is the standard deviation. Many more improved versions of the equation exist for various data distribution types, but this suits our simple need for now.
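The formula is easy to try on a single day's export. The sketch below uses only Python's standard library; the function name and the sample pageviews are made up for illustration, and using the population standard deviation (rather than the sample one) is my assumption.

```python
from statistics import mean, median, pstdev


def nonparametric_skew(pageviews):
    """Return s = (mean - median) / standard deviation for a list of pageviews."""
    mu = mean(pageviews)
    nu = median(pageviews)
    sigma = pstdev(pageviews)  # population standard deviation (an assumption)
    if sigma == 0:
        return 0.0  # every story has the same pageviews, so there is no skew
    return (mu - nu) / sigma


# Hypothetical single-day pageviews: a couple of hits and a long, thin tail.
sample_day = [12000, 8500, 900, 400, 350, 300, 250, 200, 150, 100]
s = nonparametric_skew(sample_day)
```

A positive s means the average sits above the median (right skew). This particular measure always falls between -1 and 1, which makes it convenient for week-on-week comparison.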

Diagram 1: This is common behaviour for a news Web site's pageviews distribution curve. A few high-traffic stories pull the average up, but the median remains very low, causing a "right skew."

This is quite akin to the classic case of income distribution in some developing countries, where a handful of super-rich people sit on the right of the income distribution curve, pulling the average income up, while most other incomes gravitate to the left. As most incomes are very low, the median tends to be low too, indicating a high level of poverty in the country.

For a news Web site, a low median indicates a very large pool of stories with far fewer pageviews.

Let me offer a hypothesis: If a good part of these stories were written, optimised, distributed, and indexed on search engines properly, they would have generated a higher volume of traffic.

Such stories would also pull the median value up. And in turn, they would contribute to a better quality long-tail in future — easily discoverable on target keywords/tags on search or social media respectively.

So, how do we use these metrics to improve the long-tail stories? The pageview distribution graph will mostly remain asymmetric, with a higher average value than the median. But publishers can take a long, critical look at the stories below the median and introspect: If publishing all those stories was necessary, what went wrong with them? Does that call for a more delicate balance between quality and quantity?
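Pulling that review list out of a day's data takes only a few lines. This is a rough sketch: the story slugs, the dictionary layout, and the function name are hypothetical, and a real report would carry more fields (ATOP, scroll depth, source channel) to help diagnose what went wrong.

```python
from statistics import median


def below_median(stories):
    """Return stories whose pageviews fall below the day's median, lowest first."""
    cutoff = median([story["pageviews"] for story in stories])
    laggards = [story for story in stories if story["pageviews"] < cutoff]
    return sorted(laggards, key=lambda story: story["pageviews"])


# Hypothetical single-day report.
day = [
    {"slug": "budget-live", "pageviews": 9000},
    {"slug": "cricket-final", "pageviews": 4000},
    {"slug": "city-weather", "pageviews": 300},
    {"slug": "agency-brief-1", "pageviews": 120},
    {"slug": "agency-brief-2", "pageviews": 80},
]

review_list = below_median(day)
```

Here the median is 300 pageviews, so the two agency briefs land on the review list for the editorial post-mortem.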

Once this is made into a practice, learnings gained may help push the median up. In turn, the improved long-tail quality would positively impact the overall traffic in due course of time.

Let me summarise here: The median has to continuously improve while maintaining the high average. A decreasing positive skewness factor (s) would be a good indicator of growth.

A more accurate growth indicator would be a continuous increase in the μ/s factor, where μ is the average and s is the skewness of the data (only for positive values of s). The ratio rises when the average grows, when the skew shrinks, or both, so an increasing μ/s value would indicate that editorial, product, and digital distribution efforts are working.

This can be very easily built into the analytics reporting automation system and reviewed weekly.
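As a rough idea of what such a weekly check might look like, here is a small sketch. The function name and both weeks of pageviews are invented for illustration, the population standard deviation is my assumption, and returning None when the skew is zero or negative simply reflects the "positive values only" condition above.

```python
from statistics import mean, median, pstdev


def growth_indicator(pageviews):
    """Return mu / s for a right-skewed data set, or None if s is not positive."""
    mu = mean(pageviews)
    sigma = pstdev(pageviews)  # population standard deviation (an assumption)
    if sigma == 0:
        return None  # no spread, so skew is undefined
    s = (mu - median(pageviews)) / sigma
    if s <= 0:
        return None  # the indicator is only meant for positive skew
    return mu / s


# Hypothetical pageviews for two consecutive weeks.
week_1 = [10000, 500, 200, 100, 100]
week_2 = [10000, 900, 600, 500, 400]  # same top story, healthier tail

improving = growth_indicator(week_2) > growth_indicator(week_1)
```

In this toy data the top story is unchanged, but the lifted tail raises both the average and the median, so the indicator goes up from week 1 to week 2.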

More spunky souls may ask now, why don’t we turn the skew factor thingy negative instead?!

Take a deep breath.

That means a median higher than the average, indicating a higher concentration of big traffic stories to the right (diagram 2).

Diagram 2: General "left skew" behaviour.

I haven’t seen such a data set yet. But does that mean it’s unachievable? No, not at all!