Until a few years ago, when small- and medium-sized publishers approached Big Data, they did so with some trepidation. The fast-maturing technology showed them a rainbow, but not what was waiting at the end of it.

The smoke-screen hung on for a few more years, leading many publishers to adopt expensive tech without clear-cut objectives. And the results were commensurate. There is no one to blame; evolving technology often costs more than it pays off.

But we can have a clearer vision now, thanks to the frontline Big Data martyrs and some significant work by technology majors like Google and Amazon. The juggernaut of Big Data tech is now online and served on platters of various shapes, sizes, and hues.

Companies need to understand their Big Data needs instead of just jumping into the biggest, fanciest system available.

I’ll stay focused on editorial analytics and avoid the real-time domain, as it requires significant development effort involving costly talent. Unless your organisation boasts a tech budget that puts a mid-sized tech start-up’s annual plans to shame, it would be wise to settle for a subscription-based service such as Chartbeat, Parse.ly, or dot io.

If you still insist on having your way with custom real-time analytics, wait until Google’s Realtime API comes of age. That will save some effort.

The following five steps are worth considering as you prepare to set up online storage and Big Data analytics for deep content and operations intelligence. They will help ensure you don’t overspend and that your projects yield useful results.

1. First, evaluate whether your analytics project really requires Big Data support. It may sound confusing, but many publishers end up purchasing costly hardware that processes only a few hundred megabytes weekly or fortnightly. That’s a big waste. Unless your estimated data volume exceeds a few gigabytes, an efficient SQL or NoSQL architecture can handle your queries.

It may be a good idea to first test the queries and logic sets capable of retrieving the insights you need on smaller data sets, and upgrade the server/query capacity when data sizes start hitting those limits.
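As an illustrative sketch of that approach: the same analytical query can be prototyped on a small local SQLite database before any Big Data platform is purchased. The table and column names here are hypothetical, not from any particular publisher’s setup.

```python
import sqlite3

# Prototype an editorial query on a small, local dataset first.
# Table and values are illustrative placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (article_id TEXT, section TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?, ?)",
    [("a1", "automobile", 1200), ("a2", "politics", 3400), ("a3", "automobile", 800)],
)

# The same logic can later be ported to a larger system once data
# volumes outgrow a single server.
rows = conn.execute(
    "SELECT section, SUM(views) FROM pageviews GROUP BY section ORDER BY 2 DESC"
).fetchall()
print(rows)  # [('politics', 3400), ('automobile', 2000)]
```

Once a query like this proves it retrieves the insight you need, scaling it up becomes a capacity decision rather than a guess.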

2. Many expect that expensive hardware or software alone delivers great insights. Technology isn’t there yet — cutting-edge film-editing software still needs a good film editor to make a precisely edited film! Every Tableau or SAS project has its genesis in a simple but intelligent question that cues an array of equations to churn out insights.

There must be insightful, innovative minds among business users who can ask those questions.

3. At the same time, processes have to be in place to collect data from multiple functions, and this can be a bigger challenge than getting the technology right.

Consider this: A prospective advertiser wants to know the deep demographic behaviour of readers who read the automobile section in both print and Web editions.

Part of your Web data may have most of those dimensions already, but that identifiable information covers only about 20% of the total, collected from a sign-up process most users avoid. The remaining 80% provides partial demographic information from a third-party data management platform (DMP).

The other part of the user universe is from your print side, where various functions collect user data in silos. And, as is true in most cases, the pre-sales department that has to supply the data to the advertiser doesn’t have a clue about what data various departments like subscription, events, and circulation have collected so far.

All this will lead to more conjecture than reliable analysis, supplemented by questionable inputs from agencies based on ludicrous sample sizes. Even the best data-processing architecture wouldn’t be of any help here.

In a better scenario, all functions would regularly upload data in a prescribed format into a central location. Trend and demographic information from each piece would then be automatically compared, de-duplicated, and extrapolated to project for the entire universe, combining Web and print. And any slice of it would be readily available, in a template advertisers often ask for, avoiding last-minute hassles.
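A minimal sketch of that centralised step might look like the following. Everything here is hypothetical — the department names, field names, and audience size are placeholders — and the extrapolation assumes identified users are representative of the whole audience, which in practice needs validation.

```python
# Hypothetical records collected in silos by two departments.
subscription = [{"email": "a@x.com", "age_band": "25-34"},
                {"email": "b@x.com", "age_band": "35-44"}]
events = [{"email": "b@x.com", "age_band": "35-44"},
          {"email": "c@x.com", "age_band": "25-34"}]

# De-duplicate across silos on a common key (email here).
merged = {}
for record in subscription + events:
    merged.setdefault(record["email"], record)

# Extrapolate a segment's share among identified users to the full
# audience -- only valid if the identified sample is representative.
total_audience = 100_000
share_25_34 = sum(r["age_band"] == "25-34" for r in merged.values()) / len(merged)
estimate = int(share_25_34 * total_audience)
print(estimate)  # 66666
```

The real work, of course, is organisational: getting every department to upload in the prescribed format so a merge like this is even possible.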

4. Another important piece in the data canvas is the reporting template. Whether or not you use Big Data, standardising analytical reports across processes has great benefits. It forces various functions to look at data from an organisational perspective.

Frequently changing the nature of reporting templates only complicates the process of decision-making. We all take time to adapt to a new reporting template.

The traditional offline spreadsheet-based ad-hoc reporting style is a big obstacle to standardising reports. Decide which metrics, dimensions, and time ranges best identify the success or failure of content functions. Create templates accordingly, and then automate and follow them uniformly.

5. This is specific to automated processes. Once you have subscribed to online Big Data solutions, take a close look at what data you are saving and in what quantities. Most online Big Data solutions charge by the size of the data sets queried and the number of queries made in a billing cycle.

To save costs, minimise storage of raw data and save extracts in a form that supports your reporting templates (point No. 4 above). That will make your processes faster and more efficient.

In my estimate, up to 70% of regular editorial analytics requirements don’t require custom Big Data queries. That lets us be selective and save storage costs. Store only select sets that would make sense for historical analysis in the future, and activate auto-deletion at suitable intervals.
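The roll-up-then-delete pattern can be sketched as follows, here on a local SQLite database for illustration. Table names, the retention window, and the dates are all assumptions, not a prescription.

```python
import sqlite3
from datetime import date, timedelta

# Illustrative sketch: keep a small aggregated extract for reporting
# and auto-delete raw event rows past a retention window.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (day TEXT, section TEXT, views INTEGER)")
conn.execute("CREATE TABLE daily_extract (day TEXT, section TEXT, views INTEGER)")

today = date(2024, 6, 30)  # fixed date so the example is reproducible
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [
    ((today - timedelta(days=100)).isoformat(), "automobile", 500),
    ((today - timedelta(days=1)).isoformat(), "automobile", 700),
])

# 1. Roll raw events up into the extract that feeds the report templates.
conn.execute(
    "INSERT INTO daily_extract "
    "SELECT day, section, SUM(views) FROM raw_events GROUP BY day, section"
)

# 2. Auto-delete raw data older than the retention window (90 days here).
cutoff = (today - timedelta(days=90)).isoformat()
conn.execute("DELETE FROM raw_events WHERE day < ?", (cutoff,))

remaining = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
print(remaining)  # 1 -- only the recent row survives
```

On a metered Big Data platform, the same two steps run as a scheduled job, and the savings come from querying the small extract instead of the raw store.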

Another great cost saver is an intermediate data-extract layer between your Big Data server and the visualisation interface. Put the extract on an in-house online SQL setup so you can run unlimited queries against it without incurring extra costs. This is a bit technical and can be discussed in another post.

And finally, if you call in a consultant to set up a data warehouse and analytics architecture, don’t get sold on the fancy frills these consultants talk about, regardless of how interesting they may sound!

Many such companies earn handsome commissions from the hardware, software, and/or SaaS solutions their clients purchase; that’s a globally accepted revenue stream. The point is, you should not be so enamoured that you spend on stuff that is useless and likely to make people question your decisions in the future.