Who are you? What do you like? And how will I know if we meet again?

If you want to talk about some of the most serious challenges facing the industry, these are the sorts of questions you will need to come to terms with, and whose importance you will need to understand. At their core, they are questions that hinge on the idea of identity.

By identity, I mean the ability to recognise, verify, and most importantly, establish a persistent one-to-one relationship with the reader, regardless of platform.

Unless you’re one of the few publishers that has implemented a “hard” registration wall as a gateway to your Web site/platform, chances are the majority of your readers are anonymous. Syndicated data and research studies are providing a more accurate picture of the uniqueness of your audience, but only at the aggregate level. They still lack the depth, dimension, and measurement necessary for a truly customer-centric view.

Simply put, regardless of how much behavioural or transactional data we collect, we often do not truly know who our readers are in a lasting and meaningful way. To be clear, I’m not just talking about having names in a database; I’m talking about a relationship based on trust, in which an identity is established in exchange for value.

This lack of direct connection is the sort of thing that should keep you up at night. I realise this is not exactly a new concept or an emerging phenomenon. But the consequences are now greater than just an inability to measure audiences accurately.  

Before exploring further, let’s take a step back.

Advanced analytics and data-mining disciplines serve as mechanisms to understand, segment, or predict audience behaviour – to drive the development of new products, experiences, and services to attract and further engage the reader.

The proliferation and combination of behavioural, social, and transactional data is resulting in more complex unstructured data, placing even greater emphasis on data preparation and analytical data modeling.  

Luckily, we might in fact be spoiled for choice in terms of how to best equip ourselves. Every day, new Big Data capabilities are being introduced, many of them open-sourced and designed specifically to address the “3 V’s” of Big Data (volume, variety, and velocity).

With this in mind, the single best initial investment I would recommend as part of a Big Data programme is the development of reader-centric analytical datasets. Think of them as a series of rows and columns in a set of giant tables in which every row represents an individual reader and every column represents a variable or reader-centric attribute that can be used to describe that reader and their behaviour.

These attributes are typically based on observed, transformed, or derived data. They can range from the simple (e.g. browser type or number of visits to a certain content area within a day), to the complex (e.g. percentage of total pageviews in a week that were specific to a certain section or content area, number of visits where time spent was greater than 20 minutes, etc.).  
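To make the row-per-reader idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the visit log, the field names, the choice of "sports" as the section of interest, and the helper `build_reader_table` are all hypothetical. It derives one simple attribute (total pageviews) and two of the more complex ones described above (share of pageviews in a section, and number of visits longer than 20 minutes).

```python
from collections import defaultdict

# Hypothetical visit log: (reader_id, section, pageviews, minutes_spent)
VISITS = [
    ("r1", "sports", 4, 5.0),
    ("r1", "sports", 6, 25.0),
    ("r1", "business", 10, 3.0),
    ("r2", "business", 5, 30.0),
]


def build_reader_table(visits, section="sports", long_visit_minutes=20.0):
    """Return {reader_id: {attribute: value}}, i.e. one row per reader."""
    total_pageviews = defaultdict(int)
    section_pageviews = defaultdict(int)
    long_visits = defaultdict(int)

    for reader_id, visit_section, pageviews, minutes in visits:
        total_pageviews[reader_id] += pageviews
        if visit_section == section:
            section_pageviews[reader_id] += pageviews
        if minutes > long_visit_minutes:
            long_visits[reader_id] += 1

    return {
        reader_id: {
            "pageviews": total_pageviews[reader_id],
            # Derived attribute: share of pageviews in the chosen section
            "pct_in_section": section_pageviews[reader_id] / total_pageviews[reader_id],
            # Derived attribute: visits longer than the threshold
            "long_visits": long_visits[reader_id],
        }
        for reader_id in total_pageviews
    }


READER_TABLE = build_reader_table(VISITS)
```

In practice a table like this would have hundreds or thousands of columns and would be rebuilt on a schedule, but the shape stays the same: one key per reader, one value per attribute.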

As part of data-set scoping exercises, I’ve seen creative data miners define thousands of individual attributes, relying almost solely on their experience and imagination.

There are any number of programming frameworks that can be leveraged to automate the generation and time-stacking of these data sets – basically letting you assemble a series of images that gives you the same level of fidelity you might see in an ultra-slow-motion video. These data structures are a data miner’s best friend.  

When implemented and maintained thoughtfully, these data sets will function as the foundational layer to your Big Data analytical stack, enabling a range of analytical outputs from contextual analytics to predictive modeling to recommendation engines.

However, these underlying structures also magnify the importance of identity. The unique parties in these data sets are really only as unique as the identifier that defines them.  
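A small sketch shows why the identifier matters so much. The cookie IDs, the login-derived mapping, and the helper `pageviews_by_identity` below are all hypothetical. The same activity is counted twice: once keyed on device-level cookies, and once keyed on a persistent reader ID wherever one is known.

```python
# Hypothetical activity keyed by device-level cookie IDs: (cookie_id, pageviews)
COOKIE_ACTIVITY = [("cookie-A", 3), ("cookie-B", 2), ("cookie-C", 4)]

# Hypothetical mapping captured at sign-in: cookie ID -> persistent reader ID
LOGIN_MAP = {"cookie-A": "reader-1", "cookie-B": "reader-1"}


def pageviews_by_identity(activity, id_map):
    """Total pageviews per identity, falling back to the cookie when anonymous."""
    totals = {}
    for cookie_id, pageviews in activity:
        key = id_map.get(cookie_id, cookie_id)
        totals[key] = totals.get(key, 0) + pageviews
    return totals


BY_COOKIE = pageviews_by_identity(COOKIE_ACTIVITY, {})
BY_READER = pageviews_by_identity(COOKIE_ACTIVITY, LOGIN_MAP)
```

Keyed on cookies alone, the data set appears to contain three light readers. Once the two cookies belonging to the same person are stitched together, it contains two, one of whom is more engaged than either cookie suggested.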

There are a number of methods to capture identity (registration, authentication, single sign-on, etc.), but first and foremost publishers must recognise that the purpose behind it all has evolved beyond simple lead generation or a tactic to grow the e-mail database.

There is a battle for identity looming and publishers must take ownership of their audience. Google, Microsoft, Facebook, and other significant players are all rumoured to be developing new fingerprinting and identification technologies. There is already a post-cookie movement. New standards will be developed.

There’ll be big trouble for those who have not undertaken serious measures to understand the role of identity within their business models and taken proactive steps to own it. Have you seen the new Xbox?  It recognises you and can auto-authenticate your sign-in via facial recognition.

These guys aren’t fooling around. And it’s not just the usual suspects. Every week now, new start-ups are popping up, claiming to have developed the new identity secret sauce.

This intense concentration on solving the problems of identity is really driven by a single business imperative: If you know who someone is, you can establish a relationship with them.

Once you clear that bar, though, you can start to talk seriously about identity as something that drives your strategy, inspires creativity, and facilitates the sort of innovative tactics that you will need to beat the competition.

And, at that point, the onus switches to making sure that capturing and maintaining identity is directly integrated into your overarching business models – current and future.     

So what value can we draw from reader identity and the direct relationship it represents? Here are just a few of the more fundamental concepts:

Audience and business intelligence: We need to move beyond aggregate or session-based reader analytics. The current and future business environment requires a more meaningful understanding of what, where, how, and how often readers want what they want.

“Long game” cohort or time-series based analysis provides far more insight into the experiential elements of reader behaviour, allowing us to truly understand the predictable and non-predictable habits that fuel our business (the essence of propensity or probabilistic modeling).

Recency, Frequency, and Monetary value (RFM) modeling also comes to mind as a good example of identity-centric intelligence. An old-hat analytical concept proven in other industries, RFM analytics provides tangible insight into the relationship between engagement level, cadence, and value across identifiable audience segments.
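As a rough illustration, the three RFM dimensions can be computed per reader in a few lines of Python. The transaction log, the field names, and the helper `rfm_table` are assumptions made for this sketch, and a real implementation would usually go on to bucket each dimension into scores or segments.

```python
from datetime import date

# Hypothetical transaction log: (reader_id, transaction_date, amount)
TRANSACTIONS = [
    ("r1", date(2024, 1, 5), 10.0),
    ("r1", date(2024, 3, 1), 15.0),
    ("r2", date(2023, 11, 20), 40.0),
]


def rfm_table(transactions, as_of):
    """Return recency (days), frequency (count), and monetary total per reader."""
    rows = {}
    for reader_id, when, amount in transactions:
        row = rows.setdefault(
            reader_id, {"last": when, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], when)
        row["frequency"] += 1
        row["monetary"] += amount

    return {
        reader_id: {
            "recency_days": (as_of - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for reader_id, row in rows.items()
    }


RFM = rfm_table(TRANSACTIONS, as_of=date(2024, 3, 31))
```

Note that none of this works without a stable `reader_id`: recency and frequency are only meaningful if the same person is recognised from one transaction to the next.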

Emerging business models will demand a far more intimate understanding of reader behaviour that cuts beyond any one single point in time. Identity in this regard is about stitching together all of these moments in time and surfacing intelligence that describes an experience as a whole.

Customer relationship management: Do you sell a product or subscription as part of your business model? If yes, then customer relationship management (CRM) needs to become an established, cross-functional capability – if it isn’t already.

Financial institutions, telecommunications companies, and retailers have long benefited from advanced direct- and database-marketing, and yet the capability is still somewhat immature among publishers.  

A lot of time and attention is spent to get a customer, to make the sale. Customer relationship marketing is just as much about managing a profitable relationship after the sale is made. Classic CRM principles speak to the disciplines required in order to target the right customer at the right time with the right message.  

All of this is fundamentally customer-centric in nature. There are well-documented analytical concepts meant to support a robust CRM strategy, so I won’t go into detail here other than recommending an investment in predictive modeling and a good understanding of test-and-learn fundamentals.

The essential point here is that all data related to the customer account (products/services), behaviour, and marketing contact history is stored and recorded at a customer-centric level. It doesn’t work any other way, and there can be no work-arounds or shortcuts.    

Personalisation: We’ve all seen the use cases – thanks to Netflix, Amazon, and eBay, “recommendation engine” is now a household term.

For publishers, it’s as much a philosophical issue as it is a technical challenge. But personalised and recommendation-based user experiences are concepts that will need to be addressed within any content publisher business strategy one way or another.  

Collaborative filtering and other techniques applied by recommender systems require detailed data on readers’ context and their interest graphs. A recommendation algorithm can only be as accurate as the quality of the data it’s training on.
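For readers unfamiliar with the technique, here is a deliberately tiny sketch of user-based collaborative filtering in Python. It assumes a hypothetical reading history keyed by reader ID and uses cosine similarity between readers. The data and the function names `cosine` and `recommend` are illustrative only, and production recommender systems are considerably more sophisticated.

```python
from math import sqrt

# Hypothetical reading history: reader_id -> {article_id: times read}
HISTORY = {
    "r1": {"a": 1, "b": 1},
    "r2": {"a": 1, "b": 1, "c": 1},
    "r3": {"d": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse read-count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(target, history, top_n=3):
    """Rank unread articles by the similarity of the readers who read them."""
    seen = history[target]
    scores = {}
    for other, items in history.items():
        if other == target:
            continue
        similarity = cosine(seen, items)
        for article, weight in items.items():
            if article not in seen:
                scores[article] = scores.get(article, 0.0) + similarity * weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [article for article, score in ranked[:top_n] if score > 0]


RECOMMENDATIONS = recommend("r1", HISTORY)
```

Here reader r1 is recommended article "c" because r2, who shares r1's history, also read it. Reader r3 has no overlap with anyone and gets nothing, which is exactly the position an anonymous reader with no persisted history is in.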

This, then, is an obvious use case to illustrate the value of identifiable reader personas and an ability to persist the identity over time.

We’re talking about recommending news and information content here, not a camera case; it better be accurate and it better understand what makes me tick if it’s going to be relevant.  Recommendations based solely on what I read three minutes ago are not going to cut it.  

Advertising: Regardless of your digital technology stack, ongoing market shifts, and the dynamic of increasing programmatic buying, things in this space typically have come down, and will continue to come down, to data.

Differentiation here comes down to the best marriage between observed and derived contextual data on the one hand and first-party “collected” data on the other.

Important industry metrics such as time-spent or your ability to manage audience migration and choice between delivery platforms (devices) will require a single, constant understanding of the reader.

Simply put, if data is what we compete on – and the platforms matter – then there is nothing more important than an understanding of the audience in a sustained way. “Selling on audience” and “competing on data” are terms we’ve used before, so there should be no surprise here.

Lastly, it is not possible to discuss this subject without speaking to the obvious privacy and compliance implications. Let’s face it, in some cases here we’re talking about sensitive first-party information. At the very least, we’re talking about readers’ behaviour and what we know about what, when, and how they like to read.  

An internal data policy and an external privacy policy are essential. Organisations taking on a serious Big Data mandate should also invest in processes and internal policies that protect employees from external forces and often, frankly, from themselves.

Infrastructure and safe data management practices should be established, audited, and governed to protect everyone from malicious threats.

Thanks for reading. Now tell me, have we met before?