Does data have a “use by” shelf life? Not if your data scientists are doing their jobs
Big Data For News Publishers | 05 January 2016
There is a rule of thumb in the data world that you should save everything. Every stop, every vacation, every payment. If you can think of it, save it. You might need it. Storage is cheap. Save it all.
Well, sure, you can save it. But does collected data ever reach a point where it is too old to use? Could old data be leading you to a flawed analytical conclusion? Is old data setting the stage for errors and waste in your marketing campaigns?
Or, is having ancient information enabling you to discover key insights that would never be possible without the information extracted from the history files?
The answers to the above questions, without a highly refined data governance and usage policy in place, are all: “It depends.” For each situation it is critical to know the context of the use planned for the old data.
Let’s take a quick look at a simple example – a telephone number. The phone number in question is from a customer who last received your publication (or site access) three years ago. Is there value in having this phone number? Should you dial it in a telemarketing campaign?
Well, it depends. You have to look at how the customer associated with the phone number left you. Did the customer leave you with a transaction indicating a move from market? Or did he leave you with a “no time to read” transaction?
A good data governance programme will help you decide how to move forward with using (or not using) the phone number. In many areas of the country, the chances are fairly good that if the separation record showed a move, the number has not been reassigned – given the migration from land to mobile phone numbers and reassignment rates of old land-lines.
I moved from Arizona to Virginia five years ago. I just checked, and my old phone number has yet to be reassigned. However, if I had moved in the opposite direction, I guarantee that the old number would have been reassigned. And, given the depth of penetration in Northern Virginia for area code 703, there is a chance the land-line number would now be ported to a cellular carrier and thus restricted in how, or whether, you can include it in a telemarketing campaign.
Knowing how each of your primary data elements works (and needs to be governed) in your particular situation is critical to moving from “it depends” to “I know.”
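To make the “I know” a little more concrete, here is a minimal sketch of how a phone-number usage rule could be written down. Everything in it is an assumption for illustration: the stop-reason codes, the `ported_to_mobile` flag (imagined as coming from a number-portability lookup), and the outcome labels are hypothetical, and your own governance policy may well decide differently.

```python
from dataclasses import dataclass


@dataclass
class FormerCustomerPhone:
    stop_reason: str        # hypothetical codes: "moved", "no_time_to_read"
    ported_to_mobile: bool  # assumed result of a number-portability lookup


def telemarketing_decision(record: FormerCustomerPhone) -> str:
    """Return an illustrative governance outcome for one phone number."""
    # Land-line numbers ported to a cellular carrier are restricted in
    # telemarketing, so they go to a compliance check before anything else.
    if record.ported_to_mobile:
        return "compliance_review"
    # A move out of market: the number may still reach the person who left,
    # so this sketch keeps it out of a local campaign.
    if record.stop_reason == "moved":
        return "exclude_out_of_market"
    # The customer is probably still in the market and still at this number.
    if record.stop_reason == "no_time_to_read":
        return "eligible_for_winback"
    # Anything the rules do not cover is left for a person to decide.
    return "manual_review"
```

The point of the sketch is only that the rule is explicit and testable; the outcomes themselves belong to whoever owns your governance policy.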
Another example: How about the saving of service records? For example, a password reset request, or, to use an example from the classical era of media, a wet newspaper complaint. Both are hopefully one-time events and both are usually on the storage-space-police’s list for having a short retention period. But when these records are looked at over time, your data scientists might find something alarming.
Let’s take the wet newspaper transactions. Take a peek at how they come about and the long-term story. Pick a day when you had rain unexpectedly drop onto your delivery area and you know that the newspapers turned into soggy five-pound driveway-speedbump logs.
Can’t find a day like that? Query your database and do a count of complaints by day. Find the peak days, isolate by day of the week pattern, and you’ll likely find an event to analyse. With your day isolated, do a count of wet newspaper transactions. You will probably be surprised at what you see. Even though you know most of the newspapers were wet, only a few customers actually called in to complain.
Your intuition probably said there were more – after all, your call center was swamped and dealing with 20-minute hold times. But the reality is, as a percentage of deliveries, the complaint ratio is low.
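If you want to try the count-by-day exercise outside the database, a rough sketch in Python might look like the following. It assumes you can export one date per complaint; the function names and the “twice the usual level for that weekday” threshold are illustrative choices, not a standard.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean


def peak_complaint_days(complaint_dates, threshold=2.0):
    """Return days whose complaint count is at least `threshold` times
    the average count for the same day of the week."""
    per_day = Counter(complaint_dates)

    # Group daily counts by weekday so a busy Sunday is compared with
    # other Sundays. Days with no complaints never appear in the export,
    # so the averages cover only days that had at least one complaint.
    by_weekday = defaultdict(list)
    for day, count in per_day.items():
        by_weekday[day.weekday()].append(count)
    weekday_avg = {wd: mean(counts) for wd, counts in by_weekday.items()}

    return sorted(
        day for day, count in per_day.items()
        if count >= threshold * weekday_avg[day.weekday()]
    )


def complaint_ratio(wet_complaints, deliveries):
    """Wet-newspaper complaints as a fraction of papers delivered that day."""
    return wet_complaints / deliveries
```

With three Mondays of complaints (say 3, 2 and 25), only the 25-complaint Monday clears the threshold. Dividing that day's wet-newspaper count by the number of papers delivered gives the ratio that tends to look surprisingly small.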
The next thing your data scientist will jump on is that roughly 10-15 weeks later, if not longer once the delivery grace period is counted, you have a spike in non-pay stops for no apparent reason. Can the data scientist trace the stop pattern to a root cause? Is the wet newspaper day to blame?
It may be that the people who had wet newspapers decided to let their subscriptions expire rather than press for a refund. Test the hypothesis by finding a day when the weather went south and you know only a few delivery areas didn’t get their squished trees in rain bags. Data scientists love this stuff!
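One way to run that test is to compare non-pay stop rates, in the window 10 to 15 weeks after the rain day, between subscribers in the affected delivery areas and everyone else. The sketch below assumes each subscriber record carries an `affected` flag and a `nonpay_stop` date (or `None`); both field names are made up for the example, and the window is a parameter because your grace period may differ.

```python
from datetime import date, timedelta


def lagged_stop_rates(subscribers, event_day, min_weeks=10, max_weeks=15):
    """Compare non-pay stop rates in a lagged window after `event_day`.

    `subscribers` is an iterable of dicts with two hypothetical keys:
      "affected"    - True if the subscriber's area got wet papers
      "nonpay_stop" - date of the non-pay stop, or None if still active
    Returns {True: rate, False: rate}; a rate is None for an empty group.
    """
    window_start = event_day + timedelta(weeks=min_weeks)
    window_end = event_day + timedelta(weeks=max_weeks)

    totals = {True: 0, False: 0}
    stops = {True: 0, False: 0}
    for sub in subscribers:
        group = bool(sub["affected"])
        totals[group] += 1
        stop = sub["nonpay_stop"]
        # Count only the stops that fall inside the lagged window.
        if stop is not None and window_start <= stop <= window_end:
            stops[group] += 1

    return {g: (stops[g] / totals[g] if totals[g] else None) for g in totals}
```

A clearly higher rate in the affected group is consistent with the hypothesis, though it does not prove it; other events in the same weeks, such as a price change, would need to be ruled out.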
How about the reset password call? Is there a subtle pattern? You might find nothing, but you may also see a pattern that has a strong correlation to customer abandonment. This is a fun exercise for the data geek. After all, in the pure-play digital space, the content consumer is quick to point their search elsewhere and take the path of least resistance.
So, are customer defection and password reset requests linked? Add the analysis to your scientist’s list.
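As a starting point for that analysis, here is a small sketch that measures how strongly two yes/no facts about a customer move together, using the phi coefficient (the ordinary correlation coefficient applied to two binary variables). The input format, one `(had_password_reset, defected)` pair per customer, is an assumption made for the example.

```python
from math import sqrt


def phi_coefficient(customers):
    """Correlation between 'had a password reset' and 'defected'.

    `customers` is an iterable of (had_password_reset, defected) pairs.
    Returns a value between -1 and 1, or None when either flag never
    varies (the coefficient is undefined in that case).
    """
    # Build the 2x2 table: n[reset][defected]
    n = {True: {True: 0, False: 0}, False: {True: 0, False: 0}}
    for had_reset, defected in customers:
        n[bool(had_reset)][bool(defected)] += 1

    reset_yes = n[True][True] + n[True][False]
    reset_no = n[False][True] + n[False][False]
    defected_yes = n[True][True] + n[False][True]
    defected_no = n[True][False] + n[False][False]

    denominator = sqrt(reset_yes * reset_no * defected_yes * defected_no)
    if denominator == 0:
        return None
    return (n[True][True] * n[False][False]
            - n[True][False] * n[False][True]) / denominator
```

A value near zero suggests no link; a clearly positive value says resets and defection tend to show up together, which is a reason to dig further, not proof that one causes the other.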
Let’s go back to the phone number for the moved customer for a second before we wrap up this blog post. Switch gears to the address linked to the phone number of the person stopping.
What is the message context in your campaigns sent to that address? The person who used to get your publication is no longer there. Do you send that address creative that is designed to win back a former customer, or do you reset the address and, in your message, treat the resident as someone who has never subscribed?
Sending a new home owner a “please come back/former subscriber win-back discount” sends a message to the home owner: junk mail.
This is the era of personalisation. People have better things to do than read a general message. Deep data governance and usage rules require work. In the end, though, knowing how to use, or not use, each data element you collect will turn your marketing campaigns and your general awareness of your customer base into an asset.
Doing this right requires discipline and understanding throughout the organisation because you’ll find that you shouldn’t throw any data away. You just need to pay far closer attention to how you use what is saved. And you’ll discover the need to ratchet up a notch or two on your governance and usage rules.
Enjoy. Be careful out there.