It seems like every day I get a letter from a company telling me that I’ve got to get a “Big Data solution” installed or my company will fail. I also get “real world of Hadoop” e-mails – usually links to white papers on why everyone is adopting Hadoop and NoSQL solutions to solve their problems.
The phone calls come as well. Usually the calls are from big name IT firms. They invariably have the same message: They have a solution and consultants that will solve all of my problems by installing a Big Data solution.
I tend to like to play with the callers a bit, first to stay up with the latest technology, and second to see whether the callers really understand my company or are just reading a cold-call script.
I say something like: “What problems? Do you even know my business?” They usually don’t have a clue about my company, making their replies sound like the reading of a script loaded into their CRM systems: “Well, Mr. Bright, we are experts in the Hadoop eco-system with tools such as Apache Spark, Drill, Kafka and Hive ... ”
The rest of the pitch turns to yada, yada, yada.
Big Data is the 2010s version of the 2000s “cloud computing” pitch. It sounds interesting and worth exploring and implementing someday. But why? Do I need it now? Can I use it if I have it? What does it mean to my IT department? To my marketing department? What does it cost? What does it take to use? How much training does it require? How long does it take to deploy? And, in the end, what will I get, and is there a return?
I suggest that you do a really deep think through these questions before making any kind of jump into the current state of Big Data. Like the cloud when it first came out, early adoption has risks. The analytical tools and the querying tools are still being developed, so they can be difficult and expensive to use. Vendors are entering, leaving, merging, and acquiring, so the landscape is in flux.
The technology does have a place, but you have to do your research before you get too far into the discussions and find yourself committed to a path that doesn’t fit your needs.
There are many Web site security and performance monitoring reasons to bring in Big Data technologies right now. This post and my follow-up (next week) will set those aside and concentrate on the marketing, analytical, and content recommendation reasons to explore Big Data technology.
The first step is understanding what Big Data is and what it is not. There is a difference between having “lots and lots of data” and true “Big Data.” The difference is important to understand before you invest in the Big Data eco-system for marketing and sales analysis and then find you didn’t really need to.
The difference between “lots” and “big” occurs when you have to make a decision (in the app or on the Web site) using data collected just seconds ago.
Big Data is real time. Lots of data can wait. Big Data in the media vertical will drive story recommendations or alter the next page viewed or ad served. (Again, I’m ignoring the IT security aspects, which in and of themselves can drive a Big Data need.)
You make the jump to big when data streams in so fast, and in such volume, that traditional analytical applications can’t process it quickly enough to deliver an answer when it is needed.
It is important to pay attention to the “when needed” statement. It is a three-part statement in just two words: first getting the data, then analysing it, and finally sending back an answer that is actionable.
Vendors selling the technology will gloss over the need; for them, instantaneous is their world. But getting, analysing, and then presenting an answer involves linking layers of complex technologies.
For example, your Web site/app will have to pass information through the Big Data technology for processing. Then it has to trigger an analysis, and finally the Big Data technology will have to pass the resulting calculation (recommendations) back to the Web site software to be handled and used to alter presentation. Quickly.
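To make that get, analyse, answer loop concrete, here is a minimal Python sketch. It is a toy, not a Big Data stack: the class `RecentClickRecommender`, the function `respond_within_budget`, the 60-second window, the 50-millisecond budget, and the section names are all assumptions of mine for illustration. The point is only that the answer has to come back inside a time budget, or the page falls back to its default content.

```python
import time
from collections import Counter, deque


class RecentClickRecommender:
    """Hypothetical in-memory stand-in for a streaming analysis layer."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, section) pairs, oldest first

    def ingest(self, section, now):
        # Part 1: get the data as it happens.
        self.events.append((now, section))

    def recommend(self, now, top_n=2):
        # Part 2: analyse only what arrived within the last window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()
        counts = Counter(section for _, section in self.events)
        # Part 3: hand back an answer the page can act on.
        return [section for section, _ in counts.most_common(top_n)]


def respond_within_budget(recommender, now, budget_seconds=0.05,
                          fallback=("top-stories",)):
    """Return recommendations, or the fallback if the answer is late or empty."""
    start = time.perf_counter()
    picks = recommender.recommend(now)
    elapsed = time.perf_counter() - start
    if elapsed > budget_seconds or not picks:
        return list(fallback)
    return picks


recommender = RecentClickRecommender(window_seconds=60.0)
for timestamp, section in [(0, "sports"), (50, "politics"), (55, "sports"),
                           (58, "politics"), (59, "politics")]:
    recommender.ingest(section, now=timestamp)

print(respond_within_budget(recommender, now=70, budget_seconds=5.0))
```

If your site cannot do anything with the list that comes back, the speed of the middle step doesn’t matter.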
One action without the other isn’t a solution but a point of frustration to all. If you don’t need the answer immediately, can’t change a page presented, or can’t adjust content based on the analysis answer, you are not ready to go big. At least not yet.
If you are ready to make recommendations and presentations to your digital users based on their current interactions, if you are able to push the information between the technologies, and if you have the volumes of user information being collected, then your company has probably reached the point where investing in the Big Data space to drive user experience makes sense.
Another potential reason, though not always a good one, to enter the Big Data technology stack is to use different technologies and methodologies to move a large-scale, complex data warehousing project along.
The Big Data “ingesting tools” are very good at bringing complex data from a large number of sources into their analysis eco-system. This is typically done without a predefined table schema in place.
Eventually, even in Big Data, you’ll have to link the data like you do in a traditional database structure. However, the advantage (to some) is that you can shift who does the linking so the data can be ingested faster.
Think of a large organisation with many different CRM, billing, receivables, and other systems that are just different enough that they won’t fit together. Big Data technology can speed the consolidation process as the technology allows you to defer many of the database design decisions until after the data is ingested into the Big Data eco-system.
You still have to do the data joining and mapping, but the “who” that does the work changes in Big Data – it shifts to the data scientist.
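A small Python sketch shows the idea of ingesting first and linking later. Everything in it is hypothetical: the two sample exports, the field names `cust_id` and `customerNumber`, and the helpers `ingest` and `link_by_customer` are made up for illustration, and a real Big Data stack would do this at far larger scale with its own tools.

```python
import json

# Hypothetical raw exports from two systems that don't agree on field names.
raw_crm = ['{"cust_id": "A17", "full_name": "Dana Lee"}']
raw_billing = ['{"customerNumber": " a17", "amount_due": 120.5}']


def ingest(lines, source):
    # Ingest step: keep each record as-is, tagged with where it came from.
    # No table schema is imposed here.
    return [{"source": source, "payload": json.loads(line)} for line in lines]


# The mapping decision is deferred until after ingestion; this is the part
# that shifts to whoever analyses the data.
KEY_FIELD = {"crm": "cust_id", "billing": "customerNumber"}


def link_by_customer(records):
    linked = {}
    for record in records:
        raw_key = record["payload"][KEY_FIELD[record["source"]]]
        key = str(raw_key).strip().upper()  # normalise "a17" and "A17"
        linked.setdefault(key, {})[record["source"]] = record["payload"]
    return linked


lake = ingest(raw_crm, "crm") + ingest(raw_billing, "billing")
customers = link_by_customer(lake)
print(customers["A17"]["crm"]["full_name"],
      customers["A17"]["billing"]["amount_due"])
```

The joining work has not gone away; it has only moved from the load step to the analysis step.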
Still not sure if going big is right? In my next installment, we’ll look at what kind of data you have and whether it is or is not Big Data – something you should know before investing in Big Data technology.