I have always run my technology choices under the mantra that you “date” your software. Never fall in love and never get married, because something better always comes along.
I picked or inherited the tools I’m using roughly four to six years ago. A few feel tired. A few are OK, and some are still quite nice. I’m going to spend this first quarter working my way through the technology stack to see what needs to get kicked off the dance card.
Where are you with your data analytics toolbox? Ready to do a revisit?
Most of you probably made your data analytics and database tool decisions four or five years ago (or your predecessor made the choices) and have not paid much attention to the marketplace since then. Well, it is probably time to revisit your decisions. Do your tool choices get you to your long-term goal?
Do you think Big Data still has shine to it based on what you will actually do with the data? I’d bet your goal today is different than it was a few years ago, and some of you are struggling with your old tool choices to help you reach your new goals.
I, for one, underestimated how much the mobile side of things would replace old goals (and invent new ones). On the database side, I am sure you are surprised at how long it takes to get anything back from your chosen Hadoop query tool, and at how well some vendors have packaged integrated solutions (content recommendation engines, for example) so you don’t need to build your own.
So, let us take a virtual walk down the virtual trade show of vendor booths. I do not profess to have knowledge of all of the latest goodies (there are hundreds), so our virtual trade show is just going to cover a few of the vendors. Feel free to post anything you feel is noteworthy in the comments on this blog post.
Choosing a big database is actually more complicated today than a few years ago. The type of hardware (big boxes or cheap commodity level), the type of analysis you are doing, and the width of the data tables going into the project are all key decision points leading to different database vendors.
The typical media company is willing to spend on hardware, the data tables are fairly wide, and, unless you are one of the few national “brands,” your data file row counts are fairly small (under one billion).
So, if you are still interested in a true Big Data stack, take a look at the top players: Apache HBase/Hadoop and Cassandra are worth a look-see if you want scale with high availability. If you are using Amazon Web Services for your cloud solution, Amazon SimpleDB is an interesting player. (I haven’t used it, but from the propaganda sheets it looks like it will take both hardware and software maintenance out of your realm of worries.)
If you are still in the thinking stage about the Big Data leap before diving in, consider the resources of your technical and analytical staff. The tools here are still a different stack of technology from your daily transactional systems — splitting your thin resources can introduce some unfortunate unintended consequences.
And, finally, before jumping in with the elephant, make sure you really need to. There is a difference between having lots of data and truly needing the elephant (Hadoop).
A final thought on going big: In the media space, there are some well-packaged software tools that easily integrate into your existing Web site if your primary goal is a recommendation engine. Consider buying before building — time to deployment must be considered. In my mind, the jump to true Big Data tools is going to be driven by the speed you need to respond to your Web and/or mobile traffic to guide it in its content selections and recommendations.
On the relational (SQL) database side, it is hard to argue with the usual choices. There are two very good and very well-supported tools, and, given the worldwide acceptance of both, it would be really hard to go wrong with either Microsoft’s SQL Server or the MySQL family. If you want to take a look-see at other options, here are a few vendors beyond the main players: AMPLab Shark, Amazon Aurora, Drizzle, or HiveDB.
Again, MS SQL Server and MySQL are good, solid bets for all but the largest of media companies. Even if you are one of the big media companies, there is a strong argument I could make to split your strategy and run your data in both the SQL Server and Hadoop spaces.
If you want terrestrial-world analytics, such as offline traffic analysis and recommendations, use the SQL-level tools. Use Hadoop for tackling the clickstream and real-time recommendations of the online world.
Typical advertiser and subscriber analysis doesn’t need the complexity of introducing a Big Data tool for your analysts and IT staff.
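To make that concrete, here is a minimal sketch of the kind of subscriber analysis a plain SQL database handles comfortably: subscriber penetration by ZIP code from a household address table joined to a subscriber table. The table names, column names, and sample rows are all hypothetical, and SQLite stands in for SQL Server or MySQL only so the example runs anywhere.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server or MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, zip TEXT);
    CREATE TABLE subscribers (
        subscriber_id INTEGER PRIMARY KEY,
        address_id INTEGER REFERENCES addresses(address_id)
    );
    CREATE INDEX idx_subscribers_address ON subscribers(address_id);
""")

# Hypothetical sample data: five households, three of them subscribers.
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?)",
    [(1, "30301"), (2, "30301"), (3, "30302"), (4, "30302"), (5, "30302")],
)
conn.executemany(
    "INSERT INTO subscribers VALUES (?, ?)",
    [(10, 1), (11, 3), (12, 4)],
)


def penetration_by_zip(connection):
    """Return {zip: subscribers / households} using one join and one GROUP BY."""
    rows = connection.execute("""
        SELECT a.zip,
               COUNT(s.subscriber_id) AS subscribers,
               COUNT(*) AS households
        FROM addresses a
        LEFT JOIN subscribers s ON s.address_id = a.address_id
        GROUP BY a.zip
        ORDER BY a.zip
    """).fetchall()
    return {zip_code: subs / households for zip_code, subs, households in rows}


penetration = penetration_by_zip(conn)
```

The sketch assumes at most one subscriber per address; with several subscribers at one address, the household count would need a `COUNT(DISTINCT a.address_id)` instead.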
A first-returned-record query against a database holding the entire U.S. address file (130,000,000 rows) and as many as 5,000,000 subscriber records is going to come back faster in SQL Server or MySQL than in the Big Data tools with multiple clusters involved. The opposite is true when querying more than 10 billion rows of clickstream data.
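That rule of thumb can be written down as a rough first-pass check. The sketch below is only an illustration: the function name and return labels are made up, the lower threshold borrows the "under one billion" row count mentioned earlier, the upper one is the 10 billion clickstream figure, and anything in between is left as "benchmark both" because the numbers here do not settle it.

```python
# Thresholds taken from the row counts quoted in this post; treating them as
# hard cut-offs is an assumption made purely for illustration.
RELATIONAL_COMFORT_ROWS = 1_000_000_000   # "under one billion" rows
BIG_DATA_ROWS = 10_000_000_000            # "more than 10 billion rows"


def suggest_store(row_count: int) -> str:
    """Return a rough first guess at which kind of database to evaluate."""
    if row_count < RELATIONAL_COMFORT_ROWS:
        return "relational"
    if row_count > BIG_DATA_ROWS:
        return "big-data"
    # Between the two figures, only a benchmark on your own data will tell.
    return "benchmark-both"


address_file_choice = suggest_store(130_000_000)
clickstream_choice = suggest_store(20_000_000_000)
```

Row count is only one of the decision points; hardware, table width, and the type of analysis still need to be weighed by hand.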
The query and analysis tools market is where popularity and flock mentality are rampant. The name brands (Tableau, Cognos, MicroStrategy, and the Microsoft BI tools) are getting the trade magazine exposure. Meanwhile, some very good niche players, such as the United Kingdom’s BlueVenn and its end-to-end family of marketing tools, get lost.
Here, I feel the decision must focus on keeping your number of vendors to a minimum and keeping your analytical (or marketing automation) endgame goal in mind. There are also the smaller players such as MapReduce, Pig and Lipstick, Amazon QuickSight, Spotfire, Qlik, Zoomdata, Pariscope, and lots of others. The top-tier tools have amazing capabilities, so do not fall in love.
You have to buy tools to support your individual company goals, not just the shiny toy. The query/analysis tools area is where the largest gains of the past few years were made. If you picked a query tool or analysis tool more than three years ago, you really should revisit this area.
There will be some duplication of brands and tools in here from the query/analysis section because reporting is often seen as an extension of query. The vendors are starting to build packages that combine query, analysis, and visualisation into one package or integrated suite. The big players are Tableau, Domo, SAS, and Cognos, with some nice niche tools from SparkR, Plot.ly, and BlueVenn.
Tableau, Cognos, and Domo are very flexible and nice for packaging and distribution of reports. BlueVenn has done an excellent job of taking its tool and extending it into a marketer’s dream solution. It is a complete end-to-end product: from market analysis to omni-channel campaign execution with integration to ESP, CMS, and CRM tools, and, finally, true data-driven, real-time triggered “next steps.”
Vendors are building complete suites of solutions. Everyone from SAS (seen by many as a statistical tool vendor) to Salesforce is building end-to-end marketing and analysis tools. Look at your long-term goals: Are they analytical or are they marketing engagement using data?
Pick a tool that gets you moving to the goal and not the shiny new thing. Shiny may be nice, but when all is said and done, did shiny move you to the goal or off onto a tangent?
Space and time limit the vendors mentioned in this review. I know there are some incredible tools out there I did not mention. If you want to add any to the list, feel free to post a comment.
For those who made a tool choice three or four years ago, the perfect choice then isn’t the same as a perfect choice today. You should take another look and get moving toward that ultimate goal. If you’re struggling with getting ROI from your choices, it’s time to look again.
Return-on-data-investment is there. Maybe it isn’t the data but the tools holding you back.