Upon reading his own obituary in a local paper, noted author Mark Twain supposedly wrote to the editors, "The reports of my death are greatly exaggerated."
Judging by articles in the tech press lately, Hadoop must feel a kinship with Mr. Twain.
I wish I could say I'm shocked. Given the Hadoop implementation projects we have participated in lately, there would certainly be cause for surprise, but the truth is I'm intimately familiar with the implications of the Gartner Hype Cycle, so none of this is news. We are definitely past the Peak of Inflated Expectations for big data technologies, in particular the Hadoop platform, and that places us squarely in the Trough of Disillusionment. Which means, of course, the Plateau of Productivity is coming, when we will realize Hadoop may not solve poverty but it won't cause the sky to fall either, and we can settle into getting some work done.
Part of the Plateau of Productivity will be a revising (and reviving) of plans to use Hadoop in organizations that have tried to embrace it before. To help those who want a second try at the benefits of big data, here are the most common causes of failure and what to do about them.
Lack of a Compelling Business Use Case
Several studies have examined the leading causes of big data project failure, and at the top of every list is some variation on "lack of business case/lack of executive sponsorship." Not surprisingly, this is also near the top for projects in general.
Hadoop clusters are too expensive, not just in equipment cost but in time invested, to simply stand one up and "see what's in our data." A business use case should drive the decision to adopt Hadoop in the first place, and that one use case should grow into a steady stream of projects, each building on the last.
To make that possible, executive sponsorship is essential. Someone, somewhere in the upper echelons of management, has to recognize the potential of big data for their business unit and champion the cause. Science experiments rarely produce any return on investment.
Trying to Boil the Ocean
"Go big or go home." "Aim high." "Zero to Hero in 90 days."
While these slogans may sell a lot of career-planning and self-help books, they are ridiculously damaging to the success of a big data initiative. Here's a much more relevant one: "Rome wasn't built in a day." Yahoo spent years developing and implementing what would become Hadoop, and industry giants like Wal-Mart, UPS, and P&G, who have all reported great returns on their big data strategies, didn't experience overnight success either.
It may seem counter-intuitive, but the secret to success with big data is actually to start small. It’s easy to get caught up in success stories from massive organizations telling how they reaped millions in benefits from analyzing petabytes of data, but the odds of anyone getting that kind of return early on are slim to none. Better to have a long-term plan that involves a series of small, quick successes, each building upon the previous one.
The Skills Gap
Hadoop as an ecosystem has been around for over ten years, but new components and capabilities are added constantly. As a result, the number of people who know Hadoop in any meaningful depth is small, to say the least. Compounding the problem, administering Hadoop and using it are two completely different skill sets, so finding both in one person can feel as impossible as finding a politically unbiased news outlet. Add data analytics skills to the requirements and you are truly looking for the proverbial needle in a field of haystacks.
Organizations with successful Hadoop implementations gave up on the unicorn search a long time ago, and rightfully so. Instead, they have put together analytics teams in which a Hadoop expert (or several) plays only one part.
Unfortunately, finding candidates with the right skill set is only the beginning of your problems. Because these people are rare and they know it, their salary requirements will be high and they may not stay long. Even if you bring someone on at the beginning of their career, once they have gained enough experience with the company's data environment they become prime targets for recruiters and tend to leave for greener pastures.
Often the solution to the skills gap is to use consultants, who have typically seen multiple approaches to common use cases and tend to stay around as long as they are being paid. If you find a firm that is as good at teaching and mentoring as it is at implementation, the benefits can long outlast the engagement itself, because the consultants will have helped your organization develop its own analytic competency and infrastructure know-how.
Pulling It Together
If there is a secret recipe to success with big data initiatives, it would have to include the following:
Have a worthwhile but achievable business goal in mind for the first use case
Build each success upon the last
Find a partner that can help not only with implementation but also with education
Don't expect one person to fill too many roles
Now, while this article is mainly aimed at Hadoop projects that failed and need to be revived, almost all of the points here can serve as best practices for new deployments as well. If you are just starting out, consider it a cautionary tale and proceed accordingly.