You’re going to need minions for this one. Let me explain…
Hadoop inevitably comes up in every Big Data discussion, even though it isn’t a part of every Big Data solution. Even people who know very little about it have at least heard of it.
Now, while mastering Hadoop can be complicated, having a basic grasp of it is not. Hadoop consists of two main parts: a file system named the Hadoop Distributed File System (HDFS) and something called MapReduce. There’s not much to say about HDFS other than the fact that it’s a distributed file system, so let’s discuss MapReduce instead.
As it turns out, understanding MapReduce from a high level is ridiculously easy. The two parts are Map (breaking a big task into smaller tasks and distributing them to individual servers in a cluster) and Reduce (reassembling the results of those smaller tasks back into the completion of the big task). When I explain this I like to think of minions, because, as every evil scientist knows, taking over the world is a big task. You need to break it up into smaller tasks and assign them to your minions.
You assign a big job (like conquering London) to your main henchman, whom we’ll call “JobTracker”. He in turn assigns smaller jobs (like contaminating the water supply with mind-controlling drugs, getting the cooperation of the taxi drivers, kidnapping the mayor) to lower minions whom we’ll call “TaskTrackers”. This is the “Map” part of MapReduce. The TaskTrackers then report back to the JobTracker that they are finished, and he puts everything together and hands you the keys to the city. This is the “Reduce” part of MapReduce.
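If you like seeing ideas in code, the minion workflow can be sketched in a few lines of plain Python. This is just a toy illustration of the Map/Reduce pattern, not the actual Hadoop API — the function names and the word-count job are invented for the example:

```python
from collections import defaultdict

def map_task(document):
    """Each minion (TaskTracker) counts words in its own chunk -- the Map step."""
    counts = defaultdict(int)
    for word in document.split():
        counts[word] += 1
    return counts

def reduce_results(partial_counts):
    """The henchman (JobTracker) merges every minion's report -- the Reduce step."""
    totals = defaultdict(int)
    for counts in partial_counts:
        for word, n in counts.items():
            totals[word] += n
    return dict(totals)

# The big job, broken into smaller tasks and handed out:
documents = ["take over london", "take the mayor", "take over the taxis"]
partials = [map_task(doc) for doc in documents]  # Map: one task per minion
result = reduce_results(partials)                # Reduce: reassemble the results
print(result["take"])  # prints 3
```

In real Hadoop the map tasks run in parallel on different servers and the framework handles shuffling the partial results to the reducers, but the shape of the computation is exactly this.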
So basically MapReduce is ideal for huge jobs (like sorting petabytes of data) that would take a single server way too long. Teamwork makes the dream work, right?
And if you decide to set up a Hadoop cluster, you can paint your servers yellow if you want, but please don’t feed them bananas.