Synopsis: An overview of Big Data technologies like MapReduce, Hive, and Mahout, with occasional code examples.
Difficulty: Beginner. No, Intermediate. Maybe.
Part of Addison-Wesley’s new Data & Analytics Series, Michael Manoochehri’s “Data Just Right” covers a lot of ground for such a short book, but I’m not sure that it covers it very well.
I guess the problem is that I can’t figure out who the book is for. Some of the simple explanations of certain technologies lead me to believe it’s for beginners, but then a few pages over you’ll find code for implementing a MapReduce function. Which raises the question: if you just taught me what Hadoop is and now we’re using MapReduce, how did we skip the step where I set everything up?
Then Manoochehri uses the now well-worn example of spam classification to explain Bayesian probability (is this really the only thing that Bayes is good for?) and the Netflix competition example for recommendation systems. Note to future book writers: these have been done to death. Move on.
But my biggest peeve about the book is that we are treated to a total of nine pages on machine learning and Mahout. Really? Entire books have been written about each of those, but you’re going to give us nine pages?
I guess this book is aimed at people who have experience in, say, database management and want to know about dealing with large datasets. Or maybe it’s for web programmers (or others with a technical background) considering a career change. At any rate it’s hard for me to recommend this book, because I don’t know who to recommend it to. So I won’t.