Synopsis: Reference book for working with Hadoop on a daily basis
Hadoop in Practice isn’t so much a tutorial on how to learn Hadoop as it is an at-work reference for how to accomplish common tasks. Holmes does a good job covering the mechanics of moving data in and out (including how to use Sqoop for import and export with MySQL), using Hive and Pig, and even integrating Mahout for predictive analytics. Special kudos for an entire chapter that covers streamlining HDFS using compression.
The book is not without its problems, however. I came across several issues when trying to run the examples, most of them due to version incompatibilities. In retrospect, this is not a problem with the book per se, but a testament to how quickly the Hadoop ecosystem is evolving. Just be sure to consult the following link if you run into problems with the practical exercises:
The other problem is closely related to the first: the book is already dated because it does not cover YARN or Spark. These technologies have quickly gained significant followings and need a place in the second edition of Hadoop in Practice, which I hope is not long in coming.