Hadoop: Operations

MapReduce, like Hadoop itself, is a relatively new topic, so every book on it gets inspected carefully. “Hadoop: The Definitive Guide” is regarded as THE Hadoop book, and it is very thorough, describing every single detail of Hadoop, HDFS and their APIs. “Hadoop Operations” by Eric Sammer seems to be a much better choice for administrators, but I also found it useful as a brief introduction to MapReduce for complete newbies.

The content is divided into 11 chapters, which makes each chapter about 30 pages long on average. Thanks to that, the book is much more pleasant to read, as each chapter is a bite-sized chunk. Each chapter covers a completely different topic, offering not just a plain retelling of the Hadoop documentation (which is widely known to be somewhat incomplete) or of “The Definitive Guide”, but also a solid piece of advice.

Three hundred pages may not seem enough for such a broad topic, but believe me, it’s more than sufficient. As mentioned, each of the chapters is dedicated to a different aspect of Hadoop cluster administration. Even though they seem short, they are packed with information, which also makes “Hadoop Operations” useful as an administrator’s reference.

People interested in developing MapReduce jobs or any other tasks will find the book of little use, or a mere teaser at best. Except for one chapter, which describes the high-level concept of MapReduce, it does not provide any hints or instructions on using a Hadoop cluster as a tool.

When talking about installing and configuring a Hadoop cluster, it is difficult to omit projects from the Hadoop ecosystem. While I can understand why projects like Hive or Pig were not mentioned, the lack of HBase (which is very often installed along with HDFS and MapReduce) is a large oversight. The greatest omission, though, is Apache ZooKeeper, which assists in configuring and maintaining Hadoop clusters.

“Hadoop Operations” does well when it comes to being up to date with Hadoop development. It describes the new API and the features that come with the 2.x releases, which are currently in alpha. The features in question are high availability and HDFS federation, and it is really great to be able to read about them.

I believe “Hadoop Operations” is a great choice for anyone responsible for installing, configuring and maintaining a Hadoop cluster. It is also a wonderful companion to “Hadoop: The Definitive Guide”, with lots of hints and practical knowledge. If you are purely a MapReduce developer, you’d better skip over to books concentrating on the algorithms and on Hadoop in use.

Review author: Mateusz Haligowski

Mark: 4.5 / 5