Tuesday, September 15, 2015

Sunday, June 7, 2015

Apache Mesos Essentials is now available

I am extremely excited to announce that my book Apache Mesos Essentials is now available. Here are the links:

Packt - https://www.packtpub.com/big-data-and-business-intelligence/apache-mesos-essentials

O'Reilly - http://shop.oreilly.com/product/9781783288762.do

Amazon - http://www.amazon.com/Apache-Mesos-Essentials-Dharmesh-Kakadia/dp/1783288760

If you're curious what's in the book, here is the table of contents:

Sunday, April 26, 2015

LSPE Presentation on Mesos

Slide deck from my presentation on Apache Mesos at the Large Scale Production Engineering India (LSPE-In) Meetup, which took place in June last year. I just noticed that this post somehow sat in my drafts folder for almost a year.

https://speakerdeck.com/dharmeshkakadia/managing-resources-at-scale-with-apache-mesos

Friday, August 15, 2014

Compiling Hive for a non-release Hadoop version


We have been working on many interesting things around Perforator, such as extending the core model to other systems like Hive and Tez. At MSR, we developed on hadoop-yarn trunk and deployed it with the version name 3.0.0-SNAPSHOT, and for a lot of other reasons we can't just rename the version. I have struggled a bit over the last few days to run Hive on top of a non-release version like ours, and this post describes my solution.

While the Hive documentation has steps for compiling from source, I could not find any documentation on how to compile for a Hadoop version not yet "integrated" with Hive. I was hoping to find at least some information on this on the Hive developer page but, surprisingly, there was nothing there. Looking at the pom.xml, my first impression was that just changing the 0.23-hadoop-version in the pom.xml would do the trick. It turns out, however, that on startup Hive calls the hadoop version command to decide which shims to load, and therefore fails with an unrecognized hadoop version error.

The important piece of information in that error is that ShimLoader.java is the culprit: before loading the shims for a given Hadoop version, it does a sanity check that the Hadoop version number is valid. So you need to make sure that this check passes. I know this check is important, but if you are in a situation like mine, just go ahead and edit the file.

 $ vim shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java
Then add a case statement for your major version number. For 3.0.0-SNAPSHOT, I added "case 3:".
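
As a rough illustration of the kind of change involved, here is a minimal, self-contained sketch. It is not the actual ShimLoader.java source: the class name ShimVersionSketch, the method name majorVersionToShim, and the shim names "0.20S" and "0.23" are assumptions made for this example. The idea is simply that the major version number is parsed from the version string and mapped to a shim, and that an unknown major version is rejected unless you add a case for it.

```java
public class ShimVersionSketch {

    // Hypothetical stand-in for the version check; names and return values
    // are assumptions for illustration, not the real Hive code.
    static String majorVersionToShim(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "Illegal Hadoop version: " + hadoopVersion + " (expected A.B.* format)");
        }

        switch (Integer.parseInt(parts[0])) {
            case 1:
                return "0.20S";
            case 2:
            case 3: // added: treat a 3.x trunk build like hadoop-2
                return "0.23";
            default:
                throw new IllegalArgumentException(
                    "Unrecognized Hadoop major version number: " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        // With the extra case in place, a trunk build is accepted.
        System.out.println(majorVersionToShim("3.0.0-SNAPSHOT"));
    }
}
```

Without the extra "case 3:", the same call would fall through to the default branch and throw, which is the failure described above.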

Now you can build Hive with the following command.

 $ mvn clean install -Phadoop-2,dist
After the above command completes successfully, you will find the packaged Hive distribution in the packaging/target/ folder.

Gotchas

  1. Don't use JDK 7, otherwise Hive may fail to compile. See:
    https://issues.apache.org/jira/browse/HIVE-3197
    https://issues.apache.org/jira/browse/HIVE-3384

  2. The contrib module in current Hive trunk is broken by its dependency on the package org.apache.hadoop.record, which was moved to the hadoop-streaming project and then moved back. See:
    https://issues.apache.org/jira/browse/HIVE-7077
    https://issues.apache.org/jira/browse/HADOOP-10474

    If you encounter this, there is a simple workaround: build the contrib module against a version of Hadoop that is not affected by the above change. Instead of making any changes to the pom file, I simply built it with the hadoop-1 profile. This is likely to be fixed in a future version; until then, here are the commands for the workaround:

     $ cd contrib
     $ mvn clean install -Phadoop-1,dist
     $ cd ..
     $ mvn install -Phadoop-2,dist
    
    Note that clean has been removed from the goals in the last command, to avoid recompiling the contrib module with hadoop-2. Since mvn sees that the module is already compiled, it just goes ahead with the rest of Hive.

Sunday, March 23, 2014

Interesting Data/Infrastructure Projects

There are so many things happening in this space that it's not easy to keep track of all the interesting projects. I am trying to compile a list of such projects on GitHub at https://github.com/dharmeshkakadia/Data-Infra-Projects. While it is currently a very minimal and obvious list, I plan to add many significant things to it over time, such as:

  1. Feature/Performance comparison of different projects with similar goals.
  2. Categorize them better.
  3. Links to understanding these projects better.
I am sure that I have missed many interesting projects, so help me complete it.

Sunday, December 15, 2013

Book on Mesos

Update: The book is available.

I will be writing a book on Apache Mesos, focusing on practical aspects as well as internals. The book will be published by Packt Publishing. Here is the rough outline of the book.

I welcome any suggestions and feedback.

Chapter 1: Running Mesos

At the end of this chapter, the reader should be able to run a Mesos cluster. We will introduce the modern data center and the problem Mesos is trying to solve, then move on to installation with different configurations.

List of topics that will be covered in the chapter:
• Modern data centers
• Requirement of a resource manager
• Introducing Mesos
• Mesos vs resource management frameworks
• Running a single node Mesos setup
• Spawn Mesos cluster on Amazon cloud
• Setting up Mesos on a cluster

Chapter 2: Running Spark on Mesos

This chapter covers how to run the Spark framework on Mesos.

List of topics that will be covered in the chapter:
• Spark introduction
• Running Spark locally
• Installing Spark on Mesos
• Configuring specific Mesos and Spark options
• Avoiding common traps

Chapter 3: Running Hadoop on Mesos

This chapter gives a short introduction to Hadoop and covers how to run the Hadoop framework on Mesos.

List of topics that will be covered in the chapter:
• Introduction to Hadoop
• Use cases for running Hadoop on Mesos
• Data locality in Hadoop on Mesos
• Installing Hadoop on Mesos
• Optimizing Hadoop deployment on Mesos

Chapter 4: Complex Data Analysis on Mesos

This chapter explains Storm, a real-time computing framework, and covers how to run it with Mesos.

List of topics that will be covered in the chapter:
• Real time analysis using Storm
• Installing and configuring Storm
• Hypertable: big data processing system
• Running Hypertable on Mesos

Chapter 5: Chronos and Marathon

In this chapter we introduce Chronos and Marathon as important components of a data center operating system.

List of topics that will be covered in the chapter:
• Chronos as a cron for cluster
• Installing Chronos
• Marathon for managing long running services on Mesos
• Installing Marathon

Chapter 6: Understanding Mesos Internals

This chapter gives a deep dive into the workings of Mesos. We will introduce the Mesos architecture as well as the various design choices made by Mesos and their implications.

List of topics that will be covered in the chapter:
• Architecture
• Resource sharing between frameworks
• Offer based scheduling
• Resource isolation
• Fault Tolerance

Chapter 7: Porting an Application Framework to Mesos

This chapter explains how to port an existing framework to Mesos, using the Jenkins plugin as an example.

List of topics that will be covered in the chapter:
• Good candidate frameworks for Mesos
• What does it take to port a framework
• An Example: Jenkins
• Writing a Scheduler
• Writing an Executor
• Testing your framework
• Debugging

Chapter 8: Administering Mesos

This chapter is targeted towards system administrators and devops and will discuss various best practices while running Mesos clusters.

List of topics that will be covered in the chapter:
• Hardware considerations
• Automating Cluster Management
• High-availability Considerations
• Logging and Monitoring a Mesos cluster
• Recovery in Mesos
• Locating and correcting problems

Update: I am happy to share that I have finished all the chapter drafts and am now acting on reviewers' feedback. The book is called "Apache Mesos Essentials".


Sunday, November 24, 2013

Copyright © 2014 Dharmesh Kakadia.