Apache Hama on Mesos

This post describes how you can set up Apache Hama to work with Apache Mesos. In another post I also describe how you can set up Apache Hadoop (http://wp.me/p3pTv0-j) and Apache Spark (http://wp.me/p3pTv0-c) to work on Mesos.

*The instructions have been tested with mesos 0.20.0 and Hama 0.7.0. My cluster is a Eucalyptus private cloud (Version 3.4.2) with an Ubuntu 12.04.4 LTS image, but the instructions should work for any cluster running Ubuntu or even different Linux distributions after some small changes.


  • I assume you have already set up HDFS CDH 5.1.2 to your cluster. If not, follow my post here
  • I also assume you have already installed Mesos. If not, follow the instructions here

Installation Steps:

  • IMPORTANT: The current git repo has a bug working with CDH5 on Mesos.  If you compile with this version you will get an error similar to the following:
    ERROR bsp.MesosExecutor: Caught exception, committing suicide.
    java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.LocalFileSystem not found
            at java.util.ServiceLoader.fail(ServiceLoader.java:231)
            at java.util.ServiceLoader.access$300(ServiceLoader.java:181)
            at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:365)
            at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
            at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
            at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
            at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:339)
            at org.apache.hama.bsp.GroomServer.deleteLocalFiles(GroomServer.java:483)
            at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:321)
            at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
            at org.apache.hama.bsp.MesosExecutor$1.run(MesosExecutor.java:92)
  • SOLUTION: Thanks to the Hama community and in particular Jeff Fenchel the bug was quickly fixed. You can get a patched Hama 0.7.0 version from Jeff’s github repo here  or my forked repo here in case Jeff deletes it in the future. When the patch is merged I will update the links.

So now we are good to go:

  1.  Remove any previous versions on Hama on your HDFS:
    $ hadoop fs -rm /hama.tar.gz
  2. Build Hama for the particular HDFS and Mesos version you are using:
    $ mvn clean install -Phadoop2 -Dhadoop.version=2.3.0-cdh5.1.2 -Dmesos.version=0.20.0 -DskipTests
  3. Put it to HDFS (Careful with the naming of your tar file – Has to be the same with your configuration file)
    $ hadoop fs -put dist/target/hama-0.7.0-SNAPSHOT.tar.gz /hama.tar.gz
  4. Move to the distribution directory:
    $ cd dist/target/hama-0.7.0-SNAPSHOT/hama-0.7.0-SNAPSHOT/
  5. Make sure LD_LIBRARY_PATH or the MESOS_NATIVE_LIBRARY environment variables are set correctly to your mesos installation libraries. This can be under usr/lib/mesos (default) or anywhere you specified when installing mesos. For example:
    $ export LD_LIBRARY_PATH=/root/mesos-installation/lib/

    If you don’t set them correctly then you will get an ugly stack trace like this one on your bspmaster logs and the bspmaster won’t start:

    2014-11-08 01:23:50,646 FATAL org.apache.hama.BSPMasterRunner: java.lang.UnsatisfiedLinkError: Expecting an absolute path of the library:
            at java.lang.Runtime.load0(Runtime.java:792)
            at java.lang.System.load(System.java:1062)
            at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:50)
            at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:79)
            at org.apache.mesos.MesosSchedulerDriver.(MesosSchedulerDriver.java:61)
            at org.apache.hama.bsp.MesosScheduler.start(MesosScheduler.java:63)
            at org.apache.hama.bsp.MesosScheduler.init(MesosScheduler.java:46)
            at org.apache.hama.bsp.SimpleTaskScheduler.start(SimpleTaskScheduler.java:273)
            at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:523)
            at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
            at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
            at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
  6. Configure hama with Mesos. I found that the instructions from the Hama wiki  are missing some things. For example it is not described to add the fs.default.name property and without this you will get an error like this on the master when running your code:
    Error reading task output: http://euca-x-x-x-x.eucalyptus.race.cs.ucsb.edu:40015/tasklog?plaintext=true&taskid=attempt_201411051242_0001_000000_0&filter=stdout

    and an ugly stack trace on the Groom Server executor log like this one:

    14/11/05 12:43:08 INFO bsp.GroomServer: Launch 1 tasks.
    14/11/05 12:43:38 WARN bsp.GroomServer: Error initializing attempt_201411051242_0001_000000_0:
    java.io.FileNotFoundException: File file:/mnt/bsp/system/submit_6i82le/job.xml does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1913)
    at org.apache.hama.bsp.GroomServer.localizeJob(GroomServer.java:707)
    at org.apache.hama.bsp.GroomServer.startNewTask(GroomServer.java:562)
    at org.apache.hama.bsp.GroomServer.access$000(GroomServer.java:86)
    at org.apache.hama.bsp.GroomServer$DispatchTasksHandler.handle(GroomServer.java:173)
    at org.apache.hama.bsp.GroomServer$Instructor.run(GroomServer.java:237)14/11/05 12:43:39 INFO bsp.GroomServer: Launch 1 tasks.
    14/11/05 12:43:40 INFO bsp.MesosExecutor: Killing task : Task_0
    14/11/05 12:43:40 INFO ipc.Server: Stopping server on 31000
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 0 on 31000: exiting
    14/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server listener on 31000
    14/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server Responder
    14/11/05 12:43:40 INFO ipc.Server: Stopping server on 50001
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 1 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 3 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 4 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 0 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 2 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server listener on 5000114/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server Responder
    14/11/05 12:43:42 WARN mortbay.log: /tasklog: java.io.IOException: Closed

    To be fair you are able to find all the configuration you need from the Hama configuration instructions but its not everything on one page.

    So, to save you from some of the issues you might encounter configuring Hama, the hama-site.xml configuration in my case looks like this:

     <description>The address of the bsp master server. Either the
     literal string "local" or a host[:port] (where host is a name or
     IP address) for distributed mode.
     <description>The port master should bind to.</description>
     <description>Instructs the scheduler to use Mesos to execute tasks of each job
     The name of the default file system. Either the literal string
     "local" or a host:port for HDFS.
     <description>This is the URI of the Hama distribution
     <!-- Hama requires one cpu and memory defined by bsp.child.java.opts for each slot.
     This means that a cluster with bsp.tasks.maximum.total set to 2 and bsp.child.jova.opts set to -Xmx1024m
     will need at least 2 cpus and and 2048m of memory. -->
     <description>This is an override for the total maximum tasks that may be run.
     The default behavior is to determine a value based on the available groom servers.
     However, if using Mesos, the groom servers are not yet allocated.
     So, a value indicating the maximum number of slots available in the cluster is needed.
     <description>This is the address of the Mesos master instance.
     If you're using Zookeeper for master election, use the Zookeeper address here (i.e.,zk://zk.apache.org:2181/hadoop/mesos).
     <description>Java opts for the groom server child processes.
     <description>The shared directory where BSP stores control files.
     <description>local directory for temporal store.</description>
     <description>Temporary directory on the local filesystem.</description>
     <description>Temporary directory on the local message buffer on disk.</description>
     <description>Comma separated list of servers in the ZooKeeper Quorum.
     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
     By default this is set to localhost for local and pseudo-distributed modes
     of operation. For a fully-distributed setup, this should be set to a full
     list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
     this is the list of servers which we will start/stop zookeeper on.
     <description>The port to which the zookeeper clients connect
  7. You can also configure hama-env.xml if you want for example to output the logs to a different directory than the default. *Note: You don’t need to ship the configuration to your slaves.
  8. Now we are ready to run bspmaster:
    $ ./bin/hama-daemon.sh start bspmaster

    Make sure that the bspmaster started without problems by checking the log files

    $ less hama-root-bspmaster-$HOSTNAME.log 
    $ less hama-root-bspmaster-$HOSTNAME.out
  9. If everything went ok you are ready to run some examples:
    $ $ ./bin/hama jar hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 randomgraph 2

    Listing the directories on your HDFS to make sure everything went fine. You should get something like this:

    Found 2 items
    -rw-r--r-- 3 root hadoop 2241 2014-11-07 16:49 /user/root/randomgraph/part-00000
    -rw-r--r-- 3 root hadoop 2243 2014-11-07 16:49 /user/root/randomgraph/part-00001

    These two files are two partitions of a graph with 100 nodes and 1K edges.

  10. $ ./bin/hama jar hama-examples-0.7.0-SNAPSHOT.jar pagerank randomgraph pagerankresult 4

    And again if you list your files in HDFS

    $ hadoop fs -ls /user/root/randomgraph

    you should get something like this:

    Found 2 items
    -rw-r--r-- 3 root hadoop 1194 2014-11-05 16:02 /user/root/pagerankresult/part-00000
    -rw-r--r-- 3 root hadoop 1189 2014-11-05 16:02 /user/root/pagerankresult/part-00001

    -You can find more examples to run on the Apache Hama website example’s page here


Cloudera HDFS CDH5 Installation to use with Mesos

This post is a guide for installing Cloudera HDFS CDH5 on Eucalyptus (Version 3.4.2) in order to use it later with Apache Mesos. The difference of installing HDFS for any kind of use is that we don’t start the Tasktracker on the slave nodes. This is something that Mesos will do each time a hadoop job is running.

The steps you should take are the following:

  • Disable iptables firewall:
    $ ufw disable
  • Disable selinux:
    $ setenforce 0
  • Make sure instances have unique hostnames
  • Make sure the /etc/hosts file on each system has the IP addresses and fully-qualified domain names (FQDN) of all the members of the cluster.
    • hostname –fqdn
    • An example configuration is the following:
      • For the datanode: localhost euca-10-2-24-25.eucalyptus.internal
        x.x.x.x euca-10-2-24-25
      • For the namenode: localhost euca-10-2-85-213.eucalyptus.internal
        y.y.y.y euca-10-2-85-213
      • where x.x.x.x and y.y.y.y are the external IPs of your namenode and datanode respectively.
  • Be careful to create the directories that the data  node and name node are using andwill be set later on the configuration files inside the /etc/hadoop/conf.name directory. Also remember to change ownership tohdfs:hdfs
    • on data node:
      $ mkdir -p /mnt/cloudera-hdfs/1/dfs/dn /mnt/cloudera-hdfs/2/dfs/dn /mnt/cloudera-hdfs/3/dfs/dn /mnt/cloudera-hdfs/4/dfs/dn
      $ chown -R hdfs:hdfs /mnt/cloudera-hdfs/1/dfs/dn /mnt/cloudera-hdfs/2/dfs/dn /mnt/cloudera-hdfs/3/dfs/dn /mnt/cloudera-hdfs/4/dfs/dn
      • Typically each of the /1 /2 /3 /4 directories should be different mounted devices. Though using Eucalyptus volumes to do so might add up some latency.
    • on name node:
      $ mkdir -p /mnt/cloudera-hdfs/1/dfs/nn /nfsmount/dfs/nn
      $ chown -R hdfs:hdfs /mnt/cloudera-hdfs/1/dfs/nn /nfsmount/dfs/nn
      $ chmod 700 /mnt/cloudera-hdfs/1/dfs/nn /nfsmount/dfs/nn
  • Deploy configuration to all nodes in the cluster
  • Set alternatives to each node:
    $ update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.mesos-cluster 50
    $ update-alternatives --set hadoop-conf /etc/hadoop/conf.mesos-cluster
    $ update-alternatives --set hadoop-conf /etc/hadoop/conf.mesos-cluster
  • Add configuration to core-site.xml located under /etc/hadoop/
    • Make sure they have the hostnames – not the IP address of the NameNode
  • Install cloudera CDH5: There are multiple ways to do this. Probably the easiest is the following:
    $ wget http://archive.cloudera.com/cdh5/one-click-install/precise/amd64/cdh5-repository_1.0_all.deb
    $ dpkg -i cdh5-repository_1.0_all.deb
    • To master node:
      $ sudo apt-get update;
      $ sudo apt-get update; sudo apt-get install hadoop-hdfs-namenode
    • To slave nodes:
      $ sudo apt-get update; sudo apt-get install hadoop-hdfs-datanode
      $ sudo apt-get update; sudo apt-get install hadoop-client

      cp /etc/hadoop/conf.empty/log4j.properties /etc/hadoop/conf.name/log4j.properties

  • Upgrade/ Format namenode:
    $ service hadoop-hdfs-namenode upgrade
    $ sudo -u hdfs hdfs namenode -format
  • *If the storage directories change hdfs should be reformatted
  • Start HDFS:
    • On master:
      $ service hadoop-hdfs-namenode start
    • On slave:
      $ service hadoop-hdfs-datanode start
  • Optionally: Configure your cluster to start the services after a system restart
    • On master node:
      $ update-rc.d hadoop-hdfs-namenode defaults
      • If you have also setted up zookeeper then:
        $ update-rc.d zookeeper-server defaults
    • On the slaves:
      $ update-rc.d hadoop-hdfs-datanode defaults