Apache Hama on Mesos

This post describes how you can set up Apache Hama to work with Apache Mesos. In another post I also describe how you can set up Apache Hadoop (http://wp.me/p3pTv0-j) and Apache Spark (http://wp.me/p3pTv0-c) to work on Mesos.

*The instructions have been tested with mesos 0.20.0 and Hama 0.7.0. My cluster is a Eucalyptus private cloud (Version 3.4.2) with an Ubuntu 12.04.4 LTS image, but the instructions should work for any cluster running Ubuntu or even different Linux distributions after some small changes.

Prerequisites:

  • I assume you have already set up HDFS CDH 5.1.2 to your cluster. If not, follow my post here
  • I also assume you have already installed Mesos. If not, follow the instructions here

Installation Steps:

  • IMPORTANT: The current git repo has a bug working with CDH5 on Mesos.  If you compile with this version you will get an error similar to the following:
    ERROR bsp.MesosExecutor: Caught exception, committing suicide.
    java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.LocalFileSystem not found
            at java.util.ServiceLoader.fail(ServiceLoader.java:231)
            at java.util.ServiceLoader.access$300(ServiceLoader.java:181)
            at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:365)
            at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
            at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
            at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
            at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:339)
            at org.apache.hama.bsp.GroomServer.deleteLocalFiles(GroomServer.java:483)
            at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:321)
            at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
            at org.apache.hama.bsp.MesosExecutor$1.run(MesosExecutor.java:92)
    
  • SOLUTION: Thanks to the Hama community and in particular Jeff Fenchel the bug was quickly fixed. You can get a patched Hama 0.7.0 version from Jeff’s github repo here  or my forked repo here in case Jeff deletes it in the future. When the patch is merged I will update the links.

So now we are good to go:

  1.  Remove any previous versions on Hama on your HDFS:
    $ hadoop fs -rm /hama.tar.gz
  2. Build Hama for the particular HDFS and Mesos version you are using:
    $ mvn clean install -Phadoop2 -Dhadoop.version=2.3.0-cdh5.1.2 -Dmesos.version=0.20.0 -DskipTests
  3. Put it to HDFS (Careful with the naming of your tar file – Has to be the same with your configuration file)
    $ hadoop fs -put dist/target/hama-0.7.0-SNAPSHOT.tar.gz /hama.tar.gz
  4. Move to the distribution directory:
    $ cd dist/target/hama-0.7.0-SNAPSHOT/hama-0.7.0-SNAPSHOT/
  5. Make sure LD_LIBRARY_PATH or the MESOS_NATIVE_LIBRARY environment variables are set correctly to your mesos installation libraries. This can be under usr/lib/mesos (default) or anywhere you specified when installing mesos. For example:
    $ export LD_LIBRARY_PATH=/root/mesos-installation/lib/

    If you don’t set them correctly then you will get an ugly stack trace like this one on your bspmaster logs and the bspmaster won’t start:

    2014-11-08 01:23:50,646 FATAL org.apache.hama.BSPMasterRunner: java.lang.UnsatisfiedLinkError: Expecting an absolute path of the library:
            at java.lang.Runtime.load0(Runtime.java:792)
            at java.lang.System.load(System.java:1062)
            at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:50)
            at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:79)
            at org.apache.mesos.MesosSchedulerDriver.(MesosSchedulerDriver.java:61)
            at org.apache.hama.bsp.MesosScheduler.start(MesosScheduler.java:63)
            at org.apache.hama.bsp.MesosScheduler.init(MesosScheduler.java:46)
            at org.apache.hama.bsp.SimpleTaskScheduler.start(SimpleTaskScheduler.java:273)
            at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:523)
            at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
            at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
            at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
  6. Configure hama with Mesos. I found that the instructions from the Hama wiki  are missing some things. For example it is not described to add the fs.default.name property and without this you will get an error like this on the master when running your code:
    Error reading task output: http://euca-x-x-x-x.eucalyptus.race.cs.ucsb.edu:40015/tasklog?plaintext=true&taskid=attempt_201411051242_0001_000000_0&filter=stdout

    and an ugly stack trace on the Groom Server executor log like this one:

    14/11/05 12:43:08 INFO bsp.GroomServer: Launch 1 tasks.
    14/11/05 12:43:38 WARN bsp.GroomServer: Error initializing attempt_201411051242_0001_000000_0:
    java.io.FileNotFoundException: File file:/mnt/bsp/system/submit_6i82le/job.xml does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1913)
    at org.apache.hama.bsp.GroomServer.localizeJob(GroomServer.java:707)
    at org.apache.hama.bsp.GroomServer.startNewTask(GroomServer.java:562)
    at org.apache.hama.bsp.GroomServer.access$000(GroomServer.java:86)
    at org.apache.hama.bsp.GroomServer$DispatchTasksHandler.handle(GroomServer.java:173)
    at org.apache.hama.bsp.GroomServer$Instructor.run(GroomServer.java:237)14/11/05 12:43:39 INFO bsp.GroomServer: Launch 1 tasks.
    14/11/05 12:43:40 INFO bsp.MesosExecutor: Killing task : Task_0
    14/11/05 12:43:40 INFO ipc.Server: Stopping server on 31000
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 0 on 31000: exiting
    14/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server listener on 31000
    14/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server Responder
    14/11/05 12:43:40 INFO ipc.Server: Stopping server on 50001
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 1 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 3 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 4 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 0 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: IPC Server handler 2 on 50001: exiting
    14/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server listener on 5000114/11/05 12:43:40 INFO ipc.Server: Stopping IPC Server Responder
    14/11/05 12:43:42 WARN mortbay.log: /tasklog: java.io.IOException: Closed
    
    

    To be fair you are able to find all the configuration you need from the Hama configuration instructions but its not everything on one page.

    So, to save you from some of the issues you might encounter configuring Hama, the hama-site.xml configuration in my case looks like this:

    <configuration>
     <property>
     <name>bsp.master.address</name>
     <value>euca-10-2-112-10.eucalyptus.internal:40000</value>
     <description>The address of the bsp master server. Either the
     literal string "local" or a host[:port] (where host is a name or
     IP address) for distributed mode.
     </description>
     </property>
     <property>
     <name>bsp.master.port</name>
     <value>40000</value>
     <description>The port master should bind to.</description>
     </property>
     <property>
     <name>bsp.master.TaskWorkerManager.class</name>
     <value>org.apache.hama.bsp.MesosScheduler</value>
     <description>Instructs the scheduler to use Mesos to execute tasks of each job
     </description>
     </property>
     <property>
     <name>fs.default.name</name>
     <value>hdfs://euca-10-2-112-10.eucalyptus.internal:9000</value>
     <description>
     The name of the default file system. Either the literal string
     "local" or a host:port for HDFS.
     </description>
     </property>
     <property>
     <name>hama.mesos.executor.uri</name>
     <value>hdfs://euca-10-2-112-10.eucalyptus.internal:9000/hama.tar.gz</value>
     <description>This is the URI of the Hama distribution
     </description>
     </property>
    
     <!-- Hama requires one cpu and memory defined by bsp.child.java.opts for each slot.
     This means that a cluster with bsp.tasks.maximum.total set to 2 and bsp.child.jova.opts set to -Xmx1024m
     will need at least 2 cpus and and 2048m of memory. -->
    
     <property>
     <name>bsp.tasks.maximum.total</name>
     <value>2</value>
     <description>This is an override for the total maximum tasks that may be run.
     The default behavior is to determine a value based on the available groom servers.
     However, if using Mesos, the groom servers are not yet allocated.
     So, a value indicating the maximum number of slots available in the cluster is needed.
     </description>
     </property>
    
     <property>
     <name>hama.mesos.master</name>
     <value>zk://euca-10-2-112-10.eucalyptus.internal:2181/mesos</value>
     <description>This is the address of the Mesos master instance.
     If you're using Zookeeper for master election, use the Zookeeper address here (i.e.,zk://zk.apache.org:2181/hadoop/mesos).
     </description>
     </property>
     <property>
     <name>bsp.child.java.opts</name>
     <value>-Xmx1024m</value>
     <description>Java opts for the groom server child processes.
     </description>
     </property>
    
     <property>
     <name>bsp.system.dir</name>
     <value>${hadoop.tmp.dir}/bsp/system</value>
     <description>The shared directory where BSP stores control files.
     </description>
     </property>
     <property>
     <name>bsp.local.dir</name>
     <value>/mnt/bsp/local</value>
     <description>local directory for temporal store.</description>
     </property>
     <property>
     <name>hama.tmp.dir</name>
     <value>/mnt/hama/tmp/hama-${user.name}</value>
     <description>Temporary directory on the local filesystem.</description>
     </property>
     <property>
     <name>bsp.disk.queue.dir</name>
     <value>${hama.tmp.dir}/messages/</value>
     <description>Temporary directory on the local message buffer on disk.</description>
     </property>
     <property>
     <name>hama.zookeeper.quorum</name>
     <value>euca-10-2-112-10.eucalyptus.internal</value>
     <description>Comma separated list of servers in the ZooKeeper Quorum.
     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
     By default this is set to localhost for local and pseudo-distributed modes
     of operation. For a fully-distributed setup, this should be set to a full
     list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
     this is the list of servers which we will start/stop zookeeper on.
     </description>
     </property>
     <property>
     <name>hama.zookeeper.property.clientPort</name>
     <value>2181</value>
     <description>The port to which the zookeeper clients connect
     </description>
     </property>
    
    </configuration>
     
  7. You can also configure hama-env.xml if you want for example to output the logs to a different directory than the default. *Note: You don’t need to ship the configuration to your slaves.
  8. Now we are ready to run bspmaster:
    $ ./bin/hama-daemon.sh start bspmaster
    

    Make sure that the bspmaster started without problems by checking the log files

    $ less hama-root-bspmaster-$HOSTNAME.log 
    $ less hama-root-bspmaster-$HOSTNAME.out
    
  9. If everything went ok you are ready to run some examples:
    $ $ ./bin/hama jar hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 randomgraph 2

    Listing the directories on your HDFS to make sure everything went fine. You should get something like this:

    Found 2 items
    -rw-r--r-- 3 root hadoop 2241 2014-11-07 16:49 /user/root/randomgraph/part-00000
    -rw-r--r-- 3 root hadoop 2243 2014-11-07 16:49 /user/root/randomgraph/part-00001

    These two files are two partitions of a graph with 100 nodes and 1K edges.

  10. $ ./bin/hama jar hama-examples-0.7.0-SNAPSHOT.jar pagerank randomgraph pagerankresult 4

    And again if you list your files in HDFS

    $ hadoop fs -ls /user/root/randomgraph
    

    you should get something like this:

    Found 2 items
    -rw-r--r-- 3 root hadoop 1194 2014-11-05 16:02 /user/root/pagerankresult/part-00000
    -rw-r--r-- 3 root hadoop 1189 2014-11-05 16:02 /user/root/pagerankresult/part-00001

    -You can find more examples to run on the Apache Hama website example’s page here

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s