Cloudera HDFS CDH5 Installation to use with Mesos

This post is a guide for installing Cloudera HDFS CDH5 on Eucalyptus (version 3.4.2) so that it can later be used with Apache Mesos. The difference from a general-purpose HDFS installation is that we don’t start the TaskTracker on the slave nodes. Mesos does that each time a Hadoop job runs.

The steps you should take are the following:

  • Disable iptables firewall:
    $ ufw disable
  • Disable selinux:
    $ setenforce 0
  • Make sure instances have unique hostnames
  • Make sure the /etc/hosts file on each system has the IP addresses and fully-qualified domain names (FQDN) of all the members of the cluster.
    • Check the FQDN of each node with:
      $ hostname --fqdn
    • An example configuration is the following:
      • For the datanode: localhost euca-10-2-24-25.eucalyptus.internal
        x.x.x.x euca-10-2-24-25
      • For the namenode: localhost euca-10-2-85-213.eucalyptus.internal
        y.y.y.y euca-10-2-85-213
      • where x.x.x.x and y.y.y.y are the external IPs of your datanode and namenode respectively.
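As a quick sanity check for the /etc/hosts step, here is a minimal sketch that lists cluster hostnames with no entry in a hosts file. The function name missing_hosts is hypothetical, and the hostnames in the usage line are simply the ones from the example above. It only reads the file and changes nothing.

```shell
#!/bin/sh
# Print every given hostname that has no entry in the hosts file.
# Usage: missing_hosts HOSTS_FILE HOSTNAME...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name as a whole field
        # in any column after the IP address.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

Run on each node, it prints nothing when every name is present:
    $ missing_hosts /etc/hosts euca-10-2-24-25 euca-10-2-85-213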
  • Be careful to create the directories that the data node and name node use. They will be set later in the configuration files inside the /etc/hadoop/ directory. Also remember to change their ownership to hdfs:hdfs
    • on data node:
      $ mkdir -p /mnt/cloudera-hdfs/1/dfs/dn /mnt/cloudera-hdfs/2/dfs/dn /mnt/cloudera-hdfs/3/dfs/dn /mnt/cloudera-hdfs/4/dfs/dn
      $ chown -R hdfs:hdfs /mnt/cloudera-hdfs/1/dfs/dn /mnt/cloudera-hdfs/2/dfs/dn /mnt/cloudera-hdfs/3/dfs/dn /mnt/cloudera-hdfs/4/dfs/dn
      • Typically each of the /1, /2, /3 and /4 directories should be on a different mounted device, though using Eucalyptus volumes for this might add some latency.
    • on name node:
      $ mkdir -p /mnt/cloudera-hdfs/1/dfs/nn /nfsmount/dfs/nn
      $ chown -R hdfs:hdfs /mnt/cloudera-hdfs/1/dfs/nn /nfsmount/dfs/nn
      $ chmod 700 /mnt/cloudera-hdfs/1/dfs/nn /nfsmount/dfs/nn
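The directory commands above can also be written as a small loop. The following is only a sketch: make_dfs_dirs is a hypothetical helper, the base path is passed in rather than hard-coded, and the ownership change is skipped unless the script runs as root and an hdfs user already exists (an assumption about when the CDH packages were installed). As in the commands above, only name node directories get mode 700.

```shell
#!/bin/sh
# Create HDFS storage directories under BASE and print each path created.
# Usage: make_dfs_dirs BASE ROLE INDEX...   (ROLE is "dn" or "nn")
make_dfs_dirs() {
    base=$1
    role=$2
    shift 2
    for i in "$@"; do
        dir="$base/$i/dfs/$role"
        mkdir -p "$dir" || return 1
        # Name node metadata directories are restricted to their owner.
        if [ "$role" = "nn" ]; then
            chmod 700 "$dir" || return 1
        fi
        # chown needs root and an existing hdfs user; skip it otherwise.
        if [ "$(id -u)" -eq 0 ] && id hdfs >/dev/null 2>&1; then
            chown -R hdfs:hdfs "$dir" || return 1
        fi
        echo "$dir"
    done
}
```

For the data node layout used in this post, that would be:
    $ make_dfs_dirs /mnt/cloudera-hdfs dn 1 2 3 4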
  • Deploy configuration to all nodes in the cluster
  • Set alternatives on each node:
    $ update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.mesos-cluster 50
    $ update-alternatives --set hadoop-conf /etc/hadoop/conf.mesos-cluster
  • Add your configuration to core-site.xml, located under /etc/hadoop/
    • Make sure the configuration refers to the NameNode by hostname, not by IP address
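To make the configuration step more concrete, here is a hedged sketch that writes a minimal core-site.xml and hdfs-site.xml into a configuration directory. The helper name write_hdfs_conf is hypothetical, port 8020 is assumed to be the NameNode RPC port, and the storage paths are the directories created earlier in this post. A real cluster will most likely need more properties than these.

```shell
#!/bin/sh
# Write minimal core-site.xml and hdfs-site.xml files into CONF_DIR.
# Usage: write_hdfs_conf CONF_DIR NAMENODE_HOSTNAME
write_hdfs_conf() {
    conf_dir=$1
    namenode=$2
    mkdir -p "$conf_dir" || return 1

    # fs.defaultFS points clients at the NameNode by hostname, not by IP.
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- Assumption: the NameNode listens on port 8020 -->
    <value>hdfs://$namenode:8020</value>
  </property>
</configuration>
EOF

    # Storage locations match the directories created for this cluster.
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///mnt/cloudera-hdfs/1/dfs/nn,file:///nfsmount/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///mnt/cloudera-hdfs/1/dfs/dn,file:///mnt/cloudera-hdfs/2/dfs/dn,file:///mnt/cloudera-hdfs/3/dfs/dn,file:///mnt/cloudera-hdfs/4/dfs/dn</value>
  </property>
</configuration>
EOF
}
```

With the example namenode from the hosts file section, a call could look like this:
    $ write_hdfs_conf /etc/hadoop/conf.mesos-cluster euca-10-2-85-213.eucalyptus.internal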
  • Install Cloudera CDH5: there are multiple ways to do this. Probably the easiest is the following:
    $ wget
    $ dpkg -i cdh5-repository_1.0_all.deb
    • On the master node:
      $ sudo apt-get update; sudo apt-get install hadoop-hdfs-namenode
    • On the slave nodes:
      $ sudo apt-get update; sudo apt-get install hadoop-hdfs-datanode
      $ sudo apt-get install hadoop-client

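  • Create the cluster configuration directory by copying the default one (it must exist before you run update-alternatives):
    $ cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.mesos-cluster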

  • Upgrade or format the namenode:
    $ service hadoop-hdfs-namenode upgrade
    $ sudo -u hdfs hdfs namenode -format
  • Note: if the storage directories change, HDFS should be reformatted
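Because formatting wipes existing metadata, it can help to check first whether the name node has already been formatted. The sketch below relies on the fact that a formatted name directory contains a current/VERSION file. The function name namenode_is_formatted is hypothetical, and since the directories are mode 700 the check is assumed to run as root or as the hdfs user.

```shell
#!/bin/sh
# Succeed (exit status 0) if any of the given NameNode storage directories
# already holds formatted metadata, i.e. contains current/VERSION.
# Usage: namenode_is_formatted DIR...
namenode_is_formatted() {
    for dir in "$@"; do
        if [ -f "$dir/current/VERSION" ]; then
            return 0
        fi
    done
    return 1
}
```

One way to use it is to format only when nothing is there yet:
    $ namenode_is_formatted /mnt/cloudera-hdfs/1/dfs/nn /nfsmount/dfs/nn || sudo -u hdfs hdfs namenode -format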
  • Start HDFS:
    • On master:
      $ service hadoop-hdfs-namenode start
    • On slave:
      $ service hadoop-hdfs-datanode start
  • Optionally: Configure your cluster to start the services after a system restart
    • On master node:
      $ update-rc.d hadoop-hdfs-namenode defaults
      • If you have also set up ZooKeeper, then:
        $ update-rc.d zookeeper-server defaults
    • On the slaves:
      $ update-rc.d hadoop-hdfs-datanode defaults
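The "Deploy configuration to all nodes in the cluster" step is not spelled out above. One possible approach, sketched here under assumptions, is to copy the configuration directory to every node with scp. The helper name deploy_conf and the RUN switch are hypothetical, and the sketch assumes SSH access to each node as a user allowed to write under /etc/hadoop/. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print the scp commands that copy CONF_DIR to /etc/hadoop/ on each node,
# or actually run them when RUN=1 is set in the environment.
# Usage: deploy_conf CONF_DIR NODE...
deploy_conf() {
    conf_dir=$1
    shift
    for node in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            scp -r "$conf_dir" "$node:/etc/hadoop/" || return 1
        else
            echo "scp -r $conf_dir $node:/etc/hadoop/"
        fi
    done
}
```

A dry run for the two example nodes would be:
    $ deploy_conf /etc/hadoop/conf.mesos-cluster euca-10-2-24-25 euca-10-2-85-213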

7 thoughts on “Cloudera HDFS CDH5 Installation to use with Mesos”

  1. Pingback: Spark on Mesos Installation Guide | Strat0sphere

  2. Pingback: Apache Hama on Mesos | Strat0sphere

  3. Whenever I run this command:
    $ update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.mesos-cluster 50
    It gives me an error of:
    update-alternatives: error: alternative path /etc/hadoop/conf.mesos-cluster doesn’t exist

    • Hi slomka, sorry for the late reply. I guess you have found out by now that you need to create the configuration directory for your cluster before running the update-alternatives command. The “conf.mesos-cluster” name is specific to my configuration. Please change the names as needed to match your cluster configuration.

  4. Hi,

    I am trying to install HDFS, Mesos and Hadoop. I started with the guide above to install and configure HDFS, and I have completed the steps up to the creation of the directories on the name node and data node. The next step says to deploy the configuration to all nodes in the cluster. Can you please provide the steps to deploy the configuration to all nodes in the cluster?
