This guide is not meant to be comprehensive, and I am not trying to list simple, frequently used commands like hadoop fs -ls that are very similar to their Linux shell equivalents; if you need a comprehensive guide you can always check the Hadoop commands manual. Instead, I am listing some more “special” commands that I use frequently to manage a Hadoop cluster (HDFS, MapReduce, ZooKeeper, etc.) over the long term and to detect and fix problems. This is a work-in-progress post – I intend to update it with new commands based on how often I use them on my clusters. Feel free to add a comment with commands you find useful for administrative tasks.
- Set the HDFS replication factor recursively for all existing files
hadoop dfs -setrep -w 1 -R /
- Create a report to check HDFS health
hadoop dfsadmin -report
- Check HDFS file system
hadoop fsck /
- Run the cluster balancer – make sure blocks are distributed in a balanced way across datanodes.
sudo -u hdfs hdfs balancer
- Refresh the node lists after removing or adding a datanode (run both, for HDFS and MapReduce)
hadoop dfsadmin -refreshNodes
sudo -u mapred hadoop mradmin -refreshNodes
- When the Hadoop NameNode enters safe mode (often because disk space is not enough to support your desired replication factor), force it to leave
hadoop dfsadmin -safemode leave
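Before forcing the NameNode out of safe mode, you can check its current state with hadoop dfsadmin -safemode get, which prints "Safe mode is ON" or "Safe mode is OFF". A minimal sketch of a scripted check, using a hard-coded sample reply so it runs without a live cluster:

```shell
# Sample reply; in practice this would come from:
#   status=$(hadoop dfsadmin -safemode get)
status='Safe mode is ON'

# Branch on the reported state:
case "$status" in
  *ON*)  msg="NameNode is in safe mode" ;;
  *OFF*) msg="NameNode is operating normally" ;;
esac
echo "$msg"
```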
- Display the datanodes that store a particular file with name “filename”.
hadoop fsck /file-path/filename -files -locations -blocks
FSCK started by root (auth:SIMPLE) from /10.2.31.174 for path /spark-1.2.1-bin-2.3.0-mr1-cdh5.1.2.tgz at Wed May 27 17:31:15 PDT 2015
/spark-1.2.1-bin-2.3.0-mr1-cdh5.1.2.tgz 186192499 bytes, 2 block(s): OK
0. BP-323016323-10.2.31.174-1424856295335:blk_1073799439_60141 len=134217728 repl=3 [ip1:50010, ip2:50010, ip3:50010]
1. BP-323016323-10.2.31.174-1424856295335:blk_1073799440_60142 len=51974771 repl=3 [ip1:50010, ip2:50010, ip3:50010]
Status: HEALTHY
 Total size: 186192499 B
 Total dirs: 0
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 2 (avg. block size 93096249 B)
 Minimally replicated blocks: 2 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 6
 Number of racks: 1
FSCK ended at Wed May 27 17:31:15 PDT 2015 in 1 milliseconds
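On a large cluster it is handy to pull specific counters out of the fsck summary rather than read it by eye. A minimal sketch that extracts the corrupt-block count from a saved report (the sample lines are taken from the summary above):

```shell
# Save a few summary lines from the fsck report to a file
# (in practice: hadoop fsck / > /tmp/fsck-summary.txt):
cat > /tmp/fsck-summary.txt <<'EOF'
Under-replicated blocks: 0 (0.0 %)
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
EOF

# Extract the corrupt-block count; anything non-zero deserves attention:
corrupt=$(awk -F': *' '/Corrupt blocks/ {print $2}' /tmp/fsck-summary.txt)
echo "corrupt blocks: $corrupt"
```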
- List active map-reduce jobs
hadoop job -list
- Kill a job – use the JobId reported by hadoop job -list
hadoop job -kill job_id
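To kill several jobs at once, you can script against the output of hadoop job -list and feed each JobId to hadoop job -kill. The exact listing format varies by Hadoop version, so the sample below is only an approximation of an MR1 listing, and the job ID in it is hypothetical:

```shell
# Approximate 'hadoop job -list' output (format varies by version),
# saved to a file so the extraction can run without a cluster:
cat > /tmp/job-list.txt <<'EOF'
1 jobs currently running
JobId   State   StartTime   UserName   Priority   SchedulingInfo
job_201505271731_0001   1   1432772000000   root   NORMAL   NA
EOF

# Job IDs always start with "job_"; each extracted ID could then be
# passed to: hadoop job -kill <job-id>
ids=$(awk '/^job_/ {print $1}' /tmp/job-list.txt)
echo "$ids"
```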
- Get each jobtracker's HA service state (active or standby)
sudo -u mapred hadoop mrhaadmin -getServiceState jt1
– where jt1 is the ID of one of the jobtrackers as configured in your mapred-site.xml file.
- Initialize the High Availability state on zookeeper
hdfs zkfc -formatZK
- Check mode of each zookeeper server:
echo srvr | nc localhost 2181 | grep Mode
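The srvr reply contains several lines besides Mode, so grep-style extraction is what makes the one-liner above work. A minimal sketch of pulling the mode out of a reply, using a hard-coded abbreviated sample (a real check would pipe echo srvr | nc host 2181 instead):

```shell
# Abbreviated sample 'srvr' reply; in practice it comes from:
#   reply=$(echo srvr | nc <zookeeper-host> 2181)
reply='Zookeeper version: 3.4.5
Latency min/avg/max: 0/0/0
Mode: leader
Node count: 4'

# Keep only the value of the "Mode:" line (leader, follower, or standalone):
mode=$(printf '%s\n' "$reply" | awk -F': ' '/^Mode/ {print $2}')
echo "mode: $mode"
```

Looping this over every server in the ensemble quickly shows which node is the leader and whether any server is unreachable.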