Tuesday, 24 March 2015

Frequently used Hadoop administration commands

In this post, I will explain some of the commonly used HDFS shell commands. Hadoop is installed in the "/usr/local/hadoop" folder on this server.

Navigate to the folder "/usr/local/hadoop/bin". It contains the hadoop binary, which is used in all the commands from here on.

Command 1: Check which version of Hadoop is installed.
hadoop version

Command 2: List the contents of the root directory in HDFS.

Sample output

[root@MANINMANOJ]#hadoop fs -ls  /
Found 3 items
drwxr-xr-x   - root supergroup          2015-03-16 23:37 /data
drwxr-xr-x   - root supergroup          2015-03-16 23:37 /user
drwxr-xr-x   - root supergroup          2015-03-16 23:11 /usr
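
The listing format above is easy to filter with standard tools. As a minimal sketch (the function name hdfs_dirs_only is my own, and it assumes paths contain no spaces), the following keeps only the directory entries:

```shell
# Print only the directory paths from `hadoop fs -ls` output read on stdin.
# Directory entries start with "d" in the permissions column;
# the path is taken from the last field (assumes no spaces in paths).
hdfs_dirs_only() {
    awk '$1 ~ /^d/ { print $NF }'
}

# Usage on a live cluster (not run here):
#   hadoop fs -ls / | hdfs_dirs_only
```

The "Found N items" header and regular files are skipped because their first field does not start with "d".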

Command 3: Count the number of directories, files and bytes under the paths that match the specified file pattern.

Sample Output:
[root@MANINMANOJ]#hadoop fs -count hdfs:/
          11            1                  4 hdfs://
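
The columns printed by '-count' are, in order, the directory count, the file count, the content size in bytes and the path. A small sketch that labels them (hdfs_count_labels is a hypothetical helper name, not part of Hadoop):

```shell
# Label the four columns of `hadoop fs -count` output read on stdin:
# directory count, file count, content size in bytes, path.
hdfs_count_labels() {
    awk 'NF >= 4 { printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Usage on a live cluster (not run here):
#   hadoop fs -count / | hdfs_count_labels
```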

Command 4: Run a DFS filesystem checking utility

Sample Output:
[root@MANINMANOJ]#hadoop fsck /
FSCK started by root from / for path / at Tue Mar 17 23:43:51 IST 2015
/usr/local/hadoop/tmp/mapred/system/jobtracker.info:  Under replicated blk_-9110710984033906000_1001. Target Replicas is 3 but found 1 replica(s).
 Total size:    4 B
 Total dirs:    11
 Total files:   1
 Total blocks (validated):      1 (avg. block size 4 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (100.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              2 (200.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Tue Mar 17 23:43:51 IST 2015 in 3 milliseconds

The filesystem under path '/' is HEALTHY
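
For monitoring, the fsck summary can be reduced to a couple of values. The sketch below assumes the report has the same layout as the sample output above; both function names are my own:

```shell
# Extract the under-replicated block count from an fsck report on stdin.
# Matches the summary line "Under-replicated blocks:   N (P %)".
fsck_under_replicated() {
    awk -F: '/Under-replicated blocks/ { split($2, a, " "); print a[1] }'
}

# Succeed (exit status 0) only if the report ends with the HEALTHY verdict.
fsck_is_healthy() {
    grep -q "is HEALTHY"
}

# Usage on a live cluster (not run here):
#   hadoop fsck / | fsck_under_replicated
#   hadoop fsck / | fsck_is_healthy && echo "ok"
```

Note that a filesystem can be reported as HEALTHY while still having under-replicated blocks, as in the single-node output above, so the two checks are worth running separately.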

Command 5: Run a cluster balancing utility
[root@MANINMANOJ]#hadoop balancer
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
15/03/17 23:46:58 INFO net.NetworkTopology: Adding a new node: /default-rack/
15/03/17 23:46:58 INFO balancer.Balancer: 0 over utilized nodes:
15/03/17 23:46:58 INFO balancer.Balancer: 1 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 1.136 seconds

Command 6: I am currently logged in as the user root, so I have a home directory "/user/root" in HDFS. The command below creates a new directory "new" in this location.
[root@MANINMANOJ]#hadoop fs -mkdir /user/root/new

[root@MANINMANOJ]#hadoop fs -ls /user/root/
Found 1 item
drwxr-xr-x   - root supergroup          2015-03-17 23:51 /user/root/new

Command 7: Add a sample text file named "test.txt" from the local directory "/sample" to the new directory created in the previous step:
[root@MANINMANOJ]#hadoop fs -put /sample/test.txt /user/root/new
[root@MANINMANOJ]#hadoop fs -ls /user/root/new
Found 1 items
-rw-r--r--   3 root supergroup           2015-03-18 00:06 /user/root/new/test.txt

Command 8: Add the local directory "/sample" to the directory "/user/root/data/" in HDFS
[root@MANINMANOJ]#hadoop fs -put /sample/ /user/root/data
[root@MANINMANOJ]#hadoop fs -ls /user/root/data/
Found 1 items
drwxr-xr-x   - root supergroup           2015-03-18 00:13 /user/root/data/sample

Command 9: Show the space used by the directory "/user/root/data/"
[root@MANINMANOJ]#hadoop fs -du /user/root/data/
Found 1 items
73          hdfs://
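
When a directory has many entries, '-du' prints one size per line. A minimal sketch for adding them up, assuming the size is the first column as in the output above (hdfs_du_total is a hypothetical name):

```shell
# Sum the sizes (first column, in bytes) of `hadoop fs -du` output on stdin.
# Lines whose first field is not a plain number, such as the
# "Found N items" header, are ignored.
hdfs_du_total() {
    awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Usage on a live cluster (not run here):
#   hadoop fs -du /user/root/data/ | hdfs_du_total
```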

Command 10: Delete the file "test.txt" from the HDFS file system.
[root@MANINMANOJ]#hadoop fs -rm /user/root/data/sample/test.txt
Deleted hdfs://

Command 11:  Remove the entire sample directory and all of its contents in HDFS.
[root@MANINMANOJ]#hadoop fs -rmr  /user/root/data/sample/
Deleted hdfs://
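
Recursive deletes are easy to get wrong, so it can help to wrap them in a guard. The sketch below is only an illustration: safe_rmr is a hypothetical wrapper, and the rule it applies (absolute paths at least two levels deep) is my own assumption about what counts as safe. On newer Hadoop releases '-rmr' is deprecated in favour of '-rm -r'.

```shell
# Recursively remove an HDFS path, but refuse anything that is empty,
# relative, the root directory, or a top-level directory such as /user.
safe_rmr() {
    target=$1
    # Strip trailing slashes so "/user/" and "/user" are treated alike.
    while [ "${target%/}" != "$target" ]; do
        target=${target%/}
    done
    case $target in
        /*/*)
            hadoop fs -rmr "$target"
            ;;
        *)
            echo "refusing to remove '$1'" >&2
            return 1
            ;;
    esac
}

# Usage on a live cluster (not run here):
#   safe_rmr /user/root/data/sample/
```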

Command 12: Add the local file "/var/tmp/testing.txt" to the directory "/user/root/data/sample" in HDFS

[root@MANINMANOJ]#hadoop fs -copyFromLocal /var/tmp/testing.txt /user/root/data/sample
[root@MANINMANOJ]#hadoop fs -ls  /user/root/data/sample
Found 1 items
-rw-r--r--   3 root supergroup        409 2015-03-24 22:58 /user/root/data/sample/testing.txt

Command 13: View the contents of the text file testing.txt, which is present in the "sample" directory in HDFS.

hadoop fs -cat /user/root/data/sample/testing.txt

Command 14: Copy the testing.txt file from the "sample" directory in HDFS to the directory "/home/manoj" on the local file system. Note that the option name is case-sensitive.
hadoop fs -copyToLocal /user/root/data/sample/testing.txt /home/manoj

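Command 15: '-cp' copies files between directories within HDFS. Both the source and the destination are HDFS paths, so "/home/manoj" below refers to a directory in HDFS, not on the local disk.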
hadoop fs -cp /home/manoj/*.txt  /user/root/data/sample/

Command 16: The '-get' command can be used as an alternative to '-copyToLocal'

hadoop fs -get /user/root/data/sample/testing.txt /home/manoj
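
After downloading a file it is sometimes useful to confirm that the local copy matches what is stored in HDFS. A minimal sketch, assuming the file is small enough to stream through '-cat' (hdfs_matches_local is a hypothetical helper, and cksum is the standard POSIX checksum tool):

```shell
# Compare an HDFS file with a local file by checksum.
# $1 = HDFS path, $2 = local path. Exit status 0 means the contents match.
hdfs_matches_local() {
    # cksum reading from stdin prints "checksum size" with no file name,
    # so the two strings can be compared directly.
    hdfs_sum=$(hadoop fs -cat "$1" | cksum)
    local_sum=$(cksum < "$2")
    [ "$hdfs_sum" = "$local_sum" ]
}

# Usage on a live cluster (not run here):
#   hdfs_matches_local /user/root/data/sample/testing.txt /home/manoj/testing.txt \
#       && echo "copies match"
```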

Command 17: Display the last kilobyte of the file "testing.txt" on stdout.
hadoop fs -tail  /user/root/data/sample/testing.txt
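
'-tail' works on bytes (the last kilobyte of the file) rather than on a number of lines. If a fixed number of lines is needed, one option is to stream the file with '-cat' and let the local tail do the counting. A sketch, with hdfs_last_lines as a hypothetical name; it reads the whole file, so it is only sensible for small files:

```shell
# Print the last N lines (default 10) of an HDFS file.
# $1 = HDFS path, $2 = optional line count.
hdfs_last_lines() {
    hadoop fs -cat "$1" | tail -n "${2:-10}"
}

# Usage on a live cluster (not run here):
#   hdfs_last_lines /user/root/data/sample/testing.txt 10
```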

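Command 18: New files in HDFS are created with mode 666 filtered by the configured umask (022 by default), which gives the 644 (-rw-r--r--) seen in the listings above. Use the '-chmod' command to change the permissions of a file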
hadoop fs -chmod 600 /user/root/data/sample/testing.txt
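
The permissions a new file or directory ends up with are the base mode with the umask bits cleared. The arithmetic can be sketched as follows; effective_mode is a hypothetical helper, and the 022 umask is the usual default, which your cluster configuration may override:

```shell
# Compute the effective octal mode from a base mode and a umask.
# $1 = base mode (666 for files, 777 for directories), $2 = umask.
effective_mode() {
    # The leading 0 makes the shell read both values as octal.
    printf '%03o\n' "$(( 0$1 & ~0$2 ))"
}

effective_mode 666 022   # files
effective_mode 777 022   # directories
```

With a umask of 022 this gives 644 for files and 755 for directories, which matches the -rw-r--r-- and drwxr-xr-x entries in the listings above.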

Command 19: The owner and group of a file can be changed using the '-chown' command:

hadoop fs -chown root:root /user/root/data/sample/testing.txt

Command 20: Move a file from one location to another, renaming it at the same time. The same command also moves directories.
hadoop fs -mv  /user/root/data/sample1/testing.txt /user/root/data/sample2/testing2.txt

Command 21: The default replication factor for a file is 3. Use the '-setrep' command to change the replication factor of a file; the -w flag makes the command wait until the replication has completed.

hadoop fs -setrep -w 2   /user/root/data/sample/testing.txt
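
The replication factor of each file is shown in the second column of '-ls' output (directories show "-" there), so the effect of '-setrep' can be checked from a listing. A minimal sketch, assuming paths without spaces; hdfs_replication is a hypothetical name:

```shell
# Print "replication path" for every regular file in `hadoop fs -ls`
# output read on stdin. Regular files start with "-" in the first column.
hdfs_replication() {
    awk '$1 ~ /^-/ { print $2, $NF }'
}

# Usage on a live cluster (not run here):
#   hadoop fs -ls /user/root/data/sample | hdfs_replication
```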

Command 22: Copy a directory from one cluster to another. Use

1) the 'distcp' command to copy (it is run as 'hadoop distcp', not through 'hadoop fs'),
2) the -overwrite option to overwrite existing files at the destination,
3) the -update option to copy only the files that are missing or different at the destination

hadoop distcp hdfs://namenode1/apache_hadoop  hdfs://namenode2/hadoop
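
Because distcp launches a MapReduce job, it can be worth printing the exact command for review before running it. The sketch below only builds the command string for the three modes listed above; distcp_cmd and its mode names are my own invention, while -update and -overwrite are the distcp options themselves:

```shell
# Print the distcp command line for a given mode without running it.
# $1 = copy | update | overwrite, $2 = source URI, $3 = destination URI.
distcp_cmd() {
    mode=$1
    src=$2
    dst=$3
    case $mode in
        copy)      echo "hadoop distcp $src $dst" ;;
        update)    echo "hadoop distcp -update $src $dst" ;;
        overwrite) echo "hadoop distcp -overwrite $src $dst" ;;
        *)         echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

distcp_cmd update hdfs://namenode1/apache_hadoop hdfs://namenode2/hadoop
```

Once the printed line looks right, it can be pasted into the shell and run as is.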

Command 23: List all the Hadoop file system shell commands (running 'hadoop fs' with no arguments prints the usage summary)
hadoop fs

Command 24: Get detailed help for the file system shell commands
hadoop fs -help

