Tuesday 24 March 2015

Frequently used Hadoop administration commands

In this post, I will explain some commonly used HDFS shell commands. Hadoop is installed in the "/usr/local/hadoop" folder on this server.

Navigate to the folder "/usr/local/hadoop/bin". It contains the "hadoop" binary, which is used in all the commands from here on.
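
Instead of changing into this folder every time, you can put it on your PATH. A minimal sketch, assuming the install location above (hadoop_bin is just an illustrative variable name, not a Hadoop setting). Add the lines to ~/.bashrc or a similar startup file to make the change permanent:

```shell
# Assumption: Hadoop is installed in /usr/local/hadoop, as on this server.
# hadoop_bin is an illustrative variable name, not a Hadoop setting.
hadoop_bin=/usr/local/hadoop/bin

# Append the bin folder to PATH only if it is not there already,
# so that "hadoop" can be run from any working directory.
case ":$PATH:" in
  *":$hadoop_bin:"*) ;;
  *) PATH="$PATH:$hadoop_bin" ;;
esac
export PATH
```

After this, the commands below can be typed from any directory.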

Command 1: To check which version of Hadoop is installed.
--------
hadoop version
--------

Command 2: List the contents of the root directory in HDFS.

Sample output

--------
[root@MANINMANOJ]#hadoop fs -ls  /
Found 3 items
drwxr-xr-x   - root supergroup          2015-03-16 23:37 /data
drwxr-xr-x   - root supergroup          2015-03-16 23:37 /user
drwxr-xr-x   - root supergroup          2015-03-16 23:11 /usr
---------
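
The listing mixes directories and files. If you only want the directory paths (for example, to loop over them in a script), a small awk filter on the first character of the permissions column is enough. This is a sketch that assumes the column layout shown above; hdfs_dirs_only is a hypothetical helper name, and paths containing spaces would be cut short:

```shell
# Print only the directory paths from "hadoop fs -ls" output read on stdin.
# Directory entries start with "d"; the path is the last field.
# Limitation: a path containing spaces would be truncated to its last word.
hdfs_dirs_only() {
  awk '/^d/ { print $NF }'
}

# Usage (needs a running cluster):
#   hadoop fs -ls / | hdfs_dirs_only
```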

Command 3: Count the number of directories, files and bytes under the paths that match the specified file pattern.

Sample Output:
------
[root@MANINMANOJ]#hadoop fs -count hdfs:/
          11            1                  4 hdfs://192.168.150.210:10001/
-------
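
The columns of '-count' output are, in order, the directory count, the file count, the total content size in bytes, and the path, so the sample above reports 11 directories, 1 file and 4 bytes. A sketch that labels the columns; label_count_output is a hypothetical helper name:

```shell
# Label the four columns of "hadoop fs -count" output read on stdin:
# DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
label_count_output() {
  awk '{ printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Usage (needs a running cluster):
#   hadoop fs -count hdfs:/ | label_count_output
```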

Command 4: Run the HDFS filesystem checking utility (fsck).

Sample Output:
-----
[root@MANINMANOJ]#hadoop fsck /
FSCK started by root from /192.168.150.210 for path / at Tue Mar 17 23:43:51 IST 2015
.
/usr/local/hadoop/tmp/mapred/system/jobtracker.info:  Under replicated blk_-9110710984033906000_1001. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
 Total size:    4 B
 Total dirs:    11
 Total files:   1
 Total blocks (validated):      1 (avg. block size 4 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (100.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              2 (200.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Tue Mar 17 23:43:51 IST 2015 in 3 milliseconds


The filesystem under path '/' is HEALTHY
[root@MANINMANOJ]#
-----------------
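
On this single-node cluster the one block is under-replicated simply because the default replication factor is 3 but there is only one data node. Since fsck ends with a one-line verdict, its output is easy to check from a script (for example from cron). A minimal sketch, assuming the output format shown above; check_fsck_output is a hypothetical helper that prints the under-replicated block count and exits non-zero unless the verdict is HEALTHY:

```shell
# Read "hadoop fsck" output on stdin, print the under-replicated block
# count, and exit 0 only if the final verdict line ends in "is HEALTHY".
check_fsck_output() {
  awk '
    /Under-replicated blocks:/ { print "under_replicated=" $3 }
    /is HEALTHY$/              { healthy = 1 }
    END                        { exit (healthy ? 0 : 1) }
  '
}

# Usage (needs a running cluster):
#   hadoop fsck / | check_fsck_output || echo "HDFS needs attention" >&2
```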

Command 5: Run a cluster balancing utility
------
[root@MANINMANOJ]#hadoop balancer
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
15/03/17 23:46:58 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.150.210:50010
15/03/17 23:46:58 INFO balancer.Balancer: 0 over utilized nodes:
15/03/17 23:46:58 INFO balancer.Balancer: 1 under utilized nodes:  192.168.150.210:50010
The cluster is balanced. Exiting...
Balancing took 1.136 seconds
-------


Command 6: I am logged in as user root, so my home directory in HDFS is "/user/root". Create a new directory "new" in this location with the command below.
----------
[root@MANINMANOJ]#hadoop fs -mkdir /user/root/new

[root@MANINMANOJ]#hadoop fs -ls /user/root/
Found 1 item
drwxr-xr-x   - root supergroup          2015-03-17 23:51 /user/root/new
[root@MANINMANOJ]#
-----------
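
When a script may run more than once, it helps to create the directory only if it is missing. A sketch using '-test -d', which exits with status 0 when the path exists and is a directory. ensure_hdfs_dir and the HADOOP variable are hypothetical: HADOOP defaults to "hadoop" and exists only so the function can be dry-run with a stand-in command; Hadoop itself does not read it.

```shell
# Create an HDFS directory only if it does not already exist.
# HADOOP is an illustrative override (defaults to "hadoop") for dry runs;
# it is not a variable that Hadoop itself uses.
ensure_hdfs_dir() (
  h=${HADOOP:-hadoop}
  # "fs -test -d" exits 0 when the path exists and is a directory.
  "$h" fs -test -d "$1" || "$h" fs -mkdir "$1"
)

# Usage:
#   ensure_hdfs_dir /user/root/new
```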

Command 7: Copy the local file "/sample/test.txt" into the directory created in the previous step:
----------
[root@MANINMANOJ]#hadoop fs -put /sample/test.txt /user/root/new
[root@MANINMANOJ]#
[root@MANINMANOJ]#hadoop fs -ls /user/root/new
Found 1 items
-rw-r--r--   3 root supergroup           2015-03-18 00:06 /user/root/new/test.txt
-----------
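
In scripts it is worth confirming that an upload actually landed. A sketch that runs '-put' and then checks for the file with '-test -e' (exit status 0 if the path exists). put_and_verify is a hypothetical helper, and HADOOP is an illustrative override for dry runs, not a Hadoop setting:

```shell
# Upload a local file into an HDFS directory, then confirm it exists there.
# HADOOP is an illustrative override (defaults to "hadoop") for dry runs.
put_and_verify() (
  h=${HADOOP:-hadoop}
  src=$1
  dest_dir=$2
  "$h" fs -put "$src" "$dest_dir" || exit 1
  # "fs -test -e" exits 0 if the path exists.
  "$h" fs -test -e "$dest_dir/$(basename "$src")"
)

# Usage:
#   put_and_verify /sample/test.txt /user/root/new && echo "upload OK"
```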

Command 8: Copy the local directory "/sample" into the HDFS directory "/user/root/data/":
----------
[root@MANINMANOJ]#hadoop fs -put /sample/ /user/root/data
[root@MANINMANOJ]#
[root@MANINMANOJ]#hadoop fs -ls /user/root/data/
Found 1 items
drwxr-xr-x   - root supergroup           2015-03-18 00:13 /user/root/data/sample
-------


Command 9: Show the space used by the directory "/user/root/data/" (sizes are in bytes):
-----------
[root@MANINMANOJ]#hadoop fs -du /user/root/data/
Found 1 items
73          hdfs://192.168.150.210:10001/user/root/data/sample
------------
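
'-du' prints one line per entry with the size in bytes first, plus a "Found N items" header. To get a single total for a directory, the sizes can be summed with awk. A sketch assuming the layout above; sum_du_bytes is a hypothetical helper name:

```shell
# Sum the size column of "hadoop fs -du" output read on stdin.
# Only lines whose first field is a plain number are counted, which
# skips the "Found N items" header.
sum_du_bytes() {
  awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Usage (needs a running cluster):
#   hadoop fs -du /user/root/data/ | sum_du_bytes
```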

Command 10: Delete the file "test.txt" from the HDFS file system:
------
[root@MANINMANOJ]#hadoop fs -rm /user/root/data/sample/test.txt
Deleted hdfs://192.168.150.210:10001/user/root/data/sample/test.txt
[root@MANINMANOJ]#
-------

Command 11:  Remove the entire sample directory and all of its contents in HDFS.
----------
[root@MANINMANOJ]#hadoop fs -rmr  /user/root/data/sample/
Deleted hdfs://192.168.150.210:10001/user/root/data/sample
[root@MANINMANOJ]#
-----------


Command 12: Copy the local file "/var/tmp/testing.txt" to the directory "/user/root/data/sample" in HDFS:

-----------
[root@MANINMANOJ]#hadoop fs -copyFromLocal /var/tmp/testing.txt /user/root/data/sample
[root@MANINMANOJ]#
[root@MANINMANOJ]#hadoop fs -ls  /user/root/data/sample
Found 1 items
-rw-r--r--   3 root supergroup        409 2015-03-24 22:58 /user/root/data/sample/testing.txt
-----------


Command 13: To view the contents of the text file "testing.txt" in the "sample" directory in HDFS:

---------
hadoop fs -cat /user/root/data/sample/testing.txt
----------

Command 14: Copy "testing.txt" from the "sample" directory in HDFS to the local directory "/home/manoj":
-------
hadoop fs -copyToLocal /user/root/data/sample/testing.txt /home/manoj
-------


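Command 15: Use '-cp' to copy files between directories within HDFS; both the source and the destination are HDFS paths. Quote wildcards so that the local shell passes them to Hadoop unexpanded.
--------
hadoop fs -cp '/user/root/data/sample/*.txt' /user/root/new/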
--------

Command 16: '-get' can be used as an alternative to '-copyToLocal':

------
hadoop fs -get /user/root/data/sample/testing.txt /home/manoj
-------


Command 17: Display the last kilobyte of the file "testing.txt" on stdout:
---------
hadoop fs -tail  /user/root/data/sample/testing.txt
--------
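
'-tail' always prints the last kilobyte of the file, not a fixed number of lines. If you need the last N lines, one option is to stream the file with '-cat' through the local 'tail' command. This reads the whole file, so it is only sensible for small files. hdfs_last_lines and the HADOOP dry-run override are hypothetical names used for illustration:

```shell
# Print the last N lines (default 10) of an HDFS file by streaming it
# through the local "tail". The whole file is read, so use on small files.
# HADOOP is an illustrative override (defaults to "hadoop") for dry runs.
hdfs_last_lines() (
  n=${2:-10}
  "${HADOOP:-hadoop}" fs -cat "$1" | tail -n "$n"
)

# Usage:
#   hdfs_last_lines /user/root/data/sample/testing.txt 10
```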

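Command 18: New files in HDFS are created with mode 666 masked by the umask (022 by default), which gives the 644 (rw-r--r--) permissions seen in the listings above. Use '-chmod' to change the permissions of a file: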
------------
hadoop fs -chmod 600 /user/root/data/sample/testing.txt
------------
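
Both the default mode and the new one can be worked out with octal arithmetic: 666 with the umask bits cleared is 644, and each octal digit maps to one rwx triplet (owner, group, others), so 600 becomes rw-------. A small sketch; new_file_mode and mode_to_symbolic are hypothetical helpers, and the umask is passed in as an argument rather than read from the cluster configuration:

```shell
# Mode given to a new file: 666 with the umask bits cleared.
# The umask is passed as octal digits, e.g. "022"; the extra leading 0
# makes shell arithmetic treat it as an octal constant.
new_file_mode() {
  printf '%03o\n' $(( 0666 & ~0$1 ))
}

# Convert an octal mode such as 600 into rwx notation, one digit at a time.
mode_to_symbolic() (
  mode=$1
  out=
  while [ -n "$mode" ]; do
    rest=${mode#?}            # drop the first digit
    digit=${mode%"$rest"}     # keep only the first digit
    mode=$rest
    if [ $(( digit & 4 )) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $(( digit & 2 )) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $(( digit & 1 )) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  printf '%s\n' "$out"
)
```

For example, 'mode_to_symbolic 600' prints rw-------, which is what '-ls' should show for testing.txt after the '-chmod' above.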

Command 19: Change the owner and group of a file with '-chown' (changing the owner requires HDFS superuser privileges):

-----------
hadoop fs -chown root:root /user/root/data/sample/testing.txt
------------

Command 20: Move (or rename) a file or directory from one location to another:
------
hadoop fs -mv  /user/root/data/sample1/testing.txt /user/root/data/sample2/testing2.txt
-------

Command 21: The default replication factor is 3 (set by dfs.replication). Use '-setrep' to change the replication factor of a file; the '-w' flag makes the command wait until the new replication level is reached:

--------
hadoop fs -setrep -w 2   /user/root/data/sample/testing.txt
---------
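
You can confirm the change in the second column of a '-ls' listing, which shows a file's replication factor. Since every replica is a full copy, the raw space a file occupies on the data nodes is roughly its size times its replication factor (ignoring checksum overhead). A sketch that reports both from '-ls' output; replication_report is a hypothetical helper and assumes the column layout shown in Command 12:

```shell
# For each file line of "hadoop fs -ls" output read on stdin, print the
# path, its replication factor (2nd column) and the approximate raw bytes
# stored across all replicas (replication times size, 5th column).
replication_report() {
  awk '/^-/ { printf "%s replication=%s raw_bytes=%d\n", $NF, $2, $2 * $5 }'
}

# Usage (needs a running cluster):
#   hadoop fs -ls /user/root/data/sample | replication_report
```

After the '-setrep' above, testing.txt should be reported with replication=2.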

Command 22: Copy a directory from one cluster to another (or within a cluster). Use

1) 'hadoop distcp' to copy (distcp is a separate tool, not a 'hadoop fs' sub-command),
2) the -overwrite option to overwrite files that already exist at the destination,
3) the -update option to copy only missing or changed files, synchronizing both directories.

--------
hadoop distcp hdfs://namenode1/apache_hadoop  hdfs://namenode2/hadoop
--------
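
The options go between 'distcp' and the source and destination paths. A sketch of a wrapper that picks the mode by name; distcp_copy and the HADOOP dry-run override are hypothetical, while -update and -overwrite are the distcp options listed above:

```shell
# Run distcp in one of three modes: plain copy, -update or -overwrite.
# HADOOP is an illustrative override (defaults to "hadoop") for dry runs.
distcp_copy() (
  mode=$1 src=$2 dst=$3
  case $mode in
    copy)      opt= ;;
    update)    opt=-update ;;
    overwrite) opt=-overwrite ;;
    *) echo "usage: distcp_copy copy|update|overwrite SRC DST" >&2; exit 2 ;;
  esac
  # $opt is deliberately unquoted: when empty it adds no argument at all.
  "${HADOOP:-hadoop}" distcp $opt "$src" "$dst"
)

# Usage:
#   distcp_copy update hdfs://namenode1/apache_hadoop hdfs://namenode2/hadoop
```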


Command 23:  List all the hadoop file system shell commands
----------
hadoop fs
---------

Command 24: To get help
--------
hadoop fs -help
--------

