Tuesday 21 April 2015

Procedure to add a new datanode to a Hadoop cluster

In order to add a new datanode to a Hadoop cluster, follow the steps below:

Prerequisites: 

1)  Make sure passwordless SSH login is enabled from the master to the new datanode.

2)  Make sure name resolution (DNS) works for the hostname of the new datanode.
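
Both prerequisites can be checked from the master before touching the cluster. The snippet below is only a rough sketch: the function names check_resolves and check_ssh are made up for this post, and it assumes a Linux master where getent and the OpenSSH client are available.

```shell
# Hypothetical helpers, not part of Hadoop.

# Succeeds if the hostname resolves (via /etc/hosts or DNS).
check_resolves() {
    getent hosts "$1" > /dev/null
}

# Succeeds only if key-based login works. BatchMode=yes makes ssh
# fail instead of prompting for a password.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, with a new node called newnode (a placeholder hostname), run "check_resolves newnode && check_ssh newnode && echo OK" on the master.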

PROCEDURE: 

Step 1: First add the hostname of the new datanode to the "$HADOOP_PREFIX/conf/slaves" file on the master.

Step 2: Copy the configuration from the Hadoop master to the new datanode. The best option is to rsync the "$HADOOP_PREFIX/conf/" directory from the master to the new slave.
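
Steps 1 and 2 could be scripted roughly as below. This is a sketch under assumptions: add_slave and sync_conf are names invented for this post, and sync_conf assumes Hadoop is installed under the same $HADOOP_PREFIX path on the new node.

```shell
# Hypothetical helpers, not part of Hadoop.

# Append a hostname to a slaves file unless it is already listed.
# Usage: add_slave <slaves-file> <hostname>
add_slave() {
    # Exact whole-line match, so "node1" does not match "node10".
    if grep -qxF -- "$2" "$1" 2>/dev/null; then
        return 0
    fi
    # If the file does not end with a newline, add one first so the
    # new hostname lands on its own line.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        echo >> "$1"
    fi
    echo "$2" >> "$1"
}

# Copy the master's conf directory to the new node.
# Assumes the same $HADOOP_PREFIX on both machines.
# Usage: sync_conf <hostname>
sync_conf() {
    rsync -av "$HADOOP_PREFIX/conf/" "$1:$HADOOP_PREFIX/conf/"
}
```

On the master this would be used as: add_slave "$HADOOP_PREFIX/conf/slaves" newnode, followed by sync_conf newnode (newnode being a placeholder hostname).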

Step 3: Now run the command below on the new datanode.

-------
hadoop-daemon.sh start datanode
-------

This will start the DataNode daemon.

Step 4: Now start the TaskTracker on the new datanode as below:
------
hadoop-daemon.sh start tasktracker 
------

Now, go to the master node and perform a refresh:
-------
hadoop mradmin -refreshNodes   --> This refreshes the hosts information at the JobTracker.

hadoop dfsadmin -refreshNodes   --> This makes the NameNode re-read its hosts and exclude files.
----------
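
To confirm that the new datanode really joined, the output of "hadoop dfsadmin -report" can be searched for it. A minimal sketch, assuming the report mentions the node by the string you pass in (depending on the setup this may be its IP address rather than its hostname); report_mentions is a name invented for this post.

```shell
# Hypothetical helper, not part of Hadoop.
# Reads a report on stdin and succeeds if it mentions the given
# hostname or IP address as a plain substring.
# Usage: hadoop dfsadmin -report | report_mentions <host-or-ip>
report_mentions() {
    grep -qF -- "$1"
}
```

If the check fails, the DataNode log on the new node is the first place to look.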


Now we have added a new datanode to the cluster without any interruption.

The new datanode starts out empty, so we have to run the balancer to redistribute the existing data across the cluster. Run the command below:
---------
start-balancer.sh
--------

Kool :) 
