Friday 27 March 2015

Building a Multi-Node Hadoop Cluster

In this post, I will explain how to set up a multi-node Hadoop cluster in a distributed environment.

We will consider five servers here. They are:

Hadoop Master:              192.168.150.205 (Hadoop-Master)
Hadoop Secondary NameNode:  192.168.150.206 (hadoop-secondary-Namenode)
Hadoop DataNode1:           192.168.150.207 (Hadoop-DataNode1)
Hadoop DataNode2:           192.168.150.208 (Hadoop-DataNode2)
Hadoop DataNode3:           192.168.150.209 (Hadoop-DataNode3)

Follow the steps given below to set up the Hadoop multi-node cluster.

I suggest you read the post below on setting up a single-node Hadoop cluster before proceeding with this one:
-------
http://www.maninmanoj.com/2015/03/setting-up-single-node-hadoop-cluster.html
-------

The prerequisites are the same as for the single-node Hadoop cluster.

Prerequisite 1: Check whether Java is installed on all servers of the Hadoop cluster. If not, install it via the link below:
----
http://www.maninmanoj.com/2015/03/installing-java-7-jdk-7u75-on-linux.html
----

Prerequisite 2: Passwordless SSH login has to be enabled on Hadoop-Master itself, and also from Hadoop-Master to the Secondary NameNode and all DataNodes.

To set up SSH key authentication, follow the link below:
--------
http://www.maninmanoj.com/2013/08/how-to-perform-ssh-login-without.html
---------

Prerequisite 3: The "/etc/hosts" file has to be edited on all nodes in the following format.
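
Based on the IPs and hostnames used in this post, the entries would look like this on every node:
----
192.168.150.205    Hadoop-Master
192.168.150.206    hadoop-secondary-Namenode
192.168.150.207    Hadoop-DataNode1
192.168.150.208    Hadoop-DataNode2
192.168.150.209    Hadoop-DataNode3
----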

THE CONFIGURATIONS BELOW NEED TO BE DONE ON ALL NODES.
--------------------

Once the prerequisites are met, let us start the Hadoop installation.

Step 1: Download the desired Hadoop package from the link below:
-----
http://apache.claz.org/hadoop/common/
----

cd /usr/local/
wget http://apache.claz.org/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz
tar -xzf hadoop-1.2.1-bin.tar.gz
----

Once extracted, rename the extracted folder as below:
----
mv hadoop-1.2.1 hadoop
----

Once done, edit ~/.bashrc and add the variables below:
-----
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
-----

Then run the command below to apply the changes:
-----
exec bash
-----
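
To confirm that the PATH change took effect, you can run a quick sanity check (this assumes the tarball was extracted to /usr/local/hadoop as above):
----
hadoop version
----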


Step 2:
------
Now, we need to set environment variables in the following file:

vi /usr/local/hadoop/conf/hadoop-env.sh

Set the variable below to the path where Java is installed.
------
export JAVA_HOME=/opt/jdk1.7.0_75/
------
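
If Java is installed somewhere else on your servers, one way to locate the correct path is to resolve the java binary and strip the trailing /bin/java (a hedged sketch, not part of the original setup):
----
readlink -f $(which java)
# e.g. /opt/jdk1.7.0_75/bin/java --> set JAVA_HOME=/opt/jdk1.7.0_75/
----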


Step 3:
------

vi /usr/local/hadoop/conf/core-site.xml

Enter the properties below inside the "configuration" element that is already present:
---------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.150.205:10001</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
--------------
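
If the directory set in hadoop.tmp.dir does not exist yet, it may need to be created on each node (assuming the path used above):
----
mkdir -p /usr/local/hadoop/tmp
----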


Step 4:
---------
vi /usr/local/hadoop/conf/mapred-site.xml
----------

Enter the entries below:
----------
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.150.205:10002</value>
  </property>
</configuration>
-----------

Step 5:
---------
In the "masters" file present in "/usr/local/hadoop/conf/masters" enter the IP address of "SECONDARY NAMENODE".


Step 6:
--------
In the "slaves" file present in "/usr/local/hadoop/conf/slaves"  enter the IP addresses of  all DATA-NODES.
--------
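
For the setup in this post, the "slaves" file would contain the three DataNode IPs:
----
192.168.150.207
192.168.150.208
192.168.150.209
----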



Step 7: 
--------
Now, run the command below on the NAMENODE (Hadoop-Master) to format the HDFS filesystem.
--------
 hadoop namenode -format
---------

Procedure to start the cluster:

Step 1: Start DFS from Hadoop-Master using the command "start-dfs.sh".


Step 2: Start MapReduce from Hadoop-Master using the command "start-mapred.sh".
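
Once both scripts have run, you can verify the daemons on each node with "jps"; on Hadoop-Master you would typically see NameNode and JobTracker, on the Secondary NameNode a SecondaryNameNode process, and on each DataNode the DataNode and TaskTracker processes.
----
jps
----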



Now, to see the details of the NameNode, use the URL below:
-------
http://192.168.150.205:50070/ ---> Details of Name Node.
------
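
The JobTracker web UI should also be reachable, assuming the default MapReduce web port (50030):
----
http://192.168.150.205:50030/ ---> Details of Job Tracker.
----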

Kool :)
