Tuesday 1 December 2015

Hadoop: Install and manage Zookeeper

In this post, I will explain how to install and set up a Zookeeper cluster.

Zookeeper is a distributed coordination service for distributed applications. It acts as a centralized repository where applications can store and retrieve data. Its general role is synchronization, serialization and coordination.

In a cluster, Zookeeper is normally run on an odd number of nodes, such as 3 or 5. An odd number is chosen because Zookeeper needs a majority of the nodes to agree. A clear majority prevents a split-brain scenario and the data inconsistencies that come with it.
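The majority arithmetic can be sketched in a few lines of shell. The function names quorum_size and tolerated_failures are made up for this illustration; they are not part of Zookeeper.

```shell
# Smallest number of nodes that forms a majority of an ensemble of size $1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while a majority is still reachable.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With these formulas, 3 and 4 nodes both tolerate a single failure, so the fourth node adds no extra fault tolerance. That is the usual argument for picking 3 or 5.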

Scenario: We have a 3 node cluster as below:
-----
hadoop-master-test      192.168.151.221
hadoop-secondary-test   192.168.151.222
data-node-1-test        192.168.151.223
-------

We have to set up a zookeeper cluster using these 3 nodes.

Installation steps:

Step 1: Download the package from Apache's official website: http://www.eu.apache.org/dist/zookeeper/

Here, I downloaded version zookeeper-3.4.6 to "/usr/local" on server "192.168.151.221".

wget http://www.eu.apache.org/dist/zookeeper/stable/zookeeper-3.4.6.tar.gz

Extract the tarball and rename the extracted folder to "zookeeper":

--------
cd /usr/local/
tar -xvzf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper
--------

Now, create a directory called "/usr/local/zookeeper/data/", which will be the data directory for zookeeper.

mkdir /usr/local/zookeeper/data/

Append to the PATH variable:

export PATH=$PATH:/usr/local/zookeeper/bin

Step 2: Edit the zookeeper configuration file:

vim  /usr/local/zookeeper/conf/zoo.cfg

-------
dataDir=/usr/local/zookeeper/data/
dataLogDir=/usr/local/zookeeper/logs/
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
server.1=hadoop-master-test:2888:3888
server.2=hadoop-secondary-test:2889:3889
server.3=data-node-1-test:2890:3890
-------

Explanation:
---------
tickTime and initLimit:
-----------

tickTime ---> The basic time unit, in milliseconds. initLimit and syncLimit are both counted in ticks.

initLimit X tickTime = 10 X 2000 ms = 20000 ms (20 seconds)

This means that whenever a quorum member joins, it has 20 seconds to connect to the leader and download the data initially. If it cannot finish within 20 seconds, it times out.

syncLimit:
-------------
syncLimit X tickTime = 5 X 2000 ms = 10000 milliseconds (10 seconds).

If a follower cannot sync with the leader within 10 seconds, it is considered to have fallen too far behind and is dropped from the quorum.

Generally syncLimit is less than initLimit, because the initial download of the data takes more time.
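To double-check these numbers against a real config file, here is a minimal sketch. The helpers cfg_value and ticks_to_ms and the ZOO_CFG variable are names I made up for this example. It assumes that tickTime, initLimit and syncLimit are all present in the file.

```shell
# Print the value of one key from a key=value file such as zoo.cfg.
# $1 = path to the file, $2 = key name
cfg_value() {
  awk -F= -v key="$2" '$1 == key { gsub(/[ \t\r]/, "", $2); print $2 }' "$1"
}

# Convert a limit expressed in ticks into milliseconds.
# $1 = tickTime in ms, $2 = limit in ticks
ticks_to_ms() {
  echo $(( $1 * $2 ))
}

# ZOO_CFG is a hypothetical override; the default is the path used in this post.
ZOO_CFG=${ZOO_CFG:-/usr/local/zookeeper/conf/zoo.cfg}
if [ -r "$ZOO_CFG" ]; then
  tick=$(cfg_value "$ZOO_CFG" tickTime)
  echo "initLimit: $(ticks_to_ms "$tick" "$(cfg_value "$ZOO_CFG" initLimit)") ms"
  echo "syncLimit: $(ticks_to_ms "$tick" "$(cfg_value "$ZOO_CFG" syncLimit)") ms"
fi
```

If the file is not readable, the script prints nothing, so it is safe to paste on a node that is not configured yet.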

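Ports:
-------
2888 --> Peer-to-peer port, used by followers to connect to the leader.
3888 --> Leader election port.
Each server.N line lists that server's own pair of ports, so in this setup server.2 uses 2889/3889 and server.3 uses 2890/3890.
-------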
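To see at a glance which host uses which ports, the server.N lines can be pulled out of zoo.cfg as sketched below. The function name list_servers and the ZOO_CFG variable are my own. It assumes every entry has the form server.N=host:peerport:electionport, as in the config above.

```shell
# Print id, host and both ports for every server.N entry in a zoo.cfg file.
# $1 = path to the file
list_servers() {
  awk -F= '/^server\.[0-9]+=/ {
    split($1, key, ".")     # key[2] is the server id (the N in server.N)
    split($2, addr, ":")    # addr[1]=host, addr[2]=peer port, addr[3]=election port
    printf "id=%s host=%s peer_port=%s election_port=%s\n", key[2], addr[1], addr[2], addr[3]
  }' "$1"
}

# ZOO_CFG is a hypothetical override; the default is the path used in this post.
ZOO_CFG=${ZOO_CFG:-/usr/local/zookeeper/conf/zoo.cfg}
if [ -r "$ZOO_CFG" ]; then
  list_servers "$ZOO_CFG"
fi
```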

STEP 3:
-------
Create the myid file:

vim /usr/local/zookeeper/data/myid

Put the number "1" in it. This number must match the N of this node's server.N line in zoo.cfg.


Perform STEP 1, 2 and 3 on the other two nodes, "192.168.151.222" and "192.168.151.223". The only difference is the number in the myid file:

On "192.168.151.222":

vim /usr/local/zookeeper/data/myid ---> Put number "2"

On "192.168.151.223":

vim /usr/local/zookeeper/data/myid ---> Put number "3"
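Instead of typing the number by hand on each node, the id can be looked up from the server.N lines of zoo.cfg. This is only a sketch. The function name myid_for_host and the ZOO_CFG variable are my own, and it assumes the output of the hostname command on each node matches the name used in zoo.cfg.

```shell
# Print the N of the server.N entry whose host part equals the given hostname.
# $1 = path to zoo.cfg, $2 = hostname to look up
myid_for_host() {
  awk -F= -v host="$2" '/^server\.[0-9]+=/ {
    split($1, key, ".")     # key[2] is the server id
    split($2, addr, ":")    # addr[1] is the host name
    if (addr[1] == host) print key[2]
  }' "$1"
}

# ZOO_CFG is a hypothetical override; the default is the path used in this post.
ZOO_CFG=${ZOO_CFG:-/usr/local/zookeeper/conf/zoo.cfg}
if [ -r "$ZOO_CFG" ]; then
  id=$(myid_for_host "$ZOO_CFG" "$(hostname)")
  if [ -n "$id" ]; then
    # Only prints the id; redirect it to /usr/local/zookeeper/data/myid once it looks right.
    echo "myid for $(hostname): $id"
  else
    echo "no server.N entry found for $(hostname)" >&2
  fi
fi
```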

STEP 4: Starting zookeeper

Run the command below on all nodes.
----
zkServer.sh start
----

To check the status:
------
zkServer.sh status
------
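The status output contains a "Mode:" line that tells whether the node is the leader or a follower. A small sketch to extract it follows. The function name zk_mode is my own, and it assumes the status output has a line of the form "Mode: leader" or "Mode: follower".

```shell
# Read zkServer.sh status output on stdin and print only the mode.
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2 }'
}

# Only runs when zkServer.sh is on the PATH (see the export PATH step above).
if command -v zkServer.sh >/dev/null 2>&1; then
  mode=$(zkServer.sh status 2>&1 | zk_mode)
  echo "This node is running as: ${mode:-unknown}"
fi
```

In a healthy 3 node cluster, one node should report leader and the other two follower.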

Connecting to the zookeeper shell:
--------
zkCli.sh -server hadoop-master-test:2181
> help
--------

I will discuss more about zookeeper management in the next post.

Keep reading :)
