Solutions for BigData Engineers: Hadoop: Install and manage Zookeeper

In this post, I will explain how to install and set up a ZooKeeper cluster.

ZooKeeper is a distributed coordination service for distributed applications. It acts as a centralized repository where applications can store and retrieve small pieces of shared data. Its general role is synchronization, serialization and coordination.

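In a cluster, ZooKeeper normally runs on an odd number of nodes, such as 3 or 5. ZooKeeper only works while a strict majority (quorum) of the nodes can talk to each other, so an even-sized cluster tolerates no more failures than the next smaller odd-sized one. Requiring a majority also prevents a split-brain scenario, and with it data inconsistencies.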
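To make the majority rule concrete, here is a minimal sketch of the arithmetic. The function names quorum_size and tolerated_failures are my own and are not part of ZooKeeper.

```shell
#!/bin/sh
# quorum_size N: smallest strict majority of an N-node cluster
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: nodes that can fail while a majority is still up
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
    echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 3 nodes the quorum is 2 and one node may fail. With 4 nodes the quorum is 3, so still only one node may fail, which is why the extra even node buys nothing.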

Scenario: We have a 3-node cluster as below:
-----
hadoop-master-test 192.168.151.221
hadoop-secondary-test 192.168.151.222
data-node-1-test 192.168.151.223
-------

We have to set up a ZooKeeper cluster using these 3 nodes.

Installation steps:

Step 1: Download the package from Apache's official website: http://www.eu.apache.org/dist/zookeeper/

Here, I downloaded version zookeeper-3.4.6 to "/usr/local" on server "192.168.151.221".

wget http://www.eu.apache.org/dist/zookeeper/stable/zookeeper-3.4.6.tar.gz

Extract the tarball and rename the folder to "zookeeper" :

--------
cd /usr/local/
tar -xvzf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper
--------

Now, create a directory called "/usr/local/zookeeper/data/", which will be the data directory for ZooKeeper.

mkdir /usr/local/zookeeper/data/

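Append the ZooKeeper bin directory to the PATH variable. This only lasts for the current shell session, so add the line to your shell profile (for example ~/.bashrc) to make it permanent: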

export PATH=$PATH:/usr/local/zookeeper/bin

Step 2: Edit the ZooKeeper configuration file:

vim /usr/local/zookeeper/conf/zoo.cfg

-------
dataDir=/usr/local/zookeeper/data/
dataLogDir=/usr/local/zookeeper/logs/
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
server.1=hadoop-master-test:2888:3888
server.2=hadoop-secondary-test:2889:3889
server.3=data-node-1-test:2890:3890
-------

Explanation:
---------
tickTime and initLimit:
-----------

tickTime ---> The basic time unit in milliseconds (here 2000 ms). initLimit and syncLimit are counted in ticks.

initLimit x tickTime = 10 x 2000 ms = 20000 ms (20 seconds)

This means that whenever a quorum member joins, it has 20 seconds to connect to the leader and download the data initially. If it cannot finish within 20 seconds, it times out.

syncLimit:
-------------
syncLimit x tickTime = 5 x 2000 ms = 10000 ms (10 seconds)

If a follower cannot sync with the leader within 10 seconds, it is considered out of date and is dropped. A follower that loses its connection to the leader in this way goes back to the election process to find the current leader.

Generally, syncLimit is lower than initLimit because the initial download of the data takes more time.
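The two multiplications can also be read straight out of the configuration file. The following is only a sketch: cfg_value and limit_ms are helper names I made up, and the demo runs on a temporary file holding the three values from this post instead of the real zoo.cfg.

```shell
#!/bin/sh
# cfg_value FILE KEY: print the value of a KEY=value line
cfg_value() {
    awk -F= -v key="$2" '$1 == key { print $2 }' "$1"
}

# limit_ms FILE KEY: convert KEY (a number of ticks) to milliseconds
limit_ms() {
    tick=$(cfg_value "$1" tickTime)
    ticks=$(cfg_value "$1" "$2")
    echo $(( tick * ticks ))
}

# Demo on a temporary file with the values used in this post
cfg=$(mktemp)
printf 'tickTime=2000\ninitLimit=10\nsyncLimit=5\n' > "$cfg"
echo "initLimit: $(limit_ms "$cfg" initLimit) ms"
echo "syncLimit: $(limit_ms "$cfg" syncLimit) ms"
rm -f "$cfg"
```

On a real node you could pass /usr/local/zookeeper/conf/zoo.cfg as the file argument.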

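Ports:
-------
Each server.N line has the form host:port1:port2.
2888 --> Peer port. Followers use it to connect to the leader.
3888 --> Leader election port.
In this example, server.2 and server.3 use 2889/3889 and 2890/3890 for the same two roles.
-------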
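To double-check which host uses which port, the server.N lines can be split into their parts. This is a small sketch; list_servers is a helper name of my own, and the demo uses a temporary file with the three server lines from this post.

```shell
#!/bin/sh
# list_servers FILE: print "id host peer_port election_port" per server.N line
list_servers() {
    awk -F'[=:]' '/^server\.[0-9]+=/ {
        sub(/^server\./, "", $1)   # keep only the numeric id
        print $1, $2, $3, $4
    }' "$1"
}

# Demo on a temporary file with the server lines used in this post
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
clientPort=2181
server.1=hadoop-master-test:2888:3888
server.2=hadoop-secondary-test:2889:3889
server.3=data-node-1-test:2890:3890
EOF
list_servers "$cfg"
rm -f "$cfg"
```

Each output line shows the server id, the host name, the peer port and the leader election port, in that order.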

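Step 3:
-------
Create the myid file:

vim /usr/local/zookeeper/data/myid

Put the number "1" in it. This number must match the N of the server.N line for this host in zoo.cfg (server.1 is hadoop-master-test).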

Perform Steps 1, 2 and 3 on the other two nodes, "192.168.151.222" and "192.168.151.223". The only difference is the number in the myid file:

In "192.168.151.222"

vim /usr/local/zookeeper/data/myid ---> Put number "2"

In "192.168.151.223"

vim /usr/local/zookeeper/data/myid ---> Put number "3"
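If you prefer not to open an editor on every node, the file can also be written from the command line. Below is a minimal sketch, assuming a helper I call write_myid (not a ZooKeeper tool). The 1 to 255 range check follows the recommendation in the ZooKeeper administrator's guide. The demo writes into a temporary directory so it is safe to run anywhere.

```shell
#!/bin/sh
# write_myid DATA_DIR ID: write ID to DATA_DIR/myid after a basic sanity check
write_myid() {
    case "$2" in
        ''|*[!0-9]*)
            echo "myid must be a positive integer: '$2'" >&2
            return 1
            ;;
    esac
    if [ "$2" -lt 1 ] || [ "$2" -gt 255 ]; then
        echo "myid must be between 1 and 255: $2" >&2
        return 1
    fi
    mkdir -p "$1" && echo "$2" > "$1/myid"
}

# Demo in a temporary directory. On a real node DATA_DIR would be
# /usr/local/zookeeper/data and ID would be 1, 2 or 3 depending on the host.
demo_dir=$(mktemp -d)
write_myid "$demo_dir" 2
cat "$demo_dir/myid"
rm -rf "$demo_dir"
```

The file must contain nothing but the number, which is why the sketch writes it with a single echo.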

Step 4: Starting ZooKeeper

Run the command below on all nodes.
----
zkServer.sh start
----

To check the status:
------
zkServer.sh status
------
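The status output contains a "Mode:" line that tells you whether the node is a leader or a follower. Here is a small sketch for pulling out just that value. zk_mode is a helper name of my own, and the sample text only approximates what the real command prints.

```shell
#!/bin/sh
# zk_mode: read `zkServer.sh status` output on stdin and print only the mode
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Sample text standing in for real command output (illustrative only)
sample='Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower'
printf '%s\n' "$sample" | zk_mode
```

On a running node the same helper could be used as "zkServer.sh status 2>&1 | zk_mode". In a healthy 3-node cluster, one node should report leader and the other two follower.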

Connecting to the ZooKeeper shell:
--------
zkCli.sh -server hadoop-master-test:2181
> help
--------

I will discuss more about ZooKeeper management in the next post.

Keep reading :)


Tuesday, 1 December 2015
