In this post, I will explain how to install and set up a ZooKeeper cluster.
ZooKeeper is a distributed coordination service for distributed applications. It acts as a centralized repository where applications can write and read data. Its general role is synchronization, serialization and coordination.
In a cluster, ZooKeeper normally runs on an odd number of nodes, such as 3 or 5. An odd number is chosen because ZooKeeper needs a majority (quorum) of servers to agree, which prevents a split-brain scenario and the data inconsistencies that come with it. An even number adds no extra fault tolerance: a 4-node cluster survives only one failure, the same as a 3-node cluster.
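The majority rule is easy to check with a little arithmetic. The snippet below is only a sketch: the function names quorum_size and tolerated_failures are made up for illustration and are not part of ZooKeeper.

```shell
#!/bin/sh
# Majority (quorum) needed for an ensemble of n servers: floor(n/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Servers that can fail while a majority can still be formed.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare odd and even ensemble sizes.
for n in 3 4 5 6; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from 3 to 4 servers (or from 5 to 6) raises the quorum size without raising the number of failures the cluster can survive.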
Scenario: We have a 3 node cluster as below:
-----
hadoop-master-test 192.168.151.221
hadoop-secondary-test 192.168.151.222
data-node-1-test 192.168.151.223
-------
We have to set up a ZooKeeper cluster using these 3 nodes.
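The configuration in Step 2 refers to the nodes by hostname, so every node must be able to resolve all three names. Assuming DNS does not already do this in your environment, entries like the following in /etc/hosts on each node would be one way to handle it. The function name hosts_entries is hypothetical and used only for this sketch.

```shell
#!/bin/sh
# Print the /etc/hosts entries for the three nodes listed above.
hosts_entries() {
    cat <<'EOF'
192.168.151.221 hadoop-master-test
192.168.151.222 hadoop-secondary-test
192.168.151.223 data-node-1-test
EOF
}

# Only prints the lines; review them, then append to /etc/hosts as root
# on each node if they are not already present.
hosts_entries
```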
Installation steps:
Step 1: Download the package from Apache's official website: http://www.eu.apache.org/dist/zookeeper/
Here, I downloaded zookeeper-3.4.6 to "/usr/local" on server "192.168.151.221".
wget http://www.eu.apache.org/dist/zookeeper/stable/zookeeper-3.4.6.tar.gz
Extract the tarball and rename the folder to "zookeeper" :
--------
cd /usr/local/
tar -xvzf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper
--------
Now, create a directory called "/usr/local/zookeeper/data/", which will be the data directory for ZooKeeper.
mkdir /usr/local/zookeeper/data/
Append to the PATH variable:
export PATH=$PATH:/usr/local/zookeeper/bin
Step 2: Edit the ZooKeeper configuration file. The tarball ships only conf/zoo_sample.cfg, so create conf/zoo.cfg with the contents below:
vim /usr/local/zookeeper/conf/zoo.cfg
-------
dataDir=/usr/local/zookeeper/data/
dataLogDir=/usr/local/zookeeper/logs/
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
server.1=hadoop-master-test:2888:3888
server.2=hadoop-secondary-test:2889:3889
server.3=data-node-1-test:2890:3890
-------
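Since the same file is needed on all three nodes, it can also be written without an editor. This is just a sketch: write_zoo_cfg is a made-up helper name, and the scratch path /tmp/zoo.cfg is an assumption so that nothing under /usr/local is overwritten by accident.

```shell
#!/bin/sh
# Write the zoo.cfg shown above to the path given as the first argument.
write_zoo_cfg() {
    cat > "$1" <<'EOF'
dataDir=/usr/local/zookeeper/data/
dataLogDir=/usr/local/zookeeper/logs/
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
server.1=hadoop-master-test:2888:3888
server.2=hadoop-secondary-test:2889:3889
server.3=data-node-1-test:2890:3890
EOF
}

# Write to a scratch file first, then copy it into place once reviewed.
write_zoo_cfg /tmp/zoo.cfg
echo "wrote /tmp/zoo.cfg"
```

After checking the file, copy it to /usr/local/zookeeper/conf/zoo.cfg on each node.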
Explanation:
---------
tickTime and initLimit:
-----------
tickTime ---> The basic time unit, in milliseconds. initLimit and syncLimit are counted in ticks.
initLimit x tickTime = 10 x 2000 ms = 20000 ms (20 seconds)
This means that whenever a quorum member joins, it has 20 seconds to connect to the leader and download the data initially. If it cannot finish within 20 seconds, it times out.
syncLimit:
-------------
syncLimit x tickTime = 5 x 2000 ms = 10000 ms (10 seconds).
This is how long a follower is allowed to be out of sync with the leader. A follower that falls more than 10 seconds behind is dropped and has to rejoin. A follower that gets no response from the leader in that time treats the leader as dead and starts a new election.
Generally, syncLimit is smaller than initLimit, because the initial data download takes longer than routine syncing.
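The two timeouts can be worked out from the configured values. This is a small sketch of that arithmetic; ticks_to_ms is a helper name invented here.

```shell
#!/bin/sh
# Values taken from the zoo.cfg in Step 2.
tickTime=2000   # milliseconds per tick
initLimit=10    # ticks
syncLimit=5     # ticks

# Convert a limit given in ticks into milliseconds.
ticks_to_ms() {
    echo $(( $1 * $2 ))
}

echo "initLimit = $(ticks_to_ms "$initLimit" "$tickTime") ms"
echo "syncLimit = $(ticks_to_ms "$syncLimit" "$tickTime") ms"
```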
Ports:
-------
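2888 --> The first port in each server.N line is the peer-to-peer port, which followers use to connect to the leader.
3888 --> The second port is the leader election port.
In this setup, server.2 and server.3 use 2889/3889 and 2890/3890 for the same two purposes.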
-------
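To make the layout of a server line concrete, here is a sketch that splits one into its parts. The function name describe_server and the labels it prints are my own, not ZooKeeper terminology.

```shell
#!/bin/sh
# Split "server.<id>=<host>:<peer port>:<election port>" into its fields.
describe_server() {
    line=$1
    id=${line%%=*}             # "server.2"
    id=${id#server.}           # "2"
    rest=${line#*=}            # "hadoop-secondary-test:2889:3889"
    host=${rest%%:*}           # "hadoop-secondary-test"
    ports=${rest#*:}           # "2889:3889"
    peer_port=${ports%%:*}     # first port: follower-to-leader traffic
    election_port=${ports#*:}  # second port: leader election
    echo "id=$id host=$host peer_port=$peer_port election_port=$election_port"
}

describe_server "server.2=hadoop-secondary-test:2889:3889"
```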
Step 3:
-------
Create the myid file:
vim /usr/local/zookeeper/data/myid
Enter the number "1". This must match the N of this node's server.N line in zoo.cfg.
Repeat Steps 1, 2 and 3 on the other two nodes, "192.168.151.222" and "192.168.151.223". The only difference is the content of the myid file:
On "192.168.151.222":
vim /usr/local/zookeeper/data/myid ---> Enter the number "2"
On "192.168.151.223":
vim /usr/local/zookeeper/data/myid ---> Enter the number "3"
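If you would rather script this than edit the file by hand on each node, something along these lines could work. It is a sketch that assumes each node's hostname is exactly the name used in zoo.cfg; myid_for_host is a hypothetical helper.

```shell
#!/bin/sh
# Map a hostname to the id used in its server.N line in zoo.cfg.
myid_for_host() {
    case "$1" in
        hadoop-master-test)    echo 1 ;;
        hadoop-secondary-test) echo 2 ;;
        data-node-1-test)      echo 3 ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Show the mapping. On a real node you would instead run:
#   myid_for_host "$(hostname)" > /usr/local/zookeeper/data/myid
for h in hadoop-master-test hadoop-secondary-test data-node-1-test; do
    echo "$h -> myid $(myid_for_host "$h")"
done
```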
Step 4: Start ZooKeeper
Run the following command on all nodes:
----
zkServer.sh start
----
To check the status (each node reports its mode as leader or follower):
------
zkServer.sh status
------
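The status can also be checked from a single machine by sending the four-letter command srvr to each node's client port. The sketch below assumes nc (netcat) is installed and that port 2181 is reachable. The function names are made up, and nc options differ a little between netcat variants.

```shell
#!/bin/sh
# Pull the role out of `srvr` output read on stdin (the "Mode: ..." line).
mode_from_srvr() {
    sed -n 's/^Mode: //p'
}

# Ask one node for its role; -w 2 gives up after 2 seconds.
node_mode() {
    echo srvr | nc -w 2 "$1" 2181 | mode_from_srvr
}

# Print the role of every node in the cluster.
cluster_modes() {
    for h in hadoop-master-test hadoop-secondary-test data-node-1-test; do
        echo "$h: $(node_mode "$h")"
    done
}
```

Calling cluster_modes on a healthy 3-node cluster should show one leader and two followers.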
Connecting to the ZooKeeper shell:
--------
zkCli.sh -server hadoop-master-test:2181
> help
--------
I will discuss more about ZooKeeper management in the next post.
Keep reading :)