Tuesday, 24 November 2015

Hadoop: HBase and HDFS data replication

Recently, I came across a Hadoop cluster with one master and two slaves. The replication factor of the cluster was set to three.

The replication factor of a cluster should be at most equal to the number of slaves (DataNodes) in the cluster. In my case it should have been two, but the cluster had a replication factor of three. The cluster was unstable because of this mismatch.
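The rule above can be written as a small helper. This is only an illustrative sketch: cap_replication is a hypothetical function name, and the two numbers are typed in by hand rather than read from the cluster.

```shell
# Cap a desired replication factor at the number of slaves (DataNodes).
# cap_replication is a hypothetical helper, not a Hadoop command.
cap_replication() {
    desired=$1
    datanodes=$2
    if [ "$desired" -gt "$datanodes" ]; then
        echo "$datanodes"
    else
        echo "$desired"
    fi
}

# My case: replication factor three requested, only two slaves available.
cap_replication 3 2   # prints 2
```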

So, I changed the dfs.replication property in hdfs-site.xml as below:

Property:
----------
<property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Replication factor.</description>
</property>
-----------

Now, after making the change, restart the cluster. This makes sure that all files written to HDFS from now on will have a replication factor of two.
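To double-check which value is configured, something like the following may help. It is a sketch that assumes Hadoop 2.x or later with the hdfs command on the PATH; configured_replication is a hypothetical wrapper name, and the command reads the configuration files visible on the node where it runs.

```shell
# Print the dfs.replication value from the local Hadoop configuration.
# Assumes the `hdfs` command (Hadoop 2.x or later) is on the PATH.
configured_replication() {
    hdfs getconf -confKey dfs.replication
}
```

Calling configured_replication on a node that has the updated hdfs-site.xml should print the new value.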

We also need to run the setrep command to change the replication factor of the files that are already stored in HDFS.

Command:  
------------
hadoop fs -setrep -R -w 2 /
-------------

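The above command changes the replication factor of all files already stored in HDFS to two (remember, it was three before). The -w flag makes the command wait until the replication has completed, which can take a while on a large cluster.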
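To confirm the result for an individual file, the replication factor can be read back with hadoop fs -stat. A minimal sketch, assuming the hadoop command is on the PATH; replication_of is a hypothetical helper name.

```shell
# Print the replication factor recorded for one HDFS path.
# %r is the replication format specifier of `hadoop fs -stat`.
replication_of() {
    hadoop fs -stat %r "$1"
}
```

For example, replication_of followed by the path of any file written before the change should print 2 once setrep has finished.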

Relation: HBase and HDFS replication
--------------

HBase is a NoSQL database that uses HDFS as its underlying storage. Because the underlying file system is HDFS, HBase benefits from the features that HDFS provides.

The default replication factor of HDFS is three, so if you create an HBase table and put some data in it, the data is written to HDFS and HDFS creates three copies of it.

Scenario: under-replicated blocks keep increasing after changing the replication factor.

In my case, the replication factor of the Hadoop cluster had been changed to two and HBase was running on top of it. HBase writes to HDFS as a client with its own configuration, so it still uses the default replication factor of three. Hence Hadoop's fsck command reports under-replicated blocks, and their number keeps increasing over time.
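One way to watch this number is to pull it out of the fsck summary. The sketch below assumes the summary contains a line starting with "Under-replicated blocks:" followed by the count, which is how the fsck reports I have seen look; under_replicated_count is a hypothetical helper name.

```shell
# Read an fsck report on stdin and print the under-replicated block count.
# Assumes a summary line of the form "Under-replicated blocks: <count> (<percent> %)".
under_replicated_count() {
    awk -F: '/Under-replicated blocks/ { split($2, f, " "); print f[1]; exit }'
}
```

It can be used as hdfs fsck / | under_replicated_count (or hadoop fsck / on older installations). Running it a few times while HBase is writing shows whether the count is still growing.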

To avoid this problem, we need to align HBase's replication factor with Hadoop's replication factor. We can do this by adding the dfs.replication property to hbase-site.xml as below:

----------
<property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Replication factor.</description>
</property>
-----------

Once done, restart the cluster. You will see no more under-replicated blocks.

Kool :) Keep reading..

Ref: http://amitstechnicalblog.blogspot.in/2013/02/understand-hbase-data-replication.html
