Thursday, 14 May 2015

Configuring replication factor and block size in a Hadoop cluster

In this post, I will explain the replication factor and block size of HDFS and how to change them.

CASE 1: REPLICATION FACTOR

HDFS stores files as data blocks and distributes these blocks across the cluster. The replication property controls how many copies of each data block HDFS keeps. For example, if the replication factor is set to 3 (the default value in HDFS), there will be one original block and two replicas.
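
To get a feel for what the replication factor costs in disk space, here is a quick back-of-the-envelope calculation in shell. The 256 MB file size is just an assumed example value; the point is that raw storage used is roughly the file size multiplied by the replication factor.

```shell
# Rough storage cost of replication (illustrative numbers, not from a real cluster).
file_size_mb=256   # assumed size of the file copied to HDFS
replication=3      # value of dfs.replication

# Every block is stored 'replication' times, so raw usage scales linearly.
raw_mb=$(( file_size_mb * replication ))
echo "A ${file_size_mb} MB file uses about ${raw_mb} MB of raw cluster storage"
```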

In order to change the replication factor, follow the steps below:

STEP 1: Open the hdfs-site.xml file (usually found in the conf/ folder of the Hadoop installation directory).

STEP 2: Add the following property to hdfs-site.xml:

<property>
<name>dfs.replication</name>
<value>3</value>
<description>Block Replication</description>
</property>

From now on, every file copied to HDFS will have a replication factor of 3. Files that are already in HDFS keep their existing replication factor.

Scenario: In some cases, we may have to change the replication factor of a file that is already present in HDFS. The -w flag makes the command wait until the replication has completed.

[root@MANINMANOJ ~]$ hadoop fs -setrep -w 3 /path/to/file


In the same way, we can change the replication factor of all the files present in a directory.

[root@MANINMANOJ ~]$ hadoop fs -setrep -R -w 3 /path/to/dir
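
To confirm that the change took effect, you can ask HDFS which replication factor it has recorded for a path. The sketch below wraps hadoop fs -stat with the %r format (replication) in a small helper. The function name show_replication is my own, and it assumes the hadoop command is on your PATH.

```shell
# Hypothetical helper: print the replication factor HDFS reports for a path.
# Assumes the 'hadoop' command is on the PATH and the path exists in HDFS.
show_replication() {
  hadoop fs -stat "%r" "$1"
}

# Usage (the path is a placeholder):
#   show_replication /path/to/file
```

For files, the second column of the hadoop fs -ls output shows the same number.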

CASE 2: BLOCK SIZE

The block size of the file system on my laptop is 4 KB, whereas the default block size in HDFS is 64 MB (128 MB from Hadoop 2.x onwards). Such large blocks reflect that HDFS is designed to store and manage very large files.

For example, suppose a cluster uses a block size of 64 MB and a 256 MB file is copied to HDFS. HDFS splits the file into 256 MB / 64 MB = 4 blocks and distributes the four blocks across the data nodes in the cluster.

By default the block size is 64 MB. If you need to change it to 128 MB, follow the steps below:
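
File sizes are rarely an exact multiple of the block size, so the block count is really a ceiling division. Here is a small shell sketch of the arithmetic, using an assumed 300 MB file (my own example number) with a 64 MB block size:

```shell
# Number of HDFS blocks for a file = ceil(file size / block size).
file_size_mb=300    # assumed example file size
block_size_mb=64    # HDFS block size in MB

# Integer ceiling division: add (divisor - 1) before dividing.
blocks=$(( (file_size_mb + block_size_mb - 1) / block_size_mb ))
# The final block holds only the leftover data, not a full 64 MB.
last_block_mb=$(( file_size_mb - (blocks - 1) * block_size_mb ))

echo "${blocks} blocks, the last one holding ${last_block_mb} MB"
```

With file_size_mb=256 the same arithmetic gives exactly 4 full blocks.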


STEP 1: Open the hdfs-site.xml file. This file is usually found in the conf/ folder of the Hadoop installation directory.

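STEP 2: Set the following property. Note that the value is given in bytes: 134217728 bytes = 128 MB. (In Hadoop 2.x the property is named dfs.blocksize; the old name dfs.block.size still works there but is deprecated.)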

<property>
<name>dfs.block.size</name>
<value>134217728</value>
<description>Block size</description>
</property>
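
The value has to be written in bytes, which is easy to get wrong by hand. A quick way to compute it in shell (the variable names are just for illustration):

```shell
# Convert a block size in MB to the byte value expected in hdfs-site.xml.
block_size_mb=128
block_size_bytes=$(( block_size_mb * 1024 * 1024 ))
echo "$block_size_bytes"   # prints 134217728
```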

From now on, all files copied to HDFS will have a block size of 128 MB. Please note that this only affects files placed into HDFS from now on; files that are already stored keep their original block size.
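
If you only want a different block size for one particular upload, a sketch like the following may help. It passes the property on the command line with the generic -D option and then reads the block size back with the %o format of hadoop fs -stat. The helper names and paths are hypothetical, and it assumes the hadoop command is on your PATH.

```shell
# Hypothetical helper: upload one file with a specific block size (in bytes).
# -D sets the property for this command only; hdfs-site.xml is untouched.
put_with_block_size() {
  hadoop fs -D dfs.block.size="$1" -put "$2" "$3"
}

# Hypothetical helper: print the block size (in bytes) recorded for a file.
show_block_size() {
  hadoop fs -stat "%o" "$1"
}

# Usage (file names and paths are placeholders):
#   put_with_block_size 134217728 localfile.dat /path/to/file
#   show_block_size /path/to/file
```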

NOTE: YOU DON'T HAVE TO RESTART THE CLUSTER TO MAKE THIS EFFECTIVE.

Kool :)
