Thursday 28 May 2015

PART 2: HADOOP OPTIMIZATION: MEMORY-RELATED PARAMETER

Parameter: mapred.child.java.opts

This is a parameter for JVM tuning.

The default value is -Xmx200m, which gives each child task JVM a maximum heap of 200 MB. You can increase this value if the job is large, but make sure the machine does not start swapping, because swapping will reduce performance.

The value for the above parameter depends on the total number of mapper and reducer tasks per TaskTracker. If you need to know more about those settings, refer to the following post:
--------
http://www.maninmanoj.com/2015/05/part-1-hadoop-optimization-cpu-related.html
--------

Some tips to calculate this parameter are:

Scenario 1: 

1) Let's consider 3 mappers and 3 reducers per TaskTracker, with 16 GB total RAM in each machine.
2) In this scenario, a total of 6 tasks can run on any TaskTracker at the same time.
3) Let us assume 2-4 GB of RAM is needed for the TaskTracker and other work on the machine. Reserving the full 4 GB leaves about 12 GB of RAM available for Hadoop tasks.
4) Now we can divide 12/6 and get 2 GB of RAM per task.
5) The value in this case will be -Xmx2048M

Scenario 2:

1) Let us consider 12 mappers and 4 reducers per TaskTracker, with 32 GB total RAM in each machine.
2) In this scenario, a total of 16 tasks can run on a TaskTracker at the same time.
3) Let us assume 2-4 GB of RAM is needed for the TaskTracker and other work on the machine. Reserving the full 4 GB leaves about 28 GB of RAM available for Hadoop tasks.
4) Now we can divide 28/16 and get 1.75 GB of RAM per task.
5) The value in this case will be -Xmx1750M (1.75 GB is exactly 1792 MB, so 1750 is a slightly lower round figure)
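The arithmetic in both scenarios can be written as a small helper. This is only a sketch: the function names child_heap_mb and xmx_flag are made up for illustration, and the 4 GB reserve is the upper end of the 2-4 GB assumption used above. It converts GB to MB with a factor of 1024, so Scenario 2 comes out at 1792 MB rather than the rounded 1750 MB.

```python
def child_heap_mb(total_ram_gb, reserved_gb, map_slots, reduce_slots):
    """Split the RAM left after the reserve evenly across all task slots (result in MB)."""
    slots = map_slots + reduce_slots
    if slots <= 0:
        raise ValueError("need at least one task slot")
    available_mb = (total_ram_gb - reserved_gb) * 1024
    if available_mb <= 0:
        raise ValueError("reserve is larger than total RAM")
    # Round down so the combined heaps never exceed the available RAM.
    return available_mb // slots


def xmx_flag(heap_mb):
    """Format a heap size in MB as a JVM -Xmx option."""
    return f"-Xmx{heap_mb}m"


# Scenario 1: 16 GB RAM, 4 GB reserved, 3 mappers + 3 reducers
print(xmx_flag(child_heap_mb(16, 4, 3, 3)))    # -Xmx2048m
# Scenario 2: 32 GB RAM, 4 GB reserved, 12 mappers + 4 reducers
print(xmx_flag(child_heap_mb(32, 4, 12, 4)))   # -Xmx1792m
```

Rounding the result down a little further, as Scenario 2 does, simply leaves a bit more headroom for everything else on the machine.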

For Scenario 2, the property to be added in "mapred-site.xml" is as below:
----------
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1750m</value>
</property>
----------
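Before rolling out a value like this, it can help to check that all task heaps together still fit into the RAM left for Hadoop. The sketch below is one possible way to do that, not an official tool: the helper names (child_opts, xmx_mb, fits_in_ram) are hypothetical, the snippet is assumed to sit inside the usual <configuration> root element, and the slot count and RAM figures are the ones from Scenario 2. Keep in mind that -Xmx only limits the heap, so each JVM will use somewhat more memory than this number.

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample: the property above wrapped in a <configuration> root element.
SNIPPET = """
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1750m</value>
  </property>
</configuration>
"""


def child_opts(xml_text):
    """Return the value of mapred.child.java.opts, or None if it is not set."""
    root = ET.fromstring(xml_text.strip())
    for prop in root.iter("property"):
        if prop.findtext("name") == "mapred.child.java.opts":
            return prop.findtext("value")
    return None


def xmx_mb(opts):
    """Extract the -Xmx size from a JVM options string and return it in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", opts)
    if match is None:
        raise ValueError("no -Xmx setting with a k/m/g suffix found")
    factor = {"k": 1 / 1024, "m": 1, "g": 1024}[match.group(2).lower()]
    return int(match.group(1)) * factor


def fits_in_ram(heap_mb, slots, total_ram_gb, reserved_gb):
    """True if all task heaps together fit into the RAM left after the reserve."""
    return heap_mb * slots <= (total_ram_gb - reserved_gb) * 1024


heap = xmx_mb(child_opts(SNIPPET))
# Scenario 2: 16 task slots, 32 GB RAM, 4 GB reserved
print(heap, fits_in_ram(heap, 16, 32, 4))   # 1750 True
```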

In my next post, I will explain the disk I/O related parameter.
