Wednesday 27 May 2015

PART 1: HADOOP OPTIMIZATION : CPU RELATED PARAMETERS

Performance tuning and benchmarking are interesting areas for any Hadoop administrator. In order to find a bottleneck, the administrator should have a thorough knowledge of Hadoop properties. There are numerous properties in Hadoop, but only some of them play an important role in finding bottlenecks. We will discuss those in this post and in the posts that follow.

For performance tuning, we can consider four main areas: CPU, disk I/O, memory, and network.

In this post, we will discuss the CPU-related parameters:

----------------
mapred.tasktracker.map.tasks.maximum ---> The maximum number of map tasks that will be run simultaneously by a task tracker.

mapred.tasktracker.reduce.tasks.maximum ---> The maximum number of reduce tasks that will be run simultaneously by a task tracker.
-----------------

Before starting with tuning, please go through the following post and make sure you clearly understand the benchmarking tools.

--------
http://www.maninmanoj.com/2015/05/bench-marking-tools-in-hadoop.html
--------

First, decide the maximum number of map and reduce tasks that a task tracker will run simultaneously. These two parameters are the ones most relevant to CPU utilization. The default value of both parameters is 2.

Increasing these values to match the CPU capacity of the cluster nodes raises CPU utilization and, as long as the nodes are not oversubscribed, improves performance.

For example, let us assume that each node of the cluster has 4 CPUs, each CPU has 2 cores, and each core supports simultaneous multi-threading with 2 hardware threads. The total number of processes running at once should then be no more than 4x2x2=16. Since the DataNode and TaskTracker daemons take 2 of those slots, at most 14 slots remain for map/reduce tasks, and splitting them evenly gives 7 as a good starting value for both parameters.
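The same arithmetic can be written as a small helper, which makes it easy to redo for nodes with different hardware. This is only a sketch: the function name slots_per_node and its arguments are made up for illustration, and the even split between map and reduce slots is the assumption used in the example above, not a rule from Hadoop.

```python
def slots_per_node(cpus, cores_per_cpu, threads_per_core, reserved=2):
    """Return (map_slots, reduce_slots) for one node.

    reserved is the number of hardware threads set aside for the
    DataNode and TaskTracker daemons (2 in the example above).
    """
    hardware_threads = cpus * cores_per_cpu * threads_per_core
    available = hardware_threads - reserved
    if available < 2:
        raise ValueError("not enough hardware threads for map and reduce slots")
    # Assumption: split the remaining slots evenly between map and reduce.
    map_slots = available // 2
    reduce_slots = available - map_slots
    return map_slots, reduce_slots


# 4 CPUs x 2 cores x 2 threads = 16, minus 2 daemons = 14 slots
print(slots_per_node(4, 2, 2))  # (7, 7)
```

Treat the result as a starting point and confirm it with the benchmarking tools before settling on a value.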


The properties in mapred-site.xml will then look as below:

-----------
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>7</value>
<description>The maximum number of map tasks that will be run simultaneously by a task tracker.
</description>
</property>

<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>7</value>
<description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.
</description>
</property>
-----------
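After editing the file, it can be handy to confirm which slot limits it actually contains. The following is a minimal sketch using only the Python standard library. The helper name read_slot_limits is hypothetical, and the sample text assumes the properties sit inside the usual <configuration> root element of mapred-site.xml.

```python
import xml.etree.ElementTree as ET

SLOT_PROPERTIES = (
    "mapred.tasktracker.map.tasks.maximum",
    "mapred.tasktracker.reduce.tasks.maximum",
)


def read_slot_limits(xml_text):
    """Return a dict of the slot properties found in the given XML text."""
    root = ET.fromstring(xml_text)
    limits = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in SLOT_PROPERTIES and value:
            limits[name] = int(value)
    return limits


# Sample content; in practice, read the text of your own mapred-site.xml.
SAMPLE = """
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>7</value>
  </property>
</configuration>
"""

limits = read_slot_limits(SAMPLE)
for name in SLOT_PROPERTIES:
    print(name, "=", limits.get(name, "not set"))
```

A property that is missing from the file is reported as "not set", in which case the default of 2 applies.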

I will discuss the memory-related parameters in my next post.
