Friday 23 March 2018

How to troubleshoot a failed job in YARN


This article covers the basic workflow of a MapReduce job, with an example of how to troubleshoot a failing MapReduce job. We will also explain in detail how a NodeManager runs containers and maintains their logs.

Job Client

The job client is used to facilitate execution of a MapReduce job. When users write a MapReduce job, they typically invoke the job client in their main class to configure and launch the job. In this example, we will be using SleepJob to make the mapper tasks sleep for an extended period of time, so we can see how the NodeManager executes the job.

The call to job.waitForCompletion() will first create the /user/gpadmin/.staging directory if it does not exist, then create job.xml (job.<timestamp>.conf.xml) containing all the Hadoop parameters used to execute the job.
It also uploads the "hadoop-mapreduce-client-jobclient.jar" jar file used in the hadoop jar command into this directory, renaming it to job.jar. job.jar will then be used by all the containers to execute the MapReduce job.

Note: the .staging directory will be created under the path /user/${username}. In this article, gpadmin is the user.

After .staging is created, the job client submits the job to the ResourceManager's application manager service (port 8032). The job client then continues to monitor the execution of the job and reports the progress of the map and reduce containers back to the console. That is why you see "map 5% reduce 0%" while the job is running. Once the job completes, the job client returns some statistics about the job that it collected during execution. Remember that the job client gets map and reduce container statuses directly from the Application Master. We will talk a bit more about that later, but for now here is an example of running the sleep job, so it hangs for a really long time while we observe the map containers execute.
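A sleep job invocation might look like the following. This is a sketch: the jar path and name vary by distribution and version, and in many builds SleepJob ships in the jobclient tests jar.

```shell
# Illustrative path -- locate the jobclient (tests) jar for your distribution.
# 3 mappers, 1 reducer, each map task sleeping for 100000 ms.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    sleep -m 3 -r 1 -mt 100000
```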

Note: You can kill the MapReduce job using the following command:
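The application ID printed by the job client (for example application_1389385968629_0025, taken from this article's example job) identifies the application to kill:

```shell
# Kill a running YARN application by its application ID
yarn application -kill application_1389385968629_0025
```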

Application Master

Once the application manager service has decided to start running the job, it chooses one of the NodeManagers to launch the MapReduce Application Master class, which is called org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
The Application Master will be launched on one of the NodeManager servers running in the environment. The NodeManager selected by the ResourceManager largely depends on the available resources within the cluster. The NodeManager service generates shell scripts in its local application cache, which are used to execute the Application Master container.
Here we see the Application Master's container directory located in the NodeManager's ${yarn.nodemanager.local-dirs}, defined in yarn-site.xml.

The container executor class running in the NodeManager service will then use the generated launch script to execute the Application Master class. As shown below, all stdout and stderr logs are redirected to ${yarn.nodemanager.log-dirs}, defined in yarn-site.xml:

[gpadmin@hdw3 yarn]# tail -1 nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/
exec /bin/bash -c "$JAVA_HOME/bin/java -Dhadoop.root.logger=INFO,CLA  -Xmx1024m 1>/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stdout 2>/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stderr  "

Once launched, the Application Master issues resource allocation requests to the ResourceManager service for the map and reduce containers in the queue. When the ResourceManager determines that there are enough resources on the cluster to grant an allocation request, it informs the Application Master which NodeManager service is available to execute the container. The Application Master then sends a request to that NodeManager to launch the container.

Map or Reduce Container

The container executor class in the NodeManager does the same for a map or reduce container as it did for the Application Master container. All files and shell scripts are placed into the container's application cache within nm-local-dir:


Note: job.jar is only a soft link that points to the actual job.jar in the application's filecache directory. This is how YARN handles the distributed cache for containers:

[root@hdw1 yarn]# ls -l nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/
total 96
-rw-r--r-- 1 yarn yarn   108 Feb  1 15:25 container_tokens
-rwx------ 1 yarn yarn   450 Feb  1 15:25
lrwxrwxrwx 1 yarn yarn   122 Feb  1 15:25 job.jar -> /data/dn/yarn/nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/filecache/4395983903529068766/job.jar
-rw-r----- 1 yarn yarn 76430 Feb  1 15:25 job.xml
-rwx------ 1 yarn yarn  2898 Feb  1 15:25
drwx--x--- 2 yarn yarn  4096 Feb  1 15:25 tmp 

Note: by setting the yarn.nodemanager.delete.debug-delay-sec parameter, the container launch scripts and user cache above will remain on the system for the specified number of seconds; otherwise these files are deleted as soon as the application completes.
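For example, keeping the container directories around for 10 minutes after an application finishes could be configured in yarn-site.xml like this (the value of 600 is illustrative; pick whatever delay suits your debugging):

```xml
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
```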


Resulting log file locations
During run time, you will see all the container logs in ${yarn.nodemanager.log-dirs}:

[root@hdw3 yarn]# find userlogs/ -print
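The find output itself depends on what is running at the time, so here is a minimal sketch that recreates the typical userlogs layout with mock data. Every path below is illustrative, but each container directory normally holds stdout, stderr, and syslog files:

```shell
# Recreate the typical ${yarn.nodemanager.log-dirs} layout with mock data
base=$(mktemp -d)/userlogs
cdir=$base/application_1389385968629_0025/container_1389385968629_0025_01_000001
mkdir -p "$cdir"
# Each container directory normally holds these three log files
touch "$cdir/stdout" "$cdir/stderr" "$cdir/syslog"
# List the tree the same way the article does
find "$base" -print
```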

Once the job has completed, the NodeManager will keep the logs for each container for ${yarn.nodemanager.log.retain-seconds}, which is 10800 seconds (3 hours) by default, and delete them once they have expired. But if ${yarn.log-aggregation-enable} is set to true, the NodeManager will instead immediately concatenate all of the container logs into one file per node, upload them into HDFS under ${yarn.nodemanager.remote-app-log-dir}/${username}/logs/<application ID>, and delete them from the local userlogs directory. Log aggregation is enabled by default in PHD, and it makes log collection convenient.

Example when log aggregation is enabled: we know there were 4 containers executed in this MapReduce job, because "-m" specified 3 mappers and the fourth container is the Application Master. Each NodeManager ran at least one container, so all of them uploaded a log file.
[gpadmin@hdm1 ~]$ hdfs dfs -ls /yarn/apps/gpadmin/logs/application_1389385968629_0025/
Found 3 items
-rw-r-----   3 gpadmin hadoop       4496 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw1.hadoop.local_30825
-rw-r-----   3 gpadmin hadoop       5378 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw2.hadoop.local_36429
-rw-r-----   3 gpadmin hadoop    1877950 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw3.hadoop.local_42186


Targeting failed map tasks

Attached to this article is a modified version of the Apache SleepJob MapReduce job. It has been modified so that mapper tasks randomly fail 25% of the time, just so we can see how to locate the logs for the failed map container. If you run this yourself, the whole job might fail if the same map task fails too many times (4 attempts by default, controlled by mapreduce.map.maxattempts).
The following code was introduced to the SleepMapper class:
      // Introduce a 25% chance of failing the map task
      LOG.info("I am checking to see if i should punch myself and fail");
      Random rand = new Random();
      int n = rand.nextInt(4);
      if (n == 1) {
        LOG.info("KAPOW!! owwwwwyy... i failed");
        // Reconstructed: an unchecked exception fails the task attempt
        throw new RuntimeException("KAPOW!! owwwwwyy... i failed");
      }
      LOG.info("Punching myself seems like a bad idea.. i will just go to sleep instead");
  1. We see that containers start to fail partway through the job. However, they get retried and the MapReduce job moves on.
    14/02/01 17:05:42 INFO client.YarnClientImpl: Submitted application application_1389385968629_0026 to ResourceManager at hdm1.hadoop.local/
    14/02/01 17:05:42 INFO mapreduce.Job: The url to track the job: http://hdm1.hadoop.local:8088/proxy/application_1389385968629_0026/
    14/02/01 17:05:42 INFO mapreduce.Job: Running job: job_1389385968629_0026
    14/02/01 17:05:52 INFO mapreduce.Job: Job job_1389385968629_0026 running in uber mode : false
    14/02/01 17:05:52 INFO mapreduce.Job:  map 0% reduce 0%
    14/02/01 17:06:28 INFO mapreduce.Job:  map 10% reduce 0%
    14/02/01 17:06:29 INFO mapreduce.Job:  map 15% reduce 0%
    14/02/01 17:06:29 INFO mapreduce.Job: Task Id : attempt_1389385968629_0026_m_000007_0, Status : FAILED
    14/02/01 17:06:30 INFO mapreduce.Job:  map 20% reduce 0%
    14/02/01 17:06:31 INFO mapreduce.Job:  map 25% reduce 0%
    14/02/01 17:06:31 INFO mapreduce.Job: Task Id : attempt_1389385968629_0026_m_000008_0, Status : FAILED
  2. Let us take a closer look at attempt 8 and find out why this attempt failed. First we need to download the container logs from HDFS. You can also cat the logs directly in HDFS (hdfs dfs -cat) if they are too large to download.
    [gpadmin@hdm1 logs]$ hdfs dfs -copyToLocal /yarn/apps/gpadmin/logs/application_1389385968629_0026/* ./
    [gpadmin@hdm1 logs]$ ls
    hdw1.hadoop.local_30825  hdw2.hadoop.local_36429  hdw3.hadoop.local_42186
  3. Search for attempt ID 8. Sometimes the container executor will put non-ASCII data into the files, so "grep -a" allows text search of a binary file. We first grep for "to attempt_1389385968629_0026_m_000008_0" to see which container was assigned this attempt ID. We find that hdw1 is running the Application Master service, so it has all the history related to this attempt. It is important to know that the Application Master can run on the same node as a map container.
    [gpadmin@hdm1 logs]$ egrep -a "to attempt_1389385968629_0026_m_000008_0" *
    hdw1.hadoop.local_30825:2014-02-01 17:03:30,509 INFO [RMCommunicator Allocator] Assigned container container_1389385968629_0026_01_000010 to attempt_1389385968629_0026_m_000008_0
  4. Then check the hdw1 Application Master log to find out where container 10 was launched. We learn this container was running on hdw2, so let's find the details in the hdw2 log file.
    [gpadmin@hdm1 logs]$ egrep -a "container_1389385968629_0026_01_000010 on NM" hdw1.hadoop.local_30825
    2014-02-01 17:03:30,864 INFO [AsyncDispatcher event handler] TaskAttempt: [attempt_1389385968629_0026_m_000008_0] using containerId: [container_1389385968629_0026_01_000010 on NM: [hdw2.hadoop.local:36429] 
  5. The hdw2 log file shows the root cause: this map task decided to punch itself (see the last two log lines below). It is always good to check the container logs for failed map or reduce tasks, because sometimes the Java stack trace reported by the job client is from a subsequent error that is not related to the original cause of the failure.
    Container: container_1389385968629_0026_01_000010 (stderr, stdout, syslog)
    syslog:
    2014-02-01 17:03:48,809 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
    2014-02-01 17:04:11,481 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: eng.pivotal.hadoop.SleepJobFail$EmptySplit@61ab196b
    2014-02-01 17:04:11,580 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 134217724(536870896)
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 512
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 429496736
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 536870912
    2014-02-01 17:04:13,883 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 134217724; length = 33554432
    2014-02-01 17:04:13,953 INFO [main] eng.pivotal.hadoop.SleepJobFail$SleepMapper: I am checking to see if i should punch myself and fail
    2014-02-01 17:04:13,953 INFO [main] eng.pivotal.hadoop.SleepJobFail$SleepMapper: KAPOW!! owwwwwyy... i failed

Some notes:

How to collect the logs for a YARN application from the command line:

yarn logs -applicationId <application ID>

yarn logs -applicationId application_1520387241676_83284
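Depending on the Hadoop version, the aggregated logs of a single container can also be fetched; older 2.x releases additionally require -nodeAddress. The container ID below is illustrative:

```shell
# Fetch the logs of one specific container from the aggregated logs
yarn logs -applicationId application_1520387241676_83284 \
    -containerId container_1520387241676_83284_01_000002
```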

How to submit a pi job to the Marketing queue:

hadoop jar /usr/hdp/ pi -Dmapreduce.job.queuename=Marketing 10 10
