Hive job fails because the YARN temporary directory runs out of space

Creating a new partitioned table from an existing Hive table failed with the following problem:

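  1. The source Hive table holds 160 GB of data (user access records for all applications and servers over three months).
  2. The new table selects only the required fields and is partitioned by application, server IP, and access date.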
  3. Statements used to create the table and load the data:
    -- Create the table
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    CREATE TABLE IF NOT EXISTS app_trace(
      trace_id string,
      client_ip string,
      user_device string,
      user_id string,
      user_account string,
      org_id string,
      org_name string,
      org_path string,
      org_parent_id string,
      url string,
      completed boolean,
      cost int,
      create_time bigint,
      parameters map<string,string>,
      subtrace array<string>
    )
    PARTITIONED BY (app_id int, server_ip string, create_date string)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\|'
      COLLECTION ITEMS TERMINATED BY '\$'
      MAP KEYS TERMINATED BY '\:'
    STORED AS SEQUENCEFILE;

    -- Load the data; the dynamic partition columns come last in the SELECT list
    INSERT OVERWRITE TABLE app_trace PARTITION (app_id, server_ip, create_date)
    SELECT
      trace_id, client_ip, user_device, user_id, user_account,
      org_id, org_name, org_path, org_parent_id, url,
      completed, cost, create_time, parameters, subtrace,
      app_id, server_ip, create_date
    FROM user_trace;
  4. Hive error message:
    Task with the most failures(4):
    -----
    Task ID:
    task_1418272031284_0203_r_000071

    URL:
    http://HADOOP-5-101:8088/taskdetails.jsp?jobid=job_1418272031284_0203&tipid=task_1418272031284_0203_r_000071
    -----
    Diagnostic Messages for this Task:
    Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
    Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:221)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:250)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:208)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:476)
    at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
    Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:219)
    ... 11 more


    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    MapReduce Jobs Launched:
    Job 0: Map: 282 Reduce: 80 Cumulative CPU: 12030.1 sec HDFS Read: 79178863622 HDFS Write: 15785449373 FAIL
    Total MapReduce CPU Time Spent: 0 days 3 hours 20 minutes 30 seconds 100 msec

Investigation showed the following:

  1. HDFS storage is healthy
    [jyzx@HADOOP-5-101 main_disk]$ hdfs dfs -df -h
    Filesystem Size Used Available Use%
    hdfs://HADOOP-5-101:8020 8.9 T 625.9 G 7.8 T 7%
  2. The local disk on the DataNode is almost full
    [jyzx@HADOOP-5-101 main_disk]$ df -h
    Filesystem Size Used Avail Use% Mounted on
    /dev/mapper/VolGroup-lv_root
    50G 46G 837M 99% /
    tmpfs 7.8G 56K 7.8G 1% /dev/shm
    /dev/cciss/c0d0p1 485M 32M 428M 7% /boot 
  3. The directory causing the problem
    /hadoop/yarn/local/usercache

    [root@HADOOP-6-199 local]# du -h --max-depth=1
    4.0K ./usercache_DEL_1411698127772
    4.0K ./usercache_DEL_1411700964513
    4.0K ./usercache_DEL_1411713191383
    4.0K ./usercache_DEL_1418272057670
    4.0K ./usercache_DEL_1411699568217
    628K ./filecache
    4.0K ./usercache_DEL_1411713338641
    7.2G ./usercache
    4.0K ./usercache_DEL_1411698079868
    4.0K ./usercache_DEL_1411713240205
    104K ./nmPrivate
    7.2G .
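  4. /hadoop/yarn/local/usercache
    is the local directory of the YARN NodeManager. It sits on the nearly full root filesystem, and it is where tasks write intermediate data such as the shuffle merge output that failed in the stack trace above.
    yarn.nodemanager.local-dirs=/hadoop/yarn/local/usercache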
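
The manual df check above can be turned into a small reusable test. The following is a minimal sketch, not part of the original troubleshooting: the function names usage_pct and check_dir and the 90% threshold are hypothetical, and it assumes the directory exists on the node where it runs.

```shell
#!/bin/sh
# Print the usage percentage (integer, without the % sign) of the
# filesystem that holds the path given as $1.
usage_pct() {
    # -P forces POSIX single-line records, so long device names such as
    # /dev/mapper/VolGroup-lv_root do not wrap onto a second line.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether the filesystem holding $1 is at or above $2 percent full.
# Returns 0 when below the threshold, 1 when at or above it, 2 on error.
check_dir() {
    pct=$(usage_pct "$1")
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge "$2" ]; then
        echo "WARN $1 is ${pct}% full"
        return 1
    fi
    echo "OK $1 is ${pct}% full"
}

# Example call for the directory from this post (threshold is an assumption):
# check_dir /hadoop/yarn/local/usercache 90
```

Running check_dir on every NodeManager host before a large job would show which nodes are close to the "No space left on device" condition.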
     

Solution

  • Change the YARN setting yarn.nodemanager.local-dirs so that it points to a larger volume
  • yarn.nodemanager.local-dirs=/mnt/disk1/hadoop/yarn/local/usercache
  • Restart the YARN cluster
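
In yarn-site.xml the setting above is written as a property element. The sketch below prints that element for a given directory; the helper name render_local_dirs is hypothetical, and the path is the one from this post. The property accepts a comma-separated list, so several disks can be given in one value.

```shell
#!/bin/sh
# Print the yarn-site.xml <property> block for yarn.nodemanager.local-dirs.
# $1 is the directory, or a comma-separated list of directories.
render_local_dirs() {
    cat <<EOF
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>$1</value>
</property>
EOF
}

# Print the block for the new location used in this post.
render_local_dirs /mnt/disk1/hadoop/yarn/local/usercache
```

The printed block goes inside the configuration element of yarn-site.xml on each NodeManager host. The new directory must exist and be writable by the account that runs the NodeManager. How the restart is done depends on the installation: a plain Apache Hadoop 2.x setup typically uses the yarn-daemon.sh script under sbin, while a managed cluster would restart through its management console.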

 
