今天有个需求,就是:指定200W用户(表meids_tmp),把这些用户最近15天的应用使用数据(表tb_yl_upload_info,按天分区)转移到另外一张表中(表upload_info_sub,按天分区)。
很直观,meids_tmp表63M,可以使用map端连接;要求目标表数据按日期组织,自然想到动态分区,使数据插入时自动按日期写入。最终,得到如下sql:
SELECT /*+mapjoin(b)*/ a.sn, a.isenabled, a.netflow, a.icount, a.pkg_name, a.isdelete, a.label, a.vername, a.vercode , a.ispreset, a.apksize, a.firsttime, a.usagetime, a.uptime, a.parsed, a.aid, a.pkgid, a.meid, a.wifi, a.mobile, a.batchid, a.parsed1, a.model, a.install_type, a.ds AS ds FROM tb_yl_upload_info a JOIN meids_tmp b ON (a.ds>='2015-09-05' AND a.ds<'2015-09-15' AND a.meid=b.meid);
首先配置好HIVE动态分区的参数:
set hive.exec.dynamic.partition.mode=nonstrict; set hive.exec.dynamic.partition=true;
但是运行之后,所有的map均失败,没有一个正常完成。命令行一直显示如下信息直到失败退出:
2015-09-22 13:41:40,794 Stage-1 map = 0%, reduce = 0% 2015-09-22 13:42:42,836 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 8708.64 sec 2015-09-22 13:43:43,635 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12395.12 sec 2015-09-22 13:44:44,877 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12781.82 sec 2015-09-22 13:45:45,270 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12781.82 sec 2015-09-22 13:46:45,408 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12781.82 sec 2015-09-22 13:47:47,019 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12781.82 sec 2015-09-22 13:48:48,130 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12781.82 sec 2015-09-22 13:49:48,657 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12781.82 sec 2015-09-22 13:50:49,195 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 12781.82 sec
最先想到的就是mapjoin出的问题,显式加上mapjoin相关的参数:
set hive.auto.convert.join.noconditionaltask.size=200000000; set hive.mapjoin.smalltable.filesize=200000000; set hive.auto.convert.join.noconditionaltask=true; set hive.ignore.mapjoin.hint=false;//有以上三个配置了,其实这个配置是多余的 set mapred.max.split.size=100000000;//把每个map的输入数据配小点(默认bolcksize:256M)
再次跑程序,还是失败,现象完全一样。最后想到,既然map失败,那就去看看每个map的jvm到底在做什么:jmap -heap 查看每个map的内存使用情况,jstat -gc 查看每个map任务垃圾回收是否正常,jstack 查看每个map任务的工作线程,垃圾回收线程到底处于什么状态。最后挑选其中一个nodemanager上的map任务进行查看,结果部分 如下:
JMap -heap :
Heap Usage: PS Young Generation Eden Space: capacity = 262668288 (250.5MB) used = 262668288 (250.5MB) free = 0 (0.0MB) 100.0% used From Space: capacity = 43515904 (41.5MB) used = 0 (0.0MB) free = 43515904 (41.5MB) 0.0% used To Space: capacity = 43515904 (41.5MB) used = 0 (0.0MB) free = 43515904 (41.5MB) 0.0% used PS Old Generation capacity = 699400192 (667.0MB) used = 699265240 (666.8712997436523MB) free = 134952 (0.12870025634765625MB) 99.98070460924323% used
Jstat -gc:(一次Full垃圾回收很耗时,结合JMap -heap 的结果:老年代一直处于99.9%的使用率,说明map一直在进行垃圾回收,所以map工作现程根本没跑)
YGC YGCT FGC FGCT GCT 2 0.366 4 52.569 52.934 2 0.366 4 52.569 52.934 2 0.366 4 52.569 52.934 2 0.366 4 79.129 79.495 2 0.366 5 79.129 79.495 2 0.366 5 79.129 79.495 2 0.366 5 79.129 79.495 2 0.366 5 79.129 79.495 2 0.366 5 79.129 79.495
jstack结果(工作线程状态为blocked,多试几次发现一直是BLOCKED,所以也呼应了上面jstat和jmap的结果):
Thread 5863: (state = BLOCKED) - java.util.ArrayList.<init>(int) @bci=37, line=144 (Compiled frame) - org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer.<init>() @bci=25, line=48 (Compiled frame) - org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(java.io.ObjectInputStream) @bci=141, line=78 (Compiled frame) - org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainer[], org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe[]) @bci=208, line=98 (Interpreted frame) - org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable() @bci=78, line=155 (Interpreted frame) - org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp() @bci=17, line=181 (Interpreted frame) - org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() @bci=1, line=1032 (Interpreted frame) - org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() @bci=41, line=1036 (Interpreted frame) - org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() @bci=41, line=1036 (Interpreted frame) - org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() @bci=41, line=1036 (Interpreted frame)
解决办法:既然是内存不够用,那就增加map的内存再说,在hive的命令行设置如下参数:
set mapreduce.map.memory.mb=2000;//mapred-sit.xml里是1100 set mapreduce.map.java.opts=-Xmx1900m -Xms1900m -Xmn200m;
然后在运行sql语句,发现成功了。至此问题解决。但是有一个问题就是,其实map的内存本来设置的值为1G,已经相当大了,还存在浪费,为什么这次一定要这么大的内存配置呢?其实,解决办法也很简单,就是把map任务的jvm heap给dump出来:jmap -dump 。然后用可视化的jconsole等工具,具体查看就行。但是我没时间去分析。初步估计一下,应该是tb_yl_upload_info表里有很多中文字符,而且本来这张表的字段就很多,java jvm里存储字符串本来就要很多额外开销,因此会导致内存不够用。另一方面,map中配置的一些参数如mapreduce.task.io.sort.mb也会消耗掉一部分内存,所以。。。总之,问题都是有原因的。