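Dear reader: I am glad this article is being read, but writing and editing original content takes effort, so if you repost it, please credit the source and include a link to this article as well as to the author's blog: https://blog.csdn.net/vensmallzeng. If you find the article useful, a like would be much appreciated. Thank you to every reader. To contact the author, please note the email address: [email protected]. Thanks for your cooperation!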
1. Building your own map in Hive
-- Look up one key ('1303') in a map built per mid from key/value rows.
select
    mid,
    hotel_type_map['1303'] as hotel_type
from (
    -- Step 2: join the "key:value" pairs with ',' and parse them into a map.
    select
        mid,
        str_to_map(concat_ws(',', collect_list(concat(key, ':', value)))) as hotel_type_map
    from (
        -- Step 1: if a key has several values, merge them with '-'.
        select
            mid,
            key,
            concat_ws('-', collect_list(value)) as value
        from (
            select 1 as mid, '1303' as key, '特色住宿' as value
            union all
            select 1 as mid, '1305' as key, '别墅' as value
            union all
            select 1 as mid, '1304' as key, '青年旅' as value
            union all
            select 1 as mid, '1306' as key, '客栈' as value
            union all
            select 1 as mid, '682' as key, '农家乐' as value
            union all
            select 1 as mid, '681' as key, '民宿' as value
            union all
            select 1 as mid, '680' as key, '公寓' as value
            union all
            select 1 as mid, '679' as key, '酒店' as value
        ) a
        group by mid, key
    ) t1
    group by mid
) t2;
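2. Common causes of data skew
①: Many null values (consider handling them with nvl).
②: A large table joined to a small table where some key values occur far too often (consider running the job on Spark instead).
③: In a group by, one key has far too many values (consider splitting the group by into separate steps and then joining the results).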
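The first and third remedies can be sketched as follows. This is only an illustration under assumptions: the tables `orders(user_id, amount)`, `users(user_id, name)` and `events(key)` are hypothetical, `user_id` is assumed to be a string, and the salt range of 10 is arbitrary. The second query is one possible reading of "split the group by", using a two-stage aggregation rather than an explicit join.

```sql
-- Null keys: replace each null with a random string that matches no real key,
-- so the null rows spread across reducers instead of piling onto one.
select o.user_id, o.amount, u.name
from orders o
left join users u
  on nvl(o.user_id, concat('null_', cast(rand() as string))) = u.user_id;

-- Hot group-by key: aggregate per (key, salt) first, then merge the partial counts.
select key, sum(partial_cnt) as cnt
from (
    select key, salt, count(*) as partial_cnt
    from (
        select key, cast(floor(rand() * 10) as int) as salt
        from events
    ) s
    group by key, salt
) p
group by key;
```

The two-stage form only works for aggregates that can be merged from partial results, such as count, sum, min and max.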
3. Why a SQL job fails partway through
①: Not enough memory.
Add the following settings at the very top of the script:
set mapreduce.reduce.memory.mb=30064;
set mapreduce.map.memory.mb=20064;
set mapred.map.child.java.opts="-Xmx10240M";
set mapred.reduce.child.java.opts="-Xmx10240M";
set mapred.child.java.opts="-Xmx10240m";
set mapred.reduce.shuffle.memory.limit.percent = 0.06;
set mapred.reduce.shuffle.input.buffer.percent = 0.1;
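On a Hadoop 2.x or later (YARN) cluster, the same intent would probably be expressed with the `mapreduce.*` property names, since `mapred.map.child.java.opts` and `mapred.reduce.child.java.opts` are the older names for the heap options. The fragment below is a sketch under that assumption. It keeps the values above, writes the JVM options without quotes, and assumes the two shuffle settings are meant to target the `mapreduce.reduce.shuffle.*` properties. Check the names against your cluster's version before relying on it.

```sql
-- Container sizes in MB (the JVM heap below must fit inside them).
set mapreduce.map.memory.mb=20064;
set mapreduce.reduce.memory.mb=30064;
-- JVM heap for map and reduce tasks.
set mapreduce.map.java.opts=-Xmx10240m;
set mapreduce.reduce.java.opts=-Xmx10240m;
-- Shrink the in-memory shuffle buffers to reduce reducer memory pressure.
set mapreduce.reduce.shuffle.memory.limit.percent=0.06;
set mapreduce.reduce.shuffle.input.buffer.percent=0.1;
```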
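②: Data skew.
③: The code itself contains latent errors or redundant nesting. Debug what needs debugging and optimize what needs optimizing.
④: The code is already lean but still very long. Try materializing intermediate results in temporary tables to reduce the amount of computation.
⑤: Applying certain functions to a column with many empty values. For example, in "dense_rank() over (partition by t3.card_number order by t3.execute_time desc)" the column t3.execute_time contains many nulls. Filter the empty values out first with a where condition (where execute_time is not null and execute_time <> '') and only then apply the function.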
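Points ④ and ⑤ can be combined in one small sketch: filter the empty values into a temporary table first, then run the window function on the smaller table. The source table `card_events(card_number, execute_time)` is hypothetical, and `create temporary table` assumes Hive 0.14 or later.

```sql
-- Stage 1: keep only rows with a usable execute_time.
create temporary table tmp_card_events as
select card_number, execute_time
from card_events
where execute_time is not null
  and execute_time <> '';

-- Stage 2: rank each card's events, most recent first.
select
    card_number,
    execute_time,
    dense_rank() over (partition by card_number order by execute_time desc) as rk
from tmp_card_events;
```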
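4. Using "LATERAL VIEW explode"
A query that uses "LATERAL VIEW explode" is best written on its own rather than mixed into a query that also selects other columns. Otherwise it implicitly drops the rows where the exploded column is null. For example, LATERAL VIEW explode(split(substring(hotel_brand_id, 2, length(hotel_brand_id) - 2), ',')) ids as brand_id silently removes every row whose hotel_brand_id is null, because explode produces no output rows for a null array.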
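If the rows with a null hotel_brand_id need to be kept, Hive's LATERAL VIEW OUTER is one option: it emits a row with a null brand_id when explode produces nothing. A minimal sketch, assuming a hypothetical table `hotels(hotel_id, hotel_brand_id)` in which hotel_brand_id is a string such as '[12,34]':

```sql
select
    h.hotel_id,
    ids.brand_id  -- null when hotel_brand_id is null
from hotels h
lateral view outer explode(
    split(substring(h.hotel_brand_id, 2, length(h.hotel_brand_id) - 2), ',')
) ids as brand_id;
```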
These notes accumulate bit by bit and will keep growing as we learn together. To be continued.