Several ways to optimize Hive SQL

In my current project the data volume is huge, and a single SQL job was taking over an hour to run, so I collected some optimization points and am recording them here for future reference.

1. Map optimization: when a job spends too long in the map task phase

set mapreduce.map.memory.mb=8240;       -- memory per map container (MB)
set mapreduce.reduce.memory.mb=8240;    -- memory per reduce container (MB)
set hive.merge.mapfiles=false;          -- skip the small-file merge step after map-only jobs
set mapreduce.input.fileinputformat.split.maxsize=50000000;  -- cap splits at ~50 MB so more map tasks run in parallel
set hive.exec.max.created.files=300000; -- raise the limit on files a single job may create
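
As a minimal sketch of how these settings are used (the table and columns below are hypothetical, for illustration only), the `set` statements go at the top of the script, before the slow query runs:

```sql
-- Map-phase tuning: with ~50 MB splits, a scan-heavy query fans out
-- over many more map tasks than with the default split size.
set mapreduce.map.memory.mb=8240;
set mapreduce.reduce.memory.mb=8240;
set hive.merge.mapfiles=false;
set mapreduce.input.fileinputformat.split.maxsize=50000000;
set hive.exec.max.created.files=300000;

-- Hypothetical heavy scan that benefits from more mappers.
select dt, count(*) as cnt
from dwd.some_large_table
group by dt;
```

The trade-off to keep in mind: a smaller `split.maxsize` means more mappers and more task-scheduling overhead, so this helps only when each map task was previously doing too much work.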

2. Settings for inserting data from a non-partitioned table into a partitioned table

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;  -- combine small input files into fewer splits
set hive.merge.mapfiles = true;         -- merge small output files of map-only jobs
set hive.merge.mapredfiles = true;      -- merge small output files of map-reduce jobs
set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=4096;
set mapred.reduce.slowstart.completed.maps=1;  -- start reducers only after all maps finish
set hive.exec.dynamic.partition=true;          -- enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict; -- allow all partition columns to be dynamic
set hive.exec.max.dynamic.partitions.pernode=100000;  -- max dynamic partitions per node
set hive.exec.max.dynamic.partitions=100000;          -- max dynamic partitions overall
set hive.exec.max.created.files=300000;



For example, you can also add "distribute by <partition column>" on the partition column, so that all rows for the same partition go to the same reducer:
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set hive.merge.mapfiles = true;
set hive.merge.mapredfiles = true;
set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=4096;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.dynamic.partitions=100000;
set mapred.reduce.slowstart.completed.maps=1;
set hive.exec.max.created.files=300000;
insert overwrite table dwd.indicator_system partition (dt)
select * from temp.mbrq_qd_
distribute by dt;

3. For dynamic partitioning, the following settings can be applied together as one optimization block

set mapreduce.map.memory.mb=8240;
set mapreduce.reduce.memory.mb=8240;
set hive.merge.mapfiles=false;
set hive.exec.dynamic.partition=true;            -- enable dynamic partitioning (true by default on recent Hive, but set explicitly)
set hive.exec.dynamic.partition.mode=nonstrict;
set mapreduce.input.fileinputformat.split.maxsize=50000000;
set hive.exec.max.dynamic.partitions=500000;         -- raise the overall dynamic-partition limit
set hive.exec.max.dynamic.partitions.pernode=500000; -- raise the per-node dynamic-partition limit
set hive.exec.max.created.files=300000;
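
A minimal sketch of a dynamic-partition load using the block above (the table and column names are hypothetical):

```sql
-- Hive derives the dt partition value for each row from the last
-- column of the select list, so no static partition spec is needed.
insert overwrite table dwd.target_table partition (dt)
select col1, col2, dt
from ods.source_table;
```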

4. Parameter settings suitable for the initial load of a large data volume:

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.max.split.size=256000000;   -- 256M
set mapred.min.split.size.per.node=100000000;  -- 100M
set mapred.min.split.size.per.rack=100000000;  -- 100M
set hive.merge.mapfiles = true;
set hive.merge.mapredfiles = true;
set hive.merge.size.per.task = 256000000;   -- 256M
set hive.merge.smallfiles.avgsize=16000000;   -- 16M
set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=4096;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set mapred.reduce.slowstart.completed.maps=1;
set mapreduce.job.running.map.limit=2000;    -- cap concurrently running map tasks per job
-- cap the maximum number of concurrently running reduce tasks for a single Hive job
set mapreduce.job.running.reduce.limit=500;
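
A sketch of an initial full load under these settings (the table names are hypothetical). The running-map/reduce limits keep one large backfill job from monopolizing the cluster, and `distribute by` on the partition column limits the number of output files per partition:

```sql
-- Backfill all historical partitions in one job; small outputs are
-- merged afterwards because hive.merge.mapfiles/mapredfiles are on.
insert overwrite table dwd.history_table partition (dt)
select id, metric, dt
from ods.history_staging
distribute by dt;   -- route each partition's rows to one reducer
```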
