Notes on Optimizing Large-Table Updates in Oracle

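Business background: In response to the national poverty-alleviation campaign and its demanding goal of eliminating poverty completely by the end of the year, the whole country has been working together on this historic task, which is now nearing completion. The Shaanxi Targeted Poverty Alleviation Big Data Platform handles data extraction and a variety of statistical jobs. The platform's defining characteristic is its very large data volume, so extraction jobs that involve multi-table joins run slowly and consume a lot of system resources.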

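Business scenario: This task back-fills 9 of the assistance measures from the measures table into the household table. Both tables are very large: the measures table holds roughly 19 million rows, and the household table holds 1.4 million rows for the current year plus roughly 8 million historical rows.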


The query-and-update below was started at 15:00 on 2020-12-03.

-- Write the fpxm_type_id values into the fcfs column of aa01_2014.
-- The subquery groups by household ID and year, then concatenates the
-- policy codes that the household received in that year.
update aa01_2014 
set fcfs = (select wm_concat(distinct fpxm_type_id) || ',' as fpxm_type_id from tbl_fpxm_poor_cs_temp 
where aa01_2014.aaa001 = tbl_fpxm_poor_cs_temp.poor_id 
and tbl_fpxm_poor_cs_temp.data_year = to_char(sysdate,'yyyy') 
group by poor_id,data_year)
where exists (select 1 from tbl_fpxm_poor_cs_temp where aa01_2014.aaa001 = tbl_fpxm_poor_cs_temp.poor_id);

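By the next morning the statement had been running for more than 1,100 minutes and still had not finished, so I abandoned this approach. The cause: the correlated subquery joins the two tables on the household ID and is evaluated once for every row being updated. In the worst case, with no usable index on poor_id, that is on the order of 1.4 million × 19 million ≈ 2.66 × 10^13 row comparisons, which might never finish. So I started optimizing.
The execution report for yesterday's update was a shock; it is shown below:

image.png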

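Optimization

Searching online, I found that the update can be written in a different form that saves one scan of the lookup table,
as follows: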

-- Form 1: correlated subquery plus EXISTS; tmp_cust_city is probed twice
update customers a   
set    city_name=(select b.city_name from tmp_cust_city b where b.customer_id=a.customer_id)
where  exists (select 1
              from   tmp_cust_city b
              where  b.customer_id=a.customer_id
             );

-- Form 2: update through an inline join view; the join is evaluated once
update (select a.city_name,b.city_name as new_name
            from   customers a,
                   tmp_cust_city b
            where  b.customer_id=a.customer_id
           )
    set    city_name=new_name;

However, I did not use this approach.
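One caveat about the join-view form above: as far as I know, Oracle only accepts it when the table being updated is key-preserved, meaning each customers row joins to at most one tmp_cust_city row, and that has to be guaranteed by a primary key or unique constraint on tmp_cust_city.customer_id. Otherwise the statement fails with ORA-01779. A minimal sketch, assuming customer_id really is unique and non-null in tmp_cust_city (the constraint name tmp_cust_city_pk is made up for illustration):

```sql
-- Assumption: tmp_cust_city has exactly one row per customer_id.
-- The constraint name is hypothetical.
alter table tmp_cust_city
  add constraint tmp_cust_city_pk primary key (customer_id);

-- With the key in place, customers is key-preserved in the join view,
-- so its city_name column can be updated through the view.
update (select a.city_name, b.city_name as new_name
        from   customers a,
               tmp_cust_city b
        where  b.customer_id = a.customer_id)
set    city_name = new_name;
```

If the lookup table can hold several rows per key, the MERGE form with an aggregated USING subquery is the safer choice.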

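What I used instead was the database's parallel execution.

Concepts:

I. Oracle parallel execution
Advantages: Oracle starts parallel server processes, divides the work and system resources among them, and merges the result sets, which can shorten elapsed time considerably. It works well for operations such as large-table queries and is especially useful for report statistics in ODS systems.

Disadvantages: it is resource-hungry and is not recommended when the system is already overloaded.

Note: in /*+ parallel(t,n) */, t is the table alias, or the table name when no alias is defined; n is the number of parallel processes, commonly set to the number of CPUs minus 1.

Example: SELECT /*+ parallel(a,16) */ distinct a.comcode FROM statcmain a where a.underwriteenddate BETWEEN DATE '2011-01-01' AND DATE '2014-01-31';
The parallel hint is generally used when:

1. The table is very large, more than ten million rows;
2. The database host has multiple CPUs;
3. The current system load is low.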
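Before picking a value for n, it may help to check what the instance actually offers. The sketch below assumes the session is allowed to read v$parameter. It also enables parallel DML for the session because, to my understanding, a parallel hint inside a subquery only parallelizes the query part of an UPDATE or MERGE unless parallel DML is switched on.

```sql
-- CPU count visible to the instance and the cap on parallel server processes.
select name, value
from   v$parameter
where  name in ('cpu_count', 'parallel_max_servers');

-- Optional: allow the DML part of UPDATE/MERGE to run in parallel as well.
-- Issue this before the transaction starts, not in the middle of one.
alter session enable parallel dml;
```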

MERGE INTO usage template

merge into target_table a
using source_table b
on (a.key_col1 = b.key_col1 and a.key_col2 = b.key_col2 ...)
when matched then update set a.col_to_update = b.col
when not matched then insert (col1, col2, ...) values (val1, val2, ...)

explain plan FOR merge into aa01_2014 a
                 using (select /*+parallel(tbl_fpxm_poor_cs_temp,16)*/ poor_id,wm_concat(distinct fpxm_type_id) || ','as fpxm_type_id 
                        from tbl_fpxm_poor_cs_temp left join aa01_2014 on aa01_2014.aaa001 =tbl_fpxm_poor_cs_temp.poor_id 
                        where tbl_fpxm_poor_cs_temp.data_year = to_char(sysdate,'yyyy') 
                        group by poor_id,data_year) temp
                 on (a.aaa001 = temp.poor_id) 
                 when matched then update set a.fcfs = temp.fpxm_type_id;
-- View the execution plan report
select plan_table_output from TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE'));
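Note that EXPLAIN PLAN only produces the optimizer's estimated plan; it does not run the statement. To see actual row counts and logical reads, one option is DBMS_XPLAN.DISPLAY_CURSOR after really executing the statement. The sketch below uses a harmless stand-in query on dual and assumes the session has access to the underlying v$ views; in SQL*Plus it is best run with serveroutput off so that the "last statement" is the one just executed.

```sql
-- Run the statement with runtime statistics collection enabled.
select /*+ gather_plan_statistics */ count(*) from dual;

-- Show the plan of the last statement executed in this session,
-- including actual rows (A-Rows) and logical reads (Buffers).
select plan_table_output
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```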

The timing is shown in the figure below.
Execution report with parallel execution enabled:

image.png

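Comparison: both statements perform the same update, yet the UPDATE report shows a time of 999:59:59 and 2,282,027 logical reads, while the MERGE INTO report shows 04.38 s and 964 logical reads. The gap is huge.
Looking at the execution plans, the result is easy to understand: the UPDATE works much like a nested loop, scanning the lookup table once for every row it updates, whereas the MERGE INTO here uses a hash join,
so each table gets a single full table scan.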
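A rough back-of-the-envelope model makes the gap concrete. It is a simplification that ignores indexes, caching, and the cost of building the hash table, and it plugs in the table sizes quoted earlier (1.4 million current-year household rows, 19 million measure rows):

```latex
\begin{aligned}
\text{nested loop (full scan per row):}\quad
  & N_{\text{hh}} \times N_{\text{cs}}
    = 1.4\times10^{6} \times 1.9\times10^{7}
    = 2.66\times10^{13} \text{ row visits} \\
\text{hash join (one scan per table):}\quad
  & N_{\text{hh}} + N_{\text{cs}}
    = 1.4\times10^{6} + 1.9\times10^{7}
    = 2.04\times10^{7} \text{ row visits}
\end{aligned}
```

Under these assumptions the hash join touches roughly a million times fewer rows, which is in line with the direction of the difference between the two reports.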

Fast cursor approach:

begin
  for tbl_fpxm_poor_cs_temp in (select poor_id,wm_concat(distinct fpxm_type_id) || ','as fpxm_type_id 
          from tbl_fpxm_poor_cs_temp left join aa01_2014 on aa01_2014.aaa001=tbl_fpxm_poor_cs_temp.poor_id
          where tbl_fpxm_poor_cs_temp.data_year = to_char(sysdate,'yyyy') 
          group by poor_id,data_year) loop
          update aa01_2014 set aa01_2014.fcfs = tbl_fpxm_poor_cs_temp.fpxm_type_id where aa01_2014.aaa001=tbl_fpxm_poor_cs_temp.poor_id;
          end loop;
end;

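Result of the fast cursor approach:

image.png

However, this version fails with an error saying that the distinct keyword cannot be used inside wm_concat(distinct fpxm_type_id), as shown:

image.png

So I rewrote the query to deduplicate in a subquery and concatenate with listagg: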

begin
  for tbl_fpxm_poor_cs_temp in (select poor_id,listagg(fpxm_type_id,',') within group( order by fpxm_type_id) || ',' as fpxm_type_id,data_year from 
                                       (select DISTINCT fpxm_type_id,data_year,poor_id from tbl_fpxm_poor_cs_temp where tbl_fpxm_poor_cs_temp.data_year = to_char(sysdate,'yyyy'))
                                        group by poor_id,data_year) loop
          update aa01_2014 set aa01_2014.fcfs = tbl_fpxm_poor_cs_temp.fpxm_type_id where aa01_2014.aaa001=tbl_fpxm_poor_cs_temp.poor_id;
          end loop;
end;
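The same listagg rewrite should carry over to the set-based MERGE as well. The sketch below is my own variation, not the statement that was timed earlier: it drops the join back to aa01_2014 inside the USING subquery (the ON clause already performs that match) and groups by poor_id only, since data_year is fixed by the filter. The alias fpxm_type_ids is introduced for illustration, and listagg needs Oracle 11g Release 2 or later.

```sql
merge into aa01_2014 a
using (
  -- One row per household: its distinct measure codes, comma-terminated.
  select poor_id,
         listagg(fpxm_type_id, ',') within group (order by fpxm_type_id) || ',' as fpxm_type_ids
  from  (select distinct poor_id, fpxm_type_id
         from   tbl_fpxm_poor_cs_temp
         where  data_year = to_char(sysdate, 'yyyy'))
  group by poor_id
) temp
on (a.aaa001 = temp.poor_id)
when matched then update set a.fcfs = temp.fpxm_type_ids;
```

Because this is a single statement, it avoids the row-by-row context switches of the cursor loop, at the price of one large transaction.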

An example of using a cursor when a large Oracle update is too slow:

declare cursor city_cur is
select t.new_customer_id,t.old_customer_id from
citsonline.crm_customer_tmp6 t
where t.new_customer_id!=t.old_customer_id
order by new_customer_id;
begin
for my_cur in city_cur loop

update platform.crm_service_customer_bak s
set s.customer_id=my_cur.new_customer_id
where s.customer_id=my_cur.old_customer_id;

/** Commit per row or in batches here to avoid holding locks for too long **/
if mod(city_cur%rowcount,1000)=0 then
dbms_output.put_line('----');
commit;
end if;
end loop;
commit;
end;

The statement after adapting it to this example:

declare
  cursor fpxm_cur is
    select /*+parallel(tbl_fpxm_poor_cs_temp,16)*/ poor_id,
           listagg(fpxm_type_id,',') within group( order by fpxm_type_id) || ',' as fpxm_type_id,
           data_year
    from 
      -- remove duplicates
      (select DISTINCT fpxm_type_id,data_year,poor_id from tbl_fpxm_poor_cs_temp where  tbl_fpxm_poor_cs_temp.data_year = to_char(sysdate,'yyyy'))
    group by poor_id,data_year;
begin
  for tbl_fpxm_poor_cs_temp in fpxm_cur loop
    update aa01_2014 set aa01_2014.fcfs = tbl_fpxm_poor_cs_temp.fpxm_type_id where aa01_2014.aaa001=tbl_fpxm_poor_cs_temp.poor_id;
    /** Commit per row or in batches here to avoid holding locks for too long **/
    if mod(fpxm_cur%rowcount,1000)=0 then
      dbms_output.put_line('----');
      commit;
    end if;
  end loop;
  commit;
end;

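image.png

The execution speed is clearly respectable. Oddly, removing the parallel hint made it even faster, at 3 s; I do not know why.
See the figure below:

image.png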

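Case study

Over the past few days I have been looking into Kettle, an open-source ETL tool developed abroad and written in pure Java. It makes data extraction very convenient, and it is worth downloading and trying if you have time. Conveniently, it needs no installation: unzip it and it is ready to use (the JDK and JRE must already be configured in the system environment). Kettle itself is not today's topic, though; the problem arose while using it. After extracting the data I needed to run the following updates:

update case_person_saxx a set a.case_id=(select id from case_xzcf b where b.app_id = a.app_id);

update invole_case_unit_saxx a set a.case_id=(select id from case_xzcf b where b.app_id = a.app_id);

The case_person_saxx and case_xzcf tables each hold roughly 160,000 rows, which is not especially large, yet these statements ran extremely slowly: after more than half an hour they still had not finished. Adding an index helped a little. I then found a faster way to write the update online (my database fundamentals are weak and I am unfamiliar with many statements, so experts, please don't laugh), as follows: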

merge into case_person_saxx t
using (select max(id) as id, app_id from case_xzcf group by app_id) s
on (t.app_id = s.app_id)
when matched then
  update set t.case_id = s.id;


merge into invole_case_unit_saxx t
using (select max(id) as id, app_id from case_xzcf group by app_id) s
on (t.app_id = s.app_id)
when matched then
  update set t.case_id = s.id;
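The text does not say which column the index mentioned above was built on; a plausible choice is the lookup column app_id on case_xzcf. A sketch under that assumption (the index name idx_case_xzcf_app_id is hypothetical), followed by a statistics refresh so the optimizer can take the new index into account:

```sql
-- Assumption: the correlated lookup is on case_xzcf.app_id.
-- The index name is hypothetical.
create index idx_case_xzcf_app_id on case_xzcf (app_id);

-- Refresh optimizer statistics for the table and its indexes.
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'CASE_XZCF', cascade => true);
end;
/
```

One difference worth keeping in mind: the original UPDATE statements have no WHERE clause, so rows without a match in case_xzcf get case_id set to NULL, and the subquery raises ORA-01427 if an app_id maps to several ids. The MERGE versions leave unmatched rows untouched and pick max(id), so the two forms are not strictly equivalent.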

Motto: if your talent cannot yet match your ambition, keep working hard!

References:

I. Execution report analysis:

explain plan FOR select 1 from dual;

select plan_table_output from TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE'));

II. Fast cursor method (approach 2)

III. Efficiency analysis of merge into versus a single update
