Oracle海量数据优化-01分区的渊源

  • 思维导图
  • 概述
  • DELETE与系统资源
    • delete 操作消耗的资源
    • truncate 和 drop 操作消耗的资源
    • DML操作会随着数据量的增加消耗的资源也会相应的增加
  • DELETE与释放空间


思维导图

Oracle海量数据优化-01分区的渊源_第1张图片


概述

delete from t where create_time < to_date('2017-01-01','yyyy-mm-dd');

当我们看到这条语句时,会想到什么呢? 一条再简单不过的按照条件删除数据库的操作。 如果大量存在,会不会引起系统性能问题呢?

当这个表的数据足够大时,按照这样的方式来清除数据,代价无疑是非常高昂的。

我们提出使用分区的方式来解决这个问题。比如 保留历史数据的骨折是,删除最早一个月的数据,按照这个需求,将需要删除数据的表,改为分区表。 按照每个月一个分区的方式创建分区表。

这样做的结果是,以前的delete操作就变成了下面的操作:

alter table t drop partition p_name

或者 

alter table t truncate partiton p_name ;

这个操作对已一个几GB数据的分区表来讲,执行时间不超过10S。

比起使用分区来操作数据,DELETE方式操作数据的问题是显而易见的,至少存在如下两点:

  • 消耗大量的系统资源
  • 无法释放空间

DELETE与系统资源

delete 操作消耗的资源

在DELETE操作中,SQL语句首先要扫描表或者索引找到符合条件的记录,然后把他们删除,这个过程中将消耗大量的CPU资源和产生大量的I/O,同时还会产生大量的undo数据。

下面是一个简单的栗子,我们可以看出delete相对于drop或者truncate对资源的消耗情况:

SQL> drop table t purge ;  --->  不放入回收站,直接删除

Table dropped

SQL> create table t as select a.OBJECT_ID,a.OBJECT_NAME from dba_objects a;

Table created

SQL> exec dbms_stats.gather_table_stats(user,'t');---->表分析

PL/SQL procedure successfully completed

SQL> 

注释:执行drop table xx 语句
drop后的表被放在回收站(user_recyclebin)里,而不是直接删除掉。这样,回收站里的表信息就可以被恢复,或彻底清除。
通过查询回收站user_recyclebin获取被删除的表信息,然后使用语句
flashback table to before drop [rename to
];
将回收站里的表恢复为原名称或指定新名称,表中数据不会丢失。
若要彻底删除表,则使用语句:drop table purge;
清除回收站里的信息
清除指定表:purge table ;
清除当前用户的回收站:purge recyclebin;
SQL> purge recyclebin;
Done
清除所有用户的回收站:purge dba_recyclebin;
不放入回收站,直接删除则是:drop table xx purge;


oracle@entel1:[/oracle]$sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 17 22:17:53 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> conn xgj/smart
Connected.
SQL> 
SQL> set autotrace trace exp stat; -->显示执行计划和统计结果
SQL>  delete from t where t.object_id < 10000;

9811 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3335594643

---------------------------------------------------------------------------
| Id  | Operation      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |      |  4420 | 22100 |    43   (0)| 00:00:01 |
|   1 |  DELETE        | T    |   |   |        |      |
|*  2 |   TABLE ACCESS FULL| T    |  4420 | 22100 |    43   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T"."OBJECT_ID"<10000)


Statistics
----------------------------------------------------------
     18  recursive calls
      10465  db block gets
    162  consistent gets
      0  physical reads
    2640572  redo size
    831  bytes sent via SQL*Net to client
    797  bytes received via SQL*Net from client
      3  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
       9811  rows processed

SQL> 

SET AUTOT 用法

用法: SET AUTOT[RACE] {OFF | ON | TRACE[ONLY]} [EXP[LAIN]]
[STAT[ISTICS]]

SET AUTOT[RACE] OFF 停止AutoTrace SET AUTOT[RACE] ON
开启AutoTrace,显示AUTOTRACE信息和SQL执行结果 SET AUTOT[RACE] TRACEONLY
开启AutoTrace,仅显示AUTOTRACE信息 SET AUTOT[RACE] ON EXPLAIN
开启AutoTrace,仅显示AUTOTRACE的EXPLAIN信息 SET AUTOT[RACE] ON
STATISTICS开启AutoTrace,仅显示AUTOTRACE的STATISTICS信息

在这种情况下,DELETE操作产生了 10465 db block gets + 162 consistent gets 数据块读,同时产生了 2640572 redo size 的 redo日志 (2640572 /1024/1024 = 2.5M 大小)

即使我们给这个表创建了索引,依然无法避免这种资源消耗

SQL> rollback;

Rollback complete.

SQL> create index ind_t on t(object_id);

Index created.

SQL> exec dbms_stats.gather_index_stats(user,'ind_t');

PL/SQL procedure successfully completed.

SQL> delete from t where t.object_id < 10000;

9811 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3974964266

---------------------------------------------------------------------------
| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |   |  4420 | 22100 |    11   (0)| 00:00:01 |
|   1 |  DELETE       | T     |   |   |        |      |
|*  2 |   INDEX RANGE SCAN| IND_T |  4420 | 22100 |    11   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."OBJECT_ID"<10000)


Statistics
----------------------------------------------------------
     15  recursive calls
      10596  db block gets
     36  consistent gets
      0  physical reads
    2801592  redo size
    837  bytes sent via SQL*Net to client
    796  bytes received via SQL*Net from client
      3  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
       9811  rows processed

SQL> rollback;

Rollback complete.

可以看到即使加了索引,资源的消耗情况差不多,都是 10596 db block gets + 36 consistent gets 。


truncate 和 drop 操作消耗的资源

相反的 truncate 和 drop 操作消耗的资源就小很多。

案例

Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 
Connected as xxx@xgj

SQL> --->创建分区表
SQL> create table t_part (object_id int ,object_name varchar2(2000) )
  2         partition by range(object_id)
  3              (
  4               partition p1 values less than (6000),
  5               partition p2 values less than (12000),
  6               partition p3  values less than (18000),
  7               partition p4  values less than (24000),
  8               partition p5 values less than (30000),
  9               partition pm values less than (maxvalue)
 10              );

Table created

SQL> insert into t_part select object_id ,object_name from dba_objects;

35238 rows inserted

SQL> commit;

Commit complete

SQL> select count(1)  from t_part  partition(p1);

  COUNT(1)
----------
      5995

SQL> 

下面我们使用sql_trace来跟踪下 drop和truncate 之类的DDL操作的相关信息。

SQL> alter session set tracefile_identifier='xgj_20170118';

Session altered

SQL> alter session set sql_trace=true;

Session altered

SQL> alter table t_part truncate partition p1;

Table truncated

SQL> truncate table t_part;

Table truncated

SQL>  insert into t_part select object_id ,object_name from dba_objects;

35238 rows inserted

SQL> commit;

Commit complete

SQL> alter table t_part drop partition p1;

Table altered

SQL> alter session set sql_trace=false;

Session altered

SQL> 

开启SQL跟踪后,会生成一个trace文件,通过初始化参数user_dump_dest配置其所在目录,该参数的值可以通过下面方法获取到:

SQL> select name, value from v$parameter where name = 'user_dump_dest' ;

NAME                                   VALUE
--------------------------- -----------------------------
user_dump_dest               /oracle/diag/rdbms/cc/cc/trace

这里写图片描述

以上是产生的trace原文件,我们通过tkprof来查看一下:

oracle@entel1:[/oracle/diag/rdbms/cc/cc/trace]$tkprof cc_ora_14104_xgj_20170118.trc   cc_ora_14104_xgj_20170118.log explain=xxx/xgj sys=no ;

TKPROF: Release 11.2.0.4.0 - Development on Wed Jan 18 16:35:18 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

oracle@entel1:[/oracle/diag/rdbms/cc/cc/trace]$

以下是截取cc_ora_14104_xgj_20170118.log中的关键信息:

truncate partition p1

********************************************************************************

SQL ID: 4gj9z8z8707sx Plan Hash: 491940592

alter table t_part truncate partition p1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          5          0           0
Execute      1      0.01       0.38         20          1         69           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.01       0.38         20          6         69           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38  (xxx)
********************************************************************************

truncate table

********************************************************************************

SQL ID: cv84y6hnt1dy2 Plan Hash: 491940592

truncate table t_part


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.02       1.54         80          4        266           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.02       1.54         80          4        266           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38  (xxx)
********************************************************************************

drop partition p1

********************************************************************************

SQL ID: 7wwjnfwgcxgza Plan Hash: 0

alter table t_part drop 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.69          1          1         10           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.69          1          1         10           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38  (xxx)

从结果可以看到,DDL对分区操作消耗的资源要远小于DML的

操作 数据块的读取次数
delete 1000+
truncate 6
drop 1

可以看出DDL的操作要远小于DML消耗的资源。 DDL的操作主要是对数据字典的修改,这个值基本上是确定的,而DML会随着数据量的增加,消耗的资源也会相应的增加


DML操作会随着数据量的增加,消耗的资源也会相应的增加

oracle@entel1:[/oracle/diag/rdbms/cc/cc/trace]$sqlplus /nolog

SQL*Plus: Release 11.2.0.4.0 Production on Wed Jan 18 18:04:57 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> conn xxx/xgj
Connected.
SQL> select count(1) from t;

  COUNT(1)
----------
     35249

SQL> set autotrace trace exp stat ; 
SQL> delete from t where object_id < 10000;

9811 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3974964266

---------------------------------------------------------------------------
| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |   |  4420 | 22100 |    11   (0)| 00:00:01 |
|   1 |  DELETE       | T     |   |   |        |      |
|*  2 |   INDEX RANGE SCAN| IND_T |  4420 | 22100 |    11   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<10000)


Statistics
----------------------------------------------------------
     45  recursive calls
      10940  db block gets
     52  consistent gets
      0  physical reads
    2806724  redo size
    831  bytes sent via SQL*Net to client
    794  bytes received via SQL*Net from client
      3  SQL*Net roundtrips to/from client
      3  sorts (memory)
      0  sorts (disk)
       9811  rows processed

SQL> rollback;

Rollback complete.

SQL> delete from t where object_id < 20000;

13938 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3974964266

---------------------------------------------------------------------------
| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |   |  8842 | 44210 |    21   (0)| 00:00:01 |
|   1 |  DELETE       | T     |   |   |        |      |
|*  2 |   INDEX RANGE SCAN| IND_T |  8842 | 44210 |    21   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<20000)


Statistics
----------------------------------------------------------
     17  recursive calls
      15016  db block gets
     47  consistent gets
      0  physical reads
    3988132  redo size
    836  bytes sent via SQL*Net to client
    794  bytes received via SQL*Net from client
      3  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      13938  rows processed

SQL> rollback;

Rollback complete.

SQL> delete from t where object_id <30000;

14403 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3974964266

---------------------------------------------------------------------------
| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |   | 13263 | 66315 |    31   (0)| 00:00:01 |
|   1 |  DELETE       | T     |   |   |        |      |
|*  2 |   INDEX RANGE SCAN| IND_T | 13263 | 66315 |    31   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<30000)


Statistics
----------------------------------------------------------
     17  recursive calls
      15508  db block gets
     48  consistent gets
      0  physical reads
    4121320  redo size
    837  bytes sent via SQL*Net to client
    793  bytes received via SQL*Net from client
      3  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      14403  rows processed

SQL> 

从结果可以看到,消耗的资源和删除的数据量是同步增长的

删除的数据量 数据块的读取次数
object_id<10000 10940 + 52
object_id<20000 15016 +47
object_id<30000 15508+48

truncate的消耗

t_part 表
Oracle海量数据优化-01分区的渊源_第2张图片

t_part_2表

SQL> create table t_part_2 (object_id int ,object_name varchar2(2000) )
  2         partition by range(object_id)
  3              (
  4               partition p1 values less than (20000),
  5               partition pm values less than (maxvalue)
  6              );

Table created

SQL> insert into t_part_2 select object_id ,object_name from dba_objects;

35239 rows inserted

SQL> commit;

Commit complete

SQL> alter session set tracefile_identifier='xgj_truncate';

Session altered

SQL> alter session set sql_trace=true;

Session altered

SQL> alter table t_part_2 truncate partition p1;

Table truncated

SQL> alter session set sql_trace=false;

Session altered

SQL> 

Oracle海量数据优化-01分区的渊源_第3张图片

我们可以看到,truncate操作时,数据块读的次数10次左右,所以DDL操作并没有随着操作的数据增加而明显的增加,它只取决于需要操作的字典表的数据量。


DELETE与释放空间

DELETE操作并不能释放出空间,也就是说,删除了哪个表的数据,疼出来的空间还是只能是那个表使用,并不能让给其他的对象使用,因为delete操作并不能使高水位线下降。 如果需要通过清除一个表中的部分数据来腾出空间给其他的对象,这个操作时不行的。

而truncate或者drop操作,就可以释放出空间给其他的对象使用。

下面举例说明:

我们还是用我们上面创建的t_part分区表为例

SQL> select * from user_extents  a where a.segment_name='T_PART'; 

SEGMENT_NAME                                                  PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME  EXTENT_ID      BYTES     BLOCKS
-------------------------------------------------------------------------------- ------------------------------ ------------------ ------------------------------ ---------- ---------- ----------
T_PART                                                                   P1                           TABLE PARTITION TAB_ZMC                       0    8388608       1024
T_PART                                                                   P2                           TABLE PARTITION TAB_ZMC                       0    8388608       1024
T_PART                                                                   P3                           TABLE PARTITION TAB_ZMC                       0    8388608       1024
T_PART                                                                   PM                         TABLE PARTITION TAB_ZMC                       0    8388608       1024
T_PART                                                                   P5                           TABLE PARTITION TAB_ZMC                       0    8388608       1024

SQL> select count(1)  from user_extents  a where a.segment_name='T_PART'; 

  COUNT(1)
----------
         5

SQL> 

最初的时候,T_PART表的数据占据了5个EXTENTS(区)

我们通过delete删除部分数据

SQL> select count(1)  from t_part;

  COUNT(1)
----------
     35245

SQL> delete from t_part a where a.object_id<30000;

14403 rows deleted

SQL> commit;

Commit complete

SQL> select count(1)  from user_extents a where a.segment_name='T_PART';

  COUNT(1)
----------
         5

SQL> 

我们删除了14403 条数据后,发现T_PART仍然占据着5个EXTENDS,也就是说虽然表中的数据删除了,但是空间只能是T_PART自己使用,并没有将控件返回给数据库。

我们使用truncate 来试下 ,用T表的数据吧,比较多

SQL> select count(1)  from  user_extents a where a.segment_name='T';

  COUNT(1)
----------
        17

SQL> select count(1)  from t;

  COUNT(1)
----------
     35245

SQL> truncate table t;

Table truncated

SQL>  select count(1)  from  user_extents a where a.segment_name='T';

  COUNT(1)
----------
         1

SQL> select count(1)  from 
  2  t;

  COUNT(1)
----------
         0

SQL> 

可以看出 通过truncate 之后 T表占用的数据区从17个降为1个,大部分空间都返回给了数据库,这部分空间是可以分配给其他数据使用的。

还有一种方式,就是如果这个表上有许多碎片,可以通过整理表的空间来达到释放空间的目的,命令如下

alter table  t  move ;

我们用T表来做下试验,还原T表的数据

SQL> insert into t select object_id ,object_name from dba_objects;

35245 rows inserted
SQL> commit;

Commit complete

SQL> select count(1)  from user_extents a where a.segment_name='T';

  COUNT(1)
----------
        17

SQL> delete from t ; --->通过delete方式删除数据

35245 rows deleted

SQL> commit;

Commit complete

SQL> select count(1)  from user_extents a where a.segment_name='T';

  COUNT(1)
----------
        17

SQL> alter table t move;--->重新整理数据

Table altered

SQL>  select count(1)  from user_extents a where a.segment_name='T';

  COUNT(1)
----------
         1

SQL> 

我们看到,使用alter table move 操作,表的空间基本上已经全部释放出来了,只剩下一个EXTENT(因为当前已经是一个空表了)。

通过alter table move 方式让表中的数据重新存储,这样可以释放出来一些空间,但是没这个操作基本上等同于将表中的数据重新分布一次,如果表中的数据比较庞大,这将是一个非常耗时的操作。


实际上delete的操作只适用删除非常少量的数据,并且需要在索引存在的情况下适用。

案例说明:

SQL> select count(1)  from t;

  COUNT(1)
----------
         0
SQL> insert into t select object_id ,object_name from dba_objects;

35245 rows inserted
SQL> commit;

Commit complete
SQL> select  a.index_name ,a.index_type ,a.table_name  from user_indexes a where a.table_name = 'T'; --->查看表上的索引

INDEX_NAME       INDEX_TYPE      TABLE_NAME
------------------------------ --------------------------- ------------------------------
IND_T                    NORMAL           T

SQL> drop index ind_t ;-->先删掉索引

Index dropped

SQL>  select  a.index_name ,a.index_type ,a.table_name  from user_indexes a where a.table_name = 'T';

INDEX_NAME       INDEX_TYPE      TABLE_NAME
------------------------------ --------------------------- ------------------------------

SQL> 

看看执行计划吧

oracle@entel1:[/oracle]$sqlplus /nolog

SQL*Plus: Release 11.2.0.4.0 Production on Wed Jan 18 20:40:16 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> conn xxx/xgj
Connected.
SQL> set autotrace trace exp stat;
SQL> delete from t where object_id=888;

1 row deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3335594643

---------------------------------------------------------------------------
| Id  | Operation      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |      | 1 | 5 |    43   (0)| 00:00:01 |
|   1 |  DELETE        | T    |   |   |        |      |
|*  2 |   TABLE ACCESS FULL| T    | 1 | 5 |    43   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_ID"=888)


Statistics
----------------------------------------------------------
      4  recursive calls
      4  db block gets
    209  consistent gets
      0  physical reads
      0  redo size
    830  bytes sent via SQL*Net to client
    790  bytes received via SQL*Net from client
      3  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      1  rows processed

SQL> 

在没有索引的情况下,即使删除很少的数据,Oracle也会将全表扫描一遍(分区表除外)。

此时,Oracle读取了209 个数据块,如果我们肩上索引,效果会好很多。

SQL> create index ind_t on t(object_id);

Index created.

SQL> delete from t where object_id=888;

1 row deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3974964266

---------------------------------------------------------------------------
| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |   | 1 | 5 | 1   (0)| 00:00:01 |
|   1 |  DELETE       | T     |   |   |        |      |
|*  2 |   INDEX RANGE SCAN| IND_T | 1 | 5 | 1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=888)


Statistics
----------------------------------------------------------
      1  recursive calls
      5  db block gets
      2  consistent gets
      4  physical reads
      0  redo size
    834  bytes sent via SQL*Net to client
    790  bytes received via SQL*Net from client
      3  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      1  rows processed

SQL> 

我们可以看到仅仅读取了2个数据块,2个一致性读,非常高效。

你可能感兴趣的:(【Oracle通用优化】,Oralce手札)