Columnar Compression in TimesTen

Source: http://ggsig.blogspot.jp/2012/02/columnar-compression-in-timesten.html
MOS Doc ID 1454651.1 was also consulted.

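Columnar compression is a feature introduced in TimesTen 11.2.2. As in Oracle, the so-called compression is really deduplication of repeated data, implemented with a technique similar to a look-up table.
For the syntax, see the SQL Reference. The figure below shows how it works: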
[Figure 1: Columnar compression in TimesTen]
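The dictionary table stores the distinct values of the compressed column. The original table stores only pointers; the actual data lives in the dictionary table.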
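The idea can be sketched outside the database. The following Python snippet is only an illustration of dictionary encoding in general, not TimesTen's internal implementation; the function names and the sample column are made up for this example.

```python
def dictionary_encode(values):
    """Replace each value with a pointer into a list of distinct values."""
    dictionary = []   # distinct values, in first-seen order
    index_of = {}     # value -> position in `dictionary`
    pointers = []     # one small integer per original row
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        pointers.append(index_of[v])
    return dictionary, pointers


def dictionary_decode(dictionary, pointers):
    """Follow each pointer back to the stored value."""
    return [dictionary[p] for p in pointers]


# Hypothetical column with many repeated values.
column = ["NY", "LA", "NY", "NY", "LA", "SF"]
dictionary, pointers = dictionary_encode(column)
print(dictionary)  # ['NY', 'LA', 'SF']
print(pointers)    # [0, 1, 0, 0, 1, 2]
```

Each row keeps only a small pointer, while every distinct value is stored once; decoding follows the pointers back into the dictionary.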
Let's start with an example to see how this works:

create table emp_comp (
id number not null,
val_1   varchar2(40),
val_2   varchar2(40),
val_3   varchar2(40))
compress ((val_1,val_2) by dictionary maxvalues = 255,
val_3        by dictionary maxvalues = 255) 
optimized for read;

Command> desc emp_comp;

Table ORACLE.EMP_COMP:
  Columns:
    ID                              NUMBER NOT NULL
    VAL_1                           VARCHAR2 (40) INLINE
    VAL_2                           VARCHAR2 (40) INLINE
    VAL_3                           VARCHAR2 (40) INLINE
  COMPRESS ( ( VAL_1, VAL_2 ) BY DICTIONARY MAXVALUES=255, 
             VAL_3 BY DICTIONARY MAXVALUES=255 ) OPTIMIZED FOR READ

1 table found.
(primary key columns are indicated with *)

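In the example above, the COMPRESS ... OPTIMIZED FOR READ clause enables columnar compression. COMPRESS defines the compressed columns or column groups: here val_1 and val_2 form compression group 1, and val_3 forms compression group 2.
MAXVALUES determines the pointer size; with at most 255 distinct values, a single byte is enough to store the address.
OPTIMIZED FOR READ defines the compression level. Only this one level is currently supported, and it also signals that columnar compression suits read-heavy query and analytics workloads rather than DML-heavy transactional ones.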
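As a rough sketch of how MAXVALUES maps to pointer width: the text only states that 255 distinct values fit in one byte. The 2-byte and 4-byte thresholds below follow my reading of the TimesTen documentation and should be treated as assumptions; `pointer_bytes` is a hypothetical helper, not a TimesTen API.

```python
def pointer_bytes(maxvalues):
    """Bytes needed for a dictionary pointer (thresholds are assumptions)."""
    if not 1 <= maxvalues <= 2**32 - 1:
        raise ValueError("maxvalues must be between 1 and 2**32 - 1")
    if maxvalues <= 255:        # fits in 1 byte, as stated in the text
        return 1
    if maxvalues <= 65535:      # assumed 2-byte range
        return 2
    return 4                    # assumed 4-byte range


print(pointer_bytes(255))      # 1
print(pointer_bytes(1000000))  # 4
```

Choosing the smallest MAXVALUES that still covers the expected number of distinct values keeps the per-row pointer as narrow as possible.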

Command> alltables;
  ORACLE.CD$_1086064_2
  ORACLE.CD$_1086064_4
  ORACLE.EMP_COMP

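As shown, the system created two dictionary tables for EMP_COMP, one for each of the two compressed column groups.
Dictionary table names have the format:
"CD$_" + the table identifier (SYS.TABLES.TBLID) + "_" + compressed column number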
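The naming rule can be written as a small helper. This is a hypothetical Python function for illustration only. The observation that the column number is the 1-based position of the group's first column (val_1 is column 2, val_3 is column 4) is inferred from this example and is an assumption, not a documented guarantee.

```python
def dictionary_table_name(tblid, column_number):
    """Build a dictionary table name from the pattern seen in this example."""
    return f"CD$_{tblid}_{column_number}"


# EMP_COMP has TBLID 1086064; its groups start at columns 2 and 4.
print(dictionary_table_name(1086064, 2))  # CD$_1086064_2
print(dictionary_table_name(1086064, 4))  # CD$_1086064_4
```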

Command> select tblname, tblid from sys.tables where tblname = 'EMP_COMP';
< EMP_COMP                       , 1086064 >
1 row found.

Once we know the table ID is 1086064, we can look up the related dictionary tables:

Command> select tblname from sys.tables where tblname like 'CD$_%1086064%';
< CD$_1086064_2                   >
< CD$_1086064_4                   >
2 rows found.
Command> desc CD$_1086064_2;

Table ORACLE.CD$_1086064_2:
  Columns:
   *VAL_1                           VARCHAR2 (40) INLINE
   *VAL_2                           VARCHAR2 (40) INLINE
    ##CD_REFCNT                     TT_INTEGER NOT NULL

1 table found.
(primary key columns are indicated with *)
Command> desc CD$_1086064_4;

Table ORACLE.CD$_1086064_4:
  Columns:
   *VAL_3                           VARCHAR2 (40) INLINE
    ##CD_REFCNT                     TT_INTEGER NOT NULL

1 table found.
(primary key columns are indicated with *)

The data dictionary exposes the following information about table compression:

Command> VARIABLE TABNAME VARCHAR2(50) := 'EMP_COMP';
select TABLE_NAME, COMPRESSION, COMPRESS_FOR from sys.all_tables
where TABLE_NAME = :TABNAME;
< EMP_COMP, ENABLED, QUERY HIGH >
1 row found.

The following query shows that 3 columns are compressed (NUMCOMPRESS = 3):

 select tblname, numcompress, valtblids  from sys.tables where TBLNAME=:TABNAME;
 < EMP_COMP                       , 3, 7892100088921000 >
1 row found.

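Next, a comparison of the space used by a table without compression and one with compression.
First create an uncompressed table, test2, and insert 1 million rows: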

create table test2 ( id number not null, val_1 varchar2(40), val_2 varchar2(40), val_3 varchar2(40) );

begin for i in 1 .. 1000000 loop insert into test2 values (i,'1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890');
 end loop;
end;
/

Then check the table size, which is 171539960 bytes:

Command> call ttComputeTabSizes('test2');
Command> tablesize test2;

Sizes of ORACLE.TEST2:

  INLINE_ALLOC_BYTES:   171470416
  NUM_USED_ROWS:        1000000
  NUM_FREE_ROWS:        192
  AVG_ROW_LEN:          171
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       69544
  TOTAL_BYTES:          171539960
  LAST_UPDATED:         2016-04-02 05:53:59.000000

1 table found.

Now create a table with only one compressed column, test2_comp, and insert 1 million rows:

create table test2_comp ( id number not null, val_1 varchar2(40), val_2 varchar2(40), val_3 varchar2(40)) compress (val_1 by dictionary maxvalues = 255) optimized for read;

begin for i in 1 .. 1000000 loop insert into test2_comp values (i,'1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890');
 end loop;
end;
/

Check the space used by the compressed table:

Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of ORACLE.TEST2_COMP:

  INLINE_ALLOC_BYTES:   131462736
  NUM_USED_ROWS:        1000000
  NUM_FREE_ROWS:        192
  AVG_ROW_LEN:          131
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       87612
  TOTAL_BYTES:          131550348
  LAST_UPDATED:         2016-04-02 05:59:20.000000

1 table found.

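The table size dropped from 171539960 to 131550348 bytes, a saving of about 38 MB.
Because all 1 million rows carry the same val_1 value, the expected saving is roughly (1000000-1)*40 = 39999960 bytes, which is close to the measured difference of 39989612 bytes.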
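The arithmetic can be checked with a few lines of Python, using only the TOTAL_BYTES figures reported by tablesize above. The estimate deliberately ignores the per-row pointer and the dictionary overhead, so it is only an approximation.

```python
uncompressed_total = 171539960  # TOTAL_BYTES of test2
compressed_total = 131550348    # TOTAL_BYTES of test2_comp

rows = 1_000_000
value_bytes = 40                # length of the repeated val_1 string

measured_saving = uncompressed_total - compressed_total
estimated_saving = (rows - 1) * value_bytes  # every copy but one is removed

print(measured_saving)                      # 39989612
print(estimated_saving)                     # 39999960
print(round(measured_saving / 2**20, 1))    # 38.1 (MiB)
```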

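One last experiment; note that maxvalues is now 1000000. Because the name test2_comp is reused, the table from the previous experiment has to be dropped first.
Create the table and insert just one row: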

create table test2_comp ( id number not null, val_1 varchar2(40), val_2 varchar2(40), val_3 varchar2(40)) compress (val_1 by dictionary maxvalues = 1000000) optimized for read;
insert into test2_comp values (1,'1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890');
Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of ORACLE.TEST2_COMP:

  INLINE_ALLOC_BYTES:   33648
  NUM_USED_ROWS:        1
  NUM_FREE_ROWS:        255
  AVG_ROW_LEN:          205
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       18832
  TOTAL_BYTES:          52480
  LAST_UPDATED:         2016-04-02 08:43:44.000000

1 table found.
Command> select tblname, tblid from sys.tables where tblname = 'TEST2_COMP';
< TEST2_COMP                     , 1086096 >
1 row found.
Command> select tblname from sys.tables where tblname like 'CD$_%1086096%';
< CD$_1086096_2                   >
1 row found.
Command> select count(*) from CD$_1086096_2;
< 1 >
1 row found.

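Here 33648 bytes is roughly the size of one page, which can hold 256 rows; 1 row is in use. The space used by the dictionary table is included in the 18832 bytes of metadata.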

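Next, insert another 256 rows. The data grows by one page (INLINE_ALLOC_BYTES goes from 33648 to 67296), while the metadata, that is, the dictionary table space, stays the same: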

begin
 for i in 2 .. 257 loop
 insert into test2_comp
 values (i,'1234567890123456789012345678901234567890',
 '1234567890123456789012345678901234567890',
 '1234567890123456789012345678901234567890');
 end loop;
end;
/
Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of ORACLE.TEST2_COMP:

  INLINE_ALLOC_BYTES:   67296
  NUM_USED_ROWS:        257
  NUM_FREE_ROWS:        255
  AVG_ROW_LEN:          168
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       18832
  TOTAL_BYTES:          86128
  LAST_UPDATED:         2016-04-02 08:54:14.000000

1 table found.
Command> select count(*) from CD$_1086096_2;
< 1 >
1 row found.

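Insert another 255 rows, which exactly fills the NUM_FREE_ROWS slots. Neither the data pages nor the metadata space changes. Note that this time distinct values are inserted into val_1: CD$_1086096_2 ends up with 256 rows, meaning there are 256 distinct values.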

begin
 for i in 257 .. 511 loop
 insert into test2_comp
 values (i,'1234567890123456789012345678901000000'||i,
 '1234567890123456789012345678901234567890',
 '1234567890123456789012345678901234567890');
 end loop;
end;
/
Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of ORACLE.TEST2_COMP:

  INLINE_ALLOC_BYTES:   67296
  NUM_USED_ROWS:        512
  NUM_FREE_ROWS:        0
  AVG_ROW_LEN:          168
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       18832
  TOTAL_BYTES:          86128
  LAST_UPDATED:         2016-04-02 08:58:35.000000

1 table found.
Command> select count(*) from CD$_1086096_2;
< 256 >
1 row found.
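The page behaviour seen in the three measurements above can be reproduced with simple arithmetic. This Python sketch takes the page size (33648 bytes) and the rows per page (256) from the tablesize output in this experiment; they are observations for this particular table, not general TimesTen constants.

```python
import math

PAGE_BYTES = 33648     # INLINE_ALLOC_BYTES observed for one page
ROWS_PER_PAGE = 256    # NUM_USED_ROWS + NUM_FREE_ROWS for one page


def inline_alloc_bytes(total_rows):
    """Inline bytes allocated when rows are stored in whole pages."""
    pages = math.ceil(total_rows / ROWS_PER_PAGE)
    return pages * PAGE_BYTES


def free_rows(total_rows):
    """Unused row slots left in the allocated pages."""
    pages = math.ceil(total_rows / ROWS_PER_PAGE)
    return pages * ROWS_PER_PAGE - total_rows


for rows in (1, 257, 512):
    print(rows, inline_alloc_bytes(rows), free_rows(rows))
# 1 33648 255
# 257 67296 255
# 512 67296 0
```

These values match the INLINE_ALLOC_BYTES and NUM_FREE_ROWS figures reported after each insert step.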

In short, the more duplicate data there is, the higher the compression ratio.

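Columnar compression also has some restrictions, for example:
* LOB columns cannot be compressed
* Cache tables and replicated tables do not support compression either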
