DB2 9: Row compression and large RIDs

Large RIDs

In DB2 Version 8, table and tablespace sizes are limited, as shown in Table 1. The table and tablespace size limits depend on the page size. The number of bytes used as a pointer is 3 bytes. Therefore, only 2 to the power of 24 units are available. This results in 16,777,216 pages. Because 1 byte is being used for the slot number in a single page, 255 multiplied by 16,777,216 rows can be addressed. Depending on the page size, the following limits exist:


Table 1. Tablespace limits on page size in DB2 V8
# of pages Page size Limit of table / tablespace
16,777,216 4 K 64 GB
16,777,216 8 K 128 GB
16,777,216 16 K 256 GB
16,777,216 32 K 512 GB

In DB2 9, the limits have been expanded. The number of bytes for the page address has been increased to 4 bytes, and the slot number now uses 2 bytes. Table 2 shows the table and tablespace limits in DB2 9.


Table 2. Tablespace limits on page size in DB2 9
# of pages Page size Limit of table / tablespace
536,870,912 4 K 2 TB
536,870,912 8 K 4 TB
536,870,912 16 K 8 TB
536,870,912 32 K 16 TB

Large RIDs are only supported in large tablespaces in DB2 9. This differs from DB2 Version 8, where a large tablespace was designed for LOBs and LONG data types only. Large tablespaces are the default mode in DB2 9. But when you migrate from DB2 8 to DB2 9, keep in mind that the regular tablespaces are not converted to large tablespaces. Consider in your migration plan that you may want to change the regular tablespaces into large tablespaces.

Row compression

The row compression feature in DB2 can be used to save storage space at the table level. The benefits are saving container space, smaller backup image size (and therefore reduced backup duration), and less page activity in the bufferpools. You can activate row compression for a single table. A dictionary is created that contains reusable patterns. For the patterns, a pointer is stored. It is possible to estimate the ratio of compression for each table by using the DB2 INSPECT command. You must reorganize a table to enforce the compression.

Compression and large RIDs

Row compression reduces the average size of a row. When using a regular tablespace (suppose you are migrating from DB2 Version 8 to DB2 9, where the regular tablespace will be kept), the limit is still fixed at 255 rows per page. Although you can reduce the average row length by using row compression, you are limited by the number of rows per page and you will waste storage space on the tablespace container.

Example on table ORDERS in database TPCH

The following example uses the table ORDER of a TPCH database (of 1 GB total size) to show the impact of the number of RIDs after using the row compression. A table is used with a small average row size. Therefore, the table has been modified by changing the initial maximum row length. The column O_COMMENT has been cut to a length of 20 characters (instead of 79). A page size of 16 K is used for the tablespace to store the table ORDERS. Using this page size, the effect of large RIDs can be explained very well.

The main steps in the following example are:

  1. Create a table ORDERS in a regular tablespace in DB2 Version 8, check the number of pages, average row size, and number of rows per page.
  2. Migrate the instance and the database to DB2 9.
  3. Create a new large tablespace in DB2 9.
  4. Create the table ORDERS2 like ORDER in the new large tablespace.
  5. Compress both tables, ORDERS and ORDERS2, and run reorg on them.
  6. Now you can compare the number of pages, average row size, and the number of rows per page of both tables. You can see the difference between a regular and a large tablespace.

Modify table design

Example 1 shows the structure of table ORDERS. The column O_COMMENT has been modified.


Example 1. DDL of table ORDERS
            db2inst1@mstar:~/test/compr_lrid> db2 describe table orders
            Column                         Type      Type
            name                           schema    name               Length   Scale Nulls
            ------------------------------ --------- ------------------ -------- ----- ------
            O_ORDERKEY                     SYSIBM    INTEGER                   4     0 No
            O_CUSTKEY                      SYSIBM    INTEGER                   4     0 No
            O_ORDERSTATUS                  SYSIBM    CHARACTER                 1     0 No
            O_TOTALPRICE                   SYSIBM    DECIMAL                  15     2 No
            O_ORDERDATE                    SYSIBM    DATE                      4     0 No
            O_ORDERPRIORITY                SYSIBM    CHARACTER                15     0 No
            O_CLERK                        SYSIBM    CHARACTER                15     0 No
            O_SHIPPRIORITY                 SYSIBM    INTEGER                   4     0 No
            O_COMMENT                      SYSIBM    VARCHAR                  20     0 No
            9 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>

The average row size (that is column AVG_ROW_SIZE in the query) decreases from 107 bytes of the original table to 79 bytes, see Example 2. The column ROWS_PER_PAGE has been calculated by dividing the number of rows (CARD) by the number of used pages (NPAGES).


Example 2. Numbers of table ORDERS in regular tablespace
            db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.sql
            SELECT SUBSTR(a.tabname,1,10) AS table, b.npages ,
            CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE,
            SUM(AVGCOLLEN) AVG_ROW_SIZE
            FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c
            WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid
            AND a.tabname = 'ORDERS'
            GROUP BY a.tabschema, a.tabname, pagesize, card, npages
            TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
            ---------- -------------------- -------------------- ------------
            ORDERS     8198                  182                           79
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>

In the regular tablespace, in DB2 Version 8, in non-compressed mode, 182 rows per-page are stored. In total these are 8198 pages. This is near the limit of 255 pages per row, so the page size of 16 K was a good choice. After migrating the instance and the database to DB2 9, the statistics must be refreshed. A new large tablespace ltb16K with a page size of 16 K will be created and a new table ORDERS2 will be created like the table ORDER in the new large tablespace. No indexes or constraints are used in the TPCH database. Therefore, the creation of the table is very simple. The data will be loaded by cursor, see Example 3.

Create the large tablespace


Example 3. Creation of table ORDERS2
            db2inst1@mstar:~/test/compr_lrid> ./cr_tbspace_ltb16k.sh
            Database Connection Information
            Database server        = DB2/LINUX 9.1.0
            SQL authorization ID   = DB2INST1
            Local database alias   = TPCH
            DROP TABLESPACE ltb16K
            DB21034E  The command was processed as an SQL statement because it was not a
            valid Command Line Processor command.  During SQL processing it returned:
            SQL0204N  "LTB16K" is an undefined name.  SQLSTATE=42704
            CREATE LARGE TABLESPACE ltb16K PAGESIZE 16 K MANAGED BY DATABASE
            USING ( FILE '/db2/db2inst1/TPCH/ltb16K.001' 20000 ) BUFFERPOOL bp16k
            DB20000I  The SQL command completed successfully.
            CREATE TABLE orders2 LIKE orders IN ltb16K
            DB20000I  The SQL command completed successfully.
            DB20000I  The SQL command completed successfully.
            DECLARE c1 CURSOR FOR SELECT * FROM orders
            DB20000I  The SQL command completed successfully.
            LOAD FROM c1 OF CURSOR INSERT INTO orders2
            SQL3501W  The table space(s) in which the table resides will not be placed in
            backup pending state since forward recovery is disabled for the database.
            SQL1193I  The utility is beginning to load data from the SQL statement "
            SELECT * FROM orders".
            SQL3500W  The utility is beginning the "LOAD" phase at time "2007-02-09
            23.03.50.673543".
            SQL3519W  Begin Load Consistency Point. Input record count = "0".
            SQL3520W  Load Consistency Point was successful.
            SQL3110N  The utility has completed processing.  "1500000" rows were read from
            the input file.
            SQL3519W  Begin Load Consistency Point. Input record count = "1500000".
            SQL3520W  Load Consistency Point was successful.
            SQL3515W  The utility has finished the "LOAD" phase at time "2007-02-09
            23.04.17.263410".
            Number of rows read         = 1500000
            Number of rows skipped      = 0
            Number of rows loaded       = 1500000
            Number of rows rejected     = 0
            Number of rows deleted      = 0
            Number of rows committed    = 1500000
            db2inst1@mstar:~/test/compr_lrid>

In the large tablespace, in DB2 9, in non-compressed mode, also 182 rows per-page are stored. The average row size (79) and the number of pages (8198) are the same as in the regular tablespace, see Example 4.


Example 4. Numbers of ORDERS2 in large tablespace
            db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.orders2.sql
            SELECT SUBSTR(a.tabname,1,10) AS table, PAGESIZE, b.CARD, b.npages ,
            CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE,
            SUM(AVGCOLLEN) AVG_ROW_SIZE FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c
            WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid
            AND a.tabname = 'ORDERS2'
            GROUP BY a.tabschema, a.tabname, pagesize, card, npages
            TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
            ---------- -------------------- -------------------- ------------
            ORDERS2                    8198                  182           79
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>

The catalog in DB2 9 contains a new column called AVGROWSIZE, which shows the average length (in bytes) of both compressed and uncompressed rows in a table. It is being retrieved for the tables ORDERS and ORDERS2 in Example 5. Both tables have 89 bytes as the average row size. This value differs a little from the calculated value of 79 bytes - calculated by the AVGCOLLEN of all columns.


Example 5. Retrieving the AVGROWSIZE in uncompressed mode
            db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE
            from syscat.tables where tabname = 'ORDERS2' "
            AVGROWSIZE
            ----------
            89
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>
            db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE
            from syscat.tables where tabname = 'ORDERS' "
            AVGROWSIZE
            ----------
            89
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>

Compress the tables

Now both tables are being compressed, see Example 6.

Note: A reorganization of the tables is necessary, the ALTER TABLE statement just updates the catalog content.


Example 6. Compressing the tables
            db2inst1@mstar:~/test/compr_lrid> db2 "alter table db2inst1.orders compress yes"
            DB20000I  The SQL command completed successfully.
            db2inst1@mstar:~/test/compr_lrid>
            db2inst1@mstar:~/test/compr_lrid> db2 "alter table db2inst1.orders2 compress yes"
            DB20000I  The SQL command completed successfully.
            db2inst1@mstar:~/test/compr_lrid>
            db2inst1@mstar:~/test/compr_lrid> time db2 -v "reorg table db2inst1.ORDERS
            resetdictionary"
            reorg table  db2inst1.ORDERS resetdictionary
            DB20000I  The REORG command completed successfully.
            real  0m58.335s
            user  0m0.018s
            sys 0m0.029s
            db2inst1@mstar:~/test/compr_lrid>
            db2inst1@mstar:~/test/compr_lrid> time db2 -v "reorg table  db2inst1.ORDERS2
            resetdictionary"
            reorg table  db2inst1.ORDERS2 resetdictionary
            DB20000I  The REORG command completed successfully.
            real  0m59.505s
            user  0m0.020s
            sys 0m0.028s
            db2inst1@mstar:~/test/compr_lrid> db2 "reorgchk update statistics
            on table db2inst1.orders " > reorgchk.orders.compr.out
            db2inst1@mstar:~/test/compr_lrid> db2 "reorgchk update statistics
            on table db2inst1.orders2 " > reorgchk.orders2.compr.out
            db2inst1@mstar:~/test/compr_lrid>

In Example 7, you can see that the number of pages in the regular tablespace is now 253, it is yet limited by 255 rows per page. However, the large tablespace allows 427 rows per-page for the table ORDERS2. Therefore, the number of pages used for table ORDERS2 is smaller than in table ORDERS. This shows the effect when compressing a table in a regular tablespace. You will still hit the limit of 255 pages. To avoid this, you have to use a large tablespace, then you are able to store more rows per page after the compression.


Example 7. Numbers after compressing
            db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.sql
            SELECT SUBSTR(a.tabname,1,10) AS table, PAGESIZE, b.CARD, b.npages ,
            CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE,
            SUM(AVGCOLLEN) AVG_ROW_SIZE FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c
            WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid
            AND a.tabname = 'ORDERS' GROUP BY a.tabschema, a.tabname, pagesize, card, npages
            TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
            ---------- -------------------- -------------------- ------------
            ORDERS                     5907                  253           79
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.orders2.sql
            SELECT SUBSTR(a.tabname,1,10) AS table, PAGESIZE, b.CARD, b.npages ,
            CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE,
            SUM(AVGCOLLEN) AVG_ROW_SIZE FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c
            WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid
            AND a.tabname = 'ORDERS2' GROUP BY a.tabschema, a.tabname, pagesize, card, npages
            TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
            ---------- -------------------- -------------------- ------------
            ORDERS2                    3512                  427           79
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>

The average row size (AVGROWSIZE) is the same, as shown in Example 8. It is now 38 bytes. Without compression it was 89 bytes (see Example 5).


Example 8. Retrieving the AVGROWSIZE
            db2inst1@mstar:~/test/compr_lrid>
            db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE
            from syscat.tables where tabname = 'ORDERS' "
            AVGROWSIZE
            ----------
            38
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE
            from syscat.tables where tabname = 'ORDERS2' "
            AVGROWSIZE
            ----------
            38
            1 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>

Introduced with the compression feature are some new columns in the catalog. Specifically, the column PCTPAGESSAVED shows the ratio of savings of pages after the compression. In this example, 57 percent of the pages were saved in the large tablespace, because the number was reduced from 8198 to 3512. The regular tablespace needs some more pages. The number of pages could be reduced from 8198 to 5907 pages, see Example 9. Using the large tablespace saves 40.5 percent of the page space (from 5.907 down to 3.512 pages).


Example 9. Saving in the tables
            db2inst1@mstar:~/test/compr_lrid> db2 "select SUBSTR(tabname,1,20) AS table, npages,
            from syscat.tables where tabschema = 'DB2INST1' and tabname LIKE  'ORDERS%' "
            TABLE                NPAGES               PCTPAGESSAVED
            -------------------- -------------------- ------------------------ -------------
            ORDERS                               5907                   27
            ORDERS2                              3512                   57
            2 record(s) selected.
            db2inst1@mstar:~/test/compr_lrid>

Summary

In Table 3, the results are summarized. The best savings are reached by using the large tablespace. Only 3.512 pages are necessary after the compression, the regular tablespace still needs 5.907 pages. The difference between the regular and the large tablespace is 5.907 pages - 3.512 pages, which results in 2.395 pages of wasted space.


Table 3. Comparing the average rows per page page size 16 K
Table Tablespace Mode CARD NPAGES AVGROWSIZE AVG ROWS PER PAGE
ORDERS REGULAR No compression 1.500.000 8.198 89 182
ORDERS REGULAR Compressed 1.500.000 5.907 38 253
ORDERS2 LARGE No compression 1.500.000 8.198 89 182
ORDERS2 LARGE Compressed 1.500.000 3.512 28 427

Table 4 compares the savings. In the table ORDERS, 28 percent of pages have been saved. In the table ORDERS2, 57 percent of the pages have been saved by the compression.


Table 4. Comparing the savings
Table Tablespace PCTPAGESSAVED NPAGES before compression NPAGES after compression SAVINGS
ORDERS REGULAR 27 8.198 5.907 28
ORDERS2 LARGE 570 8.198 3.512 57

When using a page size of 32 K, the compression in the regular tablespace fails with the warning SQL2220W. There is a minimum record length of 127 bytes for a 32 K regular tablespace (see Table 6). The average row size of 89 bytes is too low.


Table 6. Min and max record length
Page size Regular tablespace min record length Regular tablespace max record length Large tablespace min record length Large tablespace max record length
4 K 14 251 12 287
8 K 30 253 12 580
16 K 62 254 12 1165
32 K 127 253 12 2335

Conclusion: Best practices

To fully take advantage of the benefits of row compression in DB2 9, follow these best practices:

  • Refresh the statistics after migrating to DB2 9 (by running RUNSTATS or REORGCHK UPDATE STATISTICS).
  • Use the DB2 INSPECT command to check the savings on the tables to decide which tables are worth compressing.
  • A reorganization of tables with row compression is necessary for the compression to be effective.
  • When migrating from DB2 Version 8 to DB2 9, plan to migrate the regular tablespaces to large tablespaces.
  • When migrating to large tablespaces, consider scheduling both the data movement and the reorganization of the data.

Resources

Learn

  • "DB2 Information Center Home" links to the IBM DB2 Database for Linux, UNIX, and Windows Information Center for DB2 9.

  • Visit the developerWorks resource page for DB2 for Linux, UNIX, and Windows to read articles and tutorials and connect to other resources to expand your DB2 skills.

  • Browse the technology bookstore for books on these and other technical topics.

  • developerWorks Information Management zone: Learn more about DB2. Find technical documentation, how-to articles, education, downloads, product information, and more.

  • Stay current with developerWorks technical events and webcasts.

Get products and technologies

  • Download a free trial version of DB2 Enterprise 9.

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

  • Participate in the discussion forum.

  • Participate in developerWorks blogs and get involved in the developerWorks community.

About the author

Author Photo: Peter Schurr

Peter Schurr works as an IT services specialist for IBM Software Group in IBM Deutschland. His area of expertise is administration and application development with DB2 for Linux, UNIX, and Windows. He has eight years of experience with DB2 and is an IBM Certified Advanced DBA and IBM Certified Application Developer. Peter's areas of special expertise are performance tuning, replication, federated databases, and data modeling

你可能感兴趣的:(DB2 9: Row compression and large RIDs)