Introduction to Db2 Thresholds

Environment

  • Ubuntu 22.04
  • Db2 v11.5.0.0

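Overview

A threshold identifies and controls abnormal work in a database system. For example, if a query consumes a large amount of CPU time, it can be caught in one of two ways: before it runs, based on its estimated cost, or while it runs, once it has consumed more of a resource than allowed.

Threshold types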

  • Connection thresholds
    • CONNECTIONIDLETIME
  • Unit of work thresholds
    • UOWTOTALTIME
  • Activity thresholds
    • ACTIVITYTOTALTIME
    • ACTIVITYTOTALRUNTIME
    • ACTIVITYTOTALRUNTIMEINALLSC
    • CPUTIME
    • CPUTIMEINSC
    • DATATAGINSC
    • ESTIMATEDSQLCOST
    • SORTSHRHEAPUTIL
    • SQLROWSREAD
    • SQLROWSREADINSC
    • SQLROWSRETURNED
    • SQLTEMPSPACE
  • Aggregate thresholds
    • AGGSQLTEMPSPACE
    • CONCURRENTWORKLOADOCCURRENCES
    • CONCURRENTWORKLOADACTIVITIES
    • CONCURRENTDBCOORDACTIVITIES
    • TOTALMEMBERCONNECTIONS
    • TOTALSCMEMBERCONNECTIONS

Action

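When a threshold is violated, the following actions are available:

  • STOP EXECUTION: stop the activity and return an error code
  • CONTINUE: let the activity keep running without returning an error code; the violation is recorded by the threshold violations event monitor
  • FORCE APPLICATION: force the application off the system; available when a UOWTOTALTIME threshold is violated
  • REMAP ACTIVITY TO: move the activity to another service subclass, which dynamically increases or decreases the resources available to it, and keep it running
  • COLLECT ACTIVITY DATA: collect data about the activity; the details are recorded by the threshold violations event monitor and the activity event monitor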
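As a rough sketch of the FORCE APPLICATION case, a unit-of-work threshold might look like the following. The threshold name THLONGUOW and the 10-minute limit are hypothetical and are not part of the tests later in this article; the statement is written for a script run with db2 -tvf, hence the trailing semicolon.

```sql
-- Hypothetical example: force the application off the system
-- when a single unit of work has been open for more than 10 minutes.
CREATE THRESHOLD THLONGUOW
  FOR DATABASE ACTIVITIES
  ENFORCEMENT DATABASE
  WHEN UOWTOTALTIME > 10 MINUTES
  FORCE APPLICATION;
```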

Threshold domain

  • Database
  • Service superclass
  • Service subclass
  • Work action
  • Workload
  • Statement

Threshold enforcement scope

  • database
  • member
  • workload occurrence enforcement
  • ……

Threshold evaluation order

  • TOTALMEMBERCONNECTIONS
  • CONCURRENTWORKLOADOCCURRENCES
  • TOTALSCMEMBERCONNECTIONS
  • Others
    • Predictive (checked before the activity runs)
    • Reactive (checked while the activity runs)

Creating a threshold

See https://www.ibm.com/docs/en/db2/11.5?topic=statements-create-threshold for the full syntax. The main parts of the statement are:


  • threshold-name: the name given to the threshold
  • threshold domain: see above
  • enforcement scope: see above
  • enable/disable: enabled by default; the state can be changed with the ALTER THRESHOLD statement
  • threshold predicate: see above
  • actions: see above

The examples below show how thresholds are used in practice.
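To show how these parts line up in a single statement, here is a minimal sketch of a predictive threshold based on estimated cost. The name THBIGCOST and the limit of 1000000 timerons are made up for illustration and are not used in the tests in this article. The statements are written for a script run with db2 -tvf, so each ends with a semicolon.

```sql
-- threshold-name: THBIGCOST (hypothetical)
CREATE THRESHOLD THBIGCOST
  -- threshold domain: all activities in the database
  FOR DATABASE ACTIVITIES
  -- enforcement scope: the default for this predicate
  ENFORCEMENT DEFAULT
  -- threshold predicate: estimated cost in timerons (illustrative limit)
  WHEN ESTIMATEDSQLCOST > 1000000
  -- action: reject the statement with an error
  STOP EXECUTION;

-- The threshold is enabled on creation; switch it off and on again as needed.
ALTER THRESHOLD THBIGCOST DISABLE;
ALTER THRESHOLD THBIGCOST ENABLE;
```

Because the predicate is evaluated against the optimizer's estimate, a statement over the limit would be rejected before it starts running, unlike the elapsed-time threshold used in the examples that follow.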

Examples

Test scenario: any SQL statement in the database that runs for more than 1 minute needs attention. A threshold can be used to monitor such statements.

The test SQL statement is:

WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1

Here 300 is the run time in seconds, so the statement runs for 5 minutes.

Example 1

Requirement: stop any SQL statement that runs for more than 1 minute.

First, create the threshold THLONGSQL1:

➜  ~ db2 "CREATE THRESHOLD THLONGSQL1
     FOR DATABASE ACTIVITIES
     ENFORCEMENT DEFAULT
     WHEN ACTIVITYTOTALTIME > 1 MINUTE
     STOP EXECUTION"
DB20000I  The SQL command completed successfully.

Once created, the threshold can be found in the SYSCAT.THRESHOLDS view:

➜  ~ db2 "select * from syscat.thresholds"

THRESHOLDNAME                                                                                                                    THRESHOLDID ORIGIN THRESHOLDCLASS THRESHOLDPREDICATE THRESHOLDPREDICATEID DOMAIN DOMAINID    ENFORCEMENT QUEUING MAXVALUE             DATATAGLIST                                                                                                                                                                                                                                                      QUEUESIZE   OVERFLOWPERCENT COLLECTACTDATA COLLECTACTPARTITION EXECUTION REMAPSCID VIOLATIONRECORDLOGGED CHECKINTERVAL ENABLED CREATE_TIME                ALTER_TIME                 REMARKS                                                                                                                                                                                                                                                       

SYSDEFAULTCONCURRENT                                                                                                              2147483647 U      A              CONCDBC                              90 SB               4 D           Y                         12 -                                                                                                                                                                                                                                                                         -1              -1 N              C                   S                 0 Y                                 0 N       2022-07-01-10.30.33.060823 2022-07-01-10.30.33.060823 -                                                                                                                                                                                                                                                             
THLONGSQL1                                                                                                                                 1 U      C              TOTALTIME                            30 DB              10 D           N                         60 -                                                                                                                                                                                                                                                                          0               0 N              C                   S                 0 Y                                -1 Y       2023-02-05-10.43.00.131082 2023-02-05-10.43.00.131082 -                                                                                                                                                                                                                                                             

  2 record(s) selected.

Next, run the test SQL statement:

➜  ~ db2 "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1"

#LOOPS      BGN_TIMESTAMP              END_TIMESTAMP             
----------- -------------------------- --------------------------
SQL0347W  The recursive common table expression "DB2INST1.TEMP1" may contain 
an infinite loop.  SQLSTATE=01605

SQL4712N  The activity or request was stopped because the threshold 
"THLONGSQL1" has been exceeded. Reason code: "9".  SQLSTATE=5U026

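Once the statement had been running for more than 1 minute, it was terminated and the SQL4712N error was returned.

Example 2

Requirement: stop any SQL statement that runs for more than 1 minute, and record information about the violation.

To record violations, create a threshold violations event monitor: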

➜  ~ db2 "CREATE EVENT MONITOR VIOLATIONS FOR THRESHOLD VIOLATIONS WRITE TO TABLE MANUALSTART"
DB20000I  The SQL command completed successfully.

Note: no table name is specified in this example, so the default table name THRESHOLDVIOLATIONS_VIOLATIONS is used.

Once created, the event monitor can be seen in the SYSCAT.EVENTMONITORS view:

➜  ~ db2 "select event_mon_state(evmonname), * from syscat.eventmonitors"

1           EVMONNAME                                                                                                                        OWNER                                                                                                                            OWNERTYPE TARGET_TYPE TARGET                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     MAXFILES    MAXFILESIZE BUFFERSIZE  IO_MODE WRITE_MODE AUTOSTART DBPARTITIONNUM MONSCOPE EVMON_ACTIVATES NODENUM DEFINER                                                                                                                          VERSIONNUMBER MEMBER REMARKS                                                                                                                                                                                                                                                       

          1 DB2DETAILDEADLOCK                                                                                                                DB2INST1                                                                                                                         U         F           db2detaildeadlock
          0 VIOLATIONS                                                                                                                       DB2INST1                                                                                                                         U         T                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                -           -           4 B       -          N                      0 T                      0       0 DB2INST1                                                                                                                              11050000      0 -                                                                                                                                                                                                                                                             

  2 record(s) selected.

In the first column, event_mon_state(evmonname) returns 0 when the event monitor is inactive and 1 when it is active.

Next, start the event monitor:

➜  ~ db2 "set event monitor VIOLATIONS state 1"
DB20000I  The SQL command completed successfully.

At this point the THRESHOLDVIOLATIONS_VIOLATIONS table is empty:

➜  ~ db2 "select count(*) from THRESHOLDVIOLATIONS_VIOLATIONS"

1          
-----------
          0

  1 record(s) selected.

Now run the test SQL statement again:

➜  ~ db2 "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1"

#LOOPS      BGN_TIMESTAMP              END_TIMESTAMP             
----------- -------------------------- --------------------------
SQL0347W  The recursive common table expression "DB2INST1.TEMP1" may contain 
an infinite loop.  SQLSTATE=01605

SQL4712N  The activity or request was stopped because the threshold 
"THLONGSQL1" has been exceeded. Reason code: "9".  SQLSTATE=5U026

As before, the statement is terminated once it has run for more than 1 minute.

Querying the THRESHOLDVIOLATIONS_VIOLATIONS table again now shows the details of the violation:

➜  ~ db2 "select * from THRESHOLDVIOLATIONS_VIOLATIONS"       

PARTITION_KEY ACTIVATE_TIMESTAMP         ACTIVITY_COLLECTED ACTIVITY_ID          AGENT_ID             APPL_ID                                                          APPLICATION_NAME                                                                                                                                                                                                                                                CLIENT_ACCTNG                                                                                                                                                                                                                                                   CLIENT_APPLNAME                                                                                                                                                                                                                                                 CLIENT_HOSTNAME                                                                                                                                                                                                                                                 CLIENT_PID           CLIENT_PLATFORM CLIENT_PORT_NUMBER CLIENT_PRDID         CLIENT_PROTOCOL CLIENT_USERID                                                                                                                                                                                                                                                   CLIENT_WRKSTNNAME                                                                                                                                                                                                                                               CONNECTION_START_TIME      COORD_PARTITION_NUM DESTINATION_SERVICE_CLASS_ID PARTITION_NUMBER SESSION_AUTH_ID                                                                                                
                  SOURCE_SERVICE_CLASS_ID SYSTEM_AUTH_ID                                                                                                                   THRESHOLD_ACTION THRESHOLD_MAXVALUE   THRESHOLD_PREDICATE                                              THRESHOLD_QUEUESIZE  THRESHOLDID TIME_OF_VIOLATION          UOW_ID      WORKLOAD_ID

            0 2023-02-05-11.00.20.229103 N                                     1                    7 *LOCAL.db2inst1.230205005447                                     db2bpding-ubuntu                                                                                                                                                                                                                                                                    14472 LINUXX8664                       0 SQL11050             LOCAL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           2023-02-05-08.54.47.846733                   0                            0                0 DB2INST1                                                                                                                                               0 DB2INST1                                                                                                                         Stop                               60 ActivityTotalTime                                                                   0           1 2023-02-05-11.25.40.000000          31           1

  1 record(s) selected.

Note: the thresholdid column can be joined to the SYSCAT.THRESHOLDS view to get the threshold name, which makes filtering easier. No join is done here because there is only one record.
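A minimal sketch of such a join, using only columns that appear in the output above and assuming the default table name THRESHOLDVIOLATIONS_VIOLATIONS (written for a script run with db2 -tvf):

```sql
-- Resolve the threshold name for each recorded violation,
-- most recent violation first.
SELECT v.thresholdname,
       th.threshold_predicate,
       th.threshold_action,
       th.threshold_maxvalue,
       th.time_of_violation,
       th.appl_id
FROM THRESHOLDVIOLATIONS_VIOLATIONS th
JOIN syscat.thresholds v
  ON th.thresholdid = v.thresholdid
ORDER BY th.time_of_violation DESC;
```

Adding a predicate such as v.thresholdname = 'THLONGSQL1' narrows the result to a single threshold.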

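As shown, the THRESHOLDVIOLATIONS_VIOLATIONS table contains detailed information about the violation.

It does not, however, include the SQL statement text. To capture the offending statement, an activity event monitor is needed, as in the next example.

Example 3

Requirement: when an SQL statement runs for more than 1 minute, let it continue, but record information about the violation along with the offending SQL statement.

To capture the offending SQL statement, create an activity event monitor: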

➜  ~ db2 "CREATE EVENT MONITOR ACTIVITIES 
   FOR ACTIVITIES
   WRITE TO TABLE
   MANUALSTART"
DB20000I  The SQL command completed successfully.

Note: no table names are specified in this example, so the following default table names are used:

  • ACTIVITY_ACTIVITIES
  • ACTIVITYSTMT_ACTIVITIES
  • ACTIVITYMETRICS_ACTIVITIES
  • ACTIVITYVALS_ACTIVITIES

Start the activity event monitor:

➜  ~ db2 "set event monitor ACTIVITIES state 1"                
DB20000I  The SQL command completed successfully.

Next, alter the threshold definition in two ways: add activity data collection, and let the activity continue after a violation:

➜  ~ db2 "ALTER THRESHOLD THLONGSQL1
   WHEN EXCEEDED COLLECT ACTIVITY DATA ON COORDINATOR WITH DETAILS
   CONTINUE"
DB20000I  The SQL command completed successfully.

Now run the test SQL statement again:

➜  ~ db2 "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1"

#LOOPS      BGN_TIMESTAMP              END_TIMESTAMP             
----------- -------------------------- --------------------------
SQL0347W  The recursive common table expression "DB2INST1.TEMP1" may contain 
an infinite loop.  SQLSTATE=01605

  157740784 2023-02-05-04.01.58.948017 2023-02-05-04.06.58.948039

  1 record(s) selected with 1 warning messages printed.

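This time the statement returns only after running for the full 300 seconds.

Also, the activity event monitor data appears to be written only when the statement completes, that is, after the 300 seconds have elapsed.

Both the violation details and the SQL statement text can now be retrieved: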

➜  ~ db2 "select cast(stmt.stmt_text as varchar(255)),
    th.appl_id,
    th.application_name,
    th.client_hostname,
    th.session_auth_id,
    th.threshold_maxvalue,
    th.time_of_violation,
    th.activate_timestamp
    -- other columns
from THRESHOLDVIOLATIONS_VIOLATIONS th
join ACTIVITYSTMT_ACTIVITIES stmt on th.appl_id = stmt.appl_id and th.uow_id = stmt.uow_id and th.activity_id = stmt.activity_id
join syscat.thresholds v on th.thresholdid = v.thresholdid
where v.thresholdname = 'THLONGSQL1'"

1                                                                                                                                                                                                                                                               APPL_ID                                                          APPLICATION_NAME                                                                                                                                                                                                                                                CLIENT_HOSTNAME                                                                                                                                                                                                                                                 SESSION_AUTH_ID                                                                                                                  THRESHOLD_MAXVALUE   TIME_OF_VIOLATION          ACTIVATE_TIMESTAMP        

WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) *LOCAL.db2inst1.230205005447                                     db2bp                                                                                                                                                                                                                                                           ding-ubuntu                                                                                                                                                                                                                                                     DB2INST1                                                                                                                                           60 2023-02-05-12.02.59.000000 2023-02-05-11.00.20.229103
SQL0445W  Value "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENER" 
has been truncated.  SQLSTATE=01004


  1 record(s) selected with 1 warning messages printed.

Cleanup

Clean up the activity event monitor:

➜  ~ db2 "set event monitor ACTIVITIES state 0"
DB20000I  The SQL command completed successfully.
➜  ~ db2 "drop event monitor ACTIVITIES"
DB20000I  The SQL command completed successfully.
➜  ~ db2 "drop table ACTIVITY_ACTIVITIES"
DB20000I  The SQL command completed successfully.
➜  ~ db2 "drop table ACTIVITYSTMT_ACTIVITIES"
DB20000I  The SQL command completed successfully.
➜  ~ db2 "drop table ACTIVITYMETRICS_ACTIVITIES"
DB20000I  The SQL command completed successfully.
➜  ~ db2 "drop table ACTIVITYVALS_ACTIVITIES"   
DB20000I  The SQL command completed successfully.

Clean up the threshold violations event monitor:

➜  ~ db2 "set event monitor VIOLATIONS state 0"
DB20000I  The SQL command completed successfully.
➜  ~ db2 "drop table THRESHOLDVIOLATIONS_VIOLATIONS"
DB20000I  The SQL command completed successfully.

Clean up the threshold:

➜  ~ db2 "drop threshold THLONGSQL1"
DB20000I  The SQL command completed successfully.
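
Note that the transcript above deactivates the VIOLATIONS event monitor and drops its table, but does not drop the monitor definition itself. A possible way to finish the cleanup and confirm that nothing is left behind (a sketch, written for a script run with db2 -tvf):

```sql
-- Remove the threshold violations event monitor definition,
-- which the cleanup steps above leave in place.
DROP EVENT MONITOR VIOLATIONS;

-- Both queries should return no rows once cleanup is complete.
SELECT thresholdname
FROM syscat.thresholds
WHERE thresholdname = 'THLONGSQL1';

SELECT evmonname
FROM syscat.eventmonitors
WHERE evmonname IN ('ACTIVITIES', 'VIOLATIONS');
```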

References

  • https://www.ibm.com/docs/en/db2/11.5?topic=management-control-work-thresholds
  • https://www.ibm.com/docs/en/db2/11.5?topic=aem-example-capturing-activity-information-related-execution-specific-statement
  • https://www.ibm.com/docs/en/db2/11.5?topic=intervention-monitoring-threshold-violations
