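Thresholds can be used to identify and control abnormal work in a database system. For example, if a query consumes a large amount of CPU time, it can be caught in two ways: before it runs, based on its estimated cost, or while it runs, once it has consumed more of a resource than allowed.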
CONNECTIONIDLETIME
UOWTOTALTIME
ACTIVITYTOTALTIME
ACTIVITYTOTALRUNTIME
ACTIVITYTOTALRUNTIMEINALLSC
CPUTIME
CPUTIMEINSC
DATATAGINSC
ESTIMATEDSQLCOST
SORTSHRHEAPUTIL
SQLROWSREAD
SQLROWSREADINSC
SQLROWSRETURNED
SQLTEMPSPACE
AGGSQLTEMPSPACE
CONCURRENTWORKLOADOCCURRENCES
CONCURRENTWORKLOADACTIVITIES
CONCURRENTDBCOORDACTIVITIES
TOTALMEMBERCONNECTIONS
TOTALSCMEMBERCONNECTIONS
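The two ways of catching an expensive query correspond to different predicates in the list above. A minimal sketch, assuming hypothetical threshold names (TH_BIGCOST, TH_LONGCPU) and illustrative limits that do not come from this article: ESTIMATEDSQLCOST is checked before the statement runs, against the optimizer's cost estimate, while CPUTIME is checked while the statement is running.

```sql
-- Predictive: evaluated before execution, against the estimated cost.
-- TH_BIGCOST and the limit 1000000 are illustrative assumptions.
CREATE THRESHOLD TH_BIGCOST
  FOR DATABASE ACTIVITIES
  ENFORCEMENT DEFAULT
  WHEN ESTIMATEDSQLCOST > 1000000
  STOP EXECUTION;

-- Reactive: evaluated during execution, against the CPU time consumed so far.
-- TH_LONGCPU and the limit of 10 minutes are illustrative assumptions.
CREATE THRESHOLD TH_LONGCPU
  FOR DATABASE ACTIVITIES
  ENFORCEMENT DEFAULT
  WHEN CPUTIME > 10 MINUTES
  STOP EXECUTION;
```

Both statements reuse the clause layout of the CREATE THRESHOLD example in this article; only the predicate differs.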
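When a threshold is violated, the following actions can be taken:
STOP EXECUTION: stop the activity and return an error code.
CONTINUE: let the activity continue without returning an error code; the violation is recorded in the threshold violations event monitor.
FORCE APPLICATION: when a UOWTOTALTIME threshold is violated, the application can be forced off.
REMAP ACTIVITY TO: let the activity continue with its resources dynamically increased or decreased.
COLLECT ACTIVITY DATA: collect data about the activity; the details are recorded in the threshold violations event monitor and the activity event monitor.
For TOTALMEMBERCONNECTIONS, CONCURRENTWORKLOADOCCURRENCES, and TOTALSCMEMBERCONNECTIONS, see https://www.ibm.com/docs/en/db2/11.5?topic=statements-create-threshold
A threshold's state can be changed with the ALTER THRESHOLD statement.
The examples below show how thresholds are used in practice.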
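Test scenario: any SQL statement in the database that runs for more than 1 minute needs attention. We can monitor such statements with a threshold.
The test SQL statement is:
WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1
Here 300 is the run time in seconds: the recursion keeps going until 300 seconds have elapsed, so the statement runs for 5 minutes.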
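To see why 300 means seconds, the TIMESTAMPDIFF call can be tried on its own. A minimal sketch with two arbitrary timestamp literals (not from the article) that are exactly 5 minutes apart; the first argument 2 selects seconds as the unit, so the expression should evaluate to 300, the same bound used in the test statement.

```sql
-- Interval code 2 asks TIMESTAMPDIFF for the difference in seconds.
-- The second argument is the timestamp duration converted to a character string,
-- exactly as in the test statement. The literals are arbitrary sample values.
VALUES TIMESTAMPDIFF(
  2,
  CHAR(TIMESTAMP('2023-02-05-11.05.00') - TIMESTAMP('2023-02-05-11.00.00'))
);
```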
Requirement: stop an SQL statement once it has run for more than 1 minute.
First, create the threshold THLONGSQL1 as follows:
➜ ~ db2 "CREATE THRESHOLD THLONGSQL1
FOR DATABASE ACTIVITIES
ENFORCEMENT DEFAULT
WHEN ACTIVITYTOTALTIME > 1 MINUTE
STOP EXECUTION"
DB20000I The SQL command completed successfully.
Once created, the threshold can be found in the SYSCAT.THRESHOLDS view:
➜ ~ db2 "select * from syscat.thresholds"
THRESHOLDNAME THRESHOLDID ORIGIN THRESHOLDCLASS THRESHOLDPREDICATE THRESHOLDPREDICATEID DOMAIN DOMAINID ENFORCEMENT QUEUING MAXVALUE DATATAGLIST QUEUESIZE OVERFLOWPERCENT COLLECTACTDATA COLLECTACTPARTITION EXECUTION REMAPSCID VIOLATIONRECORDLOGGED CHECKINTERVAL ENABLED CREATE_TIME ALTER_TIME REMARKS

SYSDEFAULTCONCURRENT 2147483647 U A CONCDBC 90 SB 4 D Y 12 - -1 -1 N C S 0 Y 0 N 2022-07-01-10.30.33.060823 2022-07-01-10.30.33.060823 -
THLONGSQL1 1 U C TOTALTIME 30 DB 10 D N 60 - 0 0 N C S 0 Y -1 Y 2023-02-05-10.43.00.131082 2023-02-05-10.43.00.131082 -
2 record(s) selected.
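Because selecting every column of SYSCAT.THRESHOLDS produces very wide rows, it can help to select only the columns of interest. A minimal sketch using column names taken from the header in the output above; the choice of columns and the VARCHAR width of 30 are only illustrative.

```sql
-- Show the key attributes of the threshold created above.
SELECT VARCHAR(thresholdname, 30) AS thresholdname,
       thresholdpredicate,
       maxvalue,
       execution,
       enabled
FROM SYSCAT.THRESHOLDS
WHERE thresholdname = 'THLONGSQL1';
```

In the full output above, this threshold has THRESHOLDPREDICATE TOTALTIME, MAXVALUE 60, EXECUTION S, and ENABLED Y.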
Next, run the test SQL statement:
➜ ~ db2 "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1"
#LOOPS BGN_TIMESTAMP END_TIMESTAMP
----------- -------------------------- --------------------------
SQL0347W The recursive common table expression "DB2INST1.TEMP1" may contain
an infinite loop. SQLSTATE=01605
SQL4712N The activity or request was stopped because the threshold
"THLONGSQL1" has been exceeded. Reason code: "9". SQLSTATE=5U026
Once the statement had run for more than 1 minute, it was terminated and the SQL4712N error was returned.
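Requirement: stop an SQL statement once it has run for more than 1 minute, and collect information about the violation.
To collect violation information, we need to create a threshold violations event monitor: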
➜ ~ db2 "CREATE EVENT MONITOR VIOLATIONS FOR THRESHOLD VIOLATIONS WRITE TO TABLE MANUALSTART"
DB20000I The SQL command completed successfully.
Note: this example does not specify a table name; the default table name is THRESHOLDVIOLATIONS_VIOLATIONS.
Once created, the event monitor can be viewed through the SYSCAT.EVENTMONITORS view:
➜ ~ db2 "select event_mon_state(evmonname), * from syscat.eventmonitors"
1 EVMONNAME OWNER OWNERTYPE TARGET_TYPE TARGET MAXFILES MAXFILESIZE BUFFERSIZE IO_MODE WRITE_MODE AUTOSTART DBPARTITIONNUM MONSCOPE EVMON_ACTIVATES NODENUM DEFINER VERSIONNUMBER MEMBER REMARKS
1 DB2DETAILDEADLOCK DB2INST1 U F db2detaildeadlock 20 512 17 B A Y 0 G 0 0 DB2INST1 11050000 0 -
0 VIOLATIONS DB2INST1 U T - - 4 B - N 0 T 0 0 DB2INST1 11050000 0 -
2 record(s) selected.
Here event_mon_state(evmonname) returns 0 when the event monitor is inactive (disabled) and 1 when it is active (enabled).
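The numeric state is easier to read when translated into a label. A minimal sketch that wraps the same EVENT_MON_STATE call in a CASE expression; the labels and the VARCHAR width of 30 are only illustrative choices.

```sql
-- List each event monitor together with a readable state label.
SELECT VARCHAR(evmonname, 30) AS evmonname,
       CASE EVENT_MON_STATE(evmonname)
         WHEN 0 THEN 'inactive'
         WHEN 1 THEN 'active'
       END AS state
FROM SYSCAT.EVENTMONITORS;
```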
Next, start the event monitor:
➜ ~ db2 "set event monitor VIOLATIONS state 1"
DB20000I The SQL command completed successfully.
At this point the THRESHOLDVIOLATIONS_VIOLATIONS table contains no data:
➜ ~ db2 "select count(*) from THRESHOLDVIOLATIONS_VIOLATIONS"
1
-----------
0
1 record(s) selected.
Now run the test SQL again:
➜ ~ db2 "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1"
#LOOPS BGN_TIMESTAMP END_TIMESTAMP
----------- -------------------------- --------------------------
SQL0347W The recursive common table expression "DB2INST1.TEMP1" may contain
an infinite loop. SQLSTATE=01605
SQL4712N The activity or request was stopped because the threshold
"THLONGSQL1" has been exceeded. Reason code: "9". SQLSTATE=5U026
As before, the statement is terminated once it has run for more than 1 minute.
Querying the THRESHOLDVIOLATIONS_VIOLATIONS table again now shows the details of the violation:
➜ ~ db2 "select * from THRESHOLDVIOLATIONS_VIOLATIONS"
PARTITION_KEY ACTIVATE_TIMESTAMP ACTIVITY_COLLECTED ACTIVITY_ID AGENT_ID APPL_ID APPLICATION_NAME CLIENT_ACCTNG CLIENT_APPLNAME CLIENT_HOSTNAME CLIENT_PID CLIENT_PLATFORM CLIENT_PORT_NUMBER CLIENT_PRDID CLIENT_PROTOCOL CLIENT_USERID CLIENT_WRKSTNNAME CONNECTION_START_TIME COORD_PARTITION_NUM DESTINATION_SERVICE_CLASS_ID PARTITION_NUMBER SESSION_AUTH_ID SOURCE_SERVICE_CLASS_ID SYSTEM_AUTH_ID THRESHOLD_ACTION THRESHOLD_MAXVALUE THRESHOLD_PREDICATE THRESHOLD_QUEUESIZE THRESHOLDID TIME_OF_VIOLATION UOW_ID WORKLOAD_ID

0 2023-02-05-11.00.20.229103 N 1 7 *LOCAL.db2inst1.230205005447 db2bp ding-ubuntu 14472 LINUXX8664 0 SQL11050 LOCAL 2023-02-05-08.54.47.846733 0 0 0 DB2INST1 0 DB2INST1 Stop 60 ActivityTotalTime 0 1 2023-02-05-11.25.40.000000 31 1
1 record(s) selected.
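Note: the thresholdid column can be used to join with the SYSCAT.THRESHOLDS view to get the threshold name, which makes filtering easier. No join is done here because there is only one record.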
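The join mentioned in the note can be sketched as follows, using only column names that appear in the two outputs shown in this article. The selected columns and the ordering are illustrative choices.

```sql
-- Resolve thresholdid to the threshold name and filter on it.
SELECT VARCHAR(t.thresholdname, 30) AS thresholdname,
       v.time_of_violation,
       v.threshold_action,
       v.threshold_maxvalue,
       v.appl_id
FROM THRESHOLDVIOLATIONS_VIOLATIONS v
JOIN SYSCAT.THRESHOLDS t
  ON v.thresholdid = t.thresholdid
WHERE t.thresholdname = 'THLONGSQL1'
ORDER BY v.time_of_violation DESC;
```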
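The THRESHOLDVIOLATIONS_VIOLATIONS table thus contains detailed information about the violating activity.
However, it does not contain the SQL statement text. To get the violating SQL statement, an activity event monitor is needed, as in the next example.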
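Requirement: when an SQL statement has run for more than 1 minute, let it continue, and collect both the violation information and the violating SQL statement.
To collect the violating SQL statement, we need to create an activity event monitor: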
➜ ~ db2 "CREATE EVENT MONITOR ACTIVITIES
FOR ACTIVITIES
WRITE TO TABLE
MANUALSTART"
DB20000I The SQL command completed successfully.
Note: this example does not specify table names; the default table names are:
ACTIVITY_ACTIVITIES
ACTIVITYSTMT_ACTIVITIES
ACTIVITYMETRICS_ACTIVITIES
ACTIVITYVALS_ACTIVITIES
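To confirm which target tables the event monitors actually created, the catalog can be queried instead of relying on the default naming. A minimal sketch, assuming the SYSCAT.EVENTTABLES catalog view is available in the Db2 version in use and that the two monitors are named VIOLATIONS and ACTIVITIES as in this article.

```sql
-- List the target tables registered for the two event monitors.
SELECT VARCHAR(evmonname, 20)     AS evmonname,
       VARCHAR(logical_group, 30) AS logical_group,
       VARCHAR(tabschema, 20)     AS tabschema,
       VARCHAR(tabname, 40)       AS tabname
FROM SYSCAT.EVENTTABLES
WHERE evmonname IN ('VIOLATIONS', 'ACTIVITIES')
ORDER BY evmonname, logical_group;
```

Any table listed by this query that is not named in this article would also need to be dropped during cleanup.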
Start the activity event monitor:
➜ ~ db2 "set event monitor ACTIVITIES state 1"
DB20000I The SQL command completed successfully.
Next, we change the threshold definition in two ways: activity data is now collected, and the activity continues running after the violation:
➜ ~ db2 "ALTER THRESHOLD THLONGSQL1
WHEN EXCEEDED COLLECT ACTIVITY DATA ON COORDINATOR WITH DETAILS
CONTINUE"
DB20000I The SQL command completed successfully.
Now run the test SQL again:
➜ ~ db2 "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) AS BGN_TIMESTAMP ,MAX(TS2) AS END_TIMESTAMP FROM TEMP1"
#LOOPS BGN_TIMESTAMP END_TIMESTAMP
----------- -------------------------- --------------------------
SQL0347W The recursive common table expression "DB2INST1.TEMP1" may contain
an infinite loop. SQLSTATE=01605
157740784 2023-02-05-04.01.58.948017 2023-02-05-04.06.58.948039
1 record(s) selected with 1 warning messages printed.
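This time the statement returns only after running for the full 300 seconds.
Also, the activity event monitor data appears to be written only after the statement finishes, that is, after the 300 seconds.
At this point we can retrieve both the violation information and the SQL statement text: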
➜ ~ db2 "select cast(stmt.stmt_text as varchar(255)),
th.appl_id,
th.application_name,
th.client_hostname,
th.session_auth_id,
th.threshold_maxvalue,
th.time_of_violation,
th.activate_timestamp
-- other columns
from THRESHOLDVIOLATIONS_VIOLATIONS th
join ACTIVITYSTMT_ACTIVITIES stmt on th.appl_id = stmt.appl_id and th.uow_id = stmt.uow_id and th.activity_id = stmt.activity_id
join syscat.thresholds v on th.thresholdid = v.thresholdid
where v.thresholdname = 'THLONGSQL1'"
1 APPL_ID APPLICATION_NAME CLIENT_HOSTNAME SESSION_AUTH_ID THRESHOLD_MAXVALUE TIME_OF_VIOLATION ACTIVATE_TIMESTAMP

WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENERATE_UNIQUE()) ,TIMESTAMP(GENERATE_UNIQUE())) UNION ALL SELECT NUM + 1 ,TS1 ,TIMESTAMP(GENERATE_UNIQUE()) FROM TEMP1 WHERE TIMESTAMPDIFF(2,CHAR(TS2-TS1)) < 300) SELECT MAX(NUM) AS #LOOPS ,MIN(TS2) *LOCAL.db2inst1.230205005447 db2bp ding-ubuntu DB2INST1 60 2023-02-05-12.02.59.000000 2023-02-05-11.00.20.229103
SQL0445W Value "WITH TEMP1 (NUM,TS1,TS2) AS (VALUES (INT(1) ,TIMESTAMP(GENER"
has been truncated. SQLSTATE=01004
1 record(s) selected with 1 warning messages printed.
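The SQL0445W warning above is raised because the statement text is longer than the 255 characters allowed by the cast. A sketch of the same lookup with a wider cast; the width 1024 is an arbitrary assumption, chosen to be large enough for the test statement.

```sql
-- Same join as above, with a cast wide enough to avoid truncating the statement text.
SELECT CAST(stmt.stmt_text AS VARCHAR(1024)) AS stmt_text,
       th.time_of_violation
FROM THRESHOLDVIOLATIONS_VIOLATIONS th
JOIN ACTIVITYSTMT_ACTIVITIES stmt
  ON  th.appl_id     = stmt.appl_id
  AND th.uow_id      = stmt.uow_id
  AND th.activity_id = stmt.activity_id
JOIN SYSCAT.THRESHOLDS v
  ON th.thresholdid = v.thresholdid
WHERE v.thresholdname = 'THLONGSQL1';
```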
Clean up the activity event monitor:
➜ ~ db2 "set event monitor ACTIVITIES state 0"
DB20000I The SQL command completed successfully.
➜ ~ db2 "drop event monitor ACTIVITIES"
DB20000I The SQL command completed successfully.
➜ ~ db2 "drop table ACTIVITY_ACTIVITIES"
DB20000I The SQL command completed successfully.
➜ ~ db2 "drop table ACTIVITYSTMT_ACTIVITIES"
DB20000I The SQL command completed successfully.
➜ ~ db2 "drop table ACTIVITYMETRICS_ACTIVITIES"
DB20000I The SQL command completed successfully.
➜ ~ db2 "drop table ACTIVITYVALS_ACTIVITIES"
DB20000I The SQL command completed successfully.
Clean up the threshold violations event monitor:
➜ ~ db2 "set event monitor VIOLATIONS state 0"
DB20000I The SQL command completed successfully.
➜ ~ db2 "drop table THRESHOLDVIOLATIONS_VIOLATIONS"
DB20000I The SQL command completed successfully.
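The transcript above deactivates the VIOLATIONS monitor and drops its table, but it does not show the monitor definition itself being removed. A sketch of the remaining step, assuming the monitor should be dropped just as the ACTIVITIES monitor was.

```sql
-- Remove the event monitor definition (it was already deactivated with state 0).
DROP EVENT MONITOR VIOLATIONS;

-- Confirm that it no longer appears in the catalog.
SELECT VARCHAR(evmonname, 30) AS evmonname
FROM SYSCAT.EVENTMONITORS;
```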
Clean up the threshold:
➜ ~ db2 "drop threshold THLONGSQL1"
DB20000I The SQL command completed successfully.
https://www.ibm.com/docs/en/db2/11.5?topic=management-control-work-thresholds
https://www.ibm.com/docs/en/db2/11.5?topic=aem-example-capturing-activity-information-related-execution-specific-statement
https://www.ibm.com/docs/en/db2/11.5?topic=intervention-monitoring-threshold-violations