应用某个功能模块访问数据库卡住没反应,这个模块以前也慢,但都能正常访问,现在查询出不来。
根据客户的描述,在现场让客户重新操作了一下应用,发现确实卡住,在数据库后台收集ash报告,检查发现有条sql语句长时间不能执行出结果。
SELECT /*+ RULE */
'' as jkrwdm,
......略..........
HESUANXM,
SUM(ROUND(JIESUANJE, 2)) AS JIESUANJE
FROM MZ_SHOURU_RB_CB
WHERE RIBAORQ BETWEEN '20240101' AND '20241231'
GROUP BY SUBSTR(RIBAORQ, 1, 6),
RIBAORQ,
KAIDANKS,
KAIDANYS,
ZHIXINGKS,
HESUANXM
初步判断是由于错误的执行计划,导致性能问题。
客户执行应用有问题的模块,收集ash报告,查看top事件
查看到“SQLNet message from dblink”事件,主要存在dblink层面。进一步往下看,
Select语句和sql执行应该最符合这次的问题。检查具体的sql语句事件
通过后台查询发现,sql_id为“9sv04dqwk1hbb”的语句一直不能查询出来,把“/+ RULE */”去掉也不能查询出结果。
后台不能跟踪出执行计划,检查发现“MZ_SHOURU_RB_CB”是个同义词,最终是个由多个试图查询做“union all”连接,查询数据库内的几个试图,都能很快出来。
在查询有个视图通过dblink查询的很慢,一直不能出来
查看该视图如下:
CREATE OR REPLACE VIEW HISJC.HISMZ_TIJIAN AS
SELECT B.YIJIID AS TIJIANID,
........略...........
D.TIJIANBM
FROM MZ_SHOUFEI1 A,MZ_YIJI1 B,MZ_YIJI2 C,TJ_GERENJSZD D,TJ_TIJIANDZHXM E
WHERE A.SHOUFEIID = B.SHOUFEIID AND B.YIJIID = C.YIJIID and nvl(b.chongxiaobz,0) = 0
AND B.YIJIID = D.HISJSBM AND D.JIEZHANGBM = E.JIEZHANGBM
AND B.SHOUFEIFS = '12' AND B.KAIDANYS <> '九溪'
UNION ALL
........多个union all略...........
UNION ALL
SELECT B.YIJIID AS TIJIANID,
C.YIJIMXID AS TIJIANMXID,
........略...........
I.DANWEITJDBM
FROM MZ_SHOUFEI1 A,MZ_YIJI1 B,MZ_YIJI2 C,TJGLJX.TJ_DANWEIJSZD_MX D,TJGLJX.TJ_TIJIANDZHXM E,TJGLJX.TJ_TIJIANZHE F,MZ_YIJI1 H,TJGLJX.TJ_DANWEIJSZD I
WHERE A.SHOUFEIID = B.SHOUFEIID AND B.YIJIID = C.YIJIID AND B.YIJIID = H.YUANYIJID
AND H.YIJIID = D.HISJSBM AND d.Jiezhangbm1 = e.Jiezhangbm
AND E.TIJIANZBM = F.TIJIANZHEBM
AND D.JIEZHANGBM1 = I.JIEZHANGBM
AND H.SHOUFEIFS = '12' AND B.KAIDANYS = '九溪'
对“union all”的子查询挨个查了一遍,发现第一个和第二个子查询不能出结果,通过整个试图查看发现“TJ_TIJIANDZHXM”表是全表扫描,该表有700多万数据。
用到这张表有两个字段做外关联JIEZHANGBM,TIJIANZBM,检查两个字段在原表上都有索引创建
这两个字段的数据分布情况:
创建新的测试试图,分别用了“/+RULE/”和hint index强制索引,再查询试图,发现很快就能查询出来
SELECT /*+RULE*/ B.YIJIID AS TIJIANID,
C.YIJIMXID AS TIJIANMXID,
........略...........
D.TIJIANBM
FROM MZ_SHOUFEI1 A,MZ_YIJI1 B,MZ_YIJI2 C,TJ_GERENJSZD D,TJ_TIJIANDZHXM E
WHERE A.SHOUFEIID = B.SHOUFEIID AND B.YIJIID = C.YIJIID and nvl(b.chongxiaobz,0) = 0
AND B.YIJIID = D.HISJSBM AND D.JIEZHANGBM = E.JIEZHANGBM
AND B.SHOUFEIFS = '12' AND B.KAIDANYS <> '九溪'
正式调整试图,检查原查询还是慢,去掉查询中的“/*+ RULE */”,发现1秒不到数据就出来了。
SELECT '' as jkrwdm,
0 as jkzxxh,
..........略...........
HESUANXM,
SUM(ROUND(JIESUANJE, 2)) AS JIESUANJE
FROM MZ_SHOURU_RB_CB
WHERE RIBAORQ BETWEEN '20200101' AND '20191231'
GROUP BY SUBSTR(RIBAORQ, 1, 6),
RIBAORQ,
KAIDANKS,
KAIDANYS,
ZHIXINGKS,
HESUANXM
与客户交流,该客户端很比较旧,不能对客户端做修改。绑定“sql profile”,让sql语句按照我们正常的执行计划走,查询出sql_id
SQL> @ch_sql_profile.sql
Enter value for sql_id1(used to generate sql_text): 8xx8wqjjata3s --坏语句的sql_id
Enter value for child_no1 (used to generate sql_text) (0): --坏语句的child_number,如果有多个则选对应的,如果是0,则回车
Enter value for sql_id2(used to generate sql_hints):6va184knjrspn --想要走的sql_id
Enter value for child_no2(used to generate sql_hints) (0): --想要走的sql_id的child_number
该问题,主要是查询的sql未按正常的执行计划查询,查询超级慢,导致应用一直卡在那边。有些查询可以在试图中调整,就可以恢复。但是有些查询语句源于客户端,客户端不能修改,只能通过绑定sql profile去调整执行计划按正常的语句执行。
该问题如果要最终解决,必须从客户端上把rule hint去掉才行。