公司的业务系统中存在一个大的日志表,表大约是这样:
create table log
(
logtime date, -- PK
username varchar2(20)
);
现有需求如下:统计日志表中,两小时内使用过系统的用户在三天内的日志数。
最初编写的查询如下:
WITH
result1 AS
(
SELECT DISTINCT username
FROM log
WHERE logtime>=sysdate - 1/24*2
)
SELECT log.username, count(1) as times
FROM log INNER JOIN result1 ON log.username=result1.username
WHERE logtime>=sysdate - 3
GROUP BY log.username;
后来发现,log表记录达到300万后,查询非常慢。测试后发现是因为log表和临时视图result1连接的时候,采用NESTED LOOPS,于是增加提示,采用HASH连接:
WITH
result1 AS
(
SELECT DISTINCT username
FROM log
WHERE logtime>=sysdate - 1/24*2
)
SELECT /**+USE_HASH(log)*/ log.username, count(1) as times
FROM log INNER JOIN result1 ON
le="color: rgb(255, 0, 255);">log.username=result1.username
WHERE logtime>=sysdate - 3
GROUP BY log.username;
效率果然提高了很多,但是还是不令人满意。
测试的过程中发现,如果不先选取两小时内的用户,而直接选取三天内的所有用户来统计,效率非常高。看来,性能的瓶颈还是在表连接上。在已经选用HASH连接的情况下,只有想办法减少连接的记录数了。于是尝试写了如下查询:
WITH
result1 AS
(
SELECT DISTINCT username
FROM log
WHERE logtime>=sysdate - 1/24*2
),
result2 AS
(
SELECT username, count(1) AS times
FROM log
WHERE logtime>=sysdate - 3
GROUP BY username
)
SELECT result2.username, result1.times
FROM result1 INNER JOIN result2 ON result1.username=result2.username;
性能高了好多好多!!!
哦,还忘记了点东西:
WITH
result1 AS
(
SELECT DISTINCT username
FROM&nb
sp;log
WHERE logtime>=sysdate - 1/24*2
),
result2 AS
(
SELECT username, count(1) AS times
FROM log
WHERE logtime>=sysdate - 3
GROUP BY username
)
SELECT /**+USE_HASH(result1)*/ result2.username, result1.times
FROM result1 INNER JOIN result2 ON result1.username=result2.username;
测试一下,性能又高了一点点。
3天内的日志数是300万,第一个查询执行了1小时以上,优化后的最后一个查询只花了112秒。
总结下来,这个查询优化的思路为:大表变小表,小表再连接。
希望有高手能够提出更好的想法,谢谢!