1. First, create the test tables and data:
CREATE TABLE EMP AS
SELECT LEVEL EMPL_ID,
(MOD (ROWNUM, 20)+1) DEPT_ID,
SUBSTR(DBMS_RANDOM.STRING ('X', DBMS_RANDOM.VALUE (20, 50)),0,10) EMPNAME,
TRUNC (DBMS_RANDOM.VALUE (1000, 500000), 2) SALARY,
DECODE (ROUND (DBMS_RANDOM.VALUE (1, 2)), 1, 'M', 2, 'F') GENDER,
TO_DATE ( ROUND (DBMS_RANDOM.VALUE (1, 28))
|| '-'
|| ROUND (DBMS_RANDOM.VALUE (1, 12))
|| '-'
|| ROUND (DBMS_RANDOM.VALUE (1900, 2010)),
'DD-MM-YYYY') DOB
FROM DUAL
CONNECT BY LEVEL < 1001;
CREATE TABLE dept AS
SELECT LEVEL dept_id,
SUBSTR(DBMS_RANDOM.STRING ('X', DBMS_RANDOM.VALUE (20, 50)),0,10) manager,
DECODE (ROUND (DBMS_RANDOM.VALUE (1, 2)), 1, 'M', 2, 'F') gender,
TO_DATE ( ROUND (DBMS_RANDOM.VALUE (1, 28))
|| '-'
|| ROUND (DBMS_RANDOM.VALUE (1, 12))
|| '-'
|| ROUND (DBMS_RANDOM.VALUE (1900, 2010)),
'DD-MM-YYYY') estbdate
FROM DUAL
CONNECT BY LEVEL < 31;
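The DEPT_ID expression determines how EMP rows are spread across departments, which helps when reading the row estimates in the plans below. Here is a quick Python check of MOD(ROWNUM, 20) + 1 for ROWNUM 1 to 1000. It reproduces only this arithmetic, not the random columns, and the variable names are illustrative:

```python
# Mirror the DEPT_ID expression MOD(ROWNUM, 20) + 1 used when creating EMP.
# ROWNUM runs 1..1000 because of CONNECT BY LEVEL < 1001.
from collections import Counter

EMP_ROWS = 1000   # CONNECT BY LEVEL < 1001
DEPT_ROWS = 30    # CONNECT BY LEVEL < 31

emp_dept_ids = [(rownum % 20) + 1 for rownum in range(1, EMP_ROWS + 1)]
per_dept = Counter(emp_dept_ids)

print(len(per_dept))           # 20 distinct departments are used
print(set(per_dept.values()))  # {50} rows in every one of them
# Departments that exist in DEPT but have no employees:
print(sorted(set(range(1, DEPT_ROWS + 1)) - set(per_dept)))
# [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
```

So each of the 20 departments in use has 50 employees, and DEPT rows 21 to 30 have no matching EMP rows.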
2. Run the query for the first time, with no indexes:
SQL> SET AUTOTRACE ON;
SQL> SELECT EMPL_ID,EMPNAME,DEPT.DEPT_ID,MANAGER FROM EMP,DEPT WHERE EMP.DEPT_ID=DEPT.DEPT_ID;
Execution Plan
----------------------------------------------------------
Plan hash value: 615168685
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 53000 | 8 (13)| 00:00:01 |
|* 1 | HASH JOIN | | 1000 | 53000 | 8 (13)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPT | 30 | 600 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 1000 | 33000 | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP"."DEPT_ID"="DEPT"."DEPT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
504 recursive calls
0 db block gets
151 consistent gets
19 physical reads
0 redo size
39139 bytes sent via SQL*Net to client
1107 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
10 sorts (memory)
0 sorts (disk)
1000 rows processed
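Some notes on this plan:
Plan hash value: 615168685 is a hash of the execution plan itself (its operations and their order), so it identifies the plan rather than the statement; two different statements with the same plan can show the same plan hash value. What lets Oracle reuse a plan is the hash of the SQL text (the SQL_ID), which locates the already-parsed cursor in the shared pool: when exactly the same statement is executed again, Oracle finds that cursor and reuses its plan instead of hard parsing the statement again.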
|* 1 | HASH JOIN | | 1000 | 53000 | 8 (13)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPT | 30 | 600 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 1000 | 33000 | 4 (0)| 00:00:01 |
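This is one of the ways to join two tables; here it is a hash join. Oracle first scans all of DEPT (the smaller input, called the build input) and, for each row, computes a hash value from DEPT_ID and stores the row in the bucket of an in-memory hash table that corresponds to that value. It then scans EMP (the probe input): for each EMP row it computes the hash of EMP.DEPT_ID, looks up the matching bucket, and returns a joined row for every DEPT row in it whose DEPT_ID is equal.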
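The build and probe phases can be sketched in Python as follows. This is a minimal in-memory illustration that assumes the build input fits in memory; Oracle's real hash join partitions the data and can spill to the temporary tablespace when the PGA work area is too small. The function name `hash_join` and the row layout are hypothetical:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Return (build_row, probe_row) pairs where build_key == probe_key."""
    # Build phase: hash every row of the smaller input on its join key.
    table = defaultdict(list)
    for b in build_rows:
        table[b[build_key]].append(b)
    # Probe phase: hash each row of the larger input and look up its bucket.
    result = []
    for p in probe_rows:
        for b in table.get(p[probe_key], []):
            result.append((b, p))
    return result

# Row counts and the DEPT_ID distribution mirror the test tables.
dept = [{"dept_id": d, "manager": f"MGR{d}"} for d in range(1, 31)]
emp = [{"empl_id": i, "dept_id": (i % 20) + 1} for i in range(1, 1001)]

joined = hash_join(dept, emp, "dept_id", "dept_id")
print(len(joined))  # 1000
```

Python's dict does the hashing here; the point is only that DEPT is read once to build the table and EMP is read once to probe it.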
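The other join methods, sort-merge join and nested loops join, are covered later.
Predicate Information (identified by operation id): 1 - access("EMP"."DEPT_ID"="DEPT"."DEPT_ID") lists the conditions used by the steps of the plan above, keyed by operation Id. An access predicate is used to locate rows (here, to match rows in the hash join); a filter predicate is applied to discard rows after they have been read.
A single condition in the SQL may be used in more than one step. For example, if we change the statement to: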
SQL> SELECT EMPL_ID,EMPNAME,DEPT.DEPT_ID,MANAGER FROM EMP,DEPT WHERE EMP.DEPT_ID=DEPT.DEPT_ID and dept.dept_id=2;
|* 1 | HASH JOIN | | 50 | 2650 | 8 (13)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DEPT | 1 | 20 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 50 | 1650 | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP"."DEPT_ID"="DEPT"."DEPT_ID")
2 - filter("DEPT"."DEPT_ID"=2)
3 - filter("EMP"."DEPT_ID"=2)
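The DEPT_ID = 2 condition is now applied as a filter both when scanning DEPT and when scanning EMP, even though the query states it only for DEPT: since EMP.DEPT_ID = DEPT.DEPT_ID, the optimizer infers EMP.DEPT_ID = 2. The row estimates also fit the data: 1 DEPT row and 50 EMP rows (1000 employees spread evenly over 20 departments).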
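Oracle calls this inference transitive closure (transitive predicate generation): from EMP.DEPT_ID = DEPT.DEPT_ID and DEPT.DEPT_ID = 2 it derives EMP.DEPT_ID = 2. The toy Python function below, `derive_constant_filters` (a hypothetical name), only illustrates the reasoning; it is not how the optimizer is implemented:

```python
def derive_constant_filters(join_equalities, constant_filters):
    """join_equalities: list of (col_a, col_b) pairs joined by '=';
    constant_filters: {column: constant}. Returns all implied column = constant filters."""
    derived = dict(constant_filters)
    changed = True
    while changed:  # repeat until no new column picks up a constant
        changed = False
        for a, b in join_equalities:
            # An equality works in both directions: a = b and b = a.
            for src, dst in ((a, b), (b, a)):
                if src in derived and dst not in derived:
                    derived[dst] = derived[src]
                    changed = True
    return derived

filters = derive_constant_filters(
    [("EMP.DEPT_ID", "DEPT.DEPT_ID")],  # WHERE EMP.DEPT_ID = DEPT.DEPT_ID
    {"DEPT.DEPT_ID": 2},                # AND dept.dept_id = 2
)
print(filters)  # {'DEPT.DEPT_ID': 2, 'EMP.DEPT_ID': 2}
```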
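504 recursive calls: SQL that Oracle runs internally on behalf of the statement, for example data dictionary queries during the hard parse to check that the tables and columns exist and that the user has the required privileges. The dynamic sampling mentioned in the plan's Note section also runs recursive queries.
0 db block gets: blocks read in current mode, which is mainly needed for DML; a plain query needs none.
151 consistent gets: logical reads, i.e. the number of times a block was read from the buffer cache in consistent (read-consistent) mode. If a block has been changed by another transaction since the query started, Oracle applies undo to rebuild the version as of the query start, and those undo reads add further consistent gets; so this value grows when the data in the buffer cache is being modified.
19 physical reads: blocks that had to be read from disk because they were not in the buffer cache.
0 redo size: the query changes no data, so it generates no redo.
39139 bytes sent via SQL*Net to client
1107 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client: the number of network round trips between the client and the database.
10 sorts (memory)
0 sorts (disk)
1000 rows processed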
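The round-trip count depends mostly on how many rows the client fetches per call. In SQL*Plus that is the ARRAYSIZE setting (default 15, changed with SET ARRAYSIZE). The Python function below is a back-of-the-envelope estimate, not Oracle's exact accounting; the single extra call for parse/execute is an assumption that happens to reproduce the 68 seen here:

```python
import math

def estimate_roundtrips(rows, arraysize=15, extra_calls=1):
    """Rough estimate: one fetch round trip per `arraysize` rows, plus
    `extra_calls` for the parse/execute exchange (assumed; the real
    overhead depends on the client and driver)."""
    return math.ceil(rows / arraysize) + extra_calls

print(estimate_roundtrips(1000))                 # 68
print(estimate_roundtrips(1000, arraysize=500))  # 3
```

Under this assumption, raising ARRAYSIZE is the main lever for reducing round trips on large result sets.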
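When the same statement is executed again shortly after the first run, the statistics change as follows:
0 recursive calls: the cursor and its plan are already in the shared pool, so no data dictionary lookups or other recursive calls are needed, and dynamic sampling is not repeated.
0 physical reads: nothing is read from disk, because the blocks are already in the buffer cache.
0 sorts (memory): the 10 in-memory sorts of the first run were most likely done by the recursive SQL (the hash join itself does not sort), so they disappear as well.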
An experiment with the nested loops join method:
SELECT /*+ use_nl(emp,dept) */EMPL_ID,EMPNAME,DEPT.DEPT_ID,MANAGER FROM EMP,DEPT WHERE EMP.DEPT_ID=DEPT.DEPT_ID;
| 1 | NESTED LOOPS | | 1000 | 53000 | 65 (2)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPT | 30 | 600 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 33 | 1089 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
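The hint tells Oracle to join with nested loops. DEPT is fully scanned, and for each DEPT row Oracle searches EMP for rows with the same DEPT_ID. With no index on EMP, each search is a full scan of EMP with the join condition applied as a filter (hence the * on step 3), and that repeated scanning is why the cost rises to 65, compared with 8 for the hash join. The Rows value of 33 on step 3 is the estimate per DEPT row (1000 / 30).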
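A Python sketch of the unindexed nested loops join, counting how many times the inner table is scanned. The function name and row layout are illustrative assumptions:

```python
def nested_loops_join(outer_rows, inner_rows, key):
    """Join with no index: the inner input is fully scanned once per outer row."""
    result = []
    inner_scans = 0
    for o in outer_rows:            # driving (outer) table: DEPT
        inner_scans += 1
        for i in inner_rows:        # full scan of the inner table: EMP
            if i[key] == o[key]:    # join condition applied as a filter
                result.append((o, i))
    return result, inner_scans

dept = [{"dept_id": d} for d in range(1, 31)]
emp = [{"empl_id": e, "dept_id": (e % 20) + 1} for e in range(1, 1001)]

rows, scans = nested_loops_join(dept, emp, "dept_id")
print(len(rows), scans)  # 1000 30
```

The work grows with (outer rows) × (inner rows), which is why this method only pays off when the inner lookup is cheap, for example through an index.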
An experiment with the sort-merge join method:
SELECT /*+ USE_MERGE(emp,dept) */EMPL_ID,EMPNAME,DEPT.DEPT_ID,MANAGER FROM EMP,DEPT WHERE EMP.DEPT_ID=DEPT.DEPT_ID;
| 1 | MERGE JOIN | | 1000 | 53000 | 9 (23)| 00:00:01 |
| 2 | SORT JOIN | | 30 | 600 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| DEPT | 30 | 600 | 3 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 1000 | 33000 | 5 (20)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMP | 1000 | 33000 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
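The hint tells Oracle to use a sort-merge join. DEPT is fully scanned and its rows are sorted on DEPT_ID, EMP is fully scanned and sorted the same way, and the two sorted row sets are then merged to produce the result.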
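A Python sketch of the two phases: sort both inputs on the join key, then merge them in a single pass. Oracle's SORT JOIN sorts in the PGA and may spill to temp, which this in-memory sketch ignores; the function name is hypothetical:

```python
def sort_merge_join(left_rows, right_rows, key):
    # Sort phase: both inputs are sorted on the join key (the SORT JOIN steps).
    left = sorted(left_rows, key=lambda r: r[key])
    right = sorted(right_rows, key=lambda r: r[key])
    result = []
    i = j = 0
    # Merge phase: advance whichever side currently has the smaller key.
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row that has the same key (handles duplicates).
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result

dept = [{"dept_id": d} for d in range(1, 31)]
emp = [{"empl_id": e, "dept_id": (e % 20) + 1} for e in range(1, 1001)]

joined = sort_merge_join(dept, emp, "dept_id")
print(len(joined))  # 1000
```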
SQL> create index dept_deptid_idx on dept(dept_id);
SQL> SELECT EMPL_ID,EMPNAME,DEPT.DEPT_ID,MANAGER FROM EMP,DEPT WHERE EMP.DEPT_ID=DEPT.DEPT_ID;
| 1 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 20 | 1 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1000 | 53000 | 6 (17)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 1000 | 33000 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | DEPT_DEPTID_IDX | 1 | | 0 (0)| 00:00:01 |
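The query is the same, but once the index exists the plan changes: Oracle fully scans EMP and, for each row, uses its DEPT_ID to find the matching DEPT entry through the index (an INDEX RANGE SCAN, because DEPT_DEPTID_IDX is not unique), then fetches the DEPT row by ROWID to build the joined result. TABLE ACCESS BY INDEX ROWID is printed above NESTED LOOPS rather than under it; this is a layout Oracle uses when it prefetches table rows in a nested loops join, and the logic is unchanged.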
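A Python sketch of the indexed nested loops join. A sorted list searched with `bisect` stands in for the B-tree index and a list position stands in for the ROWID; both are simplifications, not Oracle's storage format, and the names are hypothetical:

```python
import bisect

# Hypothetical stand-in for DEPT_DEPTID_IDX: (dept_id, rowid) entries sorted by key.
dept = [{"dept_id": d, "manager": f"MGR{d}"} for d in range(1, 31)]
index = sorted((row["dept_id"], rowid) for rowid, row in enumerate(dept))
index_keys = [k for k, _ in index]

def index_range_scan(value):
    """Return the rowids whose key equals value (non-unique index, so a range)."""
    lo = bisect.bisect_left(index_keys, value)
    hi = bisect.bisect_right(index_keys, value)
    return [rowid for _, rowid in index[lo:hi]]

emp = [{"empl_id": e, "dept_id": (e % 20) + 1} for e in range(1, 1001)]

result = []
for e in emp:                                     # outer: full scan of EMP
    for rowid in index_range_scan(e["dept_id"]):  # inner: INDEX RANGE SCAN
        result.append((e, dept[rowid]))           # TABLE ACCESS BY INDEX ROWID
print(len(result))  # 1000
```

Each EMP row now costs one index lookup instead of a full scan of the other table.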
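A hash join takes one table (usually the one with fewer rows), hashes its join column and stores the rows in a hash table, then reads rows from the other table, hashes their join column, and matches them against the hash table. It is typically used when both tables contain many rows and the join condition is an equality.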
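A nested loops join reads rows from the driving table (the one with fewer rows) and, for each row, accesses the other table (usually through an index) to find matches. It is usually efficient when one of the tables returns only a small number of rows.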
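A merge join first sorts both tables on the join columns and then matches the sorted rows. For equality joins, a hash join usually performs better wherever a merge join could be used; a merge join remains useful for non-equality join conditions (<, <=, >, >=), which a hash join cannot handle, and when both inputs already arrive sorted.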