最基本的connect by的用法:
需求1: 我需要下钻所有的树得到level和parent_name
create table test_lvl1 (id number, parent_id number, name varchar2(10));
insert into test_lvl1 values (1,null,'SLI1');
insert into test_lvl1 values (2,1,'SLI2');
insert into test_lvl1 values (3,1,'SLI3');
insert into test_lvl1 values (4,null,'SLI4');
insert into test_lvl1 values (5,2,'SLI5');
insert into test_lvl1 values (6,3,'SLI6');
insert into test_lvl1 values (7,5,'SLI7');
insert into test_lvl1 values (8,7,'SLI8');
insert into test_lvl1 values (9,4,'SLI9');
insert into test_lvl1 values (10,3,'SLI10');
insert into test_lvl1 values (11,1,'SLI11');
kl@k01> select * from test_lvl1;
ID PARENT_ID NAME
---------- ---------- ----------
1 SLI1
2 1 SLI2
3 1 SLI3
4 SLI4
5 2 SLI5
6 3 SLI6
7 5 SLI7
8 7 SLI8
9 4 SLI9
10 3 SLI10
11 1 SLI11
11 rows selected.
select a.*, b.name parent_name
from
(select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with parent_id is null
connect by prior id=PARENT_ID) a,
test_lvl1 b
where a.parent_id=b.id(+)
NAME ID PARENT_ID LEVEL PARENT_NAM
---------- ---------- ---------- ---------- ----------
SLI11 11 1 2 SLI1
SLI3 3 1 2 SLI1
SLI2 2 1 2 SLI1
SLI5 5 2 3 SLI2
SLI10 10 3 3 SLI3
SLI6 6 3 3 SLI3
SLI9 9 4 2 SLI4
SLI7 7 5 4 SLI5
SLI8 8 7 5 SLI7
SLI4 4 1
SLI1 1 1
11 rows selected.
* 解释一下:
* 1. Start with表示从那一层开始的,后面跟表达式,如: pid=0,
寻找继承关系时,指定的顶点,如果需要对整个表进行整理,比较常用
的作法是: PID is null. 例如这种情况顶点的Parent_ID显然是NULL,
所以从PID is null开始无疑是最完整的。验证如下:
* 2. prior 表示返回所以符合这种条件(如id=pid)的connect by操作结果.
即以循环的显示所有的记录和继承关系。可以试一下如果把prior去掉,那么
只有两条记录,验证如下:
select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with parent_id is null
connect by prior id=PARENT_ID;
--- 如果用Chile ID或者其他列作为开始列,会丢掉分支, 如下查询,可以自己验证
select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with id =1
connect by prior id=PARENT_ID(+);
select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with NAME='SLI1'
connect by prior id(+)=PARENT_ID;
* 另外一个很有趣的现象,对上述start with条件不同,而且结果记录数也不同的查询查看执行计划
竟是完全相同的。测试版本是Oracle 10.2.0.1
测试过程如下,节约篇幅,结果不述:
1. Set autotrace on;
2. 执行上面的query,查看返回值和执行计划;
需求2,我需要每个从根节点到当前孩子节点的路径:
create table test_parent1 (id number, name varchar2(20), pid number, parent_name varchar2(20));
insert into test_parent1 values (1,'JEFF',null,null);
insert into test_parent1 values (2,'LARRY',1,'JEFF');
insert into test_parent1 values (3,'BILL',1,'JEFF');
insert into test_parent1 values (4,'DELL',1,'JEFF');
insert into test_parent1 values (5,'MARK',3,'LARRY');
insert into test_parent1 values (6,'KART',5,'MARK');
insert into test_parent1 values (7,'ANDY',6,'KART');
insert into test_parent1 values (8,'CANDY',null,null);
insert into test_parent1 values (9,'LOUIS',8,'CANDY');
commit;
--- 得到如下表:
kl@k01> select * from test_parent1;
ID NAME PID PARENT_NAME
---------- -------------------- ---------- --------------------
1 JEFF
2 LARRY 1 JEFF
3 BILL 1 JEFF
8 CANDY
4 DELL 1 JEFF
5 MARK 3 LARRY
6 KART 5 MARK
7 ANDY 6 KART
9 LOUIS 8 CANDY
8 rows selected.
可以看出来一个简单的层次关系,如果我希望得到每个人的直接主管,应该就足够了,
但是要得到每个人的顶级BOSS,即变成如下:
1 JEFF
2 LARRY 1
3 BILL 1
8 CANDY
4 DELL 1
5 MARK 1
6 KART 1
7 ANDY 1
9 LOUIS 8
我们还需要做如下处理,使用SYS_CONNECT_BY_PATH,这个函数只使用于9i和9i以上的版本,
她可以帮我返回一个字符串,这个字符串是一个from-root-to-node的字串。如下所示:
col PID_PATH for a20
col parent_name for a30
select ID, NAME, SYS_CONNECT_BY_PATH(pid,'/') as pid_path, SYS_CONNECT_BY_PATH(parent_name,'/') as parent_name
from test_parent1
start with pid=1 connect by prior id=pid;
ID NAME PID_PATH PARENT_NAME
---------- -------------------- -------------------- ------------------------------
2 LARRY /1 /JEFF
3 BILL /1 /JEFF
5 MARK /1/3 /JEFF/LARRY
6 KART /1/3/5 /JEFF/LARRY/MARK
7 ANDY /1/3/5/6 /JEFF/LARRY/MARK/KART
4 DELL /1 /JEFF
6 rows selected.
需求3, 字符串分组方法(这里参考了zhouwf0726在ITPUB里面的研究结果 http://www.itpub.net/thread-614563-1-1.html):
++++++++++++ Input ++++++++++++++++++++
ID MC
---------- ---------
1 11111
1 22222
2 11111
2 22222
3 11111
3 22222
3 33333
++++++++++++ output ++++++++++++++++++++
ID ADD_MC
---------- -------------------
1 11111;22222
2 11111;22222
3 11111;22222;33333
++++++++++++++++++++++++++++++++++++++++
实现方法和试验步骤如下:
drop table test purge;
create table test(id varchar2(10),mc varchar2(50));
insert into test values('1','11111');
insert into test values('1','22222');
insert into test values('2','11111');
insert into test values('2','22222');
insert into test values('3','11111');
insert into test values('3','22222');
insert into test values('3','33333');
commit;
* 先看看怎么利用connect by, 如果要利用connect by,必须考虑使用sys_connect_by_path(上面有详细介绍)
关键是如何选择connect by的条件,为什么要制定rn_id,我们需要对ID进行分组,从而把同组的MC组合起来,
而对于RN,
col MC for a10
select id, mc, row_number() over(partition by id order by id) rn_id,
row_number() over(order by id)+id rn
from test;
ID MC RN_ID RN
---------- ---------- ---------- ----------
1 11111 1 2
1 22222 2 3
2 11111 1 5
2 22222 2 6
3 11111 1 8
3 22222 2 9
3 33333 3 10
7 rows selected.
col add_mc for a50
select id,ltrim(max(sys_connect_by_path(mc,';')),';') add_mc from (
select id, mc, row_number() over(partition by id order by id) rn_id,
row_number() over(order by id)+id rn from test)
start with rn_id=1 connect by rn-1=prior rn
group by id
order by id
/
ID ADD_MC
---------- --------------------------------------------------
1 11111;22222
2 11111;22222
3 11111;22222;33333
3 rows selected.
## 解释: 首先要熟悉分析函数 row_number() over(partition by id order by id)
表示以id进行分组,显示每条记录中id的数量,order by id是对id进行排序。
rn是顺序的,如果直接connect by一组顺序的数,将一直连续下去,所以因为一直满足rn - 1= rn,
所以要使rn为 row_number() over(order by id)+id,因为每组的id都不一样,这样才能避免连续的数字出现。
需求4,我需要在输出的时候就对结果进行格式化:
lpad和level配合决定错行缩进显示:
select level||'layer' layer,lpad(' ',level*5)||id id
from test_parent1
start with pid=1 connect by prior id=pid;
LAYER ID
-------------------- -----------------------------------
1layer 2
1layer 3
2layer 5
3layer 6
4layer 7
1layer 4
6 rows selected.
需求4, 如果需要构建一种循环处理机制,比如把一个字符的每个字母输出,不用通过
PLSQL来实现,直接用connect by即可,当着这里已经有点跑题了,权当兴趣吧:
select substr('JEFFREY',rownum, 1) Letter from dual
con
LETT
----
J
E
F
F
R
E
6 rows selected.
介绍这个例子也是为了引申出两个简单常用的技巧:
1) 构建大数据量的测试数据:
select rownum a from dual connect by rownum<=10000; --- 10000条顺序的纪录就轻松产生了。
或者: select level a from dual connect by level<=10000;
2) 结合case ... when优化query, 通过这种方法在列转行的时候,减少全表扫描的次数:
比如,我要把ID1,ID2,ID3合并起来,他们对应的CHARGE不变:
TYPE ID1 ID2 ID3 CHARGE
---------- ------------------------------ ------------------------------ ------------------------------ ----------
1 0000001 0000002 0000003 100
1 0000006 0 0000008 900
1 0000009 0 0 400
合并结果如下:
TYPE ID CHARGE
---------- ------------------------------ ----------
1 0000001 100
1 0000006 900
1 0000009 400
1 0000002 100
1 0000003 100
1 0000008 900
++++++++++++ test table ++++++++++++++++++
drop table test;
create table test
( type number,
id1 varchar2(10) default 0 not null,
id2 varchar2(10) default 0 not null,
id3 varchar2(10) default 0 not null,
charge number
);
insert into test values (1,'0000001','0000002','0000003',100);
insert into test values (1,'0000006',0,'0000008',900);
insert into test values (1,'0000009',0,0,400);
commit;
+++++++++++++++++++++++++++++++++++++++++++
常规方法可能是:
select type, id1 id, charge from test where id1<>0
union all
select type, id2 id, charge from test where id2<>0
union all
select type, id3 id, charge from test where id3<>0
;
但如果test纪录size很大,三次FTS肯定cost很高,所以我们必须考虑通过case .. when的方法减少FTS的次数:
select type,
case a when 1 then id1
when 2 then id2
when 3 then id3
end
id,charge
from test, (select level a from dual connect by level<=3) lvl
where case a when 1 then id1
when 2 then id2
when 3 then id3
end <>0;
从下面的执行计划看,只有一次FTS,而不是之前的三次。
-------------------- Execution Plan ----------------------------------------------
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 60 | 5 (0)|
| 1 | NESTED LOOPS | | 1 | 60 | 5 (0)|
| 2 | VIEW | | 1 | 13 | 2 (0)|
| 3 | CONNECT BY WITHOUT FILTERING| | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)|
|* 5 | TABLE ACCESS FULL | TEST | 1 | 47 | 3 (0)|
----------------------------------------------------------------------------
-------------------------------------------------------------------------------------