一个工作中遇到的ORACLE合并行,对合并结果排序的记录。本来简单的认为使用wmsys.wm_concat函数就行,结果出了一些小问题,解决起来还是挺折腾的,认真记录一下:
表eims_gwjk_dbcb_log部分记录如下:
SELECT * FROM eims_gwjk_dbcb_log t
注意选中的几行。现在想以WFID为准进行合并行的操作,一开始使用的是wmsys.wm_concat函数来进行合并行的操作。
SELECT t.wfid,
wmsys.wm_concat(dbcb) AS dbcb ,
wmsys.wm_concat(tacheid) AS tacheid,
wmsys.wm_concat(tacheno) AS tacheno,
wmsys.wm_concat(senduserid) AS senduserid
FROM eims_gwjk_dbcb_log t GROUP BY t.wfid;
下面是合并后的结果:
注意选中的行,和前面选中的四行对比。我的想法是合并行之后,合并行的字段都应该按照顺序排序。
之前选中的行:
wfid |
dbcb |
tacheid |
tacheno |
senduserid |
85652 |
CB |
0000003075 |
3 |
U75CB3 |
85652 |
CB |
0000003076 |
2 |
U76CB2 |
85652 |
DB |
0000003076 |
2 |
U76DB2 |
85652 |
DB |
0000003075 |
3 |
U75DB3 |
合并之后:
wfid |
dbcb |
tacheid |
tacheno |
senduserid |
85652 |
DB,DB,CB,CB |
0000003076,0000003076,0000003075,0000003075 |
2,3,3,2 |
U76DB2,U75CB3,U75DB3,U76CB2 |
通过senduserid字段可以看出来(其实这个字段的值是我更改了用来检测合并之后字段顺序用的,对应了前面几个字段的值),几个合并行的字段顺序完全乱了,并没有按照预想的结果排列。
即使排序后合并:
SELECT t.wfid,
wmsys.wm_concat(dbcb) AS dbcb ,
wmsys.wm_concat(tacheid) AS tacheid,
wmsys.wm_concat(tacheno) AS tacheno,
wmsys.wm_concat(senduserid) AS senduserid
FROM (SELECT * FROM eims_gwjk_dbcb_log ORDER BY wfid,dbcb,tacheid,tacheno,senduserid) t GROUP BY t.wfid;
顺序也不对。还是说我不会用wmsys.wm_concat这个函数如何排序合并字段,反正搜了很多都没有解决方法。
没办法用的另一种合并行的方法,首先看这个语句:
SELECT wfid,dbcb,tacheid,tacheno,senduserid,
--按照wfid分组 每组的senduserid进行排序
(row_number() over(partition BY wfid ORDER BY senduserid)) rn
FROM eims_gwjk_dbcb_log
同注释,按照wfid分组,每组的senduserid进行排序别名RN。之后根据WFID和RN用sys_connect_by_path函数对数据进行合并:
SELECT wfid,(substr(sys_connect_by_path(dbcb,','),2)) AS dbcb,
(substr(sys_connect_by_path(tacheid,','),2)) AS tacheid,
(substr(sys_connect_by_path(tacheno,','),2)) AS tacheno,
(substr(sys_connect_by_path(senduserid,','),2)) AS senduserid,
level AS curr_level
FROM(
SELECT wfid,dbcb,tacheid,tacheno,senduserid,
--按照wfid分组 每组的senduserid进行排序
(row_number() over(partition BY wfid ORDER BY senduserid)) rn
FROM eims_gwjk_dbcb_log
)
connect BY wfid = prior wfid AND rn-1 = prior rn;
数据的合并类似于递归合并,我们只需要curr_level最高的那行,即合并了所有相关数据的那行,之后的语句:
SELECT * FROM (
--我们只需要curr_level最高的那行,即是已经完全合并好数据的那行,之后排序取得curr_level最高的行
SELECT wfid,dbcb,tacheid,tacheno,senduserid,
row_number() over(partition BY wfid ORDER BY wfid,curr_level DESC) c_level
FROM (
--根据WFID和RN用sys_connect_by_path函数对数据进行合并
SELECT wfid,(substr(sys_connect_by_path(dbcb,','),2)) AS dbcb,
(substr(sys_connect_by_path(tacheid,','),2)) AS tacheid,
(substr(sys_connect_by_path(tacheno,','),2)) AS tacheno,
(substr(sys_connect_by_path(senduserid,','),2)) AS senduserid,
level AS curr_level
FROM(
SELECT wfid,dbcb,tacheid,tacheno,senduserid,
--按照wfid分组 每组的senduserid进行排序
(row_number() over(partition BY wfid ORDER BY senduserid)) rn
FROM eims_gwjk_dbcb_log
)
connect BY wfid = prior wfid AND rn-1 = prior rn
))
WHERE c_level = 1
对比数据排序正常! = =
本来个人认为很EASY的一个事情最后确搞得如此复杂,或者说其实还是有对wmsys.wm_concat的合并排序方法?起码我搜索半天也没找到,最后用的这个方法我感觉还是,以后数据多了性能上的问题(但MS sys_connect_by_path 函数性能其实不错?)。
顺带对使用的函数进行一个总结:
-
Start with...Connect By
Start with...Connect By子句递归查询一般用于一个表维护树形结构的应用。(即递归查询)
一个使用实例,创建示例表:
CREATE TABLE TBL_TEST
(ID NUMBER,
NAME VARCHAR2(100 BYTE),
PID NUMBER DEFAULT 0);
插入测试数据:
INSERT INTO TBL_TEST(ID,NAME,PID) VALUES('1','10','0');
INSERT INTO TBL_TEST(ID,NAME,PID) VALUES('2','11','1');
INSERT INTO TBL_TEST(ID,NAME,PID) VALUES('3','20','0');
INSERT INTO TBL_TEST(ID,NAME,PID) VALUES('4','12','1');
INSERT INTO TBL_TEST(ID,NAME,PID) VALUES('5','121','2');
从Root往树末梢递归 (取全部“子孙”集):
SELECT * FROM TBL_TEST
start WITH id=1
connect BY prior id = pid
ORDER BY pid
从末梢往树ROOT递归(取全部“父爷”集):
SELECT * FROM TBL_TEST
start WITH id=5
connect BY prior pid = id
ORDER BY id
-
Group By聚合语句
Group By聚合语句,必须配合相关的例如sum,max,min等函数使用:
SELECT * FROM eims_gwjk_dbcb_log t
WHERE t.wfid = 85652
计算wfid=85652的tacheno的总和:
SELECT wfid,sum(tacheno) FROM eims_gwjk_dbcb_log
WHERE wfid = 85652
GROUP BY wfid;
-
min max
即取得最大和最小值:
SELECT max(senduserid) FROM eims_gwjk_dbcb_log
--U76DB2
SELECT min(senduserid) FROM eims_gwjk_dbcb_log
--U123
-
三个"评价"函数
这三个分析函数都可以在各个分组内从1开始排序。
ROW_NUMBER()是没有重复值的,可以利用它实现分页显示。
DENSE_RANK() 是连续排序,有两个第二名时仍然跟着第三名。
RANK()是跳跃排序,有两个第二名时接下来就是第四名(同样是在各个分组内)。
实例:
SELECT tacheid,
(ROW_NUMBER() over(ORDER BY tacheid)) rn
FROM eims_gwjk_dbcb_log t
tacheid rn
0000001864 1
0000001864 2
0000001889 3
0000001889 4
0000001911 5
0000001911 6
0000001911 7
0000001911 8
0000001918 9
0000001930 10
0000002896 11
SELECT tacheid,
(DENSE_RANK() over(ORDER BY tacheid)) rn
FROM eims_gwjk_dbcb_log t
tacheid rn
0000001864 1
0000001864 1
0000001889 2
0000001889 2
0000001911 3
0000001911 3
0000001911 3
0000001911 3
0000001918 4
0000001930 5
0000002896 6
SELECT tacheid,
(RANK() over(ORDER BY tacheid)) rn
FROM eims_gwjk_dbcb_log t
tacheid rn
0000001864 1
0000001864 1
0000001889 3
0000001889 3
0000001911 5
0000001911 5
0000001911 5
0000001911 5
0000001918 9
0000001930 10
0000002896 11
更多的使用例如:row_number() over ([partition by col1] order by col2) ) as 别名
表示根据col1分组,在分组内部根据col2排序,而这个“别名”的值就表示每组内部排序后的顺序编号(组内连续的唯一的), [partition by col1] 可省略。
SELECT dbcb,tacheid,
(RANK() over(partition BY dbcb ORDER BY tacheid)) rn
FROM eims_gwjk_dbcb_log t
先按照dbcb分组,后按照tacheid排序(即排序)即每个组内自己排序,而不是整个排序。
-
level
LEVEL查询数据所对应的级,level仅仅用于在对表执行层次树遍历的select语句中,个人理解遍历的层次越多,级别越大。
SELECT wfid,(substr(sys_connect_by_path(dbcb,','),2)) AS dbcb,
(substr(sys_connect_by_path(tacheid,','),2)) AS tacheid,
(substr(sys_connect_by_path(tacheno,','),2)) AS tacheno,
(substr(sys_connect_by_path(senduserid,','),2)) AS senduserid,
level AS curr_level
FROM(
SELECT wfid,dbcb,tacheid,tacheno,senduserid,
(row_number() over(partition BY wfid ORDER BY senduserid)) rn
FROM eims_gwjk_dbcb_log
)
connect BY wfid = prior wfid AND rn-1 = prior rn
-
sys_connect_by_path
sys_connect_by_path函数主要作用是可以把一个父节点下的所有子节点通过某个字符进行区分,然后连接在一个列中显示。这个函数使用之前必须先建立一个树,例如:
SELECT level ,sys_connect_by_path( id , '/' )
FROM test
start WITH pid= 1
connect BY prior pid = id ;