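Given the following string: "000000-00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a"
Requirements:
1. Substrings that consist entirely of 0s are left unchanged.
2. In every other substring that starts with 0, the first "0" is replaced with "A".
The expected result after replacement is:
"000000-A0000h-A10000-C0000b-000000-000000-000000-d0000d-A002e0-A0003a"
The statement comes first, followed by the explanation.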
WITH tab AS
 (SELECT '000000-00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a' col FROM dual
  UNION ALL
  SELECT '00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a' col FROM dual)
-- The replacement '-A\2' always emits a leading '-';
-- ltrim removes it again when the match was at the very start of the string.
SELECT ltrim(regexp_replace(col, '(^0|-0)(0{0,}[^0-]{1,}[^-]{0,})', '-A\2'), '-') AS col
  FROM tab;
COL
--------------------------------------------------------------------------------
000000-A0000h-A10000-C0000b-000000-000000-000000-d0000d-A002e0-A0003a
A0000h-A10000-C0000b-000000-000000-000000-d0000d-A002e0-A0003a
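To make this easier to follow, let's break it down: first split the string so that each substring is on its own row, then show each regex group in its own column.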
WITH tab0 AS
 (SELECT '000000-00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a' col
    FROM dual),
tab AS
 -- One row per substring; every row except the first keeps its leading '-'.
 -- '(^|-)' also picks up a first substring that does not start with 0.
 (SELECT regexp_substr(col, '(^|-)[^-]+', 1, LEVEL) AS col
    FROM tab0
  CONNECT BY LEVEL <= regexp_count(col, '-') + 1)
SELECT col,
       ltrim(regexp_replace(col, '(^0|-0)(0{0,}[^0-]{1,}[^-]{0,})', '-A\2'), '-') AS col0,
       -- Same pattern with four capture groups, one group per column
       regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\1') AS col1,
       regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\2') AS col2,
       regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\3') AS col3,
       regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\4') AS col4
  FROM tab;
The result is as follows:
  # | COL     | COL0   | COL1    | COL2    | COL3    | COL4
  1 | 000000  | 000000 | 000000  | 000000  | 000000  | 000000
  2 | -00000h | A0000h | -0      | 0000    | h       |
  3 | -010000 | A10000 | -0      |         | 1       | 0000
  4 | -C0000b | C0000b | -C0000b | -C0000b | -C0000b | -C0000b
  5 | -000000 | 000000 | -000000 | -000000 | -000000 | -000000
  6 | -000000 | 000000 | -000000 | -000000 | -000000 | -000000
  7 | -000000 | 000000 | -000000 | -000000 | -000000 | -000000
  8 | -d0000d | d0000d | -d0000d | -d0000d | -d0000d | -d0000d
  9 | -0002e0 | A002e0 | -0      | 00      | 2e      | 0
 10 | -00003a | A0003a | -0      | 000     | 3a      |
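The regular expression consists of four groups. On rows where the pattern does not match, regexp_replace returns the input unchanged, which is why COL1 through COL4 simply repeat the whole substring on those rows.
Group 1: (^0|-0) matches a 0 at the start of the string, or -0. This restricts matches to substrings that begin with 0; rows 4 and 8 fail this condition.
Group 2: 0{0,} matches zero or more 0s. This group may be empty, as in row 3.
Group 3: [^0-]{1,} requires at least one character that is neither 0 nor the separator. This ensures the substring is not all 0s; rows 1, 5, 6 and 7 fail this condition.
Group 4: [^-]{0,} matches the rest of the substring, stopping before the next separator. This group may be empty, as in rows 2 and 10.
As the table above shows, only rows 2, 3, 9 and 10 match.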
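The four groups can also be inspected with regexp_substr, whose sixth argument selects a capture group. The query below is only a sketch and not part of the original solution: it assumes Oracle 11g or later (which the use of regexp_count in this post already implies), and the names rn and g1 to g4 are illustrative. Unlike the regexp_replace columns, non-matching rows should come back as NULL here instead of repeating the whole substring, which makes it easier to see which rows match.

```sql
WITH tab0 AS
 (SELECT '000000-00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a' col
    FROM dual),
tab AS
 -- One row per substring, numbered by rn (illustrative name)
 (SELECT LEVEL AS rn,
         regexp_substr(col, '(^|-)[^-]+', 1, LEVEL) AS col
    FROM tab0
  CONNECT BY LEVEL <= regexp_count(col, '-') + 1)
SELECT rn,
       col,
       -- Arguments: source, pattern, start position, occurrence, match parameter, group number
       regexp_substr(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', 1, 1, NULL, 1) AS g1,
       regexp_substr(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', 1, 1, NULL, 2) AS g2,
       regexp_substr(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', 1, 1, NULL, 3) AS g3,
       regexp_substr(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', 1, 1, NULL, 4) AS g4
  FROM tab
 ORDER BY rn;
```

For the matching rows, g1 to g4 should agree with COL1 to COL4 in the table above.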
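With the pattern split into four groups as above, the replacement can also be written as
regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '-A\2\3\4')
Simulated with the columns from the result above, "-A\2\3\4" corresponds to "'-A' || col2 || col3 || col4". The correspondence holds only for rows that match (2, 3, 9 and 10); on the other rows col2 through col4 each contain the whole unchanged substring, so the concatenation is not meaningful there.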
WITH tab0 AS
 (SELECT '000000-00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a' col
    FROM dual),
tab AS
 -- One row per substring; every row except the first keeps its leading '-'
 (SELECT regexp_substr(col, '(^|-)[^-]+', 1, LEVEL) AS col
    FROM tab0
  CONNECT BY LEVEL <= regexp_count(col, '-') + 1),
tab1 AS
 (SELECT col,
         ltrim(regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '-A\2\3\4'), '-') AS col0,
         regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\1') AS col1,
         regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\2') AS col2,
         regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\3') AS col3,
         regexp_replace(col, '(^0|-0)(0{0,})([^0-]{1,})([^-]{0,})', '\4') AS col4
    FROM tab)
-- Rebuild the replacement by hand from the individual groups
SELECT col, col0, ltrim('-A' || col2 || col3 || col4, '-') AS "-A\2\3\4"
  FROM tab1;
  # | COL     | COL0   | -A\2\3\4
  1 | 000000  | 000000 | A000000000000000000
  2 | -00000h | A0000h | A0000h
  3 | -010000 | A10000 | A10000
  4 | -C0000b | C0000b | A-C0000b-C0000b-C0000b
  5 | -000000 | 000000 | A-000000-000000-000000
  6 | -000000 | 000000 | A-000000-000000-000000
  7 | -000000 | 000000 | A-000000-000000-000000
  8 | -d0000d | d0000d | A-d0000d-d0000d-d0000d
  9 | -0002e0 | A002e0 | A002e0
 10 | -00003a | A0003a | A0003a
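As a possible variation (an assumption of this sketch, not part of the original solution), the separator itself can be captured and put back with \1, so that no ltrim is needed afterwards. The pattern below is an illustrative rewrite: group 1 is either the start of the string or a '-', and group 2 is the remainder of a substring that contains at least one character other than 0.

```sql
WITH tab AS
 (SELECT '000000-00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a' col FROM dual
  UNION ALL
  SELECT '00000h-010000-C0000b-000000-000000-000000-d0000d-0002e0-00003a' col FROM dual)
-- (^|-)                 start of string or separator, restored via \1
-- 0                     the leading 0 that becomes 'A'
-- ([^-]*[^0-][^-]*)     rest of the substring, with at least one non-0 character
SELECT regexp_replace(col, '(^|-)0([^-]*[^0-][^-]*)', '\1A\2') AS col
  FROM tab;
```

On the sample data this should produce the same two result rows as the ltrim version. One difference in behavior: ltrim(..., '-') strips every leading '-' from the result, including one that was already present in the input, whereas this variant leaves such a character in place.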