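Hive has many string functions. Two of them work with regular expressions and behave a little differently from the rest. I tested both fairly thoroughly while using them recently, so I am writing these notes to encourage myself to improve a little every day.
regexp_replace(subject, pattern, replacement) is the commonly used regex-based replacement function: it replaces every substring of subject that matches pattern with replacement.
# Statement 1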
hive> select regexp_replace('abcdefg','abc','ABA') as res;
# Result 1
res
ABAdefg
Time taken: 0.041 seconds, Fetched: 1 row(s)
# Statement 2
hive> select regexp_replace('abcdefg','[^aceg]','x') as res;
# Result 2
res
axcxexg
Time taken: 0.028 seconds, Fetched: 1 row(s)
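The two replacements above can be reproduced outside Hive with a minimal sketch. Hive evaluates these patterns with Java regular expressions; Python's `re` module is used here only as a stand-in, on the assumption that it behaves the same for patterns this simple. The helper name `regexp_replace` merely mirrors the Hive function, and the sketch treats the replacement as literal text, which is a simplification (Java-style group references such as `$1` are not interpreted).

```python
import re


def regexp_replace(subject: str, pattern: str, replacement: str) -> str:
    """Replace every non-overlapping match of `pattern` in `subject`.

    Illustrative stand-in for the Hive function, not its implementation.
    """
    # A callable replacement inserts the text literally, so no
    # backreference syntax is interpreted in this sketch.
    return re.sub(pattern, lambda _match: replacement, subject)


print(regexp_replace("abcdefg", "abc", "ABA"))      # ABAdefg
print(regexp_replace("abcdefg", "[^aceg]", "x"))    # axcxexg
```

The second call shows that a negated character class matches one character at a time, so each of `b`, `d` and `f` is replaced separately.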
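Before using regexp_extract, it helps to understand capture groups in regular expressions. A capture group is simply the part of a pattern enclosed in parentheses: in "(\d)\d", for example, the "(\d)" part is a capture group. The third argument of regexp_extract selects what to return: 0 is the entire match, and 1, 2, ... are the capture groups in order.
# Statement 1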
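As a small illustration of the capture-group idea, the sketch below applies the pattern "(\d)\d" to a sample string. The input `"id=42"` is made up for this example, and Python's `re` module is assumed to number groups the same way as the Java regex engine that Hive relies on.

```python
import re

pattern = re.compile(r"(\d)\d")

# The pattern needs two consecutive digits; only the first one is captured.
match = pattern.search("id=42")

whole = match.group(0)          # the entire match
first = match.group(1)          # the text captured by "(\d)"
group_count = pattern.groups    # number of capture groups in the pattern

print(whole, first, group_count)  # 42 4 1
```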
hive> select regexp_extract('abcdefg','a(b)(.*?)(e)',0) as res;
# Result 1
res
abcde
Time taken: 0.035 seconds, Fetched: 1 row(s)
# Statement 2
hive> select regexp_extract('abcdefg','a(b)(.*?)(e)',1) as res;
# Result 2
res
b
Time taken: 0.032 seconds, Fetched: 1 row(s)
# Statement 3
hive> select regexp_extract('abcdefg','a(b)(.*?)(e)',2) as res;
# Result 3
res
cd
Time taken: 0.028 seconds, Fetched: 1 row(s)
# Statement 4
hive> select regexp_extract('abcdefg','a(b)(.*?)(e)',3) as res;
# Result 4
res
e
Time taken: 0.028 seconds, Fetched: 1 row(s)
# Statement 5
hive> select regexp_extract('abcdefg','a(b)(.*?)(e)',4) as res;
# Result 5
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '4': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer) on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract@571d0925 of class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments {abcdefg:java.lang.String, a(b)(.*?)(e):java.lang.String, 4:java.lang.Integer} of size 3
# Statement 6
hive> select regexp_extract('abcdefg','a(b)(.*?)(e)',-1) as res;
# Result 6
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '1': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer) on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract@14201a90 of class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments {abcdefg:java.lang.String, a(b)(.*?)(e):java.lang.String, -1:java.lang.Integer} of size 3
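The six statements above suggest that the only valid indexes for the pattern 'a(b)(.*?)(e)' are 0 through 3, which is why 4 and -1 fail. The sketch below lists what each valid index returns. The helper `extract_all_groups` is hypothetical, and Python's `re` is again assumed to match the Java engine for this pattern.

```python
import re
from typing import List, Optional


def extract_all_groups(subject: str, pattern: str) -> List[Optional[str]]:
    """Return [group 0, group 1, ..., group n] for the first match.

    Returns an empty list when the pattern does not match at all.
    """
    compiled = re.compile(pattern)
    match = compiled.search(subject)
    if match is None:
        return []
    # compiled.groups is the number of capture groups; group 0 is the
    # whole match, so there are groups + 1 valid indexes in total.
    return [match.group(i) for i in range(compiled.groups + 1)]


groups = extract_all_groups("abcdefg", "a(b)(.*?)(e)")
print(groups)                   # ['abcde', 'b', 'cd', 'e']
print(4 in range(len(groups)))  # False: index 4 is out of range
```

The lazy group `(.*?)` captures `cd` because it grows only as far as needed for the following `(e)` to match.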
5. Additional notes: if subject is empty, or pattern is empty, or pattern does not match anything in subject, the return value is an empty string.
#1
hive> select regexp_extract('','ab',1) as res;
OK
res
Time taken: 0.044 seconds, Fetched: 1 row(s)
#2
hive> select regexp_extract('abcdefg','',0) as res;
OK
res
Time taken: 0.031 seconds, Fetched: 1 row(s)
#3
hive> select regexp_extract('abcdefg','a(bb)(.*?)(e)',0) as res;
OK
res
Time taken: 0.029 seconds, Fetched: 1 row(s)
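The three empty-result cases can be sketched as follows. `hive_like_regexp_extract` is a hypothetical helper that imitates the behaviour observed in these notes (an empty string when nothing matches). It is not Hive's actual implementation, and it uses Python's `re` in place of Java regular expressions.

```python
import re
from typing import Optional


def hive_like_regexp_extract(subject: str, pattern: str, index: int) -> Optional[str]:
    """Return group `index` of the first match, or "" when nothing matches."""
    match = re.search(pattern, subject)
    if match is None:
        # The index is never consulted when there is no match, so case #1
        # yields "" even though the pattern 'ab' has no group 1.
        return ""
    return match.group(index)


print(repr(hive_like_regexp_extract("", "ab", 1)))                    # ''
print(repr(hive_like_regexp_extract("abcdefg", "", 0)))               # ''
print(repr(hive_like_regexp_extract("abcdefg", "a(bb)(.*?)(e)", 0)))  # ''
```

In the second case the empty pattern does match, but it matches the empty string at the start of subject, so group 0 is empty as well.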