Get text length: char_length
Remove non-Chinese characters: regexp_replace(session_data, '[^\u4e00-\u9fa5]', '') AS BODY (the ^ must be the first character inside the brackets to negate the class)
Fuzzy regex match: BODY rlike '(?
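Extract the fragment matched by a regex: regexp_extract(BODY, '(比)(.?装修.?)(好)', 0), where index 0 returns the whole match and 1, 2, 3 return the individual capture groups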
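As a local sanity check of the extraction pattern above, the following sketch reproduces it with Python's `re` module. Python, the helper name `extract`, and the sample sentence are assumptions made for illustration and are not part of the original notes. Hive evaluates patterns with Java regex, which behaves the same as Python for a pattern this simple.

```python
import re

# Same pattern as in the regexp_extract call: three capture groups.
PATTERN = re.compile(r"(比)(.?装修.?)(好)")


def extract(text: str, index: int = 0) -> str:
    """Mimic Hive regexp_extract: return group `index`, or '' when nothing matches."""
    match = PATTERN.search(text)
    return match.group(index) if match else ""


sample = "这家比那装修得好"  # hypothetical sample text
print(extract(sample, 0))  # whole match
print(extract(sample, 2))  # middle group only
```

Because `.?` matches at most one character, the pattern only fires when a single character (or none) sits on each side of 装修.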
Date conversion: to_date(add_ts) converts yy-mm-dd 00:00:00 to yy-mm-dd; datekey converts to yymmdd
Dynamic date: $now.delta(days=8).date gives yy-mm-dd; without .date the format is yy-mm-dd 00:00:00
Combine a field from multiple rows sharing the same id into one string: concat_ws(',', collect_set(message)) AS messages … group by (collect_set drops duplicate values)
Remove Chinese text / non-Chinese text: regexp_replace(business_hours, '[\u4e00-\u9fa5]|[\n]|[:]|[,]', '') AS times / regexp_replace(session_data, '[^\u4e00-\u9fa5]', '') AS BODY
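The two character classes can be tried out locally before running them in Hive. The sketch below uses Python's `re` module as a stand-in; Python, the function names, and the sample string are assumptions for illustration only. It assumes the non-Chinese variant is meant to use the negated class `[^\u4e00-\u9fa5]`.

```python
import re

# Chinese characters plus newline, colon and comma, as in the `times` expression.
CHINESE_AND_PUNCT = re.compile(r"[\u4e00-\u9fa5]|[\n]|[:]|[,]")
# Everything that is NOT a Chinese character (negated class).
NON_CHINESE = re.compile(r"[^\u4e00-\u9fa5]")


def drop_chinese(text: str) -> str:
    """Remove Chinese characters and the listed punctuation."""
    return CHINESE_AND_PUNCT.sub("", text)


def keep_chinese(text: str) -> str:
    """Remove every character outside the Chinese range."""
    return NON_CHINESE.sub("", text)


sample = "营业时间09:00-18:00"  # hypothetical business_hours value
print(drop_chinese(sample))
print(keep_chinese(sample))
```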
Extract a substring: substr(text, 7, 5), starting at position 7 (1-based) with length 5
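Consider CASE instead of deeply nested if() (here '[^0-9]' strips everything but digits, so an 8-digit result is HHMMHHMM and its first 4 characters are the opening time): case when char_length(regexp_replace(business_hours, '[^0-9]', '')) = 8 then substr(regexp_replace(business_hours, '[^0-9]', ''), 1, 4)
when business_hours rlike '全天|24小时' then '0000'
end as begain_time,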
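The branch order of the CASE expression above can be checked with a small Python sketch. It assumes the intent is to keep only digits, so that a value such as 09:00-18:00 becomes 09001800; the function name `begin_time` and the sample inputs are hypothetical. As in Hive, a CASE with no ELSE branch yields NULL, represented here by `None`.

```python
import re
from typing import Optional


def begin_time(business_hours: str) -> Optional[str]:
    """Mirror the CASE expression: the first matching branch wins."""
    digits = re.sub(r"[^0-9]", "", business_hours)  # keep digits only
    if len(digits) == 8:                  # HHMMHHMM, e.g. 09001800
        return digits[:4]                 # same as substr(digits, 1, 4)
    if re.search(r"全天|24小时", business_hours):  # rlike matches anywhere
        return "0000"
    return None                           # no ELSE branch, so NULL


for value in ["09:00-18:00", "全天", "24小时营业", "周一休息"]:
    print(value, begin_time(value))
```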
Fuzzy lookup (as a CASE branch): when business_hours rlike '(.?至.?)(休息)' then '00:00'
Read a JSON field: if(get_json_object(tag_result, '$.details.地址不一致') != 'NULL', cast(get_json_object(tag_result, '$.details.地址不一致') AS float), 0) AS