python解析sql语句,基于python分析业务SQL常用查询字段

背景

由于业务灵活的查询需求,会演变成各种各样的SQL语句,但是对于目前的SQL语句并没有做到分文别类的归纳整理。不能很好的把握业务脉络,理解业务需求。导致cube频繁修改,加大cube开发人员的工作量。从而导致了cube使用的体验欠佳。基于以上的业务背景,所以想收集到某段时间业务查询某个cube的所有SQL,进行分析整理,统计排序,直观且具体的了解到业务常用字段

工具

python+excel+influxdb

实现逻辑

收集某段时间查询某个cube的所有SQL

目前比较快速的方法就是通过KM自带的influxdb实现数据的导出,可以规定导出某个cube的某段时间的查询SQL.保存到一个文件里。不限文件格式,获取命令

influxdb -host ip -port port -database database -execute “SQL content” -format ‘format’

使用脚本读取

解析规则:只需要截取出select from之间的字段,当然这个解析规则同样适用于带子查询的SQL, 为什么没有 where 和group by的字段?因为在绝大数情况下,过滤和分组字段都包含到select 和from之间。不解析别名,因为不同的查询会对应不同的别名。直接定位到表名+列名,这样对cube优化会起到一个参考指,指导哪些是常用维度,在下一个cube版本迭代开发,或者修改时,对于聚合组中的联合维度和rowkey设计的选择给到数据支持。

解些逻辑如下:

import re

import os

import csv

import operator

def sqlparse():

filePath = 'c://new//'

filenames = os.listdir(filePath)

for name in filenames:

f = open(filePath+name, 'r',encoding='UTF-8')

fw=open(filePath+name+".csv",'w',encoding='UTF-8',newline='')

csv_writer = csv.writer(fw)

buff = f.read()

buff = buff.replace('\n','').replace('\t','').replace('"','').replace("'",'')

pat = re.compile(r"\w+\.+\w+\w")

result = pat.findall(buff)

for item in result:

list=item.split(',')

for col in list:

context=col.split(' AS ')[0].replace(' ','')

csv_writer.writerow([context])

fw.close()

f.close()

f = open(filePath+name+".csv", 'r',encoding='UTF-8')

print(f)

count_dict = {}

for line in f.readlines():

print(line)

line = line.strip()

count = count_dict.setdefault(line, 0)

count += 1

count_dict[line] = count

sorted_count_dict = sorted(count_dict.items(), key=operator.itemgetter(1), reverse=True)

fw = open(filePath+name+".csv", 'w', encoding='UTF-8', newline='')

csv_writer = csv.writer(fw)

csv_writer.writerow(["name", "count"])

for item in sorted_count_dict:

csv_writer.writerow([item[0], item[1]])

fw.close()

if __name__ == '__main__':

sqlparse()

测试SQL

select * from Student RIGHT JOIN (

select t1.SId, class1, class2 from

(select SId, score as class1 from sc where sc.CId = '01')as t1,

(select SId, score as class2 from sc where sc.CId = '02')as t2

where t1.SId = t2.SId AND t1.class1 > t2.class2

)r

on Student.SId = r.SId;

select * from (

select t1.SId, class1, class2

from

(SELECT SId, score as class1 FROM sc WHERE sc.CId = '01') AS t1,

(SELECT SId, score as class2 FROM sc WHERE sc.CId = '02') AS t2

where t1.SId = t2.SId and t1.class1 > t2.class2

) r

LEFT JOIN Student

ON Student.SId = r.SId;

select * from

(select * from sc where sc.CId = '01') as t1,

(select * from sc where sc.CId = '02') as t2

where t1.SId = t2.SId;

select * from

(select * from sc where sc.CId = '01') as t1

left join

(select * from sc where sc.CId = '02') as t2

on t1.SId = t2.SId;

select * from

(select * from sc where sc.CId = '02') as t2

right join

(select * from sc where sc.CId = '01') as t1

on t1.SId = t2.SId;

select * from sc

where sc.SId not in (

select SId from sc

where sc.CId = '01'

)

AND sc.CId= '02';

select s.sid, s.sname,r.coursenumber,r.scoresum

from (

(select student.sid,student.sname

from student

)s

left join

(select

sc.sid, sum(sc.score) as scoresum, count(sc.cid) as coursenumber

from sc

group by sc.sid

)r

on s.sid = r.sid

);

select * from student

where student.sid in (

select sc.sid from sc

where sc.cid in(

select sc.cid from sc

where sc.sid = '01'

)

);

select

sc.CId ,

max(sc.score) as 最高分,

min(sc.score) as 最低分,

AVG(sc.as) as 平均分,

count(*) as 选修人数,

sum(case when sc.score>=60 then 1 else 0 end )/count(*) as 及格率,

sum(case when sc.score>=70 and sc.score<80 then 1 else 0 end )/count(*) as 中等率,

sum(case when sc.score>=80 and sc.score<90 then 1 else 0 end )/count(*) as 优良率,

sum(case when sc.score>=90 then 1 else 0 end )/count(*) as 优秀率

from sc

GROUP BY sc.CId

ORDER BY count(*)DESC, sc.CId ASC

set @crank=0;

select q.sid, total, @crank := @crank +1 as rank from(

select sc.sid, sum(sc.score) as total from sc

group by sc.sid

order by total desc)q;

说明

SQL必须符合SQL语法标准格式

分析结果

1.会生成一个csv文件 如下

解析字段及数量

2.以上是按照倒叙排列,如果想直观看出,可以借助excel的绘图功能美化。

这里由于图表格式限制,并没有完全列出来。

绘图

从结果可以分析出什么

-‘*’多达13个,由此可以知这是一个明细查询次数较多的场景

-sid次之,但是从两个表出,可以将出字段的表统一使用。

-class1和class2数量一致,含义相同,可以设置为联合维度组。

以上就是针对测试SQL做了个常用查询字段分析。

你可能感兴趣的:(python解析sql语句)