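To check how well we had been learning SQL, our manager set us a SQL problem, one that had actually come up at work. After everyone had sent in their answers, he compiled them and reviewed each one. I learned quite a lot from it.
The problem was as follows:
wfm_taskexec and wfm_tasklog (the two tables have the same structure; wfm_tasklog holds the completed records of wfm_taskexec)
Requirements; all four of the following conditions must hold at once:
1. Some of the records in wfm_tasklog need to be imported back into wfm_taskexec
2. If the program referenced by obj_id in wfm_tasklog has acm_program.state = 100 (meaning published), the wfm_tasklog records of that program are not imported back
3. For each program, only the record with the latest end_time is written back
4. The obj_id has more than 5 records in wfm_tasklog
Our answers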
2.1
select a.* from wfm_tasklog a
left join acm_program t on a.obj_id = t.id
where t.state <> 100
and a.end_time >= ALL (select c.end_time from wfm_tasklog c where c.obj_id = a.obj_id)
and a.obj_id in
(select b.obj_id from wfm_tasklog b group by b.obj_id having count(b.obj_id) > 5)
2.2
select a.* from wfm_tasklog a
left join acm_program t on a.obj_id = t.id
where t.state <> 100
and a.end_time = (select max(c.end_time) from wfm_tasklog c where c.obj_id = a.obj_id)
and a.obj_id in
(select b.obj_id from wfm_tasklog b group by b.obj_id having count(b.obj_id) > 5)
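Shows an understanding of the max function, the ALL operator, group by, the having count clause, in, and joins
1. left join acm_program t on a.obj_id = t.id: this should be an inner join. If a program has been deleted but its workflow records have not (early versions of the media asset system's program management worked this way), the left join on a.obj_id = t.id produces rows whose t.state is null. The condition t.state <> 100 does discard those rows, because a comparison with null is never true, so the left join ends up behaving like an inner join; writing inner join states that intent explicitly
2. Two subqueries are used, one to get the latest end_time and one to check that the obj_id has more than 5 records; a single subquery would do, and should perform better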
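The way the left join in 2.1 and 2.2 treats a deleted program can be checked on a tiny data set. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module instead of the production database, and the three-column table layouts and sample rows are assumptions made up for the demo.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the demo needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 50), (2, 100);
    -- obj_id 3 has no row in acm_program (the program was deleted)
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'), (11, 2, '2012-01-02'), (12, 3, '2012-01-03');
""")

# LEFT JOIN with no filter: the orphaned log row survives with state NULL.
unfiltered = conn.execute("""
    SELECT a.task_id, t.state FROM wfm_tasklog a
    LEFT JOIN acm_program t ON a.obj_id = t.id
    ORDER BY a.task_id
""").fetchall()

# LEFT JOIN plus the filter from 2.1/2.2: NULL <> 100 is not true, so the row is dropped.
left_rows = conn.execute("""
    SELECT a.task_id FROM wfm_tasklog a
    LEFT JOIN acm_program t ON a.obj_id = t.id
    WHERE t.state <> 100
    ORDER BY a.task_id
""").fetchall()

# INNER JOIN with the same filter.
inner_rows = conn.execute("""
    SELECT a.task_id FROM wfm_tasklog a
    INNER JOIN acm_program t ON a.obj_id = t.id
    WHERE t.state <> 100
    ORDER BY a.task_id
""").fetchall()

print(unfiltered)   # [(10, 50), (11, 100), (12, None)]
print(left_rows)    # [(10,)]
print(inner_rows)   # [(10,)]
```

With the filter in place both joins return the same rows, so the inner join is the clearer way to write the query.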
2.3
insert into wfm_taskexec
(select wt.* from wfm_tasklog wt,
(select wt2.obj_id, max(wt2.end_time) maxTime from wfm_tasklog wt2
where wt2.obj_id not in
(select distinct (wt1.obj_id) from wfm_tasklog wt1, acm_program ap
where wt1.obj_id = ap.id and ap.state = 100)
group by wt2.obj_id having count(wt2.obj_id) > 5) temp
where wt.obj_id = temp.obj_id and wt.end_time = temp.maxTime)
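All the points that needed considering were considered; the columns being inserted must correspond to the columns selected
1. A small functional problem: if a program has been deleted but its workflow records have not (early versions of the media asset system's program management worked this way), the not in lets through wfm_tasklog records whose program was deleted long ago
2. Performance: avoid not in where possible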
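The functional problem with not in in 2.3 can be reproduced on a small data set. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for the real database, the table layouts are cut down to three columns, and the exists rewrite is just one possible alternative, not the reviewer's own fix.

```python
import sqlite3

# Hypothetical, simplified schema and data for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 50), (2, 100);
    -- obj_id 3 refers to a program that no longer exists
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'), (11, 2, '2012-01-02'), (12, 3, '2012-01-03');
""")

# Filter used in 2.3: exclude the obj_ids of published programs.
not_in_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT wt2.obj_id FROM wfm_tasklog wt2
    WHERE wt2.obj_id NOT IN
          (SELECT wt1.obj_id FROM wfm_tasklog wt1, acm_program ap
            WHERE wt1.obj_id = ap.id AND ap.state = 100)
    ORDER BY wt2.obj_id
""")]

# Alternative: require an existing, unpublished program.
exists_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT wt2.obj_id FROM wfm_tasklog wt2
    WHERE EXISTS
          (SELECT 1 FROM acm_program ap
            WHERE ap.id = wt2.obj_id AND ap.state <> 100)
    ORDER BY wt2.obj_id
""")]

print(not_in_ids)  # [1, 3]  (the deleted program slips through)
print(exists_ids)  # [1]
```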
2.4
insert into wfm_task
select t.* from wfm_tasklog t
inner join acm_program a on a.id = t.obj_id
where a.state <> 100
and t.end_time = (select max(end_time) from wfm_tasklog l where l.obj_id = t.obj_id)
and (select count(0) from wfm_tasklog l where l.obj_id = t.obj_id) > 5
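Another implementation: the obj_ids with more than 5 records are not obtained through group by
1. Performance: two subqueries are used, one to get the latest end_time and one to check that the obj_id has more than 5 records
2. Performance: compared with 2.2, the count subquery here is correlated and conceptually runs once per row (N times), whereas in 2.2 it runs only once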
2.5
insert into wfm_taskexec
(select * from wfm_tasklog wtl
inner join acm_program ap on (wtl.obj_id = ap.id)
where ap.state <> 100
and wtl.end_time in (select max(end_time) from wfm_tasklog wtl group by wtl.obj_id)
and wtl.obj_id in (select id from wfm_tasklog wtl group by id having count(id) > 5))
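1. This statement fails as soon as it is executed: the selected columns are the combined columns of wfm_tasklog and acm_program, yet they are inserted into wfm_taskexec. Was it tested at all?
2. Functional problem: if some end_time of a wfm_tasklog record for one obj_id (not the latest end_time of that obj_id) happens to equal the latest end_time of another obj_id, that record is inserted as well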
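The second problem in 2.5 comes from the max subquery not being tied to the outer row's obj_id. The sketch below reproduces it with made-up data, using SQLite through Python's sqlite3 as a stand-in for the real database; the three-column table is an assumption for the demo.

```python
import sqlite3

# Hypothetical, simplified table and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'),   -- not the latest for obj_id 1 ...
        (11, 1, '2012-01-05'),
        (20, 2, '2011-12-01'),
        (21, 2, '2012-01-01');   -- ... but equal to the latest for obj_id 2
""")

# As in 2.5: end_time is compared with the latest time of ANY obj_id.
uncorrelated = [row[0] for row in conn.execute("""
    SELECT task_id FROM wfm_tasklog
    WHERE end_time IN (SELECT MAX(end_time) FROM wfm_tasklog GROUP BY obj_id)
    ORDER BY task_id
""")]

# As in 2.2: end_time is compared with the latest time of the SAME obj_id.
correlated = [row[0] for row in conn.execute("""
    SELECT a.task_id FROM wfm_tasklog a
    WHERE a.end_time = (SELECT MAX(c.end_time) FROM wfm_tasklog c
                         WHERE c.obj_id = a.obj_id)
    ORDER BY a.task_id
""")]

print(uncorrelated)  # [10, 11, 21]  (task 10 is a false match)
print(correlated)    # [11, 21]
```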
2.6
insert into wfm_taskexec
select *
from (select t.task_id,
t.procexec_id,
t.procact_id,
t.act_id,
t.obj_type,
t.obj_id,
t.execute_user,
t.create_user,
t.task_state,
t.task_percent,
t.task_param,
t.task_desc,
t.back_times,
t.ext_sysid,
t.ext_state,
t.create_time,
t.start_time,
t.end_time,
t.field_1,
rank() over(partition by t.obj_id order by t.end_time desc) as field_2
from wfm_tasklog t
where t.obj_id in
(select t.obj_id tnum from acm_program p, wfm_tasklog t
where p.id = t.obj_id
and p.state != 100
group by t.obj_id
having count(t.obj_id) > 5))
where field_2 = 1;
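Takes the latest end_time with rank() over (partition by ...); another implementation, nice
1. When several of the latest rows are needed (for example, importing back the records with the 3 latest end_time values for each ID), rank() can do what max, min and similar functions cannot. For this particular task, though, max is the better choice, and it does not require writing out every selected column
2. rank() is an analytic (window) function. It has been part of the SQL standard since SQL:2003, but not every database supports it (MySQL, for example, only added window functions in version 8.0), so this statement is not fully portable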
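To illustrate the "latest N per ID" case where rank() is the natural tool, here is a minimal sketch that keeps the 2 latest records per obj_id. It assumes SQLite 3.25 or later (the first version with window functions) accessed through Python's sqlite3; the table layout and rows are invented for the demo.

```python
import sqlite3

# Hypothetical, simplified table and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'), (11, 1, '2012-01-02'), (12, 1, '2012-01-03'),
        (20, 2, '2012-02-01'), (21, 2, '2012-02-02');
""")

# Rank the rows of each obj_id from newest to oldest, then keep ranks 1 and 2.
top2 = [row[0] for row in conn.execute("""
    SELECT task_id FROM (
        SELECT task_id,
               RANK() OVER (PARTITION BY obj_id ORDER BY end_time DESC) AS rk
          FROM wfm_tasklog
    )
    WHERE rk <= 2
    ORDER BY task_id
""")]

print(top2)  # [11, 12, 20, 21]
```

Note that rank() gives tied end_time values the same rank, so ties can return more than N rows per obj_id; row_number() would return exactly N.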
2.7
insert into wfm_taskexec
(select * from wfm_tasklog wt
where wt.obj_id in (
(select ap.id from acm_program ap where ap.state <> 100) union
(select obj_id from (select wt.obj_id, count(wt.obj_id) as c
from wfm_tasklog wt group by wt.obj_id) where c > 5) union
(select obj_id from (select * from wfm_tasklog wt where rownum = 1
order by wt.end_time desc))))
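Tries to use union to get the intersection of the obj_ids that satisfy the three conditions; union, however, returns the obj_ids that satisfy any one of them (intersect is the operator for an intersection)
1. Does not know how to use the having count clause
2. Isn't there a problem with the subquery that takes the latest end_time? It can return only one obj_id, and because rownum = 1 is applied before order by, that row is not necessarily the latest one
3. With the outermost in, every record of a matching obj_id is imported back into wfm_taskexec; the result is not restricted to the record with the latest end_time for each obj_id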
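The difference between union and intersect for combining conditions can be seen on a small example. This is a sketch under assumptions: SQLite via Python's sqlite3, cut-down table layouts, made-up rows, and a threshold of more than 1 record (instead of more than 5) to keep the data short.

```python
import sqlite3

# Hypothetical, simplified schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 50), (2, 100), (3, 50);
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'), (11, 1, '2012-01-02'),
        (20, 2, '2012-02-01'), (21, 2, '2012-02-02'),
        (30, 3, '2012-03-01');
""")

# Condition A: unpublished programs -> {1, 3}
# Condition B: more than 1 log record -> {1, 2}
query = """
    SELECT id FROM acm_program WHERE state <> 100
    {op}
    SELECT obj_id FROM wfm_tasklog GROUP BY obj_id HAVING COUNT(*) > 1
    ORDER BY 1
"""

union_ids = [row[0] for row in conn.execute(query.format(op="UNION"))]
intersect_ids = [row[0] for row in conn.execute(query.format(op="INTERSECT"))]

print(union_ids)      # [1, 2, 3]  (A or B)
print(intersect_ids)  # [1]        (A and B)
```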
2.8
insert into wfm_taskexec
(select * from wfm_tasklog
where obj_id in
(select obj_id from wfm_tasklog
group by obj_id having count(obj_id) > 5 and end_time = max(end_time))
and acm_program <> 100);
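1. max is used incorrectly here: end_time is neither grouped nor aggregated, so it cannot be compared with max(end_time) in the having clause
2. With the outermost in, every record of a matching obj_id is imported back into wfm_taskexec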
2.9
insert into wfm_taskexec *
select *
from wfm_tasklog l
where l.acm_program.state <> 100
and l.end_time = (select max(l.end_time) from l group by id)
group by l.obj_id
having count(l.obj_id) > 5;
What on earth is l.acm_program.state??
2.10
insert into wfm_taskexec
select * from wfm_tasklog tg
inner join acm_program pm on tg.obj_id = pm.id
where pm.state <> 100
and NOT EXISTS
(SELECT 1 FROM wfm_tasklog tg1 WHERE tg1.obj_id = tg.obj_id
and tg1.end_time > tg.end_time)
and tg.obj_id in
(select obj_id from wfm_tasklog group by obj_id having count(0) > 5)
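Uses not exists to get the latest end_time of an obj_id; quite novel
1. The inserted columns and the selected columns do not match
2. Performance: the correlated not exists subquery is, conceptually, evaluated for every wfm_tasklog row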
2.11
select task_id
from (select task_id, obj_id
from (select task_id, obj_id,
row_number() over(partition by obj_id order by end_time desc) mm
from wfm_tasklog ww)
where mm = 1) t1,
(select obj_id from (select w.obj_id, count(w.obj_id) counts
from wfm_tasklog w, acm_program a
where w.obj_id = a.id and a.state <> 100
group by w.obj_id) t
where t.counts > 5) t2
where t1.obj_id = t2.obj_id;
Honestly, this SQL is tiring to read
2.12
insert into wfm_taskexec tc
select * from wfm_tasklog ta,
(select wt.obj_id wid, max(wt.end_time) wtime
from wfm_tasklog wt, acm_program ap
where wt.obj_id = ap.id and ap.state != 100
group by wt.obj_id having count(wt.obj_id) > 5) wa
where ta.obj_id = wa.wid and wa.wtime = ta.end_time;
Apart from the mismatch between the inserted and selected columns, the rest is identical to what I wrote. Did you borrow from it?
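The column mismatch mentioned for 2.12 (and for 2.5 and 2.10) arises because select * over a join returns the columns of every joined source. A minimal sketch of the problem and one possible fix, selecting ta.* only, is shown below. It assumes SQLite through Python's sqlite3, three-column stand-ins for the real tables, and made-up rows; the real wfm_taskexec has many more columns.

```python
import sqlite3

# Hypothetical, simplified schema: wfm_taskexec mirrors wfm_tasklog.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    CREATE TABLE wfm_taskexec (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 50), (2, 100);
""")
rows = [(i, 1, f"2012-01-{i:02d}") for i in range(1, 7)]      # 6 rows, unpublished program
rows += [(i, 2, f"2012-02-{i:02d}") for i in range(7, 13)]    # 6 rows, published program
rows += [(i, 3, f"2012-03-{i:02d}") for i in range(13, 19)]   # 6 rows, deleted program
conn.executemany("INSERT INTO wfm_tasklog VALUES (?, ?, ?)", rows)

# FROM/WHERE part of 2.12: one grouped subquery yields both the latest
# end_time and the "more than 5 records" check.
from_where = """
      FROM wfm_tasklog ta,
           (SELECT wt.obj_id AS wid, MAX(wt.end_time) AS wtime
              FROM wfm_tasklog wt, acm_program ap
             WHERE wt.obj_id = ap.id AND ap.state <> 100
             GROUP BY wt.obj_id HAVING COUNT(wt.obj_id) > 5) wa
     WHERE ta.obj_id = wa.wid AND ta.end_time = wa.wtime
"""

# SELECT * also returns wid and wtime, so the column counts differ.
try:
    conn.execute("INSERT INTO wfm_taskexec SELECT * " + from_where)
    star_failed = False
except sqlite3.OperationalError:
    star_failed = True

# Selecting only the wfm_tasklog columns matches the target table.
conn.execute("INSERT INTO wfm_taskexec SELECT ta.* " + from_where)
restored = conn.execute("SELECT task_id, obj_id, end_time FROM wfm_taskexec").fetchall()

print(star_failed)  # True
print(restored)     # [(6, 1, '2012-01-06')]
```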
Personally, I took away the following lessons about writing SQL: