What I learned from a SQL exercise at work

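To check how our SQL study was going, our manager set us a SQL exercise taken from a problem that had actually come up at work. Once everyone had sent in an answer, he collected the answers and commented on each one. I learned quite a lot from it.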

 

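The exercise was as follows:

Tables: wfm_taskexec and wfm_tasklog (the two tables have the same structure; wfm_tasklog holds the completed records of wfm_taskexec).

Requirement: all four of the following conditions must hold.

1. Some of the records in wfm_tasklog need to be imported back into wfm_taskexec.

2. If the program that a wfm_tasklog.obj_id refers to has acm_program.state = 100 (published), none of that program's wfm_tasklog records are imported back.

3. For each program, only the record with the latest end_time is written back.

4. The obj_id must have more than 5 records in wfm_tasklog.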

 

 

Our answers

2.1

select a.* from wfm_tasklog a

  left join acm_program t on a.obj_id = t.id

 where t.state <> 100

   and a.end_time >= ALL (select c.end_time from wfm_tasklog c where c.obj_id = a.obj_id)

   and a.obj_id in

          (select b.obj_id from wfm_tasklog b group by b.obj_id having count(b.obj_id) > 5)

 

2.2

select a.* from wfm_tasklog a

  left join acm_program t on a.obj_id = t.id

 where t.state <> 100

   and a.end_time = (select max(c.end_time) from wfm_tasklog c where c.obj_id = a.obj_id)

   and a.obj_id in

(select b.obj_id from wfm_tasklog b group by b.obj_id having count(b.obj_id) > 5)

 

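Shows an understanding of max, ALL, the group by ... having count clause, in, and joins.

1. left join acm_program t on a.obj_id = t.id: an inner join should be used here. If a program has been deleted but its workflow records have not (this is how program management worked in the early media asset system), the left join on a.obj_id = t.id produces rows whose t.state is null. In standard SQL the comparison t.state <> 100 is unknown for null, so the where clause happens to discard those rows and the left join ends up behaving like an inner join. That only works by accident: the orphaned rows would come through if the condition were moved into the on clause or loosened to accept null. An inner join states the intent directly.

2. Fetching the latest end_time and checking that the obj_id has more than 5 records takes two subqueries; a single one would do, which should help performance.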
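To see how the join choice treats orphaned log rows, here is a minimal sketch. It assumes SQLite (through Python's sqlite3) as a stand-in for Oracle and a cut-down three-column wfm_tasklog; the sample rows are made up for illustration and are not the real schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 100), (2, 50);
    -- obj_id 3 is an orphan: its program row no longer exists
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'), (20, 2, '2012-01-02'), (30, 3, '2012-01-03');
""")

def task_ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# LEFT JOIN, filter in WHERE: NULL <> 100 is unknown, so the orphan is dropped.
left_where = task_ids("""
    SELECT a.task_id FROM wfm_tasklog a
      LEFT JOIN acm_program t ON a.obj_id = t.id
     WHERE t.state <> 100 ORDER BY a.task_id""")

# INNER JOIN: same result, but the intent is explicit.
inner_where = task_ids("""
    SELECT a.task_id FROM wfm_tasklog a
     INNER JOIN acm_program t ON a.obj_id = t.id
     WHERE t.state <> 100 ORDER BY a.task_id""")

# LEFT JOIN, filter moved into ON: every log row survives, orphan included.
left_on = task_ids("""
    SELECT a.task_id FROM wfm_tasklog a
      LEFT JOIN acm_program t ON a.obj_id = t.id AND t.state <> 100
     ORDER BY a.task_id""")

print(left_where, inner_where, left_on)
```

The first two queries agree, while the third shows how easily a left join starts leaking rows for deleted programs once the filter moves.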

 

2.3

insert into wfm_taskexec

  (select wt.*  from wfm_tasklog wt,

          (select wt2.obj_id, max(wt2.end_time) maxTime from wfm_tasklog wt2

            where wt2.obj_id not in

                  (select distinct (wt1.obj_id) from wfm_tasklog wt1, acm_program ap

                    where wt1.obj_id = ap.id and ap.state = 100)

            group by wt2.obj_id having count(wt2.obj_id) > 5) temp

    where wt.obj_id = temp.obj_id and wt.end_time = temp.maxTime)

 

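All the required points are covered; note that the columns being inserted must correspond to the columns being selected.

1. A small functional problem: if a program has been deleted but its workflow records have not (this is how program management worked in the early media asset system), the not in condition lets through wfm_tasklog records whose program was deleted long ago.

2. Performance: avoid not in where possible.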
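The orphan problem with not in can be reproduced on a tiny data set. This is a sketch under assumptions: SQLite via Python's sqlite3 instead of Oracle, a reduced three-column wfm_tasklog, and invented sample rows. The inner join rewrite is one possible alternative, not necessarily what the reviewer had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 100), (2, 50);
    -- obj_id 3 refers to a program that has been deleted
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'), (20, 2, '2012-01-02'), (30, 3, '2012-01-03');
""")

# Answer 2.3 style: exclude the published programs with NOT IN.
# The orphan (obj_id 3) is not in the excluded set, so it slips through.
not_in_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT wt2.obj_id FROM wfm_tasklog wt2
     WHERE wt2.obj_id NOT IN
           (SELECT wt1.obj_id FROM wfm_tasklog wt1, acm_program ap
             WHERE wt1.obj_id = ap.id AND ap.state = 100)
     ORDER BY wt2.obj_id""")]

# Join-based alternative: a log row must match an existing, unpublished program.
join_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT wt.obj_id FROM wfm_tasklog wt
     INNER JOIN acm_program ap ON ap.id = wt.obj_id
     WHERE ap.state <> 100
     ORDER BY wt.obj_id""")]

print(not_in_ids, join_ids)
```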

 

2.4

insert into wfm_task

  select t.*  from wfm_tasklog t

   inner join acm_program a on a.id = t.obj_id

   where a.state <> 100

     and t.end_time = (select max(end_time) from wfm_tasklog l where l.obj_id = t.obj_id)

     and (select count(0) from wfm_tasklog l where l.obj_id = t.obj_id) > 5

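Another approach: it finds the obj_ids with more than 5 records without using group by.

1. Performance: two subqueries are used to get the latest end_time and to check that the obj_id has more than 5 records.

2. Performance: compared with 2.2, the count > 5 subquery here is correlated, so it may be evaluated once per row (N times), whereas in 2.2 it only needs to run once.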

 

2.5

insert into wfm_taskexec

  (select * from wfm_tasklog wtl

    inner join acm_program ap on (wtl.obj_id = ap.id)

    where ap.state <> 100

      and wtl.end_time in (select max(end_time) from wfm_tasklog wtl group by wtl.obj_id)

      and wtl.obj_id in (select id from wfm_tasklog wtl group by id having count(id) > 5))

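1. This statement fails as soon as it is run: select * returns the columns of both wfm_tasklog and acm_program combined, which cannot be inserted into wfm_taskexec. Was it tested at all?

2. Functional bug: if some end_time of an obj_id's wfm_tasklog records (not that obj_id's own latest end_time) happens to equal the latest end_time of another obj_id, that record is inserted as well.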
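The coincidental-match bug is easy to miss, so here is a small reproduction. It is only a sketch: SQLite via Python's sqlite3 stands in for Oracle, the table is cut down to three columns, and the rows are invented so that an old record of obj_id 1 shares its end_time with the latest record of obj_id 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO wfm_tasklog VALUES
        (10, 1, '2012-01-01'),   -- old record of obj_id 1
        (11, 1, '2012-01-03'),   -- latest record of obj_id 1
        (20, 2, '2012-01-01');   -- latest record of obj_id 2, same time as task 10
""")

# Answer 2.5 style: the subquery is not tied to the outer row's obj_id,
# so any end_time that is the maximum of *some* obj_id matches.
uncorrelated = [r[0] for r in conn.execute("""
    SELECT task_id FROM wfm_tasklog
     WHERE end_time IN (SELECT max(end_time) FROM wfm_tasklog GROUP BY obj_id)
     ORDER BY task_id""")]

# Correlated version: compare only against the maximum of the same obj_id.
correlated = [r[0] for r in conn.execute("""
    SELECT t.task_id FROM wfm_tasklog t
     WHERE t.end_time = (SELECT max(l.end_time) FROM wfm_tasklog l
                          WHERE l.obj_id = t.obj_id)
     ORDER BY t.task_id""")]

print(uncorrelated, correlated)
```

Task 10 appears only in the uncorrelated result, which is exactly the stray record described in point 2.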

 

2.6

insert into wfm_taskexec

  select *

    from (select t.task_id,

                 t.procexec_id,

                 t.procact_id,

                 t.act_id,

                 t.obj_type,

                 t.obj_id,

                 t.execute_user,

                 t.create_user,

                 t.task_state,

                 t.task_percent,

                 t.task_param,

                 t.task_desc,

                 t.back_times,

                 t.ext_sysid,

                 t.ext_state,

                 t.create_time,

                 t.start_time,

                 t.end_time,

                 t.field_1,

                 rank() over(partition by t.obj_id order by t.end_time desc) as field_2

            from wfm_tasklog t

           where t.obj_id in

                            (select t.obj_id tnum from acm_program p, wfm_tasklog t

                               where p.id = t.obj_id

                                 and p.state != 100

                               group by t.obj_id

                              having count(t.obj_id) > 5))

   where field_2 = 1;

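Uses rank() over (partition by ...) to pick the latest end_time. Another way to do it, and a nice one.

1. When several of the latest records are needed (for example, importing back the 3 latest end_time records for the same ID), rank() can do what max, min and similar functions cannot. For this particular task, though, max is the better choice, and it avoids having to list every selected column.

2. rank() is an analytic (window) function. It is not Oracle-only, since window functions were added to the SQL standard in SQL:2003, but databases without window-function support (MySQL before 8.0, for example) cannot run it, so this statement is not fully portable.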
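The "latest 3 per ID" case mentioned in point 1 might look like the sketch below. The assumptions: SQLite 3.25 or newer (the first version with window functions) through Python's sqlite3, a reduced three-column wfm_tasklog, and invented rows. In Oracle the inner query would be written the same way.

```python
import sqlite3

# Window functions need SQLite 3.25+; fail early with a clear message otherwise.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "SQLite too old for RANK()"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO wfm_tasklog VALUES
        (1, 1, '2012-01-01'), (2, 1, '2012-01-02'), (3, 1, '2012-01-03'),
        (4, 1, '2012-01-04'), (5, 1, '2012-01-05'),
        (6, 2, '2012-02-01'), (7, 2, '2012-02-02');
""")

# Rank each obj_id's records from newest to oldest, then keep the top 3.
latest_three = [r[0] for r in conn.execute("""
    SELECT task_id
      FROM (SELECT task_id, obj_id,
                   RANK() OVER (PARTITION BY obj_id ORDER BY end_time DESC) AS rnk
              FROM wfm_tasklog)
     WHERE rnk <= 3
     ORDER BY obj_id, rnk""")]

print(latest_three)
```

Note that rank() gives tied end_time values the same rank, so more than 3 rows per obj_id can come back when there are ties; row_number() would cap it at exactly 3.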

 

2.7

insert into wfm_taskexec

  (select * from wfm_tasklog wt

    where wt.obj_id in (

          (select ap.id from acm_program ap where ap.state <> 100) union

     (select obj_id from (select wt.obj_id, count(wt.obj_id) as c

                      from wfm_tasklog wt group by wt.obj_id) where c > 5) union

     (select obj_id from (select * from wfm_tasklog wt where rownum = 1

                                 order by wt.end_time desc))))

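The idea was to use union to get the intersection of the obj_ids that satisfy the three conditions; union, however, returns every obj_id that satisfies any one of them, not the intersection.

1. Does not know how to use the having count clause.

2. Is the subquery that fetches the latest end_time correct? It can only return a single obj_id, and because rownum = 1 is applied before order by, not necessarily the one with the latest end_time.

3. With the outermost in, every record of each matching obj_id is imported back into wfm_taskexec; the requirement to take only the record with the latest end_time per obj_id is not met.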
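The difference between combining the obj_id sets with union and with intersect can be checked on a few rows. A sketch under assumptions: SQLite via Python's sqlite3 as a stand-in for Oracle (both support INTERSECT), a reduced wfm_tasklog, and invented data in which only obj_id 1 is both unpublished and has more than 5 records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 50), (2, 50), (3, 100);
""")
rows = (
    [(1, f"2012-01-{d:02d}") for d in range(1, 7)]    # obj 1: 6 records, unpublished
    + [(2, "2012-01-01")]                             # obj 2: 1 record, unpublished
    + [(3, f"2012-02-{d:02d}") for d in range(1, 7)]  # obj 3: 6 records, published
)
conn.executemany("INSERT INTO wfm_tasklog (obj_id, end_time) VALUES (?, ?)", rows)

# The two conditions as sets of obj_id, joined by a placeholder operator.
template = """
    SELECT id FROM acm_program WHERE state <> 100
    {op}
    SELECT obj_id FROM wfm_tasklog GROUP BY obj_id HAVING count(*) > 5
    ORDER BY 1"""

with_union = [r[0] for r in conn.execute(template.format(op="UNION"))]
with_intersect = [r[0] for r in conn.execute(template.format(op="INTERSECT"))]

print(with_union, with_intersect)
```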

 

2.8

insert into wfm_taskexec

  (select * from wfm_tasklog

    where obj_id in

          (select obj_id from wfm_tasklog

            group by obj_id having count(obj_id) > 5 and end_time = max(end_time))

      and acm_program <> 100);

1. max is used incorrectly here.

2. With the outermost in, every record of each matching obj_id is imported back into wfm_taskexec.

 

2.9

insert into wfm_taskexec *

  select *

    from wfm_tasklog l

   where l.acm_program.state <> 100

     and l.end_time = (select max(l.end_time) from l group by id)

   group by l.obj_id

  having count(l.obj_id) > 5;

What on earth is l.acm_program.state?

 

2.10

insert into wfm_taskexec

  select * from wfm_tasklog tg

   inner join acm_program pm on tg.obj_id = pm.id

   where pm.state <> 100

     and NOT EXISTS

          (SELECT 1 FROM wfm_tasklog tg1 WHERE tg1.obj_id = tg.obj_id  

                     and tg1.end_time > tg.end_time)

     and tg.obj_id in

         (select obj_id from wfm_tasklog group by obj_id having count(0) > 5)

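Using not exists to find the latest end_time of an obj_id is a fresh idea.

1. The inserted and selected columns do not match.

2. Performance: the correlated not exists subquery may be expensive on a large wfm_tasklog.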
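The not exists form reads as "keep a row when no row of the same obj_id has a later end_time". The sketch below compares it with the max form on a few invented rows; it assumes SQLite via Python's sqlite3 in place of Oracle and a three-column wfm_tasklog, and says nothing about relative performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO wfm_tasklog VALUES
        (1, 1, '2012-01-01'), (2, 1, '2012-01-02'), (3, 1, '2012-01-03'),
        (4, 2, '2012-01-05'), (5, 2, '2012-01-04');
""")

# Latest record per obj_id via a correlated max() subquery.
by_max = [r[0] for r in conn.execute("""
    SELECT t.task_id FROM wfm_tasklog t
     WHERE t.end_time = (SELECT max(l.end_time) FROM wfm_tasklog l
                          WHERE l.obj_id = t.obj_id)
     ORDER BY t.task_id""")]

# Same result via NOT EXISTS: no sibling row with a later end_time.
by_not_exists = [r[0] for r in conn.execute("""
    SELECT tg.task_id FROM wfm_tasklog tg
     WHERE NOT EXISTS (SELECT 1 FROM wfm_tasklog tg1
                        WHERE tg1.obj_id = tg.obj_id
                          AND tg1.end_time > tg.end_time)
     ORDER BY tg.task_id""")]

print(by_max, by_not_exists)
```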

 

2.11

select task_id

  from (select task_id, obj_id

    from (select task_id, obj_id,

            row_number() over(partition by obj_id order by end_time desc) mm

         from wfm_tasklog ww)

         where mm = 1) t1,

       (select obj_id from (select w.obj_id, count(w.obj_id) counts

                  from wfm_tasklog w, acm_program a

                 where w.obj_id = a.id and a.state <> 100

                 group by w.obj_id) t

         where t.counts > 5) t2

 where t1.obj_id = t2.obj_id;

I find this SQL exhausting to read, honestly.

 

2.12

insert into wfm_taskexec tc

  select * from wfm_tasklog ta,

         (select wt.obj_id wid, max(wt.end_time) wtime

            from wfm_tasklog wt, acm_program ap

           where wt.obj_id = ap.id and ap.state != 100

           group by wt.obj_id having count(wt.obj_id) > 5) wa

   where ta.obj_id = wa.wid and wa.wtime = ta.end_time;

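Apart from the mismatch between the inserted and the selected columns, this is identical to what I wrote myself. Did you borrow it from somewhere?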
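The column mismatch comes from select *, which also returns the derived table's columns. Selecting ta.* instead is one way to fix it. The sketch below shows both versions; it assumes SQLite via Python's sqlite3 instead of Oracle, tables reduced to three columns, and invented rows, so it illustrates the idea rather than the manager's exact statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acm_program (id INTEGER PRIMARY KEY, state INTEGER);
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    CREATE TABLE wfm_taskexec (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    INSERT INTO acm_program VALUES (1, 50), (2, 100), (3, 50);
""")
rows = (
    [(1, f"2012-01-{d:02d}") for d in range(1, 7)]    # 6 records, unpublished
    + [(2, f"2012-02-{d:02d}") for d in range(1, 7)]  # 6 records, published
    + [(3, f"2012-03-{d:02d}") for d in range(1, 3)]  # only 2 records
)
conn.executemany("INSERT INTO wfm_tasklog (obj_id, end_time) VALUES (?, ?)", rows)

# One derived table yields both the latest end_time and the count > 5 check.
LATEST = """(SELECT wt.obj_id AS wid, max(wt.end_time) AS wtime
               FROM wfm_tasklog wt, acm_program ap
              WHERE wt.obj_id = ap.id AND ap.state != 100
              GROUP BY wt.obj_id HAVING count(wt.obj_id) > 5) wa"""

def build_insert(select_list):
    return (f"INSERT INTO wfm_taskexec SELECT {select_list} "
            f"FROM wfm_tasklog ta, {LATEST} "
            "WHERE ta.obj_id = wa.wid AND wa.wtime = ta.end_time")

# select * returns ta's columns plus wid and wtime, so the insert is rejected.
try:
    conn.execute(build_insert("*"))
    star_error = None
except sqlite3.Error as exc:
    star_error = str(exc)

# select ta.* returns exactly the columns of wfm_tasklog.
conn.execute(build_insert("ta.*"))
inserted = conn.execute("SELECT * FROM wfm_taskexec").fetchall()

print(star_error)
print(inserted)
```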

 

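Personally, I took away the following about writing SQL:

1. Understanding the requirement thoroughly is very important: know precisely what the SQL you are about to write is supposed to do. In my own case, once I understood the requirement in depth, I naturally wanted to write SQL that expressed that understanding precisely, and this made the statement take shape faster.

2. Break the requirement into individual points and tackle them one by one.
Each small point should not be hard on its own; a where condition or an aggregate function is usually enough.

3. Combine.
Use subqueries or logical operators such as AND and OR to join the clauses together.
4. Finally, with correctness assured, optimize for efficiency.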

 
In a command window you can run explain plan for followed by the SQL to be explained,
then run select * from table(dbms_xplan.display) to view the statement's execution plan.
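explain plan for and dbms_xplan.display are Oracle features. For trying the same habit without an Oracle instance, SQLite has a rough analogue, EXPLAIN QUERY PLAN. The sketch below uses it through Python's sqlite3; the reduced table and the index name idx_tasklog_obj are hypothetical, and the plan text differs between SQLite versions and looks nothing like Oracle's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wfm_tasklog (task_id INTEGER PRIMARY KEY, obj_id INTEGER, end_time TEXT);
    -- hypothetical index, added only so the plan has something to choose from
    CREATE INDEX idx_tasklog_obj ON wfm_tasklog (obj_id, end_time);
""")

query = """
    SELECT t.task_id FROM wfm_tasklog t
     WHERE t.end_time = (SELECT max(l.end_time) FROM wfm_tasklog l
                          WHERE l.obj_id = t.obj_id)"""

# Each plan row describes one step; the last column is the readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step[-1])
```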
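Other takeaways:
1. The result set of a subquery can be given a name and used as a temporary table in a join.
2. Understanding exists: the greatest value is the one for which no greater value exists.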


 
