SQL查询语句编写规范

  • 目的
    数据技术团队,每个成员经常要编写SQL来查询和统计数据,数据源有Hive和DB等,每个人编写习惯或多或少有一些差异,这些差异并不一定会影响结果,但是团队中的其他人阅读代码时会产生困扰,尤其是一些复杂的多表关联查询SQL,更是让很多代码阅读者困惑,因此有必要在团队内部制定SQL编写规范,在SQL代码书写上达成一致,提高可视化和良好的阅读体验。

  • 编写规范
    A、字段别名尽量不要使用拼音,要使用比较贴切的英文单词组合,做到文词达意;
    B、字段别名尽量使用下划线分割多个单词;

例如:login_cnt(登录次数)、login_users(登录人数)等;

C、字段别名和子查询别名时不要省略as关键字;

例如: 
count(distinct user_id) as login_users
count(user_id) as login_cnt

D、SQL语句编写使用全小写字符,除非有个别大小写敏感的字段名称;
E、SQL语句中如遇from、where、on、left join、inner join、right join、group by、order by等关键字,单独占一行;

例如:
select user_id,user_name,gender 
from user
where reg_time>='2017-05-01' and reg_time<'2017-06-01'

F、避免在单独的SQL查询语句中写表别名;

反例:
select u.user_id,u.user_name,u.gender 
from user as u
where u.reg_time>='2017-05-01' and u.reg_time<'2017-06-01'
在这里给user 起别名为u,没有任何意义,徒增代码量和阅读障碍

G、时间范围作为查询条件时,请使用>=和<作为时间范围,不要使用>=和<=这样的写法;

例如:
查询5月份登录用户数
select count(distinct user_id) as login_users
from user_login
where login_time>='2017-05-01' and login_time<'2017-06-01'
反例:
select count(distinct user_id) as login_users
from user_login
where login_time>='2017-05-01' and login_time<='2017-05-31 23:59:59'
这样写法会因为login_time时间精度的不同导致漏掉一些数据;且书写时比较复杂;

H、子查询别名尽量使用it (inner table)、t1 (temp table 1) 和rs1(record result 1)命名;

例如:
select min_date,count(mobile) as mobiles
from
(
    select mobile,min(vdate) as min_date
    from
    (
        select vdate,mobile
        from
        (
            select vdate,split(mobile,',') as mobiles
            from s_jd_picked
        ) as it1
        lateral view outer explode(mobiles) explode_table1 as mobile
    ) as t1
    where length(mobile)=11
    group by mobile
) as rs
where min_date>='2017-04-01' and min_date<'2017-05-01'
group by min_date
order by min_date

I、SQL查询语句要有层次结构,使用4个空格缩进,在Hive中制表符不能作为缩进;
J、单行字符长度不要超过80个字符,如果超过请换行,提高代码阅读性;

  • 下面提供一个较完整的例子:
例如:
select date(schedule_date) as schedule_date,
sum(cast(onBoardCount as float)) as first_billing_money, 
sum(cast(secondValue as float)) as second_billing_money
from
(
    select t1.project_id,t1.schedule_id,
    case when t2.schedule_date is null then created_time else schedule_date end as schedule_date,
    t1.mobile,interview_status
    from
    (
        select mobile as mobile,projectId as project_id,
        cast(projectScheduleId as int) as schedule_id,created_time,
        cast(interviewerStatus as int) as interview_status
        from mf_project_track
        where created_time>='2017-01-01' 
    ) as t1
    left join
    (
        select projectId as project_id,id as schedule_id,
        to_timestamp((executionDate) as schedule_date
        from mf_project_schedule
    ) as t2
    on t1.project_id=t2.project_id and t1.schedule_id=t2.schedule_id
) as rs1
inner join 
mf_project as rs2
on cast(rs1.project_id as int)=rs2.id
where schedule_date>='2017-04-01'
group by date(schedule_date)

你可能感兴趣的:(SQL查询语句编写规范)