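1 .If a select returns empty rows, e.g. (A union all B) left join C, A and B may not actually need to be unioned; dropping one of them can fix it.
2 .If a select returns duplicate rows, e.g. A left join (select * from B left join max()… from B) t, table B may be sharded, so the max() sub-query returns several rows per key. Replace the function inside the parentheses with a window function, row_number() over(partition by Fbond_id ORDER BY Findex desc, Fmodify_time desc) AS row_num, and keep only the rows where row_num = 1.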
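A minimal sketch of the row_number() de-duplication from item 2. It runs on SQLite through Python's sqlite3 module (SQLite 3.25+ is needed for window functions) rather than on Hive, and the table bond, the column Famount and the sample rows are made up for illustration; only the window clause comes from the note.

```python
import sqlite3

# Hypothetical sample data: bond 1 has three versions, bond 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bond (Fbond_id INTEGER, Findex INTEGER, Fmodify_time TEXT, Famount INTEGER);
INSERT INTO bond VALUES
  (1, 1, '2019-01-01', 100),
  (1, 2, '2019-01-02', 150),
  (1, 2, '2019-01-03', 170),
  (2, 1, '2019-01-01', 200);
""")

# Number the rows of each bond from newest to oldest, then keep row 1 only.
latest_rows = conn.execute("""
SELECT Fbond_id, Findex, Fmodify_time, Famount
FROM (
    SELECT b.*,
           row_number() OVER (PARTITION BY Fbond_id
                              ORDER BY Findex DESC, Fmodify_time DESC) AS row_num
    FROM bond b
) t
WHERE row_num = 1
ORDER BY Fbond_id
""").fetchall()

print(latest_rows)  # one row per Fbond_id
```

Joining against this sub-query instead of the raw table gives exactly one row per Fbond_id, so the outer left join can no longer multiply rows.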
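3 .For a long SQL statement that fails without a detailed error, e.g. A left join B left join C, split it up and run sub-queries A, B and C separately to locate the error.
4 .For a long SQL statement that returns no rows, e.g. A left join B left join C, split it the same way and run A, B and C separately to find the sub-query that comes back empty.
5 .A full (snapshot) table should not be partitioned. If it was partitioned by mistake, always filter on the partition when querying. For example, if the partition is written monthly and you omit the f_p_date filter, you get a large number of duplicates, because the 2019.02.28 partition already contains everything from 2019.01.31, 2018.12.31 and … all earlier loads.
6 .For A union all B, the columns selected from A and B must agree in number, data type and order. In effect, union keeps the columns unchanged and stacks the rows.
7 .In effect, join grows both the columns and the rows according to the join condition (at bottom it is a Cartesian product filtered by that condition).
8 .When you union all into a table p and then run select xxx,zzz,yyy from p, using sum() or any other aggregate function means every other selected column must also produce a single row per group (be aggregated or appear in group by). The output has one row per group, so a column that would yield many rows cannot be matched against it ("1 : many"), and the query should fail.
9 .When a slow SQL statement has many nested left joins, add a fixed value such as AND Uid = 12345 to the ON condition while spot-testing. Sampling one user this way greatly shortens test runs. (Splitting the query also helps.)
10 .When a join yields null fields, e.g. select a.zzz,b.xxx,c.yyy from A left join B left join C left join D, first find which table the null field comes from. Suppose it is a field from B: if B is a plain table, check the join key and the source data; if B is a sub-query with its own logic, check that logic first.
11 .DESC sorts in descending order / ASC in ascending order
12 .When the difference between two fields is not clear from the schema, e.g. ftotal_interest / fpreety_total_interest, run select * from … on the table and compare. If the two values differ only by unit, the former stores the amount in the larger unit and the latter in the smaller unit, but both hold the same value.
13 .If hard thinking and splitting the logic in every way turn up nothing, suspect the source data and sample the source tables.
14 . count() and sum() cannot be nested directly, as in sum(count(…)). Compute the count() in a sub-query first, then wrap it in an outer sum().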
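A small sketch of item 14, run on SQLite through Python's sqlite3 module as a stand-in for the real engine. The orders table and its rows are hypothetical. The direct nesting is rejected, while count() in a sub-query followed by an outer sum() works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (Fuid INTEGER, Forder_id INTEGER);
INSERT INTO orders VALUES (1, 10), (1, 11), (2, 20), (3, 30), (3, 31), (3, 32);
""")

# Nesting one aggregate directly inside another is rejected by the engine.
try:
    conn.execute("SELECT sum(count(Forder_id)) FROM orders GROUP BY Fuid")
    nested_failed = False
except sqlite3.Error:
    nested_failed = True

# Count per user in a sub-query first, then sum the counts outside.
total_orders = conn.execute("""
SELECT sum(cnt)
FROM (
    SELECT Fuid, count(Forder_id) AS cnt
    FROM orders
    GROUP BY Fuid
) t
""").fetchone()[0]

print(nested_failed, total_orders)
```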
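15 .For testing, write scripts that reach the same output through different logic and check the results against the base tables.
16 .case when example: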
select a.zzz as zzz,
a.yyy as yyy,
case
when substr(a.ffq_order_id, 1, 1) = 'O' then a.ffq_order_id
when substr(a.ffq_order_id, 1, 1) = 'X' then substr(substr(a.ffq_order_id, 7), 1,
length(substr(a.ffq_order_id, 7)) - 4)
when substr(a.ffq_order_id, 6, 1) = 'O' then substr(a.ffq_order_id, 6)
WHEN substr(a.ffq_order_id, 1, 1) = 'A' then b.forder_id
else a.ffq_order_id end as forder_id
from XXX_table a -- alias b (source of b.forder_id) comes from a join omitted in this note
case when example 2:
select
sum(case when t.num >= 1 and t.num < 5 then 1 else 0 end) as num_1_5,
sum(case when t.num >= 5 and t.num < 10 then 1 else 0 end) as num_5_10,
sum(case when t.num >= 10 and t.num < 15 then 1 else 0 end) as num_10_15,
sum(case when t.num >= 15 and t.num < 20 then 1 else 0 end) as num_15_20,
sum(case when t.num >= 20 then 1 else 0 end) as num_20_plus
from
(
xxx
)t
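17 .A where condition can be placed after A left join B left join C left join D, e.g. where b.status != 80. This seems to save resources. Note that a where filter on a column of B also drops the rows of A that found no match in B.
18 .A left join simply matches two tables on a condition and merges them into one combined table; where then filters that table down to the rows you need.
19 .group by example (whatever you group by, the aggregates are computed over the other columns)
SELECT o.Flicai_account_date,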
SUM(o.Flicai_repay_capital+o.Finvest_interest+o.Foverdue_interest)/100
FROM finance_db.t_finance_repay_order o
LEFT JOIN finance_db.t_finance_loan l ON o.Forder_id = l.Ffq_order_id
WHERE l.Faccount_date >= '2018-04-24'
AND o.Fstatus = 11
AND o.Fguarantee_flag = 1
AND o.Flicai_account_date >= '2018-10-01'
AND o.Flicai_account_date <= '2018-12-31'
group by o.Flicai_account_date
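20 .group by is normally used together with aggregate functions such as count() and sum(). Two points: ① every field in the select list must either be inside an aggregate function or be listed in group by; ② to filter, use where before group by (filters rows) or having after group by (filters groups).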
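A minimal sketch of the two filtering points in item 20, on SQLite via Python's sqlite3 module. The repay table, its columns and the threshold of 100 are hypothetical. The where clause removes rows before grouping, and having removes whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repay (Fdate TEXT, Fstatus INTEGER, Famount INTEGER);
INSERT INTO repay VALUES
  ('2018-10-01', 11, 100),
  ('2018-10-01', 11, 200),
  ('2018-10-01',  5, 999),
  ('2018-10-02', 11,  50),
  ('2018-10-03', 11, 400);
""")

# WHERE drops the status-5 row first; HAVING then drops the day whose total is below 100.
daily_totals = conn.execute("""
SELECT Fdate, sum(Famount) AS total
FROM repay
WHERE Fstatus = 11
GROUP BY Fdate
HAVING sum(Famount) >= 100
ORDER BY Fdate
""").fetchall()

print(daily_totals)
```

Fdate is the only non-aggregated field in the select list, and it is the group by key, which is point ① of the item.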
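21 .sum() cannot directly sum a column that is defined with as "alias" in the same select. Two options: ① wrap the query in an outer select and sum there, e.g. select sum(a+b+c+d); ② skip the wrapper and move the expressions behind the aliases into the sum itself, e.g. sum((ab)+(cd)+…).
22 .Fuid in (123,234,345) filters out ids that have no data, so they produce no row. To output a row for every id: ① put the ids from the list into a table and use it as the main table, left joining the other table to it, so that every Fuid is kept; ② use a function to turn the list in the parentheses into a table.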
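A sketch of method ① from item 22, on SQLite via Python's sqlite3 module. The tables balance and wanted and their contents are made up. With in (...), the id that has no data disappears; with the id list as the main table of a left join, it stays and can be filled with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balance (Fuid INTEGER, Famount INTEGER);
INSERT INTO balance VALUES (123, 500), (345, 700);   -- no row for 234
CREATE TABLE wanted (Fuid INTEGER);                   -- the id list as a table
INSERT INTO wanted VALUES (123), (234), (345);
""")

# Filtering with IN: 234 has no data, so it yields no row at all.
in_rows = conn.execute("""
SELECT Fuid, Famount FROM balance
WHERE Fuid IN (123, 234, 345)
ORDER BY Fuid
""").fetchall()

# Id table as the main table: every wanted Fuid appears, missing amounts become 0.
all_rows = conn.execute("""
SELECT w.Fuid, COALESCE(b.Famount, 0)
FROM wanted w
LEFT JOIN balance b ON w.Fuid = b.Fuid
ORDER BY w.Fuid
""").fetchall()

print(in_rows)
print(all_rows)
```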
23 .To bulk-insert rows in a single SQL statement:
insert into table xxx
select 412 as id union all
select 1632 as id union all
select 2310 as id union all
select 6997 as id union all
select 54409 as id
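24 .select * from t1 (main table) left join t2 (secondary table) on t1.xxx = t2.xxx and condition2 and condition3 where xxx. ← Written this way, every row of the main table still appears, because conditions in ON only decide which t2 rows match; unmatched t1 rows are kept with nulls. Putting the same t2 conditions in where does not work: where filters the joined result, so t1 rows without a qualifying match are dropped and t1 is no longer complete.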
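A minimal sketch of the difference described in item 24, on SQLite via Python's sqlite3 module. Tables t1 and t2, the column Fstatus and the value 11 are hypothetical. The same condition is placed once in ON and once in where.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (Fuid INTEGER);
INSERT INTO t1 VALUES (1), (2), (3);
CREATE TABLE t2 (Fuid INTEGER, Fstatus INTEGER);
INSERT INTO t2 VALUES (1, 11), (2, 80);
""")

# Condition in ON: all three t1 rows survive, unmatched ones carry NULL.
on_rows = conn.execute("""
SELECT t1.Fuid, t2.Fstatus
FROM t1 LEFT JOIN t2 ON t1.Fuid = t2.Fuid AND t2.Fstatus = 11
ORDER BY t1.Fuid
""").fetchall()

# Same condition in WHERE: rows where t2.Fstatus is NULL or 80 are filtered out.
where_rows = conn.execute("""
SELECT t1.Fuid, t2.Fstatus
FROM t1 LEFT JOIN t2 ON t1.Fuid = t2.Fuid
WHERE t2.Fstatus = 11
ORDER BY t1.Fuid
""").fetchall()

print(on_rows)
print(where_rows)
```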
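25 .COALESCE( xxx,0 ) returns xxx if it is not null, and 0 if it is null
26 .Hive does not allow ambiguous column names when creating a table. Every value produced by a function must be given an explicit name, as with the as aliases below: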
create table if not exists XXX stored as orc as
select
sum(a.Famount) as Famount_sum
,sum(a.Fcapital) as Fcapital_sum
,substr(a.newFbond_id,3,8) as xxx
27 .Adding several columns. SQL: ALTER TABLE tablename ADD COLUMN column_name data_type, ADD COLUMN column_name data_type;
HQL: alter table tablename add columns (column_name double, column_name double)
28 . select t1.* from (select explode(array(1,2,3))) t1 turns an array into table rows
29 .Usage of if
select count(t.Fuid) as numbers,
if(t.Famount < 1000000, 'below 1000000', if(t.Famount >= 1000000 and t.Famount < 5000000, '1000000-5000000',
if(t.Famount >= 5000000 and t.Famount < 10000000, '5000000-10000000', '10000000 and above'))) as g
from (
select s.Fuid as Fuid, sum(s.fbalance + s.fbuy_capital - s.fsell_capital - s.frepay_capital) as Famount
from xx_table s
where s.ftimeframe = '2019-04-02'
and s.Fplan_status = 110
group by s.Fuid) t
group by g
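30.Fuzzy matching with like wildcards: % matches any sequence of characters, so 4403% matches values that start with 4403 and %4403 matches values that end with 4403; _ matches any single character
select count(1) from jz_pure_snap.finance_db_t_finance_user where Fcredit_id like '4403%'
31 . inner join vs. exists / not exists
inner join script: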
SELECT COUNT(DISTINCT t.Fuid),NOW() FROM (
SELECT d0.Fuid,MIN(d0.Faccount_date) Fmin_date,MAX(d0.Faccount_date) Fmax_date
FROM jz_snap.finance_db_t_finance_detail d0 WHERE d0.Fcreate_time < '2018-01-01' AND d0.Fdetail_type IN (80,100,161,162)
GROUP BY d0.Fuid
HAVING COUNT(d0.Fdetail_id) >= 2
)t
inner join jz_snap.finance_db_t_finance_detail d ON t.Fuid = d.Fuid
and d.Faccount_date >= '2017-01-01' AND d.Faccount_date < '2018-01-01' AND d.Fdetail_type IN (80,100,161,162)
exists script:
SELECT COUNT(DISTINCT t.Fuid),NOW() FROM (
SELECT d0.Fuid,MIN(d0.Faccount_date) Fmin_date,MAX(d0.Faccount_date) Fmax_date
FROM jz_snap.finance_db_t_finance_detail d0 WHERE d0.Fcreate_time < '2019-01-01' AND d0.Fdetail_type IN (80,100,161,162)
GROUP BY d0.Fuid
HAVING COUNT(d0.Fdetail_id) >= 2
)t
where
1=1
and EXISTS (
SELECT 1 FROM jz_snap.finance_db_t_finance_detail d where t.Fuid = d.Fuid
and d.Faccount_date >= '2018-01-01' AND d.Faccount_date < '2019-01-01' AND d.Fdetail_type IN (80,100,161,162)
)
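The two scripts above follow equivalent patterns (the date ranges aside): COUNT(DISTINCT t.Fuid) removes the duplicates that the join produces, so both count the users with at least one matching detail row.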
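A small check of the inner join / exists equivalence, on SQLite via Python's sqlite3 module. The tables users and detail and their rows are made up and much simpler than the scripts above. The plain join row count is included to show why the DISTINCT matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (Fuid INTEGER);
INSERT INTO users VALUES (1), (2), (3);
CREATE TABLE detail (Fuid INTEGER, Faccount_date TEXT);
INSERT INTO detail VALUES
  (1, '2018-03-01'), (1, '2018-05-01'),   -- user 1 matches twice
  (2, '2017-12-31'),                      -- user 2 is outside the range
  (3, '2018-07-01');
""")

date_filter = "d.Faccount_date >= '2018-01-01' AND d.Faccount_date < '2019-01-01'"

# Inner join: user 1 appears twice in the joined rows.
join_rows = conn.execute(
    f"SELECT COUNT(*) FROM users t INNER JOIN detail d ON t.Fuid = d.Fuid AND {date_filter}"
).fetchone()[0]
join_count = conn.execute(
    f"SELECT COUNT(DISTINCT t.Fuid) FROM users t INNER JOIN detail d ON t.Fuid = d.Fuid AND {date_filter}"
).fetchone()[0]

# EXISTS: each user is kept at most once, whatever the number of matches.
exists_count = conn.execute(
    f"SELECT COUNT(DISTINCT t.Fuid) FROM users t "
    f"WHERE EXISTS (SELECT 1 FROM detail d WHERE t.Fuid = d.Fuid AND {date_filter})"
).fetchone()[0]

print(join_rows, join_count, exists_count)
```

Without DISTINCT the join version would over-count users that match several detail rows, so the equivalence holds only for the COUNT(DISTINCT …) form.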
32 .date_add('2019-04-19',-7) returns 2019-04-12
33 .MySQL, add a column after a given column: alter table table_name add `column_name` int(20) default 0 after id;
-- adds the column after the id column
Example:
ALTER TABLE `finance_db`.`t_informatio_disclosure_summary`
ADD COLUMN `Flender_month` BIGINT NOT NULL DEFAULT '0' COMMENT 'borrowers in the current month' after fqlzb_zhzb_in_cnt;
34 .Update
Syntax: UPDATE table_name SET field1=new-value1, field2=new-value2 [WHERE Clause]
35 .Count users per province from the first two digits of the ID card number fcredit_id
select
sum(case when SUBSTRING(fcredit_id,1,2)='11' then 1 else 0 end) as beijing,
sum(case when SUBSTRING(fcredit_id,1,2)='12' then 1 else 0 end) as tianjin,
sum(case when SUBSTRING(fcredit_id,1,2)='13' then 1 else 0 end) as hebei,
sum(case when SUBSTRING(fcredit_id,1,2)='14' then 1 else 0 end) as shanxi,
sum(case when SUBSTRING(fcredit_id,1,2)='15' then 1 else 0 end) as neimenggu,
sum(case when SUBSTRING(fcredit_id,1,2)='21' then 1 else 0 end) as liaoning,
sum(case when SUBSTRING(fcredit_id,1,2)='22' then 1 else 0 end) as jilin,
sum(case when SUBSTRING(fcredit_id,1,2)='23' then 1 else 0 end) as heilongjiang,
sum(case when SUBSTRING(fcredit_id,1,2)='31' then 1 else 0 end) as shanghai,
sum(case when SUBSTRING(fcredit_id,1,2)='32' then 1 else 0 end) as jiangsu,
sum(case when SUBSTRING(fcredit_id,1,2)='33' then 1 else 0 end) as zhejiang,
sum(case when SUBSTRING(fcredit_id,1,2)='34' then 1 else 0 end) as anhui,
sum(case when SUBSTRING(fcredit_id,1,2)='35' then 1 else 0 end) as fujian,
sum(case when SUBSTRING(fcredit_id,1,2)='36' then 1 else 0 end) as jiangxi,
sum(case when SUBSTRING(fcredit_id,1,2)='37' then 1 else 0 end) as shandong,
sum(case when SUBSTRING(fcredit_id,1,2)='41' then 1 else 0 end) as henan,
sum(case when SUBSTRING(fcredit_id,1,2)='42' then 1 else 0 end) as hubei,
sum(case when SUBSTRING(fcredit_id,1,2)='43' then 1 else 0 end) as hunan,
sum(case when SUBSTRING(fcredit_id,1,2)='44' then 1 else 0 end) as guangdong,
sum(case when SUBSTRING(fcredit_id,1,2)='45' then 1 else 0 end) as guangxi,
sum(case when SUBSTRING(fcredit_id,1,2)='46' then 1 else 0 end) as hainan,
sum(case when SUBSTRING(fcredit_id,1,2)='50' then 1 else 0 end) as chongqing,
sum(case when SUBSTRING(fcredit_id,1,2)='51' then 1 else 0 end) as sichuan,
sum(case when SUBSTRING(fcredit_id,1,2)='52' then 1 else 0 end) as guizhou,
sum(case when SUBSTRING(fcredit_id,1,2)='53' then 1 else 0 end) as yunnan,
sum(case when SUBSTRING(fcredit_id,1,2)='54' then 1 else 0 end) as xizang,
sum(case when SUBSTRING(fcredit_id,1,2)='61' then 1 else 0 end) as shaanxi,
sum(case when SUBSTRING(fcredit_id,1,2)='62' then 1 else 0 end) as gansu,
sum(case when SUBSTRING(fcredit_id,1,2)='63' then 1 else 0 end) as qinghai,
sum(case when SUBSTRING(fcredit_id,1,2)='64' then 1 else 0 end) as ningxia,
sum(case when SUBSTRING(fcredit_id,1,2)='65' then 1 else 0 end) as xinjiang,
sum(case when SUBSTRING(fcredit_id,1,2)='71' then 1 else 0 end) as taiwan,
sum(case when SUBSTRING(fcredit_id,1,2)='81' then 1 else 0 end) as xianggang,
sum(case when SUBSTRING(fcredit_id,1,2)='82' then 1 else 0 end) as aomen
from jz_tmp.borrower_fcredit_20190426
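36 .Much like a loop in Java, sum() iterates over the rows by itself; you only need to put a case expression inside it: sum(case when xxx then xxx1 when xxx then xxx2 … end)
Example:
--#8 (aggregating statuses 110 and 400)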
select
sum(datediff(d.fmodify_time,p.fcreate_time) * d.famount) /
sum(case when p.fplan_status = 110 then d.famount when p.fplan_status = 400 then p.famount end)
from
xxx_table p
left join
xxxx_table d
on p.fplan_id = d.fplan_id
where p.fplan_status in (110,400) -- two statuses
and p.fproduct_type = 1
and p.ftimeframe = '2019-05-13' -- observation date
37 . datediff() usage
-- returns -31 (first argument minus second argument, in days)
select datediff('2016-07-30 08:28:59.0','2016-08-30 08:10:06.0')
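To make the sign convention concrete, here is a plain-Python sketch that mimics Hive's datediff(enddate, startdate) with the standard datetime module. The helper name datediff is only an illustration, and it assumes the inputs start with a yyyy-MM-dd date, whose time part is ignored.

```python
from datetime import date


def datediff(end_date: str, start_date: str) -> int:
    """Days from start_date to end_date, i.e. first argument minus second."""
    end = date.fromisoformat(end_date[:10])      # keep only yyyy-MM-dd
    start = date.fromisoformat(start_date[:10])
    return (end - start).days


earlier_first = datediff('2016-07-30 08:28:59.0', '2016-08-30 08:10:06.0')
later_first = datediff('2016-08-30 08:10:06.0', '2016-07-30 08:28:59.0')
print(earlier_first, later_first)  # negative when the first date is the earlier one
```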
38 . Show the create-table statement
show create table xxx
39 .Show the table's columns
desc table xxx
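40 . substr(F_p_date,1,10) starts at position 1 and takes 10 characters. Given the value 2019-05-22 13:13:13
the result is 2019-05-22
41 .To make a web page's text copyable, enter the code javascript:void($={});
42 .show create table xxx_table; shows the Hive create-table statement
desc xxx_table; shows the table's column structure
43 .Regular expressions for Notepad++ replace: ① $ matches the end of each line ② ^ matches the start of each line
44 .Create a bucketed table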
create table students_tmp (id int, name string) clustered by (id) into 2 buckets stored as orc
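45 .Understanding buckets: a bucket is the same concept as a partition in MapReduce. Physically, each bucket is a file in the table's directory, and the number of buckets (output files) a job produces equals its number of reduce tasks.
A partitioned table is a different concept. A partition is a storage location for data, i.e. a directory. Each directory can hold different data files, and the directory is used to find the files stored in it, but the directory itself is unrelated to the content of the data.
Bucketing, by contrast, hashes on a value in the data itself, splitting one large file into many small ones. (Quoted from the web.)
46 .SELECT (pmod(datediff('2019-07-14', '2014-01-06'), 7) + 1); returns 1~7 for Monday through Sunday (2014-01-06 is a Monday), which tells you the weekday of the date p0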
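The weekday formula from item 46 can be checked in plain Python. This is a sketch with a hypothetical helper weekday_1_to_7. Python's % with a positive modulus never returns a negative number, so it plays the role of pmod here, and the result is compared with date.isoweekday() (1 = Monday … 7 = Sunday).

```python
from datetime import date


def weekday_1_to_7(day: str, known_monday: str = "2014-01-06") -> int:
    """Weekday as 1 (Monday) .. 7 (Sunday), counted from a known Monday."""
    days = (date.fromisoformat(day) - date.fromisoformat(known_monday)).days
    return days % 7 + 1   # % is non-negative here, like Hive's pmod


sunday = weekday_1_to_7("2019-07-14")
print(sunday, date(2019, 7, 14).isoweekday())
```

The non-negative modulo is what makes the formula work for dates before the reference Monday as well.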
47 .On an inexplicable error, delete or comment out the suspected part first to rule it out, so that the real error shows itself
48 .SUM( xxx ) over (partition by yyy order by zzz) as vvv
Computes sum() partitioned by yyy and ordered by zzz (a running total within each yyy)
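A minimal sketch of the windowed sum from item 48, on SQLite via Python's sqlite3 module (SQLite 3.25+ for window functions). The table flow and its rows are hypothetical; the column names xxx, yyy, zzz and vvv follow the placeholders in the note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flow (yyy TEXT, zzz INTEGER, xxx INTEGER);
INSERT INTO flow VALUES
  ('a', 1, 10), ('a', 2, 20), ('a', 3, 30),
  ('b', 1, 5),  ('b', 2, 7);
""")

# With ORDER BY in the window, sum() accumulates row by row inside each yyy group.
running = conn.execute("""
SELECT yyy, zzz, xxx,
       SUM(xxx) OVER (PARTITION BY yyy ORDER BY zzz) AS vvv
FROM flow
ORDER BY yyy, zzz
""").fetchall()

for row in running:
    print(row)
```

Unlike group by, the window keeps every input row and only adds the vvv column next to it.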
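49 .select count(1) from xxx_table t1 join zzz_table t2 on t1.findex=t2.findex where t1.floan_id<>t2.floan_id
Meaning: join the two tables and count the rows whose two floan_id values differ
50 .datediff(a.Fvip_end_time,'2019-08-25') > 0: the result is the first date minus the second, so > 0 means Fvip_end_time is later than 2019-08-25
51 .With several left joins, when selecting Fuid, Date, sum(Fmount + Fasset), the query must end with group by Fuid, Date or it will fail
52 .In A left join B, the left table is the main table, the basis of the rows to show, and the right table is matched against it. Example: for a "forecast for the next seven days" requirement, make the 月月升 monthly transfer-out days the secondary table B and the full calendar table the main table A, so that the secondary table is matched to the main one. This avoids the awkward case where B has no transfer-out day on the 30th or the 31st.
53 .Choosing dimensions, e.g. for a table of historical forecasts: build the table with these columns:
(Fdate: statistics date) (Ffocus_date: forecast target date) (Fmonth_day: 周周升 transfer-out day) (amount)
Laid out as in the figure, picking data for reports becomes convenient and unambiguous
54 .In A left join B, if rows fail to match, comment out the where conditions on the secondary table (B). If they still do not match, the secondary table's data is most likely at fault.
55 .join usage: A join B returns their "intersection". When a selected field exists in both tables, you can take Fuid from either A or B.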
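56 .
Some notes on wide tables:
1. The business comes first: understand how funds (information) flow from table A to table B, how each step is recorded, in which field and with what status, whether steps run concurrently or in sequence, and which table holds the final totals.
2. Wide tables come in two kinds: detail tables (process tables that record every individual entry)
and final tables (end-state tables that record only the state at the final moment)
Aggregating a detail table's process records yields the final table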
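57.Error from SQL's underlying constructor, =SQL=
struct
Cause: the underlying constructor does not support column names that contain special characters such as parentheses ()
Solution: give sum(balance) a proper column name, e.g. sum(balance) as balance_sum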