Log: what I did each day
Query data validation
1. Excel data validation: compare every value one by one.
SUM(B2:B3)/E3
To change the format 2019-08-09 to 2019/8/9: =YEAR(cell) =MONTH(cell) =DAY(cell) =DATE(year, month, day)
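The same date reformatting can be checked outside Excel. Below is a minimal Python sketch, not part of the original workflow; the function name to_slash_format is made up for illustration. It splits the date into year, month, and day (the role YEAR(), MONTH(), and DAY() play in Excel) and joins them without zero padding.

```python
from datetime import date


def to_slash_format(iso_text: str) -> str:
    """Turn 'YYYY-MM-DD' into 'YYYY/M/D' without zero padding."""
    d = date.fromisoformat(iso_text)  # parse the ISO date string
    # d.year, d.month, d.day correspond to Excel's YEAR(), MONTH(), DAY()
    return f"{d.year}/{d.month}/{d.day}"


print(to_slash_format("2019-08-09"))  # 2019/8/9
```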
Hive SQL:
2. About time:
2.1 a.yearmonth between from_unixtime(unix_timestamp(rent_st1,'yyyy-MM-dd'),'yyyyMM') and from_unixtime(unix_timestamp(rent_end1,'yyyy-MM-dd'),'yyyyMM')
2.2 from_unixtime(unix_timestamp(dim.lease_st,'yyyy-MM-dd'),'yyyy/MM/dd') as lease_start_date,
2.3 case when df.month='01' then concat(cast(cast(df.year as bigint)-1 as string),'12') else cast(cast(df.yearmonth as bigint) -1 as string) end=concat(dim.year,dim.month)
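The time expressions in 2.1 and 2.3 are easy to get wrong around January, so it helps to check the logic on a few values. The following is a rough Python sketch of the same logic, not Hive code. The function names and the sample dates are assumptions for illustration only.

```python
from datetime import datetime


def to_yearmonth(day_text: str) -> str:
    """'yyyy-MM-dd' -> 'yyyyMM', mirroring the from_unixtime(unix_timestamp(...)) call in 2.1."""
    return datetime.strptime(day_text, "%Y-%m-%d").strftime("%Y%m")


def previous_yearmonth(yearmonth: str) -> str:
    """Previous month as 'yyyyMM', mirroring the CASE expression in 2.3."""
    year, month = int(yearmonth[:4]), int(yearmonth[4:])
    if month == 1:
        # January rolls back to December of the previous year
        return f"{year - 1}12"
    return str(int(yearmonth) - 1)


def in_rent_period(yearmonth: str, rent_start: str, rent_end: str) -> bool:
    """BETWEEN check on 'yyyyMM' strings; string order equals date order here."""
    return to_yearmonth(rent_start) <= yearmonth <= to_yearmonth(rent_end)


print(previous_yearmonth("201901"))  # 201812
print(previous_yearmonth("201908"))  # 201907
print(in_rent_period("201909", "2019-08-09", "2019-10-01"))  # True
```

Subtracting 1 from the yearmonth number only works when the month is not 01, which is why the CASE expression needs the separate January branch.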
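3. About time comparisons:
Error: Hive SQL fails with FAILED: SemanticException Line 0:-1 Both left and right aliases encountered in JOIN, or execution reports: Both left and right aliases encountered in JOIN 's1'
Cause: when two tables are joined, non-equality comparisons between columns of the two tables are not supported in the ON clause.
The non-equality conditions can be moved into the WHERE clause.
For example: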
LEFT JOIN dw_fact_leasing_rental df ON
df.doco=dim.doco and df.lsvr=dim.lsvr
and bb.lease_st<=dim.lease_st
and bb.lease_et>=dim.lease_et
can be rewritten as
INNER JOIN dw_fact_leasing_rental df ON
df.doco=dim.doco and df.lsvr=dim.lsvr
where bb.lease_st<=dim.lease_st
and bb.lease_et>=dim.lease_et
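Of course, watch out for row-count differences caused by NULLs at this point: a row whose compared columns are NULL (for example, an unmatched row kept by a LEFT JOIN) fails the WHERE conditions and is dropped.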
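The row-count difference can be seen on a tiny example. The sketch below uses Python's built-in sqlite3 module as a stand-in, because SQLite accepts an inequality in the ON clause, so both forms can be run side by side. It is not Hive, and the two small tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim (doco TEXT, lease_st TEXT);
CREATE TABLE df  (doco TEXT, lease_st TEXT);
INSERT INTO dim VALUES ('A', '2019-01-01'), ('B', '2019-01-01');
INSERT INTO df  VALUES ('A', '2018-12-01');  -- 'B' has no matching row in df
""")

# Inequality kept in ON: the unmatched dim row 'B' survives with NULLs from df
on_rows = conn.execute("""
    SELECT dim.doco, df.lease_st FROM dim
    LEFT JOIN df ON df.doco = dim.doco AND df.lease_st <= dim.lease_st
""").fetchall()

# Inequality moved to WHERE: NULL <= '2019-01-01' is not true, so 'B' is dropped
where_rows = conn.execute("""
    SELECT dim.doco, df.lease_st FROM dim
    LEFT JOIN df ON df.doco = dim.doco
    WHERE df.lease_st <= dim.lease_st
""").fetchall()

print(len(on_rows), len(where_rows))  # 2 1
```

Comparing the two counts like this before and after a rewrite is a quick way to see whether moving the condition changed the result.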
--- Add: base_rent_type
A pattern worth reusing:
SELECT unit, brand, lease_st, lease_et
, row_number() OVER(PARTITION BY unit ORDER BY lease_st asc) AS lease_order
FROM( SELECT unit, brand--, count(1) count
, min(lease_st) lease_st, max(lease_et) lease_et
FROM kerryon.dw_dim_unit_contract_master_monthly
WHERE bu_type in ('商场','非固定出租') and brand <>''
GROUP BY unit,brand
ORDER BY unit,lease_st) aa
Data:
unit brand lease_st lease_et lease_order
E_01 A 2017-09-09 2018-02-02 1
E_01 A 2017-09-09 2018-02-02 1
E_01 B 2018-09-09 2019-06-06 2
E_01 B 2018-09-09 2019-06-06 2
E_01 C 2019-09-09 2022-12-12 3
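To make the query's two steps concrete, here is a small Python sketch of the same idea: first collapse the rows to one row per (unit, brand) with the earliest start and latest end, then number the brands within each unit by start date. It takes the first four columns of the rows listed above as input, which is an assumption about what the underlying rows look like. The function name is made up for illustration.

```python
rows = [
    ("E_01", "A", "2017-09-09", "2018-02-02"),
    ("E_01", "A", "2017-09-09", "2018-02-02"),
    ("E_01", "B", "2018-09-09", "2019-06-06"),
    ("E_01", "B", "2018-09-09", "2019-06-06"),
    ("E_01", "C", "2019-09-09", "2022-12-12"),
]


def lease_order(rows):
    # Step 1: GROUP BY unit, brand with min(lease_st) and max(lease_et)
    spans = {}
    for unit, brand, st, et in rows:
        old_st, old_et = spans.get((unit, brand), (st, et))
        spans[(unit, brand)] = (min(old_st, st), max(old_et, et))

    # Step 2: row_number() OVER (PARTITION BY unit ORDER BY lease_st)
    ordered = sorted(spans.items(), key=lambda kv: (kv[0][0], kv[1][0]))
    counters = {}
    result = []
    for (unit, brand), (st, et) in ordered:
        counters[unit] = counters.get(unit, 0) + 1
        result.append((unit, brand, st, et, counters[unit]))
    return result


for line in lease_order(rows):
    print(line)
# ('E_01', 'A', '2017-09-09', '2018-02-02', 1)
# ('E_01', 'B', '2018-09-09', '2019-06-06', 2)
# ('E_01', 'C', '2019-09-09', '2022-12-12', 3)
```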
Validated the data of 3 tables today
Table2: four tables; the focus is validating the data of the last one
Table3
Table4
Helped a colleague write the query for Table6
Learning Tableau