Create a table:
create external table `etl_fb_unmatched_history`(
`device_id_md5` string,
`device_type` string,
`platform` string,
`package_name` string)
row format delimited fields terminated by '\t'
location 's3://mob-emr-test/dataplatform/DataWareHouse/data/dwh/etl_fb_unmatched_history';
Modify table partitions:
alter table etl_fb_org_daily drop if exists partition (yr='yyyy',mt='mm',dt='dd'); -- placeholder values; the spec must use the table's own partition columns
alter table etl_fb_org_daily add partition (dt='20190620') location 's3://etl_fb_org_daily/2019/06/20';
Load data:
load data local inpath '/home/hadoop/spark/employees' overwrite into table employees;
insert overwrite table md5_match partition (dt='20190623')
select id,md5(id) as id_md5 from db.ods_info where dt='20190623' group by id;
Rename a table:
alter table table_name rename to new_table_name;
Add columns:
alter table tablename add columns(column1 string comment 'xxxx',column2 bigint comment 'yyyy') cascade;
Rename a column:
alter table table_name change column column_name column_newName int comment 'column_name';
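Convert between managed (internal) and external tables:
alter table table_name set TBLPROPERTIES('EXTERNAL'='TRUE'); -- managed (internal) table to external table
alter table table_name set TBLPROPERTIES('EXTERNAL'='FALSE'); -- external table to managed (internal) table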
CASE WHEN statement:
Form 1 (searched CASE):
CASE WHEN sex = '1' THEN 'male'
WHEN sex = '2' THEN 'female'
ELSE 'other' END
Form 2 (simple CASE):
CASE sex
WHEN '1' THEN 'male'
WHEN '2' THEN 'female'
ELSE 'other' END
Parsing JSON data in Hive with the JSON SerDe:
create external table etl_fb_org_daily
(
timestamp_date string,
os string,
log_time bigint)
partitioned by (dt string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
location 's3://etl_fb_org_daily';
Download location for the Hive JSON SerDe jar:
https://repository.cloudera.com/content/repositories/releases/org/apache/hive/hcatalog/hive-hcatalog-core/
The newer hive-hcatalog-core-2.3.3.jar can be used directly.
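If the SerDe class is not already on the Hive classpath, one way to make it available is to register the jar for the current session before querying the table. The sketch below assumes the jar was downloaded to a local directory; the path /home/hadoop/lib is hypothetical, and the partition value reuses the dt=20190620 example from earlier in these notes.

```sql
-- Register the SerDe jar for this session only (the local path is an assumption).
add jar /home/hadoop/lib/hive-hcatalog-core-2.3.3.jar;

-- Each JSON object's keys are mapped to the columns declared on the table.
select timestamp_date, os, log_time
from etl_fb_org_daily
where dt = '20190620'
limit 10;
```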
Using UNION ALL:
select t3.col from(
select a as col from t1
UNION ALL
select b as col from t2
) as t3;
Write a _SUCCESS marker from the shell:
hadoop fs -touchz s3://output_path/_SUCCESS
Disable the _SUCCESS marker in Spark:
df.write.mode(SaveMode.Overwrite)
.option("orc.compress", "zlib")
.option("mapreduce.fileoutputcommitter.marksuccessfuljobs", false)
.orc(output)
Hive complex data types:
array
map
struct
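A minimal sketch of how these three types might be declared and read. The table complex_demo and all of its column names are hypothetical and chosen only for illustration; the delimiters are one possible choice for a tab-separated text file.

```sql
-- Hypothetical table that uses all three complex types.
create table if not exists complex_demo (
  tags  array<string>,
  props map<string,string>,
  addr  struct<city:string,zip:string>
)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';

-- array: index from 0; map: look up by key; struct: dot notation.
select tags[0], props['os'], addr.city
from complex_demo;
```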
coalesce:
coalesce('string1','string2','string3')
Returns the first non-NULL argument, checked in order; returns NULL if every argument is NULL.
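As a small illustration of this behavior, the queries below use literal values only, so they assume nothing beyond a Hive version that allows select without a from clause. Note that an empty string is not NULL and is therefore returned as is.

```sql
select coalesce(null, 'b', 'c');   -- 'b': the first non-NULL argument
select coalesce('', 'b');          -- '': an empty string is not NULL
select coalesce(null, null);       -- NULL: every argument is NULL
```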
concat_ws:
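concat_ws(',','string1','string2','string3')
Joins strings with a separator. It can join several different columns, or, together with group by and a collect function, join the values of a single column across rows.
concat_ws(',',collect_set(application_input_dir))
Combined with collect_set(), it also removes duplicate values.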
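The group by usage could look like the sketch below. The table job_inputs and the column job_id are hypothetical; only application_input_dir comes from the snippet above. The order of the collected values is not guaranteed.

```sql
-- One output row per job, with its distinct input dirs joined by commas.
select job_id,
       concat_ws(',', collect_set(application_input_dir)) as input_dirs
from job_inputs
group by job_id;

-- collect_list keeps duplicates instead of removing them.
select job_id,
       concat_ws(',', collect_list(application_input_dir)) as all_input_dirs
from job_inputs
group by job_id;
```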
regexp_replace:
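regexp_replace(device_id,'-','')
regexp_replace('00000000-0206-c316-222a-12920206c316', '-', '')
Replaces every substring that matches a pattern. The second argument is a regular expression, so regex metacharacters must be escaped to match them literally. Escaping an ordinary non-alphabetic character such as '-' is harmless, but a backslash before a letter can change the meaning of the pattern.
Example:
regexp_replace('foolish', 'oo|is', '') returns flh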
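To illustrate the escaping point, here is a short sketch using a literal dot, which is a regex metacharacter. It assumes Hive's default string-literal handling, where a backslash inside a quoted string must itself be doubled.

```sql
-- Unescaped '.' is a regex wildcard, so every character is replaced.
select regexp_replace('a.b.c', '.', '-');     -- '-----'

-- Escape the dot. The backslash is doubled because the Hive string
-- literal consumes one level of escaping before the regex sees it.
select regexp_replace('a.b.c', '\\.', '-');   -- 'a-b-c'

-- A character class avoids backslashes altogether.
select regexp_replace('a.b.c', '[.]', '-');   -- 'a-b-c'
```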