Syncing ODPS table data to a Hive warehouse

1. Copy the CREATE TABLE statement from ODPS and adapt it as needed, in particular mapping the column types to types Hive supports. Start Hive and paste in the DDL.

Partitioned table example:

CREATE TABLE dw_ft_barrage_record (
    p          STRING    COMMENT 'from os: 1,2,5,9',
    roomid     STRING    COMMENT 'room id',
    uid        STRING    COMMENT 'user id',
    categoryid STRING    COMMENT 'category id',
    sendtime   TIMESTAMP COMMENT 'send time',
    isarmy     INT       COMMENT 'shill (fake-user) flag',
    isreal     INT       COMMENT 'real-user flag'
)
COMMENT 'barrage fact table (based on mongo_barrage)'
PARTITIONED BY (pt STRING COMMENT 'partition column by date')
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Note:

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' is the column delimiter. It must match the fieldDelimiter configured in the hdfs-write plugin, otherwise the data cannot be loaded correctly.

2. Create a script file:

vim odpsToKylin.sh

today=$(date +%Y%m%d)
hive <<EOF
...
EOF
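A minimal sketch of what odpsToKylin.sh can look like; the heredoc body is an assumption (the script's only job, per the text below, is to create today's partition before the sync runs), and the dw database name is inferred from the warehouse path used later:

#!/bin/bash
# odpsToKylin.sh -- create today's partition so the hdfs-write job has a target directory
today=$(date +%Y%m%d)

hive <<EOF
use dw;
alter table dw_ft_barrage_record add if not exists partition (pt='${today}');
EOF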

The above prepares the partition directory for each future day. Historical partitions can be added with code like the following:

vim history.sh

alter table dw_ft_barrage_record add partition (pt='starttime');
alter table dw_ft_barrage_record add partition (pt='xxx');
alter table dw_ft_barrage_record add partition (pt='xxx');
alter table dw_ft_barrage_record add partition (pt='xxx');
alter table dw_ft_barrage_record add partition (pt='xxx');
alter table dw_ft_barrage_record add partition (pt='now');
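If the history spans many days, the add partition lines can also be generated in a loop instead of written by hand; a sketch assuming dates in yyyymmdd format and GNU date (START is a hypothetical placeholder):

#!/bin/bash
# history.sh -- add one partition per day from START up to today
START=20170101          # hypothetical start date, replace with the real one
END=$(date +%Y%m%d)

d="$START"
while [ "$d" -le "$END" ]; do
    hive -e "use dw; alter table dw_ft_barrage_record add if not exists partition (pt='${d}');"
    d=$(date -d "${d} + 1 day" +%Y%m%d)
done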

3. Switch to the root user, grant permissions on the two files, and set up the scheduled task.

chmod 777 fileName

Set up a cron job for odpsToKylin under the current user:

1. Type crontab -e to edit the user's crontab file.
2. Enter the following (the line must start at column 1): 0 0 * * * /bin/sh /home/qmbd/odpsToKylin.sh
3. Start the cron service, usually with /sbin/service crond start; for the root user's cron service you can use sudo service crond start.
4. Check that the crontab entry was created for this user with crontab -l.
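Putting the commands of this step together (the /home/qmbd path follows the crontab line above; adjust file locations to your environment):

# as root: grant permissions on both scripts
chmod 777 /home/qmbd/odpsToKylin.sh /home/qmbd/history.sh

# as the scheduling user: crontab -e, then add this line starting at column 1
0 0 * * * /bin/sh /home/qmbd/odpsToKylin.sh

# start the cron daemon and confirm the entry exists
/sbin/service crond start
crontab -l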

5. Edit the hdfs-write plugin configuration as follows:

{
  "configuration": {
    "reader": {
      "plugin": "odps",
      "parameter": {
        "partition": "pt=${bdp.system.bizdate}",
        "datasource": "odps_first",
        "column": ["*"],
        "table": "dw_ft_barrage_record"
      }
    },
    "writer": {
      "plugin": "hdfs",
      "parameter": {
        "path": "/user/hive/warehouse/dw.db/dw_ft_barrage_record/pt=${bdp.system.bizdate}",
        "fileName": "dw_ft_barrage_record",
        "compress": "GZIP",
        "column": [
          {"name": "p", "type": "string"},
          {"name": "roomid", "type": "string"},
          {"name": "uid", "type": "string"},
          {"name": "categoryid", "type": "string"},
          {"name": "sendtime", "type": "timestamp"},
          {"name": "isarmy", "type": "bigint"},
          {"name": "isreal", "type": "bigint"}
        ],
        "defaultFS": "hdfs://10.7.20.15:8020",
        "writeMode": "append",
        "fieldDelimiter": ",",
        "encoding": "UTF-8",
        "fileType": "text"
      }
    },
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
        "concurrent": "10",
        "mbps": "20"
      }
    }
  },
  "type": "job",
  "version": "1.0"
}
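After the first run it is worth verifying that the writer actually produced files in the partition directory and that Hive can read them. A quick check, using a hypothetical bizdate of 20170101 and the dw database inferred from the warehouse path:

# files written by the hdfs-write plugin for one business date
hadoop fs -ls /user/hive/warehouse/dw.db/dw_ft_barrage_record/pt=20170101

# row count Hive sees in the same partition
hive -e "select count(*) from dw.dw_ft_barrage_record where pt='20170101';"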

6. Submit the job to Alibaba Cloud for daily scheduling.

Note: the above is a demo for syncing a partitioned table. To sync a non-partitioned table, you only need to satisfy the following conditions on top of the above (see the sketch after this list):

1. Create the table with the correct column types.

2. The delimiter specified in the DDL must match the hdfs-write fieldDelimiter.

3. Remove pt=${bdp.system.bizdate} from the hdfs-write configuration.

4. The columns in hdfs-write must match the table's columns, with matching types.
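A sketch of the non-partitioned variant under these conditions; the table name dw_ft_barrage_record_np and the writer path are hypothetical, derived from the partitioned example above:

hive <<'EOF'
create table dw_ft_barrage_record_np (
    p          string    comment 'from os: 1,2,5,9',
    roomid     string    comment 'room id',
    uid        string    comment 'user id',
    categoryid string    comment 'category id',
    sendtime   timestamp comment 'send time',
    isarmy     int       comment 'shill (fake-user) flag',
    isreal     int       comment 'real-user flag'
)
row format delimited fields terminated by ','
stored as textfile;
EOF

# hdfs-write then points at the table directory itself, without a pt= suffix:
#   "path": "/user/hive/warehouse/dw.db/dw_ft_barrage_record_np"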

Summary:

1. Be careful when creating the table.
2. Be careful when configuring the plugin.
3. The partition-creation cron job must run earlier than the sync job.
4. The crontab entry must start at column 1 (no leading whitespace).
5. Adjust the column types in the hdfs-write plugin to types that Hive supports.
