Sqoop: parameter settings that prevent inconsistent data on export

Where the problem comes from

The official documentation puts it like this:

Since Sqoop breaks down export process into multiple transactions, it is possible that a failed export job may result in partial data being committed to the database.
This can further lead to subsequent jobs failing due to insert collisions in some cases, or lead to duplicated data in others.
You can overcome this problem by specifying a staging table via the --staging-table option which acts as an auxiliary table that is used to stage exported data.
The staged data is finally moved to the destination table in a single transaction.

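In other words:

Sqoop splits an export into several transactions, so a job that fails halfway may already have committed part of the data to the database.

A re-run can then fail on insert collisions (for example, primary-key conflicts with rows the failed attempt already wrote) or, if the table has no unique key, load those rows a second time.

The fix is to name an auxiliary table with the --staging-table option. The exported rows are written to that table first and are then moved to the target table in a single transaction.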
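To make the difference concrete, here is a small bash analogy. A plain text file stands in for a database table and each chunk stands in for one task's transaction. `export_direct` and `export_staged` are made-up names, and this is not how Sqoop itself is implemented (Sqoop works with JDBC transactions against the database). It only shows why a failed run leaves partial rows behind without staging, and leaves the target untouched with it.

```shell
#!/usr/bin/env bash
# File-based ANALOGY of the two export modes, not Sqoop's implementation.
# A text file plays the role of a table; each chunk plays one task's transaction.

# export_direct TARGET FAIL_AT CHUNK...
# No staging: each chunk is "committed" to TARGET as soon as it is written,
# so a failure at chunk number FAIL_AT leaves the earlier chunks in TARGET.
export_direct() {
  local target=$1 fail_at=$2 i=0 chunk
  shift 2
  for chunk in "$@"; do
    i=$((i + 1))
    if [ "$i" -eq "$fail_at" ]; then
      return 1                              # simulated task failure
    fi
    printf '%s\n' "$chunk" >> "$target"     # visible in the target immediately
  done
}

# export_staged TARGET FAIL_AT CHUNK...
# With staging: chunks go to TARGET.staging first. Only when every chunk
# succeeds is the staged data published to TARGET in one final step.
export_staged() {
  local target=$1 fail_at=$2 i=0 chunk
  local staging="$target.staging"
  shift 2
  : > "$staging"                            # start empty, like --clear-staging-table
  for chunk in "$@"; do
    i=$((i + 1))
    if [ "$i" -eq "$fail_at" ]; then
      return 1                              # partial rows stay in staging only
    fi
    printf '%s\n' "$chunk" >> "$staging"
  done
  touch "$target"
  cat "$target" "$staging" > "$target.new"  # build the new version aside...
  mv "$target.new" "$target"                # ...and publish it with one rename
  : > "$staging"                            # empty staging for the next run
}
```

For example, on a fresh file `export_direct out.txt 3 a b c d` fails and leaves `a` and `b` in `out.txt`, while `export_staged out.txt 3 a b c d` fails and leaves `out.txt` unchanged. Passing `0` as `FAIL_AT` simulates a run where nothing fails.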

Solution

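# --staging-table: rows are written to this staging table first and moved to the
# target table in a single transaction only after the whole export succeeds.
# The staging table must already exist and be structurally identical to the target.
# --clear-staging-table: delete any leftover rows from the staging table before starting.
# ${day} is assumed to be set by the calling script.
sqoop export \
  --connect jdbc:mysql://192.168.137.10:3306/user_behavior \
  --username root \
  --password 123456 \
  --table app_cource_study_report \
  --columns watch_video_cnt,complete_video_cnt,dt \
  --fields-terminated-by "\t" \
  --export-dir "/user/hive/warehouse/tmp.db/app_cource_study_analysis_${day}" \
  --staging-table app_cource_study_report_tmp \
  --clear-staging-table \
  --input-null-string '\N'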
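As far as I know, Sqoop does not create the staging table for you, and the Sqoop user guide requires it to be structurally identical to the target table, so it has to exist before the export runs. Below is a minimal sketch of a helper that creates it with MySQL's `CREATE TABLE ... LIKE`, assuming the `mysql` client is installed on the node that runs Sqoop. `prepare_staging`, `run` and `DRY_RUN` are hypothetical names; the connection settings are copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure the staging table exists before `sqoop export`.
set -eu

# Connection settings copied from the export command in this post.
DB_HOST=192.168.137.10
DB_NAME=user_behavior
DB_USER=root
DB_PASS=123456

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# prepare_staging TARGET STAGING
# CREATE TABLE ... LIKE copies TARGET's column definitions and indexes,
# so the staging table matches the target's structure.
prepare_staging() {
  local target=$1 staging=$2
  run mysql --host="$DB_HOST" --user="$DB_USER" --password="$DB_PASS" \
    --execute="CREATE TABLE IF NOT EXISTS $staging LIKE $target" \
    "$DB_NAME"
}
```

Call it right before the export, e.g. `prepare_staging app_cource_study_report app_cource_study_report_tmp && sqoop export ...`. `IF NOT EXISTS` makes it safe to run on every scheduled day; if the target table's columns change later, the old staging table has to be dropped and recreated. Also keep in mind that, according to the Sqoop user guide, staging is not available together with `--update-key` or with stored-procedure exports, and is not always available for `--direct` exports.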

 
