I. Solution
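1. First, look at the source table's schema and data to get a rough idea of which columns might contain the delimiter. A condition such as column_name LIKE '%delimiter%' tells you for certain whether a column contains it. If it does, try another delimiter. Once you find one that does not occur in the data, Sqoop can use it as the field delimiter.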
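The check in step 1 can be scripted. The sketch below is a hypothetical Bash helper (build_delim_check_sql is an illustrative name, not part of Sqoop) that builds one query counting, per column, how many rows contain a candidate delimiter. Any non-zero count means that delimiter conflicts with the data. It assumes the delimiter is an ordinary character: %, _ and ' would need escaping inside LIKE, and control characters such as \n would need a database-specific function (for example CHAR(10) in MySQL or CHR(10) in Oracle).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a SQL query that counts, for each listed column,
# how many rows contain the candidate delimiter.
# Usage: build_delim_check_sql TABLE DELIMITER COLUMN...
build_delim_check_sql() {
  local table=$1 delim=$2
  shift 2
  local sql="SELECT" sep=" " col
  for col in "$@"; do
    # A NULL value makes LIKE yield NULL, so CASE falls through to 0.
    sql+="${sep}SUM(CASE WHEN ${col} LIKE '%${delim}%' THEN 1 ELSE 0 END) AS ${col}_hits"
    sep=", "
  done
  printf '%s FROM %s\n' "$sql" "$table"
}
```

The generated statement can be run with the source database's own client, or through sqoop eval --connect ... --query "$(build_delim_check_sql some_table '|' trans_title sub_biz_data)". Listing every string column in one query means a single table scan per candidate delimiter.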
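2. The table we hit this time contained essentially every delimiter we tried, as well as control characters (\b, \r, \n, \t), so no conflict-free delimiter existed. Instead, we removed the delimiter from the column values: the backtick (`) is used as the field delimiter, and the query strips it from trans_title and sub_biz_data with REPLACE. In addition, Hive's row delimiter is \n, so line breaks inside values must be removed as well. This is done with --hive-drop-import-delims, which drops \n, \r and \01 from string fields.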
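To see which characters end up removed, the following sketch reproduces the two clean-up steps locally on a single value, as happens for trans_title and sub_biz_data: tr deletes the backtick (what REPLACE(..., '`', '') does in the query) plus \n, \r and \001 (what --hive-drop-import-delims does). clean_field is a hypothetical name used only for illustration. Note that \t and \b pass through unchanged, which is harmless here because neither is the field or the row delimiter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply to one value the same removals as
# REPLACE(col, '`', '') in the query plus --hive-drop-import-delims.
clean_field() {
  # tr -d deletes every backtick, newline (\n), carriage return (\r)
  # and \001 (Hive's default field delimiter) from the input.
  printf '%s' "$1" | tr -d '`\n\r\001'
}
```

For example, clean_field "$(printf 'gift`card\r\nrefund')" prints giftcardrefund: the value now fits on one line and no longer contains the field delimiter.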
The full Sqoop command is as follows:
sqoop import \
--hive-import \
--connect JDBC_URL \
--username DB_USER \
--password DB_PASSWORD \
--query "select order_id, order_create_time, trans_time, status, trade_type, pay_type, prod_category, prod_sub_category, prod_name, pay_to_phone, pay_to_phone_prov, pay_currency_type, trans_amount, rep_code, pay_cust_acct_id, pay_cust_user_id, pay_cust_name, pay_cust_id_card_no, pay_cust_card_no, pay_cust_card_prov, pay_cust_card_type, col_cust_acct_id, col_cust_user_id, col_cust_name, col_cust_id_card_no, col_cust_card_no, col_cust_card_prov, shop_to_name, shop_to_mobile, shop_to_province, shop_to_city, shop_to_area, shop_to_street, last_update_time, shop_to_addr, trans_ip, ip_prov, trans_dfp, trans_imei, trans_mac, refund_times, merchant_id, REPLACE(trans_title,'\`','') as trans_title, trans_source, pay_chn_id, pay_psn, pay_chn_amount, REPLACE(sub_biz_data,'\`','') as sub_biz_data, fund_list, seller_payplus_id, buyer_payplus_id, fund_acc_id, fund_acc_type, notify_time, fund_amount, fund_code, out_trade_no, create_at, create_by, last_update_at, last_update_by, sub_biz_no, ori_partner_id, product_id, stype_id, pay_pack_amount, trade_industry, is_host, is_audit, task_mode, service_provider, withdraw_lock, remit_trans_id from TABLE_NAME where \$CONDITIONS" \
--delete-target-dir \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by '`' \
--hive-drop-import-delims \
--target-dir /apps/hive/warehouse/mid.db/TABLE_NAME \
--hive-database mid \
--hive-table TABLE_NAME \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
--split-by order_id \
--num-mappers 10
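After the import it is worth confirming that no row was split or shifted by a stray delimiter. The sketch below is a hypothetical check (check_field_count is an illustrative name, not a Sqoop or Hive tool): it reads backtick-delimited text on standard input, reports up to five lines whose field count differs from the expected number, and prints how many such lines it found. It assumes the table's files are plain backtick-delimited text; it does not apply to ORC or other binary formats.

```bash
#!/usr/bin/env bash
# Hypothetical check: count lines on stdin whose number of backtick-separated
# fields differs from the expected column count.
# Usage: check_field_count EXPECTED_COLUMNS < data_file
check_field_count() {
  local expected=$1
  awk -F'`' -v n="$expected" '
    NF != n {
      bad++
      # Show the first few offending lines on stderr for inspection.
      if (bad <= 5) printf "line %d: %d fields\n", NR, NF > "/dev/stderr"
    }
    END {
      print bad + 0      # number of malformed lines
      exit (bad > 0)     # non-zero exit status if any were found
    }'
}
```

The query above selects 73 columns, so piping the table's data files into check_field_count 73 (for example with hdfs dfs -cat) should print 0 if every row survived the import intact.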