Sqoop basics and exporting to MySQL with Sqoop

Script: exporting from HDFS to MySQL


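After importing data from a relational database such as MySQL into HDFS, you may find that fields that were NULL in MySQL show up in Hive as the string 'null' instead of as NULL.

Adding the following two parameters to the import fixes this: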

--null-string '\\N'

--null-non-string '\\N'

This is because Hive represents NULL as \N in its text files. You can check this yourself by running insert overwrite table tb select NULL from tb1 limit 1;

and then inspecting the underlying data file.
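The quoting of this value is easy to get wrong. The Sqoop user guide notes that the value is used in generated code, so \N has to reach Sqoop escaped as \\N. The sketch below shows what the shell passes along under each quoting style, then prints (without running) an import command. The JDBC URL, user name, and table name are placeholders chosen for illustration, not values from this post.

```shell
# What the shell actually hands to a command under each quoting style.
single=$(printf '%s' '\\N')   # single quotes: both backslashes survive
double=$(printf '%s' "\\N")   # double quotes: \\ collapses to one backslash
printf 'single-quoted: %s\n' "$single"   # prints: single-quoted: \\N
printf 'double-quoted: %s\n' "$double"   # prints: double-quoted: \N

# Dry run: print an import command instead of executing it.
# Connection details and table name are hypothetical placeholders.
cat <<'EOF'
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/demo \
  --username demo_user -P \
  --table tb1 \
  --hive-import \
  --null-string '\\N' \
  --null-non-string '\\N'
EOF
```

With single quotes Sqoop receives the doubled backslash, which is the form it expects.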


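If the data is misaligned after the import, or many fields that originally had values have become NULL, the likely cause is that varchar columns in the source table contain special characters such as \n and \r.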

A simple fix is to add the parameter

--hive-drop-import-delims

which strips Hive's default delimiters (\n, \r, and \01) from the imported string fields.
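To make the effect concrete, here is a small sketch that imitates the stripping with tr. It is only an emulation for illustration, not Sqoop's own code: strip_hive_delims is a hypothetical helper, and the sample value is made up.

```shell
# Emulate the effect of --hive-drop-import-delims on one field value.
# strip_hive_delims is a hypothetical helper, not part of Sqoop.
strip_hive_delims() {
  # Delete LF, CR, and Ctrl-A (octal 001), Hive's default text delimiters.
  tr -d '\n\r\001'
}

# A made-up varchar value with an embedded newline, carriage return, and Ctrl-A.
clean=$(printf 'line one\nline two\rend\001x' | strip_hive_delims)
printf '%s\n' "$clean"   # prints: line oneline twoendx
```

If the stripped characters should be replaced rather than dropped, Sqoop also offers --hive-delims-replacement, which substitutes a string of your choice.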





#!/bin/bash
source /etc/profile


source /opt/shell/log/public.cfg
export LANG=zh_CN.UTF-8


cd /tmp


LOG_PATH="/app/log"


if [ -n "$1" ] && [ -n "$2" ];
then
# Both $1 and $2 are set: export every day from $1 through $2
   # Compute the number of days between the two dates, then loop over each day
   startDate=`date -d "$1" '+%s'`
   endDate=`date -d "$2" '+%s'`
   stampDiff=`expr $endDate - $startDate`
   daydiff=`expr $stampDiff / 86400`
   #echo "days :"$daydiff
   for ((i=0;i<=$daydiff;i++))
    do
      dt=`date -d "$i days $1" '+%Y/%m/%d'`
      pt=`date -d "$i days $1" '+%Y%m%d'`
      nextday=`date -d "1 days $dt" '+%Y-%m-%d'`
      LOG_FILE="${LOG_PATH}/module/leagsoft/user/log_sqoop2mysql_${pt}"
      LOG_OUTPUT="${LOG_PATH}/module/leagsoft/user/log_sqoop2mysql_error_${pt}" 
      echo "#Sqoop `date '+%Y-%m-%d %H:%M:%S'`" >> ${LOG_FILE}


      # Add the MySQL partition for this day before exporting (pt was already set above)
      mysql -h${MyHOST} -u${MyUSER} -p${MyPASS} -D${MyDB} -e"alter table leagsoft_user   add partition (partition P${pt} values less than(to_days('${nextday}')));"


      hadoop fs -test -e "/model/leagsoft_user/mysql/${dt}"
      if [ $? -eq 0 ] ; then
          sqoop export --connect ${MyURL} --table "leagsoft_user" --export-dir "/model/leagsoft_user/mysql/${dt}" --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' >> ${LOG_FILE} 2>${LOG_OUTPUT}


      fi


      echo "#End `date '+%Y-%m-%d %H:%M:%S'`" >> ${LOG_FILE}


    done
# Only $1 is set: export that single day
 elif [ -n "$1" ];
   then
   dt=`date -d "$1" '+%Y/%m/%d'`
   #echo "parameter 1 is not null"
      nextday=`date -d "1 days $dt" '+%Y-%m-%d'`
      pt=`date -d "$1" '+%Y%m%d'`
      LOG_FILE="${LOG_PATH}/module/leagsoft/user/log_sqoop2mysql_${pt}"
      LOG_OUTPUT="${LOG_PATH}/module/leagsoft/user/log_sqoop2mysql_error_${pt}" 
      echo "#Sqoop `date '+%Y-%m-%d %H:%M:%S'`" >> ${LOG_FILE}


          mysql -h${MyHOST} -u${MyUSER} -p${MyPASS} -D${MyDB} -e"alter table leagsoft_user   add partition (partition P${pt} values less than(to_days('${nextday}')));"


      hadoop fs -test -e "/model/leagsoft_user/mysql/${dt}"
      if [ $? -eq 0 ] ; then
          sqoop export --connect ${MyURL} --table "leagsoft_user" --export-dir "/model/leagsoft_user/mysql/${dt}" --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' >> ${LOG_FILE} 2>${LOG_OUTPUT}


      fi


      echo "#End `date '+%Y-%m-%d %H:%M:%S'`" >> ${LOG_FILE}


# No arguments: export yesterday's data
else
    #echo "parameter 1 and parameter 2 both null"
    dt=`date -d "yesterday" '+%Y/%m/%d'`
    pt=`date -d "yesterday" '+%Y%m%d'`
    LOG_FILE="${LOG_PATH}/module/leagsoft/user/log_sqoop2mysql_${pt}"
    LOG_OUTPUT="${LOG_PATH}/module/leagsoft/user/log_sqoop2mysql_error_${pt}"
    today=`date  +%Y-%m-%d`
    echo "#Sqoop `date '+%Y-%m-%d %H:%M:%S'`" >> ${LOG_FILE}


        mysql -h${MyHOST} -u${MyUSER} -p${MyPASS} -D${MyDB} -e"alter table leagsoft_user   add partition (partition P${pt} values less than(to_days('${today}')));"


    hadoop fs -test -e "/model/leagsoft_user/mysql/${dt}"
    if [ $? -eq 0 ] ; then
        sqoop export --connect ${MyURL} --table "leagsoft_user" --export-dir "/model/leagsoft_user/mysql/${dt}" --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' >> ${LOG_FILE} 2>${LOG_OUTPUT}
    fi


    echo "#End `date '+%Y-%m-%d %H:%M:%S'`" >> ${LOG_FILE}


fi
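
The range branch of the script counts days by dividing a difference in seconds by 86400, which can come out one day short when a daylight-saving change falls inside the range. As a possible alternative, the sketch below steps through the calendar dates directly. list_days is a hypothetical helper name, and like the script it assumes GNU date (date -d).

```shell
# Print every day from $1 through $2 (inclusive), one YYYY-MM-DD per line.
# list_days is a hypothetical helper; it relies on GNU date's -d option.
list_days() {
  start=$(date -d "$1" '+%Y-%m-%d') || return 1
  end=$(date -d "$2" '+%Y-%m-%d') || return 1
  cur=$start
  # Compare dates as YYYYMMDD integers, so no seconds arithmetic is needed.
  while [ "$(date -d "$cur" '+%Y%m%d')" -le "$(date -d "$end" '+%Y%m%d')" ]; do
    echo "$cur"
    cur=$(date -d "$cur + 1 day" '+%Y-%m-%d')
  done
}

# Prints 2020-02-27, 2020-02-28, 2020-02-29, 2020-03-01 on separate lines.
list_days 2020-02-27 2020-03-01
```

Each printed date could then be fed to the per-day partition and export steps.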

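All three branches of the script repeat the same per-day steps: build the partition statement, locate the HDFS directory, run the export. One way to cut the duplication is to factor the string building into small functions. The sketch below is a dry run that only prints the values; partition_ddl and export_dir are hypothetical names, while the table name and HDFS path come from the script. It assumes GNU date.

```shell
# Print the MySQL statement that adds the partition for data date $1.
# The partition's upper bound is the day after the data date.
partition_ddl() {
  pt=$(date -d "$1" '+%Y%m%d') || return 1
  nextday=$(date -d "$1 + 1 day" '+%Y-%m-%d') || return 1
  printf "alter table leagsoft_user add partition (partition P%s values less than (to_days('%s')));\n" "$pt" "$nextday"
}

# Print the HDFS directory that holds the export files for data date $1.
export_dir() {
  printf '/model/leagsoft_user/mysql/%s\n' "$(date -d "$1" '+%Y/%m/%d')"
}

partition_ddl 2020-02-29
export_dir 2020-02-29   # prints: /model/leagsoft_user/mysql/2020/02/29
```

The output of partition_ddl can be passed to mysql -e, and the output of export_dir to hadoop fs -test -e and to sqoop export --export-dir, as the script does inline.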
