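This article is based on AWS S3, Oozie 4.5.0, and Sqoop 1.4.7. Sqoop was installed manually; the other components were installed by AWS.
Configure the SQOOP_HOME environment variable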
vim /etc/profile
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
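After editing /etc/profile, the new variables only take effect once the file is sourced again (or after a new login). The following is a minimal sketch for checking the result; check_sqoop_env is a hypothetical helper name used for illustration, not something shipped with Sqoop.

```shell
#!/usr/bin/env bash
# Report whether SQOOP_HOME is set and whether $SQOOP_HOME/bin is on PATH.
# check_sqoop_env is a hypothetical helper, not part of Sqoop.
check_sqoop_env() {
    if [ -z "${SQOOP_HOME:-}" ]; then
        echo "SQOOP_HOME is not set"
        return 1
    fi
    case ":$PATH:" in
        *":$SQOOP_HOME/bin:"*)
            echo "SQOOP_HOME=$SQOOP_HOME (bin is on PATH)"
            ;;
        *)
            echo "SQOOP_HOME=$SQOOP_HOME (bin is NOT on PATH)"
            return 1
            ;;
    esac
}

# Typical use after editing /etc/profile:
#   source /etc/profile && check_sqoop_env && sqoop version
```

If the check passes, sqoop version should print the installed Sqoop version.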
Configure sqoop-env.sh
# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/lib/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/lib/hadoop
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/lib/hive
#Set the path to where the ZooKeeper config dir is
#export ZOOCFGDIR=
Create the properties file
nameNode=hdfs://IP:8020
jobTracker=IP:8032
hiveUris=thrift://IP:9083
oozie.use.system.libpath=true
queueName=default
importDir=/home/hadoop/importHive.sql
oozieAppsRoot=user/hadoop/apps
oozieDataRoot=user/hadoop/datas
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/test.xml
outputDir=output
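Because oozie.wf.application.path is built from other properties, it can help to check which HDFS path it resolves to before submitting the job. The sketch below expands the value by hand from a key=value file; get_prop and workflow_path are hypothetical helper names, and the script assumes a simple properties file with no line continuations.

```shell
#!/usr/bin/env bash
# Minimal sketch: show which HDFS path oozie.wf.application.path resolves to.
# get_prop and workflow_path are hypothetical helpers for illustration only.

# Print the value of KEY ($1) from a simple key=value properties FILE ($2).
get_prop() {
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$2"
}

# Expand ${nameNode}/${oozieAppsRoot}/test.xml by hand for the given file.
workflow_path() {
    printf '%s/%s/test.xml\n' \
        "$(get_prop nameNode "$1")" \
        "$(get_prop oozieAppsRoot "$1")"
}

# Usage: workflow_path job.properties
```

With the values above, the workflow file is expected at /user/hadoop/apps/test.xml on the NameNode, which is where it has to be uploaded.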
Create the workflow XML file
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">
    <start to="sqoop-node"/>
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>hive.metastore.uris</name>
                    <value>${hiveUris}</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>import --connect jdbc:mysql://IP:3306/db --username user --password 123456 --table test --target-dir /user/hadoop/l3db/test --fields-terminated-by "," --hive-import --create-hive-table --hive-table db.test</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Upload the XML to the working directory on HDFS, then run Oozie
oozie job -config job.properties -run
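The upload and submit steps can be sketched as a small script. This is only an illustration under a few assumptions: the workflow file is named test.xml, the application directory is /user/hadoop/apps (matching the properties above), and the OOZIE_URL environment variable already points at the Oozie server so that -oozie can be omitted. run_cmd and submit_workflow are hypothetical helper names; setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
# run_cmd and submit_workflow are hypothetical helpers for illustration.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: local workflow XML, $2: HDFS application directory, $3: properties file
submit_workflow() {
    run_cmd hdfs dfs -mkdir -p "$2"        # create the application directory
    run_cmd hdfs dfs -put -f "$1" "$2/"    # upload (overwrite) the workflow XML
    run_cmd oozie job -config "$3" -run    # submit and start the workflow
}

# Usage:
#   DRY_RUN=1 submit_workflow test.xml /user/hadoop/apps job.properties
```

Running it once with DRY_RUN=1 is a cheap way to confirm the paths before touching HDFS.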
Check the Oozie job status and logs
oozie job -info 0000030-180504042715459-oozie-oozi-W
oozie job -log 0000030-180504042715459-oozie-oozi-W
Check the corresponding Hadoop job
yarn application -list -appStates ALL
yarn logs -applicationId application_1525941313165_0005
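Notes:
1. The workflow schema version is 0.5, and the sqoop action schema version is 0.3.
2. This example uses the newer API, but the older API is still supported, so nothing has to be changed.
3. To inspect the related Hadoop jobs, run:
yarn application -list -appStates ALL
yarn logs -applicationId application_1525941313165_0005
Then find the corresponding ERROR in the output and fix it.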
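YARN logs for a Sqoop action can be long, so filtering them is usually quicker than reading them top to bottom. A minimal sketch, assuming that the interesting lines contain ERROR, Exception, or Caused by; show_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Keep only lines that look like errors, prefixed with their line number.
# show_errors is a hypothetical helper; it reads the log from stdin.
show_errors() {
    grep -n -E 'ERROR|Exception|Caused by'
}

# Usage:
#   yarn logs -applicationId application_1525941313165_0005 | show_errors
```

The line numbers make it easier to jump back to the surrounding context in the full log.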
Problem 1:
Oozie - Got exception running sqoop: Could not load db driver class: com.mysql.jdbc.Driver
Solution:
Put mysql-connector-java-6.0.6-bin.jar into the sharelib, then update the sharelib with the following command:
oozie admin -sharelibupdate
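The whole fix might look like the sketch below. The sharelib location is an assumption: it is often a directory such as /user/oozie/share/lib/lib_<timestamp>/sqoop on HDFS, but the real path depends on the installation, so it is passed in as an argument rather than hard-coded. run_cmd and install_driver are hypothetical helper names; DRY_RUN=1 prints the commands without running them.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
# run_cmd and install_driver are hypothetical helpers for illustration.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: local path to the JDBC driver jar
# $2: the sqoop directory inside the Oozie sharelib on HDFS (site-specific)
install_driver() {
    run_cmd hdfs dfs -put -f "$1" "$2/"       # copy the driver into the sharelib
    run_cmd oozie admin -sharelibupdate       # tell Oozie to reload the sharelib
    run_cmd oozie admin -shareliblist sqoop   # list the jars Oozie now sees
}

# Usage (the second argument is a placeholder for the real sharelib path):
#   DRY_RUN=1 install_driver mysql-connector-java-6.0.6-bin.jar /path/to/sharelib/sqoop
```

The final listing should include the connector jar; if it does not, the jar was probably copied to a different sharelib directory than the one Oozie is using.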
Problem 2:
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
Solution:
Set the following in the properties file:
oozie.use.system.libpath=true