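Sqoop comes in two versions
The two versions are completely different and incompatible with each other
sqoop1 refers to 1.4.x
sqoop2 refers to 1.99.x
1. Download the installation package
Download URL: https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/
This article uses version 1.99.7: sqoop-1.99.7-bin-hadoop200.tar.gz
Sqoop is installed on top of Hadoop, but it only needs to be installed on one node. Either a DataNode or the NameNode works, as long as the node can reach the Hadoop cluster and has access to the Hadoop configuration files and jars.
2. Extract the installation package
Copy the package to the hadoop user's home directory and extract it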
tar -xvf sqoop-1.99.7-bin-hadoop200.tar.gz
3. Configure Sqoop
cd sqoop-1.99.7-bin-hadoop200/conf
1) Configure the sqoop.properties file:
Only the following four parameters need to be set in sqoop.properties
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/hadoop-2.6.5/etc/hadoop
# Authentication method
org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true
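If you would rather script these four edits than make them by hand, here is a minimal sketch. The set_prop helper is hypothetical (it is not part of Sqoop). It assumes a key=value properties file where a key may already be present, possibly commented out with a leading #.

```shell
# set_prop FILE KEY VALUE
# Hypothetical helper: writes exactly one "KEY=VALUE" line into FILE.
# An existing line for KEY (commented out or not) is replaced in place;
# if the key is absent, the line is appended at the end.
set_prop() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^[ \t]*#?[ \t]*/, "", line)   # tolerate one leading "#"
            eq = index(line, "=")
            # Compare the key literally, so dots are not regex wildcards.
            if (eq > 0 && substr(line, 1, eq - 1) == k) {
                if (!done) print k "=" v       # keep only the first match
                done = 1
            } else {
                print $0
            }
        }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    result=$?
    rm -f "$tmp"
    return "$result"
}
```

Example call, run from the conf directory: set_prop sqoop.properties org.apache.sqoop.security.authentication.type SIMPLE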
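2) The sqoop_bootstrap.properties file needs no changes; the default is fine. It contains just one line
sqoop.config.provider=org.apache.sqoop.core.PropertiesConfigurationProvider
The common.loader property (a setting from Tomcat's catalina.properties, used by the Tomcat-based Sqoop2 releases) is also a single line. Replace /home/hadoop/hadoop-2.6.5 with your own Hadoop installation directory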
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/common/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/common/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/hdfs/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/hdfs/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/mapreduce/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/mapreduce/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/tools/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/yarn/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/yarn/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/httpfs/tomcat/lib/*.jar
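The Hadoop part of that line is easy to mistype, so it can be generated instead. A small sketch: hadoop_loader_entries is a made-up helper name, and the subdirectory list is simply copied from the line above.

```shell
# hadoop_loader_entries HADOOP_HOME
# Hypothetical helper: prints the comma-separated Hadoop jar patterns
# used in common.loader, in the same order as the line above.
hadoop_loader_entries() {
    base=$1/share/hadoop
    entries=""
    for sub in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib \
               tools/lib yarn yarn/lib httpfs/tomcat/lib; do
        # The quotes keep "*.jar" literal; the shell does not expand it here.
        entries="${entries:+$entries,}$base/$sub/*.jar"
    done
    printf '%s\n' "$entries"
}
```

Calling hadoop_loader_entries /home/hadoop/hadoop-2.6.5 prints the ten Hadoop entries; append them after the ${catalina...} entries of common.loader.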
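Besides configuring Sqoop itself, a few settings must be added to Hadoop
1) Configure core-site.xml
Add the following two properties, replacing $SERVER_USER with the user that runs Sqoop. For example, I run it as hadoop, so $SERVER_USER becomes hadoop
<property>
  <name>hadoop.proxyuser.$SERVER_USER.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.$SERVER_USER.groups</name>
  <value>*</value>
</property>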
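To avoid typos when substituting the user name, the two properties can be printed by a small function and pasted into core-site.xml. This is only a sketch; proxyuser_xml is a hypothetical name, and the wildcard values are the ones used above.

```shell
# proxyuser_xml USER
# Hypothetical helper: prints the two proxyuser properties for USER,
# ready to paste inside <configuration> in core-site.xml.
proxyuser_xml() {
    user=$1
    for kind in hosts groups; do
        cat <<EOF
<property>
  <name>hadoop.proxyuser.${user}.${kind}</name>
  <value>*</value>
</property>
EOF
    done
}
```

For example, proxyuser_xml hadoop prints the properties for the hadoop user. Once Hadoop has picked up the change, hdfs getconf -confKey hadoop.proxyuser.hadoop.hosts should print the configured value.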
1) Configure the Sqoop environment variables
vim ~/.bashrc
export SQOOP_HOME=/home/hadoop/sqoop-1.99.7-bin-hadoop200
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export PATH=$PATH:$SQOOP_HOME/bin
Create the /home/hadoop/sqoop-1.99.7-bin-hadoop200/extra directory
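The same setup can be done with a script that is safe to run more than once. A minimal sketch, assuming the three export lines shown above; append_once and setup_sqoop_env are hypothetical helper names.

```shell
# append_once FILE LINE: add LINE to FILE only if it is not already there.
append_once() {
    file=$1 line=$2
    touch "$file"
    # -x matches whole lines, -F treats LINE as a fixed string, not a regex.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# setup_sqoop_env RCFILE SQOOP_HOME
# Writes the three exports into RCFILE and creates the extra directory.
setup_sqoop_env() {
    rcfile=$1 sqoop_home=$2
    append_once "$rcfile" "export SQOOP_HOME=$sqoop_home"
    # Single quotes keep $SQOOP_HOME and $PATH unexpanded in the rc file.
    append_once "$rcfile" 'export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra'
    append_once "$rcfile" 'export PATH=$PATH:$SQOOP_HOME/bin'
    mkdir -p "$sqoop_home/extra"
}
```

Usage: setup_sqoop_env ~/.bashrc /home/hadoop/sqoop-1.99.7-bin-hadoop200, then run source ~/.bashrc so the current shell sees the new variables.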
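2) Adjust the Hadoop-related variables
If .bashrc sets only $HADOOP_HOME and none of the four variables HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME and HADOOP_YARN_HOME, Sqoop derives those four from $HADOOP_HOME by default. If those four variables are also set, however, starting Sqoop may fail with a NoClassDefFoundError. The fix is as follows:
Edit bin/sqoop.sh
Comment out the following four lines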
HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-${HADOOP_HOME}/share/hadoop/common}
HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-${HADOOP_HOME}/share/hadoop/hdfs}
HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-${HADOOP_HOME}/share/hadoop/mapreduce}
HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-${HADOOP_HOME}/share/hadoop/yarn}
Add the following four lines; ${HADOOP_HOME} can also be replaced with an absolute path
HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn
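The manual edit can also be expressed as a sed substitution. Note that this sketch rewrites the four lines in place instead of commenting them out and adding new ones, and it keeps the original as sqoop.sh.bak. patch_sqoop_sh is a hypothetical helper; it assumes the four assignments look exactly like the lines shown above.

```shell
# patch_sqoop_sh PATH_TO_SQOOP_SH
# Hypothetical helper: turns
#   HADOOP_X_HOME=${HADOOP_X_HOME:-${HADOOP_HOME}/share/hadoop/x}
# into
#   HADOOP_X_HOME=${HADOOP_HOME}/share/hadoop/x
# so that values exported in the environment are no longer picked up.
patch_sqoop_sh() {
    script=$1
    cp "$script" "$script.bak" || return 1
    # Writing back into the original file keeps its permissions intact.
    sed 's/^\([[:space:]]*HADOOP_[A-Z]*_HOME\)=\${HADOOP_[A-Z]*_HOME:-\(.*\)}[[:space:]]*$/\1=\2/' \
        "$script.bak" > "$script"
}
```

Usage: patch_sqoop_sh bin/sqoop.sh, then compare the result with diff bin/sqoop.sh.bak bin/sqoop.sh before starting the server.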
6. Add third-party jars
Copy the database JDBC drivers to /home/hadoop/sqoop-1.99.7-bin-hadoop200/extra
mysql-connector-java-5.1.30.jar
ojdbc14.jar
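A small sketch of the copy step that stops as soon as a driver file is missing, so a mistyped path is noticed before the server starts. install_drivers is a hypothetical helper name.

```shell
# install_drivers DEST_DIR JAR...
# Hypothetical helper: copies each JDBC driver jar into DEST_DIR and
# returns non-zero at the first jar that does not exist.
install_drivers() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for jar in "$@"; do
        if [ -f "$jar" ]; then
            cp "$jar" "$dest/" || return 1
            echo "installed $(basename "$jar")"
        else
            echo "missing driver: $jar" >&2
            return 1
        fi
    done
}
```

Usage, assuming the jars are in the current directory: install_drivers "$SQOOP_SERVER_EXTRA_LIB" mysql-connector-java-5.1.30.jar ojdbc14.jar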
7. Start Sqoop
Before starting, you can use the verification tool to check that the configuration files are correct
[hadoop@hadoop02 sqoop-1.99.7-bin-hadoop200]$ bin/sqoop2-tool verify
Setting conf dir: /home/hadoop/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
1 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
21 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
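The verify and start steps can be chained so that the server is only started after a successful check. A minimal sketch: start_if_verified is a hypothetical wrapper, and it relies only on the "Verification was successful." line shown in the output above.

```shell
# start_if_verified SQOOP_HOME
# Hypothetical wrapper: runs the verify tool and starts the server only
# when the tool's output contains the success message.
start_if_verified() {
    sqoop_home=$1
    output=$("$sqoop_home/bin/sqoop2-tool" verify 2>&1)
    case $output in
        *"Verification was successful."*)
            "$sqoop_home/bin/sqoop2-server" start
            ;;
        *)
            # Show what the tool printed so the problem can be diagnosed.
            printf '%s\n' "$output" >&2
            echo "verification did not succeed; server not started" >&2
            return 1
            ;;
    esac
}
```

Usage: start_if_verified /home/hadoop/sqoop-1.99.7-bin-hadoop200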
Start Sqoop
bin/sqoop2-server start
[hadoop@hadoop02 sqoop-1.99.7-bin-hadoop200]$ bin/sqoop2-server start
Setting conf dir: /home/hadoop/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
0 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
27 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
Sqoop2 server started.
[hadoop@hadoop02 sqoop-1.99.7-bin-hadoop200]$ jps
7248 Jps
4163 NodeManager
4055 DataNode
7226 SqoopJettyServer
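The jps check can be scripted as well. This sketch reads jps output on standard input and looks for the SqoopJettyServer process name shown above; sqoop_server_pid is a hypothetical helper name.

```shell
# sqoop_server_pid
# Hypothetical helper: reads `jps` output on stdin, prints the PID of
# every SqoopJettyServer process, and returns non-zero if none is found.
sqoop_server_pid() {
    awk '$2 == "SqoopJettyServer" { print $1; found = 1 }
         END { exit !found }'
}
```

Usage: jps | sqoop_server_pid || echo "Sqoop2 server is not running"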
Sqoop2 is now installed. Errors encountered along the way:
-- Error 1
[hadoop@hadoop01 sqoop-1.99.7-bin-hadoop200]$ ./bin/sqoop.sh server start
Setting conf dir: ./bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
1 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
20 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at org.apache.sqoop.security.authentication.SimpleAuthenticationHandler.secureLogin(SimpleAuthenticationHandler.java:36)
at org.apache.sqoop.security.AuthenticationManager.initialize(AuthenticationManager.java:98)
at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:57)
at org.apache.sqoop.server.SqoopJettyServer.
at org.apache.sqoop.server.SqoopJettyServer.main(SqoopJettyServer.java:177)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
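Cause: Sqoop could not locate the Hadoop jars, because the four environment variables HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME and HADOOP_YARN_HOME had been set when Hadoop was installed
Solution:
Edit bin/sqoop.sh
Comment out the following four lines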
HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-${HADOOP_HOME}/share/hadoop/common}
HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-${HADOOP_HOME}/share/hadoop/hdfs}
HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-${HADOOP_HOME}/share/hadoop/mapreduce}
HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-${HADOOP_HOME}/share/hadoop/yarn}
Add the following four lines; ${HADOOP_HOME} can also be replaced with an absolute path
HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn
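Before editing sqoop.sh, it can help to see which of the four variables are actually set in your environment. The sketch below rests on an assumption not confirmed above: that the error shows up when an exported value differs from the share/hadoop/... directory that sqoop.sh would otherwise use. check_hadoop_homes is a hypothetical helper.

```shell
# check_hadoop_homes HADOOP_HOME
# Hypothetical helper: reports every HADOOP_*_HOME variable that is set
# to something other than the sqoop.sh default under HADOOP_HOME.
# Returns 0 when nothing conflicts, 1 otherwise.
check_hadoop_homes() {
    hadoop_home=$1
    conflicts=0
    for pair in COMMON:common HDFS:hdfs MAPRED:mapreduce YARN:yarn; do
        name=HADOOP_${pair%%:*}_HOME
        expected=$hadoop_home/share/hadoop/${pair#*:}
        # Indirect lookup of the variable named in $name (empty if unset).
        eval "actual=\${$name:-}"
        if [ -n "$actual" ] && [ "$actual" != "$expected" ]; then
            echo "$name=$actual (sqoop.sh default would be $expected)"
            conflicts=1
        fi
    done
    return "$conflicts"
}
```

Usage: check_hadoop_homes "$HADOOP_HOME". Any line it prints names a variable worth unsetting or overriding as described above.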