Installing Sqoop2

Sqoop comes in two versions

The two versions are completely different and not compatible with each other.
sqoop1 refers to 1.4.x
sqoop2 refers to 1.99.x

1. Download the installation package

Download URL: https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/

This article uses version 1.99.7: sqoop-1.99.7-bin-hadoop200.tar.gz


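Sqoop depends on Hadoop, but it only needs to be installed on one node of the cluster. Either a DataNode or the NameNode will do, as long as the node can reach the Hadoop cluster and has access to the Hadoop configuration files and jars.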

2. Extract the installation package

Copy the package to the home directory of the hadoop user and extract it:

tar -xvf sqoop-1.99.7-bin-hadoop200.tar.gz


3. Configure Sqoop

cd sqoop-1.99.7-bin-hadoop200/conf

1) Configure sqoop.properties:

Only the following four parameters need to be set in sqoop.properties:

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/hadoop-2.6.5/etc/hadoop
# Authentication method
org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true
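
As a quick sanity check on the four settings, the sketch below reports any of the keys that has no uncommented entry in a given properties file. The function name check_sqoop_props is hypothetical and not part of Sqoop; it only tests for presence and does not validate the values.

```shell
# Hypothetical helper (not shipped with Sqoop): print each of the four
# required keys that has no uncommented "key=" entry in the given file.
# Returns 0 when all four are present, 1 otherwise.
check_sqoop_props() {
  file=$1
  missing=0
  for key in \
    org.apache.sqoop.submission.engine.mapreduce.configuration.directory \
    org.apache.sqoop.security.authentication.type \
    org.apache.sqoop.security.authentication.handler \
    org.apache.sqoop.security.authentication.anonymous
  do
    # Drop comment lines first, then look for the literal text "key=".
    if ! grep -v '^[[:space:]]*#' "$file" | grep -qF "${key}="; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_sqoop_props conf/sqoop.properties && echo "all four keys set"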


2) sqoop_bootstrap.properties needs no changes; the default is fine. It contains just one line:

sqoop.config.provider=org.apache.sqoop.core.PropertiesConfigurationProvider

3) Configure catalina.properties

This is also a single line. Replace /home/hadoop/hadoop-2.6.5 with your own Hadoop installation directory:

common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/common/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/common/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/hdfs/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/hdfs/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/mapreduce/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/mapreduce/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/tools/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/yarn/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/yarn/lib/*.jar,/home/hadoop/hadoop-2.6.5/share/hadoop/httpfs/tomcat/lib/*.jar
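
Because the same path appears many times in that line, a substitution is less error-prone than editing by hand. This is a minimal sketch; the function name retarget_hadoop_path is hypothetical, and it assumes the new path contains none of the characters "|", "&" or "\".

```shell
# Hypothetical helper: read text on stdin and replace every occurrence of
# the article's Hadoop path with the installation directory given as $1.
retarget_hadoop_path() {
  new_home=$1
  # "|" is the sed delimiter because the paths themselves contain "/".
  sed "s|/home/hadoop/hadoop-2\.6\.5|${new_home}|g"
}
```

For example: retarget_hadoop_path /opt/hadoop < catalina.properties > catalina.properties.new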

4. Configure Hadoop

Besides configuring Sqoop itself, a few entries have to be added on the Hadoop side.

1) Configure core-site.xml

Add the following two properties, replacing $SERVER_USER with the user that runs Sqoop. For example, I run it as hadoop, so the names become hadoop.proxyuser.hadoop.hosts and hadoop.proxyuser.hadoop.groups.


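<property>
  <name>hadoop.proxyuser.$SERVER_USER.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.$SERVER_USER.groups</name>
  <value>*</value>
</property>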
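
To avoid typos when substituting the user name, the two entries can be generated. The sketch below prints them as property elements to paste inside the configuration element of core-site.xml; the function name proxyuser_xml is hypothetical.

```shell
# Hypothetical helper: print the two proxyuser properties for the user
# given as $1, with "*" as the value for both hosts and groups.
proxyuser_xml() {
  user=$1
  for suffix in hosts groups; do
    printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>*</value>\n</property>\n' \
      "$user" "$suffix"
  done
}
```

For example, proxyuser_xml hadoop prints the entries for hadoop.proxyuser.hadoop.hosts and hadoop.proxyuser.hadoop.groups.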

5. Configure environment variables

1) Set the Sqoop environment variables

vim ~/.bashrc 

export SQOOP_HOME=/home/hadoop/sqoop-1.99.7-bin-hadoop200
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export PATH=$PATH:$SQOOP_HOME/bin

Create the directory /home/hadoop/sqoop-1.99.7-bin-hadoop200/extra.
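
If the setup is likely to be repeated, the exports can be appended without creating duplicates. This is only a sketch: append_once is a hypothetical helper name, and the example calls are commented out so that nothing is modified until you choose to run them.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already there, so re-running the setup does not duplicate exports.
append_once() {
  line=$1
  file=$2
  # -x: whole-line match, -F: literal string, --: the line may start with "-"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example calls, using the paths from this article:
# append_once 'export SQOOP_HOME=/home/hadoop/sqoop-1.99.7-bin-hadoop200' ~/.bashrc
# append_once 'export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra' ~/.bashrc
# append_once 'export PATH=$PATH:$SQOOP_HOME/bin' ~/.bashrc
# mkdir -p /home/hadoop/sqoop-1.99.7-bin-hadoop200/extra
```

Run source ~/.bashrc afterwards so the current shell picks up the new variables.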

2) Adjust the Hadoop-related variables
Sqoop relies on four Hadoop environment variables: $HADOOP_COMMON_HOME, $HADOOP_HDFS_HOME, $HADOOP_MAPRED_HOME and $HADOOP_YARN_HOME.

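If .bashrc only sets $HADOOP_HOME and none of these four, Sqoop derives them from $HADOOP_HOME by default. If the four variables are also set, however, starting Sqoop may fail with a NoClassDefFoundError, because the ${VAR:-default} form used in sqoop.sh keeps any value that is already set. The fix is as follows.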

Edit bin/sqoop.sh
Comment out the following four lines:
HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-${HADOOP_HOME}/share/hadoop/common}
HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-${HADOOP_HOME}/share/hadoop/hdfs}
HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-${HADOOP_HOME}/share/hadoop/mapreduce}
HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-${HADOOP_HOME}/share/hadoop/yarn}

Add the following four lines (${HADOOP_HOME} can also be replaced with an absolute path):
HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn
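
The same edit can be scripted. The sketch below is one possible way, not the only one: force_hadoop_defaults is a hypothetical name, and instead of commenting out the old lines it rewrites each VAR=${VAR:-default} assignment to VAR=default while keeping the original file as a .bak copy.

```shell
# Hypothetical helper: in the script given as $1, rewrite assignments of
# the form HADOOP_X_HOME=${HADOOP_X_HOME:-default} to HADOOP_X_HOME=default.
# The untouched original is kept next to it with a .bak suffix.
force_hadoop_defaults() {
  script=$1
  cp "$script" "$script.bak" || return 1
  # \1 = leading indentation, \2 = variable name (must be the same on both
  # sides of "="), \3 = the default that follows ":-".
  sed 's|^\([[:space:]]*\)\(HADOOP_[A-Z]*_HOME\)=\${\2:-\(.*\)}$|\1\2=\3|' \
    "$script.bak" > "$script"
}
```

For example: force_hadoop_defaults bin/sqoop.sh, then compare bin/sqoop.sh with bin/sqoop.sh.bak to confirm that only the four assignments changed.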

6. Add third-party jars

Copy the database JDBC drivers to /home/hadoop/sqoop-1.99.7-bin-hadoop200/extra:

mysql-connector-java-5.1.30.jar

ojdbc14.jar


7. Start Sqoop

Before starting, the verify tool can be used to check that the configuration is correct:

[hadoop@hadoop02 sqoop-1.99.7-bin-hadoop200]$ bin/sqoop2-tool verify 
Setting conf dir: /home/hadoop/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Sqoop tool executor:
        Version: 1.99.7
        Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
        Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
1    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
21   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
Start Sqoop:

bin/sqoop2-server start

[hadoop@hadoop02 sqoop-1.99.7-bin-hadoop200]$ bin/sqoop2-server start
Setting conf dir: /home/hadoop/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
27   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Sqoop2 server started.
[hadoop@hadoop02 sqoop-1.99.7-bin-hadoop200]$ jps
7248 Jps
4163 NodeManager
4055 DataNode
7226 SqoopJettyServer
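
The jps check above can be turned into a small test for scripts. In this sketch the function name sqoop_server_listed is hypothetical; it reads jps output on stdin and only looks for the SqoopJettyServer class name, so it says nothing about whether the server is actually answering requests.

```shell
# Hypothetical helper: read `jps` output on stdin and succeed only if a
# SqoopJettyServer process is listed.
sqoop_server_listed() {
  # jps prints "<pid> <main class>"; compare the last field of each line.
  awk '$NF == "SqoopJettyServer" { found = 1 } END { exit found ? 0 : 1 }'
}
```

For example: jps | sqoop_server_listed && echo "Sqoop2 server process found"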


That completes the Sqoop2 installation. Errors encountered along the way:

-- Error 1
[hadoop@hadoop01 sqoop-1.99.7-bin-hadoop200]$ ./bin/sqoop.sh server start
Setting conf dir: ./bin/../conf
Sqoop home directory: /home/hadoop/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
1    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
20   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at org.apache.sqoop.security.authentication.SimpleAuthenticationHandler.secureLogin(SimpleAuthenticationHandler.java:36)
        at org.apache.sqoop.security.AuthenticationManager.initialize(AuthenticationManager.java:98)
        at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:57)
        at org.apache.sqoop.server.SqoopJettyServer.<init>(SqoopJettyServer.java:67)
        at org.apache.sqoop.server.SqoopJettyServer.main(SqoopJettyServer.java:177)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 5 more
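Cause: Sqoop could not locate the Hadoop jars, because the four environment variables HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME and HADOOP_YARN_HOME had been set when Hadoop was installed.
Solution:
Edit bin/sqoop.sh
Comment out the following four lines: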
HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-${HADOOP_HOME}/share/hadoop/common}
HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-${HADOOP_HOME}/share/hadoop/hdfs}
HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-${HADOOP_HOME}/share/hadoop/mapreduce}
HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-${HADOOP_HOME}/share/hadoop/yarn}


Add the following four lines (${HADOOP_HOME} can also be replaced with an absolute path):
HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn
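
To see which variable is at fault before editing anything, a diagnostic along these lines may help. It is a sketch under assumptions: check_hadoop_jar_dirs is a hypothetical name, and it assumes the problem shows up as a resolved directory that contains no jar files (for example, HADOOP_COMMON_HOME pointing at the Hadoop root instead of share/hadoop/common).

```shell
# Hypothetical diagnostic: resolve each of the four variables the way the
# original sqoop.sh lines do (existing value first, otherwise the default
# under HADOOP_HOME) and report directories that contain no jar files.
check_hadoop_jar_dirs() {
  status=0
  for pair in COMMON:common HDFS:hdfs MAPRED:mapreduce YARN:yarn; do
    var="HADOOP_${pair%%:*}_HOME"
    # Indirect lookup of the variable named in $var (empty if unset).
    eval "value=\${$var:-}"
    dir=${value:-${HADOOP_HOME:-}/share/hadoop/${pair#*:}}
    # ls fails when the glob matches nothing, i.e. no jar in that directory.
    if ls "$dir"/*.jar >/dev/null 2>&1; then
      echo "ok: $var -> $dir"
    else
      echo "no jars: $var -> $dir"
      status=1
    fi
  done
  return "$status"
}
```

Running check_hadoop_jar_dirs in the shell that starts Sqoop prints one line per variable; any "no jars" line marks a value worth correcting.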














