Installing Sqoop2

Based on documentation found online and on the source code.

1. Preparation

    Sqoop2 download: http://apache.fayea.com/sqoop/1.99.6/sqoop-1.99.6-bin-hadoop200.tar.gz

2. Install to the working directory

    tar -xvf sqoop-1.99.6-bin-hadoop200.tar.gz

    mv sqoop-1.99.6-bin-hadoop200 /usr/local/

3. Set environment variables. Sqoop could also be installed under the hadoop user; I'll use root for this example.

    vi /etc/profile

    Add the following:

    export SQOOP_HOME=/usr/local/sqoop-1.99.6-bin-hadoop200

    export PATH=$SQOOP_HOME/bin:$PATH

    export CATALINA_BASE=$SQOOP_HOME/server

    export LOGDIR=$SQOOP_HOME/logs

    Save, exit, and apply the changes immediately:

    source /etc/profile
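    A quick sanity check that the variables are in effect (paths as set above):

    echo $SQOOP_HOME    # should print /usr/local/sqoop-1.99.6-bin-hadoop200

    which sqoop.sh      # should resolve to $SQOOP_HOME/bin/sqoop.sh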

4. Edit the Sqoop configuration

    vi /usr/local/sqoop-1.99.6-bin-hadoop200/server/conf/sqoop.properties

    

    # Change this to point at my Hadoop installation directory

    org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop-2.7.1/

    # Pull in all the jars under the Hadoop directory

    vi /usr/local/sqoop-1.99.6-bin-hadoop200/server/conf/catalina.properties

    common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/common/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/common/lib/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/hdfs/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/hdfs/lib/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/lib/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/tools/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/tools/lib/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/yarn/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/yarn/lib/*.jar,/usr/local/hadoop-2.7.1/share/hadoop/httpfs/tomcat/lib/*.jar

    # These conflict with the jars shipped in hadoop-2.7.1, so delete the log4j jar from $SQOOP_HOME/server/webapps/sqoop/WEB-INF/lib/

    # Full paths must be used here; environment variables are not expanded

    Alternatively:

    Create a folder under $SQOOP_HOME, e.g. hadoop_lib, cp all of those jars into it, and then add that folder's path to the common.loader property. This approach is a bit more straightforward; for example:
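    A minimal sketch of this alternative, assuming the paths used above (the hadoop_lib name is just an example):

    mkdir -p $SQOOP_HOME/hadoop_lib

    cp /usr/local/hadoop-2.7.1/share/hadoop/common/*.jar $SQOOP_HOME/hadoop_lib/

    cp /usr/local/hadoop-2.7.1/share/hadoop/common/lib/*.jar $SQOOP_HOME/hadoop_lib/

    cp /usr/local/hadoop-2.7.1/share/hadoop/hdfs/*.jar $SQOOP_HOME/hadoop_lib/

    cp /usr/local/hadoop-2.7.1/share/hadoop/mapreduce/*.jar $SQOOP_HOME/hadoop_lib/

    cp /usr/local/hadoop-2.7.1/share/hadoop/yarn/*.jar $SQOOP_HOME/hadoop_lib/

    # repeat for the remaining lib directories, then reference the folder (full path) in catalina.properties:

    # common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,/usr/local/sqoop-1.99.6-bin-hadoop200/hadoop_lib/*.jar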

5. Download the MySQL JDBC driver

    mysql-connector-java-5.1.29.jar

    and place it in the $SQOOP_HOME/server/lib/ directory

    [Note: the download is a mysql-xxx.tar.gz archive; you only need to copy the mysql-connector-java-xxx-bin.jar out of it. This one is a real trap.]
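    For example, with the 5.1.29 tarball named above:

    tar -xzf mysql-connector-java-5.1.29.tar.gz

    cp mysql-connector-java-5.1.29/mysql-connector-java-5.1.29-bin.jar $SQOOP_HOME/server/lib/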

6. Start/stop the Sqoop2 server

    /usr/local/sqoop-1.99.6-bin-hadoop200/bin/sqoop.sh server start

    /usr/local/sqoop-1.99.6-bin-hadoop200/bin/sqoop.sh server stop

    Check the startup log:

    vi /usr/local/sqoop-1.99.6-bin-hadoop200/server/logs/catalina.out
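    Once the server is up, a quick liveness check is its REST version endpoint (12000 is the default port in server.xml; adjust if you changed it):

    curl http://localhost:12000/sqoop/version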

7. Start the interactive client

    $SQOOP_HOME/bin/sqoop.sh client

    sqoop:000> set server --host hadoopMaster --port 12000 --webapp sqoop    -- hadoopMaster is this machine's hostname

    sqoop:000> show version --all    -- show version information

    sqoop:000> show connector --all    -- list the available connectors


Creating database links:

1. Create the HDFS link

    sqoop:000> create link --cid 3

    Creating link for connector with id 3

    Please fill following values to create new link object

    Name: hdfs    -- link name

    Link configuration

    HDFS URI: hdfs://hadoopMaster:8020/    -- HDFS address

    Hadoop conf directory: /usr/local/hadoop-2.7.1/etc/hadoop

    New link was successfully created with validation status OK and persistent id 1

2. Create the MySQL link

    sqoop:000> create link --cid 4

    

    Creating link for connector with id 4

    Please fill following values to create new link object

    Name: mysqltest    -- link name

    Link configuration

    JDBC Driver Class: com.mysql.jdbc.Driver    -- driver class

    JDBC Connection String: jdbc:mysql://mysql.server/database    -- JDBC URL

    Username: sqoop    -- database user

    Password: ************    -- database password

    JDBC Connection Properties: 

    There are currently 0 values in the map:

    entry# protocol=tcp

    There are currently 1 values in the map:

    protocol = tcp

    entry#    -- press Enter on an empty entry to finish

    New link was successfully created with validation status OK and persistent id 2

3. Create a job (MySQL to HDFS)

    sqoop:000> show link

    sqoop:000> create job -f 2 -t 1

    Creating job for links with from id 2 and to id 1

     Please fill following values to create new job object

    Name: Sqoopy    -- job name

    From database configuration

        Schema name: (Required) sqoop    -- database/schema name: required

        Table name: (Required) sqoop    -- table name: required

        Table SQL statement: (Optional)    -- a SQL query to use instead of a table: optional

        Table column names: (Optional)    -- the columns to select: optional

        Partition column name: (Optional) id    -- a unique key; with a SQL statement it must use the alias.column form: optional

        Null value allowed for the partition column: (Optional)    -- optional

        Boundary query: (Optional)    -- SQL that returns min(id) and max(id), filtering on the check column; lastValue is passed in as the first parameter and the current maximum as the second [the second parameter seems hard to control this way; tempted to patch the source...]: optional

    Incremental read

        Check column: (Optional)    -- optional

        Last value: (Optional)    -- optional

    To HDFS configuration

        Null value: (Optional)    -- optional

        Output format:

            0 : TEXT_FILE

            1 : SEQUENCE_FILE

        Choose: 0    -- choose the output file format

        Compression format:

            0 : NONE

            1 : DEFAULT

            2 : DEFLATE

            3 : GZIP

            4 : BZIP2

            5 : LZO

            6 : LZ4

            7 : SNAPPY

            8 : CUSTOM

        Choose: 0    -- choose the compression format

        Custom compression format: (Optional)    -- optional

        Output directory: (Required) /root/projects/sqoop    -- destination directory on HDFS: required

        Append mode: (Optional)    -- whether to import incrementally (append): optional

        Driver Config

        Extractors: 2    -- number of extractors (read-side parallelism)

        Loaders: 2    -- number of loaders (write-side parallelism)

        New job was successfully created with validation status OK  and persistent id 1

    # list jobs

    sqoop:000> show job

    # run the job

    # use the start job command, passing the job id with --jid

    sqoop:000> start job --jid 1
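    Two related shell commands, assuming job id 1 from above:

    sqoop:000> status job --jid 1    -- check the running job's progress

    sqoop:000> stop job --jid 1    -- abort the running job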

Issue 1:

Pay attention to the Tomcat ports in $SQOOP_HOME/server/conf/server.xml and make sure they don't conflict with any other Tomcat instances you run.
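For example, to check whether the default ports are already taken before starting the server (12000/12001 are this release's defaults):

ss -tln | grep -E ':12000|:12001'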

Issue 2:

Delete log4j-1.2.16.jar from $SQOOP_HOME/server/webapps/sqoop/WEB-INF/lib to resolve the jar conflict.
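With the install location used throughout this post, that is a single rm:

rm /usr/local/sqoop-1.99.6-bin-hadoop200/server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar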

Issue 3:

To view job details in the Sqoop client, enable verbose output:

set option --name verbose --value true



