Setting up Sqoop 1.99.4 (Hadoop 2.4.0)

1. Version info: Hadoop 2.4.0, Sqoop sqoop-1.99.4-bin-hadoop200

2. First, extract the downloaded sqoop-1.99.4-bin-hadoop200.tar.gz and put it under the usual program install directory, /usr, as sketched below.
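A minimal sketch of this step (assuming the tarball is in the current directory):

# extract the archive and move it under /usr
tar -zxvf sqoop-1.99.4-bin-hadoop200.tar.gz
sudo mv sqoop-1.99.4-bin-hadoop200 /usr/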

3. Set the environment variables:
Edit /etc/profile (e.g. sudo vi /etc/profile) and add the following:
#sqoop
export SQOOP_HOME=/usr/sqoop-1.99.4-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
Save and exit, then apply the change immediately:
source /etc/profile
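A quick check that the variable is visible in the current shell:

echo $SQOOP_HOME    # should print /usr/sqoop-1.99.4-bin-hadoop200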
4. Edit the Sqoop 1.99.4 configuration:
1) sqoop.properties (under server/conf):
# point this at the Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/hadoop/etc/hadoop

# Replace @LOGDIR@ and @BASEDIR@:

The documentation states these clearly:

LOGDIR: the absolute path to the directory where system generated log files will be kept.

BASEDIR: the absolute path to the directory where Sqoop 2 is installed.

So set LOGDIR to: /usr/sqoop-1.99.4-bin-hadoop200/server/logs

Three entries need to change:

org.apache.sqoop.log4j.appender.file.File=/usr/sqoop-1.99.4-bin-hadoop200/server/logs/sqoop.log

org.apache.sqoop.auditlogger.default.file=/usr/sqoop-1.99.4-bin-hadoop200/server/logs/default.audit

org.apache.sqoop.repository.sysprop.derby.stream.error.file=/usr/sqoop-1.99.4-bin-hadoop200/server/logs/derbyrepo.log

And set BASEDIR to: /usr/sqoop-1.99.4-bin-hadoop200

One entry needs to change:

org.apache.sqoop.repository.jdbc.url=jdbc:derby:/usr/sqoop-1.99.4-bin-hadoop200/repository/db;create=true

Strictly speaking, these values can be left at their defaults; the files then just end up on the desktop by default (i.e. wherever Sqoop is launched from), which makes them inconvenient to organize and inspect.
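If you prefer to script these substitutions, here is a sketch using sed (assuming the file still contains the stock @LOGDIR@ and @BASEDIR@ placeholders):

cd /usr/sqoop-1.99.4-bin-hadoop200/server/conf
# replace the placeholders with absolute paths in sqoop.properties
sed -i 's|@LOGDIR@|/usr/sqoop-1.99.4-bin-hadoop200/server/logs|g' sqoop.properties
sed -i 's|@BASEDIR@|/usr/sqoop-1.99.4-bin-hadoop200|g' sqoop.properties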

2) catalina.properties:
# Pull in all the jars under the Hadoop directory. If Hive or HBase come into play later, their jars can be added the same way, or you can keep a dedicated jar directory for them:

common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/hadoop/share/hadoop/common/*.jar,/usr/hadoop/share/hadoop/common/lib/*.jar,/usr/hadoop/share/hadoop/hdfs/*.jar,/usr/hadoop/share/hadoop/hdfs/lib/*.jar,/usr/hadoop/share/hadoop/mapreduce/*.jar,/usr/hadoop/share/hadoop/mapreduce/lib/*.jar,/usr/hadoop/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/hadoop/share/hadoop/tools/*.jar,/usr/hadoop/share/hadoop/tools/lib/*.jar,/usr/hadoop/share/hadoop/yarn/*.jar,/usr/hadoop/share/hadoop/yarn/lib/*.jar

This is exactly where my own setup went wrong: I was careless with this entry and did not comment out or remove the extra items, so Sqoop kept failing to start. I even opened a thread asking for advice (nobody else seemed to have hit it, and I solved it myself in the end): http://bbs.csdn.net/topics/390981830
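To avoid the same mistake, it may help to sanity-check that every Hadoop directory referenced in common.loader actually contains jars (a sketch, assuming the /usr/hadoop layout above):

# print any referenced Hadoop jar directory that is empty or missing
for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib tools tools/lib yarn yarn/lib httpfs/tomcat/lib; do
  ls /usr/hadoop/share/hadoop/$d/*.jar >/dev/null 2>&1 || echo "no jars under: $d"
done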

5. Jar conflict: delete the log4j-1.2.16.jar that ships with Sqoop (just search the Sqoop directory for it and delete it).

Hadoop ships log4j-1.2.17, so the two versions conflict; removing the old one fixes it.
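A sketch of locating and removing it in one step:

# find and delete the conflicting log4j jar shipped with Sqoop
find /usr/sqoop-1.99.4-bin-hadoop200 -name "log4j-1.2.16.jar" -delete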
6. Download the MySQL JDBC driver:
mysql-connector-java-5.1.16-bin.jar

Download from: http://downloads.mysql.com/archives/c-j/

Put the driver into the lib directory under the install directory, /usr/sqoop-1.99.4-bin-hadoop200/lib; the lib directory has to be created by yourself.
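A sketch of this step (assuming the jar was downloaded to the current directory):

# create the lib directory and drop the driver in
sudo mkdir -p /usr/sqoop-1.99.4-bin-hadoop200/lib
sudo cp mysql-connector-java-5.1.16-bin.jar /usr/sqoop-1.99.4-bin-hadoop200/lib/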

7. Permissions:

sqoop.sh is not executable by default, so grant it execute permission:
sudo chmod 777 /usr/sqoop-1.99.4-bin-hadoop200/bin/sqoop.sh
Running sqoop.sh may then complain that another script cannot be executed; grant permission to that file as prompted,
or simply grant permissions on the whole Sqoop directory (a more restrictive chmod +x on the scripts would also do):
sudo chmod 777 -R /usr/sqoop-1.99.4-bin-hadoop200

8. Verify the setup with the built-in tool:

sqoop2-tool verify

Any errors it reports can be inspected in the logs.


9. Start/stop the Sqoop 2 server:

hadoop@master:~$ sqoop2-server start

or:
hadoop@master:~$ sqoop.sh server start/stop

hadoop@master:~$ sqoop2-server start
Sqoop home directory: /usr/sqoop-1.99.4-bin-hadoop200
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:       
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /usr/sqoop-1.99.4-bin-hadoop200/server
Using CATALINA_HOME:   /usr/sqoop-1.99.4-bin-hadoop200/server
Using CATALINA_TMPDIR: /usr/sqoop-1.99.4-bin-hadoop200/server/temp
Using JRE_HOME:        /usr/jdk1.8.0_25/jre
Using CLASSPATH:       /usr/sqoop-1.99.4-bin-hadoop200/server/bin/bootstrap.jar

Check the startup log:
/usr/sqoop-1.99.4-bin-hadoop200/server/logs/catalina.out
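For example, to follow the log while the server starts:

tail -f /usr/sqoop-1.99.4-bin-hadoop200/server/logs/catalina.out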
10. Enter the interactive client shell:
hadoop@master:~$ sqoop.sh client
Sqoop home directory: /usr/sqoop-1.99.4-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000>

  Point the client at the server:
sqoop:000> set server --host 10.15.43.214 --port 12000 --webapp sqoop
Server is set successfully
  Check version info:
sqoop:000> show version --all                                        
client version:
  Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f 
  Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014
server version:
  Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f 
  Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014
API versions:
  [v1]
  Show the connectors (I have two here):
sqoop:000> show connector --all
2 connector(s) to show: 
Connector with id 1:
  Name: generic-jdbc-connector 
  Class: org.apache.sqoop.connector.jdbc.GenericJdbcConnector
  Version: 1.99.4
  Supported Directions FROM/TO
    link config 1:
      Name: linkConfig
      Label: Link configuration
      Help: You must supply the information requested in order to create a link object.
      Input 1:

..... (output omitted)

.....

Connector with id 2:
  Name: hdfs-connector 
  Class: org.apache.sqoop.connector.hdfs.HdfsConnector
  Version: 1.99.4
  Supported Directions FROM/TO
    link config 1:
      Name: linkConfig
      Label: Link configuration
      Help: Here you supply information necessary to connect to HDFS
      Input 1:
        Name: linkConfig.uri
        Label: HDFS URI
        Help: HDFS URI used to connect to HDFS
        Type: STRING
        Sensitive: false
        Size: 255
    FROM Job config 1:
      Name: fromJobConfig

There are two connectors: generic-jdbc-connector and hdfs-connector.

hadoop@master:~$ jps
4002 SqoopShell
3042 NameNode
3261 SecondaryNameNode
11597 Jps
3917 Bootstrap
3423 ResourceManager
hadoop@master:~$ 

11. Create the links:

1) Create the JDBC link

sqoop:000> create link --cid 1
Creating link for connector with id 1
Please fill following values to create new link object
Name: First Link


Link configuration


JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://localhost:3306/database
Username: sqoop
Password: *****
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp

New link was successfully created with validation status WARNING and persistent id 1

2) Create the HDFS link

sqoop:000> create link --cid 2
Creating link for connector with id 2
Please fill following values to create new link object
Name: Second Link


Link configuration


HDFS URI: hdfs://10.15.43.214:8020/
New link was successfully created with validation status OK and persistent id 2
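The URI must match the NameNode address in fs.defaultFS (8020 is the default NameNode RPC port); a quick way to confirm it:

# check fs.defaultFS in the Hadoop configuration
grep -A1 fs.defaultFS /usr/hadoop/etc/hadoop/core-site.xml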

3) Create the job

sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: Sqoopy


From database configuration


Schema name: (required) sqoop
Table name: (required) sqoop
Table SQL statement: 
Table column names: 
Partition column name: (required) id
Null value allowed for the partition column: 
Boundary query: 


ToJob configuration


Output format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format: 
Output directory: /home/hadoop/output_1


Throttling resources


Extractors: 2
Loaders: 2
New job was successfully created with validation status OK  and persistent id 1
4) Start the job:
sqoop:000> start job --jid 1
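You can then watch its progress from the same shell:

sqoop:000> status job --jid 1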

Finally, the commands I use most often (a combined session sketch follows the list):

1. Check the version: show version --all
2. Create a link: create link --cid 1/2  # 1 is the JDBC connector, 2 the HDFS connector; check with show connector
3. Create a job: create job -f 1 -t 2  # move data from link 1 to link 2
4. Start a job: start job --jid 1
5. Check job status: status job --jid 1
6. List links: show link
7. List jobs: show job
8. List connectors: show connector  # gives the connector ids used when creating links
9. Point the client at the server: set server --host 10.15.43.214 --port 12000 --webapp sqoop
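Putting the pieces together, a typical end-to-end session looks like this (all commands taken from the steps above; adjust the host and ids to your own setup):

sqoop2-server start                                          # start the server
sqoop.sh client                                              # enter the interactive shell
set server --host 10.15.43.214 --port 12000 --webapp sqoop
show connector                                               # find the connector ids
create link --cid 1                                          # JDBC link
create link --cid 2                                          # HDFS link
create job -f 1 -t 2                                         # job from link 1 to link 2
start job --jid 1
status job --jid 1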


Setting up Sqoop took me quite a while and a lot of reading; special thanks to these three blogs:

http://www.cnblogs.com/likehua/p/3825489.html

http://94it.net/a/jingxuanboke/2015/0115/449167.html

http://houshangxiao.iteye.com/blog/2070057

