1. Version information: Hadoop 2.4.0, Sqoop sqoop-1.99.4-bin-hadoop200
2. First, extract the downloaded sqoop-1.99.4-bin-hadoop200.tar.gz and move it to the usual program installation directory: /usr
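For example, assuming the tarball sits in the current directory:
tar -zxvf sqoop-1.99.4-bin-hadoop200.tar.gz
sudo mv sqoop-1.99.4-bin-hadoop200 /usr/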
3. Set the environment variables:
Edit /etc/profile (e.g. sudo vi /etc/profile) and add the following:
#sqoop
export SQOOP_HOME=/usr/sqoop-1.99.4-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
Save and exit, then apply the changes immediately:
source /etc/profile
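A quick check that the variables are in effect:
echo $SQOOP_HOME     # should print /usr/sqoop-1.99.4-bin-hadoop200
echo $CATALINA_HOME  # should print /usr/sqoop-1.99.4-bin-hadoop200/server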
4. Modify the Sqoop 1.99.4 configuration:
1) sqoop.properties
#Point this entry at the Hadoop configuration directory:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/hadoop/etc/hadoop
#Replace @LOGDIR@ and @BASEDIR@:
The documentation describes them clearly:
LOGDIR: The absolute path to the directory where system generated log files will be kept.
BASEDIR: The absolute path to the directory where Sqoop 2 is installed.
So set LOGDIR to /usr/sqoop-1.99.4-bin-hadoop200/server/logs.
Three entries need to change:
org.apache.sqoop.log4j.appender.file.File=/usr/sqoop-1.99.4-bin-hadoop200/server/logs/sqoop.log
org.apache.sqoop.auditlogger.default.file=/usr/sqoop-1.99.4-bin-hadoop200/server/logs/default.audit
org.apache.sqoop.repository.sysprop.derby.stream.error.file=/usr/sqoop-1.99.4-bin-hadoop200/server/logs/derbyrepo.log
Set BASEDIR to /usr/sqoop-1.99.4-bin-hadoop200.
One entry needs to change:
org.apache.sqoop.repository.jdbc.url=jdbc:derby:/usr/sqoop-1.99.4-bin-hadoop200/repository/db;create=true
Strictly speaking these two settings can be left alone; the files just end up on the desktop by default, which makes them inconvenient to organize and inspect.
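A minimal sketch of the placeholder replacement with sed, assuming the file lives at server/conf/sqoop.properties under the install directory:
cd /usr/sqoop-1.99.4-bin-hadoop200/server/conf
sed -i 's|@LOGDIR@|/usr/sqoop-1.99.4-bin-hadoop200/server/logs|g' sqoop.properties
sed -i 's|@BASEDIR@|/usr/sqoop-1.99.4-bin-hadoop200|g' sqoop.properties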
2) catalina.properties:
#Pull in the jars under the Hadoop directory as well; when Hive or HBase come into play later, their jars can be added the same way, or you can keep a dedicated folder just for jar files:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/hadoop/share/hadoop/common/*.jar,/usr/hadoop/share/hadoop/common/lib/*.jar,/usr/hadoop/share/hadoop/hdfs/*.jar,/usr/hadoop/share/hadoop/hdfs/lib/*.jar,/usr/hadoop/share/hadoop/mapreduce/*.jar,/usr/hadoop/share/hadoop/mapreduce/lib/*.jar,/usr/hadoop/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/hadoop/share/hadoop/tools/*.jar,/usr/hadoop/share/hadoop/tools/lib/*.jar,/usr/hadoop/share/hadoop/yarn/*.jar,/usr/hadoop/share/hadoop/yarn/lib/*.jar
This is exactly where my setup went wrong: I was careless here and did not comment out or delete the extra entries, so Sqoop kept failing to start. I even opened a thread asking for advice (nobody else seemed to have hit it, and in the end I solved it myself): http://bbs.csdn.net/topics/390981830
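A quick sanity check that the Hadoop paths referenced in common.loader actually contain jars (paths as configured above):
ls /usr/hadoop/share/hadoop/common/*.jar > /dev/null && echo "common OK"
ls /usr/hadoop/share/hadoop/hdfs/*.jar   > /dev/null && echo "hdfs OK"
ls /usr/hadoop/share/hadoop/yarn/*.jar   > /dev/null && echo "yarn OK"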
5. Jar conflict: delete the log4j-1.2.16.jar bundled with Sqoop (search for it inside the Sqoop folder, then delete it).
Hadoop ships log4j-1.2.17, so the versions conflict; removing the older jar fixes it.
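One way to locate and remove it, assuming the install path above:
find /usr/sqoop-1.99.4-bin-hadoop200 -name 'log4j-1.2.16.jar'          # locate it first
find /usr/sqoop-1.99.4-bin-hadoop200 -name 'log4j-1.2.16.jar' -delete  # then remove it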
6. Download the MySQL JDBC driver:
mysql-connector-java-5.1.16-bin.jar
Download from: http://downloads.mysql.com/archives/c-j/
Put the driver jar into the lib directory under the install directory, /usr/sqoop-1.99.4-bin-hadoop200/lib; the lib folder has to be created by hand.
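For example, assuming the jar was downloaded to the current directory:
mkdir -p /usr/sqoop-1.99.4-bin-hadoop200/lib
cp mysql-connector-java-5.1.16-bin.jar /usr/sqoop-1.99.4-bin-hadoop200/lib/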
7. Set permissions:
sqoop.sh is not executable by default, so grant it execute permission:
sudo chmod 777 /usr/sqoop-1.99.4-bin-hadoop200/bin/sqoop.sh
When you run sqoop.sh it may then complain that another script cannot be executed; grant permission to whichever file the message names,
or grant permissions recursively on the whole Sqoop directory:
sudo chmod 777 -R /usr/sqoop-1.99.4-bin-hadoop200
8. Verify the setup with the built-in tool:
sqoop2-tool verify
Check the logs for any error messages.
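One way to scan the logs for problems (log directory as configured in step 4):
grep -iE 'error|exception' /usr/sqoop-1.99.4-bin-hadoop200/server/logs/*.log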
9. Start/stop the Sqoop 2 server:
hadoop@master:~$ sqoop2-server start
or:
hadoop@master:~$ sqoop.sh server start/stop
A successful start looks like this:
hadoop@master:~$ sqoop2-server start
Sqoop home directory: /usr/sqoop-1.99.4-bin-hadoop200
Setting SQOOP_HTTP_PORT: 12000
Setting SQOOP_ADMIN_PORT: 12001
Using CATALINA_OPTS:
Adding to CATALINA_OPTS: -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE: /usr/sqoop-1.99.4-bin-hadoop200/server
Using CATALINA_HOME: /usr/sqoop-1.99.4-bin-hadoop200/server
Using CATALINA_TMPDIR: /usr/sqoop-1.99.4-bin-hadoop200/server/temp
Using JRE_HOME: /usr/jdk1.8.0_25/jre
Using CLASSPATH: /usr/sqoop-1.99.4-bin-hadoop200/server/bin/bootstrap.jar
To inspect the startup log, look at:
/usr/sqoop-1.99.4-bin-hadoop200/server/logs/catalina.out
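For example, to follow it live:
tail -f /usr/sqoop-1.99.4-bin-hadoop200/server/logs/catalina.out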
10. Enter the interactive client:
hadoop@master:~$ sqoop.sh client
Sqoop home directory: /usr/sqoop-1.99.4-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
Point the client at the server:
sqoop:000> set server --host 10.15.43.214 --port 12000 --webapp sqoop
Server is set successfully
Check the version information:
sqoop:000> show version --all
client version:
Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f
Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014
server version:
Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f
Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014
API versions:
[v1]
Show the connectors (I have two here):
sqoop:000> show connector --all
2 connector(s) to show:
Connector with id 1:
Name: generic-jdbc-connector
Class: org.apache.sqoop.connector.jdbc.GenericJdbcConnector
Version: 1.99.4
Supported Directions FROM/TO
link config 1:
Name: linkConfig
Label: Link configuration
Help: You must supply the information requested in order to create a link object.
Input 1:
..... (remainder of this connector's output omitted)
Connector with id 2:
Name: hdfs-connector
Class: org.apache.sqoop.connector.hdfs.HdfsConnector
Version: 1.99.4
Supported Directions FROM/TO
link config 1:
Name: linkConfig
Label: Link configuration
Help: Here you supply information necessary to connect to HDFS
Input 1:
Name: linkConfig.uri
Label: HDFS URI
Help: HDFS URI used to connect to HDFS
Type: STRING
Sensitive: false
Size: 255
FROM Job config 1:
Name: fromJobConfig
So there are two connectors: generic-jdbc-connector and hdfs-connector. A jps check confirms that the server (Bootstrap, the embedded Tomcat) and the shell are both running:
hadoop@master:~$ jps
4002 SqoopShell
3042 NameNode
3261 SecondaryNameNode
11597 Jps
3917 Bootstrap
3423 ResourceManager
hadoop@master:~$
11. Create the links and the job:
1) Create the JDBC link
sqoop:000> create link --cid 1
Creating link for connector with id 1
Please fill following values to create new link object
Name: First Link
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://localhost:3306/database
Username: sqoop
Password: *****
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
New link was successfully created with validation status WARNING and persistent id 1
2) Create the HDFS link
sqoop:000> create link --cid 2
Creating link for connector with id 2
Please fill following values to create new link object
Name: Second Link
Link configuration
HDFS URI: hdfs://10.15.43.214:8020/
New link was successfully created with validation status OK and persistent id 2
3) Create the job
sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: Sqoopy
From database configuration
Schema name: (required) sqoop
Table name: (required) sqoop
Table SQL statement:
Table column names:
Partition column name: (required) id
Null value allowed for the partition column:
Boundary query:
ToJob configuration
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /home/hadoop/output_1
Throttling resources
Extractors: 2
Loaders: 2
New job was successfully created with validation status OK and persistent id 1
4) Start the job:
sqoop:000> start job --jid 1
Finally, the most commonly used commands (a combined session sketch follows the list):
1. Show version info: show version --all
2. Create a link: create link --cid 1/2  # 1 is the JDBC connector, 2 is the HDFS connector; check with show connector
3. Create a job: create job -f 1 -t 2  # moves data from link 1 to link 2
4. Start a job: start job --jid 1
5. Check job status: status job --jid 1
6. Show links: show link
7. Show jobs: show job
8. Show connectors: show connector  # lists the connector ids used when creating links
9. Point the client at the server: set server --host 10.15.43.214 --port 12000 --webapp sqoop
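Putting these together, a typical end-to-end session in the shell looks like this (ids taken from the example above):
sqoop:000> set server --host 10.15.43.214 --port 12000 --webapp sqoop
sqoop:000> show connector
sqoop:000> create link --cid 1
sqoop:000> create link --cid 2
sqoop:000> create job -f 1 -t 2
sqoop:000> start job --jid 1
sqoop:000> status job --jid 1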
Setting up Sqoop took quite a bit of time and a lot of reading; in the end, thanks go to these three blogs:
http://www.cnblogs.com/likehua/p/3825489.html
http://94it.net/a/jingxuanboke/2015/0115/449167.html
http://houshangxiao.iteye.com/blog/2070057