Installing Sqoop 1.99.x

Environment: CentOS 6.5, Hadoop 2.5.2, Sqoop 1.99.3

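Sqoop 1.99.x is split into a server and a client. The server can be installed on any node of the Hadoop cluster, and the Hadoop services on that node do not even need to be running. Once the server has been started, clients can be installed anywhere. Under the new architecture, all commands are issued from the client.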

1. Download Sqoop
[grid@hadoop6 ~]$ wget http://archive.apache.org/dist/sqoop/1.99.3/sqoop-1.99.3-bin-hadoop200.tar.gz
[grid@hadoop6 ~]$ tar -zxf sqoop-1.99.3-bin-hadoop200.tar.gz
[grid@hadoop6 ~]$ mv sqoop-1.99.3-bin-hadoop200 sqoop-1.99.3
2. Set the environment variables
[grid@hadoop6 ~]$ vi .bash_profile
export SQOOP_HOME=/home/grid/sqoop-1.99.3
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
[grid@hadoop6 ~]$ source .bash_profile
Appendix: the Hadoop environment variables used on this machine
export HADOOP_PREFIX=/home/grid/hadoop-2.5.2
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
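Before moving on, it can help to confirm that the variables above are actually visible in the current shell. The following is only a sketch: check_env is a helper name made up for this post, not something shipped with Sqoop or Hadoop.

```shell
# Report which of the named environment variables are unset or empty.
# Prints one "missing: NAME" line per problem and returns 1 if any are missing.
check_env() {
    missing=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# The variables exported in .bash_profile in this step.
check_env SQOOP_HOME CATALINA_BASE LOGDIR HADOOP_PREFIX HADOOP_CONF_DIR \
    || echo "source ~/.bash_profile and check again"
```

If nothing is printed, all five variables are set.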

3. Edit catalina.properties
Find the common.loader line, remove every Hadoop jar path already listed there, and add the jar paths of the local Hadoop 2 installation:
hadoop-2.x.x/share/hadoop/common/*.jar
hadoop-2.x.x/share/hadoop/common/lib/*.jar
hadoop-2.x.x/share/hadoop/yarn/*.jar
hadoop-2.x.x/share/hadoop/yarn/lib/*.jar
hadoop-2.x.x/share/hadoop/hdfs/*.jar
hadoop-2.x.x/share/hadoop/hdfs/lib/*.jar
hadoop-2.x.x/share/hadoop/mapreduce/*.jar
hadoop-2.x.x/share/hadoop/mapreduce/lib/*.jar
The values used in this walkthrough:
[grid@hadoop6 conf]$ pwd
/home/grid/sqoop-1.99.3/server/conf
[grid@hadoop6 conf]$ vim catalina.properties 
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/common/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/common/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/hdfs/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/hdfs/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/mapreduce/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/mapreduce/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/tools/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/yarn/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/yarn/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/httpfs/tomcat/lib/*.jar
If you also need to import into Hive or HBase, add the corresponding jars to common.loader as well.
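Typing the long common.loader value by hand is error-prone. As a rough sketch, the Hadoop part of it can be generated from the install directory. hadoop_loader_entries is a hypothetical helper name, and it only covers the eight directories from the generic list above (not tools/lib or httpfs/tomcat/lib, which this walkthrough also added).

```shell
# Print the Hadoop portion of common.loader for a given Hadoop install directory.
hadoop_loader_entries() {
    prefix=$1
    entries=""
    for sub in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib yarn yarn/lib; do
        # Append with a comma separator, except before the first entry.
        # The *.jar pattern stays literal because it is inside double quotes.
        entries="${entries:+$entries,}$prefix/share/hadoop/$sub/*.jar"
    done
    printf '%s\n' "$entries"
}

hadoop_loader_entries /home/grid/hadoop-2.5.2
```

The printed line can be appended after the existing ${catalina.home}/../lib/*.jar entry, separated by a comma.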

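The jars added above include log4j.jar. To avoid a jar conflict, move Sqoop's own log4j jar out of the way (renaming it rather than deleting it keeps the change reversible):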

[grid@hadoop6 sqoop-1.99.3]$ mv ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar.bak
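If the bundled log4j version differs from 1.2.16, the same rename can be done with a pattern instead of a fixed file name. This is a sketch under that assumption; disable_log4j_jars is a made-up helper name, and the demonstration runs on a throwaway directory rather than the real WEB-INF/lib.

```shell
# Rename every log4j-*.jar in a directory to *.jar.bak so it is no longer loaded.
disable_log4j_jars() {
    lib_dir=$1
    for jar in "$lib_dir"/log4j-*.jar; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$jar" ] || continue
        mv "$jar" "$jar.bak"
        echo "renamed: $jar"
    done
}

# Demonstration on a temporary directory with placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/log4j-1.2.16.jar" "$demo_dir/other.jar"
disable_log4j_jars "$demo_dir"
ls "$demo_dir"
```

On the real installation the argument would be ./server/webapps/sqoop/WEB-INF/lib.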
4. Edit sqoop.properties
Set the path to the Hadoop configuration directory, and change the name of the embedded Derby metadata database to SQOOP:
[grid@hadoop6 conf]$ vim sqoop.properties
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/grid/hadoop-2.5.2/etc/hadoop
org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/SQOOP;create=true
Replace the @LOGDIR@ and @BASEDIR@ placeholders:
[grid@hadoop6 conf]$ sed -i 's/@BASEDIR@/\/home\/grid\/sqoop-1.99.3\/base/g' sqoop.properties
[grid@hadoop6 conf]$ sed -i 's/@LOGDIR@/\/home\/grid\/sqoop-1.99.3\/logs/g' sqoop.properties
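The backslash-escaped slashes in the sed commands above are easy to get wrong. A sketch of the same substitution using '|' as the sed delimiter follows. fill_placeholders is a hypothetical helper name, the second sample line is made up for illustration, and sed -i is used in its GNU form, as on CentOS.

```shell
# Replace @BASEDIR@ and @LOGDIR@ in a properties file, in place.
# With '|' as the delimiter, the '/' characters in the paths need no escaping.
fill_placeholders() {
    file=$1
    base_dir=$2
    log_dir=$3
    sed -i -e "s|@BASEDIR@|$base_dir|g" -e "s|@LOGDIR@|$log_dir|g" "$file"
}

# Demonstration on a temporary copy, not on the real sqoop.properties.
demo_file=$(mktemp)
printf '%s\n' \
    'org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/SQOOP;create=true' \
    'sample.log.file=@LOGDIR@/sqoop.log' > "$demo_file"
fill_placeholders "$demo_file" /home/grid/sqoop-1.99.3/base /home/grid/sqoop-1.99.3/logs
cat "$demo_file"
```

This assumes the paths themselves contain no '|' character.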
5. Copy the database driver jar into $SQOOP_HOME/lib (create the directory first if it does not exist)
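This step has no command in the original notes, so here is a minimal sketch of it. install_driver is a made-up helper name and mysql-connector-java.jar is a placeholder file name; use whatever driver jar matches your database. The directory is presumably picked up through the ${catalina.home}/../lib/*.jar entry in common.loader shown in step 3.

```shell
# Copy a JDBC driver jar into <sqoop_home>/lib, creating the directory if needed.
install_driver() {
    driver_jar=$1
    sqoop_home=$2
    mkdir -p "$sqoop_home/lib"
    cp "$driver_jar" "$sqoop_home/lib/"
}

# Demonstration with temporary directories and an empty placeholder jar.
demo_src=$(mktemp -d)
demo_home=$(mktemp -d)
touch "$demo_src/mysql-connector-java.jar"
install_driver "$demo_src/mysql-connector-java.jar" "$demo_home"
ls "$demo_home/lib"
```

On the real machine the call would look like: install_driver /path/to/driver.jar "$SQOOP_HOME"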

Verify the installation (supported in 1.99.4 and later):
a. First make the scripts executable:
[grid@hadoop6 sqoop-1.99.4]$ chmod ug+x ./bin/*

b. Then run the check:

[grid@hadoop6 sqoop-1.99.4]$ sqoop-tool verify
6. Start the Sqoop server
Start Hadoop:
[grid@hadoop6 ~]$ sh $HADOOP_PREFIX/sbin/start-dfs.sh
[grid@hadoop6 ~]$ sh $HADOOP_PREFIX/sbin/start-yarn.sh

Start the Sqoop server:
[grid@hadoop6 ~]$ sqoop.sh server start
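A quick way to see whether the server came up is to request its version over HTTP. The sketch below only builds the URL. sqoop_version_url is a hypothetical helper, the defaults match the port and webapp name used later in this post, and the /version resource is my understanding of the Sqoop 2 REST interface rather than something verified in these notes.

```shell
# Build the URL of the server's version resource.
# Port and webapp default to the values used in this walkthrough.
sqoop_version_url() {
    host=$1
    port=${2:-12000}
    webapp=${3:-sqoop}
    printf 'http://%s:%s/%s/version\n' "$host" "$port" "$webapp"
}

url=$(sqoop_version_url hadoop6)
echo "$url"
# With the server running, a request such as the following should get a response:
#   curl -s "$url"
```

If the request is refused, check the logs under $SQOOP_HOME/logs and the Tomcat logs under $SQOOP_HOME/server/logs.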
7. Use the Sqoop client
[grid@hadoop6 logs]$ sqoop.sh client
Sqoop home directory: /home/grid/sqoop-1.99.3
四月 09, 2015 11:17:13 上午 java.util.prefs.FileSystemPreferences$1 run
信息: Created user preferences directory.
Sqoop Shell: Type 'help' or '\h' for help.


sqoop:000> show version --all
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b 
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b 
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
Protocol version:
  [1]
sqoop:000> 

Set the server:

sqoop:000> set server --host hadoop6 --port 12000 --webapp sqoop
Server is set successfully
Create a connection:
sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: 106-mysql-grid

Connection configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.0.106:3306/sqoop
Username: grid
Password: ******
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# 

Security related configuration options

Max connections: 8

There were warnings while create or update, but saved successfully.
Warning message: Can't connect to the database with given credentials: Access denied for user 'grid'@'%' to database 'sqoop' 
New connection was successfully created with validation status ACCEPTABLE and persistent id 3
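Note the warning: Can't connect to the database with given credentials: Access denied for user 'grid'@'%' to database 'sqoop'. The connection object was still saved, but the warning points to a MySQL privilege problem. Grant the user access to the database: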
mysql> GRANT ALL PRIVILEGES ON sqoop.* TO 'grid'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
Create a job:
sqoop:000> create job --xid 3 --type import
Creating job for connection with id 3
Please fill following values to create new job object
Name: import-userinfo

Database configuration

Schema name: 
Table name: userinfo
Table SQL statement: 
Table column names: 
Partition column name: 
Nulls in partition column: 
Boundary query: 

Output configuration

Storage type: 
  0 : HDFS
Choose: 0
Output format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
Choose: 0
Output directory: /userinfo

Throttling resources

Extractors: 
Loaders: 
New job was successfully created with validation status FINE  and persistent id 1
Turn on verbose output so that job details are displayed:
sqoop:000> set option --name verbose --value true
Run the job:
sqoop:000> start job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: grid
Creation date: 2015-04-10 12:41:13 CST
Lastly updated by: grid
External ID: job_1428509720344_0001
	http://hadoop4:8088/proxy/application_1428509720344_0001/
Connector schema: Schema{name=userinfo,columns=[
	FixedPoint{name=uid,nullable=null,byteSize=null,unsigned=null},
	Text{name=uname,nullable=null,size=null},
	FixedPoint{name=age,nullable=null,byteSize=null,unsigned=null},
	FixedPoint{name=sex,nullable=null,byteSize=null,unsigned=null},
	Text{name=address,nullable=null,size=null}]}
2015-04-10 12:41:13 CST: BOOTING  - Progress is not available
Check the job status:
sqoop:000> status job --jid 1 
The job failed: Caused by: Exception: java.io.IOException Message: java.net.ConnectException: Call From hadoop6/192.168.0.108 to hadoop4:10020 failed on connection exception: java.net.ConnectException: 拒绝连接 (Connection refused)
This turned out to be because the JobHistory server was not running (10020 is its default IPC port). Start it:
[grid@hadoop4 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/grid/hadoop-2.5.2/logs/mapred-grid-historyserver-hadoop4.out
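To confirm the history server is really up before retrying, one option is to look for its process in the jps listing. This is a sketch: has_history_server is a made-up helper, and the PIDs in the sample text are invented for the demonstration.

```shell
# Succeeds if jps-style output on stdin contains a JobHistoryServer process.
has_history_server() {
    grep -q 'JobHistoryServer$'
}

# Demonstration with sample text in the "PID ClassName" layout that jps prints.
sample='2817 NameNode
3120 ResourceManager
4402 JobHistoryServer'
printf '%s\n' "$sample" | has_history_server && echo "history server is running"

# On the history server host itself:
#   jps | has_history_server || mr-jobhistory-daemon.sh start historyserver
```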
Check the job status again:
sqoop:000> status job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: grid
Creation date: 2015-04-10 12:41:13 CST
Lastly updated by: grid
External ID: job_1428509720344_0001
	hadoop4:19888/jobhistory/job/job_1428509720344_0001
2015-04-10 12:51:41 CST: SUCCEEDED 
Counters:
	org.apache.hadoop.mapreduce.JobCounter
		SLOTS_MILLIS_MAPS: 32124
		MB_MILLIS_MAPS: 32894976
		TOTAL_LAUNCHED_MAPS: 2
		MILLIS_MAPS: 32124
		VCORES_MILLIS_MAPS: 32124
		OTHER_LOCAL_MAPS: 2
	org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
		BYTES_READ: 0
	org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
		BYTES_WRITTEN: 81
	org.apache.hadoop.mapreduce.TaskCounter
		MAP_INPUT_RECORDS: 0
		MERGED_MAP_OUTPUTS: 0
		PHYSICAL_MEMORY_BYTES: 224673792
		SPILLED_RECORDS: 0
		FAILED_SHUFFLE: 0
		CPU_MILLISECONDS: 6530
		COMMITTED_HEAP_BYTES: 47710208
		VIRTUAL_MEMORY_BYTES: 1689280512
		MAP_OUTPUT_RECORDS: 3
		SPLIT_RAW_BYTES: 235
		GC_TIME_MILLIS: 362
	org.apache.hadoop.mapreduce.FileSystemCounter
		FILE_READ_OPS: 0
		FILE_WRITE_OPS: 0
		FILE_BYTES_READ: 0
		FILE_LARGE_READ_OPS: 0
		HDFS_BYTES_READ: 235
		FILE_BYTES_WRITTEN: 211968
		HDFS_LARGE_READ_OPS: 0
		HDFS_BYTES_WRITTEN: 81
		HDFS_READ_OPS: 8
		HDFS_WRITE_OPS: 4
	org.apache.sqoop.submission.counter.SqoopCounters
		ROWS_READ: 3
Job executed successfully
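The client commands used in this step can also be collected in a file so they do not have to be retyped. The sketch below only writes such a file. write_job_script is a hypothetical helper, and passing the file to sqoop.sh client relies on the client's batch mode as I understand it from the 1.99.x documentation, so treat that part as an assumption.

```shell
# Write the client commands from this walkthrough into a script file.
write_job_script() {
    cat > "$1" <<'EOF'
set server --host hadoop6 --port 12000 --webapp sqoop
set option --name verbose --value true
start job --jid 1
status job --jid 1
EOF
}

job_script=$(mktemp)
write_job_script "$job_script"
cat "$job_script"
# To run it against a live server (assumed batch-mode usage):
#   sqoop.sh client "$job_script"
```

The job id 1 is the persistent id reported when the job was created above; adjust it to match your own job.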
Check that the data was imported into HDFS:
[grid@hadoop4 ~]$ hadoop fs -ls /userinfo
-rw-r--r--   2 grid supergroup          0 2015-04-10 12:42 /userinfo/_SUCCESS
-rw-r--r--   2 grid supergroup         25 2015-04-10 12:42 /userinfo/part-m-00000
-rw-r--r--   2 grid supergroup         56 2015-04-10 12:41 /userinfo/part-m-00001
[grid@hadoop4 ~]$ hadoop fs -cat /userinfo/*
0,'王五',21,1,'上海'
1,'张三',18,1,'湖北武汉'
2,'李四',16,0,'北京'
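The three rows match the ROWS_READ: 3 counter reported by the job. For larger tables that comparison can be scripted. A minimal sketch, with check_row_count as a made-up helper name, demonstrated on the three rows shown above instead of a live cluster:

```shell
# Compare the number of lines on stdin with an expected row count.
check_row_count() {
    expected=$1
    # wc -l may pad its output with spaces on some systems; strip them.
    actual=$(wc -l | tr -d ' ')
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $actual rows"
    else
        echo "mismatch: expected $expected rows, found $actual"
        return 1
    fi
}

# Demonstration with the imported rows from this post.
printf '%s\n' "0,'王五',21,1,'上海'" "1,'张三',18,1,'湖北武汉'" "2,'李四',16,0,'北京'" \
    | check_row_count 3

# Against the cluster:
#   hadoop fs -cat '/userinfo/part-m-*' | check_row_count 3
```

This assumes one record per line, which holds for the TEXT_FILE output format chosen when the job was created.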





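Open problem: following the same steps with Sqoop 1.99.4 and 1.99.5, the installation failed both times. Any pointers would be appreciated. The error reported at startup is: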

Exception in thread "PurgeThread" org.apache.sqoop.common.SqoopException: JDBCREPO_0009:Failed to finalize transaction
	at org.apache.sqoop.repository.JdbcRepositoryTransaction.close(JdbcRepositoryTransaction.java:115)
	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:109)
	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:61)
	at org.apache.sqoop.repository.JdbcRepository.purgeSubmissions(JdbcRepository.java:564)
	at org.apache.sqoop.driver.JobManager$PurgeThread.run(JobManager.java:667)
Caused by: java.sql.SQLNonTransientConnectionException: No current connection.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.noCurrentConnection(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.checkIfClosed(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.setupContextStack(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.commit(Unknown Source)
	at org.apache.commons.dbcp.DelegatingConnection.commit(DelegatingConnection.java:334)
	at org.apache.commons.dbcp.DelegatingConnection.commit(DelegatingConnection.java:334)
	at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.commit(PoolingDataSource.java:211)
	at org.apache.sqoop.repository.JdbcRepositoryTransaction.close(JdbcRepositoryTransaction.java:112)
	... 4 more
Caused by: java.sql.SQLException: No current connection.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
	... 15 more



