[grid@hadoop6 ~]$ wget http://archive.apache.org/dist/sqoop/1.99.3/sqoop-1.99.3-bin-hadoop200.tar.gz
[grid@hadoop6 ~]$ tar -zxf sqoop-1.99.3-bin-hadoop200.tar.gz
[grid@hadoop6 ~]$ mv sqoop-1.99.3-bin-hadoop200 sqoop-1.99.3

2. Modify the environment variables
[grid@hadoop6 ~]$ vi .bash_profile
export SQOOP_HOME=/home/grid/sqoop-1.99.3
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
[grid@hadoop6 ~]$ source .bash_profile

Appendix: Hadoop environment variable settings
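For reference, a minimal sketch of the Hadoop variables this setup relies on, assuming the hadoop-2.5.2 install directory used elsewhere in this post; adjust the paths to your own layout:

# assumed Hadoop settings in .bash_profile; paths follow the /home/grid/hadoop-2.5.2 install used below
export HADOOP_HOME=/home/grid/hadoop-2.5.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH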
3. Modify catalina.properties

[grid@hadoop6 conf]$ pwd
/home/grid/sqoop-1.99.3/server/conf
[grid@hadoop6 conf]$ vim catalina.properties
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/common/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/common/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/hdfs/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/hdfs/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/mapreduce/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/mapreduce/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/tools/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/yarn/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/yarn/lib/*.jar,/home/grid/hadoop-2.5.2/share/hadoop/httpfs/tomcat/lib/*.jar

If you also need to import into Hive or HBase, the corresponding jars must be added to common.loader as well.
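For example, for Hive this could mean appending its lib directory to the end of the same common.loader line, where "..." stands for the existing entries above and the Hive install path is only an assumption:

# append to the existing common.loader entries shown above; the Hive path is an assumption
common.loader=...,/home/grid/hadoop-2.5.2/share/hadoop/httpfs/tomcat/lib/*.jar,/home/grid/hive-0.13.1/lib/*.jar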
Because the jars added above already include log4j.jar, delete the log4j.jar that ships with Sqoop to avoid a conflict:
[grid@hadoop6 sqoop-1.99.3]$ mv ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar.bak

4. Modify sqoop.properties
[grid@hadoop6 conf]$ vim sqoop.properties
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/grid/hadoop-2.5.2/etc/hadoop
org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/SQOOP;create=true

Replace @LOGDIR@ and @BASEDIR@:
[grid@hadoop6 conf]$ sed -i 's/@BASEDIR@/\/home\/grid\/sqoop-1.99.3\/base/g' sqoop.properties
[grid@hadoop6 conf]$ sed -i 's/@LOGDIR@/\/home\/grid\/sqoop-1.99.3\/logs/g' sqoop.properties

5. Copy the database driver jar into the $SQOOP_HOME/lib directory (create the directory first if it does not exist)
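For a MySQL source this comes down to something like the following; the connector jar name and version are assumptions, use whichever MySQL JDBC driver you have:

[grid@hadoop6 ~]$ mkdir -p $SQOOP_HOME/lib
[grid@hadoop6 ~]$ cp mysql-connector-java-5.1.34-bin.jar $SQOOP_HOME/lib/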
Grant execute permission to the scripts under bin:
[grid@hadoop6 sqoop-1.99.4]$ chmod ug+x ./bin/*
Then check again:
[grid@hadoop6 sqoop-1.99.4]$ sqoop-tool verify

6. Start the sqoop server
[grid@hadoop6 ~]$ sqoop.sh server start
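Before switching to the client, it is worth a quick sanity check that the server actually came up; the Sqoop 2 server runs inside an embedded Tomcat (shown by jps as Bootstrap) and listens on port 12000 by default:

[grid@hadoop6 ~]$ jps | grep Bootstrap          # the embedded Tomcat hosting the sqoop server
[grid@hadoop6 ~]$ netstat -nlt | grep 12000     # 12000 is the default sqoop server port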
7. Use the sqoop client

[grid@hadoop6 logs]$ sqoop.sh client
Sqoop home directory: /home/grid/sqoop-1.99.3
Apr 09, 2015 11:17:13 AM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> show version --all
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
Protocol version:
  [1]
sqoop:000>
Set the server:
sqoop:000> set server --host hadoop6 --port 12000 --webapp sqoop
Server is set successfully

Create a connection
sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: 106-mysql-grid

Connection configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.0.106:3306/sqoop
Username: grid
Password: ******
JDBC Connection Properties:
There are currently 0 values in the map:
entry#

Security related configuration options

Max connections: 8
There were warnings while create or update, but saved successfully.
Warning message: Can't connect to the database with given credentials: Access denied for user 'grid'@'%' to database 'sqoop'
New connection was successfully created with validation status ACCEPTABLE and persistent id 3

Note the warning: "Can't connect to the database with given credentials: Access denied for user 'grid'@'%' to database 'sqoop'". This is a MySQL access-privilege problem and is fixed as follows:
mysql> GRANT ALL PRIVILEGES ON sqoop.* TO 'grid'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

Create a job
sqoop:000> create job --xid 3 --type import
Creating job for connection with id 3
Please fill following values to create new job object
Name: import-userinfo

Database configuration

Schema name:
Table name: userinfo
Table SQL statement:
Table column names:
Partition column name:
Nulls in partition column:
Boundary query:

Output configuration

Storage type:
  0 : HDFS
Choose: 0
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
Choose: 0
Output directory: /userinfo

Throttling resources

Extractors:
Loaders:
New job was successfully created with validation status FINE and persistent id 1

Enable verbose output so that job details are displayed
sqoop:000> set option --name verbose --value true

Run the job:
sqoop:000> start job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: grid
Creation date: 2015-04-10 12:41:13 CST
Lastly updated by: grid
External ID: job_1428509720344_0001
	http://hadoop4:8088/proxy/application_1428509720344_0001/
Connector schema: Schema{name=userinfo,columns=[
	FixedPoint{name=uid,nullable=null,byteSize=null,unsigned=null},
	Text{name=uname,nullable=null,size=null},
	FixedPoint{name=age,nullable=null,byteSize=null,unsigned=null},
	FixedPoint{name=sex,nullable=null,byteSize=null,unsigned=null},
	Text{name=address,nullable=null,size=null}]}
2015-04-10 12:41:13 CST: BOOTING - Progress is not available

Check the job status:
sqoop:000> status job --jid 1

The job failed with an error:

Caused by: Exception: java.io.IOException Message: java.net.ConnectException: Call From hadoop6/192.168.0.108 to hadoop4:10020 failed on connection exception: java.net.ConnectException: Connection refused

Port 10020 is the default RPC port of the MapReduce JobHistory Server, which means the history server on hadoop4 is not running. Start it on hadoop4:
[grid@hadoop4 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/grid/hadoop-2.5.2/logs/mapred-grid-historyserver-hadoop4.out

Check the job status again:
sqoop:000> status job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: grid
Creation date: 2015-04-10 12:41:13 CST
Lastly updated by: grid
External ID: job_1428509720344_0001
	hadoop4:19888/jobhistory/job/job_1428509720344_0001
2015-04-10 12:51:41 CST: SUCCEEDED
Counters:
	org.apache.hadoop.mapreduce.JobCounter
		SLOTS_MILLIS_MAPS: 32124
		MB_MILLIS_MAPS: 32894976
		TOTAL_LAUNCHED_MAPS: 2
		MILLIS_MAPS: 32124
		VCORES_MILLIS_MAPS: 32124
		OTHER_LOCAL_MAPS: 2
	org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
		BYTES_READ: 0
	org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
		BYTES_WRITTEN: 81
	org.apache.hadoop.mapreduce.TaskCounter
		MAP_INPUT_RECORDS: 0
		MERGED_MAP_OUTPUTS: 0
		PHYSICAL_MEMORY_BYTES: 224673792
		SPILLED_RECORDS: 0
		FAILED_SHUFFLE: 0
		CPU_MILLISECONDS: 6530
		COMMITTED_HEAP_BYTES: 47710208
		VIRTUAL_MEMORY_BYTES: 1689280512
		MAP_OUTPUT_RECORDS: 3
		SPLIT_RAW_BYTES: 235
		GC_TIME_MILLIS: 362
	org.apache.hadoop.mapreduce.FileSystemCounter
		FILE_READ_OPS: 0
		FILE_WRITE_OPS: 0
		FILE_BYTES_READ: 0
		FILE_LARGE_READ_OPS: 0
		HDFS_BYTES_READ: 235
		FILE_BYTES_WRITTEN: 211968
		HDFS_LARGE_READ_OPS: 0
		HDFS_BYTES_WRITTEN: 81
		HDFS_READ_OPS: 8
		HDFS_WRITE_OPS: 4
	org.apache.sqoop.submission.counter.SqoopCounters
		ROWS_READ: 3
Job executed successfully

Check whether the data was successfully imported into HDFS
[grid@hadoop4 ~]$ hadoop fs -ls /userinfo
-rw-r--r--   2 grid supergroup          0 2015-04-10 12:42 /userinfo/_SUCCESS
-rw-r--r--   2 grid supergroup         25 2015-04-10 12:42 /userinfo/part-m-00000
-rw-r--r--   2 grid supergroup         56 2015-04-10 12:41 /userinfo/part-m-00001
[grid@hadoop4 ~]$ hadoop fs -cat /userinfo/*
0,'王五',21,1,'上海'
1,'张三',18,1,'湖北武汉'
2,'李四',16,0,'北京'
A remaining problem: following the same steps with sqoop 1.99.4 and sqoop 1.99.5, the installation failed both times. Pointers from readers are welcome; the error reported at startup is shown below:
Exception in thread "PurgeThread" org.apache.sqoop.common.SqoopException: JDBCREPO_0009:Failed to finalize transaction
	at org.apache.sqoop.repository.JdbcRepositoryTransaction.close(JdbcRepositoryTransaction.java:115)
	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:109)
	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:61)
	at org.apache.sqoop.repository.JdbcRepository.purgeSubmissions(JdbcRepository.java:564)
	at org.apache.sqoop.driver.JobManager$PurgeThread.run(JobManager.java:667)
Caused by: java.sql.SQLNonTransientConnectionException: No current connection.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.noCurrentConnection(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.checkIfClosed(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.setupContextStack(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.commit(Unknown Source)
	at org.apache.commons.dbcp.DelegatingConnection.commit(DelegatingConnection.java:334)
	at org.apache.commons.dbcp.DelegatingConnection.commit(DelegatingConnection.java:334)
	at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.commit(PoolingDataSource.java:211)
	at org.apache.sqoop.repository.JdbcRepositoryTransaction.close(JdbcRepositoryTransaction.java:112)
	... 4 more
Caused by: java.sql.SQLException: No current connection.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
	... 15 more