Using Sqoop2 on CDH 5.7

---------------------------------------This Sqoop2 version cannot export directly into a Hive table; it can only import into HDFS--------------------
Download the package for the matching version from the Apache site: http://www.apache.org/dyn/closer.lua/sqoop/1.99.5


Edit the config file /home/dba/sqoop2-1.99.5-cdh5.7.0/server/conf/catalina.properties, appending the CDH jar directories to common.loader:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/lib/hadoop/*.jar,/usr/lib/hadoop/lib/*.jar,/usr/lib/hadoop-hdfs/*.jar,/usr/lib/hadoop-hdfs/lib/*.jar,/usr/lib/hadoop-mapreduce/*.jar,/usr/lib/hadoop-mapreduce/lib/*.jar,/usr/lib/hadoop-yarn/*.jar,/usr/lib/hadoop-yarn/lib/*.jar,/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars/*.jar
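Before starting the server, it can save a debugging round to sanity-check that the directories referenced by common.loader actually exist on the host. A minimal sketch, using the paths from the config line above; adjust for your own layout:

```shell
# Check that the jar directories referenced in common.loader exist.
# Paths are the ones from the config line above; adjust for your host.
MISSING=0
for d in /usr/lib/hadoop /usr/lib/hadoop-hdfs /usr/lib/hadoop-mapreduce \
         /usr/lib/hadoop-yarn /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars; do
  if [ ! -d "$d" ]; then
    echo "missing: $d"
    MISSING=$((MISSING + 1))
  fi
done
echo "$MISSING of 5 directories missing"
```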


Starting the server may fail; check the logs for details. In my case startup reported many "class could not be loaded/initialized" errors, so the following jars were moved out of the way or renamed:

[dba@dmall1 jars]$ ls|grep bak
derby-10.10.1.1.jarbak
derby-10.10.2.0.jarbak
derby-10.11.1.1.jarbak
derby-10.4.2.0.jarbak
derby-10.9.1.0.jarbak
derbyclient-10.8.2.2.jarbak
derbynet-10.8.2.2.jarbak
sqoop-connector-generic-jdbc-1.99.5-cdh5.7.0.jarbak
sqoop-connector-hdfs-1.99.5-cdh5.7.0.jarbak
sqoop-connector-kafka-1.99.5-cdh5.7.0.jarbak
sqoop-connector-kite-1.99.5-cdh5.7.0.jarbak
sqoop-repository-derby-1.99.5-cdh5.7.0.jarbak
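The renaming step above can be scripted. A minimal sketch, demonstrated in a throwaway directory so it is safe to run as-is; point JAR_DIR at the real CDH parcel jars directory when doing this for real:

```shell
# Rename conflicting Derby jars so Tomcat's class loader skips them,
# using the same ".jarbak" suffix convention as the listing above.
# Demo runs in a temp dir; set JAR_DIR to the real jars directory.
JAR_DIR=$(mktemp -d)
touch "$JAR_DIR/derby-10.10.1.1.jar" "$JAR_DIR/derbyclient-10.8.2.2.jar"

for f in "$JAR_DIR"/derby*.jar; do
  mv "$f" "${f}bak"
done

RENAMED=$(ls "$JAR_DIR" | grep -c 'bak$')
echo "$RENAMED jars renamed"
```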


cd /home/dba/sqoop2-1.99.5-cdh5.7.0/bin
./sqoop.sh server start
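To confirm the server actually came up, you can probe its REST endpoint. Assumption: port 12000 (the 1.99.x default) and the /sqoop/version path; adjust if your install is configured differently:

```shell
# Probe the Sqoop2 server's REST API; /version answers once it is up.
# Port 12000 is the 1.99.x default -- an assumption, check your config.
SQOOP_URL="http://localhost:12000/sqoop/version"
if curl -fsS "$SQOOP_URL" >/dev/null 2>&1; then
  STATUS="up"
else
  STATUS="down"
fi
echo "sqoop2 server is $STATUS"
```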
Log in to the client:
./sqoop.sh client


Before creating any links, list the available connectors so you can pick the right one:
show connector


Create a link to MySQL and a link to HDFS:
create link -c 4
Creating link for connector with id 2
Please fill following values to create new link object
Name: First Link


Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://xxx:3307/test   (using the hostname here also failed to resolve the address, so use the IP)
Username: tungsten
Password: tungsten
JDBC Connection Properties:
There are currently 0 values in the map:
entry#protocol=tcp
New link was successfully created with validation status OK and persistent id 1


create link -c 2
Creating link for connector with id 1
Please fill following values to create new link object
Name: Second Link


Link configuration
HDFS URI: hdfs://xxx:8020/
New link was successfully created with validation status OK and persistent id 2


show link --all
Create a job:
create job -f 1 -t 2
Start the job:
start job -j 1 -s
Check the job status:
status job -j 1
Update a link:
update link --lid 1
Update a job:
update job --jid 1
Show job details:
show job --all
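Once the link and job ids are known, the interactive commands above can be collected into a script file and run in batch mode (the 1.99.x client accepts a script path as an argument). A sketch; the ids are the ones from this walkthrough, not universal:

```shell
# Build a batch script with the job commands from the walkthrough.
# Ids (-f 1, -t 2, -j 1) are from this session; verify with
# "show link --all" / "show job --all" on your install first.
SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
show link --all
create job -f 1 -t 2
start job -j 1 -s
status job -j 1
EOF
# Real invocation (needs a running server):
#   /home/dba/sqoop2-1.99.5-cdh5.7.0/bin/sqoop.sh client "$SCRIPT"
LINES=$(wc -l < "$SCRIPT")
echo "batch script with $LINES commands"
```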


--------------------------After the table has been exported with the commands above, the data still needs to be loaded into Hive-----------------------------


Create the table in Hive (in the test database, to match the LOAD statements below):
create table t(id int, name string) row format delimited fields terminated by ',';

LOAD DATA INPATH '/user/dba/1c055391-9e57-4c0a-8461-ba57d8c5488c.txt' INTO TABLE test.t;
LOAD DATA INPATH '/user/dba/c36cafd7-98a3-41d3-b593-fcf84a9e8549.txt' INTO TABLE test.t;
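If this load is something you rerun, the statements can live in an HQL file driven by hive -f. A sketch; the table and file names are the ones from the example above, and the actual Sqoop output file names will differ on each run:

```shell
# Write the DDL + LOAD statements to a file for hive -f.
# File names are the Sqoop output files from the example above;
# list /user/dba with "hdfs dfs -ls" to get the real ones.
HQL=$(mktemp)
cat > "$HQL" <<'EOF'
CREATE TABLE IF NOT EXISTS test.t (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/dba/1c055391-9e57-4c0a-8461-ba57d8c5488c.txt' INTO TABLE test.t;
LOAD DATA INPATH '/user/dba/c36cafd7-98a3-41d3-b593-fcf84a9e8549.txt' INTO TABLE test.t;
EOF
# Real invocation (needs a Hive install):
#   hive -f "$HQL"
echo "wrote $(wc -l < "$HQL") lines of HQL"
```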


After the load, the corresponding files disappear from HDFS: LOAD DATA INPATH moves the files into the table's warehouse directory rather than copying them.


Sqoop1 failed to start on this Hadoop version, complaining that Hadoop classes could not be found, so I didn't spend more time adapting it. With Java open-source tooling, these jar conflicts, missing classes, and version-mismatch headaches are damn annoying.
