Installing Sqoop 1.4.6
Sqoop 2 is not recommended for production use, so Sqoop 1 is what is tested here.
--Download sqoop-1.4.6
http://mirror.bit.edu.cn/apache/sqoop/1.4.6/
sqoop-1.4.6.bin__hadoop-0.23.tar.gz         08-May-2015 16:28  16M
sqoop-1.4.6.bin__hadoop-1.0.0.tar.gz        08-May-2015 16:28  16M
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz  08-May-2015 16:28  16M
sqoop-1.4.6.tar.gz                          08-May-2015 16:28  2.1M
Choose the build that matches your Hadoop version; mine is hadoop-2.7.1, so the hadoop-2.0.4-alpha build is used.
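--For example, download into the hadoop user's home directory (URL built from the mirror listing above):
[hadoop@node1 ~]$ wget http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz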
[hadoop@node1 ~]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
--Set environment variables
[hadoop@node1 bin]$ sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export ZK_HOME=/home/hadoop/zookeeper-3.4.6
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
export HBASE_HOME=/home/hadoop/hbase-1.1.2
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HIVE_HOME}/bin:${HBASE_HOME}/bin:${SQOOP_HOME}/bin
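--Reload the profile so the new variables take effect in the current shell:
[hadoop@node1 bin]$ source /etc/profile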
[hadoop@node1 bin]$ sqoop help
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/11/24 13:44:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
--To silence these warnings, comment out the corresponding check blocks in sqoop/bin/configure-sqoop (search the script for "fail" to locate them); a sketch of those blocks follows the command list below.
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
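--For reference, the checks in bin/configure-sqoop look roughly like the following (reconstructed from the warnings above, not copied verbatim from the script); prefixing each line with # disables them:
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi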
--Set up the configuration files
[hadoop@node1 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@node1 conf]$ vi sqoop-env.sh
[hadoop@node1 conf]$ vi sqoop-site.xml
--Copy the required jars into Sqoop's lib directory
[hadoop@node1 ~]$ cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.1.jar $SQOOP_HOME/lib
[hadoop@node1 mysql-connector-java-5.1.37]$ cp mysql-connector-java-5.1.37-bin.jar $SQOOP_HOME/lib
Note: it is the mysql-connector-java-5.1.37-bin.jar that is needed here.
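--A quick sanity check that both jars are now in Sqoop's lib directory:
[hadoop@node1 ~]$ ls $SQOOP_HOME/lib | grep -E 'hadoop-common|mysql-connector'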
---------------------------------------------------------------
--Appendix: configuration files
---------------------------------------------------------------
[hadoop@node1 conf]$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.1/
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.1/
export HBASE_HOME=/home/hadoop/hbase-1.1.2
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
export ZOOCFGDIR=/home/hadoop/zookeeper-3.4.6/conf
--Quick connectivity test (requires the MySQL driver copied above):
sqoop list-databases --connect jdbc:mysql://node1 --username root --password 123456
[hadoop@node1 conf]$ vi sqoop-site.xml
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <!-- default: jdbc:hsqldb:file:/tmp/sqoop-meta/meta.db;shutdown=true -->
  <value>jdbc:mysql://node1/hive?useUnicode=true&amp;characterEncoding=utf-8</value>
  <description>The connect string to use when connecting to a
    job-management metastore. If unspecified, uses ~/.sqoop/.
    You can specify a different path here.
  </description>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.username</name>
  <value>root</value>
  <description>The username to bind to the metastore.</description>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.password</name>
  <value>123456</value>
  <description>The password to bind to the metastore.</description>
</property>
---------------------------------------------------------------
--Test:
---------------------------------------------------------------
[hadoop@node1 hadoop]$ sudo service mysqld start
Starting mysqld:                                           [  OK  ]
[hadoop@node1 hadoop]$ mysql -uroot -p123456
mysql> use hive;
mysql> create table a(id int,name varchar(50));
mysql> insert into a values(1,'a1');
mysql> insert into a values(2,'a2');
mysql> commit;
mysql> select * from a;
+------+------+
| id | name |
+------+------+
| 1 | a1 |
| 2 | a2 |
+------+------+
[hadoop@node1 lib]$ sqoop create-hive-table --connect jdbc:mysql://node1/hive --username root --password 123456 --table a --hive-table a --fields-terminated-by ',' --hive-overwrite
[hadoop@node1 conf]$ sqoop list-tables --connect jdbc:mysql://node1/hive --username root --password 123456
a
--Import the MySQL table into Hive; the comma delimiter matches the Hive table created above.
[hadoop@node1 lib]$ sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a --hive-table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 1
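--To verify the import, inspect the warehouse directory (this assumes the default warehouse path, the same one used by the export step below):
[hadoop@node1 lib]$ hadoop fs -cat /user/hive/warehouse/a/*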
mysql> create table b(id int,name varchar(50)); --create the target table first
Query OK, 0 rows affected (0.13 sec)
--Export (contents of MySQL table b after running the export command below):
mysql> select * from b;
+------+------+
| id | name |
+------+------+
| 1 | a1 |
| 2 | a2 |
+------+------+
2 rows in set (0.15 sec)
--Export the HDFS directory of Hive table a into MySQL table b
[hadoop@node1 lib]$ sqoop export --connect jdbc:mysql://node1/hive --username root --password 123456 --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ','
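--Re-running the export above would insert the same rows again; a hedged variant that updates existing rows by key instead (assumes id can serve as the update key in b):
[hadoop@node1 lib]$ sqoop export --connect jdbc:mysql://node1/hive --username root --password 123456 --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ',' --update-key id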
--sqoop eval connects to MySQL and runs SELECT or DML statements directly
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 --query 'select * from a'
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e 'select * from a'
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e "insert into a values (4,'a4')"
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 --query "insert into a values (5,'a5')"
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e "select * from a"
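--sqoop job saves an import definition so it can be listed, shown, re-run, and deleted later: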
sqoop job --create myjob -- import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_old
sqoop job --list
sqoop job --show myjob
sqoop job --exec myjob
sqoop job --exec myjob -- --username root -P
sqoop job --delete myjob
--sqoop codegen generates the Java class and jar for the table
[hadoop@node1 ~]$ sqoop codegen --connect jdbc:mysql://node1/hive --username root --password 123456 --table a
...
15/11/25 00:25:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/0fc68731200a4f397cac20ef4a4c718f/a.jar
[hadoop@node1 ~]$ ll /tmp/sqoop-hadoop/compile/0fc68731200a4f397cac20ef4a4c718f/
total 28
-rw-rw-r--. 1 hadoop hadoop  8715 Nov 25 00:25 a.class
-rw-rw-r--. 1 hadoop hadoop  3618 Nov 25 00:25 a.jar
-rw-rw-r--. 1 hadoop hadoop 10346 Nov 25 00:25 a.java
---------------------------------------------------------------
--Incremental import from MySQL into Hive:
---------------------------------------------------------------
Incremental import arguments:
   --check-column <column>       Source column to check for incremental
                                 change
   --incremental <import-type>   Define an incremental import of type
                                 'append' or 'lastmodified'
   --last-value <value>          Last imported value in the incremental
                                 check column
--Append mode is not supported for Hive imports:
Append mode for hive imports is not yet supported. Please remove the parameter --append-mode
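--Append mode does work when importing to a plain HDFS directory (no --hive-import); a rough example with an assumed target dir and last value:
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_append --incremental append --check-column id --last-value 3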
--1. Create the table in MySQL
drop table a;
create table a(id int,name varchar(50),crt_date timestamp);
insert into a values(1,'a1',sysdate());
insert into a values(2,'a2',sysdate());
insert into a values(3,'a3',sysdate());
select * from a;
mysql> select * from a;
+------+------+---------------------+
| id | name | crt_date |
+------+------+---------------------+
| 1 | a1 | 2015-11-25 12:41:39 |
| 2 | a2 | 2015-11-25 12:41:39 |
| 3 | a3 | 2015-11-25 12:41:39 |
+------+------+---------------------+
--2. First import from MySQL into /test/a_1 (do not create a_1 beforehand)
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_1
--3. Insert new rows
mysql> insert into a values(4,'a4',sysdate());
mysql> insert into a values(5,'a5',sysdate());
mysql> select * from a;
+------+------+---------------------+
| id | name | crt_date |
+------+------+---------------------+
| 1 | a1 | 2015-11-25 12:41:39 |
| 2 | a2 | 2015-11-25 12:41:39 |
| 3 | a3 | 2015-11-25 12:41:39 |
| 4 | a4 | 2015-11-25 13:46:42 |
| 5 | a5 | 2015-11-25 13:46:42 |
+------+------+---------------------+
--4. Second, incremental import
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_2 --incremental lastmodified --check-column crt_date --last-value "2015-11-25 12:41:40"
--This is equivalent to WHERE crt_date >= "2015-11-25 12:41:40"; the --last-value must be slightly later than the id=3 timestamp, otherwise the first 3 rows are imported again.
[hadoop@node1 ~]$ hadoop fs -cat /test/a_1/*
1,a1,2015-11-25 12:41:39.0
2,a2,2015-11-25 12:41:39.0
3,a3,2015-11-25 12:41:39.0
[hadoop@node1 ~]$ hadoop fs -cat /test/a_2/*
4,a4,2015-11-25 13:46:42.0
5,a5,2015-11-25 13:46:42.0
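--A possible refinement: wrapping the incremental import in a saved job lets Sqoop record the new --last-value in its metastore after every run, so it does not have to be tracked by hand (job name and target dir are assumptions):
sqoop job --create a_incr -- import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_incr --incremental lastmodified --check-column crt_date --last-value '2015-11-25 12:41:40'
sqoop job --exec a_incr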
--5. Generate a.jar with sqoop codegen (needed by the merge step)
sqoop codegen --connect jdbc:mysql://node1/hive --username root --password 123456 --table a
/tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/a.jar
[hadoop@node1 ~]$ ll /tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/
-rw-rw-r--. 1 hadoop hadoop 10321 Nov 25 14:31 a.class
-rw-rw-r--. 1 hadoop hadoop  4201 Nov 25 14:31 a.jar
-rw-rw-r--. 1 hadoop hadoop 12969 Nov 25 14:31 a.java
--6. Merge a_2 into a_1 (do not pre-create a_merge; --class-name a is the table/generated class name)
sqoop merge --new-data /test/a_2 --onto /test/a_1 --target-dir /test/a_merge --jar-file /tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/a.jar --class-name a --merge-key id
[hadoop@node1 ~]$ hadoop fs -ls /test/a_merge
-rw-r--r-- 3 hadoop supergroup 0 2015-11-25 15:57 /test/a_merge/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 135 2015-11-25 15:57 /test/a_merge/part-r-00000 --this file is moved out of here by the Hive load below
[hadoop@node1 6e3034f9fa9b0b46716ff31aee94c2e4]$ hadoop fs -cat /test/a_merge/part*
1,a1,2015-11-25 12:41:39.0
2,a2,2015-11-25 12:41:39.0
3,a3,2015-11-25 12:41:39.0
4,a4,2015-11-25 13:46:42.0
5,a5,2015-11-25 13:46:42.0
--7. Load into Hive
hive> create table a(id int,name string,crt_date string) row format delimited fields terminated by ',';
hive> load data inpath '/test/a_merge/part*' into table a;
hive> show create table a;
OK
CREATE TABLE `a`(
`id` int,
`name` string,
`crt_date` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://odscluster/user/hive/warehouse/a'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'totalSize'='135',
'transient_lastDdlTime'='1448437545')
Time taken: 0.485 seconds, Fetched: 17 row(s)
--8. Check the data files: the load moved them from the HDFS directory into the Hive warehouse
[hadoop@node1 ~]$ hadoop fs -ls /test/a_merge
-rw-r--r-- 3 hadoop supergroup 0 2015-11-25 15:57 /test/a_merge/_SUCCESS
[hadoop@node1 ~]$ hadoop fs -ls /user/hive/warehouse/a
-rwxr-xr-x 3 hadoop supergroup 135 2015-11-25 15:57 /user/hive/warehouse/a/part-r-00000
hive> select * from a;
OK
1 a1 2015-11-25 12:41:39.0
2 a2 2015-11-25 12:41:39.0
3 a3 2015-11-25 12:41:39.0
4 a4 2015-11-25 13:46:42.0
5 a5 2015-11-25 13:46:42.0