准备环境:
- Hadoop 完全分布式集群环境
- Sqoop 安装包:http://archive.apache.org/dist/sqoop/
复制 sqoop-env-template.sh
模板,并将模板重命名为 sqoop-env.sh
,在 sqoop-env.sh
中添加 Hdoop
、Hbase
、Hive
、HBase
的安装路径:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/src/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/src/hive
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/local/src/zookeeper/conf
返回顶部
配置 Linux 系统环境变量(/etc/profile
),添加 Sqoop 组件的路径:
# set sqoop environment
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
保存退出后 source
使其生效!
返回顶部
为了使 Sqoop 能够连接 MySQL 数据库,需要将 /../download/mysql-connector-java-5.1.46.jar
文件放入 sqoop 的 lib
目录中:
[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
执行 Sqoop 前需要先启动 Hadoop 集群,在 server
节点执行 start-all.sh
命令启动 Hadoop 集群,查看进程:
[root@server conf]# jps
5392 HMaster
1762 ResourceManager
4754 QuorumPeerMain
10226 Jps
1603 SecondaryNameNode
1412 NameNode
[root@agent1 src]# jps
1235 DataNode
1863 HRegionServer
1306 NodeManager
1563 QuorumPeerMain
2687 Jps
[root@agent2 src]# jps
1344 NodeManager
1570 QuorumPeerMain
1238 DataNode
2678 Jps
1868 HRegionServer
测试 Sqoop 是否能够正常连接 MySQL 数据库:
[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
[root@server download]# sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/02/25 19:35:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
22/02/25 19:35:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Feb 25 19:35:46 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys
能 够 查 看 到 MySQL 数 据 库 中 的 information_schema
、 hive
、 mysql
、performance_schema
、sys
等数据库,说明 Sqoop 可以正常连接 MySQL。
返回顶部
为了使 Sqoop 能够连接 Hive,需要将 hive 组件 /usr/local/src/hive/lib
目录下的 hive-common-2.3.9.jar
也放入 Sqoop 安装路径的 lib 目录中:
[root@server ~] cp /usr/local/src/hive/lib/hive-common-2.3.9.jar /usr/local/src/sqoop/lib/
MySQL中创建 test
数据库,在 test 中创建 student
表,在 student 表中插入了 3 条数据:
[root@server lib]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.7.22 MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database test;
Query OK, 1 row affected (0.00 sec)
mysql> use test;
Database changed
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.25 sec)
mysql> insert into student values('01','zhangsan');
Query OK, 1 row affected (0.00 sec)
mysql> insert into student values('02','lisi');
Query OK, 1 row affected (0.00 sec)
mysql> insert into student values('03','wangwu');
Query OK, 1 row affected (0.00 sec)
mysql> select * from student;
+--------+----------+
| number | name |
+--------+----------+
| 01 | zhangsan |
| 02 | lisi |
| 03 | wangwu |
+--------+----------+
3 rows in set (0.00 sec)
在 Hive 中创建 test 数据库和 student 数据表:
hive> create database test;
.................
OK
Time taken: 6.211 seconds
hive> use test;
OK
Time taken: 0.018 seconds
hive> create table student(number STRING, name STRING)
> row format delimited
> fields terminated by "|"
> stored as textfile;
OK
Time taken: 9.939 seconds
需要说明该命令的以下几个参数:
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://server:3306/test --username root --password 123456 --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database test --hive-table student
..........
22/02/25 20:02:10 INFO hive.HiveImport: OK
22/02/25 20:02:10 INFO hive.HiveImport: Time taken: 10.6 seconds
22/02/25 20:02:13 INFO hive.HiveImport: Loading data to table test.student
22/02/25 20:02:14 INFO hive.HiveImport: OK
22/02/25 20:02:14 INFO hive.HiveImport: Time taken: 3.846 seconds
22/02/25 20:02:15 INFO hive.HiveImport: Hive import complete.
22/02/25 20:02:15 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
查看Hive表中的数据:
hive> show tables;
OK
student
Time taken: 0.409 seconds, Fetched: 1 row(s)
hive> select * from student;
OK
01 zhangsan
02 lisi
03 wangwu
Time taken: 1.371 seconds, Fetched: 3 row(s)
返回顶部