【CentOS】安装 Sqoop 组件


文章目录

  • 1、解压 Sqoop 安装包
  • 2、配置 sqoop-env.sh
  • 3、配置系统环境变量
  • 4、sqoop 连接 MySQL
  • 5、sqoop 连接 MySQL 导入到 Hive
    • 5.1 准备
    • 5.2 MySQL创建模拟数据表(数据源)
    • 5.3 Hive 创建数据表(接收)
    • 5.3 MySQL 导出到 Hive


准备环境:

  • Hadoop 完全分布式集群环境
  • Sqoop 安装包:http://archive.apache.org/dist/sqoop/

1、解压 Sqoop 安装包

上传本地安装包:
【CentOS】安装 Sqoop 组件_第1张图片
解压后并重命名:
【CentOS】安装 Sqoop 组件_第2张图片
返回顶部


2、配置 sqoop-env.sh

复制 sqoop-env-template.sh 模板,并将模板重命名为 sqoop-env.sh,在 sqoop-env.sh 中添加 HdoopHbaseHiveHBase的安装路径:
【CentOS】安装 Sqoop 组件_第3张图片

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/src/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/src/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/src/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/local/src/zookeeper/conf

返回顶部


3、配置系统环境变量

配置 Linux 系统环境变量(/etc/profile),添加 Sqoop 组件的路径:

# set sqoop environment
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib

保存退出后 source 使其生效!

返回顶部


4、sqoop 连接 MySQL

为了使 Sqoop 能够连接 MySQL 数据库,需要将 /../download/mysql-connector-java-5.1.46.jar 文件放入 sqoop 的 lib 目录中:

[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/

执行 Sqoop 前需要先启动 Hadoop 集群,在 server 节点执行 start-all.sh 命令启动 Hadoop 集群,查看进程:

[root@server conf]# jps
5392 HMaster
1762 ResourceManager
4754 QuorumPeerMain
10226 Jps
1603 SecondaryNameNode
1412 NameNode

[root@agent1 src]# jps
1235 DataNode
1863 HRegionServer
1306 NodeManager
1563 QuorumPeerMain
2687 Jps

[root@agent2 src]# jps
1344 NodeManager
1570 QuorumPeerMain
1238 DataNode
2678 Jps
1868 HRegionServer

测试 Sqoop 是否能够正常连接 MySQL 数据库:

[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
[root@server download]# sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/02/25 19:35:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
22/02/25 19:35:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Fri Feb 25 19:35:46 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys

能 够 查 看 到 MySQL 数 据 库 中 的 information_schemahivemysqlperformance_schemasys 等数据库,说明 Sqoop 可以正常连接 MySQL。

返回顶部


5、sqoop 连接 MySQL 导入到 Hive

5.1 准备

为了使 Sqoop 能够连接 Hive,需要将 hive 组件 /usr/local/src/hive/lib 目录下的 hive-common-2.3.9.jar 也放入 Sqoop 安装路径的 lib 目录中:

[root@server ~] cp /usr/local/src/hive/lib/hive-common-2.3.9.jar /usr/local/src/sqoop/lib/

5.2 MySQL创建模拟数据表(数据源)

MySQL中创建 test 数据库,在 test 中创建 student 表,在 student 表中插入了 3 条数据:

[root@server lib]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.7.22 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test;
Database changed
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.25 sec)

mysql> insert into student values('01','zhangsan');
Query OK, 1 row affected (0.00 sec)

mysql> insert into student values('02','lisi');
Query OK, 1 row affected (0.00 sec)

mysql> insert into student values('03','wangwu');
Query OK, 1 row affected (0.00 sec)

mysql> select * from student;
+--------+----------+
| number | name     |
+--------+----------+
| 01     | zhangsan |
| 02     | lisi     |
| 03     | wangwu   |
+--------+----------+
3 rows in set (0.00 sec)

5.3 Hive 创建数据表(接收)

在 Hive 中创建 test 数据库和 student 数据表:

hive> create database test;
.................
OK
Time taken: 6.211 seconds

hive> use test;
OK
Time taken: 0.018 seconds

hive>  create table student(number STRING, name STRING) 
    > row format delimited 
    > fields terminated by "|"
    > stored as textfile; 
OK
Time taken: 9.939 seconds

5.3 MySQL 导出到 Hive

需要说明该命令的以下几个参数:

  • –connectMySQL 数据库连接 URL
  • –username & –passwordMySQL 数据库的用户名和密码
  • –table导出的数据表名
  • –fields-terminated-byHive 中字段分隔符
  • –delete-target-dir删除导出目的目录
  • –num-mappersHadoop 执行 Sqoop 导入导出启动的 map 任务数
  • –hive-import --hive-database导出到 Hive 的数据库名
  • –hive-table导出到 Hive 的表名
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://server:3306/test --username root --password 123456 --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database test --hive-table student
..........
22/02/25 20:02:10 INFO hive.HiveImport: OK
22/02/25 20:02:10 INFO hive.HiveImport: Time taken: 10.6 seconds
22/02/25 20:02:13 INFO hive.HiveImport: Loading data to table test.student
22/02/25 20:02:14 INFO hive.HiveImport: OK
22/02/25 20:02:14 INFO hive.HiveImport: Time taken: 3.846 seconds
22/02/25 20:02:15 INFO hive.HiveImport: Hive import complete.
22/02/25 20:02:15 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

查看Hive表中的数据:

hive> show tables;
OK
student
Time taken: 0.409 seconds, Fetched: 1 row(s)
hive> select * from student;
OK
01	zhangsan
02	lisi
03	wangwu
Time taken: 1.371 seconds, Fetched: 3 row(s)

返回顶部


你可能感兴趣的:(#,CentOS,centos,sqoop,hbase)