After an afternoon of effort, I finally got Sqoop working to move data (tables) back and forth between Hive and MySQL.
Prerequisites: a Hadoop platform with ZooKeeper, HBase, and Hive already installed. (See other resources for installation.)
Versions used in this demo: jdk1.8.0_171, hadoop-2.7.3, zookeeper-3.4.9, mysql-5.6.40-linux-glibc2.12-x86_64, hbase-1.2.4, apache-hive-2.1.1-bin
1. Install and deploy Sqoop
Sqoop comes in two flavors, Sqoop1 and Sqoop2.
For the differences, see this article: http://doc.yonyoucloud.com/doc/ae/921008.html
I used Sqoop1: sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz (Sqoop2 reportedly does not support transfers between Hive and MySQL, so I chose Sqoop1.)
You also need this jar: mysql-connector-java-5.1.45-bin.jar (it is required).
Extract the Sqoop .gz archive; the steps are not repeated here.
Copy the MySQL connector jar into the lib directory of your Sqoop installation ($SQOOP_HOME/lib).
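These two steps can be sketched with a small helper; the paths are the ones used in this post, and the function name is mine, not a standard tool:

```shell
# Sketch of the driver-install step: copy the MySQL JDBC jar into Sqoop's
# lib directory so Sqoop can load it at runtime.
install_mysql_driver() {
  jar="$1"          # path to mysql-connector-java-5.1.45-bin.jar
  sqoop_home="$2"   # e.g. /application/software/sqoop-1.4.7.bin__hadoop-2.6.0
  cp "$jar" "$sqoop_home/lib/"
}
# Usage (after: tar -xzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /application/software):
# install_mysql_driver mysql-connector-java-5.1.45-bin.jar "$SQOOP_HOME"
```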
2. Configure Sqoop
Sqoop environment variables:
[root@master software]# more ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
export JAVA_HOME=/application/software/jdk1.8.0_171
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin
export HADOOP_HOME=/application/software/hadoop-2.7.3
export HBASE_HOME=/application/software/hbase-1.2.4
export MAVEN_HOME=/application/software/apache-maven-3.5.4
export SQOOP_HOME=/application/software/sqoop-1.4.7.bin__hadoop-2.6.0
export ZOOKEEPER_HOME=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf # important: omitting this causes an error later
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$HIVE_HOME/lib/*:$HADOOP_HOME/lib/* # likewise required; omitting it causes an error
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$MAVEN_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin
[root@master software]#
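After reloading the profile (source ~/.bash_profile), a quick sanity check catches path typos before they surface later as confusing Sqoop errors. This is just a sketch, not part of the official setup:

```shell
# Warn if an environment variable does not point at an existing directory.
check_home() {
  name="$1"; value="$2"
  [ -d "$value" ] || echo "WARNING: $name=$value is not a directory"
}
check_home JAVA_HOME "$JAVA_HOME"
check_home SQOOP_HOME "$SQOOP_HOME"
check_home HIVE_CONF_DIR "$HIVE_CONF_DIR"
```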
sqoop-env.sh configuration:
[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh
[root@master conf]# more sqoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/application/software/hadoop-2.7.3
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/application/software/hadoop-2.7.3
#set the path to where bin/hbase is available
export HBASE_HOME=/application/software/hbase-1.2.4
#Set the path to where bin/hive is available
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf
[root@master conf]#
I left sqoop-site.xml unchanged.
3. With the environment in place, the experiments can begin.
1) First: export data from Hive to MySQL.
The table in Hive:
hive> show tables;
In MySQL, create a table with the same structure:
mysql> create table sogoutest1(
-> rid varchar(50),uid varchar(50),price int);
Query OK, 0 rows affected (0.02 sec)
mysql>
mysql> desc sogoutest1 ;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| rid | varchar(50) | YES | | NULL | |
| uid | varchar(50) | YES | | NULL | |
| price | int(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
Run the export:
[root@master conf]# sqoop export --connect jdbc:mysql://master:3306/test --username root --password root123 --table sogoutest1 --export-dir '/Hive/sogoutest' --fields-terminated-by '\t' -m 1
--table sogoutest1 # the table you created in MySQL
--export-dir # the HDFS directory backing your Hive table
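To make each flag's role explicit, the same export can be assembled from named parts. This is a dry-run sketch that only prints the command; drop the echo to execute it:

```shell
# Dry-run sketch of the export above; values are the ones used in this post.
MYSQL_URL="jdbc:mysql://master:3306/test"
MYSQL_TABLE="sogoutest1"       # target table, created in MySQL beforehand
EXPORT_DIR="/Hive/sogoutest"   # HDFS directory backing the Hive table
CMD="sqoop export --connect $MYSQL_URL --username root --password root123 \
--table $MYSQL_TABLE --export-dir $EXPORT_DIR --fields-terminated-by '\t' -m 1"
echo "$CMD"
```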
You can find that directory with:
hive> show create table sogoutest;
Results (the data is now queryable in MySQL):
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sogoutest1 |
| stu_mysql |
+----------------+
2 rows in set (0.00 sec)
mysql> select * from sogoutest1;
+------------+----------+-------+
| rid | uid | price |
+------------+----------+-------+
| 0000000005 | 00000008 | 120 |
| 0000000004 | 00000002 | 429 |
| 0000000003 | 00000004 | 347 |
| 0000000002 | 00000004 | 697 |
| 0000000001 | 00000001 | 252 |
| 0000000000 | 00000001 | 625 |
+------------+----------+-------+
6 rows in set (0.00 sec)
mysql>
2) Next: import data from MySQL into Hive.
The table in MySQL:
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sogoutest1 |
| stu_mysql |
+----------------+
2 rows in set (0.00 sec)
mysql> desc stu_mysql ;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| stu_id | int(11) | YES | | NULL | |
| name | varchar(15) | YES | | NULL | |
| sex | varchar(8) | YES | | NULL | |
| score | int(11) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> select * from stu_mysql;
+--------+------------+------+-------+
| stu_id | name | sex | score |
+--------+------------+------+-------+
| 1 | mafeng | male | 899 |
| 2 | mazhuoxing | male | 1000 |
| 3 | qizhuoxing | male | 1000 |
+--------+------------+------+-------+
3 rows in set (0.00 sec)
mysql>
There is no need to create the table in Hive first; Sqoop creates it for you.
Run the import directly (this is a full-table import; you can also import only selected columns):
[root@master conf]# sqoop import --connect jdbc:mysql://master:3306/test --username root --password root123 --table stu_mysql --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite --create-hive-table --hive-table stu3 -m 1
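As noted above, this imports the whole table; Sqoop's --columns flag restricts the import to selected columns. A dry-run sketch that only prints the command (the Hive table name stu_subset is made up for illustration):

```shell
# Dry-run sketch of a selective-column import: only stu_id and name are
# pulled from MySQL. stu_subset is a hypothetical target Hive table name.
CMD="sqoop import --connect jdbc:mysql://master:3306/test \
--username root --password root123 \
--table stu_mysql --columns stu_id,name \
--fields-terminated-by '\t' --hive-import --create-hive-table \
--hive-table stu_subset -m 1"
echo "$CMD"
```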
Results:
hive> show tables;
OK
blog_hive_hbase
brand_dimension
record
record_orc
record_parquet
record_partition
sogoutest
stocks
stu
stu2
stu3
stu_hive
user_dimension
Time taken: 0.037 seconds, Fetched: 13 row(s)
hive> desc stu3;
OK
stu_id int
name string
sex string
score int
Time taken: 0.07 seconds, Fetched: 4 row(s)
hive> select * from stu3;
OK
1 mafeng male 899
2 mazhuoxing male 1000
3 qizhuoxing male 1000
Time taken: 0.117 seconds, Fetched: 3 row(s)
hive>
4. Pitfalls encountered:
1) Error: Caused by: java.lang.RuntimeException: java.sql.SQLException: null, message from server: "Host 'datanode1' is not allowed to connect to this MySQL server"
Fix: change the MySQL grants.
For example, to let myuser connect to the MySQL server from any host with the password mypassword:
mysql> GRANT ALL PRIVILEGES ON *.* TO 'myuser'@'%' IDENTIFIED BY 'mypassword' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
2) Error:
18/07/05 12:03:55 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
Fix: set HIVE_CONF_DIR to Hive's conf directory and add Hive's lib jars to HADOOP_CLASSPATH, i.e. the two export lines flagged as important in the ~/.bash_profile shown in section 2.
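Concretely, the two lines in question, from the ~/.bash_profile in section 2 (adjust the paths to your own Hive/HBase/Hadoop install locations):

```shell
# Point Sqoop at Hive's configuration and put the Hive/HBase/Hadoop jars on
# the Hadoop classpath so HiveConf can be loaded.
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$HIVE_HOME/lib/*:$HADOOP_HOME/lib/*
```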