搭建Sqoop, Hive和Mysql之间数据导入

经过一下午的努力,终于实现了Sqoop,在Hive和Mysql之间互相导入数据(表)

前期准备:Hapood平台、Zookeeper、Hbase、Hive都安装好。(安装可以参考其它资源)
本Demo版本:jdk1.8.0_171,hadoop-2.7.3,zookeeper-3.4.9,mysql-5.6.40-linux-glibc2.12-x86_64,hbase-1.2.4,apache-hive-2.1.1-bin

1.安装部署Sqoop

  Sqoop有Sqoop1 和Sqoop2之区。

可以参阅此文章:http://doc.yonyoucloud.com/doc/ae/921008.html

 我使用的是Sqoop1:sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz (有人说Sqoop2不支持Hive与Mysql之间,因此选择Sqoog1)

还有这个Jar包:mysql-connector-java-5.1.45-bin.jar(这个是必须的)

对Sqoop的。gz文件进行释放,这里不在赘述。

把Mysql-connector的包拷贝到你安装的Sqoop的 lib目录下。($SQOOP_HOME/lib)

2. Sqoop配置

Sqoop环境变量配置:


[root@master software]# more ~/.bash_profile
# .bash_profile
 
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi
 
# User specific environment and startup programs
 
PATH=$PATH:$HOME/bin
 
export PATH
export JAVA_HOME=/application/software/jdk1.8.0_171
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin
export HADOOP_HOME=/application/software/hadoop-2.7.3
export HBASE_HOME=/application/software/hbase-1.2.4
export MAVEN_HOME=/application/software/apache-maven-3.5.4
export SQOOP_HOME=/application/software/sqoop-1.4.7.bin__hadoop-2.6.0
export ZOOKEEPER_HOME=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf          #这行比较重要,不配置后边报错
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$HIVE_HOME/lib/*:$HADOOP_HOME/lib/*  #这行也是,不配置报错。

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$MAVEN_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$HIVE_H
OME/bin
[root@master software]#

Sqoop-env.sh配置:

[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh

[root@master conf]# more sqoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
 
# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*
 
# Set Hadoop-specific environment variables here.
 
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/application/software/hadoop-2.7.3
 
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/application/software/hadoop-2.7.3
 
#set the path to where bin/hbase is available
export HBASE_HOME=/application/software/hbase-1.2.4
 
#Set the path to where bin/hive is available
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin
 
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf
[root@master conf]#

注意:根据实际路径配置HADOOP_COMMON_HOME,HADOOP_MAPRED_HOME,HIVE_HOME

Sqoop-site.xml 我未作任何修改。

3. 环境已具备,可以做实验了。

1)首先是从Hive导出数据到Mysql

Hive里的表内容:

hive> show tables> ;
OK
blog_hive_hbase
brand_dimension
record
record_orc
record_parquet
record_partition
sogoutest
stocks
stu
stu2
stu3
stu_hive
user_dimension
Time taken: 0.031 seconds, Fetched: 13 row(s)
hive> desc sogoutest;
OK
rid                     string                                     
uid                     string                                     
price                   int                                        
Time taken: 0.058 seconds, Fetched: 3 row(s)
hive> select * from sogoutest;
OK
0000000005      00000008        120
0000000004      00000002        429
0000000003      00000004        347
0000000002      00000004        697
0000000001      00000001        252
0000000000      00000001        625
Time taken: 0.102 seconds, Fetched: 6 row(s)
hive>

在Mysql要创建同样的表结构。

mysql> create table sogoutest1(
    -> rid varchar(50),uid varchar(50),price int);
Query OK, 0 rows affected (0.02 sec)
 
mysql>

mysql> desc sogoutest1 ;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| rid   | varchar(50) | YES  |     | NULL    |       |
| uid   | varchar(50) | YES  |     | NULL    |       |
| price | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

执行操作:

[root@master conf]# sqoop export --connect jdbc:mysql://master:3306/test --username root --password root123 --table sogoutest1 --export-dir '/Hive/sogoutest' --fields-terminated-by '\t' -m 1

--table sogoutest1 #你在mysql里创建的表

--export-dir #是你在Hive里的表

可以通过这个命令查看:

hive> show create table sogoutest;
OK
CREATE TABLE `sogoutest`(
  `rid` string,
  `uid` string,
  `price` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://master:9000/Hive/sogoutest'                                               #这个位置
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'numFiles'='1',
  'numRows'='6',
  'rawDataSize'='138',
  'totalSize'='144',
  'transient_lastDdlTime'='1530754256')
Time taken: 0.069 seconds, Fetched: 22 row(s)
hive>


结果展示(在Mysql里可以查到数据):

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sogoutest1     |
| stu_mysql      |
+----------------+
2 rows in set (0.00 sec)
 
mysql> select * from sogoutest1;
+------------+----------+-------+
| rid        | uid      | price |
+------------+----------+-------+
| 0000000005 | 00000008 |   120 |
| 0000000004 | 00000002 |   429 |
| 0000000003 | 00000004 |   347 |
| 0000000002 | 00000004 |   697 |
| 0000000001 | 00000001 |   252 |
| 0000000000 | 00000001 |   625 |
+------------+----------+-------+
6 rows in set (0.00 sec)
 
mysql>

2) 其次是从Mysql导出数据到Hive。

Mysql的表内容:

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| sogoutest1     |
| stu_mysql      |
+----------------+
2 rows in set (0.00 sec)
 
mysql> desc stu_mysql ;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| stu_id | int(11)     | YES  |     | NULL    |       |
| name   | varchar(15) | YES  |     | NULL    |       |
| sex    | varchar(8)  | YES  |     | NULL    |       |
| score  | int(11)     | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
 
mysql> select * from stu_mysql;
+--------+------------+------+-------+
| stu_id | name       | sex  | score |
+--------+------------+------+-------+
|      1 | mafeng     | male |   899 |
|      2 | mazhuoxing | male |  1000 |
|      3 | qizhuoxing | male |  1000 |
+--------+------------+------+-------+
3 rows in set (0.00 sec)
 
mysql>

不需要在Hive里创建表。通过Sqoop可以在Hive里创建。

直接执行操作:(这是全表导入,也可以选择几列进行导入)

[root@master conf]# sqoop import --connect jdbc:mysql://master:3306/test --username root --password root123 --table stu_mysql --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite --create-hive-table --hive-table stu3 -m 1

结果展示:

hive> show tables;
OK
blog_hive_hbase
brand_dimension
record
record_orc
record_parquet
record_partition
sogoutest
stocks
stu
stu2
stu3
stu_hive
user_dimension
Time taken: 0.037 seconds, Fetched: 13 row(s)
hive> desc stu3;
OK
stu_id                  int                                        
name                    string                                     
sex                     string                                     
score                   int                                        
Time taken: 0.07 seconds, Fetched: 4 row(s)
hive> select * from stu3;
OK
1       mafeng  male    899
2       mazhuoxing      male    1000
3       qizhuoxing      male    1000
Time taken: 0.117 seconds, Fetched: 3 row(s)
hive>

4. 遇到的坑:

1)报错:Caused by: java.lang.RuntimeException: java.sql.SQLException: null,  message from server: "Host 'datanode1' is not allowed to connect to this MySQL server"

解决方案:改变Mysql授权。

例如,你想myuser使用mypassword从任何主机连接到mysql服务器的话。

Mysql>GRANT ALL PRIVILEGES ON *.* TO 'myuser'@'%' IDENTIFIED BY 'mypassword' WITH GRANT OPTION;

Mysql>FLUSH   PRIVILEGES;

2)报错:

18/07/05 12:03:55 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.

解决方案:
在~/.bash_profile里增加:
export JAVA_HOME=/application/software/jdk1.8.0_171
export HIVE_HOME=/application/software/apache-hive-2.1.1-bin
export HADOOP_HOME=/application/software/hadoop-2.7.3
export HBASE_HOME=/application/software/hbase-1.2.4
export MAVEN_HOME=/application/software/apache-maven-3.5.4
export SQOOP_HOME=/application/software/sqoop-1.4.7.bin__hadoop-2.6.0
export ZOOKEEPER_HOME=/application/software/zookeeper-3.4.9
export HIVE_CONF_DIR=/application/software/apache-hive-2.1.1-bin/conf
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$HIVE_HOME/lib/*:$HADOOP_HOME/lib/*




你可能感兴趣的:(Bigdata,Hbase,HIve)