Guiding questions:
1. Which configuration file must be modified for Sqoop to transfer data between Hadoop and a relational database?
2. Into which directory must the JDBC driver jar for that relational database be copied?
1. Introduction to Sqoop 1.4.4
Sqoop is a tool for transferring data between Hadoop and relational databases. With Sqoop we can import data from a relational database (such as MySQL or Oracle) into HDFS (the Hadoop Distributed File System), using Hadoop's MapReduce parallel computing framework to perform the transfer, and we can likewise export data from HDFS back into a relational database.
Sqoop 2.x is now available and improves on Sqoop 1.x in areas such as security and concurrency, but its feature set is still limited, so here we stick with Sqoop 1.x to learn how to import and export data between relational databases and Hadoop. The following walks through installing Sqoop 1.4.4 on a Hadoop 2.2.0 cluster.
2. Downloading and extracting the Sqoop 1.4.4 package
Download the Sqoop 1.4.4 package sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz from the Sqoop website and extract it into a directory of your choice:
[hadoopUser@secondmgt sqoop1.0]$ ls
sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
[hadoopUser@secondmgt sqoop1.0]$ tar -zxvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
3. Configuring environment variables
Configure the Sqoop 1.4.4 environment variables in .bashrc in the user's home directory so that Sqoop commands can be run from anywhere:
#Sqoop1.4.4 Configure
export SQOOP_HOME=/home/hadoopUser/cloud/sqoop1.0/sqoop-1.4.4.bin__hadoop-2.0.4-alpha
export PATH=$PATH:$SQOOP_HOME/bin

4. Verifying the environment variables
[hadoopUser@secondmgt ~]$ sqoop help
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

5. Configuring sqoop-env.sh
Copy sqoop-env-template.sh to a new file named sqoop-env.sh and edit its contents:
# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=
export HADOOP_COMMON_HOME=/home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0

#Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=/home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/share/hadoop/mapreduce

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
#export HIVE_HOME=

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
HADOOP_COMMON_HOME: the Hadoop installation root directory.
HADOOP_MAPRED_HOME: the MapReduce directory.
HBASE_HOME: the HBase installation root directory (not needed here, may be left unset).
HIVE_HOME: the Hive installation root directory (not needed here, may be left unset).
6. Copying the MySQL JDBC jar into lib
[hadoopUser@secondmgt lib]$ ls
ant-contrib-1.0b3.jar       avro-ipc-1.5.3.jar     hsqldb-1.8.0.10.jar           jopt-simple-3.2.jar                  snappy-java-1.0.3.2.jar
ant-eclipse-1.0-jvm1.2.jar  avro-mapred-1.5.3.jar  jackson-core-asl-1.7.3.jar    mysql-connector-java-5.1.18-bin.jar
avro-1.5.3.jar              commons-io-1.4.jar     jackson-mapper-asl-1.7.3.jar  paranamer-2.3.jar
We are using a MySQL database here; if you are using Oracle, copy the corresponding Oracle JDBC jar instead.
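The copy step itself is a single cp into $SQOOP_HOME/lib. A minimal sketch, using a temporary directory to stand in for the real Sqoop install path and an empty file to stand in for the driver jar (both are placeholders for this demo; on a real cluster use your actual $SQOOP_HOME and the connector jar downloaded from MySQL's site):

```shell
# Placeholder install tree: a temp dir standing in for the real $SQOOP_HOME.
SQOOP_HOME="$(mktemp -d)/sqoop-1.4.4.bin__hadoop-2.0.4-alpha"
mkdir -p "$SQOOP_HOME/lib"

# Stand-in for the real MySQL Connector/J jar.
cd "$(dirname "$SQOOP_HOME")"
touch mysql-connector-java-5.1.18-bin.jar

# The actual step: drop the JDBC driver into Sqoop's lib directory.
cp mysql-connector-java-5.1.18-bin.jar "$SQOOP_HOME/lib/"

# Confirm it landed next to the bundled jars.
ls "$SQOOP_HOME/lib"
```

Sqoop picks up every jar in $SQOOP_HOME/lib at startup, which is why no further registration of the driver is needed.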
7. Startup test and verification
Sqoop 1.x has no server process to start; we can run a single command to check that everything is configured correctly:
[hadoopUser@secondmgt ~]$ sqoop list-databases --connect jdbc:mysql://secondmgt:3306/ --password hive --username hive
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/01/17 20:07:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/17 20:07:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
goodseval
hive
mysql
spice
sqoopdb
test
Here --connect jdbc:mysql://secondmgt:3306/ --password hive --username hive are the MySQL connection options.
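Note the warning in the output above: putting the password on the command line is insecure. A sketch of the safer invocations, using the same host secondmgt and hive credentials from this setup (-P is the flag the warning itself recommends; --password-file is also documented for Sqoop 1.4.4, but verify it against your release's user guide):

```shell
# Prompt for the password interactively instead of passing it as an argument:
sqoop list-databases --connect jdbc:mysql://secondmgt:3306/ --username hive -P

# Or read it from a file with restricted permissions:
echo -n 'hive' > "$HOME/.mysql.password"
chmod 400 "$HOME/.mysql.password"
sqoop list-databases --connect jdbc:mysql://secondmgt:3306/ --username hive \
    --password-file "file://$HOME/.mysql.password"
```

Either way the password no longer appears in shell history or in the process list.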
For comparison, the databases in my MySQL instance are:
[hadoopUser@secondmgt ~]$ mysql -uhive -phive
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 410
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| goodseval          |
| hive               |
| mysql              |
| spice              |
| sqoopdb            |
| test               |
+--------------------+
7 rows in set (0.00 sec)

This matches the list returned by Sqoop, so Sqoop is installed successfully.
Recommended reading:
Next article: Using Sqoop 1.4.4 to import data from a MySQL table into HDFS