sqoop1.4.7的安装及使用(hadoop2.7环境)

一、sqoop简介

Sqoop是一款开源的工具,主要用于在Hadoop(Hive)与传统的数据库(mysql、postgresql…)间进行数据的传递,可以将一个关系型数据库(例如 : MySQL ,Oracle ,Postgres等)中的数据导进到Hadoop的HDFS中,也可以将HDFS的数据导进到关系型数据库中。
sqoop1.4.7的安装及使用(hadoop2.7环境)_第1张图片
说明:本测试hadoop是单机伪分布式环境,如果读者想要学习如何搭建伪分布式hadoop环境请看本人这篇文章([Hadoop+Hive+HBase+Kylin 伪分布式安装指南(https://blog.csdn.net/qq_28356739/article/details/88569175)),本文章只介绍如何安装部署sqoop,后续通过sqoop将Oracle数据导入到hadoop环境实验在另外一篇文章,待测试完将会把连接放到此文章。

二、环境配置

sqoop1.4.7的安装及使用(hadoop2.7环境)_第2张图片

三、安装Sqoop

1. 下载,解压到指定目录

下载连接:

点此下载
创建安装目录,通过xshell上传安装包

[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir sqoop
[root@hadoop hadoop]# cd sqoop/
[root@hadoop sqoop]# ls
sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@hadoop sqoop]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 
[root@hadoop sqoop]# ls
sqoop-1.4.7.bin__hadoop-2.6.0  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@hadoop sqoop]# rm -rf *gz
[root@hadoop sqoop]# mv sqoop-1.4.7.bin__hadoop-2.6.0/* .

2、修改配置文件sqoop-env.sh

在sqoop/conf目录下有一个文件sqoop-env-template.sh,把它复制为sqoop-env.sh并修改

[root@hadoop sqoop]# cd conf/
[root@hadoop conf]# cp sqoop-env-template.sh sqoop-env.sh
#Set path to where bin/hadoop is available
[root@hadoop conf]# vim sqoop-env.sh 
根据自己情况修改,另外,你还装了Zookeeper的话则最后一句也要配置。
export HADOOP_COMMON_HOME=/hadoop/

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/hadoop/

#set the path to where bin/hbase is available
export HBASE_HOME=/hadoop/hbase/

#Set the path to where bin/hive is available
export HIVE_HOME=/hadoop/hive

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=                                 

3. 配置环境变量

我测试用户为root用户,直接修改/etc/profile加入下面内容:

export SQOOP_HOME=/hadoop/sqoop
export PATH=$PATH:${SQOOP_HOME}/bin
export CLASSPATH=$CLASSPATH:${SQOOP_HOME}/lib

然后使环境变量生效

[root@hadoop conf]# source /etc/profile

4. 复制相关依赖包到$SQOOP_HOME/lib

因为我是将Oracle数据导入到hive,所以复制环境数据库所在虚拟机(195.168.1.6)的Oracle的OJDBC包到/hadoop/sqoop/lib下

[oracle@source ~]$ cd $ORACLE_HOME/jdbc/lib
[oracle@source lib]$ pwd
/u01/app/oracle/product/11.2.0/db_1/jdbc/lib
[oracle@source lib]$ ls
ojdbc5dms_g.jar  ojdbc5_g.jar  ojdbc6dms_g.jar  ojdbc6_g.jar  simplefan.jar
ojdbc5dms.jar    ojdbc5.jar    ojdbc6dms.jar    ojdbc6.jar
上面是数据库所在虚拟机Oraclejar包位置及信息。将ojdbc包传到hadoop虚拟机
[oracle@source lib]$ scp ojdbc6.jar [email protected]:/hadoop/sqoop/lib
[email protected]'s password: 
ojdbc6.jar                                                           100% 2675KB   2.6MB/s   00:00    

5、修改$SQOOP_HOME/bin/configure-sqoop

注释掉HCatalog,Accumulo检查(除非你准备使用HCatalog,Accumulo等HADOOP上的组件)

##Moved to be a runtime check in sqoop.
#if[ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does notexist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root ofyour HCatalog installation.'
#fi

#if[ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does notexist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the rootof your Accumulo installation.'
#fi

#Add HCatalog to dependency list
#if[ -e "${HCAT_HOME}/bin/hcat" ]; then
# TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat-classpath`
#  if [ -z "${HIVE_CONF_DIR}" ]; then
#   TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
#  fi
#  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi
 
#Add Accumulo to dependency list
#if[ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
#  for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*accumulo.*jar |cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#  for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*zookeeper.*jar |cut -d':' -f2`; do
#    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
#  done
#fi

6、 测试与Oracle的连接

[root@hadoop sqoop]# pwd
/hadoop/sqoop
[root@hadoop sqoop]# sqoop list-databases --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger
Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 14:25:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 14:25:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consi
der using -P instead.19/03/18 14:25:57 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/18 14:25:57 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/18 14:25:58 INFO manager.OracleManager: Time zone has been set to GMT
SYS
SYSTEM
SCOTT
TEST
ADMRG
OGG
OUTLN
MGMT_VIEW
FLOWS_FILES
MDSYS
ORDSYS
EXFSYS
DBSNMP
WMSYS
APPQOSSYS
APEX_030200
OWBSYS_AUDIT
ORDDATA
CTXSYS
ANONYMOUS
SYSMAN
XDB
ORDPLUGINS
OWBSYS
SI_INFORMTN_SCHEMA
OLAPSYS
ORACLE_OCM
XS$NULL
BI
PM
MDDATA
IX
SH
DIP
OE
APEX_PUBLIC_USER
HR
SPATIAL_CSW_ADMIN_USR
SPATIAL_WFS_ADMIN_USR

你可能感兴趣的:(Hadoop)