6.1 Installing and Using Sqoop on CentOS 7



Contents

  • 1. Introduction to Sqoop
  • 2. How Sqoop Works
  • 3. Installing Sqoop
    • Preparing the installation package
    • Extracting and installing
  • 4. Using Sqoop
    • 1. MySQL -> HDFS
    • 2. Creating a job
    • 3. Importing data into Hive
    • 4. Importing data into HBase
      • 1. Create the table in HBase
      • 2. Import with sqoop
    • 5. Exporting data from HDFS to MySQL
      • 1. Create the table in MySQL
      • 2. Upload the data
      • 3. Export the data
    • 6. Writing a sqoop options script
      • 1. Write the script job_01.opt
      • 2. Run the script

1. Introduction to Sqoop

Purpose: Sqoop is a data-exchange tool that transfers data back and forth between MySQL/Oracle and HDFS.
How it works: the sqoop command you write is translated into a MapReduce job, and that MapReduce job connects to the source and target systems to move the data.
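For example, a single command like the following is compiled by Sqoop into a MapReduce job that lists the databases on the server. It is shown here only to illustrate the command style; it assumes a local MySQL on port 3306 with the root/root credentials used throughout this article:

# list the databases visible to the given MySQL user
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password root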

2. How Sqoop Works

[Figure 1: how Sqoop works]

3. Installing Sqoop

Preparing the installation package

Sqoop download link
Extraction code: 4tkp

Extracting and installing

Drag the installation package into the /software directory, then extract it to /opt:

tar -zxvf sqoop-1.4.6-cdh5.14.2.tar.gz -C /opt

Go into /opt and rename the sqoop directory:

cd /opt/
mv sqoop-1.4.6-cdh5.14.2/ sqoop

Configure the environment variables:

vi /etc/profile

export SQOOP_HOME=/opt/sqoop
export PATH=$SQOOP_HOME/bin:$PATH

Reload the profile so the changes take effect:

source /etc/profile

Modify the Sqoop configuration file:

cd sqoop/conf
mv sqoop-env-template.sh sqoop-env.sh

vi sqoop-env.sh

export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export ZOOKEEPER_HOME=/opt/zookeeper
export ZOOCFGDIR=/opt/zookeeper
export HBASE_HOME=/opt/hbase

Copy the two prepared jar files into the /opt/sqoop/lib directory (a sketch of the copy commands is given below).
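A minimal sketch of this step, assuming the two jars are the MySQL JDBC driver and a java-json jar; the exact file names depend on the versions you downloaded and are assumptions here:

# assumed jar names - adjust to the versions you actually downloaded
cp /software/mysql-connector-java-5.1.38.jar /opt/sqoop/lib/
cp /software/java-json.jar /opt/sqoop/lib/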
To verify the installation, run:

sqoop help

If the list of available commands is printed, Sqoop is installed correctly.

Running sqoop may report warnings such as "Warning: ... does not exist! HCatalog jobs will fail."
To silence them, edit configure-sqoop in the bin directory:

cd /opt/sqoop/bin
vi configure-sqoop

Comment out the following checks:

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

4. Using Sqoop

1. MySQL -> HDFS

Preparation: place the SQL script in a directory you know (here /tmp/retail_db.sql), then create the database and load the tables in MySQL:

mysql> create database sqoop;

mysql> use sqoop;

mysql> source /tmp/retail_db.sql

mysql> show tables;

Use sqoop to import the customers table into HDFS:

sqoop import
--connect jdbc:mysql://localhost:3306/sqoop    // source MySQL database
--driver com.mysql.jdbc.Driver
--table customers                              // source MySQL table
--username root                                // MySQL username
--password root                                // MySQL password
--target-dir /tmp/customers                    // target HDFS path
--m 3                                          // number of map tasks

Or as a single runnable line:

sqoop import --connect jdbc:mysql://localhost:3306/sqoop --driver com.mysql.jdbc.Driver --table customers --username root --password root --target-dir /tmp/customers --m 3
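To check the result, list the target directory and peek at one of the generated files (the part file names depend on the number of map tasks):

hdfs dfs -ls /tmp/customers
hdfs dfs -cat /tmp/customers/part-m-00000 | head -5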

Filter rows with a where clause:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--where "order_id<500" \
--username root \
--password root \
--target-dir /data1/retail_db/orders \
--m 3

Select specific columns (and filter rows):

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop1 \
--driver com.mysql.jdbc.Driver \
--table emp \
--columns "EMPNO,ENAME,JOB,HIREDATE" \
--where "SAL>2000" \
--username root \
--password root \
--delete-target-dir \
--target-dir /data1/sqoop1/emp \
--m 3

Import with a free-form query (note that $CONDITIONS must be escaped when the query is wrapped in double quotes):

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select * from orders where order_status!='CLOSED' and \$CONDITIONS" \
--username root \
--password root \
--split-by order_id \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
--m 3

Incremental (append) import:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '2014-07-24 00:00:00' \
--target-dir /data1/retail_db/orders \
--m 3
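Only rows whose order_date is greater than the given --last-value are appended, and at the end of the run Sqoop prints the new --last-value to use next time. A quick way to count the rows now sitting in the target directory (a sketch; the part file names depend on the mappers used):

hdfs dfs -cat /data1/retail_db/orders/part-m-* | wc -l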

2. Creating a job

Create the job (note: there must be a space between -- and import):

sqoop job \
--create mysqlToHdfs \
-- import \
--connect jdbc:mysql://localhost:3306/sqoop \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '0' \
--target-dir /data1/retail_db/orders \
--m 3

List the saved jobs:

sqoop job --list

Run the job:

sqoop job --exec mysqlToHdfs
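You can also inspect a saved job's parameters, including the last value that Sqoop updates after each incremental run:

sqoop job --show mysqlToHdfs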

Schedule it with cron:

crontab -e

* 2 */1 * * sqoop job --exec mysqlToHdfs

3. Importing data into Hive

First create the target database in Hive:

hive -e "create database if not exists retail_db;"

If the target path already exists the import will fail, so delete any existing directory first:

hdfs dfs -rmr hdfs://hadoop1:9000/user/root/orders1

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--hive-import \
--create-hive-table \
--hive-database retail_db \
--hive-table orders1 \
--m 3
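To confirm the import, query the new Hive table (a quick check; the row count depends on your source data):

hive -e "show tables in retail_db;"
hive -e "select count(*) from retail_db.orders1;"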

Import data into a Hive partition. First drop the existing Hive table (in the hive CLI):

drop table if exists orders;

Then import, again escaping $CONDITIONS inside the double-quoted query:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select order_id,order_status from orders where order_date>='2013-11-03' and order_date<'2013-11-04' and \$CONDITIONS" \
--username root \
--password ok \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
--split-by order_id \
--hive-import \
--hive-database retail_db \
--hive-table orders \
--hive-partition-key "order_date" \
--hive-partition-value "2013-11-03" \
--m 3

Note: the partition column cannot also be imported as a regular column of the table.
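You can verify that the partition was created (a quick check):

hive -e "show partitions retail_db.orders;"
hive -e "select * from retail_db.orders where order_date='2013-11-03' limit 5;"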

4. Importing data into HBase

1. Create the table in HBase

In the HBase shell:

create 'products','data','category'

2. Import with sqoop

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password ok \
--table products \
--hbase-table products \
--column-family data \
--m 3
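A quick check in the HBase shell that rows arrived (a sketch; by default Sqoop uses the source table's primary key as the HBase row key):

hbase shell
scan 'products', {LIMIT => 3}
count 'products'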

5. Exporting data from HDFS to MySQL

1. Create the table in MySQL (the where 1=2 predicate copies the table structure without copying any rows)

create table customers_demo as select * from customers where 1=2;

2. Upload the data to HDFS

hdfs dfs -mkdir /customerinput
hdfs dfs -put customers.csv /customerinput

3. Export the data

sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table customers_demo \
--export-dir /customerinput \
--m 1
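To confirm the export, count the rows that landed in MySQL (assuming the same root/root credentials as above):

mysql -uroot -proot -e "select count(*) from sqoop.customers_demo;"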

6. Writing a sqoop options script

1. Write the script job_01.opt

In a Sqoop options file, each option and each value goes on its own line:

import
--connect
jdbc:mysql://localhost:3306/sqoop
--driver
com.mysql.jdbc.Driver
--table
customers
--username
root
--password
root
--target-dir
/data/retail_db/customers
--delete-target-dir
--m
3
2. Run the script

sqoop --options-file job_01.opt
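The result can be checked the same way as a direct import (a quick check):

hdfs dfs -ls /data/retail_db/customers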
