Purpose: a data exchange tool that moves data back and forth between MySQL/Oracle and HDFS.
How it works: Sqoop translates the command you write into a MapReduce job; the MapReduce job connects to the data sources and performs the transfer.
Sqoop download link
Extraction code: 4tkp
Move the installation package into the /software directory, then extract it to /opt:
tar -zxvf sqoop-1.4.6-cdh5.14.2.tar.gz -C /opt
Go into /opt and rename the Sqoop directory:
cd /opt/
mv sqoop-1.4.6-cdh5.14.2/ sqoop
Configure the environment variables:
vi /etc/profile
export SQOOP_HOME=/opt/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
Reload the profile so the changes take effect:
source /etc/profile
Edit the Sqoop configuration file:
cd sqoop/conf
mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export ZOOKEEPER_HOME=/opt/zookeeper
export ZOOCFGDIR=/opt/zookeeper
export HBASE_HOME=/opt/hbase
Verify the installation: if Sqoop prints its list of available commands, the setup succeeded.
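The original doesn't name the exact command; sqoop help is a standard check, since it lists the available commands:
sqoop help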
Running Sqoop prints warnings such as "Warning: ... does not exist! HCatalog jobs will fail." To suppress them, go into the bin directory and edit configure-sqoop:
cd /opt/sqoop/bin
vi configure-sqoop
Comment out the HCatalog and Accumulo checks:
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
Preparation: place the retail_db.sql script in a directory you know (here /tmp), then create the database in MySQL and load the script:
mysql> create database sqoop;
mysql> use sqoop;
mysql> source /tmp/retail_db.sql
mysql> show tables;
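Optionally confirm the load before importing (customers is one of the tables created by retail_db.sql):
mysql> select count(*) from customers;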
Use Sqoop to import the customers table into HDFS (argument breakdown first, runnable one-liner below):
sqoop import
--connect jdbc:mysql://localhost:3306/sqoop // source MySQL database
--driver com.mysql.jdbc.Driver
--table customers // source MySQL table
--username root // MySQL username
--password root // MySQL password
--target-dir /tmp/customers // target HDFS directory
-m 3 // number of map tasks
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --driver com.mysql.jdbc.Driver --table customers --username root --password root --target-dir /tmp/customers -m 3
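Assuming the job succeeds, the rows land as part files under the target directory; a quick check (the part file name may vary):
hdfs dfs -ls /tmp/customers
hdfs dfs -cat /tmp/customers/part-m-00000 | head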
Filter rows with --where:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--where "order_id<500" \
--username root \
--password root \
--target-dir /data1/retail_db/orders \
-m 3
Filter columns with --columns:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop1 \
--driver com.mysql.jdbc.Driver \
--table emp \
--columns "EMPNO,ENAME,JOB,HIREDATE" \
--where "SAL>2000" \
--username root \
--password root \
--delete-target-dir \
--target-dir /data1/sqoop1/emp \
-m 3
Import with a free-form query (--query); inside double quotes, $CONDITIONS must be escaped as \$CONDITIONS so the shell does not expand it:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select * from orders where order_status!='CLOSED' and \$CONDITIONS" \
--username root \
--password root \
--split-by order_id \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
-m 3
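Sqoop substitutes $CONDITIONS with a per-mapper range predicate on the --split-by column; with -m 3, each of the three mappers runs a query of roughly this shape (boundary values are illustrative):
select * from orders where order_status!='CLOSED' and (order_id >= 1 AND order_id < 22934)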
Incremental (append) import:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '2014-07-24 00:00:00' \
--target-dir /data1/retail_db/orders \
-m 3
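Only rows with order_date greater than --last-value are imported; append mode adds new part files alongside the existing ones instead of overwriting the target directory, and Sqoop logs the new maximum value to use as the next --last-value. The appended files can be confirmed with:
hdfs dfs -ls /data1/retail_db/orders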
Create a job (note: there must be a space between -- and import):
sqoop job \
--create mysqlToHdfs \
-- import \
--connect jdbc:mysql://localhost:3306/sqoop \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '0' \
--target-dir /data1/retail_db/orders \
-m 3
List jobs:
sqoop job --list
Run the job:
sqoop job --exec mysqlToHdfs
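A saved job keeps its state, including the updated last-value, in the Sqoop metastore (by default --exec prompts for the database password). The stored parameters can be inspected with:
sqoop job --show mysqlToHdfs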
Schedule it with cron:
crontab -e
* 2 */1 * * sqoop job --exec mysqlToHdfs
First create the database in Hive:
hive -e "create database if not exists retail_db;"
If the target path already exists the import fails, so remove it first:
hdfs dfs -rm -r hdfs://hadoop1:9000/user/root/orders1
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--hive-import \
--create-hive-table \
--hive-database retail_db \
--hive-table orders1 \
-m 3
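Assuming the import completes, the Hive row count should match the MySQL source; a quick check:
hive -e "select count(*) from retail_db.orders1;"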
Import data into a Hive partition.
Drop the existing Hive table first:
drop table if exists orders;
Then import:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select order_id,order_status from orders where order_date>='2013-11-03' and order_date<'2013-11-04' and \$CONDITIONS" \
--username root \
--password ok \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
--split-by order_id \
--hive-import \
--hive-database retail_db \
--hive-table orders \
--hive-partition-key "order_date" \
--hive-partition-value "2013-11-03" \
-m 3
Note: the partition column cannot also be imported as a regular column, which is why order_date is excluded from the select list.
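The created partition can be verified with:
hive -e "show partitions retail_db.orders;"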
Import a table into HBase. First create the target table (column families data and category) in the HBase shell:
create 'products','data','category'
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password ok \
--table products \
--hbase-table products \
--column-family data \
-m 3
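A few rows can be spot-checked in the HBase shell afterwards:
scan 'products', {LIMIT => 3}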
Export from HDFS back to MySQL. First create an empty target table in MySQL (where 1=2 copies only the schema), then stage the data in HDFS:
create table customers_demo as select * from customers where 1=2;
hdfs dfs -mkdir /customerinput
hdfs dfs -put customers.csv /customerinput
sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table customers_demo \
--export-dir /customerinput \
-m 1
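After the export, the MySQL table should contain the rows from the CSV; check with:
mysql> select count(*) from customers_demo;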
Using an options file: save the arguments to a file (e.g. job_01.opt), one option or value per line, starting with the tool name:
import
--connect
jdbc:mysql://localhost:3306/sqoop
--driver
com.mysql.jdbc.Driver
--table
customers
--username
root
--password
root
--target-dir
/data/retail_db/customers
--delete-target-dir
-m
3
Run Sqoop with the options file:
sqoop --options-file job_01.opt