使用Sqoop从MySQL导入数据到HBase和hive(转载)

有时候需要将mysql的全量数据导入到hive或者hbase中,使用sqoop是一个比较好用的工具,速度相对来说比较快。mysql的增量数据在用其他方法实时同步。

一、mysql同步到hbase

导入命令:
sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key ID --hbase-create-table --username 'root' -P
参数说明:
--connect:数据库连接串
--username:用户名
--P:交互式输入密码
--table:表名
-m:并行执行sqoop导入程序的map task的数量,在不指定的情况下默认启动4个map
--split-by:并行导入过程中,各个map task根据哪个字段来划分数据段,该参数最好指定一个能相对均匀划分数据的字段,比如创建时间、递增的ID
--hbase-table:hbase中接收数据的表名
--hbase-create-table:如果指定的接收数据表在hbase中不存在,则新建表
--column-family:列族名称,所有源表的字段都进入该列族
--hbase-row-key:如果不指定则采用源表的key作为hbase的row key。可以指定一个字段作为row key,或者指定组合行键,当指定组合行键时,用双引号包含多个字段,各字段用逗号分隔

执行命令部分日志:

[hdfs@slave1 ~]$ sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key ID --hbase-create-table --username 'root' -P
Warning: /soft/bigdata/clouderamanager/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/04/28 15:54:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.0
Enter password: 
17/04/28 15:54:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/04/28 15:54:44 INFO tool.CodeGenTool: Beginning code generation
Fri Apr 28 15:54:44 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/04/28 15:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COMP_DICT` AS t LIMIT 1
17/04/28 15:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COMP_DICT` AS t LIMIT 1
17/04/28 15:54:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /soft/bigdata/clouderamanager/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-hdfs/compile/f5c3b693ffb26b66c554308ad32b2880/COMP_DICT.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/04/28 15:54:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f5c3b693ffb26b66c554308ad32b2880/COMP_DICT.jar
……
7/04/28 15:54:53 INFO mapreduce.Job: The url to track the job: http://master2:8088/proxy/application_1491881598805_0027/
17/04/28 15:54:53 INFO mapreduce.Job: Running job: job_1491881598805_0027
17/04/28 15:54:59 INFO mapreduce.Job: Job job_1491881598805_0027 running in uber mode : false
17/04/28 15:54:59 INFO mapreduce.Job:  map 0% reduce 0%
17/04/28 15:55:05 INFO mapreduce.Job:  map 20% reduce 0%
17/04/28 15:55:06 INFO mapreduce.Job:  map 60% reduce 0%
17/04/28 15:55:09 INFO mapreduce.Job:  map 100% reduce 0%
17/04/28 15:55:10 INFO mapreduce.Job: Job job_1491881598805_0027 completed successfully
17/04/28 15:55:10 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=925010
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=665
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=5
        Other local map tasks=5
        Total time spent by all maps in occupied slots (ms)=25663
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=25663
        Total vcore-seconds taken by all map tasks=25663
        Total megabyte-seconds taken by all map tasks=26278912
    Map-Reduce Framework
        Map input records=10353
        Map output records=10353
        Input split bytes=665
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=586
        CPU time spent (ms)=17940
        Physical memory (bytes) snapshot=1619959808
        Virtual memory (bytes) snapshot=14046998528
        Total committed heap usage (bytes)=1686634496
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
17/04/28 15:55:10 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 20.0424 seconds (0 bytes/sec)
17/04/28 15:55:10 INFO mapreduce.ImportJobBase: Retrieved 10353 records.

PS:如果想把表中的主键也增加到hbase的cell中也就是列族中的列的话,需要设置如下参数:
sqoop.hbase.add.row.key=true

sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key "etl_date,APPLY_ID" --hbase-create-table --username 'root' -P

二、mysql同步到hive

1、创建与mysql相同结构的表:

sqoop create-hive-table --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/shiro --table UserInfo --hive-database shiro --hive-table userinfo --username root --password xxxxxx --fields-terminated-by "\0001" --lines-terminated-by "\n";
参数说明:
--fields-terminated-by "\0001" 是设置每列之间的分隔符,"\0001"是ASCII码中的1,它也是hive的默认行内分隔符, 而sqoop的默认行内分隔符为","
--lines-terminated-by "\n" 设置的是每行之间的分隔符,此处为换行符,也是默认的分隔符;

2、mysql数据导入到hive中

sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/testSqoop --table dydata --hive-database testsqoop --hive-import --hive-table dydata --username root --password xxxxxx --fields-terminated-by "\0001";
参数说明:
-m 2 表示由两个map作业执行
--fields-terminated-by "\0001" 需同创建hive表时保持一致;
--hive-import一定要加此参数,否则无法成功导入hive中



作者:献给记性不好的自己
链接:https://www.jianshu.com/p/929934b5e9b8
來源:简书
简书著作权归作者所有,任何形式的转载都请联系作者获得授权并注明出处。

你可能感兴趣的:(使用Sqoop从MySQL导入数据到HBase和hive(转载))