Importing Data from a SQL Server View into Hive with Sqoop

Environment: HDP 2.5.3 · Hive 1.2.1 · Sqoop 1.4.6 · SQL Server 2012

Contents

  • 1. Download sqljdbc4.jar and place it in $SQOOP_HOME/lib
  • 2. Test the SQL Server connection
    • 2.1 List available databases on a server
    • 2.2 List available tables in a database
    • 2.3 Run a query
  • 3. Full import into HDFS
  • 4. Full import into Hive
  • 5. Overwriting existing data
  • Notes

1. Download sqljdbc4.jar and place it in $SQOOP_HOME/lib
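
The jar only needs to be visible on the Sqoop client's classpath, so copying it into the lib directory is enough; no service restart is required. A minimal sketch, assuming the driver jar has already been downloaded into the current directory and that Sqoop lives under the standard HDP client path (adjust the paths to your environment):

# copy the Microsoft SQL Server JDBC driver into Sqoop's lib directory
cp sqljdbc4.jar /usr/hdp/current/sqoop-client/lib/
# or, if $SQOOP_HOME is set:
cp sqljdbc4.jar $SQOOP_HOME/lib/
# confirm the jar is in place
ls $SQOOP_HOME/lib/ | grep -i sqljdbc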

2. Test the SQL Server connection

2.1 List available databases on a server

[root@hqc-test-hdp1 ~]# sqoop list-databases --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/29 16:13:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/29 16:13:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/29 16:13:24 INFO manager.SqlManager: Using default fetchSize of 1000
master
AI

2.2 List available tables in a database

Only tables in the dbo schema are listed; views are not shown.

[root@hqc-test-hdp1 ~]# sqoop list-tables --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=AI" -username xx -password xx
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/30 08:52:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 08:52:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 08:52:21 INFO manager.SqlManager: Using default fetchSize of 1000
tt
table1
Dictionary

# SELECT TOP 1000  * FROM [dbo].[Dictionary]
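
Because list-tables hides views, a quick way to confirm that the view you want to import exists and is readable by the connecting account is to query SQL Server's catalog with sqoop eval. A sketch, reusing the connection string from above (database name, user, and password are placeholders):

# list the views defined in the AI database via the sys.views catalog
sqoop eval --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=AI" -username xx -password xx --query "SELECT name FROM sys.views"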

2.3 Run a query

[root@hqc-test-hdp1 ~]# sqoop eval --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx --query "SELECT TOP 5 * from [xx.xx]"
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/29 16:22:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/29 16:22:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/29 16:22:22 INFO manager.SqlManager: Using default fetchSize of 1000
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Name                 | Code                 | Expr1    | Sname                | Cname                | Aname                | Longitude            | Latitude             | Position             | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(rows omitted)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3. Full import into HDFS

Because a view is not a table, you cannot use --table; you must use --query instead.

[root@hqc-test-hdp1 ~]# su hdfs
[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
19/10/30 09:27:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 09:27:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 09:27:22 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 09:27:22 INFO tool.CodeGenTool: Beginning code generation
19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
19/10/30 09:27:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
Error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
Error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
Note: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/10/30 09:27:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.jar
19/10/30 09:27:26 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 09:27:27 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 09:27:27 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 09:27:28 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 09:27:30 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 09:27:30 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 09:27:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0008
19/10/30 09:27:30 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0008
19/10/30 09:27:30 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0008/
19/10/30 09:27:30 INFO mapreduce.Job: Running job: job_1564035532438_0008
19/10/30 09:27:46 INFO mapreduce.Job: Job job_1564035532438_0008 running in uber mode : false
19/10/30 09:27:46 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 09:27:55 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 09:27:55 INFO mapreduce.Job: Job job_1564035532438_0008 completed successfully
19/10/30 09:27:55 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158794
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=717075
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6086
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=6086
		Total vcore-milliseconds taken by all map tasks=6086
		Total megabyte-milliseconds taken by all map tasks=31160320
	Map-Reduce Framework
		Map input records=5676
		Map output records=5676
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=133
		CPU time spent (ms)=6280
		Physical memory (bytes) snapshot=369975296
		Virtual memory (bytes) snapshot=6399401984
		Total committed heap usage (bytes)=329777152
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=717075
19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 28.168 seconds (24.8605 KB/sec)
19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
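
Once the job finishes, the imported file can be inspected directly on HDFS. A quick check, assuming the --target-dir used above; part-m-00000 is Sqoop's default output file name for a single-mapper import:

# list the output of the single map task and peek at the first rows
hdfs dfs -ls /apps/hive/warehouse/hqc.db/xx
hdfs dfs -cat /apps/hive/warehouse/hqc.db/xx/part-m-00000 | head -n 5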

4. Full import into Hive

You do NOT need to create the Hive table manually in advance; just name it with --hive-table and Sqoop will create it. Behavior can differ between Sqoop versions, so note the environment versions listed at the top of this article.
Because a view is not a table, you cannot use --table; you must use --query. With import --query, --split-by can be omitted (a single mapper, -m 1, is used here), but --target-dir is required, and any temporary path will do. From what I have observed, Sqoop first extracts the data into that HDFS path, then loads it into Hive (the table's own HDFS directory), and finally deletes the temporary directory and the files in it.

[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
19/10/30 10:14:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 10:14:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 10:14:24 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/10/30 10:14:24 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/10/30 10:14:25 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 10:14:25 INFO tool.CodeGenTool: Beginning code generation
19/10/30 10:14:25 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:14:26 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:14:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
Error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
Error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
Note: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/10/30 10:14:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.jar
19/10/30 10:14:29 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 10:14:30 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 10:14:30 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 10:14:31 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 10:14:33 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 10:14:33 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 10:14:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0010
19/10/30 10:14:34 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0010
19/10/30 10:14:34 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0010/
19/10/30 10:14:34 INFO mapreduce.Job: Running job: job_1564035532438_0010
19/10/30 10:14:44 INFO mapreduce.Job: Job job_1564035532438_0010 running in uber mode : false
19/10/30 10:14:44 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 10:14:59 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 10:14:59 INFO mapreduce.Job: Job job_1564035532438_0010 completed successfully
19/10/30 10:14:59 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158786
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=717075
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=12530
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=12530
		Total vcore-milliseconds taken by all map tasks=12530
		Total megabyte-milliseconds taken by all map tasks=64153600
	Map-Reduce Framework
		Map input records=5676
		Map output records=5676
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=99
		CPU time spent (ms)=5950
		Physical memory (bytes) snapshot=362852352
		Virtual memory (bytes) snapshot=6400389120
		Total committed heap usage (bytes)=336068608
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=717075
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 29.3386 seconds (23.8685 KB/sec)
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
19/10/30 10:14:59 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:15:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:15:00 WARN hive.TableDefWriter: Column Longitude had to be cast to a less precise type in Hive
19/10/30 10:15:00 WARN hive.TableDefWriter: Column Latitude had to be cast to a less precise type in Hive
19/10/30 10:15:00 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
OK
Time taken: 2.898 seconds
Loading data to table hqc.xx
Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=717075, rawDataSize=0]
OK
Time taken: 0.627 seconds
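
To verify the load, count the rows in the newly created Hive table and check where its data landed; the count should match the 5676 records reported by Sqoop. A sketch, using the placeholder database/table names from the command above:

# row count of the table Sqoop created and loaded
hive -e "SELECT COUNT(*) FROM hqc.xx"
# show the table's HDFS location (the data was moved here from the temporary --target-dir)
hive -e "DESCRIBE FORMATTED hqc.xx" | grep -i location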

5. Overwriting existing data

Add --hive-overwrite to the command.

[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-overwrite --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
19/10/30 16:12:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 16:12:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 16:12:59 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/10/30 16:12:59 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/10/30 16:13:00 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 16:13:00 INFO tool.CodeGenTool: Beginning code generation
19/10/30 16:13:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:01 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:01 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
Error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
Error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
Note: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/10/30 16:13:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.jar
19/10/30 16:13:03 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 16:13:05 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 16:13:05 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 16:13:06 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 16:13:08 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 16:13:08 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 16:13:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0023
19/10/30 16:13:09 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0023
19/10/30 16:13:09 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0023/
19/10/30 16:13:09 INFO mapreduce.Job: Running job: job_1564035532438_0023
19/10/30 16:13:19 INFO mapreduce.Job: Job job_1564035532438_0023 running in uber mode : false
19/10/30 16:13:19 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 16:13:29 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 16:13:29 INFO mapreduce.Job: Job job_1564035532438_0023 completed successfully
19/10/30 16:13:29 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158455
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=5217236
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=7426
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=7426
		Total vcore-milliseconds taken by all map tasks=7426
		Total megabyte-milliseconds taken by all map tasks=38021120
	Map-Reduce Framework
		Map input records=16993
		Map output records=16993
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=108
		CPU time spent (ms)=9300
		Physical memory (bytes) snapshot=398262272
		Virtual memory (bytes) snapshot=6417539072
		Total committed heap usage (bytes)=415760384
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=5217236
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Transferred 4.9755 MB in 24.7647 seconds (205.7342 KB/sec)
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Retrieved 16993 records.
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:30 WARN hive.TableDefWriter: Column AlarmTimeStamp had to be cast to a less precise type in Hive
19/10/30 16:13:30 WARN hive.TableDefWriter: Column HandlerTime had to be cast to a less precise type in Hive
19/10/30 16:13:30 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
OK
Time taken: 2.809 seconds
Loading data to table hqc.xx
Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=5217236, rawDataSize=0]
OK
Time taken: 1.321 seconds
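
A simple way to confirm that --hive-overwrite replaced the old data instead of appending to it is to count the rows again: the result should equal the 16993 records of this run, not the sum of both imports. Same placeholder names as above:

# expect 16993 rows (the latest run only), not 5676 + 16993
hive -e "SELECT COUNT(*) FROM hqc.xx"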

Notes

1. Difference between create-hive-table and hive-import
Question:
Can anyone tell the difference between create-hive-table & hive-import method? Both will create a hive table, but still what is the significance of each?
Answer:
The difference is that create-hive-table will create a table in Hive based on the source table in the database but will NOT transfer any data. "import --hive-import" will both create the table in Hive and import data from the source table.
Create the table only:
sqoop create-hive-table --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --table test --hive-table myhive2 --hive-partition-key partition_time --map-column-hive ID=String,name=String,addr=String
Full import (for SQL Server the condition uses WHERE; for Oracle use AND):
sqoop import --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --query "select * from test i where \$CONDITIONS" --target-dir /user/hive/warehouse/myhive2/partition_time=20171023 --hive-import -m 5 --hive-table myhive2 --split-by ID --hive-partition-key partition_time --hive-partition-value 20171023

2. sqoop help
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/30 08:42:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
  
3. sqoop export: export from HDFS/Hive into a relational database, supporting full, update-with-insert (allowinsert), and update-only (updateonly) modes
--update-mode  updateonly/allowinsert

Full export
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <hive_table>;
Sqoop script: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <mysql_table> --fields-terminated-by ',' --columns F1,F2,F3 --export-dir /user/root/export/test

--update-mode allowinsert (update existing rows and insert new ones)
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <hive_table> where <condition>;
Sqoop script: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <mysql_table> --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode allowinsert --export-dir /user/root/export/test

--update-mode updateonly (update existing rows only)
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <hive_table> where <condition>;
Sqoop script: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <mysql_table> --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode updateonly --export-dir /user/root/export/test
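
Unlike import, sqoop export requires the target table to already exist in the relational database. A minimal sketch of a MySQL table the scripts above could export into; the table name and column types are assumptions for illustration only, with F4 as the column referenced by --update-key:

# create the target table on the MySQL side before running sqoop export (name and types are illustrative)
mysql -h localhost -u root -pcloudera wht -e "CREATE TABLE IF NOT EXISTS export_test (F1 VARCHAR(255), F2 VARCHAR(255), F3 VARCHAR(255), F4 VARCHAR(255));"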

usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
   --connect                          Specify JDBC connect
                                                string
   --connection-manager             Specify connection manager
                                                class name
   --connection-param-file     Specify connection
                                                parameters file
   --driver                         Manually specify JDBC
                                                driver class to use
   --hadoop-home                          Override
                                                $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home                    Override
                                                $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password                         Set authentication
                                                password
   --password-alias             Credential provider
                                                password alias
   --password-file               Set authentication
                                                password file path
   --relaxed-isolation                          Use read-uncommitted
                                                isolation for imports
   --skip-dist-cache                            Skip copying jars to
                                                distributed cache
   --temporary-rootdir                 Defines the temporary root
                                                directory for the import
   --username                         Set authentication
                                                username
   --verbose                                    Print more information
                                                while working

Export control arguments:
   --batch                                                    Indicates
                                                              underlying
                                                              statements
                                                              to be
                                                              executed in
                                                              batch mode
   --call                                                Populate the
                                                              table using
                                                              this stored
                                                              procedure
                                                              (one call
                                                              per row)
   --clear-staging-table                                      Indicates
                                                              that any
                                                              data in
                                                              staging
                                                              table can be
                                                              deleted
   --columns                                  Columns to
                                                              export to
                                                              table
   --direct                                                   Use direct
                                                              export fast
                                                              path
   --export-dir                                          HDFS source
                                                              path for the
                                                              export
-m,--num-mappers                                           Use 'n' map
                                                              tasks to
                                                              export in
                                                              parallel
   --mapreduce-job-name                                 Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --staging-table                                Intermediate
                                                              staging
                                                              table
   --table                                        Table to
                                                              populate
   --update-key                                          Update
                                                              records by
                                                              specified
                                                              key column
   --update-mode                                        Specifies
                                                              how updates
                                                              are
                                                              performed
                                                              when new
                                                              rows are
                                                              found with
                                                              non-matching
                                                              keys in
                                                              database
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler     Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold               Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator                                     Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator

Input parsing arguments:
   --input-enclosed-by                Sets a required field encloser
   --input-escaped-by                 Sets the input escape
                                            character
   --input-fields-terminated-by       Sets the input field separator
   --input-lines-terminated-by        Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by     Sets a field enclosing
                                            character

Output line formatting arguments:
   --enclosed-by                Sets a required field enclosing
                                      character
   --escaped-by                 Sets the escape character
   --fields-terminated-by       Sets the field separator character
   --lines-terminated-by        Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by     Sets a field enclosing character

Code generation arguments:
   --bindir                         Output directory for compiled
                                         objects
   --class-name                    Sets the generated class name.
                                         This overrides --package-name.
                                         When combined with --jar-file,
                                         sets the input class.
   --input-null-non-string     Input null non-string
                                         representation
   --input-null-string         Input null string representation
   --jar-file                      Disable code generation; use
                                         specified jar
   --map-column-java                Override mapping for specific
                                         columns to java types
   --null-non-string           Null non-string representation
   --null-string               Null string representation
   --outdir                         Output directory for generated
                                         code
   --package-name                  Put auto-generated classes in
                                         this package

HCatalog arguments:
   --hcatalog-database                         HCatalog database name
   --hcatalog-home                            Override $HCAT_HOME
   --hcatalog-partition-keys         Sets the partition
                                                    keys to use when
                                                    importing to hive
   --hcatalog-partition-values     Sets the partition
                                                    values to use when
                                                    importing to hive
   --hcatalog-table                            HCatalog table name
   --hive-home                                 Override $HIVE_HOME
   --hive-partition-key              Sets the partition key
                                                    to use when importing
                                                    to hive
   --hive-partition-value          Sets the partition
                                                    value to use when
                                                    importing to hive
   --map-column-hive                           Override mapping for
                                                    specific column to
                                                    hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf      specify an application configuration file
-D             use value for given property
-fs       specify a namenode
-jt     specify a ResourceManager
-files     specify comma separated files to be copied to the map reduce cluster
-libjars     specify comma separated jar files to include in the classpath.
-archives     specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect, --export-dir, and --table

4. sqoop import help
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
   --connect                          Specify JDBC connect
                                                string
   --connection-manager             Specify connection manager
                                                class name
   --connection-param-file     Specify connection
                                                parameters file
   --driver                         Manually specify JDBC
                                                driver class to use
   --hadoop-home                          Override
                                                $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home                    Override
                                                $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password                         Set authentication
                                                password
   --password-alias             Credential provider
                                                password alias
   --password-file               Set authentication
                                                password file path
   --relaxed-isolation                          Use read-uncommitted
                                                isolation for imports
   --skip-dist-cache                            Skip copying jars to
                                                distributed cache
   --temporary-rootdir                 Defines the temporary root
                                                directory for the import
   --username                         Set authentication
                                                username
   --verbose                                    Print more information
                                                while working

Import control arguments:
   --append                                                   Imports data
                                                              in append
                                                              mode
   --as-avrodatafile                                          Imports data
                                                              to Avro data
                                                              files
   --as-parquetfile                                           Imports data
                                                              to Parquet
                                                              files
   --as-sequencefile                                          Imports data
                                                              to
                                                              SequenceFile
                                                              s
   --as-textfile                                              Imports data
                                                              as plain
                                                              text
                                                              (default)
   --autoreset-to-one-mapper                                  Reset the
                                                              number of
                                                              mappers to
                                                              one mapper
                                                              if no split
                                                              key
                                                              available
   --boundary-query                                Set boundary
                                                              query for
                                                              retrieving
                                                              max and min
                                                              value of the
                                                              primary key
   --columns                                  Columns to
                                                              import from
                                                              table
   --compression-codec                                 Compression
                                                              codec to use
                                                              for import
   --delete-target-dir                                        Imports data
                                                              in delete
                                                              mode
   --direct                                                   Use direct
                                                              import fast
                                                              path
   --direct-split-size                                     Split the
                                                              input stream
                                                              every 'n'
                                                              bytes when
                                                              importing in
                                                              direct mode
-e,--query                                         Import
                                                              results of
                                                              SQL
                                                              'statement'
   --fetch-size                                            Set number
                                                              'n' of rows
                                                              to fetch
                                                              from the
                                                              database
                                                              when more
                                                              rows are
                                                              needed
   --inline-lob-limit                                      Set the
                                                              maximum size
                                                              for an
                                                              inline LOB
-m,--num-mappers                                           Use 'n' map
                                                              tasks to
                                                              import in
                                                              parallel
   --mapreduce-job-name                                 Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --merge-key                                        Key column
                                                              to use to
                                                              join results
   --split-by                                    Column of
                                                              the table
                                                              used to
                                                              split work
                                                              units
   --split-limit                                        Upper Limit
                                                              of rows per
                                                              split for
                                                              split
                                                              columns of
                                                              Date/Time/Ti
                                                              mestamp and
                                                              integer
                                                              types. For
                                                              date or
                                                              timestamp
                                                              fields it is
                                                              calculated
                                                              in seconds.
                                                              split-limit
                                                              should be
                                                              greater than
                                                              0
   --table                                        Table to
                                                              read
   --target-dir                                          HDFS plain
                                                              table
                                                              destination
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler     Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold               Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator                                     Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator
   --warehouse-dir                                       HDFS parent
                                                              for table
                                                              destination
   --where                                      WHERE clause
                                                              to use
                                                              during
                                                              import
-z,--compress                                                 Enable
                                                              compression

Incremental import arguments:
   --check-column         Source column to check for incremental
                                  change
   --incremental     Define an incremental import of type
                                  'append' or 'lastmodified'
   --last-value            Last imported value in the incremental
                                  check column

Output line formatting arguments:
   --enclosed-by                Sets a required field enclosing
                                      character
   --escaped-by                 Sets the escape character
   --fields-terminated-by       Sets the field separator character
   --lines-terminated-by        Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by     Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by                Sets a required field encloser
   --input-escaped-by                 Sets the input escape
                                            character
   --input-fields-terminated-by       Sets the input field separator
   --input-lines-terminated-by        Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by     Sets a field enclosing
                                            character

Hive arguments:
   --create-hive-table                         Fail if the target hive
                                               table exists
   --hive-compute-stats                        Overwrite existing data in
                                               the Hive table
   --hive-database              Sets the database name to
                                               use when importing to hive
   --hive-delims-replacement              Replace Hive record \0x01
                                               and row delimiters (\n\r)
                                               from imported string fields
                                               with user-defined string
   --hive-drop-import-delims                   Drop Hive record \0x01 and
                                               row delimiters (\n\r) from
                                               imported string fields
   --hive-home                            Override $HIVE_HOME
   --hive-import                               Import tables into Hive
                                               (Uses Hive's default
                                               delimiters if none are
                                               set.)
   --hive-overwrite                            Overwrite existing data in
                                               the Hive table
   --hive-partition-key         Sets the partition key to
                                               use when importing to hive
   --hive-partition-value     Sets the partition value to
                                               use when importing to hive
   --hive-table                    Sets the table name to use
                                               when importing to hive
   --map-column-hive                      Override mapping for
                                               specific column to hive
                                               types.

HBase arguments:
   --column-family     Sets the target column family for the
                               import
   --hbase-bulkload            Enables HBase bulk loading
   --hbase-create-table        If specified, create missing HBase tables
   --hbase-row-key        Specifies which input column to use as the
                               row key
   --hbase-table               Import to table in HBase

HCatalog arguments:
   --hcatalog-database              HCatalog database name
   --hcatalog-home                  Override $HCAT_HOME
   --hcatalog-partition-keys        Sets the partition keys to use when
                                    importing to hive
   --hcatalog-partition-values      Sets the partition values to use when
                                    importing to hive
   --hcatalog-table                 HCatalog table name
   --hive-home                      Override $HIVE_HOME
   --hive-partition-key             Sets the partition key to use when
                                    importing to hive
   --hive-partition-value           Sets the partition value to use when
                                    importing to hive
   --map-column-hive                Override mapping for specific column to
                                    hive types.

HCatalog import specific options:
   --create-hcatalog-table            Create HCatalog before import
   --drop-and-create-hcatalog-table   Drop and Create HCatalog before import
   --hcatalog-storage-stanza          HCatalog storage stanza for table
                                      creation

Accumulo arguments:
   --accumulo-batch-size       Batch size in bytes
   --accumulo-column-family    Sets the target column family for the import
   --accumulo-create-table     If specified, create missing Accumulo tables
   --accumulo-instance         Accumulo instance name.
   --accumulo-max-latency      Max write latency in milliseconds
   --accumulo-password         Accumulo password.
   --accumulo-row-key          Specifies which input column to use as the
                               row key
   --accumulo-table            Import to table in Accumulo
   --accumulo-user             Accumulo user name.
   --accumulo-visibility       Visibility token to be applied to all rows
                               imported
   --accumulo-zookeepers       Comma-separated list of zookeepers (host:port)

Code generation arguments:
   --bindir                    Output directory for compiled objects
   --class-name                Sets the generated class name. This overrides
                               --package-name. When combined with --jar-file,
                               sets the input class.
   --input-null-non-string     Input null non-string representation
   --input-null-string         Input null string representation
   --jar-file                  Disable code generation; use specified jar
   --map-column-java           Override mapping for specific columns to java
                               types
   --null-non-string           Null non-string representation
   --null-string               Null string representation
   --outdir                    Output directory for generated code
   --package-name              Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf      specify an application configuration file
-D             use value for given property
-fs       specify a namenode
-jt     specify a ResourceManager
-files     specify comma separated files to be copied to the map reduce cluster
-libjars     specify comma separated jar files to include in the classpath.
-archives     specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied after a '--' on the command line.
