Importing Data from a SQL Server View into Hive with Sqoop

Environment: HDP 2.5.3 · Hive 1.2.1 · Sqoop 1.4.6 · SQL Server 2012

Contents

  • 1. Download sqljdbc4.jar and place it in $SQOOP_HOME/lib
  • 2. Test the SQL Server connection
    • 2.1 List available databases on a server
    • 2.2 List available tables in a database
    • 2.3 Run a query
  • 3. Full import into HDFS
  • 4. Full import into Hive
  • 5. Overwriting existing data
  • Notes

1. Download sqljdbc4.jar and place it in $SQOOP_HOME/lib
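
The jar only needs to be visible on the Sqoop client's classpath, so copying it into the lib directory is enough; no service restart is required. A minimal sketch, assuming the driver jar has already been downloaded into the current directory and that Sqoop lives under the standard HDP client path (adjust the paths to your environment):

# copy the Microsoft SQL Server JDBC driver into Sqoop's lib directory
cp sqljdbc4.jar /usr/hdp/current/sqoop-client/lib/
# or, if $SQOOP_HOME is set:
cp sqljdbc4.jar $SQOOP_HOME/lib/
# confirm the jar is in place
ls $SQOOP_HOME/lib/ | grep -i sqljdbc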

2. Test the SQL Server connection

2.1 List available databases on a server

[root@hqc-test-hdp1 ~]# sqoop list-databases --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/29 16:13:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/29 16:13:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/29 16:13:24 INFO manager.SqlManager: Using default fetchSize of 1000
master
AI

2.2 List available tables in a database

Only tables in the dbo schema are listed; views are not shown.

[root@hqc-test-hdp1 ~]# sqoop list-tables --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=AI" -username xx -password xx
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/30 08:52:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 08:52:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 08:52:21 INFO manager.SqlManager: Using default fetchSize of 1000
tt
table1
Dictionary

# SELECT TOP 1000  * FROM [dbo].[Dictionary]
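
Because list-tables hides views, a quick way to confirm that the view you want to import exists and is readable by the connecting account is to query SQL Server's catalog with sqoop eval. A sketch, reusing the connection string from above (database name, user, and password are placeholders):

# list the views defined in the AI database via the sys.views catalog
sqoop eval --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=AI" -username xx -password xx --query "SELECT name FROM sys.views"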

2.3 Run a query

[root@hqc-test-hdp1 ~]# sqoop eval --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx --query "SELECT TOP 5 * from [xx.xx]"
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/29 16:22:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/29 16:22:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/29 16:22:22 INFO manager.SqlManager: Using default fetchSize of 1000
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Name                 | Code                 | Expr1    | Sname                | Cname                | Aname                | Longitude            | Latitude             | Position             | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(rows omitted)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3. Full import into HDFS

Because a view is not a table, you cannot use --table; you must use --query instead.

[root@hqc-test-hdp1 ~]# su hdfs
[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
19/10/30 09:27:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 09:27:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 09:27:22 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 09:27:22 INFO tool.CodeGenTool: Beginning code generation
19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
19/10/30 09:27:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
Error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
Error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
Note: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/10/30 09:27:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.jar
19/10/30 09:27:26 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 09:27:27 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 09:27:27 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 09:27:28 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 09:27:30 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 09:27:30 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 09:27:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0008
19/10/30 09:27:30 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0008
19/10/30 09:27:30 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0008/
19/10/30 09:27:30 INFO mapreduce.Job: Running job: job_1564035532438_0008
19/10/30 09:27:46 INFO mapreduce.Job: Job job_1564035532438_0008 running in uber mode : false
19/10/30 09:27:46 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 09:27:55 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 09:27:55 INFO mapreduce.Job: Job job_1564035532438_0008 completed successfully
19/10/30 09:27:55 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158794
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=717075
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6086
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=6086
		Total vcore-milliseconds taken by all map tasks=6086
		Total megabyte-milliseconds taken by all map tasks=31160320
	Map-Reduce Framework
		Map input records=5676
		Map output records=5676
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=133
		CPU time spent (ms)=6280
		Physical memory (bytes) snapshot=369975296
		Virtual memory (bytes) snapshot=6399401984
		Total committed heap usage (bytes)=329777152
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=717075
19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 28.168 seconds (24.8605 KB/sec)
19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
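
Once the job finishes, the imported file can be inspected directly on HDFS. A quick check, assuming the --target-dir used above; part-m-00000 is Sqoop's default output file name for a single-mapper import:

# list the output of the single map task and peek at the first rows
hdfs dfs -ls /apps/hive/warehouse/hqc.db/xx
hdfs dfs -cat /apps/hive/warehouse/hqc.db/xx/part-m-00000 | head -n 5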

4. Full import into Hive

You do NOT need to create the Hive table manually in advance; just name it with --hive-table and Sqoop will create it. Behavior can differ between Sqoop versions, so note the environment versions listed at the top of this article.
Because a view is not a table, you cannot use --table; you must use --query. With import --query, --split-by can be omitted (a single mapper, -m 1, is used here), but --target-dir is required, and any temporary path will do. From what I have observed, Sqoop first extracts the data into that HDFS path, then loads it into Hive (the table's own HDFS directory), and finally deletes the temporary directory and the files in it.

[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
19/10/30 10:14:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 10:14:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 10:14:24 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/10/30 10:14:24 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/10/30 10:14:25 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 10:14:25 INFO tool.CodeGenTool: Beginning code generation
19/10/30 10:14:25 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:14:26 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:14:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
Error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
Error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
Note: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/10/30 10:14:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.jar
19/10/30 10:14:29 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 10:14:30 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 10:14:30 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 10:14:31 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 10:14:33 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 10:14:33 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 10:14:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0010
19/10/30 10:14:34 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0010
19/10/30 10:14:34 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0010/
19/10/30 10:14:34 INFO mapreduce.Job: Running job: job_1564035532438_0010
19/10/30 10:14:44 INFO mapreduce.Job: Job job_1564035532438_0010 running in uber mode : false
19/10/30 10:14:44 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 10:14:59 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 10:14:59 INFO mapreduce.Job: Job job_1564035532438_0010 completed successfully
19/10/30 10:14:59 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158786
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=717075
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=12530
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=12530
		Total vcore-milliseconds taken by all map tasks=12530
		Total megabyte-milliseconds taken by all map tasks=64153600
	Map-Reduce Framework
		Map input records=5676
		Map output records=5676
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=99
		CPU time spent (ms)=5950
		Physical memory (bytes) snapshot=362852352
		Virtual memory (bytes) snapshot=6400389120
		Total committed heap usage (bytes)=336068608
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=717075
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 29.3386 seconds (23.8685 KB/sec)
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
19/10/30 10:14:59 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:15:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 10:15:00 WARN hive.TableDefWriter: Column Longitude had to be cast to a less precise type in Hive
19/10/30 10:15:00 WARN hive.TableDefWriter: Column Latitude had to be cast to a less precise type in Hive
19/10/30 10:15:00 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
OK
Time taken: 2.898 seconds
Loading data to table hqc.xx
Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=717075, rawDataSize=0]
OK
Time taken: 0.627 seconds
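
To verify the load, count the rows in the newly created Hive table and check where its data landed; the count should match the 5676 records reported by Sqoop. A sketch, using the placeholder database/table names from the command above:

# row count of the table Sqoop created and loaded
hive -e "SELECT COUNT(*) FROM hqc.xx"
# show the table's HDFS location (the data was moved here from the temporary --target-dir)
hive -e "DESCRIBE FORMATTED hqc.xx" | grep -i location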

5. Overwriting existing data

Add --hive-overwrite to the command.

[hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-overwrite --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
19/10/30 16:12:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
19/10/30 16:12:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/10/30 16:12:59 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/10/30 16:12:59 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/10/30 16:13:00 INFO manager.SqlManager: Using default fetchSize of 1000
19/10/30 16:13:00 INFO tool.CodeGenTool: Beginning code generation
19/10/30 16:13:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:01 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:01 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
Error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
Error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
Note: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/10/30 16:13:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.jar
19/10/30 16:13:03 INFO mapreduce.ImportJobBase: Beginning query import.
19/10/30 16:13:05 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/10/30 16:13:05 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
19/10/30 16:13:06 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
19/10/30 16:13:08 INFO db.DBInputFormat: Using read commited transaction isolation
19/10/30 16:13:08 INFO mapreduce.JobSubmitter: number of splits:1
19/10/30 16:13:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0023
19/10/30 16:13:09 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0023
19/10/30 16:13:09 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0023/
19/10/30 16:13:09 INFO mapreduce.Job: Running job: job_1564035532438_0023
19/10/30 16:13:19 INFO mapreduce.Job: Job job_1564035532438_0023 running in uber mode : false
19/10/30 16:13:19 INFO mapreduce.Job:  map 0% reduce 0%
19/10/30 16:13:29 INFO mapreduce.Job:  map 100% reduce 0%
19/10/30 16:13:29 INFO mapreduce.Job: Job job_1564035532438_0023 completed successfully
19/10/30 16:13:29 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=158455
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=5217236
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=7426
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=7426
		Total vcore-milliseconds taken by all map tasks=7426
		Total megabyte-milliseconds taken by all map tasks=38021120
	Map-Reduce Framework
		Map input records=16993
		Map output records=16993
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=108
		CPU time spent (ms)=9300
		Physical memory (bytes) snapshot=398262272
		Virtual memory (bytes) snapshot=6417539072
		Total committed heap usage (bytes)=415760384
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=5217236
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Transferred 4.9755 MB in 24.7647 seconds (205.7342 KB/sec)
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Retrieved 16993 records.
19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
19/10/30 16:13:30 WARN hive.TableDefWriter: Column AlarmTimeStamp had to be cast to a less precise type in Hive
19/10/30 16:13:30 WARN hive.TableDefWriter: Column HandlerTime had to be cast to a less precise type in Hive
19/10/30 16:13:30 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
OK
Time taken: 2.809 seconds
Loading data to table hqc.xx
Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=5217236, rawDataSize=0]
OK
Time taken: 1.321 seconds
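
A simple way to confirm that --hive-overwrite replaced the old data instead of appending to it is to count the rows again: the result should equal the 16993 records of this run, not the sum of both imports. Same placeholder names as above:

# expect 16993 rows (the latest run only), not 5676 + 16993
hive -e "SELECT COUNT(*) FROM hqc.xx"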

Notes

1. Difference between create-hive-table and hive-import
Question:
Can anyone tell the difference between create-hive-table & hive-import method? Both will create a hive table, but still what is the significance of each?
Answer:
The difference is that create-hive-table will create a table in Hive based on the source table in the database but will NOT transfer any data. "import --hive-import" will both create the table in Hive and import data from the source table.
Create the table only:
sqoop create-hive-table --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --table test --hive-table myhive2 --hive-partition-key partition_time --map-column-hive ID=String,name=String,addr=String
Full import (for SQL Server the condition uses WHERE; for Oracle use AND):
sqoop import --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --query "select * from test i where \$CONDITIONS" --target-dir /user/hive/warehouse/myhive2/partition_time=20171023 --hive-import -m 5 --hive-table myhive2 --split-by ID --hive-partition-key partition_time --hive-partition-value 20171023

2. sqoop help
Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/10/30 08:42:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
  
3. sqoop export: export from HDFS/Hive into a relational database, supporting full, update-with-insert (allowinsert), and update-only (updateonly) modes
--update-mode  updateonly/allowinsert

Full export
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <hive_table>;
Sqoop script: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <mysql_table> --fields-terminated-by ',' --columns F1,F2,F3 --export-dir /user/root/export/test

--update-mode allowinsert (update existing rows and insert new ones)
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <hive_table> where <condition>;
Sqoop script: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <mysql_table> --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode allowinsert --export-dir /user/root/export/test

--update-mode updateonly (update existing rows only)
HQL example: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from <hive_table> where <condition>;
Sqoop script: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table <mysql_table> --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode updateonly --export-dir /user/root/export/test
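
Unlike import, sqoop export requires the target table to already exist in the relational database. A minimal sketch of a MySQL table the scripts above could export into; the table name and column types are assumptions for illustration only, with F4 as the column referenced by --update-key:

# create the target table on the MySQL side before running sqoop export (name and types are illustrative)
mysql -h localhost -u root -pcloudera wht -e "CREATE TABLE IF NOT EXISTS export_test (F1 VARCHAR(255), F2 VARCHAR(255), F3 VARCHAR(255), F4 VARCHAR(255));"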

usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
   --connect                          Specify JDBC connect
                                                string
   --connection-manager             Specify connection manager
                                                class name
   --connection-param-file     Specify connection
                                                parameters file
   --driver                         Manually specify JDBC
                                                driver class to use
   --hadoop-home                          Override
                                                $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home                    Override
                                                $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password                         Set authentication
                                                password
   --password-alias             Credential provider
                                                password alias
   --password-file               Set authentication
                                                password file path
   --relaxed-isolation                          Use read-uncommitted
                                                isolation for imports
   --skip-dist-cache                            Skip copying jars to
                                                distributed cache
   --temporary-rootdir                 Defines the temporary root
                                                directory for the import
   --username                         Set authentication
                                                username
   --verbose                                    Print more information
                                                while working

Export control arguments:
   --batch                                                    Indicates
                                                              underlying
                                                              statements
                                                              to be
                                                              executed in
                                                              batch mode
   --call                                                Populate the
                                                              table using
                                                              this stored
                                                              procedure
                                                              (one call
                                                              per row)
   --clear-staging-table                                      Indicates
                                                              that any
                                                              data in
                                                              staging
                                                              table can be
                                                              deleted
   --columns                                  Columns to
                                                              export to
                                                              table
   --direct                                                   Use direct
                                                              export fast
                                                              path
   --export-dir                                          HDFS source
                                                              path for the
                                                              export
-m,--num-mappers                                           Use 'n' map
                                                              tasks to
                                                              export in
                                                              parallel
   --mapreduce-job-name                                 Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --staging-table                                Intermediate
                                                              staging
                                                              table
   --table                                        Table to
                                                              populate
   --update-key                                          Update
                                                              records by
                                                              specified
                                                              key column
   --update-mode                                        Specifies
                                                              how updates
                                                              are
                                                              performed
                                                              when new
                                                              rows are
                                                              found with
                                                              non-matching
                                                              keys in
                                                              database
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler     Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold               Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator                                     Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator

Input parsing arguments:
   --input-enclosed-by                Sets a required field encloser
   --input-escaped-by                 Sets the input escape
                                            character
   --input-fields-terminated-by       Sets the input field separator
   --input-lines-terminated-by        Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by     Sets a field enclosing
                                            character

Output line formatting arguments:
   --enclosed-by                Sets a required field enclosing
                                      character
   --escaped-by                 Sets the escape character
   --fields-terminated-by       Sets the field separator character
   --lines-terminated-by        Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by     Sets a field enclosing character

Code generation arguments:
   --bindir                         Output directory for compiled
                                         objects
   --class-name                    Sets the generated class name.
                                         This overrides --package-name.
                                         When combined with --jar-file,
                                         sets the input class.
   --input-null-non-string     Input null non-string
                                         representation
   --input-null-string         Input null string representation
   --jar-file                      Disable code generation; use
                                         specified jar
   --map-column-java                Override mapping for specific
                                         columns to java types
   --null-non-string           Null non-string representation
   --null-string               Null string representation
   --outdir                         Output directory for generated
                                         code
   --package-name                  Put auto-generated classes in
                                         this package

HCatalog arguments:
   --hcatalog-database                         HCatalog database name
   --hcatalog-home                            Override $HCAT_HOME
   --hcatalog-partition-keys         Sets the partition
                                                    keys to use when
                                                    importing to hive
   --hcatalog-partition-values     Sets the partition
                                                    values to use when
                                                    importing to hive
   --hcatalog-table                            HCatalog table name
   --hive-home                                 Override $HIVE_HOME
   --hive-partition-key              Sets the partition key
                                                    to use when importing
                                                    to hive
   --hive-partition-value          Sets the partition
                                                    value to use when
                                                    importing to hive
   --map-column-hive                           Override mapping for
                                                    specific column to
                                                    hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf      specify an application configuration file
-D             use value for given property
-fs       specify a namenode
-jt     specify a ResourceManager
-files     specify comma separated files to be copied to the map reduce cluster
-libjars     specify comma separated jar files to include in the classpath.
-archives     specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect, --export-dir, and --table

4. sqoop import help
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
   --connect                          Specify JDBC connect
                                                string
   --connection-manager             Specify connection manager
                                                class name
   --connection-param-file     Specify connection
                                                parameters file
   --driver                         Manually specify JDBC
                                                driver class to use
   --hadoop-home                          Override
                                                $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home                    Override
                                                $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password                         Set authentication
                                                password
   --password-alias             Credential provider
                                                password alias
   --password-file               Set authentication
                                                password file path
   --relaxed-isolation                          Use read-uncommitted
                                                isolation for imports
   --skip-dist-cache                            Skip copying jars to
                                                distributed cache
   --temporary-rootdir                 Defines the temporary root
                                                directory for the import
   --username                         Set authentication
                                                username
   --verbose                                    Print more information
                                                while working

Import control arguments:
   --append                                                   Imports data
                                                              in append
                                                              mode
   --as-avrodatafile                                          Imports data
                                                              to Avro data
                                                              files
   --as-parquetfile                                           Imports data
                                                              to Parquet
                                                              files
   --as-sequencefile                                          Imports data
                                                              to
                                                              SequenceFile
                                                              s
   --as-textfile                                              Imports data
                                                              as plain
                                                              text
                                                              (default)
   --autoreset-to-one-mapper                                  Reset the
                                                              number of
                                                              mappers to
                                                              one mapper
                                                              if no split
                                                              key
                                                              available
   --boundary-query                                Set boundary
                                                              query for
                                                              retrieving
                                                              max and min
                                                              value of the
                                                              primary key
   --columns                                  Columns to
                                                              import from
                                                              table
   --compression-codec                                 Compression
                                                              codec to use
                                                              for import
   --delete-target-dir                                        Imports data
                                                              in delete
                                                              mode
   --direct                                                   Use direct
                                                              import fast
                                                              path
   --direct-split-size                                     Split the
                                                              input stream
                                                              every 'n'
                                                              bytes when
                                                              importing in
                                                              direct mode
-e,--query                                         Import
                                                              results of
                                                              SQL
                                                              'statement'
   --fetch-size                                            Set number
                                                              'n' of rows
                                                              to fetch
                                                              from the
                                                              database
                                                              when more
                                                              rows are
                                                              needed
   --inline-lob-limit                                      Set the
                                                              maximum size
                                                              for an
                                                              inline LOB
-m,--num-mappers                                           Use 'n' map
                                                              tasks to
                                                              import in
                                                              parallel
   --mapreduce-job-name                                 Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --merge-key                                        Key column
                                                              to use to
                                                              join results
   --split-by                                    Column of
                                                              the table
                                                              used to
                                                              split work
                                                              units
   --split-limit                                        Upper Limit
                                                              of rows per
                                                              split for
                                                              split
                                                              columns of
                                                              Date/Time/Ti
                                                              mestamp and
                                                              integer
                                                              types. For
                                                              date or
                                                              timestamp
                                                              fields it is
                                                              calculated
                                                              in seconds.
                                                              split-limit
                                                              should be
                                                              greater than
                                                              0
   --table                                        Table to
                                                              read
   --target-dir                                          HDFS plain
                                                              table
                                                              destination
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler     Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold               Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator                                     Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator
   --warehouse-dir                                       HDFS parent
                                                              for table
                                                              destination
   --where                                      WHERE clause
                                                              to use
                                                              during
                                                              import
-z,--compress                                                 Enable
                                                              compression

Incremental import arguments:
   --check-column         Source column to check for incremental
                                  change
   --incremental     Define an incremental import of type
                                  'append' or 'lastmodified'
   --last-value            Last imported value in the incremental
                                  check column

Output line formatting arguments:
   --enclosed-by                Sets a required field enclosing
                                      character
   --escaped-by                 Sets the escape character
   --fields-terminated-by       Sets the field separator character
   --lines-terminated-by        Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by     Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by                Sets a required field encloser
   --input-escaped-by                 Sets the input escape
                                            character
   --input-fields-terminated-by       Sets the input field separator
   --input-lines-terminated-by        Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by     Sets a field enclosing
                                            character

Hive arguments:
   --create-hive-table                         Fail if the target hive
                                               table exists
   --hive-compute-stats                        Overwrite existing data in
                                               the Hive table
   --hive-database              Sets the database name to
                                               use when importing to hive
   --hive-delims-replacement              Replace Hive record \0x01
                                               and row delimiters (\n\r)
                                               from imported string fields
                                               with user-defined string
   --hive-drop-import-delims                   Drop Hive record \0x01 and
                                               row delimiters (\n\r) from
                                               imported string fields
   --hive-home                            Override $HIVE_HOME
   --hive-import                               Import tables into Hive
                                               (Uses Hive's default
                                               delimiters if none are
                                               set.)
   --hive-overwrite                            Overwrite existing data in
                                               the Hive table
   --hive-partition-key         Sets the partition key to
                                               use when importing to hive
   --hive-partition-value     Sets the partition value to
                                               use when importing to hive
   --hive-table                    Sets the table name to use
                                               when importing to hive
   --map-column-hive                      Override mapping for
                                               specific column to hive
                                               types.

HBase arguments:
   --column-family     Sets the target column family for the
                               import
   --hbase-bulkload            Enables HBase bulk loading
   --hbase-create-table        If specified, create missing HBase tables
   --hbase-row-key        Specifies which input column to use as the
                               row key
   --hbase-table               Import to table in HBase

HCatalog arguments:
   --hcatalog-database              HCatalog database name
   --hcatalog-home                  Override $HCAT_HOME
   --hcatalog-partition-keys        Sets the partition keys to use when
                                    importing to hive
   --hcatalog-partition-values      Sets the partition values to use when
                                    importing to hive
   --hcatalog-table                 HCatalog table name
   --hive-home                      Override $HIVE_HOME
   --hive-partition-key             Sets the partition key to use when
                                    importing to hive
   --hive-partition-value           Sets the partition value to use when
                                    importing to hive
   --map-column-hive                Override mapping for specific column to
                                    hive types.

HCatalog import specific options:
   --create-hcatalog-table            Create HCatalog before import
   --drop-and-create-hcatalog-table   Drop and Create HCatalog before import
   --hcatalog-storage-stanza          HCatalog storage stanza for table
                                      creation

Accumulo arguments:
   --accumulo-batch-size       Batch size in bytes
   --accumulo-column-family    Sets the target column family for the import
   --accumulo-create-table     If specified, create missing Accumulo tables
   --accumulo-instance         Accumulo instance name.
   --accumulo-max-latency      Max write latency in milliseconds
   --accumulo-password         Accumulo password.
   --accumulo-row-key          Specifies which input column to use as the
                               row key
   --accumulo-table            Import to table in Accumulo
   --accumulo-user             Accumulo user name.
   --accumulo-visibility       Visibility token to be applied to all rows
                               imported
   --accumulo-zookeepers       Comma-separated list of zookeepers (host:port)

Code generation arguments:
   --bindir                    Output directory for compiled objects
   --class-name                Sets the generated class name. This overrides
                               --package-name. When combined with --jar-file,
                               sets the input class.
   --input-null-non-string     Input null non-string representation
   --input-null-string         Input null string representation
   --jar-file                  Disable code generation; use specified jar
   --map-column-java           Override mapping for specific columns to java
                               types
   --null-non-string           Null non-string representation
   --null-string               Null string representation
   --outdir                    Output directory for generated code
   --package-name              Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf      specify an application configuration file
-D             use value for given property
-fs       specify a namenode
-jt     specify a ResourceManager
-files     specify comma separated files to be copied to the map reduce cluster
-libjars     specify comma separated jar files to include in the classpath.
-archives     specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied after a '--' on the command line.
