Hive Reading Notes

Table of Contents

    • hive cli
      • Location of hive execution logs
      • Explicitly enabling local mode
      • Specifying the data warehouse directory
      • Viewing and using custom and system properties in hive
    • Data types
      • Default field delimiters
    • sql
      • database
      • table
        • Specifying delimiters
        • Partitioned tables
        • Altering tables
      • Data manipulation
        • Loading data
        • Static and dynamic partitions
        • Creating a table from query results
        • Exporting data
      • Querying data
        • Querying with regular expressions: rlike
        • The difference between order by and sort by
        • cast type conversion
        • Data sampling and bucketing
    • Views
        • Differences
    • Indexes
    • Tuning
      • explain
      • join optimization
      • Local mode
      • Parallel execution
      • Strict mode
      • Adjusting the number of mappers and reducers
      • JVM reuse
      • Indexes
      • Partitions
      • Speculative execution
      • Multiple group by in a single MR job
      • Virtual columns
    • Compression
      • Common compression formats
      • Configuration
      • Testing
    • Development
      • Logging
      • debug
    • Functions
      • Built-in functions
      • Listing functions
      • Calling functions
      • Table-generating functions (UDTF)
      • UDF
        • Standard types
        • Complex types
      • UDAF
      • UDTF
    • stream
      • Transforming with the built-in cat, cut, sed
      • Using custom scripts
      • wordcount
  • Custom record formats
    • Text format
    • sequenceFile
      • RCFile
      • Custom input format
    • Filtering logs with regular expressions
    • xml
    • json
  • Appendix 1: Built-in function reference
    • Mathematical functions
    • Collection functions
    • Type conversion functions
    • Date functions
    • Conditional functions
    • String functions
    • Aggregate functions
    • Table-generating functions
  • Appendix 3: Links
  • Appendix 4: Log sample

hive cli

Location of hive execution logs

Log configuration file: /conf/hive-log4j2.properties

property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}

You can query the variable's value in the hive cli:


hive> set system:java.io.tmpdir;
system:java.io.tmpdir=/tmp
--default log location for the root user
/tmp/root/hive.log

Explicitly enabling local mode

[Figure 1]
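
The figure did not survive the transfer; as a hedged sketch, the standard properties that control this are:

--let hive automatically run small queries locally instead of submitting an MR job
set hive.exec.mode.local.auto=true;
--optional thresholds that decide when a query counts as "small"
set hive.exec.mode.local.auto.inputbytes.max=134217728;
set hive.exec.mode.local.auto.input.files.max=4;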

Specifying the data warehouse directory

[Figure 2]
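
Again the figure is missing; the property involved is presumably hive.metastore.warehouse.dir (its default, /user/hive/warehouse, matches the paths that appear later in these notes). A sketch:

--check the current warehouse directory
set hive.metastore.warehouse.dir;
--override it for the session (it is normally set in hive-site.xml)
set hive.metastore.warehouse.dir=/user/hive/warehouse;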

Viewing and using custom and system properties in hive

--prints all properties
hive>set;
hive>set env:HOME;
hive> set hivevar:col_name=name;
hive> set col_name;
hive> create table test3(id int,${env:LOGNAME} string);
hive> create table test2(id int,${hivevar:col_name} string);
--add a configuration option at startup
--show the current db
hive --hiveconf hive.cli.print.current.db=true
hive (default)> set hiveconf: hive.cli.print.current.db;
hiveconf: hive.cli.print.current.db=true
--switch it back off
hive (default)> set hiveconf: hive.cli.print.current.db=false;
hive> 
--show system properties; java can both read and write system properties
hive> set system:user.name;
system:user.name=root
hive> set system:myname=hujiawei;
hive> set system:myname;
system:myname=hujiawei
#run a single command; -S is silent mode and suppresses messages like OK
hive -S -e 'show tables;'   
hive -S -e 'set'|grep warehouse;
#create the src table and load data
create table src(s String);
echo "one row">/tmp/myfile
hive -e "LOAD DATA LOCAL INPATH '/tmp/myfile' into table src"
#run a sql file
hive -f test.hql 
#run a file from inside the hive cli
source /root/test/test.hql;
#the hive cli automatically loads $HOME/.hiverc at startup; create one if it does not exist
set hive.cli.print.current.db=true;
set hive.exec.mode.local.auto=true;
#hive -i loads the specified file at startup

All of hive's default properties are explicitly listed in
/opt/install/hadoop/apache-hive-2.3.6-bin/conf/hive-default.xml.template

  • Run shell commands inside hive with !, e.g. !pwd, and hadoop commands with dfs, e.g. dfs -ls /

  • Comments in hive scripts use the same syntax as sql: --

Data types

[Figure 3]

[Figure 4]

create table employes(
	name string,
    salary float,
    subordinates array<string>,
    deductions map<String,float>,
    address struct<street:string,city:string,state:string,zip:INT>
);

Default field delimiters

[Figure 5]

create table employes(
	name string,
    salary float,
    subordinates array<string>,
    deductions map<String,float>,
    address struct<street:string,city:string,state:string,zip:INT>
)
row format delimited
fields terminated by '\001'
collection items terminated by '\002'
map keys terminated by '\003'
lines terminated by '\n'
stored as textfile;

sql

database

show databases like 't*';
create database test;
drop database test;
desc database test;
--in hive a database is a directory on hdfs ending in .db, and a table is a directory named after the table; the location can be specified at creation time
hive> create database test location '/test/test.db';
dfs -ls -R /test
--drop a database that still contains tables
drop database test cascade;
--add a database comment
create database test comment 'this is a test db';
--add database properties
create database test comment 'this is a test db' with dbproperties('creator'='hujiawie','date'='2019-12-10');
--view the properties
desc database extended  test;
--modify database properties: new ones can be added, existing ones cannot be deleted
alter database test set dbproperties('creators'='laohu');

table

create table employes(
	name string,
    salary float,
    subordinates array<string>,
    deductions map<String,float>,
    address struct<street:string,city:string,state:string,zip:INT>
)location '/test/employes' 
--copy the table definition
hive> create table employees  like employes;
hive> show tables in mydb;

Specifying delimiters

hive> create table t1(
    >     id      int
    >    ,name    string
    >    ,hobby   array<string>
    >    ,add     map<String,string>
    > )
    > partitioned by (pt_d string)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > ;

Partitioned tables

hive> create table t1(
    >     id      int
    >    ,name    string
    >    ,hobby   array<string>
    >    ,add     map<String,string>
    > )
    > partitioned by (pt_d string)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > ;
--data to load (contents of /root/test/myfile):
1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin

load data local inpath '/root/test/myfile' overwrite into table t1 partition ( pt_d = '201701');
--load data for another partition; after both loads, select * from t1 returns:
1   xiaoming    ["book","TV","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  000000
2   lilei   ["book","code"] {"nanjing":"jiangning","taiwan":"taibei"}   000000
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  000000
1   xiaoming    ["book","TV","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  201701
2   lilei   ["book","code"] {"nanjing":"jiangning","taiwan":"taibei"}   201701
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  201701

load data local inpath '/root/test/myfile2' overwrite into table t1 partition ( pt_d = '000000');
--inspect the directories and files on dfs
hive> dfs -ls -R /user/hive/warehouse/mydb.db;
drwxr-xr-x   - root supergroup          0 2019-12-11 10:56 /user/hive/warehouse/mydb.db/employes
drwxr-xr-x   - root supergroup          0 2019-12-11 11:04 /user/hive/warehouse/mydb.db/t1
drwxr-xr-x   - root supergroup          0 2019-12-11 11:04 /user/hive/warehouse/mydb.db/t1/pt_d=000000
-rwxr-xr-x   1 root supergroup        474 2019-12-11 11:04 /user/hive/warehouse/mydb.db/t1/pt_d=000000/myfile2
drwxr-xr-x   - root supergroup          0 2019-12-11 11:02 /user/hive/warehouse/mydb.db/t1/pt_d=201701
-rwxr-xr-x   1 root supergroup        147 2019-12-11 11:02 /user/hive/warehouse/mydb.db/t1/pt_d=201701/myfile
--adding a partition creates the corresponding directory
hive> alter table t1 add partition(pt_d ='3333');
--dropping a partition deletes the corresponding files (for an external table the files are kept and the partition can be restored with msck repair table table_name)
alter table test1 drop partition (pt_d =201701);
--note that a partition is really just a column treated as an index: laying the data out in directories improves performance
hive> desc extended t1;
id                      int                                         
name                    string                                      
hobby                   array<string>                               
add                     map<string,string>                          
pt_d                    string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
pt_d                    string  
#list the partitions
hive> show partitions t1;
pt_d=000000
pt_d=3333

Altering tables

--rename
hive> alter table employes rename to employees;
--change a partition's location (had no effect in this test)
hive> alter table t1 partition(pt_d=3333) set location  "hdfs://localhost:9000/user/hive/warehouse/mydb.db/t1/pt_d=4444";
--change a column: rename it, add a comment, and move it after another column (use first to move it to the first position);
--note that reordering requires the two columns to have the same type
hive> create table src (c1 string,c2 string);
hive> alter table src change column c1 c3 string comment 'test' after c2;
--add columns
hive> alter table src add columns(c4 string comment 'column4');
--drop columns by replacing the whole column list
hive> alter table src replace columns(cl1 string,cl2 string);
--set table properties
hive> alter table src set tblproperties('name'='hujiawei');
--show table properties
hive> show tblproperties src;

Data manipulation

Loading data

[Figure 6]

  • local means the file is copied from the local filesystem; without local the file is moved within hdfs (it cannot cross clusters)
  • overwrite replaces the table's existing data; without it the data is appended (a hedged example follows below)
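
A minimal sketch combining the two options (paths are illustrative; src is the table created earlier):

--copy from the local filesystem, replacing whatever the table currently holds
load data local inpath '/tmp/myfile' overwrite into table src;
--move a file that already sits on hdfs, appending to the existing data
load data inpath '/test/myfile' into table src;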

Static and dynamic partitions

  • Static partitioning, method 1:

[Figure 7]

  • Static partitioning, method 2:

[Figure 8]

  • Dynamic partitioning

[Figure 9]

  • Dynamic partitioning configuration parameters (a hedged sketch of both styles follows the figures)

    [Figure 10]
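
Since the figures are missing, here is a hedged sketch of both styles against the t1 table defined above (partition values are illustrative):

--static partition: the partition value is written literally in the statement
insert overwrite table t1 partition (pt_d='201702')
select id, name, hobby, add from t1 where pt_d='201701';
--dynamic partition: the partition value is taken from the last select column
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
insert overwrite table t1 partition (pt_d)
select id, name, hobby, add, pt_d from t1;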

Creating a table from query results

create table test as select c1,c2 from src;
--or copy just the schema with like
hive> create table test2 like test;

Note that with this kind of create, if the src table was itself loaded from external files, src's data files get moved to the target table's location:

exec.FileSinkOperator: Moving tmp dir: hdfs://localhost:9000/user/hive/warehouse/src/.hive-staging_hive_2019-12-10_16-11-45_214_2063272
320733291437-1/_tmp.-ext-10002 to: hdfs://localhost:9000/user/hive/warehouse/src/.hive-staging_hive_2019-12-10_16-11-45_214_2063272320733291437-1/-ext-10002

Exporting data

--method 1
hive> from test t
    > insert overwrite local directory '/root/test/'
    > select * ;
hive> ! ls /root/test;
000000_0
hive> ! cat /root/test/000000_0;
20191212
20180112
20190212
20190312
20190712
--method 2
hive> from test t
    > insert overwrite local directory '/root/test/'
    > select * ;
--method 3: copy the files directly out of hdfs
 # copy to the local filesystem
 fs -get 'hdfs://localhost:9000/user/hive/warehouse/test/data' .
 #copy to another hdfs directory
  hs -cp 'hdfs://localhost:9000/user/hive/warehouse/test/data' '/test/'
  

Querying data

Querying with regular expressions: rlike

--hive regular expressions use java regex syntax
select * from src a where a.s rlike '^a.*';

The difference between order by and sort by

--order by works like order by in oracle: a global sort, which is expensive
select * from test a order by a.id1;
--sort by sorts each reducer's output, but the combined output of multiple reducers is not globally ordered
hive> select * from test a sort by a.id1;
--distribute by sends rows with the same value of the given column to the same reducer; it is typically combined with sort by
hive> select * from test a distribute by a.id1 sort by a.id1,a.id2;
--cluster by is shorthand for distribute by plus sort by on the same column
hive> select * from test a cluster by  a.id1;

cast type conversion

hive> select cast(a.id1 as float) from test a ;

Data sampling and bucketing

--randomly split the test table into 2 buckets and take one of them
hive> select * from test  tablesample(bucket 1 out of 2 on rand()) a;
--randomly split into 3 buckets and take the second
hive> select * from test tablesample(bucket 2 out of 3 on rand());
--bucket on a column's values instead of randomly
hive> select * from test tablesample(bucket 2 out of 3 on id1);
--note that bucketing does not split the data evenly, so buckets may differ in size

Views

Differences

  1. Logical views vs. materialized views
    hive does not support materialized views; a view simply merges its defining statement into the outer query when hive builds the query plan, so it is only a logical view whose purpose is to simplify queries

  2. No column-level authorization
    in oracle you can expose selected columns of the ta_staff table through a view, so users without query rights on the table itself can still see those columns; hive does not support this, because a user must have query permission (file access) on the underlying table in order to use the view

    hive views can, however, restrict rows through a where clause (a hedged view example follows the query output below)

    hive> select * from test;
    OK
    1       2       3
    88      77      66
    33      22      11
    1       2       3
    88      77      66
    33      22      11
    Time taken: 0.123 seconds, Fetched: 6 row(s)
    hive> 
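
    As a hedged sketch of row-level restriction through a view (column names follow the test table used in the tuning section):

    create view test_small as select id1, id2, id3 from test where id1 < 50;
    select * from test_small;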
    

    For hive users, roles, and groups, see csdn1

Indexes

  1. Create an index

    hive> create index test1_index on table test(id1) as 
        > 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
        > with deferred rebuild 
        > in table test_index_table;
    --bitmap indexes suit columns with few distinct values
    create index index_test_2 on table test(id2) as 'BITMAP' with deferred rebuild in table test_index2_table ;
    
  2. Drop an index

    hive> drop index test1_index on test;
    
  3. Show indexes

    hive> show formatted index on test;
    
  4. Rebuild an index

    alter index test1_index on test rebuild;
    
  5. Custom index handlers
    Implement hive's index handler interface, package it, add the jar, and specify the class name with as when creating the index

    See cwiki2 for details

Tuning

explain

Shows exactly how a query statement is translated into map reduce stages

 explain select sum(id1) from test;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test
            Statistics: Num rows: 6 Data size: 42 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id1 (type: int)
              outputColumnNames: id1
              Statistics: Num rows: 6 Data size: 42 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(id1)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.31 seconds, Fetched: 44 row(s)

You can also use explain extended to get more detailed information:

explain extended select sum(id1) from test;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test
            Statistics: Num rows: 6 Data size: 42 Basic stats: COMPLETE Column stats: NONE
            GatherStats: false
            Select Operator
              expressions: id1 (type: int)
              outputColumnNames: id1
              Statistics: Num rows: 6 Data size: 42 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(id1)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  null sort order: 
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  tag: -1
                  value expressions: _col0 (type: bigint)
                  auto parallelism: false
      Path -> Alias:
        hdfs://localhost:9000/user/hive/warehouse/test [test]
      Path -> Partition:
        hdfs://localhost:9000/user/hive/warehouse/test 
          Partition
            base file name: test
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
              bucket_count -1
              column.name.delimiter ,
              columns id1,id2,id3
              columns.comments 
              columns.types int:int:int
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              location hdfs://localhost:9000/user/hive/warehouse/test
              name default.test
              numFiles 4
              numRows 6
              rawDataSize 42
              serialization.ddl struct test { i32 id1, i32 id2, i32 id3}
              serialization.format 1
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 48
              transient_lastDdlTime 1576480718
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              properties:
                COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
                bucket_count -1
                column.name.delimiter ,
                columns id1,id2,id3
                columns.comments 
                columns.types int:int:int
                file.inputformat org.apache.hadoop.mapred.TextInputFormat
                file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                location hdfs://localhost:9000/user/hive/warehouse/test
                name default.test
                numFiles 4
                numRows 6
                rawDataSize 42
                serialization.ddl struct test { i32 id1, i32 id2, i32 id3}
                serialization.format 1
                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                totalSize 48
                transient_lastDdlTime 1576480718
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.test
            name: default.test
      Truncated Path -> Alias:
        /test [test]
      Needs Tagging: false
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            GlobalTableId: 0
            directory: hdfs://localhost:9000/tmp/hive/root/8d35f95d-893e-40ee-b831-6177341c7acb/hive_2019-12-17_11-12-25_359_70780381862038030-1/-mr-10001/.hive-staging_hive_2019-12-17_11-12-25_359_70780381862038030-1/-ext-10002
            NumFilesPerFileSink: 1
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Stats Publishing Key Prefix: hdfs://localhost:9000/tmp/hive/root/8d35f95d-893e-40ee-b831-6177341c7acb/hive_2019-12-17_11-12-25_359_70780381862038030-1/-mr-10001/.hive-staging_hive_2019-12-17_11-12-25_359_70780381862038030-1/-ext-10002/
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                properties:
                  columns _col0
                  columns.types bigint
                  escape.delim \
                  hive.serialization.extend.additional.nesting.levels true
                  serialization.escape.crlf true
                  serialization.format 1
                  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            TotalFiles: 1
            GatherStats: false
            MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.324 seconds, Fetched: 119 row(s)

join optimization

Put the large table on the right side of the join and the small table on the left.

Reason 1: the small table can be cached in memory and each record of the large table is matched against it.
The deeper reason: every duplicate join key in the table written on the left side costs one extra processing step in the underlying engine.
See csdn3 for details.
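
A hedged sketch of the two usual ways to get a map-side join when one table is small (standard Hive settings and hint; table names are illustrative):

--let hive convert the join automatically when the small side is below the threshold
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;
--or request it explicitly with a hint
select /*+ MAPJOIN(small_t) */ b.id, s.name
from big_t b join small_t s on b.id = s.id;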

Local mode

Only worthwhile for small data sets; it has little practical value for real workloads

Parallel execution

Stages of a job that are independent of each other can run in parallel, improving overall speed
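
A sketch of the settings involved (standard Hive properties):

--run independent stages of the same query in parallel
set hive.exec.parallel=true;
--how many stages may run at the same time
set hive.exec.parallel.thread.number=8;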

Strict mode

In strict mode (a sketch of the setting follows this list):

  1. Queries on a partitioned table must restrict to a partition in the where clause

  2. order by must be combined with limit

  3. Cartesian products are rejected
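
The classic switch is a single property (Hive 2.x also splits it into finer-grained hive.strict.checks.* settings); a sketch:

set hive.mapred.mode=strict;
--back to the default
set hive.mapred.mode=nonstrict;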

Adjusting the number of mappers and reducers

Tune according to the number and size of the input and output files
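
A sketch of the knobs usually touched (standard Hive/Hadoop properties):

--bytes of input handled per reducer; hive derives the reducer count from this
set hive.exec.reducers.bytes.per.reducer=268435456;
--hard upper bound on the number of reducers
set hive.exec.reducers.max=1009;
--or pin the reducer count explicitly for the session
set mapred.reduce.tasks=4;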

JVM reuse

Reuse avoids the overhead of starting up and tearing down a JVM for every new task; the drawback is that the longest-running task keeps its slot occupied for a long time, which can block other tasks
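
The setting belongs to the MapReduce layer (old MRv1 name shown; newer Hadoop calls it mapreduce.job.jvm.numtasks); a sketch:

--reuse each JVM for up to 10 tasks; -1 means unlimited reuse
set mapred.job.reuse.jvm.num.tasks=10;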

Indexes

As in oracle, a filter on an indexed column can noticeably speed up queries, but maintaining the index costs time and requires rebuilds

Partitions

The speedup is obvious, but too many partitions create too many files for the namenode and blow up its memory

Speculative execution

In a distributed cluster, program bugs (including bugs in Hadoop itself), load imbalance, or uneven resource distribution can make tasks of the same job run at very different speeds; some tasks may be noticeably slower than the rest (for example, one task of a job is only 50% done while all the others have finished), and these stragglers drag down the overall progress of the job. To avoid this, Hadoop uses speculative execution: based on certain rules it identifies the straggling tasks and launches a backup task for each of them, letting the backup process the same data in parallel with the original and taking the result of whichever copy finishes successfully first.

If the input data is so large that map or reduce tasks necessarily run for a long time, the waste caused by enabling speculative execution can be very large.
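
The switches, as a sketch (Hadoop and Hive property names):

set mapreduce.map.speculative=true;
set mapreduce.reduce.speculative=true;
--hive's own switch for speculative execution of reduce tasks
set hive.mapred.reduce.tasks.speculative.execution=true;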

Multiple group by in a single MR job

A setting controls whether several group by operations that share grouping keys can be assembled into a single map reduce job
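
A hedged sketch; the property name has changed across Hive versions (hive.multigroupby.singlemr in older releases, hive.multigroupby.singlereducer later):

--combine group by operations that share grouping keys into a single job
set hive.multigroupby.singlereducer=true;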

Virtual columns

Used for diagnosing query results; some of them are enabled through configuration parameters
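
A sketch using the two virtual columns that are always available plus the one gated behind a property (the test table comes from the tuning examples above):

select INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, id1 from test;
--row offsets need an extra switch
set hive.exec.rowoffset=true;
select ROW__OFFSET__INSIDE__BLOCK, id1 from test;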

Compression

Common compression formats

Compression saves disk space and speeds up file transfer, at the cost of CPU time for compressing and decompressing

  1. For large volumes of data that are rarely processed, gzip is the usual choice (highest compression ratio, slowest compression and decompression)

  2. For smaller data sets that are processed frequently, lzo or snappy is the usual choice

Configuration

  1. Enable hadoop compression codecs

    vi /opt/install/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml
    

    Add:

    <property>
    	<name>io.compression.codecs</name>
    	<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
    
    --after this change, the available codecs show up in hive without restarting hadoop
    hive> set io.compression.codecs;
    io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec
    
  2. Enable intermediate compression
    [Figure 11]

  3. Enable compression of the final output
    [Figure 12]

  4. Use sequence files

    [Figure 13]

Testing

  1. Using intermediate compression

    hive> set hive.exec.compress.intermediate=true;
    hive> create table interemediate_com_om row format delimited fields terminated by '\t' as select * from test;
    Automatically selecting local only mode for query
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20191218101653_df7baa3b-9570-40a7-b480-be91907bfb1e
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Job running in-process (local Hadoop)
    2019-12-18 10:17:03,294 Stage-1 map = 0%,  reduce = 0%
    Ended Job = job_local1362494287_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2019-12-18_10-16-53_073_1921170228225164888-1/-ext-10002
    Moving data to directory hdfs://localhost:9000/user/hive/warehouse/interemediate_com_om
    MapReduce Jobs Launched: 
    Stage-Stage-1:  HDFS Read: 48 HDFS Write: 132 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    Time taken: 11.773 seconds
     dfs -ls hdfs://localhost:9000/user/hive/warehouse/interemediate_com_om;
    Found 1 items
    -rwxr-xr-x   1 root supergroup         48 2019-12-18 10:17 hdfs://localhost:9000/user/hive/warehouse/interemediate_com_om/000000_0
    hive> dfs -cat hdfs://localhost:9000/user/hive/warehouse/interemediate_com_om/*;
    1       2       3
    88      77      66
    33      22      11
    1       2       3
    88      77      66
    33      22      11
    --the final result is still in text format
    
  2. Compressing the output with gzip
    Configure in hadoop:

    <property>
           <name>mapred.output.compress</name>
           <value>true</value>
    </property>
    <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.output.compression.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>
    

    Run in hive:

    --check the compression codec
    hive> set mapred.output.compression.codec;
    mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    --check whether output files are compressed
    hive> set hive.exec.compress.output;
    hive.exec.compress.output=false
    hive> set hive.exec.compress.output=true;
    --run the statement
    hive> create table final_com_on_gz2 row format delimited fields terminated by '\t' as select * from test;
    Automatically selecting local only mode for query
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20191218111200_4d4e7160-4f68-451c-86cf-9cb5c8b54129
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Job running in-process (local Hadoop)
    2019-12-18 11:12:12,231 Stage-1 map = 100%,  reduce = 0%
    Ended Job = job_local1413027372_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2019-12-18_11-12-00_983_5336074740355100025-1/-ext-10002
    Moving data to directory hdfs://localhost:9000/user/hive/warehouse/final_com_on_gz2
    MapReduce Jobs Launched: 
    Stage-Stage-1:  HDFS Read: 48 HDFS Write: 127 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    Time taken: 12.716 seconds
    hive> dfs -ls hdfs://localhost:9000/user/hive/warehouse/final_com_on_gz2;
    Found 1 items
    -rwxr-xr-x   1 root supergroup         47 2019-12-18 11:12 hdfs://localhost:9000/user/hive/warehouse/final_com_on_gz2/000000_0.gz
    --the .gz file can be copied to the local filesystem with hadoop fs -get and inspected with zcat
    
    
  3. Writing the output as a sequencefile

    --check the compression codec
    hive> set mapred.output.compression.codec;
    mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    hive> set hive.exec.compress.output;
    hive.exec.compress.output=false
    --compress the output
    hive> set hive.exec.compress.output=true;
    hive> set mapred.output.compression.type;
    mapred.output.compression.type=RECORD
    --use block-level compression for the sequencefile
    hive> set mapred.output.compression.type=BLOCK;
    --run the statement
    hive> create table final_comp_on_gz_seq row format delimited fields terminated by '\t' stored as sequencefile as select * from test;
    Automatically selecting local only mode for query
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20191223091402_98a1b33e-d0d1-4b5b-93a7-13b108e9ae59
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Job running in-process (local Hadoop)
    2019-12-23 09:14:13,261 Stage-1 map = 100%,  reduce = 0%
    Ended Job = job_local1832209723_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2019-12-23_09-14-02_976_8408522203594088276-1/-ext-10002
    Moving data to directory hdfs://localhost:9000/user/hive/warehouse/final_comp_on_gz_seq
    MapReduce Jobs Launched: 
    Stage-Stage-1:  HDFS Read: 48 HDFS Write: 355 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    Time taken: 11.52 seconds
    --check the files
    hive> dfs -ls -R /user/hive/warehouse/final_comp_on_gz_seq
        > ;
    -rwxr-xr-x   1 root supergroup        271 2019-12-23 09:14 /user/hive/warehouse/final_comp_on_gz_seq/000000_0
    --the file is compressed binary; cat cannot display it
    hive> dfs -cat /user/hive/warehouse/final_comp_on_gz_seq/*;
    SEQorg.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Textorg.apache.hadoop.io.compress.GzipCodeckQ=kQ=
    --use -text to read the sequencefile
    hive> dfs -text /user/hive/warehouse/final_comp_on_gz_seq/*;
            1       2       3
            88      77      66
            33      22      11
            1       2       3
            88      77      66
            33      22      11
    
  4. Archiving
    Archiving merges many files into one, reducing pressure on the namenode

     --create a test table with a partition
     create table hive_text(line string) partitioned by (folder string);
     --load data
     hive> ! ls ${env:HIVE_HOME};
    bin
    binary-package-licenses
    conf
    derby.log
    examples
    hcatalog
    jdbc
    lib
    LICENSE
    metastore_db
    NOTICE
    RELEASE_NOTES.txt
    scripts
    hive> alter table hive_text add partition (folder='docs');
    hive> load  data local inpath '${env:HIVE_HOME}/NOTICE' into table hive_text partition (folder='docs');
    Loading data to table default.hive_text partition (folder=docs)
    OK
    Time taken: 1.158 seconds
    hive> load  data local inpath '${env:HIVE_HOME}/RELEASE_NOTES.txt' into table hive_text partition (folder='docs');
    Loading data to table default.hive_text partition (folder=docs)
    OK
    Time taken: 0.931 seconds
    hive> select * from hive_text;
    --enable archiving
    set hive.archive.enabled=true;
    --archive the partition directory
    hive>  alter table hive_text archive partition (folder='docs');
    intermediate.archived is hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs_INTERMEDIATE_ARCHIVED
    intermediate.original is hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs_INTERMEDIATE_ORIGINAL
    Creating data.har for hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs
    in hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs/.hive-staging_hive_2019-12-23_10-08-32_249_9145301772833183402-1/-ext-10000/partlevel
    Please wait... (this may take a while)
    Moving hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs/.hive-staging_hive_2019-12-23_10-08-32_249_9145301772833183402-1/-ext-10000/partlevel to hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs_INTERMEDIATE_ARCHIVED
    Moving hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs to hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs_INTERMEDIATE_ORIGINAL
    Moving hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs_INTERMEDIATE_ARCHIVED to hdfs://localhost:9000/user/hive/warehouse/hive_text/folder=docs
    OK
    Time taken: 25.028 seconds
    --if archiving fails, check the hive execution log; this error may appear:
    2019-12-23T10:03:40,679 ERROR [07dc3327-de2f-4586-9b96-229aad0d6bc3 main] exec.DDLTask: java.lang.NoClassDefFoundError: org/apache/hadoop/tools/HadoopArchives
    --this happens because hadoop's archive jar must be copied into hive's lib directory; for this hadoop install the jar is:
    
    cd $HADOOP_HOME;
    find . -name hadoop*archive*.jar;
    ./share/hadoop/tools/lib/hadoop-archives-2.7.1.jar
    cp ./share/hadoop/tools/lib/hadoop-archives-2.7.1.jar $HIVE_HOME/lib
    

Development

Logging

  1. Start the hive cli in debug mode

     hive -hiveconf hive.root.logger=DEBUG,console
    

debug

ecs-d0b0:~ # hive --help --debug

Allows to debug Hive by connecting to it via JDI API

Usage: hive --debug[:comma-separated parameters list]

Parameters:

recursive=<y|n>             Should child JVMs also be started in debug mode. Default: y
port=<port_number>          Port on which main JVM listens for debug connection. Default: 8000
mainSuspend=<y|n>           Should main JVM wait with execution for the debugger to connect. Default: y
childSuspend=<y|n>          Should child JVMs wait with execution for the debugger to connect. Default: n
swapSuspend                 Swaps suspend options between main an child JVMs

Functions

Built-in functions

See [Appendix 1](#附录1 内置函数大全)

Listing functions

--list the available functions
hive> show functions;
--show details for a function
hive> desc function concat;
hive> desc function  extended  concat;

Calling functions

--without referencing a table
hive> select concat('1','2') ;
--referencing a table
hive> select concat('1','2') from src;

Table-generating functions (UDTF)

--create a table
hive> create table test(id int,arr array<int>);
hive> insert into test select 1,array(11,12,13);
hive> insert into test select 2,array(21,22,23);
--the array and explode functions
hive> select explode(array(1,2,3));
OK
1
2
3
--explode cannot be combined with other select columns directly
hive> select id ,explode(arr) from test;
--FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
--it works together with lateral view
hive> select id ,ex_arr from test lateral view explode(arr) subView as ex_arr ;
1       11
1       12
1       13
2       21
2       22
2       23

UDF

Standard types

  1. Create a maven project in idea

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>parent</artifactId>
        <groupId>com.hujiawei666</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>myUDF</artifactId>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.1</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-cli</artifactId>
            <version>2.3.6</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.6</version>
        </dependency>

    </dependencies>

    <build>
        <finalName>myUDF</finalName>
        <!-- plugin configuration not preserved -->
    </build>
</project>
  2. Write the java class
package com.hujiawei666.udf;


import org.apache.hadoop.hive.ql.exec.UDF;

import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * @author hujw
 * date: 2019/12/13 11:17
 **/
public class IsVocation extends UDF {
     
    private SimpleDateFormat df;
    private int minLength = 6;
    private int[] volcation_mon_arr;


    public IsVocation() {
     
        df = new SimpleDateFormat("yyyyMM");
        volcation_mon_arr = new int[]{
     
                1, 2, 7, 8
        };

    }

    public boolean evaluate(String dateString) {
     
        int month = getMonth(dateString);
        if (month == 0) {
     
            return false;
        } else {
     
            for (int i = 0; i < volcation_mon_arr.length; i++) {
     
                if (volcation_mon_arr[i] == month) {
     
                    return true;
                }
            }
        }
        return false;
    }

    private int getMonth(String dateString) {
     
        if (dateString == null || dateString.length() < minLength) {
     
            return 0;
        }
        try {
     
            String monStr = dateString.substring(0, 6);
            Date mon = df.parse(monStr);
            if (mon != null) {
     
                int month = mon.getMonth() + 1;
                return month;
            }
        } catch (Exception e) {
     
            System.out.println("dateString " + dateString + "is not a valid format");
            e.printStackTrace();
        }
        return 0;
    }

    public static void main(String[] args) {
     
        System.out.println(new IsVocation().evaluate("20191207"));
        System.out.println(new IsVocation().evaluate("20190135"));
        System.out.println(new IsVocation().evaluate("20190255"));
        System.out.println(new IsVocation().evaluate("20190211"));
        System.out.println(new IsVocation().evaluate("20190311"));
    }
}

  3. Project directory structure

    [Figure 14]

    4. After building the jar with maven package, upload it to the server

    5. Call it from hive

--create a sample table
hive> create table test(dt string);
hive> load data local inpath '/root/test/data' into table test;
ecs-d0b0:~ # cat test/data
20191212
20180112
20190212
20190312
20190712
hive> select * from test;
20191212
20180112
20190212
20190312
20190712
hive> add jar /root/myUdf.jar;
Added [/root/myUdf.jar] to class path
Added resources: [/root/myUdf.jar]
hive> create function is_vocation as 'com.hujiawei666.udf.IsVocation';
hive> select a.dt,is_vocation(a.dt) from test a;
20191212        false
20180112        true
20190212        true
20190312        false
20190712        true
  6. Registering the function from a jar stored on hdfs
ecs-d0b0:~ # hs -put isVocation.jar /user/ 
create function default.is_vocation3 as 'com.ai.ctc.zjjs.udf.IsVocation' using jar 'hdfs://localhost:9000/user/isVocation.jar';
   

Complex types

Writing the function

package com.hujiawei666.udf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

/**
 * @author hujw
 * date: 2019/12/23 10:45
 **/
public class MyNvl extends GenericUDF {
     
    private GenericUDFUtils.ReturnObjectInspectorResolver returnOIResolver;
    private ObjectInspector[] argumentsIOs;

    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
     
        argumentsIOs = objectInspectors;
        if (objectInspectors.length != 2) {
     
            throw new UDFArgumentException("the operator 'NVL' accepts 2 arguments");
        }
        returnOIResolver = new GenericUDFUtils.ReturnObjectInspectorResolver(true);
        if (!(returnOIResolver.update(argumentsIOs[0]) &&
                returnOIResolver.update(argumentsIOs[1])
        )) {
     
            throw new UDFArgumentTypeException(2, "the 1st and 2nd args of function should be the same type," +
                    "but they are different:" + argumentsIOs[0].getTypeName() +
                    " and " + argumentsIOs[1].getTypeName());
        }
        return returnOIResolver.get();
    }

    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
     
        Object retVal = returnOIResolver.convertIfNecessary(deferredObjects[0].get(), argumentsIOs[0]);
        if (retVal == null) {
     
            retVal = returnOIResolver.convertIfNecessary(deferredObjects[1].get(), argumentsIOs[1]);
        }
        return retVal;
    }

    @Override
    public String getDisplayString(String[] strings) {
     
        StringBuffer sb=new StringBuffer();
        sb.append("if");
        sb.append(strings[0]);
        sb.append("is null returns");
        sb.append(strings[1]);
        return sb.toString();
    }
}

Create the function

add jar /root/test/myUdf.jar;
create function mynvl as 'com.hujiawei666.udf.MyNvl';
hive> select mynvl(1,2) as c1,mynvl(null,2) as c2 ,mynvl('','aa') as c3,mynvl(null,'b');
OK
1       2               b
Time taken: 0.062 seconds, Fetched: 1 row(s)

UDAF

User-defined aggregation functions

package com.hujiawei666.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.parse.SemanticException;
import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFParameterInfo;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.slf4j.LoggerFactory;

/**
 * @author hujw
 * date: 2019/12/24 9:26
 **/
@Description(name = "collect",value = "_FUNC_(X) -return a list of objects." +
        "CAUTION will easily OOM on large data sets")
public class GenericUDAFCollect extends AbstractGenericUDAFResolver {
     
    static final org.slf4j.Logger LOGGER = LoggerFactory.getLogger(GenericUDAFCollect.class);

    public GenericUDAFCollect() {
     
    }

    @Override
    public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo info) throws SemanticException {
     
        if (info.isAllColumns()) {
     
            throw new SemanticException("The specified syntax for UDAF invocation is invalid.");
        } else {
     
            return this.getEvaluator(info.getParameters());
        }
    }

    @Override
    public GenericUDAFEvaluator getEvaluator(TypeInfo[] info) throws SemanticException {
     
        if (info.length != 1) {
     
            throw new UDFArgumentTypeException(info.length - 1, "Exactly one argument is expected");
        }
        if (info[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
     
            throw new UDFArgumentTypeException(0, "Only primitive  type arguments are accept but" +
                    info[0].getTypeName() + "was passed as parameter 1");
        }
        return new GenericUDAFMkListEvaluator();
    }
}

package com.hujiawei666.udf;

import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator;
import org.apache.hadoop.hive.serde2.objectinspector.*;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * @author hujw
 * date: 2019/12/24 9:38
 **/
public class GenericUDAFMkListEvaluator extends GenericUDAFMkCollectionEvaluator {
     
    private PrimitiveObjectInspector inputOI;
    private StandardListObjectInspector loi;
    private StandardListObjectInspector internalMergeOI;

    @Override
    public ObjectInspector init(Mode m, ObjectInspector[] parameters) throws HiveException {
     
        super.init(m, parameters);
        if (m == Mode.PARTIAL1) {
     
            inputOI = (PrimitiveObjectInspector) parameters[0];
            return ObjectInspectorFactory.
                    getStandardListObjectInspector(
                            ObjectInspectorUtils.getStandardObjectInspector(inputOI));
        } else {
     
            if (!(parameters[0] instanceof StandardListObjectInspector)) {
     
                inputOI = (PrimitiveObjectInspector) ObjectInspectorUtils.
                        getStandardObjectInspector(parameters[0]);
                return ObjectInspectorFactory.getStandardListObjectInspector(inputOI);
            } else {
     
                internalMergeOI = (StandardListObjectInspector) parameters[0];
                inputOI = (PrimitiveObjectInspector) internalMergeOI.getListElementObjectInspector();
                loi = (StandardListObjectInspector) ObjectInspectorUtils.getStandardObjectInspector(internalMergeOI);
                return loi;
            }
        }
    }

    static class MkArrayAggregationBuffer implements AggregationBuffer {
     
        List<Object> container;
    }
    @Override
    public void reset(AggregationBuffer agg) throws HiveException {
     
        ((MkArrayAggregationBuffer) agg).container = new ArrayList<>();
    }
    @Override
    public AggregationBuffer getNewAggregationBuffer() throws HiveException {
     
        MkArrayAggregationBuffer ret=new MkArrayAggregationBuffer();
        reset(ret);
        return ret;
    }
    //mapside
    @Override
    public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
     
        assert parameters.length == 1;

        Object p = parameters[0];
        if (p != null) {
     
            MkArrayAggregationBuffer myagg = (MkArrayAggregationBuffer)agg;
            this.putIntoList(p, myagg);
        }
    }
    @Override
    public Object terminatePartial(AggregationBuffer agg) throws HiveException {
     
        MkArrayAggregationBuffer myagg = (MkArrayAggregationBuffer)agg;
        List<Object> ret = new ArrayList(myagg.container.size());
        ret.addAll(myagg.container);
        return ret;
    }
    @Override
    public void merge(AggregationBuffer agg, Object partial) throws HiveException {
     
        MkArrayAggregationBuffer myagg = (MkArrayAggregationBuffer)agg;
        List<Object> partialResult = (ArrayList)this.internalMergeOI.getList(partial);
        if (partialResult != null) {
     
            Iterator var5 = partialResult.iterator();

            while(var5.hasNext()) {
     
                Object i = var5.next();
                putIntoList(i, myagg);
            }
        }

    }
    @Override
    public Object terminate(AggregationBuffer agg) throws HiveException {
     
        MkArrayAggregationBuffer myagg = (MkArrayAggregationBuffer)agg;
        List<Object> ret = new ArrayList(myagg.container.size());
        ret.addAll(myagg.container);
        return ret;
    }

    private void putIntoList(Object p, MkArrayAggregationBuffer agg) {
     
        Object pCopy = ObjectInspectorUtils.copyToStandardObject(p, this.inputOI);
        agg.container.add(pCopy);
    }
}

Run it

hive> add jar /root/test/myUdf.jar;
hive> create function collect as 'com.hujiawei666.udf.GenericUDAFCollect';
hive> create table collection_test(name string,age int) row format delimited fields terminated by ' ';
ecs-d0b0:~/test # cat afile.txt 
hujiawei 11
hujiawei 22
wangning 33
wangning 44
hive> load data local inpath '/root/test/afile.txt' into table collection_test;
hive> select * from collection_test;
OK
hujiawei        11
hujiawei        22
wangning        33
wangning        44
hive> select collect(name) from collection_test;
["hujiawei","hujiawei","wangning","wangning"]
hive> select concat_ws(',',collect(name)) from collection_test;
hujiawei,hujiawei,wangning,wangning
hive> desc function concat_ws;
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.
--combined, this reproduces the effect of mysql's group_concat
hive> select name,concat_ws(',',collect(cast(age as string))) from collection_test group by name;
hujiawei        11,22
wangning        33,44

UDTF

Table-generating functions produce multiple columns and multiple rows.

  1. A simple example

    package com.hujiawei666.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
    import org.apache.hadoop.io.IntWritable;
    
    import java.util.ArrayList;
    
    /**
     * @author hujw
     * date: 2019/12/25 9:24
     **/
    public class GenericUDTFFor extends GenericUDTF {
           
        IntWritable start;
        IntWritable end;
        IntWritable inc;
        Object[] forwardObj=null;
    
        @Override
        public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
           
            start= ((WritableConstantIntObjectInspector)argOIs[0]).getWritableConstantValue();
            end= ((WritableConstantIntObjectInspector)argOIs[1]).getWritableConstantValue();
            if (argOIs.length == 3) {
           
                inc = ((WritableConstantIntObjectInspector) argOIs[2]).getWritableConstantValue();
            } else {
           
                inc = new IntWritable(1);
            }
            this.forwardObj = new Object[1];
            ArrayList<String> fieldNames = new ArrayList<>();
            ArrayList<ObjectInspector> fieldOIs = new ArrayList<>();
            fieldNames.add("col0");
            fieldOIs.add(
                    PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(
                            PrimitiveObjectInspector.PrimitiveCategory.INT)
            );
            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }
        @Override
        public void process(Object[] objects) throws HiveException {
           
            for (int i = start.get(); i < end.get(); i = i + inc.get()) {
           
                this.forwardObj[0] = new Integer(i);
                //forward() comes from the parent class; it emits the next output row
                forward(forwardObj);
            }
        }
    
        @Override
        public void close() throws HiveException {
           
    
        }
    }
    
    
    hive> add jar /root/test/myUdf.jar;
    Added [/root/test/myUdf.jar] to class path
    Added resources: [/root/test/myUdf.jar]
    hive> create function forx as 'com.hujiawei666.udf.GenericUDTFFor';
    hive> select forx(1,5);
    1
    2
    3
    4
    hive> select forx(1,100,10);
    1
    11
    21
    31
    41
    51
    61
    71
    81
    91
    
  2. The built-in parse_url_tuple function
    It splits a url into multiple parts

    hive> desc function parse_url_tuple;
    parse_url_tuple(url, partname1, partname2, ..., partnameN) - extracts N (N>=1) parts from a URL.
    It takes a URL and one or multiple partnames, and returns a tuple. All the input parameters and output column types are string.
    
    hive>select parse_url_tuple('https://blog.csdn.net/qq467215628','HOST','PATH');
    blog.csdn.net   /qq467215628
    
  3. Returning multiple columns with a custom structure
    The custom function book parses a string such as "20191225|Hive Programing Note|Hujiawei,Dukai" into one row with multiple columns, or one column with multiple rows

    package com.hujiawei666.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaStringObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
    import org.apache.hadoop.io.Text;
    
    import java.util.ArrayList;
    
    /**
     * @author hujw
     * date: 2019/12/25 9:54
     **/
    public class UDTFBook extends GenericUDTF {
           
        private Text sent;
        Object[] forwardObj=null;
    
        @Override
        public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
           
            ArrayList<String> fieldNames = new ArrayList<>();
            ArrayList<ObjectInspector> fieldOIs = new ArrayList<>();
    
            fieldNames.add("isbn");
            fieldOIs.add(PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(PrimitiveObjectInspector.PrimitiveCategory.INT));
    
            fieldNames.add("title");
            fieldOIs.add(PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(PrimitiveObjectInspector.PrimitiveCategory.STRING));
    
            fieldNames.add("author");
            fieldOIs.add(PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(PrimitiveObjectInspector.PrimitiveCategory.STRING));
    
            forwardObj = new Object[3];
            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }
        @Override
        public void process(Object[] objects) throws HiveException {
           
            String parts = objects[0].toString();
            String[] part=parts.split("\\|");
            forwardObj[0] = Integer.parseInt(part[0]);
            forwardObj[1] = part[1];
            forwardObj[2] = part[2];
            forward(forwardObj);
        }
    
        @Override
        public void close() throws HiveException {
           
    
        }
    }
    
    
    create table book(str string);
    insert into book values('20191225|Hive Programing Note|Hujiawei,Dukai');
    add jar /root/test/myUdf.jar;
     create function book as 'com.hujiawei666.udf.UDTFBook';
      select book(str) from book;
    hive>  select book(str) from book;
    20191225        Hive Programing Note    Hujiawei,Dukai
    

Hive also supports temporary macros:

hive> create temporary macro sigmod (x double)1.0/(1.0+exp(-x));
hive> select sigmod(2);
0.8807970779778823
hive> create temporary macro aa (x double) x*x;
hive> select aa(2);
4.0

stream

[Figure 15]

Transforming with the built-in cat, cut, sed

hive> create table aa(c1 int,c2 int);
hive> select * from aa;
OK
1       2
3       4
hive> select transform(c1,c2) using '/bin/cat' as newA,newB from aa;
1       2
3       4
--cast the output columns to other types
hive> select transform(c1,c2) using '/bin/cat' as (newA int,newB double) from aa;
1       2.0
3       4.0
--string substitution with sed
hive> select transform(c1,c2) using '/bin/sed s/2/x/' from aa;
1       x
3       4

Using custom scripts

  • Create ctof.sh to convert Celsius to Fahrenheit
#!/usr/bin/env bash
while read line; do
res=`echo "scale=2;((9/5)*$line)+32"|bc`
echo $res
done

Add the script to hive:

hive> add file /root/test/ctof.sh;
hive> select transform(c1) using 'ctof.sh' as newA from aa;
33.80
37.40
  • Use a perl script to split key=value strings
    afile.txt
ecs-d0b0:~/test # cat afile.txt 
k1=v1,k2=v2
k3=v3,k4=v4,k5=v5

split_kv.pl

#!/usr/bin/perl
while (<STDIN>) {
     
    my $line=$_;
    chomp($line);
    my @kvs=split(/,/,$line);
    foreach my $p (@kvs){
     
        my @kv = split(/=/,$p);
        print $kv[0] . "\t" . $kv[1] . "\n";
    }
}
hive> create table tb_split(line string);
hive> load data local inpath '/root/test/afile.txt' into table tb_split;
hive> select * from tb_split;
k1=v1,k2=v2
k3=v3,k4=v4,k5=v5
hive> select transform(line) using 'perl /root/test/split_kv.pl' as  (key,value) from tb_split;
k1      v1
k2      v2
k3      v3
k4      v4
k5      v5

You can also run a C program:

#include <stdio.h>
#include <string.h>
int main(){
     
    char str[1000];
    char *token;
    while(fgets(str,1000,stdin)!=NULL){
     
        char *p;
        char *buff;
        buff=str;
        p = strsep(&buff, ",");
        while(p!=NULL){
     
            printf("%s\n", p);
            p = strsep(&buff, ",");
        }
    }
    return 0;
}

gcc test.c -o getValue
  hive> select transform(line) using '/root/test/getValue' as newLine from tb_split;
  k1=v1
  k2=v2

  k3=v3
  k4=v4
  k5=v5

Using perl to get an aggregate-function effect
sum.pl

#!/usr/bin/perl
my $sum=0;
while(<STDIN>){
    my $line=$_;
    chomp($line);
    $sum=$sum+$line;
}
print "$sum\n";
hive> select * from tb_sum;
1
2
3
2
3
4
hive> ADD FILE /root/test/sum.pl
hive> select transform(num) using 'perl sum.pl' as sum from tb_sum;
15

wordcount

  • Word count directly in hive

    man sh >sh.txt
    
    hive> desc test;
    line                    string         
    hive> load data local inpath '/root/test/sh.txt' into table test;
    hive> create table word as select word,count(1) as count from (select explode(split(line,' ')) as word from test) w group by word order by word;
    hive> select * from word a where a.count >100;
            5829
    If      110
    a       113
    and     128
    be      221
    command 151
    current 110
    cursor  105
    is      128
    of      239
    shall   248
    the     927
    to      172
    
  • Using transform with python scripts
    mapper.py

    import sys
    for line in sys.stdin:
        words=line.strip().split();
        for word in words:
            print "%s\t1" %(word.lower())
    

    reducer.py

    import sys
    (last_key,last_count)=(None,0);
    for line in sys.stdin:
        (key,count)=line.strip().split("\t")
        ## this assumes identical words arrive consecutively, i.e. the input is grouped; cluster by provides that
        if(last_key and last_key!=key):
            print "%s\t%d" % (last_key,last_count)
            (last_key,last_count)=(key,int(count))
        else:
            last_key=key
            last_count+=int(count)
    if last_key:
        print "%s\t%d" % (last_key,last_count)
    

    Test the scripts locally:

    ecs-d0b0:~/test # echo 'new new old'|python mapper.py |python reducer.py     
    new     2
    old     1
    

    Run in hive:

     --note the cluster by here, which groups identical words together
     from (from test select transform(line) using 'python /root/test/mapper.py' as word,count cluster by word) wc insert overwrite  table word_count select transform(word,count) using 'python /root/test/reducer.py' as word,count; 
     hive> select * from word_count where count>100;
    a       128
    and     128
    be      221
    command 163
    current 110
    cursor  105
    if      132
    in      109
    is      128
    of      240
    shall   248
    the     994
    to      172
    --distribute by and sort by can replace cluster by
     from (from test select transform(line) using 'python /root/test/mapper.py' as word,count distribute by word sort by word desc) wc insert overwrite  table word_count select transform(word,count) using 'python /root/test/reducer.py' as word,count; 
    
  • Processing the stream with java
    [Figure 16]

Custom record formats

Text format

Plain, human-readable text; delimiters can be customized and the contents are easy to inspect with cat, sed, vi and the like, but it takes up more space

sequenceFile

A compressed, binary-encoded format; hive can read and write sequencefiles directly, and dfs -text shows their contents conveniently

RCFile

A columnar layout, stored as key-value combinations

Example:


hive> select * from test;
11      12
21      22
hive> create table test_col(k int,v int) row format serde 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' stored as inputformat 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';
hive> from test insert overwrite table test_col select c1,c2;
select * from test_col;
11      12
21      22
ecs-d0b0:~ # hive --service rcfilecat   'hdfs://localhost:9000/user/hive/warehouse/test_col/000000_0'
11      12
21      22

Custom input format

Use a custom inputformat to build a dual table that returns a single row no matter how much data it holds

package com.hujiawei666.hive;

import org.apache.hadoop.mapred.*;

import java.io.IOException;

/**
 * @author hujw
 * date: 2019/12/26 9:18
 **/
public class DualInputFormat3 implements InputFormat {

    @Override
    public InputSplit[] getSplits(JobConf jobConf, int i) throws IOException {
        InputSplit[] splits = new DualInputSplit[1];
        splits[0] = new DualInputSplit();
        return splits;
    }

    @Override
    public RecordReader getRecordReader(InputSplit inputSplit, JobConf jobConf, Reporter reporter) throws IOException {
        return new DualRecordReader(jobConf, inputSplit);
    }

}

package com.hujiawei666.hive;

import org.apache.hadoop.mapred.InputSplit;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * @author hujw
 * date: 2019/12/26 9:20
 **/
public class DualInputSplit implements InputSplit {

    @Override
    public long getLength() throws IOException {
        return 1;
    }

    @Override
    public String[] getLocations() throws IOException {
        return new String[]{"localhost"};
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {

    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {

    }
}

package com.hujiawei666.hive;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;

import java.io.IOException;

/**
 * @author hujw
 * date: 2019/12/26 9:21
 **/
public class DualRecordReader implements RecordReader {

    boolean hasNext = true;

    public DualRecordReader(JobConf jc, InputSplit s) {
    }

    public DualRecordReader() {
    }

    /**
     * 只返回一行,第一次取有一行,后面取都没有了
     * @param o
     * @param o2
     * @return
     * @throws IOException
     */
    @Override
    public boolean next(Object o, Object o2) throws IOException {
        if (hasNext) {
            hasNext = false;
            return true;
        } else {
            return hasNext;
        }
    }

    @Override
    public Object createKey() {
        return new Text("");
    }

    @Override
    public Object createValue() {
        return new Text("");
    }

    @Override
    public long getPos() throws IOException {
        return 0;
    }

    @Override
    public void close() throws IOException {

    }

    @Override
    public float getProgress() throws IOException {
        if (hasNext) {
            return 0.0f;
        } else {
            return 1.0f;
        }
    }
}

hive>  add jar /root/test/myUdf3.jar;
hive> create table dual(fake String) stored as inputformat 'com.hujiawei666.hive.DualInputFormat3' outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
--这边插入两行,
hive> insert into dual values("");
hive> insert into dual values("");
--执行算数,只返回一行,不需要加limit
hive> select 1+1 from dual;
2
--查询所有数据也只返回一行空记录(尽管插入了两行)
hive> select * from dual;
OK

Time taken: 0.172 seconds, Fetched: 1 row(s)
--查询count是两个
hive> select count(1)  from dual;
OK
2
Time taken: 0.153 seconds, Fetched: 1 row(s)

使用正则表达式过滤日志

截取一段日志,样例见[附录4](#附录4 日志样例)

正则表达式为:

(\[\w+\]) ([\d+-:\. ]+) (\[[\w\d-]*\]) - ([\.\w+]*) - ([\.\w+]*) - ([=><]*) ( *\w+: )(.*$)

hive中使用正则表达式将日志的分字段导入到表中

--注意这边的正则表达式,要和java中运行的正则相同,即\要换成\\,进行两次转义
create table tb_log(
    level string,
    time string,
    proc_id string,
    class string,
    log_class string,
    direct string,
    method string,
    result string
)row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties(
"input.regex"="(\\[\\w+\\]) ([\\d+-:\\. ]+) (\\[[\\w\\d-]*\\]) - ([\\.\\w+]*) - ([\\.\\w+]*) - ([=><]*) ( *\\w+: )(.*$)",
"output.format.string"="%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s"    
)
stored as textfile;
load data local inpath '/root/test/test.log' into table tb_log;
hive> select time from tb_log;
2019-11-11 15:03:05.152
2019-11-11 15:03:05.152
2019-11-11 15:03:05.162
...
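正则切分入表之后,就可以像普通表一样做统计。下面是一个简单示意(非书中原文,字段沿用上面tb_log的定义):

--按日志级别和调用方法统计条数
select level, method, count(1) as cnt
from tb_log
group by level, method
order by cnt desc;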

xml

xpath函数示例:

hive> select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id');
["foo","bar"]
hive> select xpath('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c><c>c2</c></a>','a/*[@class="bb"]/text()');
["b1","c1"]
hive> select xpath_int('<a><b>2</b><c>4</c></a>','a/b+a/c');
6

xpath语法见 https://www.w3school.com.cn/xpath/xpath_syntax.asp
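除了xpath和xpath_int,hive还内置了xpath_string、xpath_double等按类型取值的变体,补充一个简单示意(非书中原例):

hive> select xpath_string('<a><b>bb</b><c>cc</c></a>','a/b');
bb
hive> select xpath_double('<a><b>1</b><b>2</b></a>','sum(a/b)');
3.0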

json

hive自带的jsonserde,需要添加配置,或者add jar

/opt/install/hadoop/apache-hive-2.3.6-bin/conf/hive-site.xml

<property>
   <name>hive.aux.jars.path</name>
   <value>/opt/install/hadoop/apache-hive-2.3.6-bin/lib/hive-hcatalog-core-2.3.6.jar</value>
</property>

{
	"name": "tom",
	"sex": 1,
	"age": 22
}
 hs -put test.json /user/testJson/
create external table json_param(
name string,
sex int,
age int
)row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
with serdeproperties(
"name"="$.name",
"sex"="$.sex",
"age"="$.age"
)location '/user/testJson';
select * from json_param
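如果只是临时解析JSON字符串,不想建SerDe表,也可以直接用内置的get_json_object / json_tuple(见附录的字符函数和表生成函数),简单示意如下(非书中原文):

hive> select get_json_object('{"name":"tom","sex":1,"age":22}','$.name');
tom
hive> select json_tuple('{"name":"tom","sex":1,"age":22}','name','sex','age');
tom     1       22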

附录1 内置函数大全

参见: https://blog.csdn.net/TheRa1nMan/article/details/89408718

数学函数

Return Type Name (Signature) Description
DOUBLE round(DOUBLE a) Returns the rounded BIGINT value of a.返回对a四舍五入的BIGINT值
DOUBLE round(DOUBLE a, INT d) Returns a rounded to d decimal places.返回DOUBLE型d的保留n位小数的DOUBLW型的近似值
DOUBLE bround(DOUBLE a) Returns the rounded BIGINT value of a using HALF_EVEN rounding mode (as of Hive 1.3.0, 2.0.0). Also known as Gaussian rounding or bankers’ rounding. Example: bround(2.5) = 2, bround(3.5) = 4. 银行家舍入法(14:舍,69:进,5->前位数是偶:舍,5->前位数是奇:进)
DOUBLE bround(DOUBLE a, INT d) Returns a rounded to d decimal places using HALF_EVEN rounding mode (as of Hive 1.3.0, 2.0.0). Example: bround(8.25, 1) = 8.2, bround(8.35, 1) = 8.4. 银行家舍入法,保留d位小数
BIGINT floor(DOUBLE a) Returns the maximum BIGINT value that is equal to or less than a.向下取整,即数轴上最接近且不大于a的整数 如:floor(6.10)=6 floor(-3.4)=-4
BIGINT ceil(DOUBLE a), ceiling(DOUBLE a) Returns the minimum BIGINT value that is equal to or greater than a.向上取整,求不小于给定实数的最小整数 如:ceil(6)=6 ceil(6.1)=ceil(6.9)=7
DOUBLE rand(), rand(INT seed) Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed will make sure the generated random number sequence is deterministic.每行返回一个DOUBLE型随机数seed是随机因子
DOUBLE exp(DOUBLE a), exp(DECIMAL a) Returns ea where e is the base of the natural logarithm. Decimal version added in Hive 0.13.0.返回e的a幂次方, a可为小数
DOUBLE ln(DOUBLE a), ln(DECIMAL a) Returns the natural logarithm of the argument a. Decimal version added in Hive 0.13.0.以自然数为底d的对数,a可为小数
DOUBLE log10(DOUBLE a), log10(DECIMAL a) Returns the base-10 logarithm of the argument a. Decimal version added in Hive 0.13.0.以10为底d的对数,a可为小数
DOUBLE log2(DOUBLE a), log2(DECIMAL a) Returns the base-2 logarithm of the argument a. Decimal version added in Hive 0.13.0.以2为底数d的对数,a可为小数
DOUBLE log(DOUBLE base, DOUBLE a)log(DECIMAL base, DECIMAL a) Returns the base-base logarithm of the argument a. Decimal versions added in Hive 0.13.0.以base为底的对数,base 与 a都是DOUBLE类型
DOUBLE pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p) Returns ap.计算a的p次幂
DOUBLE sqrt(DOUBLE a), sqrt(DECIMAL a) Returns the square root of a. Decimal version added in Hive 0.13.0.计算a的平方根
STRING bin(BIGINT a) Returns the number in binary format (see http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_bin).计算二进制a的STRING类型,a为BIGINT类型
STRING hex(BIGINT a) hex(STRING a) hex(BINARY a) If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format. Otherwise if the number is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING. (Seehttp://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex, BINARY version as of Hive 0.12.0.)计算十六进制a的STRING类型,如果a为STRING类型就转换成字符相对应的十六进制
BINARY unhex(STRING a) Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of the number. (BINARY version as of Hive 0.12.0, used to return a string.)hex的逆方法
STRING conv(BIGINT num, INT from_base, INT to_base), conv(STRING num, INT from_base, INT to_base) Converts a number from a given base to another (see http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_conv).将GIGINT/STRING类型的num从from_base进制转换成to_base进制
DOUBLE abs(DOUBLE a) Returns the absolute value.计算a的绝对值
INT or DOUBLE pmod(INT a, INT b), pmod(DOUBLE a, DOUBLE b) Returns the positive value of a mod b.a对b取模
DOUBLE sin(DOUBLE a), sin(DECIMAL a) Returns the sine of a (a is in radians). Decimal version added in Hive 0.13.0.求a的正弦值
DOUBLE asin(DOUBLE a), asin(DECIMAL a) Returns the arc sin of a if -1<=a<=1 or NULL otherwise. Decimal version added in Hive 0.13.0.求d的反正弦值
DOUBLE cos(DOUBLE a), cos(DECIMAL a) Returns the cosine of a (a is in radians). Decimal version added in Hive 0.13.0.求余弦值
DOUBLE acos(DOUBLE a), acos(DECIMAL a) Returns the arccosine of a if -1<=a<=1 or NULL otherwise. Decimal version added in Hive 0.13.0.求反余弦值
DOUBLE tan(DOUBLE a), tan(DECIMAL a) Returns the tangent of a (a is in radians). Decimal version added in Hive 0.13.0.求正切值
DOUBLE atan(DOUBLE a), atan(DECIMAL a) Returns the arctangent of a. Decimal version added in Hive 0.13.0.求反正切值
DOUBLE degrees(DOUBLE a), degrees(DECIMAL a) Converts value of a from radians to degrees. Decimal version added in Hive 0.13.0.将弧度值转换成角度值
DOUBLE radians(DOUBLE a), radians(DOUBLE a) Converts value of a from degrees to radians. Decimal version added in Hive 0.13.0.将角度值转换成弧度值
INT or DOUBLE positive(INT a), positive(DOUBLE a) Returns a.返回a
INT or DOUBLE negative(INT a), negative(DOUBLE a) Returns -a.返回a的相反数
DOUBLE or INT sign(DOUBLE a), sign(DECIMAL a) Returns the sign of a as ‘1.0’ (if a is positive) or ‘-1.0’ (if a is negative), ‘0.0’ otherwise. The decimal version returns INT instead of DOUBLE. Decimal version added in Hive 0.13.0.如果a是正数则返回1.0,是负数则返回-1.0,否则返回0.0
DOUBLE e() Returns the value of e.数学常数e
DOUBLE pi() Returns the value of pi.数学常数pi
BIGINT factorial(INT a) Returns the factorial of a (as of Hive 1.2.0). Valid a is [0…20]. 求a的阶乘
DOUBLE cbrt(DOUBLE a) Returns the cube root of a double value (as of Hive 1.2.0). 求a的立方根
INT BIGINT shiftleft(TINYINT|SMALLINT|INT a, INT b)shiftleft(BIGINT a, INT b) Bitwise left shift (as of Hive 1.2.0). Shifts a b positions to the left.Returns int for tinyint, smallint and int a. Returns bigint for bigint a.按位左移
INT BIGINT shiftright(TINYINT|SMALLINT|INT a, INT b)shiftright(BIGINT a, INT b) Bitwise right shift (as of Hive 1.2.0). Shifts a b positions to the right.Returns int for tinyint, smallint and int a. Returns bigint for bigint a.按位右移
INT BIGINT shiftrightunsigned(TINYINT|SMALLINT|INT a, INT b),shiftrightunsigned(BIGINT a, INT b) Bitwise unsigned right shift (as of Hive 1.2.0). Shifts a b positions to the right.Returns int for tinyint, smallint and int a. Returns bigint for bigint a.无符号按位右移(>>>)
T greatest(T v1, T v2, …) Returns the greatest value of the list of values (as of Hive 1.1.0). Fixed to return NULL when one or more arguments are NULL, and strict type restriction relaxed, consistent with “>” operator (as of Hive 2.0.0). 求最大值
T least(T v1, T v2, …) Returns the least value of the list of values (as of Hive 1.1.0). Fixed to return NULL when one or more arguments are NULL, and strict type restriction relaxed, consistent with “<” operator (as of Hive 2.0.0). 求最小值
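几个数学函数的简单示意(非原文内容,结果为手工标注,供对照上表):

hive> select round(3.1415,2), bround(2.5), floor(-3.4), ceil(6.1), pmod(-7,3), conv(255,10,16);
3.14    2       -4      7       2       FF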

集合函数

Return Type Name(Signature) Description
int size(Map) Returns the number of elements in the map type.求map的长度
int size(Array) Returns the number of elements in the array type.求数组的长度
array map_keys(Map) Returns an unordered array containing the keys of the input map.返回map中的所有key
array map_values(Map) Returns an unordered array containing the values of the input map.返回map中的所有value
boolean array_contains(Array, value) Returns TRUE if the array contains value.如该数组Array包含value返回true。,否则返回false
array sort_array(Array) Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0).按自然顺序对数组进行排序并返回
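集合函数的简单示意(非原文内容,注意map_keys返回顺序不保证):

hive> select size(array(1,2,3)), array_contains(array(1,2,3),2), sort_array(array(3,1,2)), map_keys(map('a',1,'b',2));
3       true    [1,2,3] ["a","b"]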

类型转换函数

Return Type Name(Signature) Description
binary binary(string|binary) Casts the parameter into a binary.将输入的值转换成二进制
Expected "=" to follow "type" cast(expr as <type>) Converts the results of the expression expr to <type>. For example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. If cast(expr as boolean) Hive returns true for a non-empty string.将expr转换成type类型 如:cast("1" as BIGINT) 将字符串1转换成了BIGINT类型,如果转换失败将返回NULL
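类型转换的简单示意(非原文内容,转换失败返回NULL):

hive> select cast('1' as bigint), cast(3.7 as int), cast('abc' as int);
1       3       NULL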

日期函数

Return Type Name(Signature) Description
string from_unixtime(bigint unixtime[, string format]) Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of “1970-01-01 00:00:00”.将时间的秒值转换成format格式(format可为“yyyy-MM-dd hh:mm:ss”,“yyyy-MM-dd hh”,“yyyy-MM-dd hh:mm”等等)如from_unixtime(1250111000,“yyyy-MM-dd”) 得到2009-03-12
bigint unix_timestamp() Gets current Unix timestamp in seconds.获取本地时区下的时间戳
bigint unix_timestamp(string date) Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale, return 0 if fail: unix_timestamp(‘2009-03-20 11:30:01’) = 1237573801将格式为yyyy-MM-dd HH:mm:ss的时间字符串转换成时间戳 如unix_timestamp(‘2009-03-20 11:30:01’) = 1237573801
bigint unix_timestamp(string date, string pattern) Convert time string with given pattern (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]) to Unix time stamp (in seconds), return 0 if fail: unix_timestamp(‘2009-03-20’, ‘yyyy-MM-dd’) = 1237532400.将指定时间字符串格式字符串转换成Unix时间戳,如果格式不对返回0 如:unix_timestamp(‘2009-03-20’, ‘yyyy-MM-dd’) = 1237532400
string to_date(string timestamp) Returns the date part of a timestamp string: to_date(“1970-01-01 00:00:00”) = “1970-01-01”.返回时间字符串的日期部分
int year(string date) Returns the year part of a date or a timestamp string: year(“1970-01-01 00:00:00”) = 1970, year(“1970-01-01”) = 1970.返回时间字符串的年份部分
int quarter(date/timestamp/string) Returns the quarter of the year for a date, timestamp, or string in the range 1 to 4 (as of Hive 1.3.0). Example: quarter(‘2015-04-08’) = 2.返回当前时间属性哪个季度 如quarter(‘2015-04-08’) = 2
int month(string date) Returns the month part of a date or a timestamp string: month(“1970-11-01 00:00:00”) = 11, month(“1970-11-01”) = 11.返回时间字符串的月份部分
int day(string date) dayofmonth(date) Returns the day part of a date or a timestamp string: day(“1970-11-01 00:00:00”) = 1, day(“1970-11-01”) = 1.返回时间字符串的天
int hour(string date) Returns the hour of the timestamp: hour(‘2009-07-30 12:58:59’) = 12, hour(‘12:58:59’) = 12.返回时间字符串的小时
int minute(string date) Returns the minute of the timestamp.返回时间字符串的分钟
int second(string date) Returns the second of the timestamp.返回时间字符串的秒
int weekofyear(string date) Returns the week number of a timestamp string: weekofyear(“1970-11-01 00:00:00”) = 44, weekofyear(“1970-11-01”) = 44.返回时间字符串位于一年中的第几个周内 如weekofyear(“1970-11-01 00:00:00”) = 44, weekofyear(“1970-11-01”) = 44
int datediff(string enddate, string startdate) Returns the number of days from startdate to enddate: datediff(‘2009-03-01’, ‘2009-02-27’) = 2.计算开始时间startdate到结束时间enddate相差的天数
string date_add(string startdate, int days) Adds a number of days to startdate: date_add(‘2008-12-31’, 1) = ‘2009-01-01’.从开始时间startdate加上days
string date_sub(string startdate, int days) Subtracts a number of days to startdate: date_sub(‘2008-12-31’, 1) = ‘2008-12-30’.从开始时间startdate减去days
timestamp from_utc_timestamp(timestamp, string timezone) Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0). For example, from_utc_timestamp(‘1970-01-01 08:00:00’,‘PST’) returns 1970-01-01 00:00:00.如果给定的时间戳并非UTC,则将其转化成指定的时区下时间戳
timestamp to_utc_timestamp(timestamp, string timezone) Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0). For example, to_utc_timestamp(‘1970-01-01 00:00:00’,‘PST’) returns 1970-01-01 08:00:00.如果给定的时间戳指定的时区下时间戳,则将其转化成UTC下的时间戳
date current_date Returns the current date at the start of query evaluation (as of Hive 1.2.0). All calls of current_date within the same query return the same value.返回当前时间日期
timestamp current_timestamp Returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value.返回当前时间戳
string add_months(string start_date, int num_months) Returns the date that is num_months after start_date (as of Hive 1.1.0). start_date is a string, date or timestamp. num_months is an integer. The time part of start_date is ignored. If start_date is the last day of the month or if the resulting month has fewer days than the day component of start_date, then the result is the last day of the resulting month. Otherwise, the result has the same day component as start_date.返回当前时间下再增加num_months个月的日期
string last_day(string date) Returns the last day of the month which the date belongs to (as of Hive 1.1.0). date is a string in the format ‘yyyy-MM-dd HH:mm:ss’ or ‘yyyy-MM-dd’. The time part of date is ignored.返回这个月的最后一天的日期,忽略时分秒部分(HH:mm:ss)
string next_day(string start_date, string day_of_week) Returns the first date which is later than start_date and named as day_of_week (as of Hive1.2.0). start_date is a string/date/timestamp. day_of_week is 2 letters, 3 letters or full name of the day of the week (e.g. Mo, tue, FRIDAY). The time part of start_date is ignored. Example: next_day(‘2015-01-14’, ‘TU’) = 2015-01-20.返回当前时间的下一个星期X所对应的日期 如:next_day(‘2015-01-14’, ‘TU’) = 2015-01-20 以2015-01-14为开始时间,其下一个星期二所对应的日期为2015-01-20
string trunc(string date, string format) Returns date truncated to the unit specified by the format (as of Hive 1.2.0). Supported formats: MONTH/MON/MM, YEAR/YYYY/YY. Example: trunc(‘2015-03-17’, ‘MM’) = 2015-03-01.返回时间的最开始年份或月份 如trunc(“2016-06-26”,“MM”)=2016-06-01 trunc(“2016-06-26”,“YY”)=2016-01-01 注意所支持的格式为MONTH/MON/MM, YEAR/YYYY/YY
double months_between(date1, date2) Returns number of months between dates date1 and date2 (as of Hive 1.2.0). If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. date1 and date2 type can be date, timestamp or string in the format ‘yyyy-MM-dd’ or ‘yyyy-MM-dd HH:mm:ss’. The result is rounded to 8 decimal places. Example: months_between(‘1997-02-28 10:30:00’, ‘1996-10-30’) = 3.94959677 返回date1与date2之间相差的月份,如date1>date2则返回正,如date1<date2则返回负
string date_format(date/timestamp/string ts, string fmt) Converts a date/timestamp/string to a value of string in the format specified by the date format fmt (as of Hive 1.2.0). Supported formats are Java SimpleDateFormat formats –https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. The second argument fmt should be constant. Example: date_format(‘2015-04-08’, ‘y’) = ‘2015’.date_format can be used to implement other UDFs, e.g.:dayname(date) is date_format(date, ‘EEEE’)dayofyear(date) is date_format(date, ‘D’)按指定格式返回时间date 如:date_format(“2016-06-22”,“MM-dd”)=06-22
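日期函数的简单示意(非原文内容,结果为手工标注):

hive> select to_date('2019-11-11 15:03:05'), datediff('2019-12-31','2019-12-01'), date_add('2019-12-31',1), last_day('2019-11-11');
2019-11-11      30      2020-01-01      2019-11-30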

条件函数

Return Type Name(Signature) Description
T if(boolean testCondition, T valueTrue, T valueFalseOrNull) Returns valueTrue when testCondition is true, returns valueFalseOrNull otherwise.如果testCondition 为true就返回valueTrue,否则返回valueFalseOrNull ,(valueTrue,valueFalseOrNull为泛型)
T nvl(T value, T default_value) Returns default value if value is null else returns value (as of HIve 0.11).如果value值为NULL就返回default_value,否则返回value
T COALESCE(T v1, T v2, …) Returns the first v that is not NULL, or NULL if all v’s are NULL.返回第一个非null的值,如果全部都为NULL就返回NULL 如:COALESCE(NULL,44,55)=44
T CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END When a = b, returns c; when a = d, returns e; else returns f.如果a=b就返回c,a=d就返回e,否则返回f 如CASE 4 WHEN 5 THEN 5 WHEN 4 THEN 4 ELSE 3 END 将返回4
T CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END When a = true, returns b; when c = true, returns d; else returns e.如果a=ture就返回b,c= ture就返回d,否则返回e 如:CASE WHEN 5>0 THEN 5 WHEN 4>0 THEN 4 ELSE 0 END 将返回5;CASE WHEN 5<0 THEN 5 WHEN 4<0 THEN 4 ELSE 0 END 将返回0
boolean isnull( a ) Returns true if a is NULL and false otherwise.如果a为null就返回true,否则返回false
boolean isnotnull ( a ) Returns true if a is not NULL and false otherwise.如果a为非null就返回true,否则返回false
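条件函数的简单示意(非原文内容):

hive> select if(2>1,'yes','no'), coalesce(null,44,55), nvl(cast(null as string),'default'), isnull(null);
yes     44      default true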

字符函数

Return Type Name(Signature) Description
int ascii(string str) Returns the numeric value of the first character of str.返回str中首个ASCII字符串的整数值
string base64(binary bin) Converts the argument from binary to a base 64 string (as of Hive 0.12.0)…将二进制bin转换成64位的字符串
string concat(string|binary A, string|binary B…) Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat(‘foo’, ‘bar’) results in ‘foobar’. Note that this function can take any number of input strings…对二进制字节码或字符串按次序进行拼接
array> context_ngrams(array, array, int K, int pf) Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of “context”. See StatisticsAndDataMining for more information…与ngram类似,但context_ngram()允许你预算指定上下文(数组)来去查找子序列,具体看StatisticsAndDataMining(这里的解释更易懂)
string concat_ws(string SEP, string A, string B…) Like concat() above, but with custom separator SEP…与concat()类似,但使用指定的分隔符喜进行分隔
string concat_ws(string SEP, array) Like concat_ws() above, but taking an array of strings. (as of Hive 0.9.0).拼接Array中的元素并用指定分隔符进行分隔
string decode(binary bin, string charset) Decodes the first argument into a String using the provided character set (one of ‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’). If either argument is null, the result will also be null. (As of Hive 0.12.0.).使用指定的字符集charset将二进制值bin解码成字符串,支持的字符集有:‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’,如果任意输入参数为NULL都将返回NULL
binary encode(string src, string charset) Encodes the first argument into a BINARY using the provided character set (one of ‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’). If either argument is null, the result will also be null. (As of Hive 0.12.0.).使用指定的字符集charset将字符串编码成二进制值,支持的字符集有:‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’,如果任一输入参数为NULL都将返回NULL
int find_in_set(string str, string strList) Returns the first occurance of str in strList where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas. For example, find_in_set(‘ab’, ‘abc,b,ab,c,def’) returns 3…返回以逗号分隔的字符串中str出现的位置,如果参数str为逗号或查找失败将返回0,如果任一参数为NULL将返回NULL回
string format_number(number x, int d) Formats the number X to a format like ‘#,###,###.##’, rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. (As of Hive 0.10.0; bug with float types fixed in Hive 0.14.0, decimal type support added in Hive 0.14.0).将数值X转换成"#,###,###.##"格式字符串,并保留d位小数,如果d为0,将进行四舍五入且不保留小数
string get_json_object(string json_string, string path) Extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It will return null if the input json string is invalid. NOTE: The json path can only have the characters [0-9a-z_], i.e., no upper-case or special characters. Also, the keys *cannot start with numbers.* This is due to restrictions on Hive column names…从指定路径上的JSON字符串抽取出JSON对象,并返回这个对象的JSON格式,如果输入的JSON是非法的将返回NULL,注意此路径上JSON字符串只能由数字 字母 下划线组成且不能有大写字母和特殊字符,且key不能由数字开头,这是由于Hive对列名的限制
boolean in_file(string str, string filename) Returns true if the string str appears as an entire line in filename…如果文件名为filename的文件中有一行数据与字符串str匹配成功就返回true
int instr(string str, string substr) Returns the position of the first occurrence of substr in str. Returns null if either of the arguments are null and returns 0 if substr could not be found in str. Be aware that this is not zero based. The first character in str has index 1…查找字符串str中子字符串substr出现的位置,如果查找失败将返回0,如果任一参数为Null将返回null,注意位置为从1开始的
int length(string A) Returns the length of the string…返回字符串的长度
int locate(string substr, string str[, int pos]) Returns the position of the first occurrence of substr in str after position pos…查找字符串str中的pos位置后字符串substr第一次出现的位置
string lower(string A) lcase(string A) Returns the string resulting from converting all characters of B to lower case. For example, lower(‘fOoBaR’) results in ‘foobar’…将字符串A的所有字母转换成小写字母
string lpad(string str, int len, string pad) Returns str, left-padded with pad to a length of len…从左边开始对字符串str使用字符串pad填充,最终len长度为止,如果字符串str本身长度比len大的话,将去掉多余的部分
string ltrim(string A) Returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(’ foobar ') results in 'foobar '…去掉字符串A前面的空格
array> ngrams(array, int N, int K, int pf) Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See StatisticsAndDataMining for more information…返回出现次数TOP K的的子序列,n表示子序列的长度,具体看StatisticsAndDataMining (这里的解释更易懂)
string parse_url(string urlString, string partToExtract [, string keyToExtract]) Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’, ‘HOST’) returns ‘facebook.com’. Also a value of a particular key in QUERY can be extracted by providing the key as the third argument, for example, parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’, ‘QUERY’, ‘k1’) returns ‘v1’…返回从URL中抽取指定部分的内容,参数url是URL字符串,而参数partToExtract是要抽取的部分,这个参数包含(HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO,例如:parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’, ‘HOST’) =‘facebook.com’,如果参数partToExtract值为QUERY则必须指定第三个参数key 如:parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’, ‘QUERY’, ‘k1’) =‘v1’
string printf(String format, Obj… args) Returns the input formatted according do printf-style format strings (as of Hive0.9.0)…按照printf风格格式输出字符串
string regexp_extract(string subject, string pattern, int index) Returns the string extracted using the pattern. For example, regexp_extract(‘foothebar’, ‘foo(.*?)(bar)’, 2) returns ‘bar.’ Note that some care is necessary in using predefined character classes: using ‘\s’ as the second argument will match the letter s; ‘\s’ is necessary to match whitespace, etc. The ‘index’ parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the ‘index’ or Java regex group() method…抽取字符串subject中符合正则表达式pattern的第index个部分的子字符串,注意些预定义字符的使用,如第二个参数如果使用’\s’将被匹配到s,’\s’才是匹配空格
string regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace(“foobar”, “oo|ar”, “”) returns ‘fb.’ Note that some care is necessary in using predefined character classes: using ‘\s’ as the second argument will match the letter s; ‘\s’ is necessary to match whitespace, etc…按照Java正则表达式PATTERN将字符串INTIAL_STRING中符合条件的部分成REPLACEMENT所指定的字符串,如里REPLACEMENT这空的话,抽符合正则的部分将被去掉 如:regexp_replace(“foobar”, “oo|ar”, “”) = ‘fb.’ 注意些预定义字符的使用,如第二个参数如果使用’\s’将被匹配到s,’\s’才是匹配空格
string repeat(string str, int n) Repeats str n times…重复输出n次字符串str
string reverse(string A) Returns the reversed string…反转字符串
string rpad(string str, int len, string pad) Returns str, right-padded with pad to a length of len…从右边开始对字符串str使用字符串pad填充,最终len长度为止,如果字符串str本身长度比len大的话,将去掉多余的部分
string rtrim(string A) Returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(’ foobar ‘) results in ’ foobar’…去掉字符串后面出现的空格
array sentences(string str, string lang, string locale) Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The ‘lang’ and ‘locale’ are optional arguments. For example, sentences(‘Hello there! How are you?’) returns ( (“Hello”, “there”), (“How”, “are”, “you”) )…字符串str将被转换成单词数组,如:sentences(‘Hello there! How are you?’) =( (“Hello”, “there”), (“How”, “are”, “you”) )
string space(int n) Returns a string of n spaces…返回n个空格
array split(string str, string pat) Splits str around pat (pat is a regular expression)…按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回
map str_to_map(text[, delimiter1, delimiter2]) Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ‘,’ for delimiter1 and ‘=’ for delimiter2…将字符串str按照指定分隔符转换成Map,第一个参数是需要转换字符串,第二个参数是键值对之间的分隔符,默认为逗号;第三个参数是键值之间的分隔符,默认为"="
string substr(string|binary A, int start) substring(string|binary A, int start) Returns the substring or slice of the byte array of A starting from start position till the end of string A. For example, substr(‘foobar’, 4) results in ‘bar’ (see [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr])…对于字符串A,从start位置开始截取字符串并返回
string substr(string|binary A, int start, int len) substring(string|binary A, int start, int len) Returns the substring or slice of the byte array of A starting from start position with length len. For example, substr(‘foobar’, 4, 1) results in ‘b’ (see [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr])…对于二进制/字符串A,从start位置开始截取长度为length的字符串并返回
string substring_index(string A, string delim, int count) Returns the substring from string A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: substring_index(‘www.apache.org’, ‘.’, 2) = ‘www.apache’…截取第count分隔符之前的字符串,如count为正则从左边开始截取,如果为负则从右边开始截取
string translate(string|char|varchar input, string|char|varchar from, string|char|varchar to) Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translatefunction in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)Char/varchar support added as of Hive 0.14.0…将input出现在from中的字符串替换成to中的字符串 如:translate(“MOBIN”,“BIN”,“M”)="MOM"
string trim(string A) Returns the string resulting from trimming spaces from both ends of A. For example, trim(’ foobar ') results in ‘foobar’.将字符串A前后出现的空格去掉
binary unbase64(string str) Converts the argument from a base 64 string to BINARY. (As of Hive 0.12.0.).将64位的字符串转换二进制值
string upper(string A) ucase(string A) Returns the string resulting from converting all characters of A to upper case. For example, upper(‘fOoBaR’) results in ‘FOOBAR’…将字符串A中的字母转换成大写字母
string initcap(string A) Returns string, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by whitespace. (As of Hive 1.1.0.).将字符串A转换第一个字母大写其余字母的字符串
int levenshtein(string A, string B) Returns the Levenshtein distance between two strings (as of Hive 1.2.0). For example, levenshtein(‘kitten’, ‘sitting’) results in 3…计算两个字符串之间的差异大小 如:levenshtein(‘kitten’, ‘sitting’) = 3
string soundex(string A) Returns soundex code of the string (as of Hive 1.2.0). For example, soundex(‘Miller’) results in M460…将普通字符串转换成soundex字符串
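字符函数的简单示意(非原文内容,结果为手工标注):

hive> select concat_ws('-','a','b','c'), split('a,b,c',','), regexp_extract('foothebar','foo(.*?)(bar)',2), substr('foobar',4), trim('  foobar  ');
a-b-c   ["a","b","c"]   bar     bar     foobar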

聚合函数

Return Type Name(Signature) Description
BIGINT count(*), count(expr), count(DISTINCT expr[, expr…]) count(*) - Returns the total number of retrieved rows, including rows containing NULL values.统计总行数,包括含有NULL值的行count(expr) - Returns the number of rows for which the supplied expression is non-NULL.统计提供非NULL的expr表达式值的行数count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. Execution of this can be optimized with hive.optimize.distinct.rewrite.统计提供非NULL且去重后的expr表达式值的行数
DOUBLE sum(col), sum(DISTINCT col) Returns the sum of the elements in the group or the sum of the distinct values of the column in the group.sum(col),表示求指定列的和,sum(DISTINCT col)表示求去重后的列的和
DOUBLE avg(col), avg(DISTINCT col) Returns the average of the elements in the group or the average of the distinct values of the column in the group.avg(col),表示求指定列的平均值,avg(DISTINCT col)表示求去重后的列的平均值
DOUBLE min(col) Returns the minimum of the column in the group.求指定列的最小值
DOUBLE max(col) Returns the maximum value of the column in the group.求指定列的最大值
DOUBLE variance(col), var_pop(col) Returns the variance of a numeric column in the group.求指定列数值的方差
DOUBLE var_samp(col) Returns the unbiased sample variance of a numeric column in the group.求指定列数值的样本方差
DOUBLE stddev_pop(col) Returns the standard deviation of a numeric column in the group.求指定列数值的标准偏差
DOUBLE stddev_samp(col) Returns the unbiased sample standard deviation of a numeric column in the group.求指定列数值的样本标准偏差
DOUBLE covar_pop(col1, col2) Returns the population covariance of a pair of numeric columns in the group.求指定列数值的协方差
DOUBLE covar_samp(col1, col2) Returns the sample covariance of a pair of a numeric columns in the group.求指定列数值的样本协方差
DOUBLE corr(col1, col2) Returns the Pearson coefficient of correlation of a pair of a numeric columns in the group.返回两列数值的相关系数
DOUBLE percentile(BIGINT col, p) Returns the exact pth percentile of a column in the group (does not work with floating point types). p must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.返回col的p%分位数
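以前面wordcount一节生成的word表为例(word、count两列,沿用文中a.count的写法),聚合函数的简单示意(结果取决于实际数据,这里不列出):

hive> select count(1), sum(a.count), avg(a.count), max(a.count), min(a.count) from word a;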

表生成函数

Return Type Name(Signature) Description
Array Type explode(array<TYPE> a) For each element in a, generates a row containing that element.对于a中的每个元素,将生成一行且包含该元素
N rows explode(ARRAY<T> a) Returns one row for each element from the array…每行对应数组中的一个元素
N rows explode(MAP<Tkey,Tvalue> m) Returns one row for each key-value pair from the input map with two columns in each row: one for the key and another for the value. (As of Hive 0.8.0.).每行对应每个map键-值,其中一个字段是map的键,另一个字段是map的值
N rows posexplode(ARRAY<T> a) Behaves like explode for arrays, but includes the position of items in the original array by returning a tuple of (pos, value). (As of Hive 0.13.0.).与explode类似,不同的是还返回各元素在数组中的位置
N rows stack(INT n, v_1, v_2, …, v_k) Breaks up v_1, …, v_k into n rows. Each row will have k/n columns. n must be constant…把M列转换成N行,每行有M/N个字段,其中n必须是个常数
tuple json_tuple(jsonStr, k1, k2, …) Takes a set of names (keys) and a JSON string, and returns a tuple of values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call…从一个JSON字符串中获取多个键并作为一个元组返回,与get_json_object不同的是此函数能一次获取多个键值
tuple parse_url_tuple(url, p1, p2, …) This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:…返回从URL中抽取指定N部分的内容,参数url是URL字符串,而参数p1,p2,…是要抽取的部分,这个参数包含HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:
inline(ARRAY<STRUCT<f1:T1,…,fn:Tn>>) Explodes an array of structs into a table. (As of Hive 0.10.).将结构体数组提取出来并插入到表中
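explode配合lateral view是最常见的用法(前面wordcount一节拆词就是这么做的),再补两个直接调用的简单示意(非原文内容):

hive> select explode(array('a','b','c'));
a
b
c
hive> select explode(map('k1',1,'k2',2)) as (key,value);
k1      1
k2      2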

附录3 链接

附录4 日志样例

[DEBUG] 2019-11-11 15:03:05.152 [http-nio-8080-exec-2] - c.a.c.u.b.d.e.J.getOrgTypeIds - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select a.org_type from js_org_type a where a.use_state = 'A' and a.bus_type = ? and a.org_id = ? 
[DEBUG] 2019-11-11 15:03:05.152 [http-nio-8080-exec-2] - c.a.c.u.b.d.e.J.getOrgTypeIds - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 1(String), 100012155(String)
[DEBUG] 2019-11-11 15:03:05.162 [http-nio-8080-exec-2] - c.a.c.u.b.d.e.J.getOrgTypeIds - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 0
[DEBUG] 2019-11-11 15:03:05.163 [http-nio-8080-exec-2] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select ORG_REF_ID, ORG_ID, PARENT_ORG_ID, LATN_ID, PARENT_ORG_CODE, BUS_TYPE, ORG_LEVEL, ORG_ORDER, CREATE_DATE, MODIFY_DATE, USE_STATE, EXT1, EXT2, EXT3 from JS_ORG_REF_DEF WHERE ( USE_STATE = ? and ORG_ID = ? and BUS_TYPE = ? ) 
[DEBUG] 2019-11-11 15:03:05.164 [http-nio-8080-exec-2] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: A(String), 100012155(String), 2(String)
[DEBUG] 2019-11-11 15:03:05.175 [http-nio-8080-exec-2] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 0
[DEBUG] 2019-11-11 15:03:05.176 [http-nio-8080-exec-2] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select ORG_REF_ID, ORG_ID, PARENT_ORG_ID, LATN_ID, PARENT_ORG_CODE, BUS_TYPE, ORG_LEVEL, ORG_ORDER, CREATE_DATE, MODIFY_DATE, USE_STATE, EXT1, EXT2, EXT3 from JS_ORG_REF_DEF WHERE ( USE_STATE = ? and ORG_ID = ? and BUS_TYPE = ? ) 
[DEBUG] 2019-11-11 15:03:05.176 [http-nio-8080-exec-2] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: A(String), 100012155(String), 3(String)
[DEBUG] 2019-11-11 15:03:05.185 [http-nio-8080-exec-2] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 0
[DEBUG] 2019-11-11 15:03:36.575 [http-nio-8080-exec-6] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select STAFF_ID, NAME, PWD, SEX, TELEPHONE, CARD_TYPE, IDCARD, EMAIL, STATE, CREATE_USER, CREATE_DATE, MODIFY_USER, MODIFY_DATE, FLAG, TYEP, BIRTH, CITY_ID, STAFF_CODE, DIMAREA, YZFNO, WXNO, EMPLOYMENTMODE, OAID, STAFF_ACCOUNT, BLOC_CODE, IS_VISUAL_COMFIRM, ADDRESS, IS_AVA, TSTAFF_ID, BSS_STAFF_ID, IS_MAIN, IS_TEL_SMZ, IS_RISK , IDENTITY_PIC, PHOTO_IMAGE from TA_STAFF where STAFF_ID = ? 
[DEBUG] 2019-11-11 15:03:36.576 [http-nio-8080-exec-6] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 0000000001(String)
[DEBUG] 2019-11-11 15:03:36.615 [http-nio-8080-exec-6] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1
[DEBUG] 2019-11-11 15:03:36.619 [http-nio-8080-exec-6] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select b.org_id, b.org_name, a.parent_org_id from js_org_ref_def a left join js_org_def b on a.org_id = b.org_id where a.parent_org_id = ? and a.bus_type = ? and b.use_state='A' order by a.org_order 
[DEBUG] 2019-11-11 15:03:36.619 [http-nio-8080-exec-6] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 100012156(String), 1(String)
[DEBUG] 2019-11-11 15:03:36.758 [http-nio-8080-exec-6] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 56
[DEBUG] 2019-11-11 15:04:17.376 [http-nio-8080-exec-1] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select STAFF_ID, NAME, PWD, SEX, TELEPHONE, CARD_TYPE, IDCARD, EMAIL, STATE, CREATE_USER, CREATE_DATE, MODIFY_USER, MODIFY_DATE, FLAG, TYEP, BIRTH, CITY_ID, STAFF_CODE, DIMAREA, YZFNO, WXNO, EMPLOYMENTMODE, OAID, STAFF_ACCOUNT, BLOC_CODE, IS_VISUAL_COMFIRM, ADDRESS, IS_AVA, TSTAFF_ID, BSS_STAFF_ID, IS_MAIN, IS_TEL_SMZ, IS_RISK , IDENTITY_PIC, PHOTO_IMAGE from TA_STAFF where STAFF_ID = ? 
[DEBUG] 2019-11-11 15:04:17.376 [http-nio-8080-exec-1] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 0000000001(String)
[DEBUG] 2019-11-11 15:04:17.407 [http-nio-8080-exec-1] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1
[DEBUG] 2019-11-11 15:04:17.410 [http-nio-8080-exec-1] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select BUS_TYPE_ID, BUS_TYPE_NAME, SYS_ID, STATE, EXT1, EXT2, EXT3 from JS_ORG_RELA_DEF WHERE ( STATE = ? ) 
[DEBUG] 2019-11-11 15:04:17.410 [http-nio-8080-exec-1] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: A(String)
[DEBUG] 2019-11-11 15:04:17.423 [http-nio-8080-exec-1] - c.a.c.u.b.d.J.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 3
[DEBUG] 2019-11-11 15:04:17.424 [http-nio-8080-exec-1] - c.a.c.u.b.d.C.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select 'true' as QUERYID, LEVEL_ID, LEVEL_NAME, LEVEL_VALUE, STATE from CSS_CONTR_INCOME_LV WHERE ( STATE = ? ) 
[DEBUG] 2019-11-11 15:04:17.425 [http-nio-8080-exec-1] - c.a.c.u.b.d.C.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: A(String)
[DEBUG] 2019-11-11 15:04:17.540 [http-nio-8080-exec-1] - c.a.c.u.b.d.C.selectByExample - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 3
[DEBUG] 2019-11-11 15:04:18.689 [http-nio-8080-exec-4] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select STAFF_ID, NAME, PWD, SEX, TELEPHONE, CARD_TYPE, IDCARD, EMAIL, STATE, CREATE_USER, CREATE_DATE, MODIFY_USER, MODIFY_DATE, FLAG, TYEP, BIRTH, CITY_ID, STAFF_CODE, DIMAREA, YZFNO, WXNO, EMPLOYMENTMODE, OAID, STAFF_ACCOUNT, BLOC_CODE, IS_VISUAL_COMFIRM, ADDRESS, IS_AVA, TSTAFF_ID, BSS_STAFF_ID, IS_MAIN, IS_TEL_SMZ, IS_RISK , IDENTITY_PIC, PHOTO_IMAGE from TA_STAFF where STAFF_ID = ? 
[DEBUG] 2019-11-11 15:04:18.690 [http-nio-8080-exec-4] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 0000000001(String)
[DEBUG] 2019-11-11 15:04:18.730 [http-nio-8080-exec-4] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1
[DEBUG] 2019-11-11 15:04:18.735 [http-nio-8080-exec-4] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select b.org_id, b.org_name, a.parent_org_id from js_org_ref_def a left join js_org_def b on a.org_id = b.org_id where a.parent_org_id = ? and a.bus_type = ? and b.use_state='A' order by a.org_order 
[DEBUG] 2019-11-11 15:04:18.736 [http-nio-8080-exec-4] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: -1(String), 1(String)
[DEBUG] 2019-11-11 15:04:18.778 [http-nio-8080-exec-4] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1
[DEBUG] 2019-11-11 15:04:24.502 [http-nio-8080-exec-5] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select STAFF_ID, NAME, PWD, SEX, TELEPHONE, CARD_TYPE, IDCARD, EMAIL, STATE, CREATE_USER, CREATE_DATE, MODIFY_USER, MODIFY_DATE, FLAG, TYEP, BIRTH, CITY_ID, STAFF_CODE, DIMAREA, YZFNO, WXNO, EMPLOYMENTMODE, OAID, STAFF_ACCOUNT, BLOC_CODE, IS_VISUAL_COMFIRM, ADDRESS, IS_AVA, TSTAFF_ID, BSS_STAFF_ID, IS_MAIN, IS_TEL_SMZ, IS_RISK , IDENTITY_PIC, PHOTO_IMAGE from TA_STAFF where STAFF_ID = ? 
[DEBUG] 2019-11-11 15:04:24.502 [http-nio-8080-exec-5] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 0000000001(String)
[DEBUG] 2019-11-11 15:04:24.536 [http-nio-8080-exec-5] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1
[DEBUG] 2019-11-11 15:04:24.539 [http-nio-8080-exec-5] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select b.org_id, b.org_name, a.parent_org_id from js_org_ref_def a left join js_org_def b on a.org_id = b.org_id where a.parent_org_id = ? and a.bus_type = ? and b.use_state='A' order by a.org_order 
[DEBUG] 2019-11-11 15:04:24.539 [http-nio-8080-exec-5] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 100008669(String), 1(String)
[DEBUG] 2019-11-11 15:04:24.656 [http-nio-8080-exec-5] - c.a.c.u.b.d.e.JsOrgExtMapper.getRoot - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 24
[DEBUG] 2019-11-11 15:04:24.782 [http-nio-8080-exec-10] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select STAFF_ID, NAME, PWD, SEX, TELEPHONE, CARD_TYPE, IDCARD, EMAIL, STATE, CREATE_USER, CREATE_DATE, MODIFY_USER, MODIFY_DATE, FLAG, TYEP, BIRTH, CITY_ID, STAFF_CODE, DIMAREA, YZFNO, WXNO, EMPLOYMENTMODE, OAID, STAFF_ACCOUNT, BLOC_CODE, IS_VISUAL_COMFIRM, ADDRESS, IS_AVA, TSTAFF_ID, BSS_STAFF_ID, IS_MAIN, IS_TEL_SMZ, IS_RISK , IDENTITY_PIC, PHOTO_IMAGE from TA_STAFF where STAFF_ID = ? 
[DEBUG] 2019-11-11 15:04:24.782 [http-nio-8080-exec-10] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 0000000001(String)
[DEBUG] 2019-11-11 15:04:24.811 [http-nio-8080-exec-10] - c.a.c.u.b.d.T.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1
[DEBUG] 2019-11-11 15:04:24.815 [http-nio-8080-exec-10] - c.a.c.u.b.d.J.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select ORG_ID, ORG_NAME, ORG_CODE, TELE_MANAGE_ARER, ADMIN_MANAGE_ARER, EXIST_TYPE, ORG_ORDER, AREA_CODE, CITY_FLAG, LATN_ID, JT_CODE, ORG_LEVEL, MANAGER_INFO, USE_STATE, CREATE_DATE, MODIFY_DATE, EXT1, EXT2, EXT3, EXT4, EXT5, EXT6, CODE_CBZX, CODE_OA, CODE_CSS, CODE_CB, CODE_XQ from JS_ORG_DEF where ORG_ID = ? 
[DEBUG] 2019-11-11 15:04:24.815 [http-nio-8080-exec-10] - c.a.c.u.b.d.J.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 100008669(String)
[DEBUG] 2019-11-11 15:04:24.826 [http-nio-8080-exec-10] - c.a.c.u.b.d.J.selectByPrimaryKey - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1
[DEBUG] 2019-11-11 15:04:24.826 [http-nio-8080-exec-10] - c.a.c.u.b.d.e.J.getFathers - o.a.i.logging.jdbc.BaseJdbcLogger - ==>  Preparing: select b.org_name, a.org_id, b.org_id from js_org_ref_def a left join js_org_def b on b.org_id = a.org_id where a.bus_type = ? start with a.org_id = ? connect by a.org_id = prior a.parent_org_id 
[DEBUG] 2019-11-11 15:04:24.827 [http-nio-8080-exec-10] - c.a.c.u.b.d.e.J.getFathers - o.a.i.logging.jdbc.BaseJdbcLogger - ==> Parameters: 1(String), 100008669(String)
[DEBUG] 2019-11-11 15:04:24.952 [http-nio-8080-exec-10] - c.a.c.u.b.d.e.J.getFathers - o.a.i.logging.jdbc.BaseJdbcLogger - <==      Total: 1




  1. https://blog.csdn.net/guicaizhou/article/details/82013516 ↩︎

  2. https://cwiki.apache.org/confluence/display/hive/indexdev#IndexDev-ReferenceImplementation ↩︎

  3. https://blog.csdn.net/qq_26442553/article/details/80865014 ↩︎
