Hive Functions

1. Create a table and load the JSON data

Create the table:

create table rating_json(json string);

Load the data:

load data local inpath '/home/hadoop/data/rating.json' into table rating_json;
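Each line of rating.json is one JSON object. Judging from the fields queried later (movie, rate, time, userid) and the values in the output below, a line looks roughly like this hypothetical sample:

```json
{"movie":"1193","rate":"5","time":"978300760","userid":"1"}
```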

2. Query the data

Query it directly:


[Figure 1: result of selecting the raw json column]


Clearly not what we want.

We need the json_tuple function to parse it.

First, the definition from the official docs:

--------------------------------------------------------------------------------------------------------------------------------------

json_tuple

A new json_tuple() UDTF is introduced in Hive 0.7. It takes a set of names (keys) and a JSON string, and returns a tuple of values using one function. This is much more efficient than calling GET_JSON_OBJECT to retrieve more than one key from a single JSON string. In any case where a single JSON string would be parsed more than once, your query will be more efficient if you parse it once, which is what JSON_TUPLE is for. As JSON_TUPLE is a UDTF, you will need to use the LATERAL VIEW syntax in order to achieve the same goal.

For example,

select a.timestamp, get_json_object(a.appevents, '$.eventid'), get_json_object(a.appevents, '$.eventname') from log a;

should be changed to:

select a.timestamp, b.*

from log a lateral view json_tuple(a.appevents, 'eventid', 'eventname') b as f1, f2;

--------------------------------------------------------------------------------------------------------------------------------------

In short: one JSON string in, many columns out.

select json_tuple(json,"movie","rate","time","userid") as (movie,rate,time,userid) from rating_json limit 10;

Now that's the format we're used to.

[Figure 2: json_tuple query result]
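Since json_tuple is a UDTF, the LATERAL VIEW form recommended by the docs above also works here; a sketch of the equivalent query:

```sql
select b.movie, b.rate, b.time, b.userid
from rating_json a
lateral view json_tuple(a.json, 'movie', 'rate', 'time', 'userid') b
    as movie, rate, time, userid
limit 10;
```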

The third column of the result is a Unix timestamp. Converting it into date parts, the output columns become:

movie, rate, time, userid, year, month, day, hour, minute, second

The statement:

select t.movie, t.rate, t.time, t.userid,
       from_unixtime(cast(t.time as bigint), 'yyyy'),
       from_unixtime(cast(t.time as bigint), 'MM'),
       from_unixtime(cast(t.time as bigint), 'dd'),
       from_unixtime(cast(t.time as bigint), 'HH'),
       from_unixtime(cast(t.time as bigint), 'mm'),
       from_unixtime(cast(t.time as bigint), 'ss')
from (select json_tuple(json, "movie", "rate", "time", "userid") as (movie, rate, time, userid)
      from rating_json limit 10) t;

Query result:

Total MapReduce CPU Time Spent: 4 seconds 520 msec

OK

919    4      978301368      1      2001    01      01      06      22      48

594    4      978302268      1      2001    01      01      06      37      48

2804    5      978300719      1      2001    01      01      06      11      59

1287    5      978302039      1      2001    01      01      06      33      59

1197    3      978302268      1      2001    01      01      06      37      48

2355    5      978824291      1      2001    01      07      07      38      11

3408    4      978300275      1      2001    01      01      06      04      35

914    3      978301968      1      2001    01      01      06      32      48

661    3      978302109      1      2001    01      01      06      35      09

1193    5      978300760      1      2001    01      01      06      12      40

Time taken: 42.961 seconds, Fetched: 10 row(s)

Next task: for each sex, find the two oldest people.

The data:

1      18      ruoze   M

2      19      jepson  M

3      22      wangwu  F

4      16      zhaoliu F

5      30      tianqi  M

6      26      wangba  F

Use the windowing function:

row_number

select age, name, sex
from (select age, name, sex,
             row_number() over (partition by sex order by age desc) as rnk
      from hive_rownumber) t
where rnk < 3;

(The alias is written rnk rather than rank, since RANK is a reserved word in newer Hive versions.)
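Tracing the sample data by hand: in partition M the ages sort as 30, 19, 18, and in partition F as 26, 22, 16, so keeping row numbers below 3 should yield (partition output order may vary):

```
30      tianqi  M
19      jepson  M
26      wangba  F
22      wangwu  F
```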


Custom functions:

User-Defined Functions: UDF

UDF: one row in, one row out (upper, lower, substring)

UDAF (Aggregation): many rows in, one row out (count, max, min, sum ...)

UDTF (Table-Generating): one row in, many rows out

A custom UDF demo:

Create a Maven project.

The pom file:

[Figures 3 and 4: pom.xml screenshots]
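The pom screenshots are not recoverable here, but a Hive UDF project mainly needs the hive-exec dependency; a minimal sketch (the version number is an assumption):

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.1.0</version>
    </dependency>
</dependencies>
```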

The Java file:

[Figure 5: HelloUDF.java screenshot]
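The Java source screenshot is likewise lost. Based on the class name com.ruoze.data.hive.HelloUDF and the two-argument call tested below, a plausible sketch (the greeting logic itself is an assumption):

```java
package com.ruoze.data.hive;

import org.apache.hadoop.hive.ql.exec.UDF;

// A one-in-one-out UDF: extend UDF and provide an evaluate() method.
// Hive resolves evaluate() by reflection, so its signature defines the SQL arity.
public class HelloUDF extends UDF {
    // Called once per row; the two-argument form matches sayhello('a','b') below.
    public String evaluate(String a, String b) {
        return "Hello:" + a + "," + b;  // exact output format is an assumption
    }
}
```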

Export the jar and upload it to the server.

Add the jar:

add jar /home/hadoop/lib/hive.jar;

Create a temporary function:

CREATE TEMPORARY FUNCTION sayHello AS 'com.ruoze.data.hive.HelloUDF';

Test that the function is registered in Hive:

select sayhello('eeeeee','kkkkkkkkk') from hive_wc;

A function created this way is temporary: open a new session and it is gone.

Let's also create a permanent function.

First, upload the jar to HDFS:

hadoop fs -mkdir /lib

hadoop fs -put  hive.jar  /lib/

CREATE FUNCTION sayHello22 AS 'com.ruoze.data.hive.HelloUDF' USING JAR 'hdfs://xx.xx.xx.7:9000/lib/hive.jar';

Done!
