Hive: A Transform Example

Contents of the data file

steven:100;steven:90;steven:99^567^22

ray:90;ray:98^456^30

Tom:81^222^33

The data should end up in the database in the following format:

steven    100    567    22

steven    90     567    22

steven    99     567    22

ray       90     456    30

ray       98     456    30

Tom       81     222    33

Each input row here must produce one or more output rows, with four columns instead of three. If you want to return a different number of columns, or a different number of rows, for a given input row, you need to perform what Hive calls a transform.
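To make the one-row-to-many-rows idea concrete, here is a minimal Python sketch of the reshaping this example needs, applied directly to a raw line of the data file. The function name expand_record is hypothetical and used only for illustration; it is not part of Hive.

```python
def expand_record(raw_line):
    """Expand one raw line such as 'ray:90;ray:98^456^30' into output rows.

    Hypothetical helper for illustration only; not part of Hive.
    """
    # The three top-level fields are separated by '^'.
    pairs, code, age = raw_line.strip().split("^")
    rows = []
    # The first field holds one or more 'name:score' pairs separated by ';'.
    for pair in pairs.split(";"):
        name, score = pair.split(":")
        rows.append((name, score, code, age))
    return rows


# One input line with two pairs yields two output rows.
for row in expand_record("ray:90;ray:98^456^30"):
    print("\t".join(row))
```

The number of output rows depends on how many pairs the first field contains, which is why an ordinary column expression is not enough for this job.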

 

1. Create a table to store the raw data

create table u_data(col1 string, code int, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^' STORED AS TEXTFILE;

2. Load the data

load data local inpath '/home/stevenxia/data1' overwrite into table u_data;

3. Write the transform script

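#!/usr/bin/python
# Hive streams each row to stdin as one line of tab-separated columns.
import sys

for line in sys.stdin:
    # values = [col1, code, age]
    values = line.rstrip('\n').split('\t')
    # col1 holds 'name:score' pairs separated by ';'
    for kv in values[0].split(';'):
        k, v = kv.split(':')
        # Emit one tab-separated output row per pair.
        print('\t'.join([k, v, values[1], values[2]]))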
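Before deploying, the script's logic can be checked locally. The sketch below assumes Hive's default behavior of passing each row to the script as a single line of tab-separated column values. The function transform_lines and the sample input are illustrative and are not part of the original script.

```python
import io


def transform_lines(stream):
    """Yield tab-separated output lines for each tab-separated input line."""
    for line in stream:
        # Assumption: columns arrive tab-separated, as Hive sends them by default.
        col1, code, age = line.rstrip("\n").split("\t")
        for kv in col1.split(";"):
            name, score = kv.split(":")
            yield "\t".join([name, score, code, age])


# Simulated input: the rows of u_data as the script would see them on stdin.
sample = io.StringIO(
    "steven:100;steven:90;steven:99\t567\t22\n"
    "ray:90;ray:98\t456\t30\n"
    "Tom:81\t222\t33\n"
)
for out in transform_lines(sample):
    print(out)
```

Keeping the logic in a generator that takes any line-based stream means the same function could be fed sys.stdin in the real script and an in-memory buffer in a test.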

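4. Deploy the script to the cluster nodes at /home/stevenxia/u.py (it must be executable, because the query below invokes it by path)

5. Hive can now use the script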

select transform(u.col1, u.code, u.age) using '/home/stevenxia/u.py' as (col1, col2, col3, col4) from (select * from u_data) as u;

Result of the run

 
