Using Hive to Find the 10 Base Stations with the Highest Drop Rate & WordCount

Find the 10 base stations with the highest call-drop rate

record_time: call time
imei: base station ID
cell: mobile phone ID
drop_num: seconds of dropped calls
duration: total call duration in seconds

Create the raw data table

hive > create table cell_monitor(
record_time string,
imei string,
cell string,
ph_num int,
call_num int,
drop_num int,
duration int,
drop_rate DOUBLE,
net_type string,
erl string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

If the data file is plain text, use STORED AS TEXTFILE.
If the data needs to be compressed, use STORED AS SEQUENCEFILE.

Result table

hive > create table cell_drop_monitor(
imei string,
total_drop_num int,
total_duration int,
d_rate DOUBLE
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

Load the data

hive > LOAD DATA LOCAL INPATH '/root/cdr_summ_imei_cell_info.csv' OVERWRITE INTO TABLE cell_monitor;
hive > select * from cell_monitor limit 5;
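The header line of the CSV file is loaded as a data row too, but its values cannot be parsed into the numeric columns, so those fields come back as NULL. Setting the table property skip.header.line.count to 1 on cell_monitor is one way to have Hive skip that line.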
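To make the header problem concrete, here is a minimal Python sketch of what happens when a header line is read as data. It is only an illustration, not Hive's implementation: the header names and the data row in `sample` are made up, and `to_int_or_none` is a hypothetical helper that imitates how an unparseable value in an int column ends up as NULL.

```python
import csv
import io


def to_int_or_none(text):
    """Imitate an int column: text that is not a number becomes NULL (None)."""
    try:
        return int(text)
    except ValueError:
        return None


# Hypothetical sample: one header line and one data row (values are made up).
sample = (
    "record_time,imei,cell,drop_num,duration\n"
    "2011-07-13 00:00:00,356966,29448-37062,0,0\n"
)

rows = list(csv.reader(io.StringIO(sample)))

# Keep imei as a string and parse drop_num and duration as ints.
parsed = [(r[1], to_int_or_none(r[3]), to_int_or_none(r[4])) for r in rows]

# Dropping the first row is the effect that skipping the header line has.
without_header = parsed[1:]

print(parsed[0])
print(without_header)
```

The first parsed row keeps the header text in the string column and has None in both numeric columns, which is the same kind of row that shows up at the top of the `select * from cell_monitor limit 5` output.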

Find the base stations with the highest drop rate

hive> from cell_monitor cm 
insert overwrite table cell_drop_monitor  
select cm.imei, sum(cm.drop_num), sum(cm.duration), sum(cm.drop_num)/sum(cm.duration) d_rate 
group by cm.imei 
sort by d_rate desc;

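desc sorts from highest to lowest. Note that sort by only orders rows within each reducer, so the output is globally sorted only when the job runs with a single reducer; order by is the clause that guarantees a total order.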
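The query's logic (group by station, sum the two columns, divide, sort descending) can be sketched in plain Python as a sanity check. This is an illustrative sketch, not what Hive executes: `top_drop_rates` is a hypothetical function, and the per-row split of the sample records is made up, although the per-station totals for 639876, 863738 and 350301 match the result listing in this post. A zero total duration yields None here, mirroring Hive returning NULL for division by zero.

```python
from collections import defaultdict


def top_drop_rates(records, n=10):
    """records: iterable of (imei, drop_num, duration) tuples.

    Returns up to n tuples (imei, total_drop, total_duration, rate),
    ordered by rate from highest to lowest, with undefined rates last.
    """
    totals = defaultdict(lambda: [0, 0])
    for imei, drop_num, duration in records:
        totals[imei][0] += drop_num
        totals[imei][1] += duration

    rates = []
    for imei, (drops, dur) in totals.items():
        # None stands in for NULL when the total duration is zero.
        rate = drops / dur if dur else None
        rates.append((imei, drops, dur, rate))

    # Descending by rate; rows without a rate are placed at the end.
    rates.sort(key=lambda r: (r[3] is None, -(r[3] or 0.0)))
    return rates[:n]


# Illustrative rows; only the per-station totals come from the post.
records = [
    ("863738", 1, 1343),
    ("639876", 1, 734),
    ("863738", 1, 2000),
    ("350301", 2, 3998),
    ("000000", 0, 0),  # made-up station with no call time
]

result = top_drop_rates(records)
for row in result:
    print(row)
```

Because the whole list is sorted in one place, this corresponds to a single-reducer run of the Hive query.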

Hive query result

hive> select * from cell_drop_monitor limit 10;
OK
639876  1   734 0.0013623978201634877
356436  1   1028    9.727626459143969E-4
351760  1   1232    8.116883116883117E-4
368883  1   1448    6.906077348066298E-4
358849  1   1469    6.807351940095302E-4
358231  1   1613    6.199628022318661E-4
863738  2   3343    5.982650314089142E-4
865011  1   1864    5.36480686695279E-4
862242  1   1913    5.227391531625719E-4
350301  2   3998    5.002501250625312E-4
Drop rate per base station



WordCount in Hive

Create the tables

create table docs(line string);
create table wc(word string, total int);

Load the data and count the words

hive> load data local inpath '/root/wc' into table docs;
hive> from (select explode(split(line, ' ')) as word from docs) t1
    insert into table wc
    select t1.word, count(t1.word) 
    group by t1.word 
    sort by t1.word;


select split(line, ' ') from docs
converts each line string into an array of words

UDTF: one row in, multiple rows out, e.g. explode
select explode(split(line, ' ')) from docs
turns each array into individual words, one word per row
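The split, explode and group-by steps can be mimicked in a few lines of Python to show what each stage produces. This is a sketch for intuition only: `split_line` and `explode` are hypothetical stand-ins for the Hive functions of the same purpose, and the input is just the first three lines of the raw data listed at the end of this post. One difference to keep in mind is that Hive's split takes a regular expression, whereas str.split here takes a literal separator.

```python
from collections import Counter

# First three lines of the raw data shown at the end of the post.
lines = [
    "hello word objective-c hive",
    "hello java python",
    "hello python mysql",
]


def split_line(line):
    """Stand-in for split(line, ' '): one string in, one list of words out."""
    return line.split(" ")


def explode(arrays):
    """Stand-in for explode: each input list becomes several output rows."""
    for array in arrays:
        for item in array:
            yield item


arrays = [split_line(line) for line in lines]
words = list(explode(arrays))

# Equivalent of "group by word" with count, then sorting by word.
counts = sorted(Counter(words).items())

print(arrays[1])
print(len(words))
print(counts)
```

Three input rows become ten word rows after the explode step, and the grouped counts come out in the same alphabetical order as the `wc` table.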

Query result

hive> select * from wc;
OK
c++ 2
hbase   2
hello   17
hive    1
java    5
matlab  3
mongodb 1
mysql   3
objective-c 2
oracle  1
pig 1
python  8
redies  2
sqoop   3
swift   3
word    4
zookeeper   1



Raw data

hello word objective-c hive
hello java python
hello python mysql
hello hbase python
hello word swift redies
hello java oracle sqoop
hello python swift pig
hello redies c++ mysql
hello python matlab
hello java c++ matlab
hello python sqoop
hello word sqoop objective-c
hello java zookeeper
hello python matlab
hello word mongodb
hello java mysql hbase
hello python swift
