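Find the 10 base stations with the highest call drop rate
record_time: call time
imei: base station ID
cell: phone ID
drop_num: number of seconds of dropped calls
duration: total call duration in seconds
Create the raw data table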
hive > create table cell_monitor(
record_time string,
imei string,
cell string,
ph_num int,
call_num int,
drop_num int,
duration int,
drop_rate DOUBLE,
net_type string,
erl string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
If the data file is plain text, use STORED AS TEXTFILE
If the data needs to be compressed, use STORED AS SEQUENCEFILE
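A minimal sketch of the SEQUENCEFILE case, assuming cell_monitor has already been created and loaded as in these notes. The table name cell_monitor_seq is hypothetical. LOAD DATA only moves files and does not convert their format, so the rows are rewritten through a query instead:

```sql
-- Assumption: cell_monitor already exists and holds the CSV data.
-- cell_monitor_seq is a hypothetical name used only for this example.

-- Ask Hive to compress the output of the query below.
SET hive.exec.compress.output=true;

-- Create a SequenceFile copy of the text table in one step (CTAS).
CREATE TABLE cell_monitor_seq
STORED AS SEQUENCEFILE
AS SELECT * FROM cell_monitor;
```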
Result table
hive > create table cell_drop_monitor(
imei string,
total_drop_num int,
total_duration int,
d_rate DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Load the data
hive > LOAD DATA LOCAL INPATH '/root/cdr_summ_imei_cell_info.csv' OVERWRITE INTO TABLE cell_monitor;
hive > select * from cell_monitor limit 5;
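The header row on the first line is loaded as well, but it cannot be parsed, so its non-string columns show up as NULL
Find the base stations with the highest drop rate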
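One way to keep the header row out of query results, assuming Hive 0.13.0 or later (where the skip.header.line.count table property is available), is to mark the first line of each file as a header. This is a sketch, not necessarily how the original run handled it:

```sql
-- Assumption: Hive 0.13.0+; older versions ignore this property.
-- Tell Hive to skip the first line of each data file when reading the table.
ALTER TABLE cell_monitor SET TBLPROPERTIES ("skip.header.line.count"="1");

-- The header line should no longer appear in the output.
SELECT * FROM cell_monitor LIMIT 5;
```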
hive> from cell_monitor cm
insert overwrite table cell_drop_monitor
select cm.imei, sum(cm.drop_num), sum(cm.duration), sum(cm.drop_num)/sum(cm.duration) d_rate
group by cm.imei
sort by d_rate desc;
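desc sorts from highest to lowest. Note that sort by only orders rows within each reducer, so the result is globally ordered only when a single reducer runs; order by guarantees a total order.
Hive execution result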
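For a top 10 that does not depend on the number of reducers, the same aggregation can be written with order by and limit. A minimal sketch, reading directly from cell_monitor rather than from the result table; the aliases total_drop_num and total_duration are chosen here for readability:

```sql
-- Global top 10 by drop rate; ORDER BY forces a total order across reducers.
SELECT cm.imei,
       sum(cm.drop_num)                    AS total_drop_num,
       sum(cm.duration)                    AS total_duration,
       sum(cm.drop_num) / sum(cm.duration) AS d_rate
FROM cell_monitor cm
GROUP BY cm.imei
ORDER BY d_rate DESC
LIMIT 10;
```

In Hive the / operator returns a DOUBLE, so no cast is needed for the ratio.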
hive> select * from cell_drop_monitor limit 10;
OK
639876 1 734 0.0013623978201634877
356436 1 1028 9.727626459143969E-4
351760 1 1232 8.116883116883117E-4
368883 1 1448 6.906077348066298E-4
358849 1 1469 6.807351940095302E-4
358231 1 1613 6.199628022318661E-4
863738 2 3343 5.982650314089142E-4
865011 1 1864 5.36480686695279E-4
862242 1 1913 5.227391531625719E-4
350301 2 3998 5.002501250625312E-4
WordCount in Hive
Create the tables
create table docs(line string);
create table wc(word string, total int);
Load the data and count the words
hive> load data local inpath '/root/wc' into table docs;
hive> from (select explode(split(line, ' ')) as word from docs) t1
insert into table wc
select t1.word, count(t1.word)
group by t1.word
sort by t1.word;
select split(line, ' ') from docs
Converts the string into an array
UDTF: explode, one row in, many rows out
select explode(split(line, ' ')) from docs
Turns the array into individual words, one word per row
Query result
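To see the two steps on a single line of input, the functions can be applied to a string literal. This sketch assumes Hive 0.13.0 or later, which allows SELECT without a FROM clause; on older versions, run the same expressions against the docs table:

```sql
-- split: one string in, one array out
SELECT split('hello java python', ' ');
-- ["hello","java","python"]

-- explode: one array in, one row per element out
SELECT explode(split('hello java python', ' ')) AS word;
-- hello
-- java
-- python
```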
hive> select * from wc;
OK
c++ 2
hbase 2
hello 17
hive 1
java 5
matlab 3
mongodb 1
mysql 3
objective-c 2
oracle 1
pig 1
python 8
redies 2
sqoop 3
swift 3
word 4
zookeeper 1
Raw data
hello word objective-c hive
hello java python
hello python mysql
hello hbase python
hello word swift redies
hello java oracle sqoop
hello python swift pig
hello redies c++ mysql
hello python matlab
hello java c++ matlab
hello python sqoop
hello word sqoop objective-c
hello java zookeeper
hello python matlab
hello word mongodb
hello java mysql hbase
hello python swift