Basic Pig operations
Raw data (score.txt; columns: student, course, teacher, score)
hdj,network,tigle,100
md,database,tigle,99
wqy,pde,yao,94
zx,network,tigle,98
mmd,pde,yao,98
zx,pde,yao,100
1: Count how many distinct teachers have taught each student
-- Approach 1: deduplicate (student, teacher) pairs, then group and count
A = load 'score.txt' using PigStorage(',') as (student, corse, teacher, score:int);
describe A;
B = foreach A generate student, teacher;
C = distinct B;
D = foreach (group C by student) generate group as student, COUNT(C);
dump D;
### Output ###
(md,1)
(zx,2)
(hdj,1)
(mmd,1)
(wqy,1)
-- Approach 2: group first, then deduplicate teachers inside a nested foreach
A = load 'score.txt' using PigStorage(',') as (student, corse, teacher, score:int);
describe A;
B = foreach A generate student, teacher;
E = group B by student;
F = foreach E
{
    T = B.teacher;
    uniq = distinct T;
    generate group as student, COUNT(uniq) as cnt;
}
dump F;
### Output ###
(md,1)
(zx,2)
(hdj,1)
(mmd,1)
(wqy,1)
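On a file this small, the Pig result can be sanity-checked outside Hadoop with standard Unix tools. The following is only a rough cross-check under stated assumptions, not part of the Pig workflow: the temporary file and the variable names `data` and `teacher_counts` are made up for illustration, and the rows are copied from the raw data above.

```shell
# Recreate the sample rows in a temporary file (the path is arbitrary).
data=$(mktemp)
cat > "$data" <<'EOF'
hdj,network,tigle,100
md,database,tigle,99
wqy,pde,yao,94
zx,network,tigle,98
mmd,pde,yao,98
zx,pde,yao,100
EOF

# Count distinct teachers per student: a (student, teacher) pair is counted
# only the first time it is seen, mirroring DISTINCT followed by GROUP + COUNT.
teacher_counts=$(awk -F, '!seen[$1 FS $3]++ { n[$1]++ }
                          END { for (s in n) print s "," n[s] }' "$data" | LC_ALL=C sort)
echo "$teacher_counts"
rm -f "$data"
```

The counts agree with the Pig output (only zx has two teachers). The row order differs because the shell version sorts by student name.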
2: Find the top two students in each course
A = load 'score.txt'
    using PigStorage(',')
    as (student, corse, teacher, score:int);
B = foreach A generate student, corse, score;
C = group B by corse;
describe C;
-- Within each course, sort by score (highest first) and keep two rows
D = foreach C
{
    sorted = order B by score DESC;
    top = LIMIT sorted 2;
    generate group as course, top as top;
}
dump D;
-- Flatten the bag so each top student becomes a separate row
E = foreach D generate course, flatten(top);
dump E;
#### Output of dump E ####
(pde,zx,pde,100)
(pde,mmd,pde,98)
(network,hdj,network,100)
(network,zx,network,98)
(database,md,database,99)
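The top-two-per-course result can be cross-checked in the same way with `sort` and `awk`. Again this is a sketch under assumptions rather than part of the Pig job: the temporary file and the names `data` and `top_two` are hypothetical, and the rows are copied from the raw data above.

```shell
# Recreate the sample rows in a temporary file (the path is arbitrary).
data=$(mktemp)
cat > "$data" <<'EOF'
hdj,network,tigle,100
md,database,tigle,99
wqy,pde,yao,94
zx,network,tigle,98
mmd,pde,yao,98
zx,pde,yao,100
EOF

# Sort by course (field 2), then by score (field 4) numerically descending,
# and keep only the first two rows seen for each course.
top_two=$(LC_ALL=C sort -t, -k2,2 -k4,4nr "$data" |
          awk -F, '++rank[$2] <= 2 { print $2 "," $1 "," $4 }')
echo "$top_two"
rm -f "$data"
```

Each printed row is course,student,score. The same five (course, student, score) combinations appear as in the Pig output, without the repeated course column that `flatten` produces.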
The following error appeared while running the script:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias passwd. Backend error : javadoop/192.168.0.2 to master.hadoop:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more deta
Details at logfile: /usr/local/pig/pig_1433189043690.log
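Cause: the connection to port 10020 was refused ("拒绝连接" in the log means "connection refused") because the service on that port, the MapReduce JobHistory server, was not running. Start it with: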
mr-jobhistory-daemon.sh start historyserver
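Before or after starting the daemon, it can help to confirm whether anything is actually listening on the port. The helper below is a minimal sketch, assuming `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available. The function name `port_open` is made up for illustration.

```shell
# Return 0 if a TCP connection to host ($1) and port ($2) succeeds within
# 3 seconds, non-zero otherwise. Uses bash's /dev/tcp pseudo-device.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, with the host and port taken from the error message above:
#   port_open master.hadoop 10020 || mr-jobhistory-daemon.sh start historyserver
```

Running `jps` on the master node should also list a `JobHistoryServer` process once the daemon is up. On Hadoop 3.x the equivalent start command is `mapred --daemon start historyserver`.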