Basic Pig operations
Raw data (score.txt)
hdj,network,tigle,100
md,database,tigle,99
wqy,pde,yao,94
zx,network,tigle,98
mmd,pde,yao,98
zx,pde,yao,100
1: Count how many teachers have taught each student
-- Approach 1: drop duplicate (student, teacher) pairs, then group by student and count.
A = load 'score.txt' using PigStorage(',') as (student, course, teacher, score:int);
describe A;
B = foreach A generate student, teacher;
C = distinct B;
D = foreach (group C by student) generate group as student, COUNT(C);
dump D;
### Output ###
(md,1)
(zx,2)
(hdj,1)
(mmd,1)
(wqy,1)
-- Approach 2: group by student first, then de-duplicate teachers inside a nested foreach.
A = load 'score.txt' using PigStorage(',') as (student, course, teacher, score:int);
describe A;
B = foreach A generate student, teacher;
E = group B by student;
F = foreach E {
    T = B.teacher;
    uniq = distinct T;
    generate group as student, COUNT(uniq) as cnt;
}
dump F;
### Output ###
(md,1)
(zx,2)
(hdj,1)
(mmd,1)
(wqy,1)
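As a cross-check on the two Pig scripts above, the same count can be reproduced outside Pig in a few lines of plain Python. This is only a sketch: the six sample records are hard-coded as a string, and the names `RAW` and `teachers_per_student` are made up for illustration.

```python
from collections import defaultdict

# The six sample records from the raw data, inlined so the sketch is self-contained.
RAW = """hdj,network,tigle,100
md,database,tigle,99
wqy,pde,yao,94
zx,network,tigle,98
mmd,pde,yao,98
zx,pde,yao,100"""


def teachers_per_student(raw):
    """Return {student: number of distinct teachers}, mirroring DISTINCT + GROUP + COUNT."""
    teachers = defaultdict(set)
    for line in raw.splitlines():
        student, _course, teacher, _score = line.split(",")
        # A set keeps each teacher once per student, like `distinct` in Pig.
        teachers[student].add(teacher)
    return {student: len(names) for student, names in teachers.items()}


if __name__ == "__main__":
    for student, cnt in sorted(teachers_per_student(RAW).items()):
        print((student, cnt))
```

Only zx appears with two different teachers (tigle and yao), which agrees with the `(zx,2)` row in the Pig output.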
2: Find the top two students in each course
A = load 'score.txt' using PigStorage(',') as (student, course, teacher, score:int);
B = foreach A generate student, course, score;
C = group B by course;
describe C;
-- Within each course, sort by score descending and keep the first two rows.
D = foreach C {
    sorted = order B by score DESC;
    top = LIMIT sorted 2;
    generate group as course, top as top;
}
dump D;
-- Flatten the bag so each top student becomes its own row.
E = foreach D generate course, flatten (top);
dump E;
#### Output ####
(pde,zx,pde,100)
(pde,mmd,pde,98)
(network,hdj,network,100)
(network,zx,network,98)
(database,md,database,99)
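The group, sort, and limit steps of the script above can be sanity-checked with a small Python sketch over the same six records. It is not part of the Pig workflow; `RAW` and `top_n_per_course` are hypothetical names, and ties in score are simply kept in input order rather than handled specially.

```python
from collections import defaultdict

# The six sample records from the raw data, inlined so the sketch is self-contained.
RAW = """hdj,network,tigle,100
md,database,tigle,99
wqy,pde,yao,94
zx,network,tigle,98
mmd,pde,yao,98
zx,pde,yao,100"""


def top_n_per_course(raw, n=2):
    """Return {course: [(student, score), ...]} with the n highest scores per course."""
    by_course = defaultdict(list)
    for line in raw.splitlines():
        student, course, _teacher, score = line.split(",")
        by_course[course].append((student, int(score)))
    # Sort each group by score, highest first, then keep the first n rows.
    return {
        course: sorted(rows, key=lambda row: row[1], reverse=True)[:n]
        for course, rows in by_course.items()
    }


if __name__ == "__main__":
    for course, rows in top_n_per_course(RAW).items():
        for student, score in rows:
            print((course, student, score))
```

The database course has a single record, so it yields one row even though the limit is two, just as in the Pig output.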
Error encountered while running the scripts:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias passwd. Backend error : javadoop/192.168.0.2 to master.hadoop:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more deta Details at logfile: /usr/local/pig/pig_1433189043690.log
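Cause: nothing is listening on port 10020, the JobHistory server's port, so the connection is refused (拒绝连接 in the log means "connection refused"). Start the server with: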
mr-jobhistory-daemon.sh start historyserver
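To confirm that the history server is reachable after starting it, a plain TCP connection test is usually enough. The helper below is a minimal sketch using only the Python standard library; `port_is_open` is a made-up name, and a successful connection only shows that something is listening on the port, not that it is the JobHistory server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and host name resolution failures.
        return False
```

With the host and port from the error message, `port_is_open("master.hadoop", 10020)` would be expected to return False before the daemon is started and True once it is up.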