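Purpose: bucket records by key (typically in the reduce stage of a Hadoop job, where the input arrives sorted by key). Each group (lines in the second script below) contains all consecutive rows that share the same key.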
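The groupby function takes two arguments: the iterable to group, and a key function that computes the grouping key for each element (optional; it defaults to the element itself).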
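Because groupby only merges consecutive elements with equal keys, unsorted input produces the same key more than once. The following is a minimal sketch of that behavior, not part of the original test scripts; the sample records and the helper names first_field and group_keys are made up for illustration.

```python
from itertools import groupby

# Hypothetical sample records: (key, value) pairs, deliberately unsorted.
records = [("plant", "cactus"), ("animal", "bear"),
           ("plant", "fern"), ("animal", "duck")]

def first_field(record):
    # Key function: group on the first element of each tuple.
    return record[0]

def group_keys(iterable):
    # Collect the keys in the order groupby emits them.
    return [key for key, _group in groupby(iterable, first_field)]

# Unsorted input: every change of key starts a new group.
unsorted_keys = group_keys(records)

# Sorting by the same key first yields one group per distinct key.
sorted_keys = group_keys(sorted(records, key=first_field))

print(unsorted_keys)  # ['plant', 'animal', 'plant', 'animal']
print(sorted_keys)    # ['animal', 'plant']
```

This is why groupby fits a reduce stage: the rows reaching the reducer are already ordered by key, so no extra sort is needed there.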
Test script:
from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]

# Group consecutive tuples by their first element.
for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print("A %s is a %s." % (thing[1], key))
    print(" ")  # separator line after each group
Output (separator lines omitted):
A bear is a animal.
A duck is a animal.
A cactus is a plant.
A speed boat is a vehicle.
A school bus is a vehicle.
Test script 2, reading from standard input:
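import itertools
import sys

# Group consecutive input lines by their first two space-separated fields.
for key, lines in itertools.groupby(sys.stdin, key=lambda x: x.split(" ")[:2]):
    for line in lines:
        sys.stdout.write(line)  # each line already ends with a newline
    print(key)
    print("\n")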
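The same grouping can be wrapped in a function that accepts any iterable of lines, which makes it possible to try without piping data in. This is only a sketch under assumptions: the names two_field_key and count_per_key and the sample lines are hypothetical, and counting lines per key is just one possible reduce step.

```python
from itertools import groupby

def two_field_key(line):
    # Same key as the script above: the first two space-separated fields.
    return line.split(" ")[:2]

def count_per_key(lines):
    # Return [(key, line_count), ...] for each run of consecutive equal keys.
    # The group is consumed inside the loop body, before groupby advances.
    return [(" ".join(key), sum(1 for _ in group))
            for key, group in groupby(lines, two_field_key)]

# Hypothetical input, already sorted by key as reducer input would be.
sample = ["a b 1\n", "a b 2\n", "a c 3\n"]
result = count_per_key(sample)
print(result)  # [('a b', 2), ('a c', 1)]
```

To run it on real input, pass sys.stdin in place of sample.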