Python Generators: Learning Pipelines (Part 3)

Part 3

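To start, a quick introduction to a Python standard library module: pathlib. It covers much of the same ground as the os module's path handling (os.path), but offers a more powerful, object-oriented interface.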

from pathlib import Path

for filename in Path('/').rglob('*.py'):
    print(filename)

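Each line printed is one matching path. The key point is that rglob() does not build a full list up front: it hands back matching paths lazily, one at a time, just like a generator.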
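To make the lazy behaviour concrete, here is a small sketch that builds a throwaway directory tree and inspects what rglob() returns. The directory layout and the helper name demo_rglob are made up for illustration. The check is against the general Iterator interface rather than the generator type specifically, since the exact return type may differ between Python versions.

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path


def demo_rglob():
    """Create a small temporary tree and search it recursively for *.py files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pkg" / "sub").mkdir(parents=True)
        (root / "a.py").write_text("")
        (root / "pkg" / "b.py").write_text("")
        (root / "pkg" / "sub" / "c.py").write_text("")
        (root / "pkg" / "notes.txt").write_text("")

        matches = root.rglob("*.py")
        # A lazy iterator, not a list: nothing is collected up front
        is_lazy = isinstance(matches, Iterator) and not isinstance(matches, list)
        # Consuming the iterator is what actually walks the tree
        names = sorted(p.name for p in matches)
        return is_lazy, names


if __name__ == "__main__":
    print(demo_rglob())
```

Because the result is an iterator, it can be fed straight into the next stage of a pipeline without ever holding every path in memory.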
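The requirement:
You have a large number of server request logs scattered across many directories, and some of them are compressed. You need to find the log entries you care about across all of those files: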

import gzip, bz2
import re
from pathlib import Path

def gen_find(filepat, top):
    yield from Path(top).rglob(filepat)

def gen_open(paths):
    for path in paths:
        if path.suffix == '.gz':
            yield gzip.open(path, 'rt')
        elif path.suffix == '.bz2':
            yield bz2.open(path, 'rt')
        else:
            yield open(path, 'rt')

def gen_cat(sources):
    for src in sources:
        yield from src



def gen_grep(pat, lines):
    patc = re.compile(pat)
    return (line for line in lines if patc.search(line))




if __name__ == '__main__':

    pat    = r'ply-.*\.gz'
    logdir = 'www'
    
    # Start looking for files; at this point filenames is a generator
    filenames = gen_find("access-log*", logdir)
    # logfiles is a generator too; no file is opened until the pipeline is consumed
    logfiles  = gen_open(filenames)
    
    loglines  = gen_cat(logfiles)
    patlines  = gen_grep(pat,loglines)
    bytecol   = (line.rsplit(None,1)[1] for line in patlines)
    bytes_sent= (int(x) for x in bytecol if x != '-')

    print("Total", sum(bytes_sent))
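To try the pipeline without real server logs, the sketch below writes three tiny sample logs (one plain, one gzip, one bz2) into a temporary directory and runs the same stages over them. The sample log lines, the directory layout, and the helpers make_sample_logs, total_bytes and run_demo are all invented for this demo. One deliberate difference from the code above: this gen_open uses a with block so each file is closed once the pipeline moves on to the next one.

```python
import bz2
import gzip
import re
import tempfile
from pathlib import Path


def gen_find(filepat, top):
    yield from Path(top).rglob(filepat)


def gen_open(paths):
    # Pick an opener by suffix; fall back to the built-in open()
    openers = {'.gz': gzip.open, '.bz2': bz2.open}
    for path in paths:
        opener = openers.get(path.suffix, open)
        # Demo tweak: the file is closed when the consumer asks for the next one
        with opener(path, 'rt') as f:
            yield f


def gen_cat(sources):
    for src in sources:
        yield from src


def gen_grep(pat, lines):
    patc = re.compile(pat)
    return (line for line in lines if patc.search(line))


def make_sample_logs(top):
    """Write made-up log files whose last column is the byte count."""
    hit = '1.2.3.4 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply-2.3.tar.gz HTTP/1.1" 200 {}\n'
    miss = '1.2.3.4 - - [24/Feb/2008:00:09:01 -0600] "GET /index.html HTTP/1.1" 200 999\n'
    (top / '2008').mkdir(parents=True)
    (top / 'access-log').write_text(hit.format(100) + miss)
    with gzip.open(top / '2008' / 'access-log-0108.gz', 'wt') as f:
        f.write(hit.format(200) + hit.format('-'))
    with bz2.open(top / '2008' / 'access-log-0208.bz2', 'wt') as f:
        f.write(hit.format(300))


def total_bytes(logdir, pat=r'ply-.*\.gz'):
    filenames = gen_find('access-log*', logdir)
    logfiles = gen_open(filenames)
    loglines = gen_cat(logfiles)
    patlines = gen_grep(pat, loglines)
    bytecol = (line.rsplit(None, 1)[1] for line in patlines)
    bytes_sent = (int(x) for x in bytecol if x != '-')
    # Nothing above has run yet; sum() pulls data through every stage
    return sum(bytes_sent)


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        logdir = Path(tmp) / 'www'
        make_sample_logs(logdir)
        return total_bytes(logdir)


if __name__ == '__main__':
    print('Total', run_demo())
```

Only the three matching lines with numeric byte counts (100, 200 and 300) contribute to the total; the index.html line fails the pattern and the line ending in '-' is filtered out before int() is called.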

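Key point: the difference between yield and yield from.
Put simply:
yield produces a single value, while yield from delegates to another iterable (or generator) and yields each of its values in turn.
Example code: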

def up():
    yield from [1,2,3,4,5,6]


for i in up():
    print(i)
###output
1
2
3
4
5
6
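For contrast, here is a small sketch putting the two side by side. The three function names are made up for illustration. With a plain yield the whole list comes out as one item, while yield from behaves like a for loop that yields each element.

```python
def nested_with_yield():
    # Plain yield: the list itself is the single value produced
    yield [1, 2, 3]


def flat_with_yield_from():
    # yield from: each element of the list is produced in turn
    yield from [1, 2, 3]


def flat_with_loop():
    # For simple iteration, this loop is equivalent to yield from
    for x in [1, 2, 3]:
        yield x


if __name__ == '__main__':
    print(list(nested_with_yield()))     # [[1, 2, 3]]
    print(list(flat_with_yield_from()))  # [1, 2, 3]
    print(list(flat_with_loop()))        # [1, 2, 3]
```

This is why gen_find and gen_cat use yield from: each one forwards every item of an inner iterable without writing the loop by hand.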
