Splitting a file evenly in Python: how do I split a CSV file into evenly sized chunks?

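Simply wrap your reader in a list. Obviously this will break on really large files, because every row is loaded into memory at once (see the alternatives in the updates below):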

>>> import csv
>>> reader = csv.reader(open('big.csv', newline=''))
>>> lines = list(reader)
>>> print(lines[:100])
...
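Once the rows are in a list, evenly sized chunks fall out of plain slicing. The following is a minimal sketch, assuming the whole file fits in memory; `chunk_list` is a hypothetical helper name (not part of the `csv` module) and the in-memory sample data is made up for illustration:

```python
import csv
import io


def chunk_list(rows, chunksize):
    """Split an in-memory list into consecutive slices of at most `chunksize` items."""
    return [rows[i:i + chunksize] for i in range(0, len(rows), chunksize)]


# Illustrative in-memory CSV standing in for a real file.
sample = io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n")
lines = list(csv.reader(sample))
chunks = chunk_list(lines, 2)
print(chunks)
```

The last slice is simply shorter when the row count is not a multiple of `chunksize`.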

Update 1 (list version): another possible approach is to process each chunk as soon as it fills up while iterating over the reader:

#!/usr/bin/env python3

import csv

chunk, chunksize = [], 100

def process_chunk(chunk):
    print(len(chunk))
    # do something useful ...

# newline='' lets the csv module handle line endings itself
with open('4956984.csv', newline='') as f:
    reader = csv.reader(f)
    for i, line in enumerate(reader):
        # a full chunk has accumulated: process it, then empty the list
        if i % chunksize == 0 and i > 0:
            process_chunk(chunk)
            del chunk[:]
        chunk.append(line)

# process the remainder
process_chunk(chunk)
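As one possible way to fill in the "do something useful" step, each chunk could be written to its own numbered file, which splits the CSV on disk. This is only a sketch under assumptions: `write_chunk`, `split_csv`, and the `part_0000.csv` naming scheme are hypothetical and not from the original answer, and the demo input is a throwaway file created in a temporary directory:

```python
import csv
import os
import tempfile


def write_chunk(chunk, index, out_dir):
    """Write one chunk of rows to a numbered CSV file and return its path."""
    path = os.path.join(out_dir, "part_%04d.csv" % index)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(chunk)
    return path


def split_csv(src_path, out_dir, chunksize=100):
    """Stream `src_path` row by row, writing every `chunksize` rows to a new file."""
    paths, chunk = [], []
    with open(src_path, newline="") as f:
        for row in csv.reader(f):
            chunk.append(row)
            if len(chunk) == chunksize:
                paths.append(write_chunk(chunk, len(paths), out_dir))
                chunk = []
    if chunk:  # remainder smaller than chunksize
        paths.append(write_chunk(chunk, len(paths), out_dir))
    return paths


# Demo on a throwaway file with 5 rows, split into chunks of 2.
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, "input.csv")
with open(src, "w", newline="") as f:
    csv.writer(f).writerows([[i, i * i] for i in range(5)])
parts = split_csv(src, demo_dir, chunksize=2)
print([os.path.basename(p) for p in parts])
```

Only one chunk is held in memory at a time, so this also works for files too large for `list(reader)`.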

Update 2 (generator version): I haven't benchmarked it, but you may be able to improve performance by using a chunk generator:

#!/usr/bin/env python3

import csv

def gen_chunks(reader, chunksize=100):
    """
    Chunk generator. Take a CSV `reader` and yield
    `chunksize` sized slices.
    """
    chunk = []
    for i, line in enumerate(reader):
        if i % chunksize == 0 and i > 0:
            yield chunk
            # start a fresh list; clearing in place (del chunk[:]) would
            # also empty any chunk the caller kept a reference to
            chunk = []
        chunk.append(line)
    yield chunk

with open('4956984.csv', newline='') as f:
    for chunk in gen_chunks(csv.reader(f)):
        print(chunk)  # process chunk

# test gen_chunks on some dummy sequence:
for chunk in gen_chunks(range(10), chunksize=3):
    print(chunk)  # process chunk

# => yields
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]
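A variation on the chunk generator can be built on `itertools.islice` from the standard library. This is a sketch rather than the original author's method; `iter_chunks` is a hypothetical name. It builds a new list for every chunk and yields nothing at all for empty input:

```python
from itertools import islice


def iter_chunks(iterable, chunksize=100):
    """Yield successive lists of up to `chunksize` items from any iterable."""
    it = iter(iterable)
    while True:
        # islice pulls at most `chunksize` items from the shared iterator
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


result = list(iter_chunks(range(10), chunksize=3))
print(result)
```

Because it accepts any iterable, a `csv.reader` object can be passed in place of `range(10)`.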
