Python gzip: OverflowError size does not fit in an int

I am trying to serialize a large Python object, a tuple of numpy arrays, using pickle/cPickle and gzip. The procedure works well up to a certain data size; beyond that I receive the following error:

--> 121 cPickle.dump(dataset_pickle, f)

***/gzip.pyc in write(self, data)
    238 print(type(self.crc))
    239 print(self.crc)
--> 240 self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
    241 self.fileobj.write( self.compress.compress(data) )

OverflowError: size does not fit in an int

The size of the numpy array is around 1.5 GB, and the string sent to zlib.crc32 exceeds 2 GB. I am working on a 64-bit machine and my Python is also 64-bit:

>>> import sys
>>> sys.maxsize
9223372036854775807

Is this a bug in Python, or am I doing something wrong? Are there any good alternatives for compressing and serializing numpy arrays? I am looking at numpy.savez, PyTables and HDF5 right now, but it would be good to know why I am having this problem, since I have enough memory.

Update: I remember reading somewhere that this could be caused by using an old version of NumPy (and I was), but I've fully switched to numpy.save/savez instead, which is actually faster than cPickle (at least in my case).
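For readers weighing the numpy.savez route mentioned above, here is a minimal sketch of storing a tuple of arrays in one compressed archive and reading it back. The helper names save_arrays and load_arrays and the small stand-in dataset are illustrative assumptions, not code from the original post, and the sketch has not been tried on multi-gigabyte data.

```python
import os
import tempfile

import numpy as np


def save_arrays(path, arrays):
    """Hypothetical helper: store a tuple of arrays in one compressed .npz file."""
    # Positional arrays are stored under the keys arr_0, arr_1, ...
    np.savez_compressed(path, *arrays)


def load_arrays(path):
    """Hypothetical helper: read the arrays back in their original order."""
    with np.load(path) as archive:
        return tuple(archive["arr_%d" % i] for i in range(len(archive.files)))


# Small stand-in data; the dataset in the question was far larger.
dataset = (
    np.arange(12, dtype=np.float64).reshape(3, 4),
    np.ones(5, dtype=np.int32),
)
path = os.path.join(tempfile.mkdtemp(), "dataset.npz")
save_arrays(path, dataset)
restored = load_arrays(path)
```

np.savez_compressed trades some write speed for a smaller file; plain np.savez skips the compression step.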

Tags: python, numpy, serialization, gzip, pickle


edited Feb 28 '16 at 15:24 asked May 21 '15 at 14:07

gsmafra 543 4 13


1 Answer


Accepted

This seems to be a bug in Python 2.7.

From inspecting the bug report, it does not look like there is a pending fix. Your best bet would be to move to Python 3, which apparently does not exhibit this bug.
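If moving to Python 3 is not an option, one possible workaround is to keep every single write to the gzip file below the 2 GB limit, so that zlib.crc32 never receives an oversized buffer. The sketch below assumes that splitting the writes is enough to avoid the error; ChunkedWriter and CHUNK_SIZE are hypothetical names, and this has not been tested against the Python 2.7 setup from the question.

```python
import gzip
import os
import pickle
import tempfile

# Assumed chunk size: 1 GiB, comfortably below the 2 GiB limit of a C int.
CHUNK_SIZE = 1 << 30


class ChunkedWriter(object):
    """Hypothetical wrapper that forwards write() calls in bounded pieces."""

    def __init__(self, fileobj, chunk_size=CHUNK_SIZE):
        self.fileobj = fileobj
        self.chunk_size = chunk_size

    def write(self, data):
        # A memoryview lets us slice the buffer without copying it.
        view = memoryview(data)
        for start in range(0, len(view), self.chunk_size):
            self.fileobj.write(view[start:start + self.chunk_size])
        return len(view)


# Demonstration with a tiny chunk size so that the splitting is exercised.
payload = {"values": list(range(1000))}
path = os.path.join(tempfile.mkdtemp(), "payload.pkl.gz")
with gzip.open(path, "wb") as f:
    pickle.dump(payload, ChunkedWriter(f, chunk_size=64), protocol=2)

with gzip.open(path, "rb") as f:
    restored = pickle.load(f)
```

Reading back needs no wrapper, because the chunking only changes how the bytes are handed to gzip, not what ends up in the file.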


answered Jul 4 '16 at 5:34

Perennial 61 4

Looks like the issue was closed. – Francisco Couzo Nov 1 '16 at 18:58
