I am trying to serialize a large Python object, composed of a tuple of numpy arrays, using pickle/cPickle and gzip. The procedure works well up to a certain data size, after which I receive the following error:
--> 121 cPickle.dump(dataset_pickle, f)
***/gzip.pyc in write(self, data)
238 print(type(self.crc))
239 print(self.crc)
--> 240 self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
241 self.fileobj.write( self.compress.compress(data) )
OverflowError: size does not fit in an int
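Roughly, the code doing the dump looks like this (a minimal sketch only; the file name and array shape are made up):

import gzip
import cPickle
import numpy as np

# a tuple holding one ~1.6 GB float64 array (shape is made up for illustration)
dataset_pickle = (np.random.rand(200000, 1000),)

f = gzip.open('dataset.pkl.gz', 'wb')
cPickle.dump(dataset_pickle, f)  # fails inside gzip's write() once the block handed to zlib.crc32 exceeds 2 GB
f.close()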
The size of the numpy array is around 1.5 GB and the string sent to zlib.crc32 exceeds 2 GB. I am working on a 64-bit machine and my Python is also 64-bit:
>>> import sys
>>> sys.maxsize
9223372036854775807
Is this a bug in Python or am I doing something wrong? Are there any good alternatives for compressing and serializing numpy arrays? I am taking a look at numpy.savez, PyTables and HDF5 right now, but it would be good to know why I am having this problem since I have enough memory.
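For reference, this is roughly how the HDF5 route via h5py would look (a sketch only; file and dataset names are made up):

import h5py
import numpy as np

arrays = (np.random.rand(10000, 1000), np.random.rand(10000, 1000))

# each array becomes its own gzip-compressed dataset inside one HDF5 file
with h5py.File('dataset.h5', 'w') as f:
    for i, arr in enumerate(arrays):
        f.create_dataset('arr_%d' % i, data=arr, compression='gzip')

# reading back: index the file by dataset name
with h5py.File('dataset.h5', 'r') as f:
    first = f['arr_0'][:]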
Update: I remember reading somewhere that this could be caused by using an old version of Numpy (and I was using one), but I've fully switched to numpy.save/savez instead, which is actually faster than cPickle (at least in my case).
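A minimal sketch of what the numpy-based replacement looks like (file and keyword names are made up):

import numpy as np

arrays = (np.random.rand(10000, 1000), np.random.rand(10000, 1000))

# np.savez stores each array as its own .npy entry in an uncompressed zip archive;
# np.savez_compressed does the same with zlib compression
np.savez('dataset.npz', first=arrays[0], second=arrays[1])

# np.load returns an archive object; arrays are read back by key
loaded = np.load('dataset.npz')
first = loaded['first']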
python
numpy
serialization
gzip
pickle
asked May 21 '15 at 14:07 by gsmafra, edited Feb 28 '16 at 15:24
1 Answer
Accepted
This seems to be a bug in Python 2.7: the zlib.crc32 call inside gzip's write() cannot handle data blocks larger than 2 GB.
From inspecting the bug report, it does not look like a fix is pending. Your best bet would be to move to Python 3, which apparently does not exhibit this bug.
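For illustration, roughly the same dump under Python 3 (file name is made up); per the above, the crc32 overflow should not occur there:

import gzip
import pickle
import numpy as np

dataset = (np.random.rand(200000, 1000),)

# same gzip + pickle procedure as in the question, run on Python 3
with gzip.open('dataset.pkl.gz', 'wb') as f:
    pickle.dump(dataset, f, protocol=pickle.HIGHEST_PROTOCOL)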
answered Jul 4 '16 at 5:34 by Perennial
Looks like the issue was closed. – Francisco Couzo Nov 1 '16 at 18:58