numpy.core._exceptions.MemoryError: Unable to allocate 27.0 GiB for an array with shape (38064, 38064, 10) and data type float16
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\Anaconda\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 758, in main_function
data_matrix = spectral_wind_profile.cal_similarity_matrix(df_10min)
File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 336, in cal_similarity_matrix
data = self.cal_euclid_distance_matrix(df_10min_e.values)
File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 386, in cal_euclid_distance_matrix
S_std = self.custom_operation(Xc[:, np.newaxis, :], Xc[np.newaxis, :, :])
File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 399, in custom_operation
return np.std(row - col, axis=2)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 27.0 GiB for an array with shape (38064, 38064, 10) and data type float16
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\pycharm\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:\pycharm\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "F:/09-code/02-wind_profile/function/northEastPlain.py", line 931, in <module>
main()
File "F:/09-code/02-wind_profile/function/northEastPlain.py", line 921, in main
data = process.get()
File "D:\Anaconda\lib\multiprocessing\pool.py", line 771, in get
raise self._value
numpy.core._exceptions.MemoryError: Unable to allocate 27.0 GiB for an array with shape (38064, 38064, 10) and data type float16
I then searched online for solutions and found the following suggestions.
The error message indicates that NumPy could not allocate an array with shape (38064, 38064, 10) and dtype float16 inside the cal_euclid_distance_matrix method of northEastPlain.py. Such an array needs 38064 × 38064 × 10 elements × 2 bytes per float16 ≈ 27.0 GiB, which exceeds the physical memory of the machine or the memory available to the Python process.
To address this issue, you can consider the following options:
Reduce the size of the dataset: If possible, work with a smaller dataset or downsample it first.
Use data chunking: If reducing the dataset size is not an option, try implementing a chunking mechanism, processing only a small part of the dataset in memory at a time.
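For this particular traceback, the chunking idea applies directly to the np.std(row - col, axis=2) step: iterate over the first axis in row blocks, so the temporary array is (block, N, F) instead of (N, N, F). A minimal sketch, assuming the same pairwise-std computation as in the traceback (the function name and block size below are my own, not from the original code):

```python
import numpy as np

def std_distance_matrix_blocked(X, block=512):
    """Pairwise std of feature differences, computed one row block at a time.

    Peak temporary memory is (block, N, F) instead of (N, N, F).
    """
    X = np.asarray(X, dtype=np.float32)
    n = X.shape[0]
    out = np.empty((n, n), dtype=np.float32)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # (block, N, F) broadcasted difference, reduced immediately.
        diff = X[start:stop, None, :] - X[None, :, :]
        out[start:stop] = diff.std(axis=2)
    return out
```

With block=512 and the shapes from the traceback, each temporary is 512 × 38064 × 10 float32 ≈ 0.73 GiB instead of 27 GiB (the full N × N output matrix still has to fit, of course).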
Optimize your algorithm: Check if the algorithm can be optimized to reduce memory usage. Sometimes different algorithms or approaches can significantly reduce the amount of memory needed.
Increase physical memory: If you often need to process such large datasets, consider upgrading your computer hardware.
Use external storage: For large datasets, some libraries and methods can leverage disk space to process data, such as the chunksize parameter of pandas' read_csv, or libraries like Dask.
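A small self-contained illustration of pandas' chunksize (reading from an in-memory buffer here as a stand-in for a large file on disk): the CSV is streamed in pieces and aggregated, so the whole table is never resident at once.

```python
import io
import pandas as pd

# Stand-in for a large CSV on disk: 10 rows of wind speeds 0..9.
csv_buffer = io.StringIO("speed\n" + "\n".join(str(v) for v in range(10)))

total = 0.0
rows = 0
for chunk in pd.read_csv(csv_buffer, chunksize=4):  # at most 4 rows in memory
    total += chunk["speed"].sum()
    rows += len(chunk)

mean_speed = total / rows  # same result as loading everything at once
```

The same pattern works with a file path in place of the buffer; each iteration yields an ordinary DataFrame of at most chunksize rows.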
Change data type: If your algorithm can tolerate lower precision, a smaller dtype reduces memory, but float16 is already the smallest floating-point type NumPy offers, so in this case a dtype change alone will not fix the allocation; the savings have to come from avoiding the full (38064, 38064, 10) intermediate.
Optimize parallel processing: If you are using the multiprocessing library and multiple processes each attempt to allocate large amounts of memory, you may need to optimize your parallel processing strategy, such as reducing the number of processes running simultaneously.
Dealing with out-of-memory errors often requires a comprehensive understanding of your program’s memory usage, including which parts are using the most memory and how to effectively manage memory allocation. This might require a more detailed review of the code.