numpy.core._exceptions.MemoryError: Unable to allocate 27.0 GiB for an array with shape (38064, 38064, 10) and data type float16

Table of Contents

  • Answer from the Internet
    • How can I reduce or downsample the dataset

numpy.core._exceptions.MemoryError: Unable to allocate 27.0 GiB for an array with shape (38064, 38064, 10) and data type float16
The full traceback is:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "D:\Anaconda\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 758, in main_function
    data_matrix = spectral_wind_profile.cal_similarity_matrix(df_10min)
  File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 336, in cal_similarity_matrix
    data = self.cal_euclid_distance_matrix(df_10min_e.values)
  File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 386, in cal_euclid_distance_matrix
    S_std = self.custom_operation(Xc[:, np.newaxis, :], Xc[np.newaxis, :, :])
  File "F:\09-code\02-wind_profile\function\northEastPlain.py", line 399, in custom_operation
    return np.std(row - col, axis=2)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 27.0 GiB for an array with shape (38064, 38064, 10) and data type float16
"""
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\pycharm\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:\pycharm\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "F:/09-code/02-wind_profile/function/northEastPlain.py", line 931, in <module>
    main()
  File "F:/09-code/02-wind_profile/function/northEastPlain.py", line 921, in main
    data = process.get()
  File "D:\Anaconda\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
numpy.core._exceptions.MemoryError: Unable to allocate 27.0 GiB for an array with shape (38064, 38064, 10) and data type float16

Answer from the Internet

I then searched the Internet for a solution. Here are the suggested methods.

The error message indicates that, while creating a NumPy array of shape (38064, 38064, 10) with data type float16 inside the cal_euclid_distance_matrix method of northEastPlain.py, there was not enough memory to allocate the array. The allocation requires 27.0 GiB, which exceeds the physical memory of the machine or the memory available to the Python process.
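As a sanity check, the 27.0 GiB figure follows directly from the shape and dtype (float16 is 2 bytes per element):

```python
import numpy as np

# shape (38064, 38064, 10), float16 = 2 bytes per element
n_bytes = 38064 * 38064 * 10 * np.dtype(np.float16).itemsize
print(n_bytes / 2**30)  # -> 26.99..., i.e. the reported 27.0 GiB
```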

To address this issue, you can consider the following options:

  1. Reduce the size of the dataset: If possible, work with a smaller dataset or downsample it (see the downsampling sketch in the final section).

  2. Use data chunking: If reducing the dataset size is not an option, implement a chunking mechanism that keeps only a small part of the intermediate array in memory at a time (see the first sketch after this list).

  3. Optimize your algorithm: Check whether the computation can be reworked to use less memory; a different formulation can sometimes cut the required memory dramatically (the second sketch after this list shows one such rewrite).

  4. Increase physical memory: If you often need to process such large datasets, consider upgrading your computer hardware.

  5. Use external storage: For large datasets, some libraries can spill work to disk or stream it in pieces, for example the chunksize parameter of pandas's read_csv, or libraries like Dask (third sketch after this list).

  6. Change data type: If your algorithm can tolerate a loss of precision, a smaller data type reduces memory proportionally. Here, however, float16 is already the smallest standard floating-point type, so there is little left to gain on this axis.

  7. Optimize parallel processing: If several multiprocessing workers each try to allocate large arrays at the same time, reduce the number of processes running simultaneously (fourth sketch after this list).
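For option 2, here is a minimal sketch of a chunked replacement for the failing custom_operation step. It computes the same (n, n) result as np.std(Xc[:, None, :] - Xc[None, :, :], axis=2) from the traceback, but only ever materializes a (chunk, n, d) temporary; the chunk size of 512 is an arbitrary assumption to tune against available RAM:

```python
import numpy as np

def pairwise_std_chunked(Xc, chunk=512):
    """Same result as np.std(Xc[:, None, :] - Xc[None, :, :], axis=2),
    but the temporary difference array is (chunk, n, d), not (n, n, d)."""
    n = Xc.shape[0]
    out = np.empty((n, n), dtype=np.float32)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # (chunk, 1, d) - (1, n, d) broadcasts to a (chunk, n, d) temporary
        diff = Xc[start:stop, None, :] - Xc[None, :, :]
        out[start:stop] = np.std(diff, axis=2)
    return out
```

Note that even the (n, n) float32 result is about 5.4 GiB (38064² × 4 bytes), so chunking may still need to be combined with option 1.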
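For option 3, the per-pair standard deviation also has a closed form that avoids the (n, n, 10) intermediate entirely: var(x - y) = mean((x - y)^2) - (mean(x) - mean(y))^2, and mean(x * y) over the feature axis is just (X @ X.T) / d. A sketch of that identity (not the author's original code):

```python
import numpy as np

def pairwise_std_closed_form(Xc):
    """Same result as np.std(Xc[:, None, :] - Xc[None, :, :], axis=2),
    rebuilt from matrix products so the largest temporaries are (n, n)."""
    X = np.asarray(Xc, dtype=np.float32)   # float16 accumulates too coarsely
    d = X.shape[1]
    mu = X.mean(axis=1)                    # per-row mean
    sq = (X * X).mean(axis=1)              # per-row mean of squares
    # mean((x - y)^2) = mean(x^2) + mean(y^2) - 2 * mean(x * y)
    var = sq[:, None] + sq[None, :] - (2.0 / d) * (X @ X.T)
    # var(x - y) = mean((x - y)^2) - (mean(x) - mean(y))^2
    var -= (mu[:, None] - mu[None, :]) ** 2
    np.clip(var, 0.0, None, out=var)       # rounding can leave tiny negatives
    return np.sqrt(var, out=var)
```

Each (n, n) float32 temporary here is still about 5.4 GiB, so for the full 38064-row dataset this identity is best applied one row block at a time, i.e. combined with the chunked loop above.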
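For option 5, two hedged examples: pandas's read_csv with chunksize streams a file in pieces, and Dask evaluates NumPy-style expressions block by block. The file name, the per-chunk reduction, and the stand-in data below are placeholders, not taken from the original code:

```python
import numpy as np
import pandas as pd
import dask.array as da

# pandas: stream a large CSV in 100k-row pieces instead of loading it whole.
# "wind_10min.csv" and the per-chunk mean are placeholders.
means = [piece.mean(numeric_only=True)
         for piece in pd.read_csv("wind_10min.csv", chunksize=100_000)]

# Dask: the same broadcasting expression, evaluated block by block,
# so only a few blocks need to be in memory at once.
Xc = np.random.rand(4_000, 10).astype(np.float32)   # stand-in data
X = da.from_array(Xc, chunks=(500, -1))
S_std = (X[:, None, :] - X[None, :, :]).std(axis=2).compute()
```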
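For option 7, capping the pool's worker count bounds how many of these large allocations can be in flight at once. A minimal sketch, assuming a standard Pool setup like the one implied by the traceback (tasks is a hypothetical argument list; main_function is the worker named in the traceback):

```python
from multiprocessing import Pool

if __name__ == "__main__":
    # Two workers instead of os.cpu_count(): each worker's peak memory
    # now only has to fit alongside one other worker's.
    with Pool(processes=2) as pool:
        results = pool.map(main_function, tasks)
```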

Dealing with out-of-memory errors often requires a comprehensive understanding of your program’s memory usage, including which parts are using the most memory and how to effectively manage memory allocation. This might require a more detailed review of the code.

How can I reduce or downsample the dataset
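The original post leaves this question open. One minimal sketch, assuming df_10min (the DataFrame named in the traceback) holds 10-minute records on a DatetimeIndex:

```python
import pandas as pd

# Assumes df_10min has a DatetimeIndex of 10-minute records; hourly means
# cut n by 6x and the (n, n) similarity matrix by 36x.
df_hourly = df_10min.resample("1h").mean()

# Alternatively, keep every 6th row without aggregating:
df_small = df_10min.iloc[::6]
```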
