CUDA out of memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 31.75 GiB total capacity; 24.79 GiB already allocated; 20.94 MiB free; 26.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
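
Before changing the training setup, the allocator hint in the error message itself can be tried. A minimal sketch, assuming the variable is set before the first CUDA allocation; the 128 MiB split size is just an example value, not a recommendation:

```python
import os

# Must run before torch allocates any CUDA memory, e.g. at the very top of the
# training script. Prevents the caching allocator from splitting blocks larger
# than 128 MiB, which can reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```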

There are a few possible ways to fix the out of memory error, depending on the available resources and the desired trade-offs. Here are some suggestions:

  • Reduce the batch size: This is the simplest way to reduce memory consumption, but it may also affect model performance and convergence speed. To reduce the batch size, change the value of per_device_train_batch_size in the TrainingArguments, for example to 2 or 1 instead of 4 (a combined example of this and the next two options is shown after this list).
  • Use gradient accumulation: This technique allows training with a larger effective batch size without increasing the memory usage per step. It works by accumulating gradients over several steps and updating the model parameters only after that many steps. To use gradient accumulation, set the value of gradient_accumulation_steps in the TrainingArguments. For example, if the original batch size is 4 and gradient_accumulation_steps is 2, the effective batch size is 8, but the memory usage per step is the same as with batch size 4. Note that the learning rate may need to be adjusted accordingly.
  • Use mixed precision training: This technique uses lower precision (e.g. 16-bit) for some operations and tensors, which can reduce memory usage and speed up training. However, it may also introduce some numerical instability and require careful tuning. To enable it, set the value of fp16 to True in the TrainingArguments, which uses the native PyTorch AMP (https://pytorch.org/docs/stable/amp.html) by default. Alternatively, install the apex library (https://github.com/NVIDIA/apex) and select it as the backend via the fp16_backend option in the TrainingArguments.
  • Use a smaller model: This is another straightforward way to reduce memory consumption, but it may also compromise model quality and generation capabilities. To use a smaller model, change the value of the model_name_or_path argument when loading the tokenizer and the model. For example, try using “microsoft/prophetnet-base-uncased” instead of “microsoft/prophetnet-large-uncased” (see the loading sketch after this list).
  • Use a different accelerator backend: This may not be feasible depending on the available hardware and software, but it may help to use a different backend for distributed or multi-GPU training. For example, try torch.distributed or Horovod instead of accelerate, or use DeepSpeed (https://github.com/microsoft/DeepSpeed), which offers memory optimization techniques such as ZeRO. To switch backends, configure the Accelerator class accordingly (https://huggingface.co/docs/accelerate/api_reference.html#accelerator); a DeepSpeed sketch is shown after this list.
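
The first three suggestions above can be combined in a single TrainingArguments configuration. A minimal sketch, assuming the Hugging Face Trainer is being used; output_dir and the numeric values are placeholders to adapt to the actual setup:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",          # placeholder output path
    per_device_train_batch_size=1,   # smaller micro-batch (down from 4)
    gradient_accumulation_steps=4,   # effective batch size = 1 * 4 = 4
    fp16=True,                       # mixed precision via native PyTorch AMP
)
```

Because the optimizer now steps once every 4 micro-batches, the learning rate and warmup schedule may need to be retuned.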
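
For the smaller-model option, only the checkpoint name passed to from_pretrained changes. A minimal sketch, assuming a ProphetNet-style seq2seq checkpoint and that the smaller model named above is available on the Hugging Face Hub:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name_or_path = "microsoft/prophetnet-base-uncased"  # smaller checkpoint (assumed to exist)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
```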
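
For the DeepSpeed route with the Hugging Face Trainer, a DeepSpeed configuration file can be passed through the deepspeed field of TrainingArguments. A minimal sketch; ds_config.json is a hypothetical path to a config that enables ZeRO optimizations:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",          # placeholder output path
    per_device_train_batch_size=1,
    fp16=True,
    deepspeed="ds_config.json",      # hypothetical DeepSpeed config (e.g. ZeRO stage 2)
)
```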
