PyTorch error "/.../Loss.cu: ... [59,0,0] Assertion input_val >= zero && input_val <= one failed."

1 Problem Description

While debugging some code today, the following error came up:

PyTorch Assertion:
/opt/conda/conda-bld/pytorch_1640811803361/work/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [123,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1640811803361/work/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [123,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed.

/opt/conda/conda-bld/pytorch_1640811803361/work/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [533,0,0], thread: [59,0,0] Assertion input_val >= zero && input_val <= one failed.
Traceback (most recent call last):
  File "xxx.py", line 268, in train_stages
    assert reponse_mask.any()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
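
The log's own suggestion is the quickest way to localize the problem: with CUDA_LAUNCH_BLOCKING=1, kernel launches become synchronous, so the Python traceback points at the line that actually launched the failing kernel. A minimal sketch (set the variable before importing torch, or export it in the shell, e.g. CUDA_LAUNCH_BLOCKING=1 python xxx.py):

# Make CUDA kernel launches synchronous so the device-side assert surfaces
# at the real failing line instead of at a later, unrelated API call.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the variable so it takes effect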

2 Solution

The input to BCELoss must lie in the interval [0, 1]; this is exactly what the assertion input_val >= zero && input_val <= one in Loss.cu checks, so any prediction outside that range triggers the device-side assert. Map or clamp the model outputs into [0, 1] before computing the loss, as in the sketch below.
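
A minimal sketch of keeping the predictions inside [0, 1] before calling the loss (tensor names and shapes are illustrative, not taken from the original training code):

import torch
import torch.nn as nn

criterion = nn.BCELoss()

logits = torch.randn(4)                      # raw model outputs, may fall outside [0, 1]
targets = torch.randint(0, 2, (4,)).float()  # 0/1 labels

probs = torch.sigmoid(logits)                # squashes raw outputs into (0, 1)
# or, if the outputs are already meant to be probabilities:
# probs = logits.clamp(min=0.0, max=1.0)

loss = criterion(probs, targets)

Alternatively, nn.BCEWithLogitsLoss can be applied directly to the raw outputs, which sidesteps the range requirement altogether.
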
If the error still appears after using clamp(), the input tensor may contain NaN values: clamp() does not remove NaN, and NaN fails the >= 0 comparison, so the assert still fires. Check the tensor with tensor.isnan().any(), as shown below.
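
A short check along those lines (the tensor x is a made-up example; torch.nan_to_num is shown only as one common way to neutralize the NaNs once found):

import torch

x = torch.tensor([0.2, float("nan"), 0.7])

if x.isnan().any():
    # report where the NaNs are, then replace them before computing the loss
    print("NaN at indices:", x.isnan().nonzero(as_tuple=True))
    x = torch.nan_to_num(x, nan=0.0)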
