Pitfalls encountered while building a new model. Model description:
One intermediate feature map is produced by concatenating several parts that are not normalized individually, and one of its values keeps coming out as inf. Inspecting the features gives:
- (tensor([[[3.8176e+01, 1.6472e+04, 2.9112e-01, 2.1578e-01, inf,
2.4098e-02, 2.4098e-02, 2.9844e-03, 3.0047e-03, 6.4136e-04]]],
dtype=torch.float64, grad_fn=<CatBackward0>),)
Feature normalization:
def data_normal(org_data):
    # min-max normalization: shift negative data up to zero, then scale to [0, 1]
    d_min = org_data.min()
    if d_min < 0:
        org_data += torch.abs(d_min)
        d_min = org_data.min()
    d_max = org_data.max()
    dst = d_max - d_min
    norm_data = (org_data - d_min).true_divide(dst)
    return norm_data
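Since the concatenated feature map mixes parts on very different scales (e.g. 1.6e+04 next to 6.4e-04), one option is to run data_normal on each part before torch.cat. A minimal sketch with made-up part tensors (feat_a and feat_b are hypothetical names, not the model's real branches):

import torch

# hypothetical parts of the concatenated feature map, on very different scales
feat_a = torch.rand(1, 1, 5, dtype=torch.float64)
feat_b = torch.rand(1, 1, 5, dtype=torch.float64) * 1.0e4

# normalize each part to [0, 1] separately, then concatenate along the feature axis
features = torch.cat([data_normal(feat_a), data_normal(feat_b)], dim=-1)
print(features)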
Occasional NaN outputs may simply mean the input data itself contains NaN values; this needs to be checked, e.g. with pandas:
after_total_nan_values = data.isnull().sum().sum()
print('after fillna, total_nan_values is: ', after_total_nan_values)
Reference: https://blog.csdn.net/qq_35056292/article/details/116695219
Change org_data += torch.abs(d_min) in Problem 1 to org_data = org_data + torch.abs(d_min). The in-place += modifies the caller's tensor, and on a tensor involved in gradient computation an in-place update can break autograd, whereas the out-of-place addition creates a new tensor.
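A toy example of why the out-of-place version matters (a standalone tensor, not the model's real feature map):

import torch

x = torch.tensor([-1.0, 2.0, 3.0], requires_grad=True)
# x += 1 would raise: RuntimeError: a leaf Variable that requires grad is being used in an in-place operation
x = x + 1  # out-of-place addition builds a new node in the autograd graph instead

Even when autograd does not complain, the in-place version still silently modifies the original tensor passed in, which is easy to miss.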
Common cause: if NaN appears within the first 100 iterations, it is usually because the learning rate is too high. Keep lowering the learning rate until NaN no longer appears; going 1 to 10 times below the current value is usually enough. (This did not resolve it here.)
Self-check: whenever NaN suddenly appeared, the following error was reported:
[ -3000 -12100 64000 -46900 -1800 -12100 15600 -11300 -4300 18600
-9800]
D:\PythonProject\ECG-Bianrized-main\ECG-Bianrized-main\src\models\KEModel1022.py:36: RuntimeWarning: invalid value encountered in sqrt
rmssd = np.sqrt(np.mean(diff_nni ** 2))
The data is probably in milliseconds, so after the various squaring operations the values become too large and go out of range, leaving a negative number before the square root; switching the unit to seconds fixed it (resolved).
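A sketch of the fix, assuming the NN intervals arrive as an integer array of milliseconds (rmssd_from_nni is a hypothetical helper name): cast to float and convert to seconds before any squaring, so the intermediate values can no longer overflow an integer dtype and turn negative.

import numpy as np

def rmssd_from_nni(nn_intervals_ms):
    # cast to float64 and convert ms -> s before squaring
    nni_s = np.asarray(nn_intervals_ms, dtype=np.float64) / 1000.0
    diff_nni = np.diff(nni_s)
    return np.sqrt(np.mean(diff_nni ** 2))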
torch.Size([1, 3, 216, 1])
The sample entropy feature could not be computed: ValueError: cannot embed data of length 2 with embedding dimension 3 and lag 1, minimum required length is 3
Sample entropy of the time series is computed with nolds.sampen, called as sampen = nolds.sampen(nn_intervals, emb_dim=2); the signature is
nolds.sampen(data, emb_dim=2, tolerance=None, dist=<function rowwise_chebyshev>, closed=False, debug_plot=False, debug_data=False, plot_file=None)
After reading up on sample entropy I changed emb_dim to 1, but that did not solve it either, so the sample entropy feature is dropped for now.
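If the feature is added back later, a small guard would avoid the ValueError on sequences too short to embed (safe_sampen is a hypothetical wrapper, and returning NaN for short sequences is an assumption about how missing features are handled downstream):

import numpy as np
import nolds

def safe_sampen(nn_intervals, emb_dim=2):
    # sampen embeds the series with dimension emb_dim + 1, so it needs at least emb_dim + 1 points
    if len(nn_intervals) < emb_dim + 1:
        return np.nan
    return nolds.sampen(nn_intervals, emb_dim=emb_dim)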
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
Cause: after slicing the raw data, a few individual tensors end up with shape ([]), so they have to be removed when the data is first loaded.
Solution: first list the indices of the bad samples with a small script (a sketch for doing this automatically follows the snippet below); since rows cannot be deleted from a tensor directly, convert to numpy, delete the rows, then convert back. The resulting list of indices is often off by one or two positions; with a small dataset a few rounds of trial and error gets it right.
# convert to numpy, since rows cannot be deleted from a tensor in place
self.x_train_temp = x_train.numpy()
self.y_train_temp = y_train.numpy()
# indices of the samples with empty shape, found beforehand
del_list = [312, 1770, 2766]
self.x_train_temp = np.delete(self.x_train_temp, del_list, axis=0)
self.y_train_temp = np.delete(self.y_train_temp, del_list, axis=0)
# convert back to tensors
self.x_train = torch.Tensor(self.x_train_temp)
self.y_train = torch.Tensor(self.y_train_temp)
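A sketch of collecting del_list automatically instead of by trial and error, assuming the raw samples are still available as a list of tensors before stacking (raw_samples is a hypothetical name):

# indices of samples whose tensor is empty, e.g. shape torch.Size([])
del_list = [i for i, sample in enumerate(raw_samples)
            if sample.dim() == 0 or sample.numel() == 0]
print('indices to delete:', del_list)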
RuntimeError: expected scalar type Long but found Float
In Problem 5, converting the dataloader data to numpy and back to a tensor leaves the wrong dtype; add .type(torch.long).
Note that something like v = torch.tensor([0], dtype=torch.float) cannot be used here; it still raises the error.
self.x_train = torch.Tensor(self.x_train_temp).type(torch.long)
self.y_train = torch.Tensor(self.y_train_temp).type(torch.long)
tensor(1.6826, grad_fn=<NllLossBackward0>)
tensor(1.3434, grad_fn=<NllLossBackward0>)
D:\anaconda\envs\py_cpu\lib\site-packages\numpy\core\_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
D:\anaconda\envs\py_cpu\lib\site-packages\numpy\core\_methods.py:223: RuntimeWarning: invalid value encountered in divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
D:\anaconda\envs\py_cpu\lib\site-packages\numpy\core\_methods.py:257: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
D:\anaconda\envs\py_cpu\lib\site-packages\numpy\core\fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
D:\anaconda\envs\py_cpu\lib\site-packages\numpy\core\_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
tensor(nan, grad_fn=<NllLossBackward0>)
After this warning the loss always became NaN. Suspecting a sample containing NaN values, I deleted that sample, which worked, using the same method as in Problem 6.
Problem: to drive the loss down, the model classifies every sample as the majority class N.
Solution: rebalance the training batches with the imbalanced-data sampler (imbalanced.py). With a self-built dataset this raises
'Train_DataSet' object has no attribute 'get_label'
which means imbalanced.py has to be adapted to the custom dataset class at the line that raises the error. For example, my change was:
elif isinstance(dataset, torch.utils.data.Dataset):
    # return dataset.get_labels()
    return dataset.y_train[:]
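For context, a usage sketch, assuming imbalanced.py is the ImbalancedDatasetSampler from the torchsampler package and Train_DataSet is the custom dataset class from above; the batch size is made up:

from torch.utils.data import DataLoader
from torchsampler import ImbalancedDatasetSampler  # defined in imbalanced.py

train_dataset = Train_DataSet(x_train, y_train)

# the sampler reads the labels (here via the patched return dataset.y_train[:])
# and oversamples minority classes so batches are roughly class-balanced
train_loader = DataLoader(
    train_dataset,
    sampler=ImbalancedDatasetSampler(train_dataset),
    batch_size=32,
)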