Paper: TRAINING DEEP NEURAL-NETWORKS USING A NOISE ADAPTATION LAYER, Jacob Goldberger & Ehud Ben-Reuven, Engineering Faculty, Bar-Ilan University
Abstract: Inaccurate labels can severely degrade a classifier's performance. When the observed labels are noisy, the correct label can be treated as a latent variable and the noise process as a communication channel with unknown parameters. In earlier work the authors used the EM algorithm to estimate these unknown parameters and thereby recover the correct labels. In this paper they instead add an extra softmax layer to the neural network to model the likelihood function that EM would otherwise optimize.
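To make the channel picture concrete, here is a toy numpy illustration (the numbers are invented, not from the paper): the base classifier outputs p(y=i|x) over the true label, the channel matrix approximates theta[i, j] = p(z=j|y=i), and their product is the distribution of the observed noisy label z.

import numpy as np

# Toy example: 3 classes, an 80%-diagonal noise channel (invented numbers)
p_true = np.array([0.7, 0.2, 0.1])    # p(y = i | x) from the classifier
theta = np.array([[0.8, 0.1, 0.1],    # theta[i, j] = p(z = j | y = i);
                  [0.1, 0.8, 0.1],    # each row sums to 1
                  [0.1, 0.1, 0.8]])
p_noisy = p_true @ theta              # p(z = j | x) = sum_i p_true[i] * theta[i, j]
print(p_noisy)                        # [0.59 0.24 0.17]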
import numpy as np

# fixed permutation used to corrupt labels: a corrupted sample with label i
# is flipped to perm[i] (e.g. 0 -> 7, 1 -> 9, 2 -> 0, ...)
perm = np.array([7, 9, 0, 4, 2, 1, 3, 5, 6, 8])
My notes: this treatment does not seem very realistic. In practice, when a sample whose true label is 7 gets corrupted (misread), it could be mislabeled as 9, 1, and so on with different probabilities, whereas the flip matrix here is just an arbitrary fixed permutation.
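For context, a sketch of how such a permutation is typically used to inject label noise (the sampling scheme, the noise_level value, and the assumption that y_train holds the clean integer labels are mine, not taken from the notebook); it produces the y_train_noise used further below:

noise_level = 0.46                              # assumed fraction of corrupted labels
flip = np.random.rand(len(y_train)) < noise_level
y_train_noise = y_train.copy()
y_train_noise[flip] = perm[y_train[flip]]       # flipped labels follow perm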
The network itself is just two simple fully connected layers.
nhiddens = [500, 300]   # sizes of the two hidden layers
DROPOUT = 0.5           # dropout rate after each hidden layer
opt = 'adam'            # optimizer
batch_size = 256
patience = 4            # early-stopping patience (epochs)
epochs = 40             # maximum number of epochs to train
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

# stack of fully connected hidden layers, each followed by ReLU and dropout;
# only the first layer needs an explicit input_shape
hidden_layers = Sequential(name='hidden')
for i, nhidden in enumerate(nhiddens):
    if i == 0:
        hidden_layers.add(Dense(nhidden, input_shape=(img_size,)))
    else:
        hidden_layers.add(Dense(nhidden))
    hidden_layers.add(Activation('relu'))
    hidden_layers.add(Dropout(DROPOUT))
from keras.layers import Input
from keras.models import Model

# functional wrapper: input -> hidden stack -> softmax over the class labels
train_inputs = Input(shape=(img_size,))
last_hidden = hidden_layers(train_inputs)
baseline_output = Dense(nb_classes, activation='softmax',
                        name='baseline')(last_hidden)

model = Model(inputs=train_inputs, outputs=baseline_output)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
# baseline model performance evaluation before training
def eval(model, y_test=y_test):
    return dict(zip(model.metrics_names,
                    model.evaluate(X_test, y_test, verbose=False)))

print(eval(model))
# ### baseline training
from keras.callbacks import EarlyStopping

train_res = model.fit(X_train_train,
                      y_train_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=verbose,
                      validation_data=(X_train_val, y_train_val),
                      callbacks=[EarlyStopping(patience=patience,
                                               mode='min',
                                               verbose=verbose)])
# ### baseline performance
print(eval(model))

# build confusion matrix indexed as [prediction, noisy_label]
ybaseline_predict = model.predict(X_train, batch_size=batch_size)
ybaseline_predict = np.argmax(ybaseline_predict, axis=-1)
baseline_confusion = np.zeros((nb_classes, nb_classes))
for n, p in zip(y_train_noise, ybaseline_predict):
    baseline_confusion[p, n] += 1.
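For reference, the same counts can be accumulated without the Python loop, assuming both label arrays are integer vectors:

np.add.at(baseline_confusion, (ybaseline_predict, y_train_noise), 1.0)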
import matplotlib.pyplot as plt
plt.pcolor(baseline_confusion)
plt.ylabel('baseline prediction')
plt.xlabel('noisy label')
plt.title('baseline confusion matrix');
# row-normalize the confusion counts to estimate the channel distribution
channel_weights = baseline_confusion.copy()
channel_weights /= channel_weights.sum(axis=1, keepdims=True)
# channel_weights[prediction, noisy_label] = log(P(noisy_label|prediction))
channel_weights = np.log(channel_weights + 1e-8)
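These log-probabilities serve as the initialization of the noise-adaptation ("channel") layer stacked on top of the baseline softmax. Below is a minimal sketch of such a layer for the paper's simple model; the class name Channel and this exact formulation are assumptions, and the original notebook's implementation may differ.

from keras import backend as K

# Sketch of the simple channel layer (assumed implementation): the kernel
# holds unnormalized log-probabilities; a row-wise softmax turns it into a
# stochastic matrix whose entry [i, j] models p(noisy label j | true label i),
# and the incoming baseline probabilities are multiplied by that matrix.
class Channel(Dense):
    def call(self, x):
        channel_matrix = K.softmax(self.kernel)  # each row sums to 1
        return K.dot(x, channel_matrix)

channeled_output = Channel(nb_classes, use_bias=False, name='channel',
                           weights=[channel_weights])(baseline_output)
channeled_model = Model(inputs=train_inputs, outputs=channeled_output)
channeled_model.compile(loss='sparse_categorical_crossentropy',
                        optimizer=opt,
                        metrics=['accuracy'])

Training this composite model on the noisy labels lets gradient descent adjust both the base network and the channel matrix jointly, which is the paper's alternative to explicit EM iterations.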