In TensorFlow and Keras, multi-class classification can use either sparse_categorical_crossentropy or categorical_crossentropy. The difference lies in the format of the training labels: sparse_categorical_crossentropy expects integer class indices, while categorical_crossentropy expects one-hot encoded vectors.
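The snippet below is a minimal sketch of the two label formats; the toy labels and predicted distributions are made up purely for illustration.
import tensorflow as tf

# sparse_categorical_crossentropy: labels are integer class indices.
sparse_labels = [1, 2]
# categorical_crossentropy: labels are one-hot vectors for the same classes.
onehot_labels = [[0., 1., 0.],
                 [0., 0., 1.]]

# Both losses consume the same predicted probability distributions.
y_pred = [[0.05, 0.90, 0.05],
          [0.10, 0.20, 0.70]]

scce = tf.keras.losses.SparseCategoricalCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()

tf.print(scce(sparse_labels, y_pred))  # ≈ 0.231
tf.print(cce(onehot_labels, y_pred))   # ≈ 0.231, same loss, different label format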
In addition, the official TensorFlow source lists the relevant arguments:
Args:
from_logits: Whether `y_pred` is expected to be a logits tensor. By default,
we assume that `y_pred` encodes a probability distribution.
Note: Using from_logits=True may be more numerically stable.
reduction: (Optional) Type of `tf.keras.losses.Reduction` to apply to loss.
Default value is `AUTO`. `AUTO` indicates that the reduction option will
be determined by the usage context. For almost all cases this defaults to
`SUM_OVER_BATCH_SIZE`.
When used with `tf.distribute.Strategy`, outside of built-in training
loops such as `tf.keras` `compile` and `fit`, using `AUTO` or
`SUM_OVER_BATCH_SIZE` will raise an error. Please see
https://www.tensorflow.org/alpha/tutorials/distribute/training_loops
for more details on this.
name: Optional name for the op.
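Among these arguments, reduction controls how the per-example losses are collapsed into a single value. Below is a minimal sketch of the difference between SUM_OVER_BATCH_SIZE and SUM, again with made-up labels and predictions.
import tensorflow as tf

y_true = [1, 2]
y_pred = [[0.05, 0.90, 0.05],
          [0.10, 0.20, 0.70]]

# SUM_OVER_BATCH_SIZE (the usual default): mean of the per-example losses.
loss_mean = tf.keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)
# SUM: per-example losses are added instead of averaged.
loss_sum = tf.keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.SUM)

tf.print(loss_mean(y_true, y_pred))  # ≈ 0.231  ((-ln 0.9 - ln 0.7) / 2)
tf.print(loss_sum(y_true, y_pred))   # ≈ 0.462  (-ln 0.9 - ln 0.7)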
from_logits describes the form of y_pred: if set to True, y_pred is treated as an un-normalized logits output; the default is False, meaning the predictions have already been normalized by Softmax. The official documentation notes that setting it to True is more numerically stable. In the code below I compare the two settings of this argument.
import tensorflow as tf
from tensorflow import keras
import numpy as np

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1].
train_images = train_images / 255.0
test_images = test_images / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)  # no activation: the model outputs raw logits
])

# The model outputs logits, so from_logits=True matches the output.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=10)

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
tf.print('\nTest accuracy:', test_acc)
With from_logits set to True in this initial model, the test accuracy is 0.8812; changing the argument to False (while the final Dense layer still outputs raw logits) drops the accuracy to 0.332.
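The gap comes from how the loss interprets the network's output. Here is a minimal sketch of the mismatch, with made-up numbers:
import tensorflow as tf

y_true = [2]                 # a single sparse label
logits = [[-1.0, 0.5, 3.0]]  # raw, un-normalized network output

# Correct: tell the loss that the input is logits.
loss_from_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
tf.print(loss_from_logits(y_true, logits))  # ≈ 0.10, the true cross-entropy

# Incorrect: from_logits=False treats the raw output as if it were already
# a probability distribution, so the value is not the true cross-entropy.
loss_from_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
tf.print(loss_from_probs(y_true, logits))

# Also correct: normalize with softmax first, then use from_logits=False.
tf.print(loss_from_probs(y_true, tf.nn.softmax(logits)))  # ≈ 0.10 again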
# Attach a Softmax layer so the model outputs probabilities instead of logits.
probability_model = tf.keras.Sequential([model,
                                         tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
tf.print(predictions[0])
# output
array([6.3350608e-06, 1.9365751e-09, 3.4766887e-09, 1.3867620e-08,
1.5902913e-08, 1.4535291e-03, 2.8523655e-06, 1.9025985e-02,
6.8801391e-08, 9.7951114e-01], dtype=float32)
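To turn this probability vector into a class prediction, take the index of the largest entry; the class names below follow the standard Fashion-MNIST label order.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

predicted_class = np.argmax(predictions[0])           # index of the largest probability
print(predicted_class, class_names[predicted_class])  # 9 Ankle boot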
For comparison, without the Softmax layer:
probability_model = tf.keras.Sequential([model])  # no Softmax: raw logits
predictionLogit = probability_model.predict(test_images)
tf.print(predictionLogit[0])
# output
array([ -6.288512 , -14.381447 , -13.796287 , -12.41281 , -12.275866 ,
-0.8528625, -7.086463 , 1.7189487, -10.811144 , 5.660197 ],
dtype=float32)
# Re-attach the Softmax layer to classify a single image with probabilities.
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

img = test_images[1]          # shape (28, 28)
img = np.expand_dims(img, 0)  # shape (1, 28, 28): predict() expects a batch
predictions_single = probability_model.predict(img)
print(predictions_single)
[[1.1843847e-05 2.8502357e-11 9.9778062e-01 3.2734149e-10 2.0844834e-03
3.5600198e-15 1.2303848e-04 1.4568713e-08 3.6617865e-11 5.2883337e-14]]