Loss Functions

For multi-class classification, TensorFlow and Keras provide both sparse_categorical_crossentropy and categorical_crossentropy. The difference is the format of the training labels (a short sketch follows the list):
  • one-hot encoded labels: categorical_crossentropy
  • integer labels: sparse_categorical_crossentropy
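
A minimal sketch of the two label formats (the class count of 10 and the label values are made up for illustration):

import tensorflow as tf

# Integer labels: one class index per sample
labels_int = tf.constant([3, 0, 9])
# One-hot labels: each row is a length-10 indicator vector
labels_onehot = tf.one_hot(labels_int, depth=10)

# Fake "probability" predictions with matching shape (3, 10)
preds = tf.random.uniform((3, 10))
preds = preds / tf.reduce_sum(preds, axis=1, keepdims=True)

# Integer labels pair with the sparse variant...
sparse_loss = tf.keras.losses.sparse_categorical_crossentropy(labels_int, preds)
# ...one-hot labels pair with the dense variant; the values are identical
dense_loss = tf.keras.losses.categorical_crossentropy(labels_onehot, preds)
tf.print(sparse_loss, dense_loss)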

The official TensorFlow source also documents the relevant arguments:

 Args:
    from_logits: Whether `y_pred` is expected to be a logits tensor. By default,
      we assume that `y_pred` encodes a probability distribution.
      Note: Using from_logits=True may be more numerically stable.
    reduction: (Optional) Type of `tf.keras.losses.Reduction` to apply to loss.
      Default value is `AUTO`. `AUTO` indicates that the reduction option will
      be determined by the usage context. For almost all cases this defaults to
      `SUM_OVER_BATCH_SIZE`.
      When used with `tf.distribute.Strategy`, outside of built-in training
      loops such as `tf.keras` `compile` and `fit`, using `AUTO` or
      `SUM_OVER_BATCH_SIZE` will raise an error. Please see
      https://www.tensorflow.org/alpha/tutorials/distribute/training_loops
      for more details on this.
    name: Optional name for the op.
  • from_logits: defines the form of y_pred. If set to True, y_pred is an unnormalized log-probability (logit) output; the default False means the predictions are assumed to be Softmax-normalized probabilities. The official docs note that from_logits=True can be more numerically stable. The code later in this post compares the two settings.
  • reduction: defines how the per-sample losses are aggregated into the reported loss, e.g. averaged over the batch (SUM_OVER_BATCH_SIZE, the default behind AUTO) or returned one value per sample (NONE); see the sketch right after this list.
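
A minimal sketch of the reduction argument (y_true and y_pred are toy values):

import tensorflow as tf

y_true = [1, 2]
y_pred = [[0.05, 0.95, 0.00],   # true class 1 gets probability 0.95
          [0.10, 0.80, 0.10]]   # true class 2 gets probability 0.10

# Default AUTO -> SUM_OVER_BATCH_SIZE: a single scalar, the mean over the batch
loss_mean = tf.keras.losses.SparseCategoricalCrossentropy()
# NONE: one loss value per sample
loss_each = tf.keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE)

tf.print(loss_mean(y_true, y_pred))  # ~1.177 = (-log 0.95 - log 0.10) / 2
tf.print(loss_each(y_true, y_pred))  # [~0.051, ~2.303]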

Case Study

LOAD DATA

import tensorflow as tf
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

BUILD MODEL

train_images = train_images / 255.0  # scale pixel values to [0, 1]
test_images = test_images / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)  # no activation: the model outputs raw logits
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
tf.print('\nTest accuracy:', test_acc)

With from_logits=True, the model reaches an accuracy of 0.8812; changing the parameter to False drops accuracy to 0.332. The reason: the last Dense layer has no activation, so the model outputs raw logits, and with from_logits=False those logits are treated as if they were already a probability distribution, which makes the loss (and its gradients) largely meaningless. The sketch below reproduces the effect on a single sample.
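
A minimal sketch of the mismatch, reusing (rounded) the logit vector printed later in this post; the exact value under from_logits=False depends on Keras's internal normalization and clipping, so only its magnitude matters here:

import tensorflow as tf

# Rounded logits for one test image; class 9 is the model's top prediction
logits = tf.constant([[-6.29, -14.38, -13.80, -12.41, -12.28,
                       -0.85, -7.09, 1.72, -10.81, 5.66]])
label = tf.constant([9])

loss_true = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_false = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

tf.print(loss_true(label, logits))   # small: softmax(logits)[9] is ~0.98
tf.print(loss_false(label, logits))  # large/meaningless: logits read as probabilities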

Prediction

probability_model = tf.keras.Sequential([model,
                                         tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
tf.print(predictions[0])
# Output:
array([6.3350608e-06, 1.9365751e-09, 3.4766887e-09, 1.3867620e-08,
       1.5902913e-08, 1.4535291e-03, 2.8523655e-06, 1.9025985e-02,
       6.8801391e-08, 9.7951114e-01], dtype=float32)
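
Each entry is the model's confidence in one of the 10 classes; the predicted class is the index of the largest entry:

import numpy as np
print(np.argmax(predictions[0]))  # 9, i.e. "Ankle boot" in Fashion-MNIST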

For comparison, the same prediction without the Softmax layer:

logit_model = tf.keras.Sequential([model])  # no Softmax: outputs raw logits
predictionLogit = logit_model.predict(test_images)
tf.print(predictionLogit[0])

# Output:
array([ -6.288512 , -14.381447 , -13.796287 , -12.41281  , -12.275866 ,
        -0.8528625,  -7.086463 ,   1.7189487, -10.811144 ,   5.660197 ],
      dtype=float32)
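
Applying softmax to these logits recovers the probability distribution printed above (roughly 0.9795 at index 9):

import tensorflow as tf
tf.print(tf.nn.softmax(predictionLogit[0]))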

Using The Trained Model

import numpy as np

img = test_images[1]
# add a batch dimension: shape (28, 28) -> (1, 28, 28)
img = np.expand_dims(img, 0)
predictions_single = probability_model.predict(img)
print(predictions_single)
# Output:
[[1.1843847e-05 2.8502357e-11 9.9778062e-01 3.2734149e-10 2.0844834e-03
  3.5600198e-15 1.2303848e-04 1.4568713e-08 3.6617865e-11 5.2883337e-14]]
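
The largest entry sits at index 2, so the predicted class for this image is 2 ("Pullover" in Fashion-MNIST):

import numpy as np
print(np.argmax(predictions_single[0]))  # 2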
