sigmoid cross entropy loss

1. Cross Entropy Error
The mathematics behind cross entropy (CE) error and its relationship to NN training are very complex, but, fortunately, the results are remarkably simple to understand and implement. CE is best explained by example. Suppose you have just three training items with the following computed outputs and target outputs:
Figure 1: computed outputs and target outputs for the three training items

    computed outputs     target outputs
    0.1   0.3   0.6      0   0   1
    0.2   0.6   0.2      0   1   0
    0.3   0.4   0.3      1   0   0

In the CE demo, a 4-7-3 NN (4 input, 7 hidden, and 3 output nodes) is instantiated and then trained using the back-propagation algorithm in conjunction with cross entropy error. After training completes, the NN model correctly predicts the species of 29 of the 30 test items (0.9667 accuracy).

Using a winner-takes-all evaluation technique, the NN predicts the first two data items correctly because the positions of the largest computed outputs match the positions of the 1 values in the target outputs, but the NN is incorrect on the third data item. The mean (average) squared error for this data is the sum of the squared errors divided by three. The squared error for the first item is (0.1 - 0)^2 + (0.3 - 0)^2 + (0.6 - 1)^2 = 0.01 + 0.09 + 0.16 = 0.26. Similarly, the squared error for the second item is 0.04 + 0.16 + 0.04 = 0.24, and the squared error for the third item is 0.49 + 0.16 + 0.09 = 0.74. So the mean squared error is (0.26 + 0.24 + 0.74) / 3 = 0.41.
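
To make that arithmetic concrete, here is a minimal NumPy sketch that reproduces the three squared errors and the mean squared error above (the array names are just for illustration):

```python
import numpy as np

# Computed outputs and one-hot targets for the three training items above.
outputs = np.array([[0.1, 0.3, 0.6],
                    [0.2, 0.6, 0.2],
                    [0.3, 0.4, 0.3]])
targets = np.array([[0, 0, 1],
                    [0, 1, 0],
                    [1, 0, 0]])

# Squared error per item: sum over the three output nodes.
sq_err = ((outputs - targets) ** 2).sum(axis=1)
print(sq_err)                     # [0.26 0.24 0.74]
print(round(sq_err.mean(), 2))    # 0.41 (0.4133... rounded)
```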

Notice that in some sense the NN predicted the first two items with identical accuracy, because for both of those items the computed output that corresponds to a target output of 1 is 0.6. But observe that the squared errors for the first two items are different (0.26 and 0.24), because all three outputs contribute to the sum.

The mean (average) CE error for the three items is the sum of the CE errors divided by three. The fancy way to express CE error with a function is shown in Figure 2.
Figure 2: $CE = -\sum_{j} t_j \, \ln(o_j)$, where $o_j$ is the j-th computed output and $t_j$ is the corresponding target output.

In words this means, “Add up the product of the log to the base e of each computed output times its corresponding target output, and then take the negative of that sum.” So for the three items above, the CE of the first item is - (ln(0.1)*0 + ln(0.3)*0 + ln(0.6)*1) = - (0 + 0 -0.51) = 0.51. The CE of the second item is - (ln(0.2)*0 + ln(0.6)*1 + ln(0.2)*0) = - (0 -0.51 + 0) = 0.51. The CE of the third item is - (ln(0.3)*1 + ln(0.4)*0 + ln(0.3)*0) = - (-1.2 + 0 + 0) = 1.20. So the mean cross entropy error for the three-item data set is (0.51 + 0.51 + 1.20) / 3 = 0.74.
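
The same three items can be checked in a few lines; this is a small sketch rather than library code, and the natural log here is the ln in the formula above:

```python
import numpy as np

outputs = np.array([[0.1, 0.3, 0.6],
                    [0.2, 0.6, 0.2],
                    [0.3, 0.4, 0.3]])
targets = np.array([[0, 0, 1],
                    [0, 1, 0],
                    [1, 0, 0]])

# Cross entropy per item: -sum_j( t_j * ln(o_j) ).
ce = -(targets * np.log(outputs)).sum(axis=1)
print(np.round(ce, 2))       # [0.51 0.51 1.2 ]
print(round(ce.mean(), 2))   # 0.74
```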

Notice that when computing mean cross entropy error with neural networks in situations where the target outputs consist of a single 1 with the remaining values equal to 0, all the terms in the sum except one (the term with a 1 target) vanish because of the multiplication by the 0s. Put another way, cross entropy essentially ignores all computed outputs that don't correspond to a 1 target output. The idea is that when computing error during training, you really don't care how far off the outputs associated with non-1 targets are; you're only concerned with how close the single computed output that corresponds to the target value of 1 is to that value of 1. So, for the three items above, the CEs for the first two items, which in a sense were predicted with equal accuracy, are both 0.51.
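
Because of those multiplications by 0, the per-item CE collapses to minus the log of the single output sitting at the 1 position. The short sketch below (same data, with the 1 positions written out by hand) makes that explicit:

```python
import numpy as np

# Same three items; only the output at each item's 1-target position matters.
outputs = np.array([[0.1, 0.3, 0.6],
                    [0.2, 0.6, 0.2],
                    [0.3, 0.4, 0.3]])
one_pos = [2, 1, 0]  # index of the 1 in each target vector
ce = -np.log(outputs[np.arange(3), one_pos])
print(np.round(ce, 2))  # [0.51 0.51 1.2 ] -- identical to the full sum
```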

  • Relating this to what Ng covers for LR (logistic regression) in his ML course, the LR loss Ng describes is in fact the sigmoid cross entropy loss (note the Notice paragraph above). Of course, sigmoid cross entropy loss is not only used in problems like that; it can also be applied to multi-label learning (the concept is quoted below, with a small sketch after this list).
  • The difference between multi-label learning and traditional single-label learning:
    Traditional single-label classification is concerned with learning from a set of examples that are associated with a single label l from a set of disjoint labels L, |L| > 1. In multi-label classification, the examples are associated with a set of labels Y in L. In the past, multi-label classification was mainly motivated by the tasks of text categorization and medical diagnosis. Nowadays, we notice that multilabel classification methods are increasingly required by modern applications, such as protein function classification, music categorization and semantic scene classification.
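
For the multi-label case, sigmoid cross entropy simply applies the binary cross entropy -[t·ln(p) + (1-t)·ln(1-p)] to every label independently and sums the per-label losses. A hedged sketch (the probabilities and label vector below are invented purely for illustration):

```python
import numpy as np

# Hypothetical example: one item, four independent labels.
probs  = np.array([0.9, 0.2, 0.7, 0.1])   # sigmoid outputs, one per label
labels = np.array([1,   0,   1,   0  ])   # multi-label target (not one-hot)

# Binary cross entropy applied per label, then summed.
per_label = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print(per_label.sum())   # total sigmoid cross entropy loss for this item
```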

2. Computing the SigmoidCrossEntropyLoss layer in Caffe
Reference: caffecn
Figure 3: the SigmoidCrossEntropyLoss computation in Caffe (from caffecn)
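
In essence, this layer computes binary cross entropy on the raw scores (logits) before the sigmoid, rearranged into a numerically stable form. Below is a NumPy paraphrase of that stable formula, written as a sketch rather than a transcription of Caffe's actual C++ source; the logits and targets are made up:

```python
import numpy as np

def sigmoid_cross_entropy(x, t):
    """Stable per-element -[t*ln(sigmoid(x)) + (1-t)*ln(1-sigmoid(x))].

    Algebraically equal to max(x, 0) - x*t + ln(1 + exp(-|x|)), which
    avoids overflow in exp() for large |x|.
    """
    return np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))

# Hypothetical logits and targets for one item with three outputs.
x = np.array([2.0, -1.0, 0.5])
t = np.array([1.0,  0.0, 1.0])
print(sigmoid_cross_entropy(x, t).sum())  # total loss for the item
```

Working on the logits rather than on already-sigmoided probabilities is what makes this stable rearrangement possible, and it is why the loss layer takes the pre-sigmoid scores as input.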
