Updated: 2017.5.10
Removed most of the redundant content, kept the important parts, and updated the link to the official documentation.
This post covers the functions (and classes) under the tf.nn module, which provides the basic "building blocks" of neural networks. It is one of the most important modules.
Official documentation: Neural Network
For a theoretical summary of common activation functions, see my earlier post: 深度学习笔记六:常见激活函数总结 (Deep Learning Notes 6: A Summary of Common Activation Functions)
The activation ops provide different kinds of nonlinearities for use in neural networks. These include smooth nonlinearities (sigmoid, tanh, elu, softplus, and softsign), continuous but not everywhere differentiable functions (relu, relu6, crelu and relu_x), and stochastic regularization (dropout).
All activation ops apply element-wise and produce a tensor with the same shape and dtype as the input tensor.
The activation functions TensorFlow provides are listed below. I won't go over the theory here (see the link above), and they are all very simple to use, so only relu is shown as an example; the others work much the same way. See the documentation for details.
tf.nn.relu(features, name=None)
Purpose:
Computes the rectified linear unit (very commonly used): max(features, 0). Returns a tensor with the same shape as features.
Args:
features: A Tensor. Must be one of the following types: float32, float64, int32, int64, uint8, int16, int8, uint16, half.
name: A name for the operation (optional).
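A minimal usage sketch (assuming TensorFlow 1.x in graph mode; the input values are made up for illustration):

```python
import tensorflow as tf

# Element-wise ReLU: negative entries become 0, non-negative entries pass through.
x = tf.constant([[-2.0, -0.5, 0.0, 1.5, 3.0]])
y = tf.nn.relu(x)  # same shape and dtype as x

with tf.Session() as sess:
    print(sess.run(y))  # [[0.  0.  0.  1.5 3. ]]
```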
The full list:
tf.nn.relu
tf.nn.relu6
tf.nn.crelu
tf.nn.elu
tf.nn.softplus
tf.nn.softsign
tf.nn.dropout
tf.nn.bias_add
tf.sigmoid
tf.tanh
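Of these, dropout is the one op that is not a plain element-wise nonlinearity: it also takes a keep probability. A small sketch, assuming TensorFlow 1.x where the argument is named keep_prob:

```python
import tensorflow as tf

x = tf.ones([2, 4])
# Each element is kept with probability keep_prob and scaled by 1/keep_prob
# (so the expected sum stays the same); dropped elements become 0.
dropped = tf.nn.dropout(x, keep_prob=0.5)

with tf.Session() as sess:
    print(sess.run(dropped))  # roughly half the entries are 0, the rest are 2.0
```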
For classification, tf.nn implements a number of cross-entropy functions and the like; if your neural network has a classification task, they are typically applied at the output layer. For background reading on cross-entropy, see:
http://dataunion.org/26447.html
http://www.jianshu.com/p/fb119d0ff6a6
tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None)
Computes sigmoid cross entropy given logits.
Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
For brevity, let x = logits, z = targets. The logistic loss is
z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
= z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
= (1 - z) * x + log(1 + exp(-x))
= x - x * z + log(1 + exp(-x))
For x < 0, to avoid overflow in exp(-x), we reformulate the above
x - x * z + log(1 + exp(-x))
= log(exp(x)) - x * z + log(1 + exp(-x))
= - x * z + log(1 + exp(x))
Hence, to ensure stability and avoid overflow, the implementation uses this equivalent formulation
max(x, 0) - x * z + log(1 + exp(-abs(x)))
logits and targets must have the same type and shape.
Args:
logits: A Tensor of type float32 or float64.
targets: A Tensor of the same type and shape as logits.
name: A name for the operation (optional).
Returns:
A Tensor of the same shape as logits with the componentwise logistic losses.
Raises:
ValueError: If logits and targets do not have the same shape.
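A minimal sketch, assuming TensorFlow 1.x (where the second argument is named labels and the op is called with keyword arguments); it computes the loss with the op and checks it against the stable formula above:

```python
import tensorflow as tf

logits = tf.constant([[-2.0, 0.5, 3.0]])
labels = tf.constant([[0.0, 1.0, 1.0]])   # independent binary targets (z above)

# The op (passed by keyword to avoid argument-order differences between versions).
loss_op = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

# Numerically stable formula from the docs: max(x, 0) - x*z + log(1 + exp(-|x|))
x, z = logits, labels
loss_manual = tf.maximum(x, 0.0) - x * z + tf.log(1.0 + tf.exp(-tf.abs(x)))

with tf.Session() as sess:
    print(sess.run(loss_op))      # componentwise logistic losses
    print(sess.run(loss_manual))  # matches the op's output
```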
tf.nn.softmax(logits, dim=-1, name=None)
Purpose: Computes softmax activations. For each batch i and class j, softmax[i, j] = exp(logits[i, j]) / sum_j(exp(logits[i, j])).
Args:
logits: A non-empty Tensor. Must be one of the following types: half, float32, float64.
dim: The dimension softmax is performed on. Defaults to -1, which indicates the last dimension.
name: A name for the operation (optional).
Returns:
A Tensor with the same type and shape as logits.
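A quick sketch (TensorFlow 1.x assumed) showing that each row of the output is a probability distribution:

```python
import tensorflow as tf

logits = tf.constant([[1.0, 2.0, 3.0],
                      [1.0, 1.0, 1.0]])
probs = tf.nn.softmax(logits)     # softmax over the last dimension by default

with tf.Session() as sess:
    p = sess.run(probs)
    print(p)              # each row is non-negative and sums to 1
    print(p.sum(axis=1))  # [1. 1.]
```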
tf.nn.log_softmax(logits, dim=-1, name=None)
Computes log softmax activations.
For each batch i and class j we have
logsoftmax = logits - log(reduce_sum(exp(logits), dim))
Args:
logits: A non-empty Tensor. Must be one of the following types: half, float32, float64.
dim: The dimension softmax would be performed on. The default is -1 which indicates the last dimension.
name: A name for the operation (optional).
Returns:
A Tensor. Has the same type as logits. Same shape as logits.
Raises:
InvalidArgumentError: if logits is empty or dim is beyond the last dimension of logits.
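A small check (TensorFlow 1.x assumed): log_softmax(logits) gives the same values as log(softmax(logits)), but is the numerically safer way to get log-probabilities:

```python
import tensorflow as tf

logits = tf.constant([[1.0, 2.0, 3.0]])

log_p_direct = tf.nn.log_softmax(logits)
log_p_naive = tf.log(tf.nn.softmax(logits))  # same values, less stable for extreme logits

with tf.Session() as sess:
    print(sess.run(log_p_direct))
    print(sess.run(log_p_naive))
```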
tf.nn.softmax_cross_entropy_with_logits(logits, labels, dim=-1, name=None)
Purpose: Computes softmax cross entropy between logits and labels. If you are familiar with softmax this is easy to understand: informally, it measures the probability error in discrete classification tasks, where each component after softmax corresponds to one class and its value is that class's probability.
Note that this function does not compute the softmax itself; it only computes the classification error based on softmax, so do not treat it as a softmax function.
logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float16, float32, or float64).
Args:
logits: Unscaled log probabilities.
labels: Your label matrix; each row must be a valid probability distribution for one sample (think one-hot encoding if you are familiar with it).
dim: The class dimension. Defaults to -1, the last dimension.
name: A name for the operation (optional).
Returns:
A 1-D Tensor of length batch_size with the same type as logits, where each element is the softmax cross-entropy loss of the corresponding sample.
If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
logits and labels must have the same shape and the same dtype.
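A minimal sketch, assuming TensorFlow 1.x (where the arguments must be passed by keyword as labels= and logits=):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
# One row per sample; each row is a probability distribution (here one-hot).
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

losses = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run(losses))   # shape [batch_size], one loss per sample
```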
tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels, name=None)
Computes sparse softmax cross entropy between logits and labels.
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry). For soft softmax classification with a probability distribution for each entry, see softmax_cross_entropy_with_logits.
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size]. But higher dimensions are supported.
Args:
logits: Unscaled log probabilities of rank r and shape [d_0, d_1, …, d_{r-2}, num_classes] and dtype float32 or float64.
labels: Tensor of shape [d_0, d_1, …, d_{r-2}] and dtype int32 or int64. Each entry in labels must be an index in [0, num_classes). Other values will raise an exception when this op is run on CPU, and return NaN for the corresponding loss and gradient rows on GPU.
name: A name for the operation (optional).
Returns:
A Tensor of the same shape as labels and of the same type as logits with the softmax cross entropy loss.
Raises:
ValueError: If logits are scalars (need to have rank >= 1) or if the rank of the labels is not equal to the rank of the logits minus one.
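The same setup as in the previous example, but with integer class indices instead of one-hot rows (TensorFlow 1.x assumed, keyword arguments):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
labels = tf.constant([0, 1])   # one class index per sample, in [0, num_classes)

losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run(losses))    # same values as the one-hot example above
```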
tf.nn.weighted_cross_entropy_with_logits(logits, targets, pos_weight, name=None)
Computes a weighted cross entropy.
This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
The usual cross-entropy cost is defined as:
targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))
The argument pos_weight is used as a multiplier for the positive targets:
targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))
For brevity, let x = logits, z = targets, q = pos_weight. The loss is:
qz * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
= qz * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
= qz * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
= qz * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
= (1 - z) * x + (qz + 1 - z) * log(1 + exp(-x))
= (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(-x))
Setting l = (1 + (q - 1) * z), to ensure stability and avoid overflow, the implementation uses
(1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0))
logits and targets must have the same type and shape.
Args:
logits: A Tensor of type float32 or float64.
targets: A Tensor of the same type and shape as logits.
pos_weight: A coefficient to use on the positive examples.
name: A name for the operation (optional).
Returns:
A Tensor of the same shape as logits with the componentwise weighted logistic losses.
Raises:
ValueError: If logits and targets do not have the same shape.
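A sketch assuming TensorFlow 1.x; the arguments are passed by keyword since the positional order differs between the signature shown here and later releases:

```python
import tensorflow as tf

logits = tf.constant([[-1.0, 2.0, 0.5]])
targets = tf.constant([[0.0, 1.0, 1.0]])

# pos_weight > 1 penalizes missed positives more heavily, trading precision for recall.
loss = tf.nn.weighted_cross_entropy_with_logits(targets=targets, logits=logits,
                                                pos_weight=3.0)

with tf.Session() as sess:
    print(sess.run(loss))   # componentwise weighted logistic losses
```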