这几天在学TensorFlow,从mnist开始,有不少迷惑的地方,先把理解记下来
附上完整代码在这里
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def bias_variable(shape):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
这里 truncated_normal方法的定义为
def truncated_normal(shape,
mean=0.0,
stddev=1.0,
dtype=dtypes.float32,
seed=None,
name=None):
"""Outputs random values from a truncated normal distribution.
The generated values follow a normal distribution with specified mean and standard deviation, except that values whose magnitude is more than 2 standard deviations from the mean are dropped and re-picked.
Args:
shape: A 1-D integer Tensor or Python array. The shape of the output tensor.
mean: A 0-D Tensor or Python value of type `dtype`. The mean of the truncated normal distribution.
stddev: A 0-D Tensor or Python value of type `dtype`. The standard deviation of the truncated normal distribution.
dtype: The type of the output.
seed: A Python integer. Used to create a random seed for the distribution.
See
@{tf.set_random_seed}
for behavior.
name: A name for the operation (optional).
Returns:
A tensor of the specified shape filled with random truncated normal values.
"""
也就是说这个函数生成的值遵循具有特定平均值和标准偏差的正态分布,如果其数值大于平均值2个标准偏差,将被丢弃并重新挑选。那么6个参数的作用就一目了然了。
也就是说,初始W是随机值,b是定值
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
这里作用是卷积和池化
conv2d方法:
def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", name=None):
"""
Args:
input: A `Tensor`. Must be one of the following types: `half`, `float32`.A 4-D tensor. The dimension order is interpreted according to the value of `data_format`, see below for details.
filter: A `Tensor`. Must have the same type as `input`.A 4-D tensor of shape `[filter_height, filter_width, in_channels, out_channels]`
strides: A list of `ints`.1-D tensor of length 4. The stride of the sliding window for each dimension of `input`. The dimension order is determined by the value of `data_format`, see below for details.
padding: A `string` from: `"SAME", "VALID"`.The type of padding algorithm to use.
use_cudnn_on_gpu: An optional `bool`. Defaults to `True`.
data_format: An optional `string` from: `"NHWC", "NCHW"`. Defaults to `"NHWC"`.
Specify the data format of the input and output data. With the default format "NHWC", the data is stored in the order of:
[batch, height, width, channels].
Alternatively, the format could be "NCHW", the data storage order of:
[batch, channels, height, width].
name: A name for the operation (optional).
Returns:
A `Tensor`. Has the same type as `input`.
A 4-D tensor. The dimension order is determined by the value of `data_format`, see below for details.
"""
这里说conv2d是一个计算两个四维张量卷积的函数,这两个四维张量的格式是
x= [batch, in_height, in_width, in_channels] # 输入图像
W= [filter_height, filter_width, in_channels, out_channels] # 卷积核
batch : 一次输入图的个数
in/out_channel : 输入/出通道
图像卷积的计算就是把卷积核翻转180°,再与图像一一对应相乘,卷积核在每个维度滑动的步长就由strides控制,文档里说了对于图片,因为只有两维,通常strides取[1,stride,stride,1]
max_pool方法:
def max_pool(value, ksize, strides, padding, data_format="NHWC", name=None):
"""Performs the max pooling on the input.
Args:
value: A 4-D `Tensor` of the format specified by `data_format`.
ksize: A 1-D int Tensor of 4 elements. The size of the window for
each dimension of the input tensor.
strides: A 1-D int Tensor of 4 elements. The stride of the sliding
window for each dimension of the input tensor.
padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
See the @{tf.nn.convolution$comment here}
data_format: A string. 'NHWC', 'NCHW' and 'NCHW_VECT_C' are supported.
name: Optional name for the operation.
Returns:
A `Tensor` of format specified by `data_format`.
The max pooled output tensor.
"""
池化与卷积类似也就很容易看懂了
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
这里预定义的W 的shape为[5, 5, 1, 32],对照2里面说的,那么卷积核是5*5大小,1通道输入,32通道输出的
b则是[shape为 [32] 的一个向量
x_image 为x reshape成[-1, 28, 28, 1]的矩阵,图像大小为28*28,1为通道数,-1将自动匹配数值,也就是1了(28*28=784,正好是原输入)
???为什么是32
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
同上,为什么是64?
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
同上,为什么是1024?
这里因为代码的padding设置为了SAME,原图是28×28×1,第一次池化后为14×14×32,第二次池化后为7×7×64,全链接后,变成了1×3136的向量
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
dropout函数用于防止过拟合
中文官方文档中说:我们用一个placeholder来代表一个神经元的输出在dropout中保持不变的概率。这样我们可以在训练过程中启用dropout,在测试过程中关闭dropout。 TensorFlow的tf.nn.dropout操作除了可以屏蔽神经元的输出外,还会自动处理神经元输出值的scale。所以用dropout的时候可以不用考虑scale
dropout在Geoffrey Hinton的这篇大作中被发现,具体操作过程可以看一看,简单的来说:
Dropout is a radically different technique for regularization. Unlike L1 and L2 regularization, dropout doesn’t rely on modifying the cost function. Instead, in dropout we modify the network itself. Let me describe the basic mechanics of how dropout works, before getting into why it works, and what the results are.Suppose we’re trying to train a network.
In particular, suppose we have a training input xx and corresponding desired output yy. Ordinarily, we’d train by forward-propagating xxthrough the network, and then backpropagating to determine the contribution to the gradient. With dropout, this process is modified. We start by randomly (and temporarily) deleting half the hidden neurons in the network, while leaving the input and output neurons untouched. After doing this, we’ll end up with a network along the following lines.
We forward-propagate the input xx through the modified network, and then backpropagate the result, also through the modified network. After doing this over a mini-batch of examples, we update the appropriate weights and biases. We then repeat the process, first restoring the dropout neurons, then choosing a new random subset of hidden neurons to delete, estimating the gradient for a different mini-batch, and updating the weights and biases in the network.
By repeating this process over and over, our network will learn a set of weights and biases. Of course, those weights and biases will have been learnt under conditions in which half the hidden neurons were dropped out. When we actually run the full network that means that twice as many hidden neurons will be active. To compensate for that, we halve the weights outgoing from the hidden neurons.
This dropout procedure may seem strange and ad hoc. Why would we expect it to help with regularization? To explain what’s going on, I’d like you to briefly stop thinking about dropout, and instead imagine training neural networks in the standard way (no dropout). In particular, imagine we train several different neural networks, all using the same training data. Of course, the networks may not start out identical, and as a result after training they may sometimes give different results. When that happens we could use some kind of averaging or voting scheme to decide which output to accept. For instance, if we have trained five networks, and three of them are classifying a digit as a “3”, then it probably really is a “3”. The other two networks are probably just making a mistake. This kind of averaging scheme is often found to be a powerful (though expensive) way of reducing overfitting. The reason is that the different networks may overfit in different ways, and averaging may help eliminate that kind of overfitting.
What’s this got to do with dropout? Heuristically, when we dropout different sets of neurons, it’s rather like we’re training different neural networks. And so the dropout procedure is like averaging the effects of a very large number of different networks. The different networks will overfit in different ways, and so, hopefully, the net effect of dropout will be to reduce overfitting.
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
这里就输出结果了
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for i in range(20000):
batch = mnist.train.next_batch(50)
if i % 100 == 0:
train_accuracy = accuracy.eval(feed_dict={
x: batch[0], y_: batch[1], keep_prob: 1.0})
print('step %d, training accuracy %g' % (i, train_accuracy))
train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print('test accuracy %g' % accuracy.eval(feed_dict={
x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
循环20000次,每次50张图片,每100的倍数输出正确率,训练完在输出test集的正确率
这里每次50张有讲究吗?