This article is mainly a walk-through of a practical LeNet implementation in TensorFlow, focusing on an analysis of the basic code. For a detailed explanation of the LeNet network itself, see:
LeNet详解
For CNNs, the two basic functions TensorFlow provides are tf.nn.conv2d() and
tf.nn.bias_add(). Before building LeNet from these two primitives, let's first look at what the overall structure looks like (implemented in Python):
# Output depth
k_output = 64
# Image Properties
image_width = 10
image_height = 10
color_channels = 3
# Convolution filter
filter_size_width = 5
filter_size_height = 5
# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])
# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))
# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)
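To see what this snippet actually produces, here is a quick sanity check (a minimal sketch that assumes the code above has already been run): feeding a random batch through the graph shows that a 10x10 input with stride 2 and 'SAME' padding comes out as 5x5 with 64 output channels.
import numpy as np
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(4, image_height, image_width, color_channels).astype(np.float32)
    # Spatial size: ceil(10 / 2) = 5; depth: k_output = 64
    print(sess.run(conv_layer, feed_dict={input: batch}).shape)  # (4, 5, 5, 64)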
Pay attention to how the shapes are constructed: the input placeholder uses shape=[None, image_height, image_width, color_channels], i.e. [batch, height, width, channels], whereas the weight tensor is laid out differently, as [filter_height, filter_width, in_channels, out_channels], and the bias holds one value per output channel (k_output).
Likewise, for a convolutional layer whose input is W*H*D (width, height, depth), given the filter size F, padding P, stride S, and number of filters K, the output size is computed as:
W_out = (W - F + 2P) / S + 1
H_out = (H - F + 2P) / S + 1
D_out = K
The padding term P may be confusing: in the W_out formula above, P is the number of pixels added to each side of the original image's width when padding is 'SAME'; because both sides are padded, it is multiplied by 2. The same reasoning applies to the height.
TensorFlow itself computes the output size in a slightly different way, which differs somewhat from the formula above:
padding='SAME':  out_height = ceil(float(in_height) / float(stride))
padding='VALID': out_height = ceil(float(in_height - filter_height + 1) / float(stride))
(and analogously for the width).
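To make these two rules concrete, here is a small helper (the name conv_output_size is mine, not from the original) that reproduces TensorFlow's sizing behaviour:
import math

def conv_output_size(in_size, filter_size, stride, padding):
    # Mirrors how TensorFlow sizes conv/pool outputs for each padding mode
    if padding == 'SAME':
        return int(math.ceil(float(in_size) / float(stride)))
    if padding == 'VALID':
        return int(math.ceil(float(in_size - filter_size + 1) / float(stride)))

print(conv_output_size(32, 5, 1, 'VALID'))  # 28 -- first LeNet conv layer below
print(conv_output_size(10, 5, 2, 'SAME'))   # 5  -- the snippet above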
The full LeNet implementation is as follows:
import tensorflow as tf
from tensorflow.contrib.layers import flatten
from sklearn.utils import shuffle

keep_prob = tf.placeholder(tf.float32)
# X_train / y_train are assumed to have been loaded earlier
X_train, y_train = shuffle(X_train, y_train)

def LeNet(x):
    mu = 0
    sigma = 0.1
    # Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 1, 6), mean=mu, stddev=sigma))
    conv1_b = tf.Variable(tf.zeros(6))
    conv1 = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b
    # Activation.
    conv1 = tf.nn.relu(conv1)
    conv1 = tf.nn.dropout(conv1, keep_prob)
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Layer 2: Convolutional. Output = 10x10x16.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean=mu, stddev=sigma))
    conv2_b = tf.Variable(tf.zeros(16))
    conv2 = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    # Activation.
    conv2 = tf.nn.relu(conv2)
    conv2 = tf.nn.dropout(conv2, keep_prob)
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Flatten. Input = 5x5x16. Output = 400.
    fc0 = flatten(conv2)
    # Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean=mu, stddev=sigma))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1 = tf.matmul(fc0, fc1_W) + fc1_b
    # Activation.
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, keep_prob)
    # Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2_W = tf.Variable(tf.truncated_normal(shape=(120, 84), mean=mu, stddev=sigma))
    fc2_b = tf.Variable(tf.zeros(84))
    fc2 = tf.matmul(fc1, fc2_W) + fc2_b
    # Activation.
    fc2 = tf.nn.relu(fc2)
    fc2 = tf.nn.dropout(fc2, keep_prob)
    # Layer 5: Fully Connected. Input = 84. Output = 43.
    fc3_W = tf.Variable(tf.truncated_normal(shape=(84, 43), mean=mu, stddev=sigma))
    fc3_b = tf.Variable(tf.zeros(43))
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    return logits
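A minimal sketch of how LeNet(x) is typically wired into a training graph (the placeholder shapes and the 43-class one-hot labels follow the code above; the Adam learning rate is an illustrative choice, not taken from the original):
x = tf.placeholder(tf.float32, (None, 32, 32, 1))
y = tf.placeholder(tf.int32, (None,))
one_hot_y = tf.one_hot(y, 43)

logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
training_operation = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss_operation)

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))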
Take the first convolutional layer: the input starts as 32*32*1 (a grayscale image) and becomes 28*28*6 after the first convolution; with padding='VALID' the 28 comes from ceil(float(input - filter + 1) / float(stride)) = ceil((32 - 5 + 1) / 1) = 28.
The ReLU activation essentially discards the weak part of the signal: negative values are set to zero while positive values pass through unchanged.
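A one-line illustration (toy values made up for this example):
relu_demo = tf.nn.relu(tf.constant([-2.0, -0.5, 0.0, 3.0]))
with tf.Session() as sess:
    print(sess.run(relu_demo))  # [0. 0. 0. 3.]: negative inputs are clipped to zero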
The dropout that follows is a step to prevent overfitting: during training a random subset of the network's nodes is kept (with probability keep_prob) and the rest are temporarily dropped, while at test time all nodes participate (keep_prob = 1.0).
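A minimal sketch of how keep_prob is typically fed (the 0.5 value is an illustrative choice; training_operation and accuracy_operation refer to the training sketch above, and batch_x / batch_y stand for one mini-batch of data):
# Training step: each activation is kept with probability 0.5
sess.run(training_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})
# Evaluation step: keep everything, i.e. dropout is effectively switched off
sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})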
The pooling operation that comes right after performs downsampling (dimensionality reduction); a figure in the following blog post explains the operation well: pooling解释
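A small worked example of 2x2 max pooling with stride 2 on a toy 4x4 input (values made up for illustration):
import numpy as np
pool_in = np.array([[1, 3, 2, 1],
                    [4, 6, 5, 2],
                    [7, 2, 9, 0],
                    [3, 1, 4, 8]], dtype=np.float32).reshape(1, 4, 4, 1)
pooled = tf.nn.max_pool(tf.constant(pool_in), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
with tf.Session() as sess:
    print(sess.run(pooled).reshape(2, 2))
    # [[6. 5.]
    #  [7. 9.]] -- each 2x2 window is reduced to its maximum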
That is roughly how LeNet is implemented in TensorFlow; next, we will implement LeNet with Keras, a highly abstracted library.