I have recently been studying TensorFlow. While working on handwritten-digit recognition I ran into tf.nn.conv2d, and some of its behavior was not clear to me, so after searching online I wrote up the following notes.
The core of a convolutional neural network is the "convolution" operation on images.
tf.nn.conv2d method definition
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None) (signature from the official docs)
Parameters:
input: a 4-D tensor of shape [batch, in_height, in_width, in_channels] (with the default NHWC data format)
filter: a 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
strides: a list of 4 ints; the sliding stride of the window in each dimension of input
padding: either 'SAME' or 'VALID'
use_cudnn_on_gpu: optional bool, defaults to True
data_format: optional, 'NHWC' (default) or 'NCHW'
name: optional name for the operation
This op (conv2d) performs the following steps:
1. Flattens the filter into a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
2. Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
3. For each patch, right-multiplies the filter matrix by the image patch vector.
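To make steps 1–3 concrete, here is a minimal NumPy sketch of the same computation (my own illustration, not from the official docs), assuming padding='VALID' and stride 1, and using the same shapes as case 7 near the end:

import numpy as np

batch, H, W, in_c = 1, 5, 5, 5
fh, fw, out_c = 3, 3, 7
x = np.random.randn(batch, H, W, in_c)
f = np.random.randn(fh, fw, in_c, out_c)

# Step 1: flatten the filter into [filter_height * filter_width * in_channels, output_channels]
f_mat = f.reshape(fh * fw * in_c, out_c)

# Step 2: extract image patches into [batch, out_height, out_width, fh * fw * in_c]
out_h, out_w = H - fh + 1, W - fw + 1
patches = np.empty((batch, out_h, out_w, fh * fw * in_c))
for i in range(out_h):
    for j in range(out_w):
        patches[:, i, j, :] = x[:, i:i + fh, j:j + fw, :].reshape(batch, -1)

# Step 3: multiply each patch vector by the filter matrix
out = patches @ f_mat  # shape (1, 3, 3, 7), matching conv2d with padding='VALID'
print(out.shape)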
The strides argument is generally required to satisfy strides[0] = strides[3] = 1, i.e. no striding over the batch or channel dimensions.
Formula for the output size after convolution
For a W×W input, an F×F kernel, padding P and stride S:
N = (W − F + 2P)/S + 1
The output image is N×N.
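For example, case 7 near the end uses a 5×5 input (W = 5), a 3×3 kernel (F = 3), no padding (P = 0) and stride S = 1, giving N = (5 − 3 + 0)/1 + 1 = 3, i.e. a 3×3 output.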
Next, some examples show how the data changes as the number of channels and kernels varies.
Case 1 (single channel, single kernel):
import tensorflow as tf
#case 1
input = tf.Variable(tf.random_normal([1,3,3,1]))
filter = tf.Variable(tf.random_normal([1,1,1,1]))
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("case 1")
    print(sess.run(op2))  # print the conv result, shape [1, 3, 3, 1]
case 1
[[[[ 0.00850411]
[ 0.00713599]
[-0.1402842 ]]
[[-1.4874302 ]
[ 1.1501638 ]
[-0.27221245]]
[[-1.8692739 ]
[-1.0514828 ]
[-0.22669399]]]]
Process finished with exit code 0
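Since the kernel here is 1×1×1×1, the convolution just multiplies every input value by a single (random) weight, so the output keeps the shape [1, 3, 3, 1].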
Case 2 (multiple channels, single kernel):
import tensorflow as tf
#case 2
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,1]))
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("case 2")
    print(sess.run(op2))  # the conv result: shape [1, 3, 3, 1], not [1, 3, 3, 5]
case 2
[[[[-0.7135731 ]
[ 0.33953804]
[ 2.2308116 ]]
[[ 0.99762535]
[-1.1370671 ]
[ 1.024965 ]]
[[ 1.1837609 ]
[-0.01205832]
[ 0.7789178 ]]]]
Process finished with exit code 0
This case shows that the number of input channels does not show up in the output shape: at each pixel, the five channel values are combined into one number by a weighted sum, much like combining the R, G and B components of a color image into a single value.
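A quick way to check this reading (my own snippet, not in the original post): a 1×1 convolution over five channels is exactly a per-pixel weighted sum across the channel axis.

import tensorflow as tf

input = tf.Variable(tf.random_normal([1, 3, 3, 5]))
filter = tf.Variable(tf.random_normal([1, 1, 5, 1]))
conv = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
# the same weighted sum over the channel axis, done by hand
manual = tf.expand_dims(tf.reduce_sum(input * tf.reshape(filter, [5]), axis=3), 3)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.reduce_max(tf.abs(conv - manual))))  # ~0: the two agree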
Case 3 (single channel, multiple kernels):
import tensorflow as tf
#case 3
input = tf.Variable(tf.random_normal([1,3,3,1]))
filter = tf.Variable(tf.random_normal([1,1,1,5]))
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("case 3")
    print(sess.run(op2))  # the conv result: shape [1, 3, 3, 5]
case 3
[[[[-0.5227298 -0.66182023 1.0833362 0.72718143 -0.17323467]
[-0.24752371 -0.313386 0.5129827 0.34433588 -0.0820303 ]
[-0.34527072 -0.43714198 0.7155594 0.480314 -0.11442404]]
[[-0.23402141 -0.29629093 0.4849998 0.32555252 -0.0775556 ]
[ 0.25400987 0.321598 -0.52642506 -0.35335892 0.08417984]
[-0.5669042 -0.71774876 1.1748857 0.7886334 -0.18787423]]
[[ 0.9459997 1.1977158 -1.9605457 -1.3160018 0.31350794]
[ 0.26897132 0.34054047 -0.55743206 -0.37417215 0.08913813]
[ 0.4315578 0.54638875 -0.8943859 -0.6003499 0.14301991]]]]
Process finished with exit code 0
Here we can see that the number of kernels shows up as the last (channel) dimension of the output: each position now prints a row of five values, one per kernel.
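Again a quick check (my own snippet): with a single input channel and 1×1 kernels, output channel k is just the whole input scaled by the k-th kernel weight, so plain broadcasting reproduces the conv.

import tensorflow as tf

input = tf.Variable(tf.random_normal([1, 3, 3, 1]))
filter = tf.Variable(tf.random_normal([1, 1, 1, 5]))
conv = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
manual = input * tf.reshape(filter, [5])  # broadcast [1,3,3,1] * [5] -> [1,3,3,5]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.reduce_max(tf.abs(conv - manual))))  # ~0: the two agree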
Case 4 (multiple channels, multiple kernels):
import tensorflow as tf
#case 4
input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,5]))
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("case 4")
    print(sess.run(op2))  # the conv result: shape [1, 3, 3, 5]
case 4
[[[[-2.4651055 -0.3509455 -4.686282 -2.3212051 -0.710496 ]
[ 0.7768164 0.57705945 1.318051 0.5697315 0.07564557]
[-1.0820991 1.9157095 0.23172522 2.5040576 2.321178 ]]
[[ 0.4635254 0.05687328 0.81018573 -0.14844202 -1.0212026 ]
[ 2.6249065 -0.76527166 4.834874 2.1821246 0.7100087 ]
[ 0.6223385 -0.1788317 4.873171 3.3820043 -1.1162739 ]]
[[-1.0766833 0.43714532 -0.9590479 -0.6471283 -1.8756338 ]
[-3.936176 -0.02507877 -7.966227 -3.140205 0.98988634]
[-4.6462646 -0.8348821 -6.4950304 -1.1865668 0.99817204]]]]
Process finished with exit code 0
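With a [1, 1, 5, 5] filter, each of the five output channels is its own weighted sum of the five input channels: at every pixel, the 5-element channel vector is effectively multiplied by a 5×5 matrix. The shape stays [1, 3, 3, 5], but every value now mixes all input channels.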
The next example makes the computation itself visible: how each value is computed, and how the results are laid out.
import tensorflow as tf
sess = tf.InteractiveSession()
input_batch = tf.constant([
[ # First Input (6x6x1)
[[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]],
[[0.1], [1.1], [2.1], [3.1], [4.1], [5.1]],
[[0.2], [1.2], [2.2], [3.2], [4.2], [5.2]],
[[0.3], [1.3], [2.3], [3.3], [4.3], [5.3]],
[[0.4], [1.4], [2.4], [3.4], [4.4], [5.4]],
[[0.5], [1.5], [2.5], [3.5], [4.5], [5.5]],
],
])
kernel = tf.constant([ # Kernel (3x3x1x1): weights 0.5/1.0/0.5 down the middle column
[[[0.0]], [[0.5]], [[0.0]]],
[[[0.0]], [[1.0]], [[0.0]]],
[[[0.0]], [[0.5]], [[0.0]]]
])
# NOTE: the change in the strides parameter (stride 3 in each spatial dimension).
conv2d = tf.nn.conv2d(input_batch, kernel, strides=[1, 3, 3, 1], padding='SAME')
print(sess.run(conv2d))
Output:
[[[[ 2.20000005]
[ 8.19999981]]
[[ 2.79999995]
[ 8.80000019]]]]
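To see where these numbers come from: with 'SAME' padding, a 6×6 input, a 3×3 kernel and stride 3, the output is ceil(6/3) = 2×2, and in this case no padding is actually needed. The kernel is nonzero only down its middle column (0.5, 1.0, 0.5), so the top-left output is 0.5·1.0 + 1.0·1.1 + 0.5·1.2 = 2.2, and the top-right is 0.5·4.0 + 1.0·4.1 + 0.5·4.2 = 8.2.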
If the example above is changed to strides=[1, 2, 2, 1],
the output becomes:
[[[[ 2.20000005]
[ 6.19999981]
[ 10.19999981]]
[[ 2.5999999 ]
[ 6.60000038]
[ 10.60000038]]
[[ 2.20000005]
[ 5.19999981]
[ 8.19999981]]]]
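With stride 2 the output is ceil(6/2) = 3×3, which now requires one row and one column of zero padding at the bottom and right. For example, the bottom-left value is computed over rows 4, 5 and a padded zero row: 0.5·1.4 + 1.0·1.5 + 0.5·0 = 2.2.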
Case 6 (multiple channels, multiple kernels):
#case 6
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op6 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("case 6")
    print(sess.run(op6))  # shape [1, 5, 5, 7]
case 6
[[[[ 12.02504349 4.35077286 2.67207813 5.77893162 6.98221684
-0.96858567 -8.1147871 ]
[ -0.02988982 -2.52141953 15.24755192 6.39476395 -4.36355495
-2.34515095 5.55743504]
[ -2.74448752 -1.62703776 -6.84849405 10.12248802 3.7408421
4.71439075 6.13722801]
[ 0.82365227 -1.00546622 -3.29460764 5.12690163 -0.75699937
-2.60097408 -8.33882809]
[ 0.76171923 -0.86230004 -6.30558443 -5.58426857 2.70478535
8.98232937 -2.45504045]]
[[ 3.13419819 -13.96483231 0.42031103 2.97559547 6.86646557
-3.44916964 -0.10199898]
[ 11.65359879 -5.2145977 4.28352737 2.68335319 3.21993709
-6.77338028 8.08918095]
[ 0.91533852 -0.31835344 -1.06122255 -9.11237717 5.05267143
5.6913228 -5.23855162]
[ -0.58775592 -5.03531456 14.70254898 9.78966522 -11.00562763
-4.08925819 -3.29650426]
[ -2.23447251 -0.18028721 -4.80610704 11.2093544 -6.72472
-2.67547607 1.68422937]]
[[ -3.40548897 -9.70355129 -1.05640507 -2.55293012 -2.78455877
-15.05377483 -4.16571808]
[ 13.66925812 2.87588191 8.29056358 6.71941566 2.56558466
10.10329056 2.88392687]
[ -6.30473804 -3.3073864 12.43273926 -0.66088223 2.94875336
0.06056046 -2.78857946]
[ -7.14735603 -1.44281793 3.3629775 -7.87305021 2.00383091
-2.50426936 -6.93097973]
[ -3.15817571 1.85821593 0.60049552 -0.43315536 -4.43284273
0.54264796 1.54882073]]
[[ 2.19440389 -0.21308756 -4.35629082 -3.62100363 -0.08513772
-0.80940366 7.57606506]
[ -2.65713739 0.45524287 -16.04298019 -5.19629049 -0.63200498
1.13256514 -6.70045137]
[ 8.00792599 4.09538221 -6.16250181 8.35843849 -4.25959206
-1.5945878 -7.60996151]
[ 8.56787586 5.85663748 -4.38656425 0.12728286 -6.53928804
2.3200655 9.47253895]
[ -6.62967777 2.88872099 -2.76913023 -0.86287498 -1.4262073
-6.59967232 5.97229099]]
[[ -3.59423327 4.60458899 -5.08300591 1.32078576 3.27156973
0.5302844 -5.27635145]
[ -0.87793881 1.79624665 1.66793108 -4.70763969 -2.87593603
-1.26820421 -7.72825718]
[ -1.49699068 -3.40959787 -1.21225107 -1.11641395 -8.50123024
-0.59399474 3.18010235]
[ -4.4249506 -0.73349547 -1.49064219 -6.09967899 5.18624878
-3.80284953 -0.55285597]
[ -1.42934585 2.76053572 -5.19795799 0.83952439 -0.15203482
0.28564462 2.66513705]]]]
Finally, the contrast with padding='VALID', which adds no padding at the borders. With a 5×5 input, a 3×3 kernel and stride 1, the formula gives N = (5 − 3)/1 + 1 = 3, so the output below is 1×3×3×7 instead of the 1×5×5×7 produced by 'SAME' in case 6.
input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("case 7")
    print(sess.run(op2))
case 7
[[[[-12.277221 -3.6852837 -7.5415998 -1.4435571 4.5265384
5.6599164 0.03461173]
[ 4.2244606 -1.8150828 -2.9772494 11.986962 1.5673934
-5.33732 -6.576837 ]
[ 2.792845 -1.1091218 -8.66483 12.438319 -1.8882469
-3.9440742 -6.3208795 ]]
[[ -2.3882375 9.021189 -7.999711 18.31005 4.852937
-5.7791305 5.0236855 ]
[ 1.0881239 -5.179409 0.15859601 6.445263 8.557671
-16.044416 3.657256 ]
[ 2.795134 4.8999724 -9.92672 3.9908109 6.207695
-6.553004 9.258662 ]]
[[ -5.4560223 6.153165 6.02847 6.907523 -5.5059247
-2.2264066 1.7103047 ]
[ -1.0343044 -5.2060676 0.98752177 -4.918023 0.17576812
-1.5359226 1.663869 ]
[ -7.092221 1.1528535 -1.7145716 3.2233562 -4.150458
0.8865322 14.828557 ]]]]
Process finished with exit code 0
tf.nn.max_pool method definition
tf.nn.max_pool(value, ksize, strides, padding, name=None)
value: the input to the pooling op. A pooling layer usually follows a convolution layer, so the input is typically a feature map, still shaped [batch, in_height, in_width, in_channels].
ksize: the size of the pooling window, a four-element vector, usually [1, height, width, 1]; we do not want to pool across the batch or channels dimensions, so those two entries are set to 1. (Presumably the four entries of strides in tf.nn.conv2d carry the same meaning.)
strides: the stride, again a four-element vector.
padding: the same two padding modes as above, not repeated here.
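A minimal sketch of max pooling (my own example, using the same TF 1.x API as above): a 4×4, single-channel image pooled with a 2×2 window and stride 2 keeps the maximum of each 2×2 block.

import tensorflow as tf

# 1 image, 4x4, 1 channel, values 1..16
x = tf.constant([[[[ 1.], [ 2.], [ 3.], [ 4.]],
                  [[ 5.], [ 6.], [ 7.], [ 8.]],
                  [[ 9.], [10.], [11.], [12.]],
                  [[13.], [14.], [15.], [16.]]]])
pool = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
with tf.Session() as sess:
    print(sess.run(pool))  # [[[[ 6.] [ 8.]] [[14.] [16.]]]]: max of each 2x2 block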