[Paper Notes] A guide to convolution arithmetic for deep learning


Chapter 1. Introduction

1.1 Discrete convolutions

N: number of axes (an N-D discrete convolution)
n: number of output feature maps
m: number of input feature maps
k_j: kernel size along axis j
i_j: input size along axis j
s_j: stride (distance between two consecutive positions of the kernel) along axis j
p_j: zero padding (number of zeros concatenated at the beginning and at the end of an axis) along axis j

1.2 Pooling

Pooling operations reduce the size of feature maps by using some function to summarize subregions, such as taking the average or the maximum value.
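As a concrete illustration, here is a minimal pure-Python sketch of 2x2 max pooling with stride 2 (the function name and layout are illustrative, not from the paper):

```python
def max_pool2d(x, k=2, s=2):
    """Max-pool a 2-D list `x` with a k x k window and stride s."""
    h, w = len(x), len(x[0])
    out_h, out_w = (h - k) // s + 1, (w - k) // s + 1
    return [
        [
            max(x[s * r + dr][s * c + dc] for dr in range(k) for dc in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

x = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool2d(x))  # -> [[6, 8], [3, 4]]
```

Swapping `max` for an average reduction gives average pooling; the window arithmetic is identical.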

Chapter 2. Convolution arithmetic

The analysis of the relationship between convolutional layer properties is eased by the fact that they don’t interact across axes. Because of that, this chapter will focus on the following simplified setting:

  • 2-D discrete convolutions (N = 2),
  • square inputs (i_1 = i_2 = i),
  • square kernel size (k_1 = k_2 = k),
  • same strides along both axes (s_1 = s_2 = s),
  • same zero padding along both axes (p_1 = p_2 = p).

Note: the results outlined here also generalize to the N-D and non-square cases.

2.1 No zero padding, unit strides (s = 1, p = 0)

Relationship 1. For any i and k, and for s = 1 and p = 0, o = (i - k) + 1.

2.2 Zero padding, unit strides (s = 1)

Relationship 2. For any i, k and p, and for s = 1, o = (i - k) + 2p + 1.

2.2.1 Half (same) padding (s = 1, p = ⌊k/2⌋)

Relationship 3. For any i and for odd k (k = 2n + 1, n ∈ ℕ), and for s = 1 and p = ⌊k/2⌋ = n, o = i + 2⌊k/2⌋ - (k - 1) = i.

2.2.2 Full padding (s = 1, p = k - 1)

Relationship 4. For any i and k, and for p = k - 1 and s = 1, o = i + 2(k - 1) - (k - 1) = i + (k - 1).

2.3 No zero padding, non-unit strides (p = 0)

Relationship 5. For any i, k and s, and for p = 0, o = ⌊(i - k) / s⌋ + 1.

2.4 Zero padding, non-unit strides

Relationship 6. For any i, k, p and s, o = ⌊(i + 2p - k) / s⌋ + 1.
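Relationship 6 subsumes relationships 1 through 5 as special cases, which a short sketch can verify (the helper name is illustrative):

```python
def conv_output_size(i, k, s=1, p=0):
    """Output size of a convolution: o = floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

# Relationship 1: no padding, unit stride -> o = (i - k) + 1
assert conv_output_size(i=5, k=3) == 5 - 3 + 1            # 3
# Relationship 3: half padding preserves the size for odd k
assert conv_output_size(i=5, k=3, p=3 // 2) == 5
# Relationship 4: full padding grows the size by k - 1
assert conv_output_size(i=5, k=3, p=3 - 1) == 5 + (3 - 1)  # 7
# Relationships 5/6: non-unit strides discard partial windows via the floor
assert conv_output_size(i=5, k=3, s=2, p=1) == 3
```

The floor appears because the last few input positions may not leave room for a full kernel window when s > 1.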

Chapter 3. Pooling arithmetic

Pooling does not involve zero padding (p = 0).

Relationship 7. For any i, k and s, o = ⌊(i - k) / s⌋ + 1.

Chapter 4. Transposed convolution arithmetic

The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution.
Note: transposed convolution properties don't interact across axes either.
We keep the same simplified 2-D setting as Chapter 2 in what follows.

4.1 Convolution as a matrix operation
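Any convolution can be rewritten as a matrix product by unrolling the kernel into a sparse matrix C: the forward pass computes C x, and the transposed convolution is a product with Cᵀ, which maps the output size back to the input size. A 1-D sketch of this idea (pure Python, illustrative names; the kernel is applied as a cross-correlation, as is conventional in deep learning):

```python
# Unroll a 1-D convolution (i = 4, k = 3, s = 1, p = 0) into a matrix C.

def conv_matrix(kernel, i):
    """Matrix C such that matvec(C, x) slides `kernel` over x (stride 1, no padding)."""
    k = len(kernel)
    o = i - k + 1
    return [[kernel[c - r] if 0 <= c - r < k else 0 for c in range(i)]
            for r in range(o)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

w = [1, 2, 3]
C = conv_matrix(w, i=4)           # 2 rows x 4 columns
x = [1, 0, -1, 2]
y = matvec(C, x)                  # forward convolution: length 2
x_back = matvec(transpose(C), y)  # transposed convolution: back to length 4
print(C)        # -> [[1, 2, 3, 0], [0, 1, 2, 3]]
print(y)        # -> [-2, 4]
print(len(x_back))  # -> 4
```

Note that Cᵀ restores only the *shape* of the input, not its values; this is exactly why the transposed convolution is the right operation for backpropagating through a convolution.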

4.2 Transposed convolution

4.3 No zero padding, unit strides, transposed (s = 1, p = 0)

Relationship 8. A convolution described by s = 1, p = 0 and k has an associated transposed convolution described by k' = k, s' = s and p' = k - 1, and its output size is o' = i' + (k - 1).


4.4 Zero padding, unit strides, transposed (s = 1)

Relationship 9. A convolution described by s = 1, k and p has an associated transposed convolution described by k' = k, s' = s and p' = k - p - 1, and its output size is o' = i' + (k - 1) - 2p.


4.4.1 Half (same) padding, transposed (s = 1, p = ⌊k/2⌋)

Relationship 10. A convolution described by k = 2n + 1 (n ∈ ℕ), s = 1 and p = ⌊k/2⌋ = n has an associated transposed convolution described by k' = k, s' = s and p' = p, and its output size is o' = i'.

4.4.2 Full padding, transposed (s = 1, p = k - 1)

Relationship 11. A convolution described by s = 1, k and p = k - 1 has an associated transposed convolution described by k' = k, s' = s and p' = 0, and its output size is o' = i' - (k - 1).

4.5 No zero padding, non-unit strides, transposed (p = 0)

Relationship 12. A convolution described by p = 0, k and s, and whose input size i is such that i - k is a multiple of s, has an associated transposed convolution described by ĩ', k' = k, s' = 1 and p' = k - 1, where ĩ' is the size of the stretched input obtained by adding s - 1 zeros between each input unit, and its output size is o' = s(i' - 1) + k.


4.6 Zero padding, non-unit strides, transposed

Relationship 13. A convolution described by k, s and p, and whose input size i is such that i + 2p - k is a multiple of s, has an associated transposed convolution described by ĩ', k' = k, s' = 1 and p' = k - p - 1, where ĩ' is the size of the stretched input obtained by adding s - 1 zeros between each input unit, and its output size is o' = s(i' - 1) + k - 2p.


Relationship 14. A convolution described by k, s and p has an associated transposed convolution described by ĩ', k' = k, s' = 1, p' = k - p - 1 and a, where ĩ' is the size of the stretched input obtained by adding s - 1 zeros between each input unit and a = (i + 2p - k) mod s represents the number of zeros added to the bottom and right edges of the input, and its output size is o' = s(i' - 1) + a + k - 2p.
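Relationships 8-13 are special cases of relationship 14. A sketch (illustrative helper names) checks that a transposed convolution with matching k, s, p and the right a always recovers the spatial size of the original input:

```python
def conv_output_size(i, k, s=1, p=0):
    """Forward convolution: o = floor((i + 2p - k) / s) + 1 (relationship 6)."""
    return (i + 2 * p - k) // s + 1

def conv_transpose_output_size(i_prime, k, s=1, p=0, a=0):
    """Transposed convolution: o' = s(i' - 1) + a + k - 2p, with 0 <= a < s."""
    return s * (i_prime - 1) + a + k - 2 * p

for i, k, s, p in [(5, 3, 1, 0), (5, 3, 1, 1), (6, 3, 2, 1), (7, 4, 3, 2)]:
    o = conv_output_size(i, k, s, p)
    a = (i + 2 * p - k) % s       # zeros added to the bottom/right edges
    assert conv_transpose_output_size(o, k, s, p, a) == i
```

The parameter a exists because, for s > 1, several input sizes map to the same output size under the floor; a disambiguates which of them the transposed convolution should reproduce (this is the role of `output_padding` in frameworks such as PyTorch's `ConvTranspose2d`).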

Chapter 5. Miscellaneous convolutions

5.1 Dilated convolutions

Dilated convolutions are used to cheaply increase the receptive field of output units without increasing the kernel size. The dilation rate d inserts d - 1 spaces between kernel elements, so d = 1 corresponds to a regular convolution.
A kernel of size k dilated by a factor d has an effective size k̂ = k + (k - 1)(d - 1).

Relationship 15. For any i, k, p and s, and for a dilation rate d, o = ⌊(i + 2p - k - (k - 1)(d - 1)) / s⌋ + 1.
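This is relationship 6 applied to the effective kernel size; a brief sketch (illustrative name):

```python
def dilated_conv_output_size(i, k, s=1, p=0, d=1):
    """o = floor((i + 2p - k_hat) / s) + 1, with k_hat = k + (k - 1)(d - 1)."""
    k_hat = k + (k - 1) * (d - 1)  # effective kernel size under dilation
    return (i + 2 * p - k_hat) // s + 1

# d = 1 reduces to a regular convolution:
assert dilated_conv_output_size(7, 3, d=1) == 5
# d = 2 makes a size-3 kernel span 5 input positions:
assert dilated_conv_output_size(7, 3, d=2) == 3
```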
