Thanks to Zhihu:
https://www.zhihu.com/question/52668301
Baidu Baike:
https://baike.baidu.com/item/%E5%8D%B7%E7%A7%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/17541100?fr=aladdin
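Convolutional Neural Networks (CNN) are a class of feedforward neural networks that involve convolution computations and have a deep structure; they are one of the representative algorithms of deep learning. A CNN is capable of representation learning and can classify its input in a shift-invariant way according to its hierarchical structure, which is why it is also called a "Shift-Invariant Artificial Neural Network (SIANN)". That is how Baidu Baike puts it.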
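To make "shift-invariant" a little more concrete, here is a minimal pure-Python sketch. It is only an illustration: the helper name `cross_correlate_1d`, the toy signal, and the kernel are all made up for this example, and a real CNN works on 2D images with learned kernels. The idea is that moving a pattern in the input moves the filter response by the same amount, so a global maximum over the response does not change.

```python
def cross_correlate_1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation, the sliding-window
    operation that convolution layers compute."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [1, 2, 1]                   # hypothetical hand-picked filter
signal = [0, 0, 1, 2, 1, 0, 0, 0]    # a small "bump" pattern
shifted = [0] + signal[:-1]          # the same bump moved one step right

response = cross_correlate_1d(signal, kernel)
response_shifted = cross_correlate_1d(shifted, kernel)

print(response)          # [1, 4, 6, 4, 1, 0]
print(response_shifted)  # [0, 1, 4, 6, 4, 1]

# The response moves with the input, so its maximum is unchanged.
print(max(response) == max(response_shifted))  # True
```

In a full network, the pooling layers and the final classifier play the role of the `max` here, which is roughly where the tolerance to translation comes from.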
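That may still be hard to grasp. Convolutional neural networks sound like a strange mix of computer science, biology, and mathematics, yet they have become some of the most influential innovations in computer vision. Neural networks rose to prominence in 2012, when Alex Krizhevsky used them to win that year's ImageNet challenge (roughly the annual Olympics of computer vision), cutting the classification error record from 26% to 15%, an astonishing result at the time. Since then, many companies have put deep learning at the core of their services: Facebook uses neural networks for its automatic tagging algorithms, Google for photo search, Amazon for product recommendations, Pinterest for personalized home feeds, and Instagram for its search infrastructure.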
Here we also use the classic VGG16 network as an example.
Core functions and how their parameters are computed:
Pool
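As a rough sketch of the arithmetic behind convolution and pooling layers, the helpers below compute the output size and the number of learnable parameters. The function names are hypothetical and chosen just for this example. The formulas assume square inputs and kernels, no dilation, and floor rounding, which matches the default behaviour of nn.Conv2d and nn.MaxPool2d.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool2d_output_size(size, kernel_size, stride):
    """Spatial size after pooling without padding: floor((n - k) / s) + 1."""
    return (size - kernel_size) // stride + 1


def conv2d_param_count(in_channels, out_channels, kernel_size):
    """Weights (C_in * C_out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_param_count(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


# A 3x3 convolution with padding 1 and stride 1 keeps the size unchanged.
print(conv2d_output_size(224, kernel_size=3, stride=1, padding=1))  # 224
# A 2x2 max pool with stride 2 halves it.
print(pool2d_output_size(224, kernel_size=2, stride=2))             # 112
# Parameters of the first VGG16 convolution (3 -> 64 channels).
print(conv2d_param_count(3, 64, 3))                                 # 1792
```

Pooling layers have no learnable parameters, so they only appear in the size calculation.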
The complete Python code:
import torch.nn as nn
import torch


class SE_VGG(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes
        # define an empty list for the Conv / ReLU / MaxPool layers
        net = []

        # block 1
        net.append(nn.Conv2d(in_channels=3, out_channels=64, padding=1, kernel_size=3, stride=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=64, out_channels=64, padding=1, kernel_size=3, stride=1))
        net.append(nn.ReLU())
        net.append(nn.MaxPool2d(kernel_size=2, stride=2))

        # block 2
        net.append(nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1))
        net.append(nn.ReLU())
        net.append(nn.MaxPool2d(kernel_size=2, stride=2))

        # block 3
        net.append(nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.MaxPool2d(kernel_size=2, stride=2))

        # block 4
        net.append(nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.MaxPool2d(kernel_size=2, stride=2))

        # block 5
        net.append(nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1, stride=1))
        net.append(nn.ReLU())
        net.append(nn.MaxPool2d(kernel_size=2, stride=2))

        # register the feature extractor as a module attribute
        self.extract_feature = nn.Sequential(*net)

        # define an empty list for the Linear layers
        classifier = []
        classifier.append(nn.Linear(in_features=512*7*7, out_features=4096))
        classifier.append(nn.ReLU())
        classifier.append(nn.Dropout(p=0.5))
        classifier.append(nn.Linear(in_features=4096, out_features=4096))
        classifier.append(nn.ReLU())
        classifier.append(nn.Dropout(p=0.5))
        classifier.append(nn.Linear(in_features=4096, out_features=self.num_classes))

        # register the classifier as a module attribute
        self.classifier = nn.Sequential(*classifier)

    def forward(self, x):
        feature = self.extract_feature(x)
        # flatten (N, 512, 7, 7) to (N, 512*7*7) before the linear layers
        feature = feature.view(x.size(0), -1)
        classify_result = self.classifier(feature)
        return classify_result
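The value in_features=512*7*7 in the first Linear layer follows from the layer settings above: each 3x3 convolution with padding 1 keeps the spatial size, and each of the five 2x2 max pools halves it. The sketch below redoes that bookkeeping in plain Python, with no PyTorch needed, and also counts the learnable parameters. It assumes a 224x224 RGB input and num_classes=1000, the same values used in the test that follows; the variable names are only for illustration.

```python
# Output channels of each convolution, grouped by block, as in the model above.
blocks = [
    [64, 64],
    [128, 128],
    [256, 256, 256],
    [512, 512, 512],
    [512, 512, 512],
]

size, channels = 224, 3   # assumed input: 3 x 224 x 224
total_params = 0

for block in blocks:
    for out_channels in block:
        # 3x3 kernel weights plus one bias per output channel;
        # padding=1 and stride=1 leave the spatial size unchanged.
        total_params += channels * out_channels * 3 * 3 + out_channels
        channels = out_channels
    size //= 2            # 2x2 max pool with stride 2 halves the size

flat_features = channels * size * size
print(size, flat_features)   # 7 25088  (= 512 * 7 * 7)

# The three fully connected layers, assuming 1000 output classes.
for in_f, out_f in [(flat_features, 4096), (4096, 4096), (4096, 1000)]:
    total_params += in_f * out_f + out_f

print(total_params)          # 138357544
```

Most of these parameters sit in the first fully connected layer (25088 x 4096 weights), not in the convolutions.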
Main function for testing:
if __name__ == "__main__":
    # a batch of 8 random RGB images of size 224 x 224
    x = torch.rand(size=(8, 3, 224, 224))
    vgg = SE_VGG(num_classes=1000)
    out = vgg(x)
    print(out.size())  # torch.Size([8, 1000])
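For many well-established models, PyTorch already provides ready-made definitions so that they can be built quickly, and well-known image datasets can be downloaded quickly through torchvision. See the official documentation.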
https://pytorch.org/docs/stable/index.html