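I used to struggle with how to get started in CV. After finishing the cs231n course I had a rough picture of the overall approach and methods, but when it came to concrete code I was still pretty lost. Luckily I stumbled on a great GitHub repository that walks through classic CV papers together with their code implementations, so I decided to keep notes to track my learning.
Here is the link to the repository: https://github.com/WZMIAOMIAO/deep-learning-for-image-processing
And here is the CSDN blog of another author who also followed this repository: https://blog.csdn.net/m0_37867091
The figure does not show the intermediate results of the max-pooling operations. For the first two convolutions, you can think of Conv2d and max pooling as being done in a single step between two blocks to produce the next feature map.
This code uses only half of the convolution kernels of the original model. Read the detailed network structure and the code below side by side.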
import torch.nn as nn
import torch
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
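To convince myself that the shape comments in the model are right, here is a small sketch that rebuilds only the feature extractor and pushes a dummy all-zero image through it one layer at a time. The names `features`, `shapes` and the dummy input are my own for illustration; they are not part of the repository.

```python
import torch
import torch.nn as nn

# Same layers as self.features in the model above (half-width AlexNet).
features = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(48, 128, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(128, 192, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(192, 192, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(192, 128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.zeros(1, 3, 224, 224)  # dummy batch with one 224x224 RGB image
shapes = []
with torch.no_grad():
    for layer in features:
        x = layer(x)
        # ReLU never changes the shape, so only record conv and pooling layers.
        if not isinstance(layer, nn.ReLU):
            shapes.append((type(layer).__name__, tuple(x.shape[1:])))

for name, shape in shapes:
    print(name, shape)
```

The last recorded shape is what `nn.Linear(128 * 6 * 6, 2048)` in the classifier expects after flattening.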
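1. nn.Sequential()
This container keeps the code compact when the network is deep: it packs the individual modules into one new module that runs them in order.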
Example from the nn.Sequential docstring:
from collections import OrderedDict

# Using Sequential to create a small model. When `model` is run,
# input will first be passed to `Conv2d(1,20,5)`. The output of
# `Conv2d(1,20,5)` will be used as the input to the first
# `ReLU`; the output of the first `ReLU` will become the input
# for `Conv2d(20,64,5)`. Finally, the output of
# `Conv2d(20,64,5)` will be used as input to the second `ReLU`
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)

# Using Sequential with OrderedDict. This is functionally the
# same as the above code
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
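2. A note on padding. According to the Conv2d documentation: padding (int, tuple or str, optional): Padding added to all four sides of the input. Default: 0. For convenience this code simply passes an int. For example, passing 2 pads two rows of zeros on the top and bottom and two columns of zeros on the left and right. A tuple (1, 2) pads 1 row of zeros on the top and bottom and 2 columns of zeros on the left and right.
For finer-grained control, such as padding each side differently, use nn.ZeroPad2d. Args: padding (int, tuple): the size of the padding. A 4-tuple is read as (left, right, top, bottom).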
import torch.nn as nn
import torch
F = torch.ones(3, 3)
m = nn.ZeroPad2d((1, 2))           # 2-tuple: pads only the last dimension, 1 on the left and 2 on the right
print(F)
F_padding = m(F)
print(F_padding)
four = nn.ZeroPad2d((1, 1, 2, 0))  # 4-tuple: (left, right, top, bottom)
print(four(F))
# tensor([[1., 1., 1.],
#         [1., 1., 1.],
#         [1., 1., 1.]])
# tensor([[0., 1., 1., 1., 0., 0.],
#         [0., 1., 1., 1., 0., 0.],
#         [0., 1., 1., 1., 0., 0.]])
# tensor([[0., 0., 0., 0., 0.],
#         [0., 0., 0., 0., 0.],
#         [0., 1., 1., 1., 0.],
#         [0., 1., 1., 1., 0.],
#         [0., 1., 1., 1., 0.]])
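The difference between an int and a tuple for the `padding` argument of Conv2d is easiest to see from the output shape. This is only a sketch on a made-up 5x5 input; `conv_int` and `conv_tuple` are names I picked for illustration.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 5, 5)  # [batch, channel, height, width]

# int: the same padding (2) on the top, bottom, left and right
conv_int = nn.Conv2d(1, 1, kernel_size=3, padding=2)
# tuple: (padding along height, padding along width)
conv_tuple = nn.Conv2d(1, 1, kernel_size=3, padding=(1, 2))

shape_int = tuple(conv_int(x).shape)
shape_tuple = tuple(conv_tuple(x).shape)
print(shape_int)    # height and width both grow
print(shape_tuple)  # only the width grows
```

With a 3x3 kernel and stride 1, padding 1 keeps a dimension at 5 while padding 2 grows it to 7, so the tuple version ends up taller than wide by exactly that difference.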
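3. Size of the feature map after a convolution: N = (W - F + 2P) / S + 1
N: size of the output feature map
W: size of the input feature map
F: size of the convolution kernel
P: padding size; the total padding is not necessarily 2P, because the two sides can be padded by different amounts
S: stride of the convolution
If the result is not an integer, it is rounded down and the trailing rows and columns that do not fit are discarded. For example, with N = 55.25 and stride = 4 the last row and last column are dropped; with N = 55.8 and stride = 5 the last four rows and last four columns are dropped, since 0.25 * 4 = 1 and 0.8 * 5 = 4. In other words, the number of discarded rows/columns is the fractional part multiplied by the stride.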
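The formula and the rounding rule above fit in a few lines of plain Python. `conv_output_size` is a helper name I made up for this note; it assumes symmetric padding, so the total padding is exactly 2P.

```python
def conv_output_size(w, f, p, s):
    """Return (N, dropped) for input size w, kernel f, padding p, stride s.

    N is the output size after rounding down; dropped is how many trailing
    rows/columns of the padded input are never covered by the kernel.
    """
    span = w - f + 2 * p
    return span // s + 1, span % s


# First conv layer of the model: 224 input, 11x11 kernel, padding 2, stride 4
first_conv = conv_output_size(224, 11, 2, 4)
print(first_conv)  # → (55, 1)

# Second conv layer: 27 input, 5x5 kernel, padding 2, stride 1
second_conv = conv_output_size(27, 5, 2, 1)
print(second_conv)  # → (27, 0)
```

The first result matches the N = 55.25 case: the output is 55 and one row and one column are left over.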
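4. nn.ReLU(inplace=True)
Setting inplace=True is a way for PyTorch to lower memory usage: the result is written back into the input tensor instead of into a newly allocated one.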
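A tiny sketch of what "in place" means here, on a made-up three-element tensor: after the call, the input tensor itself holds the ReLU result and the returned tensor shares its storage.

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.5, 2.0])
y = nn.ReLU(inplace=True)(x)

print(x)  # x has been overwritten: the negative entry is now 0
print(y.data_ptr() == x.data_ptr())  # y lives in the same memory as x
```

The flip side is that the original values of `x` are gone, so this is only safe when nothing else still needs them.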
5. If init_weights=True is passed when building the network,
the weights are initialized right after the layers have been constructed.
def _initialize_weights(self):
    for m in self.modules():  # modules() is inherited from the parent class nn.Module
        if isinstance(m, nn.Conv2d):  # check which kind of layer this is
            # initialize m.weight with the kaiming_normal_ method
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):  # check which kind of layer this is
            nn.init.normal_(m.weight, 0, 0.01)  # initialize m.weight from N(0, 0.01)
            nn.init.constant_(m.bias, 0)
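Let's look at the definition of self.modules. It returns an iterator that walks over every module in the network, that is, every layer we defined earlier.
def modules(self) -> Iterator['Module']:
    r"""Returns an iterator over all modules in the network."""
The checks if isinstance(m, nn.Conv2d) and elif isinstance(m, nn.Linear) decide whether each layer is a convolutional layer or a fully connected layer, so that the matching initialization method is used.
if isinstance(m, nn.Conv2d):
    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
Current PyTorch versions already give the layers a default initialization, so initializing them yourself is not strictly necessary.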
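To see what `modules()` actually yields, here is a sketch on a toy network of my own (not the AlexNet above). It lists the module types and then applies the same isinstance-based bias initialization to the Linear layer only.

```python
import torch.nn as nn

# A made-up toy network, used only to inspect what modules() returns.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8, 2),
)

# modules() yields the container itself first, then each layer inside it.
kinds = [type(m).__name__ for m in net.modules()]
print(kinds)

for m in net.modules():
    if isinstance(m, nn.Linear):
        nn.init.constant_(m.bias, 0)  # same call as in _initialize_weights

linear_bias_sum = net[3].bias.abs().sum().item()
print(linear_bias_sum)
```

Note that the Sequential container shows up in the iteration too, which is why the isinstance checks are needed before touching `m.weight`.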
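6. x = torch.flatten(x, start_dim=1)
start_dim=1 flattens the tensor starting from dimension 1 (the channel dimension). Our tensor is four-dimensional: [batch, channel, height, width].
batch is the number of images and is left untouched; everything after it is unrolled so that each image's features become a one-dimensional vector.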
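A quick shape check of this step, assuming a made-up batch of 4 images that has already passed through the feature extractor (so each one is [128, 6, 6]):

```python
import torch

x = torch.zeros(4, 128, 6, 6)          # [batch, channel, height, width]
flat = torch.flatten(x, start_dim=1)   # keep dim 0, merge dims 1..3

flat_shape = tuple(flat.shape)
print(flat_shape)
```

The second dimension is 128 * 6 * 6, which is exactly the input size of the first Linear layer in the classifier.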
import os
import json
import torch
import torch.nn as nn
from torchvision import transforms, datasets, utils
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
from tqdm import tqdm
from model import AlexNet
def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    # Define a dict data_transform: str keys mapped to Compose objects
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),  # cannot 224, must (224, 224)
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path; "../.." goes up two directory levels
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx  # mapping from class name to index
    cla_dict = dict((val, key) for key, val in flower_list.items())  # swap keys and values
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])
    # number of workers: the video uses 0, but the repository changed it;
    # if this version fails to run, set nw = 0
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=4, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))
    # test_data_iter = iter(validate_loader)
    # test_image, test_label = next(test_data_iter)
    #
    # def imshow(img):
    #     img = img / 2 + 0.5  # unnormalize
    #     npimg = img.numpy()
    #     plt.imshow(np.transpose(npimg, (1, 2, 0)))
    #     plt.show()
    #
    # print(' '.join('%5s' % cla_dict[test_label[j].item()] for j in range(4)))
    # imshow(utils.make_grid(test_image))

    net = AlexNet(num_classes=5, init_weights=True)

    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    # pata = list(net.parameters())
    optimizer = optim.Adam(net.parameters(), lr=0.0002)

    epochs = 10
    save_path = './AlexNet.pth'
    best_acc = 0.0
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        # From the docstring of train(): "This has any effect only on certain
        # modules. See documentations of particular modules for details of their
        # behaviors in training/evaluation mode, if they are affected,
        # e.g. Dropout, BatchNorm, etc."
        # net.train() is paired with net.eval(): dropout is active during training
        # (to reduce overfitting) and switched off during validation.
        running_loss = 0.0
        train_bar = tqdm(train_loader)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]  # index of the largest logit per image
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        # keep only the weights with the best validation accuracy so far
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()
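The net.train() / net.eval() pair in the training script is easier to understand on a single Dropout layer. This is just a sketch on a made-up vector of ones; `drop`, `train_out` and `eval_out` are illustrative names.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()            # training mode: dropout is active
train_out = drop(x)     # each element is either zeroed or scaled by 1 / (1 - p) = 2

drop.eval()             # evaluation mode: dropout is switched off
eval_out = drop(x)      # the input passes through unchanged

print(sorted(set(train_out.tolist())))
print(torch.equal(eval_out, x))
```

Calling train() or eval() on the whole network simply flips this mode on every submodule, which is why the script calls them at the start of each phase.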
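The training script defines a dict data_transform to preprocess the training set and the validation set. Its keys are strings and each value is a Compose object: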
data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
    "val": transforms.Compose([transforms.Resize((224, 224)),  # cannot 224, must (224, 224)
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}
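Normalize with mean 0.5 and std 0.5 computes (x - mean) / std per channel, so pixel values in [0, 1] coming out of ToTensor end up in [-1, 1]. The sketch below redoes that arithmetic by hand in plain torch on a few made-up pixel values instead of calling torchvision, and then undoes it the same way the commented-out imshow helper in the training script does.

```python
import torch

pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])  # values as produced by ToTensor

mean, std = 0.5, 0.5
normalized = (pixels - mean) / std            # what Normalize(0.5, 0.5) computes
restored = normalized / 2 + 0.5               # the "unnormalize" step: x * std + mean

print(normalized)
print(restored)
```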
import os
import json
import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
from model import AlexNet
def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)

    plt.imshow(img)
    # [C, H, W] after the transform
    img = data_transform(img)
    # expand batch dimension -> [N, C, H, W]
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)

    # create model
    model = AlexNet(num_classes=5).to(device)

    # load model weights; map_location lets GPU-trained weights load on a CPU-only machine
    weights_path = "./AlexNet.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
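The last few lines of the prediction script can be tried without a trained model. In this sketch the logits are made up and stand in for the model output; the class names are the five flower classes from the training script. It also shows why the lookup uses str(...): JSON object keys are always strings, so the integer keys written by the training script come back as "0", "1", and so on.

```python
import json
import torch

# Round-trip through JSON, like class_indices.json: int keys become str keys.
cla_dict = {0: "daisy", 1: "dandelion", 2: "roses", 3: "sunflower", 4: "tulips"}
class_indict = json.loads(json.dumps(cla_dict))

logits = torch.tensor([[0.1, 0.2, 0.3, 0.4, 2.0]])  # made-up output, shape [1, 5]
output = torch.squeeze(logits)                      # drop the batch dimension
predict = torch.softmax(output, dim=0)              # probabilities that sum to 1
predict_cla = int(torch.argmax(predict))            # index of the most likely class

label = class_indict[str(predict_cla)]
print(label, float(predict[predict_cla]))
```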