Vanishing or exploding gradients
When the per-layer gradient factor is smaller than 1, the gradients shrink toward zero during backpropagation as the network gets deeper.
Conversely, when the per-layer gradient factor is larger than 1, the gradients grow explosively as the network gets deeper.
Common remedies are standardizing the input data, careful weight initialization, and BN (Batch Normalization) layers; ResNet addresses the problem with residual modules.
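Below is a minimal sketch of a residual block in the spirit of ResNet's residual module (an illustrative toy, not the actual torchvision implementation; the class name and channel count are made up):

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    # output = ReLU(F(x) + x): the identity shortcut gives gradients a direct path back
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                   # shortcut branch
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)               # add the skip connection, then activate

x = torch.randn(1, 16, 8, 8)
print(BasicResidualBlock(16)(x).shape)                 # torch.Size([1, 16, 8, 8])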
When getting started with PyTorch, max() is commonly used to compare the network outputs against the labels.
import torch
label = torch.tensor([0, 1, 2, 3, 4, 0])
outputs = torch.tensor([[0.9, 0.4, 0.1, 0.5, 0.4], [0.9, 0.2, 0.3, 0.8, 0.7],
[0.2, 0.1, 0.9, 0.5, 0.3],[0.2, 0.2, 0.4, 0.9, 0.6],
[0.9, 0.5, 0.1, 0.5, 0.4], [0.9, 0.1, 0.4, 0.1, 0.6]])
_, predicted = torch.max(outputs, dim=1)
correct = (predicted == label)
print("predicted:", predicted)
print("label:", label)
print(correct)
print(correct[1].item())
pre2 = torch.argmax(outputs, dim=1)
print("pre2:", pre2) # 输出结果和predicted一样
输出结果为:
predicted: tensor([0, 0, 2, 3, 0, 0])
label: tensor([0, 1, 2, 3, 4, 0])
tensor([ True, False, True, True, False, True])
False
pre2: tensor([0, 0, 2, 3, 0, 0])
A few things worth pointing out here:
The values in labels are not sorted in any particular order, and label.size(0) == MINI_BATCH.
The number of columns of outputs, i.e. outputs.shape[1], is the actual number of classes in the dataset.
The number of rows of outputs (outputs.shape[0], not the number of columns) equals MINI_BATCH, so when calling max() the number of rows of outputs must match label.size(0).
torchvision.utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0)
# Stitches a mini-batch of images into a single image. nrow is the number of images per row.
# Take a batch of 4 images with h = w = 32 and 3 channels, and look at the result:
images, labels = next(dataiter)
print(images.shape)
#torch.Size([4, 3, 32, 32]) bchw
print(torchvision.utils.make_grid(images).shape)
#torch.Size([3, 36, 138])
How should we interpret this output? The first dim is of course the channel: since the batch has been merged into a single image, the batch dimension disappears and the result is C H W. C is still the original channel count, H has grown by 4, and W = 32 * 4 + 10 = 138. C is easy to understand, but why did H grow by 4 and W by 10?
I then changed batch_size to 3 and got the following:
#torch.Size([3, 3, 32, 32])
#torch.Size([3, 36, 104])
tensor (Tensor or list) – 4D mini-batch Tensor of shape (B x C x H x W) or a list of images all of the same size.
nrow (int, optional) – Number of images displayed in each row of the grid. The Final grid size is (B / nrow, nrow). Default is 8.
padding (int, optional) – amount of padding. Default is 2.
normalize (bool, optional) – If True, shift the image to the range (0, 1), by subtracting the minimum and dividing by the maximum pixel value.
range (tuple, optional) – tuple (min, max) where min and max are numbers, then these numbers are used to normalize the image. By default, min and max are computed from the tensor.
scale_each (bool, optional) – If True, scale each image in the batch of images separately rather than the (min, max) over all images.
pad_value (float, optional) – Value for the padded pixels.
Note from the official documentation that nrow and padding both have default values (8 and 2 respectively).
Clearly, with a batch of 3 the width should be 3 * 32 = 96. But each image gets a padding of 2, so each image becomes 36 * 36, and the final width should therefore be 36 * 3 = 108, right?
That reasoning is still wrong. After thinking about it for a while, I figured it out.
With three images, the horizontal padding is not applied around each image separately; there is only a single strip of padding between adjacent images. Three images leave two gaps between them, plus the leftmost and rightmost borders, which makes 4 * 2 = 8 pixels of padding in total. So the width grows by 8, and 96 + 8 = 104 works out. The vertical direction is handled the same way: 32 + 2 * 2 = 36.
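A quick sketch to verify the arithmetic (assuming the default padding of 2 and that all images fit in a single row):

import torch
import torchvision

# For a single row of n images of size h x w with padding p:
#   H = h + 2 * p
#   W = n * w + (n + 1) * p
for n in (3, 4):
    imgs = torch.rand(n, 3, 32, 32)
    grid = torchvision.utils.make_grid(imgs)       # all n images fit in one row (nrow defaults to 8)
    print(grid.shape, 32 * n + 2 * (n + 1))        # (3, 36, 104) and (3, 36, 138)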
Cause of the error:
Saving a model should be written as:
torch.save(model.state_dict(), PATH)
If instead it is written as:
torch.save(model.state_dict, PATH)
an error will be raised, because state_dict without parentheses is the bound method itself rather than the parameter dictionary it returns.
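For reference, a minimal save/load sketch (the model and PATH here are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # placeholder model
PATH = "model_weights.pth"                   # placeholder path

torch.save(model.state_dict(), PATH)         # note the parentheses: save the dict, not the method
model.load_state_dict(torch.load(PATH))      # loading mirrors the saving pattern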
Here the test set is used as an example (MINI_BATCH = 10).
import numpy as np
import matplotlib.pyplot as plt
import torchvision

# Define a function that displays an image
def show_image(img):
    # img is a Tensor of shape C * H * W (the grid returned by make_grid)
    # undo transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    img = img * 0.5 + 0.5
    img = np.transpose(img.numpy(), (1, 2, 0))  # C H W -> H W C
    plt.imshow(img)
    plt.show()

data = iter(test_loader)
images, labels = next(data)
img = torchvision.utils.make_grid(images, nrow=10)  # see the notes on nrow above
# call the function
show_image(img)
# Display a single image
from PIL import Image

# load image
img_path = "./data/daisy_01.jpg"
img = Image.open(img_path)
plt.imshow(img) # H W C
plt.show()
Receptive field
The size of the region on the input that a single unit on the output feature map corresponds to.
Example:
input: 9 * 9 * 1
Conv1: kernel 3 * 3, stride 2  -> output 4 * 4 * 1
Pool1: kernel 2 * 2, stride 2  -> output 2 * 2 * 1
out_size = (in_size - F_size + 2P) / S + 1
Receptive field formula:
F(i) = (F(i+1) - 1) * Stride + Ksize
F(i) is the receptive field of layer i
Stride is the stride of layer i
Ksize is the kernel size of the convolution or pooling layer
Feature map: F = 1
Pool1: F = (1 - 1) * 2 + 2 = 2 # i.e. a 2 * 2 region
Conv1: F = (2 - 1) * 2 + 3 = 5 # i.e. a 5 * 5 region
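The same back-to-front calculation can be scripted; a small sketch for the two layers above:

# Back-compute the receptive field with F(i) = (F(i+1) - 1) * stride + ksize,
# iterating from the last layer toward the input.
layers = [("Conv1", 3, 2), ("Pool1", 2, 2)]   # (name, kernel size, stride)

rf = 1                                        # one unit on the final feature map
for name, ksize, stride in reversed(layers):
    rf = (rf - 1) * stride + ksize
    print(f"receptive field up to {name}: {rf}")
# receptive field up to Pool1: 2
# receptive field up to Conv1: 5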
Command to install the tqdm package: pip install tqdm
from tqdm import tqdm  # import the package
import time

for i in tqdm(range(10), desc='Processing_1'):
    time.sleep(0.05)

# a list
list1 = [1, 5, 7, 9, "w"]
for i in tqdm(list1, desc="Processing_2"):
    time.sleep(0.1)

# a dict
dict1 = {"apple": 10, "orange": 5, "banana": 20}
for i in tqdm(list(dict1.keys()), desc="Processing_3"):
    time.sleep(0.1)
Call model.train() before training starts and model.eval() before testing.
If the model contains BN (Batch Normalization) layers or Dropout, model.train() must be called during training and model.eval() during testing. model.train() makes BN use the mean and variance of each mini-batch, whereas model.eval() makes BN use the statistics accumulated over the whole training data; for Dropout, model.train() randomly drops part of the connections while updating parameters, whereas model.eval() uses all connections.
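A minimal sketch of the pattern (the loaders, optimizer and loss function are assumed to exist and are passed in; nothing here is tied to a particular model):

import torch

def run_one_epoch(model, train_loader, val_loader, optimizer, loss_fn, device):
    model.train()                      # BN uses batch statistics, Dropout is active
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

    model.eval()                       # BN uses running statistics, Dropout is disabled
    correct = 0
    with torch.no_grad():              # no gradients needed during evaluation
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            predicted = torch.max(model(images), dim=1)[1]
            correct += (predicted == labels).sum().item()
    return correct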
nn.Softmax() is a class.
It returns a tensor with the same shape as the input.
import torch

# y = torch.rand(12).reshape(2, 2, 3)
y = torch.normal(0, 1, (2, 2, 3))
print(y)
print(torch.softmax(y, dim=0))  # two 2*3 slices; entries at the same position across the slices sum to 1
print(torch.softmax(y, dim=1))  # each column sums to 1
print(torch.softmax(y, dim=2))  # each row sums to 1
Output:
tensor([[[-0.0479, 0.0685, -1.0559],
[-1.1511, 1.1537, -1.4805]],
[[-0.5178, 0.0845, 0.2507],
[-0.9432, -2.0841, 0.6930]]])
tensor([[[0.6153, 0.4960, 0.2131],
[0.4482, 0.9622, 0.1022]],
[[0.3847, 0.5040, 0.7869],
[0.5518, 0.0378, 0.8978]]])
tensor([[[0.7509, 0.2525, 0.6046],
[0.2491, 0.7475, 0.3954]],
[[0.6048, 0.8974, 0.3912],
[0.3952, 0.1026, 0.6088]]])
tensor([[[0.4019, 0.4515, 0.1467],
[0.0852, 0.8536, 0.0613]],
[[0.2007, 0.3665, 0.4328],
[0.1549, 0.0495, 0.7956]]])
Along the chosen dimension the softmax outputs sum to 1, e.g. $0.3921 + 0.6079 = 1$, and each output is computed from the raw inputs $x_i$ as $\frac{e^{x_i}}{\sum_j e^{x_j}}$ (from the inputs, not from the outputs themselves).
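A quick numeric check that torch.softmax matches the formula (the input values here are arbitrary):

import torch

x = torch.tensor([0.2, 0.7])
manual = torch.exp(x) / torch.exp(x).sum()     # e^{x_i} / sum_j e^{x_j}
print(manual)                                   # same values as torch.softmax(x, dim=0)
print(torch.softmax(x, dim=0))
print(manual.sum())                             # tensor(1.)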
transforms.Compose simply chains a list of transforms together.
'''
In this example, all training images are sorted into class sub-folders under
D:/PythonTemp/deep-learning/data_set/flower_data/train/,
and the validation set is organized the same way.
The input images are 3 * 224 * 224.
'''
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
def main():
    batch_size = 32  # example value; not specified in the original snippet
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),  # must be (224, 224); a bare 224 only resizes the shorter side
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
    }
    image_root = "D:/PythonTemp/deep-learning/data_set/flower_data/"
    train_dataset = datasets.ImageFolder(root=image_root + "train", transform=data_transform["train"])
    train_num = len(train_dataset)  # number of images in the training set
    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, num_workers=8)
    val_dataset = datasets.ImageFolder(root=image_root + "val/", transform=data_transform["val"])
    val_num = len(val_dataset)  # number of images in the validation set
    val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False, num_workers=8)
    print("using {} images for training, using {} images for validating".format(train_num, val_num))


if __name__ == "__main__":
    main()
Before the fully connected layers, the data has to be flattened. There are two ways to do this.
PyTorch tensors are laid out as B C H W.
import torch
import numpy as np
# assume B C H W = 1, 3, 2, 2
a = np.arange(12).reshape(1, 3, 2, 2)
a = torch.tensor(a)
a_view = a.view(-1, 3*2*2)
a_flatten = torch.flatten(a,start_dim=1)
print("a_view = ", a_view)
print("="*30)
print("a_flatten = ", a_flatten)
# Output:
a_view = tensor([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]], dtype=torch.int32)
==============================
a_flatten = tensor([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]], dtype=torch.int32)
| Formatting method | Type | Root node | Encoding | xpath |
|---|---|---|---|---|
| etree.HTML() | Element class | html | X.encode('utf-8') | supported |
| etree.fromstring() | Element class | root of the original document | X.encode('utf-8') | supported |
| etree.tostring() | bytes | none | none | not supported |
How to read this table:
1. From the type column, etree.HTML() and etree.fromstring() both return the same Element class, and only this class supports xpath. etree.tostring() returns bytes, which cannot be used with xpath!
2. Looking at the root node, etree.HTML() converts the document to HTML, so the root node becomes the html tag (this is basic HTML knowledge; look it up if it is unfamiliar).
etree.fromstring(), in contrast, keeps the root node of the original document, meaning this formatting method does not change the overall structure of the document. I recommend it, because it makes it easier to locate information with absolute xpath paths.
etree.tostring() has no root node at all, since the result is a bytes object; you can read tostring as to_bytes to help remember this.
3. Regarding encoding, the argument passed to etree.HTML() and etree.fromstring() should be encoded as utf-8. In the table, X stands for the original document content returned by read().
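A quick sketch that checks the table (the one-line document string is made up purely for illustration):

from lxml import etree

doc = "<annotation><folder>VOC2012</folder></annotation>"

html_root = etree.HTML(doc.encode("utf-8"))        # root becomes <html>
xml_root = etree.fromstring(doc.encode("utf-8"))   # root stays <annotation>
raw_bytes = etree.tostring(xml_root)               # plain bytes, no xpath()

print(type(html_root), html_root.tag)              # <class 'lxml.etree._Element'> html
print(type(xml_root), xml_root.tag)                # <class 'lxml.etree._Element'> annotation
print(type(raw_bytes))                              # <class 'bytes'>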
To make this concrete, here is a fuller example:
# the XML file
<annotation>
<folder>VOC2012</folder>
<filename>2008_000054.jpg</filename>
<source>
<database>The VOC2008 Database</database>
<annotation>PASCAL VOC2008</annotation>
<image>flickr</image>
</source>
<size>
<width>500</width>
<height>333</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>bird</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<occluded>0</occluded>
<bndbox>
<xmin>284</xmin>
<ymin>100</ymin>
<xmax>318</xmax>
<ymax>184</ymax>
</bndbox>
<difficult>0</difficult>
</object>
<object>
<name>bird</name>
<pose>Right</pose>
<truncated>0</truncated>
<occluded>0</occluded>
<bndbox>
<xmin>112</xmin>
<ymin>146</ymin>
<xmax>198</xmax>
<ymax>209</ymax>
</bndbox>
<difficult>0</difficult>
</object>
</annotation>
from lxml import etree
import os

root = "D:\\PythonTemp\\FasterR_CNN\\VOCdevkit\\VOC2012"
annotations_root = os.path.join(root, "Annotations")
txt_path = os.path.join(root, "ImageSets", "Main", "train.txt")
xml_path = os.path.join(annotations_root, "2008_000054.xml")

with open(xml_path) as f_xml:
    # read the content of a single .xml annotation file
    xml_str = f_xml.read()
xml = etree.fromstring(xml_str)
print("len(xml) = ", len(xml))  # number of direct children of the root node


def parse_xml_to_dict(xml):
    # recursively convert an lxml element into a nested dict;
    # repeated <object> tags are collected into a list
    if len(xml) == 0:
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)
        if child.tag != "object":
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result:
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}


data = parse_xml_to_dict(xml)
print(data)
import torch.nn as nn

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))


def init_weights(m):
    # initialize only the Linear layers with a normal distribution
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)


net.apply(init_weights)  # apply() visits every submodule recursively
import numpy as np

# Indexing with None inserts a new axis of length 1 (same as np.newaxis)
x = np.arange(16).reshape(8, 2)
x = x[:, None]        # insert an axis at position 1
print(x.shape)
y = np.arange(20).reshape(10, 2)
y = y[None]           # insert an axis at position 0
print(y.shape)
z = np.arange(24).reshape(12, 2)
z = z[:, :, None]     # insert an axis at position 2
print(z.shape)
# Output:
(8, 1, 2)
(1, 10, 2)
(12, 2, 1)
img = cv2.imread(img_path)             # ndarray of shape H x W x C
img = torch.from_numpy(img)            # unsqueeze expects a Tensor, not an ndarray
img = torch.unsqueeze(img, dim=0)      # add a batch dimension
Both usages take the form np.maximum(x1, x2),
and x1.shape should equal x2.shape; if the shapes differ, broadcasting makes them the same.
x = np.array([[2, 4], [3, 5]])
y = np.array([[3, 7]])
# here x.shape != y.shape, so broadcasting turns y into np.array([[3, 7], [3, 7]])
print(np.maximum(x, y))
# Output:
[[3 7]
[3 7]]
The comparison happens element-wise: 2 vs 3, 4 vs 7, 3 vs 3, 5 vs 7.