This post draws on the Chinese edition of the official PyTorch tutorials: http://pytorch123.com/FirstSection/PyTorchIntro/
PyTorch docs (Chinese): https://pytorch-cn.readthedocs.io/zh/latest/package_references/Tensor/
PyTorch docs (English): https://pytorch.org/docs/stable/tensors.html
as well as the books 《深度学习之PyTorch物体检测实战》 and 《动手学深度学习》 (Dive into Deep Learning).
This is my first contact with PyTorch, and up-to-date tutorials are hard to find online, so let's start from the official material!
import os
import json
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
import torchvision
from torchvision import models, transforms
from torch.utils.data import Dataset, DataLoader
import visdom
# from tensorboardX import SummaryWriter
from torch.utils.tensorboard import SummaryWriter
Image augmentation generates similar-but-different training samples by applying a series of random changes to the training images, thereby enlarging the training set. Another way to look at it: randomly altering the training samples reduces the model's reliance on particular attributes, which improves its ability to generalize.
To obtain deterministic results at prediction time, augmentation is usually applied only to the training samples; transforms with random components are not used during inference.
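For instance, the training pipeline can mix in random operations while the evaluation pipeline keeps only deterministic ones. A minimal sketch (the sizes here are just illustrative):
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),    # random crop + resize: different output on every call
    transforms.RandomHorizontalFlip(),    # random flip with probability 0.5
    transforms.ToTensor(),
])
test_transform = transforms.Compose([
    transforms.Resize(256),               # deterministic resize
    transforms.CenterCrop(224),           # deterministic crop: same output on every call
    transforms.ToTensor(),
])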
First, a quick recap of the data-loading machinery covered earlier:
torch.utils.data.Dataset
An abstract class. Implement __len__() and __getitem__() and the dataset can be iterated: __len__() reports the dataset size (optional), while __getitem__() supports integer indexing (required).
torchvision.datasets
Loads public datasets such as ImageNet, CIFAR10 and MNIST.
torchvision.transforms
Convenient image operations: scaling, cropping, random flipping, padding, tensor normalization, and so on. They operate on a PIL Image or a Tensor.
transforms.Compose
Chains several transforms together. In practice the composed transform is usually passed into the Dataset class.
torch.utils.data.DataLoader
Wrapping a Dataset in a DataLoader adds batching, shuffling and more. A DataLoader is an iterable; iterating over an instance drives the training loop.
As an example, take the traffic-light dataset used before:
the img and json folders hold the images and annotations respectively, with file names 0.jpg, 1.jpg, … and 0.json, 1.json, …
The loading code:
class MyData(Dataset):
    def __init__(self, img_path, annotation_path, transforms=None):
        # Initialization: remember where the dataset lives
        self.annotation_path = annotation_path
        self.img_path = img_path
        self.transforms = transforms

    def __len__(self):
        return len(os.listdir(self.img_path))

    def __getitem__(self, index):
        # File names are simply "<index>.json" / "<index>.jpg"
        with open(os.path.join(self.annotation_path, str(index) + '.json')) as f:
            annotation = json.load(f)
        img = Image.open(os.path.join(self.img_path, str(index) + '.jpg'))
        # plt.imshow(img)
        if self.transforms:
            img = self.transforms(img)
        return img, annotation
dataset = MyData('D:/Download/Dataset/traffic_light/train/img', 'D:/Download/Dataset/traffic_light/train/json',
                 transforms=transforms.Compose([
                     transforms.Resize(240),              # scale the shorter edge to 240, keeping the aspect ratio
                     transforms.RandomHorizontalFlip(),   # flip left-right with probability 0.5
                     transforms.ToTensor(),               # convert the PIL image to a Tensor, scaling values to [0, 1]
                     transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # standardize with mean and std of 0.5
                 ]))
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)  # , num_workers=2
# num_workers sets the number of worker processes used to load data. On my machine adding this
# parameter raises an error; on Windows, DataLoader workers likely need the training code to be
# wrapped in an `if __name__ == '__main__':` guard.
data_iter = iter(dataloader)
for step in range(1000):
    data = next(data_iter)  # raises StopIteration once the loader is exhausted
    # data can now be fed to the network for training
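Because next() raises StopIteration at the end of an epoch, a fixed step count only works while it stays within one pass over the data. The more idiomatic pattern is to iterate the DataLoader directly; a sketch (num_epochs is a placeholder):
num_epochs = 10  # placeholder value
for epoch in range(num_epochs):
    for imgs, annotations in dataloader:  # a fresh iterator is created each epoch
        pass  # forward pass, loss, backward pass and optimizer step go here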
transforms.Compose(transforms)
Composes several transforms together.
transforms (list of Transform objects) – list of transforms to compose.
transform = transforms.Compose([
    # transforms.Resize(32),            # scale the shorter edge to 32, keeping the aspect ratio
    transforms.RandomHorizontalFlip(),  # flip left-right with probability 0.5
    transforms.ToTensor(),              # convert the PIL image to a Tensor, scaling values to [0, 1]
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # standardize with mean and std of 0.5
])
transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Randomly change the brightness, contrast and saturation of an image.
brightness (float or tuple of float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.
contrast (float or tuple of float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
saturation (float or tuple of float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.
hue (float or tuple of float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
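The scalar and (min, max) tuple forms can be mixed in one call; a small sketch:
jitter = transforms.ColorJitter(brightness=0.5, hue=(-0.1, 0.1))
# brightness_factor is drawn from [0.5, 1.5]; hue_factor is drawn from [-0.1, 0.1]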
transforms.CenterCrop(size)
Crops the given PIL Image at the center.
size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
Crop the given PIL Image at a random location.
size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
padding (int or sequence, optional) – Optional padding on each border of the image. Default is None. If a sequence of length 4 is provided, it is used to pad the left, top, right and bottom borders respectively. If a sequence of length 2 is provided, it is used to pad the left/right and top/bottom borders, respectively.
pad_if_needed (boolean) – Pads the image if it is smaller than the desired size, to avoid raising an exception. Since cropping is done after padding, the padding appears to be at a random offset.
fill – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill the R, G, B channels respectively. This value is only used when padding_mode is constant.
padding_mode – Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
constant: pads with a constant value, specified with fill
edge: pads with the last value at the edge of the image
reflect: pads with a reflection of the image (without repeating the last value at the edge); padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode gives [3, 2, 1, 2, 3, 4, 3, 2]
symmetric: pads with a reflection of the image (repeating the last value at the edge); padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode gives [2, 1, 1, 2, 3, 4, 4, 3]
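These reflect/symmetric semantics match numpy.pad, which makes for a quick sanity check:
print(np.pad([1, 2, 3, 4], 2, mode='reflect'))    # [3 2 1 2 3 4 3 2]
print(np.pad([1, 2, 3, 4], 2, mode='symmetric'))  # [2 1 1 2 3 4 4 3]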
transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
Crop the given PIL Image to a random size and aspect ratio.
A crop of random size (default: 0.08 to 1.0 of the original size) and a random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks.
size – expected output size of each edge; can be an int or a tuple
scale – range of the size of the cropped region relative to the original size
ratio – range of the aspect ratio (width/height) of the cropped region
interpolation – Default: PIL.Image.BILINEAR
transforms.RandomHorizontalFlip(p=0.5)
Horizontally flip the given image randomly with a given probability. The image can be a PIL Image or a torch Tensor, in which case it is expected to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions.
p (float) – probability of the image being flipped. Default value is 0.5
transforms.RandomVerticalFlip(p=0.5)
Vertically flip the given image randomly with a given probability; the parameter has the same meaning as above.
transforms.RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)
Rotate the image by angle.
degrees (sequence or float or int) – Range of degrees to select from. If degrees is a number instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees).
resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC}, optional) – An optional resampling filter. See filters for more information. If omitted, or if the image has mode "1" or "P", it is set to PIL.Image.NEAREST.
expand (bool, optional) – Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, makes the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation.
center (2-tuple, optional) – Optional center of rotation. Origin is the upper left corner. Default is the center of the image.
fill (n-tuple or int or float) – Pixel fill value for the area outside the rotated image. If int or float, the value is used for all bands respectively. Defaults to 0 for all bands. This option is only available for pillow>=5.2.0.
transforms.Resize(size, interpolation=2)
Resize the input PIL Image to the given size.
size (sequence or int) – Desired output size. If size is a sequence like (h, w), the output size will be matched to it. If size is an int, the shorter edge is scaled to size and the aspect ratio is preserved.
interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR
transforms.Normalize(mean, std, inplace=False)
Given mean: (mean[1],...,mean[n]) and std: (std[1],..,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]
__call__(tensor)
tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.
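When a normalized tensor has to be displayed again (e.g. with visdom or matplotlib), the normalization must be undone first. A small helper, assuming the mean/std of 0.5 used earlier:
def unnormalize(tensor, mean=0.5, std=0.5):
    # Invert transforms.Normalize: x_norm * std + mean recovers the original range
    return tensor * std + mean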
transforms.ToPILImage(mode=None)
Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.
mode (PIL.Image mode) – If mode is None (default), some assumptions are made about the input data.
torchvision.transforms.ToTensor
Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8.
In the other cases, tensors are returned without scaling.
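For a uint8 RGB image the two transforms are therefore inverses of each other (up to quantization); a quick round-trip check:
to_tensor = transforms.ToTensor()
to_pil = transforms.ToPILImage()
t = to_tensor(Image.open('./cat.jpg'))  # FloatTensor of shape (C, H, W), values in [0.0, 1.0]
img_back = to_pil(t)                    # a PIL Image again, values back in [0, 255]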
Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don’t contain a random number generator for their parameters. That means you have to specify/generate all parameters, but you can reuse the functional transform.
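This matters, for example, in segmentation, where an image and its mask must receive exactly the same random parameters. A minimal sketch using torchvision.transforms.functional (the mask input is hypothetical):
import random
import torchvision.transforms.functional as TF

def paired_random_hflip(img, mask, p=0.5):
    # One random draw decides the flip for both inputs
    if random.random() < p:
        img = TF.hflip(img)
        mask = TF.hflip(mask)
    return img, mask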
First, load an image as a test sample:
vis = visdom.Visdom(env='image')
image = Image.open('./cat.jpg')
vis.image(transforms.ToTensor()(image), win='original image')
def apply(original_img, transforms=None, title=None, nrow=4, num=8):
    """Apply the augmentation to the image several times and display all results."""
    for i in range(num):
        img = transforms(original_img)
        img = img.reshape(1, *img.shape)
        if i == 0:
            imgs = img
        else:
            imgs = torch.cat([imgs, img], 0)
    vis.images(imgs, nrow=nrow, opts=dict(title=title))
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='random horizontal flip')
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='random vertical flip')
Random cropping makes objects appear at different positions and scales within the image, which reduces the model's sensitivity to object location.
transform = transforms.Compose([
    transforms.RandomResizedCrop(size=300, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333)),
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='random resized crop')
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.ColorJitter(brightness=0.5, contrast=0, saturation=0, hue=0),  # brightness varies between 50% and 150%
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='brightness')
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.ColorJitter(brightness=0, contrast=0.5, saturation=0, hue=0),  # contrast varies between 50% and 150%
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='contrast')
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.ColorJitter(brightness=0, contrast=0, saturation=0.5, hue=0),  # saturation varies between 50% and 150%
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='saturation')
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0.5),  # hue_factor drawn from [-0.5, 0.5]
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='hue')
transform = transforms.Compose([
    transforms.Resize(300),
    transforms.RandomRotation(degrees=5, expand=False, fill=None),
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='rotate')
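In practice, several of these augmentations are usually chained in one pipeline. A sketch combining the methods shown above (the parameter values are just illustrative):
transform = transforms.Compose([
    transforms.RandomResizedCrop(size=300, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5),
    transforms.ToTensor(),
])
apply(image, transforms=transform, title='combined augmentations')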