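transforms works like a toolbox: the classes it contains, such as ToTensor and Resize, are the tools. Passing an image through one of these tools produces the result we want.
First, let's look at the ToTensor class in the transforms source code: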
class ToTensor:
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor. This transform does not support torchscript.
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)

    def __repr__(self):
        return self.__class__.__name__ + '()'
From this we can see that the class converts a PIL Image or a numpy.ndarray into a tensor. Let's try it in practice:
from PIL import Image
from torchvision import transforms
img_path = "data/train/ants_image/67270775_e9fdf77e9d.jpg"  # relative path to the image
img = Image.open(img_path)  # open the image at that path
Here img is a PIL image (for a JPEG file, a PIL.JpegImagePlugin.JpegImageFile object). Next we convert it:
trans_tensor = transforms.ToTensor()  # create a ToTensor object
img_tensor = trans_tensor(img)
print(img_tensor)
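The printed output shows that the image has been converted to a tensor. Let's inspect the tensor's attributes:
Many of them (for example, gradient-related attributes such as requires_grad) are used by neural networks. This is why we use transforms to convert data to tensors: it makes the data ready for use in a neural network later.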
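To make the conversion concrete, here is a rough pure-Python sketch of what ToTensor does to the data layout and value range. It is not torchvision's implementation: ToyToTensor is a hypothetical name, nested lists stand in for both the image and the tensor, and the sketch assumes an 8-bit image stored as height x width x channel. It also shows why the object can be called like a function: the class defines __call__.

```python
class ToyToTensor:
    """Illustrative stand-in for transforms.ToTensor (hypothetical, not the real class)."""

    def __call__(self, pic):
        # pic: nested list shaped [H][W][C] with integer values in 0..255
        height = len(pic)
        width = len(pic[0])
        channels = len(pic[0][0])
        # Reorder to [C][H][W] and rescale each value to the range 0.0..1.0
        return [
            [[pic[y][x][c] / 255 for x in range(width)] for y in range(height)]
            for c in range(channels)
        ]


# A 1x2 RGB "image": one red pixel and one white pixel
pixels = [[[255, 0, 0], [255, 255, 255]]]

to_tensor = ToyToTensor()   # create the object
result = to_tensor(pixels)  # calling the object invokes __call__

print(len(result), len(result[0]), len(result[0][0]))  # channels, height, width
print(result)
```

The real ToTensor performs the same reordering (H x W x C to C x H x W) and the same scaling to [0.0, 1.0] for 8-bit PIL images and uint8 arrays, but it returns a torch.FloatTensor rather than nested lists.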
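The above converts a PIL Image to a tensor. We can also read the image with OpenCV (cv2.imread), which returns a numpy.ndarray:
The image is read as an ndarray, which ToTensor can also convert to a tensor.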
Next, we visualize the image we read:
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
img_path = "data/train/ants_image/67270775_e9fdf77e9d.jpg"  # relative path to the image
img = Image.open(img_path)  # open the image at that path
writer = SummaryWriter("logs")
trans_tensor = transforms.ToTensor()  # create a ToTensor object
img_tensor = trans_tensor(img)
writer.add_image("image", img_tensor)
writer.close()
After the script finishes, open TensorBoard (for example, by running tensorboard --logdir=logs). The image stored as a torch.Tensor is displayed.
Now let's look at a few commonly used transforms.
Normalize (normalization):
class Normalize(torch.nn.Module):
    """Normalize a tensor image with mean and standard deviation.
    This transform does not support PIL Image.
    Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
    channels, this transform will normalize each channel of the input
    ``torch.*Tensor`` i.e.,
    ``output[channel] = (input[channel] - mean[channel]) / std[channel]``
    """
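Note that Normalize only works on tensors, not on PIL images. Because our image has three channels (RGB), we need to supply three means and three standard deviations, one per channel.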
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
trans_totensor = transforms.ToTensor()
writer = SummaryWriter("logs")
img = Image.open("data/train/ants_image/20935278_9190345f6b.jpg")
# Using ToTensor
img_tensor = trans_totensor(img)  # convert the PIL image to a tensor
writer.add_image("img_tensor", img_tensor, 0)
# Using Normalize
trans_norm = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # one mean and one std per RGB channel
img_norm = trans_norm(img_tensor)
writer.add_image("img_norm", img_norm, 0)
writer.close()
Opening TensorBoard shows both images: the original under the tag img_tensor and the normalized one under img_norm.
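The effect of the chosen values can be checked by hand. The sketch below applies the formula from the Normalize docstring to single numbers instead of a tensor; normalize_value is a hypothetical helper written for illustration, not part of torchvision. Since ToTensor produces values in [0.0, 1.0], a mean of 0.5 and a std of 0.5 map that range onto [-1.0, 1.0].

```python
def normalize_value(value, mean, std):
    # Same formula as in the Normalize docstring, applied to a single value
    return (value - mean) / std


mean, std = 0.5, 0.5  # the values passed to transforms.Normalize in the code above

for value in (0.0, 0.25, 0.5, 1.0):
    print(value, "->", normalize_value(value, mean, std))
```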
Using Resize:
class Resize(torch.nn.Module):
    """Resize the input image to the given size.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions

    Args:
        size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            smaller edge of the image will be matched to this number.
            i.e, if height > width, then image will be rescaled to
            (size * height / width, size).
    """

    def __init__(self, size, interpolation=InterpolationMode.BILINEAR, max_size=None, antialias=None):
        ...

    def forward(self, img):
        ...
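If size is a single int rather than an (h, w) pair, the shorter edge is matched to that value and the image is scaled proportionally, so the aspect ratio is preserved.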
print(img)
trans_size = transforms.Resize((520, 520))
img_resize = trans_size(img)
print(img_resize)
Comparing the two printed results shows that the image size has changed.
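The size rule from the Resize docstring can be written out as a small function. This is only a sketch of the documented rule: resized_shape is a hypothetical helper, the 400 x 300 image is made up for illustration, and options such as max_size are ignored.

```python
def resized_shape(height, width, size):
    """Return (new_height, new_width) following the rule in the Resize docstring."""
    if isinstance(size, int):
        # Match the shorter edge to `size`, scale the longer edge by the same factor
        if height > width:
            return int(size * height / width), size
        return size, int(size * width / height)
    # A (h, w) sequence is used as the output size directly
    new_height, new_width = size
    return new_height, new_width


# Hypothetical 400x300 (height x width) image
print(resized_shape(400, 300, 150))         # shorter edge (width) becomes 150
print(resized_shape(400, 300, (520, 520)))  # aspect ratio is not preserved
```

Passing a pair such as (520, 520) forces the output to exactly that shape, while passing an int keeps the original proportions.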
Using Compose:
class Compose:
    """Composes several transforms together. This transform does not support torchscript.
    Please, see the note below.

    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.

    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.PILToTensor(),
        >>>     transforms.ConvertImageDtype(torch.float),
        >>> ])
    """
Compose takes a list of transforms as its argument. Here we first resize the image, then convert it to a tensor, and finally display the result in TensorBoard:
# Compose: add this to the script above, before writer.close()
trans_size_2 = transforms.Resize(520)
tran_com = transforms.Compose([trans_size_2, trans_totensor])
img_com = tran_com(img)
writer.add_image("compose", img_com)
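The transforms in Compose are applied in list order, so the output of each transform must be a type the next one accepts.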
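The ordering behaviour can be illustrated with a minimal sketch. ToyCompose is a hypothetical stand-in, not torchvision's Compose, and two plain functions on numbers take the place of image transforms so that the effect of the order is easy to see.

```python
class ToyCompose:
    """Illustrative stand-in for transforms.Compose (hypothetical, not the real class)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        # Feed the output of each transform into the next one, in list order
        for transform in self.transforms:
            img = transform(img)
        return img


def add_one(x):
    return x + 1


def double(x):
    return x * 2


print(ToyCompose([add_one, double])(3))  # (3 + 1) * 2
print(ToyCompose([double, add_one])(3))  # 3 * 2 + 1
```

Swapping the two entries changes the result, which is why Resize is placed before ToTensor in the example above: each step receives whatever the previous step produced.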