PyTorch在做一般的深度学习图像处理任务时,先使用dataset类和dataloader类读入图片,在读入的时候需要做transform变换,其中transform一般都需要ToTensor()操作,将dataset类中__getitem__()方法内读入的PIL或CV的图像数据转换为torch.FloatTensor。详细过程如下:
class ToTensor(object):
"""Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
Converts a PIL Image or numpy.ndarray (H x W x C) in the range
[0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
or if the numpy.ndarray has dtype = np.uint8
In the other cases, tensors are returned without scaling.
"""
def __call__(self, pic):
"""
Args:
pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
Returns:
Tensor: Converted image.
"""
return F.to_tensor(pic)
def __repr__(self):
return self.__class__.__name__ + '()'
class ToPILImage(object):
"""Convert a tensor or an ndarray to PIL Image.
Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
H x W x C to a PIL Image while preserving the value range.
Args:
mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
- If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
- If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
- If the input has 2 channels, the ``mode`` is assumed to be ``LA``.
- If the input has 1 channel, the ``mode`` is determined by the data type (i.e ``int``, ``float``,
``short``).
.. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
"""
def __init__(self, mode=None):
self.mode = mode
def __call__(self, pic):
"""
Args:
pic (Tensor or numpy.ndarray): Image to be converted to PIL Image.
Returns:
PIL Image: Image converted to PIL Image.
"""
return F.to_pil_image(pic, self.mode)
def __repr__(self):
format_string = self.__class__.__name__ + '('
if self.mode is not None:
format_string += 'mode={0}'.format(self.mode)
format_string += ')'
return format_string
transforms.ToTensor()
(1) transforms.ToTensor() 将numpy的ndarray或PIL.Image读的图片转换成形状为(C,H, W)的Tensor格式,且/255归一化到[0, 1.0]之间
(2)通道的具体顺序与cv2读的还是PIL.Image读的图片有关系
cv2:(B,G,R)
PIL.Image:(R, G, B)
代码例子:
import torch
import cv2
from PIL import Image
from torchvision import transforms
image = cv2.imread('myimage.jpg') # numpy数组格式(H,W,C=3),通道顺序(B,G,R)
image2 = Image.open('myimage.jpg') # PIL的JpegImageFile格式(size=(W,H))
print(image.shape) # (H,W,3)
print(image2.size) # (W,H)
tran = transforms.ToTensor() # 将numpy数组或PIL.Image读的图片转换成(C,H, W)的Tensor格式且/255归一化到[0,1.0]之间
img_tensor = tran(image)
img2_tensor = tran(image2)
print(img_tensor.size()) # (C,H, W), 通道顺序(B,G,R)
print(img2_tensor.size()) # (C,H, W), 通道顺序(R,G,B)
对于一个图片img,调用ToTensor转化成张量的形式,发生的不是将图片的RGB三维信道矩阵变成tensor图片在内存中以bytes的形式存储,转化过程的步骤是:
对于一个Tensor的转化过程是: