Pytorch:ToTensor(object)类

PyTorch在做一般的深度学习图像处理任务时,先使用dataset类和dataloader类读入图片,在读入的时候需要做transform变换,其中transform一般都需要ToTensor()操作,将dataset类中__getitem__()方法内读入的PIL或CV的图像数据转换为torch.FloatTensor。详细过程如下: 


class ToTensor(object):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
    if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
    or if the numpy.ndarray has dtype = np.uint8

    In the other cases, tensors are returned without scaling.
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)

    def __repr__(self):
        return self.__class__.__name__ + '()'


class ToPILImage(object):
    """Convert a tensor or an ndarray to PIL Image.

    Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
    H x W x C to a PIL Image while preserving the value range.

    Args:
        mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
            If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
             - If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
             - If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
             - If the input has 2 channels, the ``mode`` is assumed to be ``LA``.
             - If the input has 1 channel, the ``mode`` is determined by the data type (i.e ``int``, ``float``,
               ``short``).

    .. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
    """
    def __init__(self, mode=None):
        self.mode = mode

    def __call__(self, pic):
        """
        Args:
            pic (Tensor or numpy.ndarray): Image to be converted to PIL Image.

        Returns:
            PIL Image: Image converted to PIL Image.

        """
        return F.to_pil_image(pic, self.mode)

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        if self.mode is not None:
            format_string += 'mode={0}'.format(self.mode)
        format_string += ')'
        return format_string

可以从to_tensor()函数看到,函数接受PIL Image或numpy.ndarray,将其先由HWC转置为CHW格式,再转为float后每个像素除以255.

transforms.ToTensor()
(1) transforms.ToTensor() 将numpy的ndarray或PIL.Image读的图片转换成形状为(C,H, W)的Tensor格式,且/255归一化到[0, 1.0]之间
(2)通道的具体顺序与cv2读的还是PIL.Image读的图片有关系
cv2:(B,G,R)
PIL.Image:(R, G, B)

代码例子:

import torch
import cv2
from PIL import Image
from torchvision import transforms
image = cv2.imread('myimage.jpg')  # numpy数组格式(H,W,C=3),通道顺序(B,G,R)
image2 = Image.open('myimage.jpg')  # PIL的JpegImageFile格式(size=(W,H))
print(image.shape)  # (H,W,3)
print(image2.size)  # (W,H)
tran = transforms.ToTensor()  # 将numpy数组或PIL.Image读的图片转换成(C,H, W)的Tensor格式且/255归一化到[0,1.0]之间
img_tensor = tran(image)
img2_tensor = tran(image2)
print(img_tensor.size())  # (C,H, W), 通道顺序(B,G,R)
print(img2_tensor.size())  # (C,H, W), 通道顺序(R,G,B)

orchvision.transforms.ToTensor

对于一个图片img,调用ToTensor转化成张量的形式,发生的不是将图片的RGB三维信道矩阵变成tensor图片在内存中以bytes的形式存储,转化过程的步骤是:

  1. img.tobytes()  将图片转化成内存中的存储格式
  2. torch.BytesStorage.frombuffer(img.tobytes() )  将字节以流的形式输入,转化成一维的张量
  3. 对张量进行reshape
  4. 对张量进行permute(2,0,1)
  5. 将当前张量的每个元素除以255
  6. 输出张量

 

torchvision.transforms.ToPILImage

对于一个Tensor的转化过程是:

  1. 将张量的每个元素乘上255
  2. 将张量的数据类型有FloatTensor转化成Uint8
  3. 将张量转化成numpy的ndarray类型
  4. 对ndarray对象做permute (1, 2, 0)的操作
  5. 利用Image下的fromarray函数,将ndarray对象转化成PILImage形式
  6. 输出PILImage

你可能感兴趣的:(Pytorch,ToTensor,Pytorch)