Extracting image patches with torch.nn.Unfold

In a project I needed to split images into n*n patches. I figured I could just lift the code from a Transformer implementation, but it turns out they use nn.Conv2d(in_chans, in_chans, kernel_size=patch_size, stride=patch_size), which projects each whole patch down to a single value per output channel instead of keeping the pixels.
After wrestling with it for a day, I finally got what I wanted with unfold.

First, define x with shape (b, c, h, w) -> (1, 1, 4, 4):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 4)
x -> tensor([[[
[-1.1258, -1.1524, -0.2506, -0.4339],
[ 0.8487,  0.6920, -0.3160, -2.1152],
[ 0.3223, -1.2633,  0.3500,  0.3081],
[ 0.1198,  1.2377,  1.1168, -0.2473]]]])

patch_size = n  # the patch size you want; n = 2 in this example
y = nn.Unfold(kernel_size=patch_size, stride=patch_size)(x)
# y.shape -> (b, c*k*k, num_patches) by definition
# note that each patch of x has become one column of y
tensor([[
[-1.1258, -0.2506,  0.3223,  0.3500],
[-1.1524, -0.4339, -1.2633,  0.3081],
[ 0.8487, -0.3160,  0.1198,  1.1168],
[ 0.6920, -2.1152,  1.2377, -0.2473]]])
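To sanity-check the column layout, here is a quick runnable sketch (patch_size = 2, matching the 4x4 example; the seed is just so the numbers are reproducible):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 1, 4, 4)
patch_size = 2

# Unfold: each non-overlapping 2x2 patch becomes one column.
y = nn.Unfold(kernel_size=patch_size, stride=patch_size)(x)  # shape (1, 4, 4)

# Column 0 should equal the top-left patch, flattened row-major.
top_left = x[0, 0, :patch_size, :patch_size].reshape(-1)
print(torch.equal(y[0, :, 0], top_left))  # True
```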

# A single column is awkward to view back into a patch block, so transpose first
y = y.transpose(2,1)
tensor([[
[-1.1258, -1.1524,  0.8487,  0.6920],
[-0.2506, -0.4339, -0.3160, -2.1152],
[ 0.3223, -1.2633,  0.1198,  1.2377],
[ 0.3500,  0.3081,  1.1168, -0.2473]]])
# Then view can reshape it into patch-sized blocks.
# But transpose (and permute, etc.) change the memory layout,
# so the storage is no longer contiguous.
# Hence the contiguous() here; otherwise some later operations will error out.
# Folding back, for example, gets messy (meaning: with contiguous() it doesn't).
y = y.view(b, num_patches, c, patch_size, patch_size).contiguous()
# here b=1, num_patches=4, c=1, patch_size=2 -> shape (1, 4, 1, 2, 2)
tensor([[[[[-1.1258, -1.1524],
           [ 0.8487,  0.6920]]],


         [[[-0.2506, -0.4339],
           [-0.3160, -2.1152]]],


         [[[ 0.3223, -1.2633],
           [ 0.1198,  1.2377]]],


         [[[ 0.3500,  0.3081],
           [ 1.1168, -0.2473]]]]])
# And we're done: all four patches are extracted. With multiple channels it looks like this:
# x ->
tensor([[[[-1.1258, -1.1524, -0.2506, -0.4339],
          [ 0.8487,  0.6920, -0.3160, -2.1152],
          [ 0.3223, -1.2633,  0.3500,  0.3081],
          [ 0.1198,  1.2377,  1.1168, -0.2473]],

         [[-1.3527, -1.6959,  0.5667,  0.7935],
          [ 0.5988, -1.5551, -0.3414,  1.8530],
          [ 0.7502, -0.5855, -0.1734,  0.1835],
          [ 1.3894,  1.5863,  0.9463, -0.8437]]]])
# after unfold:
tensor([[[[[-1.1258, -1.1524],
           [ 0.8487,  0.6920]],

          [[-1.3527, -1.6959],
           [ 0.5988, -1.5551]]],


         [[[-0.2506, -0.4339],
           [-0.3160, -2.1152]],

          [[ 0.5667,  0.7935],
           [-0.3414,  1.8530]]],


         [[[ 0.3223, -1.2633],
           [ 0.1198,  1.2377]],

          [[ 0.7502, -0.5855],
           [ 1.3894,  1.5863]]],


         [[[ 0.3500,  0.3081],
           [ 1.1168, -0.2473]],

          [[-0.1734,  0.1835],
           [ 0.9463, -0.8437]]]]])
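Putting the whole recipe together, here is a self-contained sketch of a helper function (the name extract_patches and the divisibility assertion are my additions):

```python
import torch
import torch.nn as nn

def extract_patches(x, patch_size):
    """Split x of shape (b, c, h, w) into (b, num_patches, c, patch_size, patch_size)."""
    b, c, h, w = x.shape
    assert h % patch_size == 0 and w % patch_size == 0, "h, w must be divisible by patch_size"
    num_patches = (h // patch_size) * (w // patch_size)
    y = nn.Unfold(kernel_size=patch_size, stride=patch_size)(x)  # (b, c*k*k, num_patches)
    y = y.transpose(2, 1).contiguous()                           # (b, num_patches, c*k*k)
    return y.view(b, num_patches, c, patch_size, patch_size)

x = torch.randn(2, 3, 8, 8)
patches = extract_patches(x, 4)
print(patches.shape)  # torch.Size([2, 4, 3, 4, 4])
```

As a bonus check, patches[0, 0] equals x[0, :, :4, :4], i.e. the first patch really is the top-left block across all channels.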

This lets you run whatever operations you like on each small patch independently, then use fold to restore the original shape. If anyone needs it, leave a comment and I'll write up the fold version too (shameless, I know).
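Since fold came up: a minimal round-trip sketch with nn.Fold. With stride equal to kernel_size the patches don't overlap, so fold (which sums overlapping values) reconstructs the input exactly:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 4)
patch_size = 2
unfold = nn.Unfold(kernel_size=patch_size, stride=patch_size)
fold = nn.Fold(output_size=(4, 4), kernel_size=patch_size, stride=patch_size)

y = unfold(x)      # (1, c*k*k, num_patches)
x_back = fold(y)   # non-overlapping patches, so no averaging is needed
print(torch.allclose(x, x_back))  # True
```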
