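In my project I needed to turn images into n*n patches. I figured I could just borrow the patch code from a (Vision) Transformer, but it turns out it simply uses nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size). That projects every patch straight into an embedding (one value per output channel) instead of keeping its raw pixels.
After struggling with it for a whole day, I finally got it working with unfold.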
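For comparison, here is a minimal sketch of that ViT-style patch embedding next to nn.Unfold. It assumes a 1-channel 4*4 input, patch_size=2 and a hypothetical embed_dim=8. The convolution produces one learned vector per patch, while Unfold keeps every raw pixel, which is what we want here.

```python
import torch
import torch.nn as nn

patch_size = 2
in_chans, embed_dim = 1, 8  # embed_dim is a hypothetical value for illustration
x = torch.randn(1, in_chans, 4, 4)

# ViT-style patch embedding: kernel_size == stride, so each non-overlapping
# patch is projected to an embed_dim-dimensional vector by learned weights
proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
emb = proj(x)
print(emb.shape)  # torch.Size([1, 8, 2, 2]): one vector per patch, raw pixels are gone

# nn.Unfold keeps the raw pixels: each column holds one flattened patch
cols = nn.Unfold(kernel_size=patch_size, stride=patch_size)(x)
print(cols.shape)  # torch.Size([1, 4, 4]): (b, c*k*k, num_patches)
```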
First, define a simple x with shape (b, c, h, w) = (1, 1, 4, 4):
import torch
import torch.nn as nn
x = torch.randn(1, 1, 4, 4)
# x ->
tensor([[[[-1.1258, -1.1524, -0.2506, -0.4339],
[ 0.8487, 0.6920, -0.3160, -2.1152],
[ 0.3223, -1.2633, 0.3500, 0.3081],
[ 0.1198, 1.2377, 1.1168, -0.2473]]]])
patch_size = 2  # side length n of each patch, pick whatever you need; 2 in this example
y = nn.Unfold(kernel_size=patch_size, stride=patch_size)(x)
# y.shape -> (b, c*k*k, num_patches), by the definition of nn.Unfold (k = patch_size)
# you can see that each patch of x has become one column of y
# y ->
tensor([[
[-1.1258, -0.2506, 0.3223, 0.3500],
[-1.1524, -0.4339, -1.2633, 0.3081],
[ 0.8487, -0.3160, 0.1198, 1.1168],
[ 0.6920, -2.1152, 1.2377, -0.2473]]])
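A quick way to convince yourself of the column layout: loop over the patches in row-major order (top-left, top-right, bottom-left, bottom-right) and compare each one, flattened, with the matching column of y. This is a minimal sketch assuming the same 1-channel 4*4 input and patch_size=2.

```python
import torch
import torch.nn as nn

patch_size = 2
x = torch.randn(1, 1, 4, 4)
y = nn.Unfold(kernel_size=patch_size, stride=patch_size)(x)  # (1, 4, 4)

# Patches are numbered row by row: 0 = top-left, 1 = top-right,
# 2 = bottom-left, 3 = bottom-right
patches_per_row = x.shape[-1] // patch_size
for j in range(y.shape[-1]):
    row, col = divmod(j, patches_per_row)
    patch = x[0, 0,
              row * patch_size:(row + 1) * patch_size,
              col * patch_size:(col + 1) * patch_size]
    # Column j of y is exactly that patch, flattened in row-major order
    assert torch.equal(y[0, :, j], patch.flatten())
print("every column matches its patch")
```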
# a patch stored as a column is awkward to view() into a k*k block, so transpose first: (b, c*k*k, L) -> (b, L, c*k*k)
y = y.transpose(2,1)
tensor([[
[-1.1258, -1.1524, 0.8487, 0.6920],
[-0.2506, -0.4339, -0.3160, -2.1152],
[ 0.3223, -1.2633, 0.1198, 1.2377],
[ 0.3500, 0.3081, 1.1168, -0.2473]]])
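# now view() can turn each row into a patch
# but transpose, permute and similar functions only change the strides, not the
# underlying storage, so the tensor is no longer contiguous in memory
# hence contiguous() here; otherwise some later operations (e.g. another view()) raise errors
# and folding back becomes a hassle (i.e. with contiguous() it isn't)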
B, C, H, W = x.shape
num_patches = (H // patch_size) * (W // patch_size)
y = y.view(B, num_patches, C, \
           patch_size, patch_size).contiguous()
tensor([[[[[-1.1258, -1.1524],
[ 0.8487, 0.6920]]],
[[[-0.2506, -0.4339],
[-0.3160, -2.1152]]],
[[[ 0.3223, -1.2633],
[ 0.1198, 1.2377]]],
[[[ 0.3500, 0.3081],
[ 1.1168, -0.2473]]]]])
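To see the contiguity issue from the comments above in action, here is a small sketch (same 1-channel 4*4 input, patch_size=2) that checks is_contiguous() after each step. In this particular case view() still succeeds because it only splits the last dimension, but its result shares the transposed strides, so contiguous() is what actually produces a row-major copy.

```python
import torch
import torch.nn as nn

patch_size = 2
x = torch.randn(1, 1, 4, 4)
B, C, H, W = x.shape
num_patches = (H // patch_size) * (W // patch_size)

y = nn.Unfold(kernel_size=patch_size, stride=patch_size)(x)

y_t = y.transpose(2, 1)       # swaps strides only, no data is copied
print(y_t.is_contiguous())    # False

# view() only reinterprets strides; it works here because it merely splits
# the last dimension, but the result is still non-contiguous
patches = y_t.view(B, num_patches, C, patch_size, patch_size)
print(patches.is_contiguous())  # False

# contiguous() copies the data into a fresh row-major buffer
patches = patches.contiguous()
print(patches.is_contiguous())  # True
print(patches.shape)            # torch.Size([1, 4, 1, 2, 2])
```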
# Done: all four patches have been extracted. With multiple channels it looks like this:
# x ->
tensor([[[[-1.1258, -1.1524, -0.2506, -0.4339],
[ 0.8487, 0.6920, -0.3160, -2.1152],
[ 0.3223, -1.2633, 0.3500, 0.3081],
[ 0.1198, 1.2377, 1.1168, -0.2473]],
[[-1.3527, -1.6959, 0.5667, 0.7935],
[ 0.5988, -1.5551, -0.3414, 1.8530],
[ 0.7502, -0.5855, -0.1734, 0.1835],
[ 1.3894, 1.5863, 0.9463, -0.8437]]]])
# after unfold -> transpose -> view -> contiguous, shape (1, 4, 2, 2, 2)
tensor([[[[[-1.1258, -1.1524],
[ 0.8487, 0.6920]],
[[-1.3527, -1.6959],
[ 0.5988, -1.5551]]],
[[[-0.2506, -0.4339],
[-0.3160, -2.1152]],
[[ 0.5667, 0.7935],
[-0.3414, 1.8530]]],
[[[ 0.3223, -1.2633],
[ 0.1198, 1.2377]],
[[ 0.7502, -0.5855],
[ 1.3894, 1.5863]]],
[[[ 0.3500, 0.3081],
[ 1.1168, -0.2473]],
[[-0.1734, 0.1835],
[ 0.9463, -0.8437]]]]])
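The steps above can be wrapped in a small helper. This is a minimal sketch, not code from the original project: the name image_to_patches is hypothetical, it assumes H and W are divisible by patch_size, and it uses reshape instead of view. The cross-check builds the same patches with a pure reshape/permute, which relies on unfold flattening each column as (C, k, k) and numbering patches row by row.

```python
import torch
import torch.nn as nn

def image_to_patches(x: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Hypothetical helper: split (B, C, H, W) into non-overlapping patches
    of shape (B, num_patches, C, patch_size, patch_size).
    Assumes H and W are divisible by patch_size."""
    B, C, H, W = x.shape
    assert H % patch_size == 0 and W % patch_size == 0
    num_patches = (H // patch_size) * (W // patch_size)
    # (B, C*k*k, L): each column is one patch, flattened as (C, k, k)
    y = nn.functional.unfold(x, kernel_size=patch_size, stride=patch_size)
    # (B, L, C*k*k) -> (B, L, C, k, k), then copy into contiguous memory
    y = y.transpose(1, 2).reshape(B, num_patches, C, patch_size, patch_size)
    return y.contiguous()

x = torch.randn(1, 2, 4, 4)
patches = image_to_patches(x, 2)
print(patches.shape)  # torch.Size([1, 4, 2, 2, 2])

# Cross-check with a pure reshape/permute formulation of the same split
k = 2
ref = (x.reshape(1, 2, 4 // k, k, 4 // k, k)  # (B, C, nH, k, nW, k)
        .permute(0, 2, 4, 1, 3, 5)            # (B, nH, nW, C, k, k)
        .reshape(1, -1, 2, k, k))             # (B, nH*nW, C, k, k)
print(torch.equal(patches, ref))  # True
```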
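This lets you run whatever operations you like on each small patch independently, and then use fold to get back to the original shape. If you need it, leave a comment and I'll write a follow-up on fold (shameless, I know).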
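Until then, here is a rough sketch of going back with fold. It assumes non-overlapping patches (stride == kernel_size), and patches_to_image is a hypothetical helper name. With overlapping patches F.fold sums the values that land on the same pixel, so this would no longer be an exact inverse.

```python
import torch
import torch.nn.functional as F

def patches_to_image(patches: torch.Tensor, image_size: tuple) -> torch.Tensor:
    """Hypothetical helper: undo unfold -> transpose -> view for NON-overlapping
    patches. patches: (B, L, C, k, k), image_size: (H, W)."""
    B, L, C, k, _ = patches.shape
    # (B, L, C, k, k) -> (B, L, C*k*k) -> (B, C*k*k, L), the layout F.fold expects
    cols = patches.reshape(B, L, C * k * k).transpose(1, 2)
    # stride == kernel_size means no overlap, so fold just writes each column
    # back to its patch position
    return F.fold(cols, output_size=image_size, kernel_size=k, stride=k)

x = torch.randn(1, 2, 4, 4)
k = 2
# Forward pass: the same unfold -> transpose -> reshape steps as above
y = F.unfold(x, kernel_size=k, stride=k).transpose(1, 2)
patches = y.reshape(1, -1, 2, k, k).contiguous()
# ... per-patch processing would go here ...
x_back = patches_to_image(patches, (4, 4))
print(torch.equal(x_back, x))  # True
```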