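PyTorch provides a way to crop a Tensor that can run on the GPU. The relevant functions are torch.nn.functional.affine_grid and torch.nn.functional.grid_sample. The former generates a 2D sampling grid; the latter samples the input Tensor at the grid locations using bilinear interpolation.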
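grid_sample works with image coordinates normalized to \([-1, 1]\), where pixel 0 maps to -1 and pixel width-1 maps to 1. This is the align_corners=True convention; since PyTorch 1.3 the default is align_corners=False, under which -1 and 1 refer to the outer edges of the corner pixels rather than their centers.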
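The normalization described above can be sketched in a few lines of plain Python. The helper names `pixel_to_normalized` and `normalized_to_pixel` are hypothetical (they are not part of PyTorch), and the sketch assumes the align_corners=True convention in which pixel centers 0 and size-1 map to -1 and 1.

```python
def pixel_to_normalized(coord, size):
    """Map a pixel coordinate in [0, size - 1] to [-1, 1].

    Hypothetical helper; assumes the align_corners=True convention.
    """
    return 2.0 * coord / (size - 1) - 1.0


def normalized_to_pixel(coord, size):
    """Inverse mapping: from [-1, 1] back to a pixel coordinate."""
    return (coord + 1.0) * (size - 1) / 2.0


width = 5
print(pixel_to_normalized(0, width))       # -1.0
print(pixel_to_normalized(4, width))       # 1.0
print(pixel_to_normalized(2.5, width))     # 0.25
print(normalized_to_pixel(1.25, width))    # 4.5
```

Coordinates outside \([-1, 1]\), such as 1.25 here, fall outside the image.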
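affine_grid takes a batch of affine matrices (N x 2 x 3) and the size of the output Tensor (torch.Size((N, C, H, W))), and returns a normalized 2D grid of shape N x H x W x 2.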
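As a quick illustration of these shapes, the sketch below feeds an identity affine matrix to affine_grid. It assumes PyTorch 1.3 or later (for the `align_corners` argument); the 3 x 2 output size is chosen only for illustration.

```python
import torch
import torch.nn.functional as F

# Identity transform, shape (N, 2, 3) with N = 1.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# Requested output size is (N, C, H, W) = (1, 1, 3, 2).
grid = F.affine_grid(theta, torch.Size((1, 1, 3, 2)), align_corners=True)

print(grid.shape)        # torch.Size([1, 3, 2, 2]), i.e. N x H x W x 2
print(grid[0, :, :, 0])  # x coordinates: -1 and 1 in every row
print(grid[0, :, :, 1])  # y coordinates: -1, 0, 1 down the rows
```

The last dimension holds (x, y) in that order, which is why the first row of theta controls the horizontal direction.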
Faster R-CNN uses crop pooling, which needs to crop the part of the feature map that corresponds to each proposal region; these two functions can be used to implement it. For details, see http://www.telesens.co/2018/03/11/object-detection-and-classification-using-r-cnns/#ITEM-1455-4
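Below is a simple experiment:
- First, create a 1x1x5x5 Tensor.
- The crop window is x1 = 2.5, x2 = 4.5, y1 = 0.5, y2 = 3.5 and the output size is 1x1x3x2; set the theta matrix according to these coordinates.
- Perform the crop and compare the result with a NumPy computation.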
import numpy as np
import torch

a = torch.rand((1, 1, 5, 5))
print(a)
# x1 = 2.5, x2 = 4.5, y1 = 0.5, y2 = 3.5
# out_w = 2, out_h = 3
size = torch.Size((1, 1, 3, 2))
print(size)
# theta
theta_np = np.array([[0.5, 0, 0.75], [0, 0.75, 0]]).reshape(1, 2, 3)
theta = torch.from_numpy(theta_np)
print('theta:')
print(theta)
print()
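# align_corners=True matches the convention described earlier (0 -> -1, width-1 -> 1);
# PyTorch 1.3 and later default to align_corners=False
flowfield = torch.nn.functional.affine_grid(theta, size, align_corners=True)
sampled_a = torch.nn.functional.grid_sample(a, flowfield.to(torch.float32), align_corners=True)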
sampled_a = sampled_a.numpy().squeeze()
print('sampled_a:')
print(sampled_a)
# compute bilinear interpolation at (y, x) = (0.5, 2.5), using pixels (0, 2), (0, 3), (1, 2), (1, 3)
# quick formula: https://blog.csdn.net/lxlclzy1130/article/details/50922867
print()
coeff = np.array([[0.5, 0.5]])
A = a[0, 0, 0:2, 2:2+2].numpy()
print('torch sampled at (0.5, 2.5): %.4f' % sampled_a[0,0])
print('numpy compute: %.4f' % np.dot(np.dot(coeff, A), coeff.T).squeeze())
The two results match.
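The theta values used in the experiment can also be derived from the crop window rather than written by hand. The following is a minimal sketch, not a definitive implementation: `box_to_theta` is a hypothetical helper, and it assumes the align_corners=True convention and PyTorch 1.3 or later. A deterministic input replaces the random Tensor so the sampled values can be checked by hand.

```python
import torch
import torch.nn.functional as F


def box_to_theta(x1, y1, x2, y2, width, height):
    """Build a 1x2x3 affine matrix for a crop window given in pixel coordinates.

    Hypothetical helper; assumes the align_corners=True convention
    (pixel 0 -> -1, pixel size-1 -> 1).
    """
    # Normalize the window corners to [-1, 1].
    nx1 = 2.0 * x1 / (width - 1) - 1.0
    nx2 = 2.0 * x2 / (width - 1) - 1.0
    ny1 = 2.0 * y1 / (height - 1) - 1.0
    ny2 = 2.0 * y2 / (height - 1) - 1.0
    # Scale is the half-extent of the window, translation is its center.
    return torch.tensor([[[(nx2 - nx1) / 2.0, 0.0, (nx1 + nx2) / 2.0],
                          [0.0, (ny2 - ny1) / 2.0, (ny1 + ny2) / 2.0]]],
                        dtype=torch.float32)


theta = box_to_theta(2.5, 0.5, 4.5, 3.5, width=5, height=5)
print(theta)  # [[0.5, 0, 0.75], [0, 0.75, 0]], the matrix used in the experiment

# Deterministic input: the pixel at (y, x) holds the value 5 * y + x.
a = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)
grid = F.affine_grid(theta, torch.Size((1, 1, 3, 2)), align_corners=True)
crop = F.grid_sample(a, grid, align_corners=True)

# First output column samples x = 2.5 at y = 0.5, 2.0, 3.5,
# so it should contain 5.0, 12.5 and 20.0.
print(crop[0, 0, :, 0])
```

The second output column samples x = 4.5, which lies past the last pixel column (index 4), so its values depend on the padding mode of grid_sample.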