For 3D vision you need a clear picture of how the world coordinate system maps to the pixel coordinate system. This post works through that mapping experimentally in Python. For the mathematical derivation of the camera intrinsics and extrinsics, see my earlier posts 《【AI数学】相机成像之内参数》 and 《【AI数学】相机成像之外参数》.
The experiment
First, fix the camera's intrinsic and extrinsic parameters:

```python
intri = [[1111.0, 0.0, 400.0],
         [0.0, 1111.0, 400.0],
         [0.0, 0.0, 1.0]]
extri = [[-9.9990e-01, 4.1922e-03, -1.3346e-02, -5.3798e-02],
         [-1.3989e-02, -2.9966e-01, 9.5394e-01, 3.8455e+00],
         [-4.6566e-10, 9.5404e-01, 2.9969e-01, 1.2081e+00],
         [0.0, 0.0, 0.0, 1.0]]
```
Pick a point in the world coordinate system, say p1 = [0, 0, 0], project it into the pixel coordinate system, and see where it lands.

First, transform it into the camera coordinate system; only the extrinsic matrix extri is needed for this. Note that extri here stores the camera-to-world (c2w) transform (the convention used e.g. by NeRF-style synthetic datasets), so going from world to camera means inverting it: p_cam = R^T (p_world - T). Since R is orthogonal, R^T (p_world - T) is exactly (p_world - T) @ R in numpy:
```python
import numpy as np

extri = np.array(
    [[-9.9990e-01, 4.1922e-03, -1.3346e-02, -5.3798e-02],
     [-1.3989e-02, -2.9966e-01, 9.5394e-01, 3.8455e+00],
     [-4.6566e-10, 9.5404e-01, 2.9969e-01, 1.2081e+00],
     [0.0, 0.0, 0.0, 1.0]])

p1 = np.array([0, 0, 0])
R = extri[:3, :3]  # rotation (camera-to-world)
T = extri[:3, -1]  # translation (camera position in world coordinates)
cam_p1 = np.dot(p1 - T, R)  # equivalent to R.T @ (p1 - T)
print(cam_p1)
```
Next, transform from camera coordinates to pixel coordinates:
```python
intri = [[1111.0, 0.0, 400.0],
         [0.0, 1111.0, 400.0],
         [0.0, 0.0, 1.0]]

# Formula:
#                x                               y
# x_im = f_x * ----- + offset_x    y_im = f_y * ----- + offset_y
#                z                               z
# In this convention the camera looks down its negative z axis, so z is
# negative for points in front of the camera: dividing by -z produces the
# minus sign on x below, and for y the same -z cancels against the flip
# of the image y axis (image y points down, camera y points up).
screen_p1 = np.array([-cam_p1[0] * intri[0][0] / cam_p1[2] + intri[0][2],
                       cam_p1[1] * intri[1][1] / cam_p1[2] + intri[1][2]])
print(screen_p1)
# [400.00057322 400.00211168]
```
So the world-space origin [0, 0, 0], pushed through this pair of intrinsic and extrinsic matrices, projects to [400, 400] on the screen. Our image happens to be 800 x 800, so that is exactly the center.
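The same projection can be cross-checked with plain homogeneous matrix algebra (a sketch: it assumes, as above, that extri is a camera-to-world matrix, so we invert it first, and it applies the same x-axis sign flip):

```python
import numpy as np

intri = np.array([[1111.0, 0.0, 400.0],
                  [0.0, 1111.0, 400.0],
                  [0.0, 0.0, 1.0]])
extri = np.array([[-9.9990e-01, 4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01, 9.5394e-01, 3.8455e+00],
                  [-4.6566e-10, 9.5404e-01, 2.9969e-01, 1.2081e+00],
                  [0.0, 0.0, 0.0, 1.0]])

p_world = np.array([0.0, 0.0, 0.0, 1.0])  # world origin, homogeneous
w2c = np.linalg.inv(extri)                # world-to-camera = inverse of c2w
x, y, z = (w2c @ p_world)[:3]             # same as (p - T) @ R up to rounding
u = -x * intri[0, 0] / z + intri[0, 2]
v =  y * intri[1, 1] / z + intri[1, 2]
print(u, v)  # both close to 400
```

Inverting the full 4x4 does the R^T (p - T) computation in one step, which is convenient when chaining several transforms.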
Rewriting the experiment above in an object-oriented style makes it easier to reuse:
```python
import numpy as np

class CamTrans:
    def __init__(self, intri, extri, h=None, w=None):
        self.intri = np.asarray(intri)  # accept plain lists as well as arrays
        self.extri = np.asarray(extri)
        # default image size: the principal point is assumed to be the image center
        self.h = int(2 * self.intri[1][2]) if h is None else h
        self.w = int(2 * self.intri[0][2]) if w is None else w
        self.R = self.extri[:3, :3]   # rotation (camera-to-world)
        self.T = self.extri[:3, -1]   # camera position in world coordinates
        self.focal_x = self.intri[0][0]
        self.focal_y = self.intri[1][1]
        self.offset_x = self.intri[0][2]
        self.offset_y = self.intri[1][2]

    def world2cam(self, p):
        return np.dot(p - self.T, self.R)  # R.T @ (p - T): inverse of c2w

    def cam2screen(self, p):
        """
                       x                               y
        x_im = f_x * ----- + offset_x    y_im = f_y * ----- + offset_y
                       z                               z
        """
        x, y, z = p
        return [-x * self.focal_x / z + self.offset_x,
                 y * self.focal_y / z + self.offset_y]

    def world2screen(self, p):
        return self.cam2screen(self.world2cam(p))
```
```python
if __name__ == "__main__":
    p = [0, 0, 0]  # the world origin
    intri = [[1111.0, 0.0, 400.0],
             [0.0, 1111.0, 400.0],
             [0.0, 0.0, 1.0]]
    extri = np.array(
        [[-9.9990e-01, 4.1922e-03, -1.3346e-02, -5.3798e-02],
         [-1.3989e-02, -2.9966e-01, 9.5394e-01, 3.8455e+00],
         [-4.6566e-10, 9.5404e-01, 2.9969e-01, 1.2081e+00],
         [0.0, 0.0, 0.0, 1.0]])
    # another camera
    extri1 = np.array(
        [[-3.0373e-01, -8.6047e-01, 4.0907e-01, 1.6490e+00],
         [ 9.5276e-01, -2.7431e-01, 1.3041e-01, 5.2569e-01],
         [-7.4506e-09, 4.2936e-01, 9.0313e-01, 3.6407e+00],
         [0.0, 0.0, 0.0, 1.0]])

    ct = CamTrans(intri, extri)
    print(ct.world2screen(p))
    ct1 = CamTrans(intri, extri1)
    print(ct1.world2screen(p))
```
Both cameras project the world origin to the image center (400, 400), which confirms that our coordinate projection is correct.
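As one more sanity check, the same transform in a compact functional form (a sketch: world2screen here just bundles the two steps above into one function) shows that points displaced from the world origin land away from the image center, so the projection really does depend on position:

```python
import numpy as np

def world2screen(p, intri, extri):
    """Project a world-space point to pixel coordinates.
    extri is assumed to be a camera-to-world (c2w) matrix, as in the post."""
    R, T = extri[:3, :3], extri[:3, -1]
    x, y, z = (p - T) @ R                       # world -> camera
    return np.array([-x * intri[0][0] / z + intri[0][2],
                      y * intri[1][1] / z + intri[1][2]])

intri = np.array([[1111.0, 0.0, 400.0],
                  [0.0, 1111.0, 400.0],
                  [0.0, 0.0, 1.0]])
extri = np.array([[-9.9990e-01, 4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01, 9.5394e-01, 3.8455e+00],
                  [-4.6566e-10, 9.5404e-01, 2.9969e-01, 1.2081e+00],
                  [0.0, 0.0, 0.0, 1.0]])

for p in [np.array([0.0, 0.0, 0.0]),   # origin -> image center
          np.array([0.3, 0.0, 0.0]),   # shifted along world x
          np.array([0.0, 0.0, 0.5])]:  # shifted along world z
    print(p, '->', world2screen(p, intri, extri))
```

Only the origin maps to (400, 400); the shifted points land tens of pixels away, as expected for a camera aimed at the origin.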