The visual initialization stage is critical: if a good pose estimate cannot be produced at the very beginning, the IMU parameters cannot be calibrated accurately. This is where the idea of multi-sensor fusion shows itself: when one sensor's data is uncertain, we rely on other sensors to reduce that uncertainty. Conversely, when the camera operates in low-texture or highly dynamic scenes, the SLAM algorithm is expected to use the IMU data to compensate for the accuracy loss caused by visual uncertainty.
Below we walk through the visual initialization code in VINS-Mono.
1. Function entry point
The entry point of visual initialization is
bool Estimator::initialStructure()
2. Global SFM
(1) Initializing variables
Quaterniond Q[frame_count + 1];
Vector3d T[frame_count + 1];
map<int, Vector3d> sfm_tracked_points;
vector<SFMFeature> sfm_f;
First, arrays of rotations and translations are defined; a map container sfm_tracked_points is created to store the coordinates of the feature points reconstructed by the subsequent SFM, and a vector of SFMFeature, sfm_f, is created. The SFMFeature data structure is as follows:
struct SFMFeature
{
    bool state;                              // triangulation state
    int id;                                  // feature index
    vector<pair<int,Vector2d>> observation;  // (frame index, normalized-plane observation) pairs
    double position[3];                      // 3D position
    double depth;                            // depth
};
(2) Populating sfm_f
for (auto &it_per_id : f_manager.feature)
{
    int imu_j = it_per_id.start_frame - 1;
    SFMFeature tmp_feature;
    tmp_feature.state = false;
    tmp_feature.id = it_per_id.feature_id;
    for (auto &it_per_frame : it_per_id.feature_per_frame)
    {
        imu_j++;
        Vector3d pts_j = it_per_frame.point;
        tmp_feature.observation.push_back(make_pair(imu_j, Eigen::Vector2d{pts_j.x(), pts_j.y()}));
    }
    sfm_f.push_back(tmp_feature);
}
First, f_manager.feature is a list of FeaturePerId, which holds the per-feature attributes for each indexed feature. The FeaturePerId data structure is:
class FeaturePerId
{
  public:
    const int feature_id;                      // feature index
    int start_frame;                           // index of the frame in which the feature was first observed
    vector<FeaturePerFrame> feature_per_frame; // info from every frame observing this feature
                                               // (image coordinates, tracking velocity, spatial coordinates, etc.)
    int used_num;                              // number of times the feature has been observed
    bool is_outlier;                           // whether it is an outlier
    bool is_margin;                            // whether it is to be marginalized
    double estimated_depth;                    // estimated depth (the optimization parameterizes its inverse)
    int solve_flag;                            // 0 haven't solve yet; 1 solve succ; 2 solve fail;
    Vector3d gt_p;                             // ground-truth position

    FeaturePerId(int _feature_id, int _start_frame)
        : feature_id(_feature_id), start_frame(_start_frame),
          used_num(0), estimated_depth(-1.0), solve_flag(0)
    {
    }

    int endFrame();
};
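For reference, endFrame(), declared above and used later in getCorresponding(), returns the index of the last frame that observes the feature. In feature_manager.cpp it is implemented as:
int FeaturePerId::endFrame()
{
    // last observing frame = first observing frame + number of observations - 1
    return start_frame + feature_per_frame.size() - 1;
}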
Next, for each feature, starting from imu_j = it_per_id.start_frame - 1 (one before the frame that first observes it; imu_j is incremented at the top of the inner loop), the per-frame information, i.e. the frame index and the normalized-plane coordinates, is copied into an SFMFeature variable, which is then pushed into sfm_f. In other words, sfm_f stores all observation data for every feature. For example, if start_frame = 3 and feature_per_frame has four entries, observations are recorded for frames 3, 4, 5, and 6.
(The original post attaches a diagram illustrating the feature classes in the code here.)
3. Solving the relative pose
This step solves the relative pose between two frames in the window.
Matrix3d relative_R;
Vector3d relative_T;
int l;
if (!relativePose(relative_R, relative_T, l))
{
    cout << "Not enough features or parallax; Move device around" << endl;
    return false;
}
The function relativePose(relative_R, relative_T, l) is as follows:
bool Estimator::relativePose(Matrix3d &relative_R, Vector3d &relative_T, int &l)
{
    // find a previous frame which contains enough correspondences and parallax with the newest frame
    for (int i = 0; i < WINDOW_SIZE; i++)
    {
        vector<pair<Vector3d, Vector3d>> corres;
        corres = f_manager.getCorresponding(i, WINDOW_SIZE);
        if (corres.size() > 20)
        {
            double sum_parallax = 0;
            double average_parallax;
            for (int j = 0; j < int(corres.size()); j++)
            {
                Vector2d pts_0(corres[j].first(0), corres[j].first(1));
                Vector2d pts_1(corres[j].second(0), corres[j].second(1));
                double parallax = (pts_0 - pts_1).norm();
                sum_parallax = sum_parallax + parallax;
            }
            average_parallax = 1.0 * sum_parallax / int(corres.size());
            if (average_parallax * 460 > 30 && m_estimator.solveRelativeRT(corres, relative_R, relative_T))
            {
                l = i;
                //ROS_DEBUG("average_parallax %f choose l %d and newest frame to triangulate the whole structure", average_parallax * 460, l);
                return true;
            }
        }
    }
    return false;
}
First, all feature correspondences between each frame in the window and the last frame are collected. Specifically, for each feature, if the interval [i, WINDOW_SIZE] is contained in its [start_frame, endFrame()], the observations at frames i and WINDOW_SIZE are fetched and stored in corres. The implementation is:
vector<pair<Vector3d, Vector3d>> FeatureManager::getCorresponding(int frame_count_l, int frame_count_r)
{
    vector<pair<Vector3d, Vector3d>> corres;
    for (auto &it : feature)
    {
        if (it.start_frame <= frame_count_l && it.endFrame() >= frame_count_r)
        {
            Vector3d a = Vector3d::Zero(), b = Vector3d::Zero();
            int idx_l = frame_count_l - it.start_frame;
            int idx_r = frame_count_r - it.start_frame;
            a = it.feature_per_frame[idx_l].point;
            b = it.feature_per_frame[idx_r].point;
            corres.push_back(make_pair(a, b));
        }
    }
    return corres;
}
Once the correspondences corres between the two frames are obtained, if their count exceeds 20, the average displacement (parallax) is computed. The points live on the normalized plane, so average_parallax * 460 converts the parallax to pixels at a nominal focal length of 460; the test therefore requires roughly 30 pixels of average parallax. If the parallax is sufficient, m_estimator.solveRelativeRT(corres, relative_R, relative_T) solves the relative pose between the current frame and the last frame, and the current frame's index is assigned to l; l thus records the index of the first frame that has both enough co-visible features and enough parallax with the last frame. One question worth clarifying is which coordinate system relative_R and relative_T are expressed in (discussed in an answer linked in the original post): they are the pose of the last frame expressed in the frame-l coordinate system, because at the end of bool MotionEstimator::solveRelativeRT() the R and T returned by recoverPose are inverted, as follows:
int inlier_cnt = cv::recoverPose(E, ll, rr, cameraMatrix, rot, trans, mask);
//cout << "inlier_cnt " << inlier_cnt << endl;
Eigen::Matrix3d R;
Eigen::Vector3d T;
for (int i = 0; i < 3; i++)
{
    T(i) = trans.at<double>(i, 0);
    for (int j = 0; j < 3; j++)
        R(i, j) = rot.at<double>(i, j);
}
Rotation = R.transpose();
Translation = -R.transpose() * T;
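For context, MotionEstimator::solveRelativeRT() in solve_5pts.cpp looks roughly like the following (condensed here; the thresholds are quoted from memory of the source and may differ slightly across versions). Because the correspondences are normalized-plane coordinates and the camera matrix is the identity, the matrix returned by cv::findFundamentalMat is in fact the essential matrix:
bool MotionEstimator::solveRelativeRT(const vector<pair<Vector3d, Vector3d>> &corres,
                                      Matrix3d &Rotation, Vector3d &Translation)
{
    if (corres.size() >= 15)
    {
        vector<cv::Point2f> ll, rr;
        for (int i = 0; i < int(corres.size()); i++)
        {
            ll.push_back(cv::Point2f(corres[i].first(0), corres[i].first(1)));
            rr.push_back(cv::Point2f(corres[i].second(0), corres[i].second(1)));
        }
        cv::Mat mask;
        // RANSAC threshold 0.3 / 460 is about 0.3 pixel at the nominal focal length 460
        cv::Mat E = cv::findFundamentalMat(ll, rr, cv::FM_RANSAC, 0.3 / 460, 0.99, mask);
        cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) << 1, 0, 0, 0, 1, 0, 0, 0, 1);
        cv::Mat rot, trans;
        int inlier_cnt = cv::recoverPose(E, ll, rr, cameraMatrix, rot, trans, mask);
        // ... rot/trans are copied into R, T and inverted as shown above:
        // Rotation = R.transpose(); Translation = -R.transpose() * T;
        if (inlier_cnt > 12)
            return true;
    }
    return false;
}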
4. Building the global SFM
Next we look at
sfm.construct(frame_count + 1, Q, T, l, relative_R, relative_T, sfm_f, sfm_tracked_points)
The function begins with some initialization:
feature_num = sfm_f.size();
//cout << "set 0 and " << l << " as known " << endl;
// have relative_r relative_t
// initial two views
q[l].w() = 1;
q[l].x() = 0;
q[l].y() = 0;
q[l].z() = 0;
T[l].setZero();
q[frame_num - 1] = q[l] * Quaterniond(relative_R);
T[frame_num - 1] = relative_T;
From this it is clear that the coordinate system of frame l serves as the reference frame. Next:
//rotate to cam frame
Matrix3d c_Rotation[frame_num];
Vector3d c_Translation[frame_num];
Quaterniond c_Quat[frame_num];
double c_rotation[frame_num][4];
double c_translation[frame_num][3];
Eigen::Matrix<double, 3, 4> Pose[frame_num];
c_Quat[l] = q[l].inverse();
c_Rotation[l] = c_Quat[l].toRotationMatrix();
c_Translation[l] = -1 * (c_Rotation[l] * T[l]);
Pose[l].block<3, 3>(0, 0) = c_Rotation[l];
Pose[l].block<3, 1>(0, 3) = c_Translation[l];
c_Quat[frame_num - 1] = q[frame_num - 1].inverse();
c_Rotation[frame_num - 1] = c_Quat[frame_num - 1].toRotationMatrix();
c_Translation[frame_num - 1] = -1 * (c_Rotation[frame_num - 1] * T[frame_num - 1]);
Pose[frame_num - 1].block<3, 3>(0, 0) = c_Rotation[frame_num - 1];
Pose[frame_num - 1].block<3, 1>(0, 3) = c_Translation[frame_num - 1];
Here all the poses are inverted (from frame-in-l-frame poses to l-frame-to-camera transforms) and stored in Pose.
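To make the inversion explicit: q[i] and T[i] store the pose of frame i expressed in the frame-l system (camera-to-world, with frame l acting as the world frame), while triangulation and PnP need the projection matrix built from the inverse, world-to-camera transform:

R_cw = R_wc^T,  t_cw = -R_wc^T * t_wc

This is exactly what c_Quat[l] = q[l].inverse() and c_Translation[l] = -1 * (c_Rotation[l] * T[l]) compute, so Pose[i] = [R_cw | t_cw].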
Then the following loop runs; note that when i == l the PnP step is skipped (the pose of frame l is already fixed), so the first iteration only triangulates frames l and frame_num - 1:
//1: triangulate between l ----- frame_num - 1
//2: solve pnp for l + 1; triangulate l + 1 ------- frame_num - 1;
for (int i = l; i < frame_num - 1; i++)
{
    // solve pnp
    if (i > l)
    {
        Matrix3d R_initial = c_Rotation[i - 1];
        Vector3d P_initial = c_Translation[i - 1];
        if (!solveFrameByPnP(R_initial, P_initial, i, sfm_f))
            return false;
        c_Rotation[i] = R_initial;
        c_Translation[i] = P_initial;
        c_Quat[i] = c_Rotation[i];
        Pose[i].block<3, 3>(0, 0) = c_Rotation[i];
        Pose[i].block<3, 1>(0, 3) = c_Translation[i];
    }
    // triangulate points based on the solved pnp result
    triangulateTwoFrames(i, Pose[i], frame_num - 1, Pose[frame_num - 1], sfm_f);
}
First, look at solveFrameByPnP:
bool GlobalSFM::solveFrameByPnP(Matrix3d &R_initial, Vector3d &P_initial, int i,
                                vector<SFMFeature> &sfm_f)
{
    vector<cv::Point2f> pts_2_vector;
    vector<cv::Point3f> pts_3_vector;
    for (int j = 0; j < feature_num; j++)
    {
        if (sfm_f[j].state != true)
            continue;
        Vector2d point2d;
        for (int k = 0; k < (int)sfm_f[j].observation.size(); k++)
        {
            if (sfm_f[j].observation[k].first == i)
            {
                Vector2d img_pts = sfm_f[j].observation[k].second;
                cv::Point2f pts_2(img_pts(0), img_pts(1));
                pts_2_vector.push_back(pts_2);
                cv::Point3f pts_3(sfm_f[j].position[0], sfm_f[j].position[1], sfm_f[j].position[2]);
                pts_3_vector.push_back(pts_3);
                break;
            }
        }
    }
    if (int(pts_2_vector.size()) < 15)
    {
        printf("unstable features tracking, please slowly move your device!\n");
        if (int(pts_2_vector.size()) < 10)
            return false;
    }
    cv::Mat r, rvec, t, D, tmp_r;
    cv::eigen2cv(R_initial, tmp_r);
    cv::Rodrigues(tmp_r, rvec);
    cv::eigen2cv(P_initial, t);
    cv::Mat K = (cv::Mat_<double>(3, 3) << 1, 0, 0, 0, 1, 0, 0, 0, 1);
    bool pnp_succ;
    pnp_succ = cv::solvePnP(pts_3_vector, pts_2_vector, K, D, rvec, t, 1);
    if (!pnp_succ)
    {
        return false;
    }
    cv::Rodrigues(rvec, r);
    //cout << "r " << endl << r << endl;
    MatrixXd R_pnp;
    cv::cv2eigen(r, R_pnp);
    MatrixXd T_pnp;
    cv::cv2eigen(t, T_pnp);
    R_initial = R_pnp;
    P_initial = T_pnp;
    return true;
}
This function performs the PnP step. For each feature that has already been triangulated, all of its observations are scanned to find the one on the current frame i; that 2D observation is pushed into pts_2_vector, and the feature's 3D coordinates into pts_3_vector. If fewer than 15 such features are found, a warning is printed, and with fewer than 10 the PnP is abandoned. Note that K is the identity matrix because the observations are normalized-plane coordinates, and the final argument 1 of cv::solvePnP means useExtrinsicGuess = true, so rvec and t (initialized from the previous frame's pose) serve as the initial guess. Then the following is executed:
pnp_succ = cv::solvePnP(pts_3_vector, pts_2_vector, K, D, rvec, t, 1);
Here rvec and t transform points from the frame-l camera coordinate system (where pts_3_vector lives) into the coordinate system of pts_2_vector, i.e., they are the world-to-camera pose of the current frame with respect to the frame-l camera system. Also note how the code decides whether a feature in sfm_f already has valid 3D coordinates:
for (int j = 0; j < feature_num; j++)
{
    if (sfm_f[j].state != true)
        continue;
Then, two-frame triangulation is performed:
void GlobalSFM::triangulateTwoFrames(int frame0, Eigen::Matrix<double, 3, 4> &Pose0,
                                     int frame1, Eigen::Matrix<double, 3, 4> &Pose1,
                                     vector<SFMFeature> &sfm_f)
{
    assert(frame0 != frame1);
    for (int j = 0; j < feature_num; j++)
    {
        if (sfm_f[j].state == true)
            continue;
        bool has_0 = false, has_1 = false;
        Vector2d point0;
        Vector2d point1;
        for (int k = 0; k < (int)sfm_f[j].observation.size(); k++)
        {
            if (sfm_f[j].observation[k].first == frame0)
            {
                point0 = sfm_f[j].observation[k].second;
                has_0 = true;
            }
            if (sfm_f[j].observation[k].first == frame1)
            {
                point1 = sfm_f[j].observation[k].second;
                has_1 = true;
            }
        }
        if (has_0 && has_1)
        {
            Vector3d point_3d;
            triangulatePoint(Pose0, Pose1, point0, point1, point_3d);
            sfm_f[j].state = true;
            sfm_f[j].position[0] = point_3d(0);
            sfm_f[j].position[1] = point_3d(1);
            sfm_f[j].position[2] = point_3d(2);
            //cout << "trangulated : " << frame1 << " 3d point : " << j << " " << point_3d.transpose() << endl;
        }
    }
}
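triangulateTwoFrames() delegates to triangulatePoint(), a standard linear (DLT) triangulation: an observation (u, v) under a projection matrix P contributes the two constraints u * P.row(2) - P.row(0) and v * P.row(2) - P.row(1) on the homogeneous point, and stacking both frames gives a 4x4 system whose null-space direction is recovered by SVD. The implementation in initial_sfm.cpp is essentially:
void GlobalSFM::triangulatePoint(Eigen::Matrix<double, 3, 4> &Pose0, Eigen::Matrix<double, 3, 4> &Pose1,
                                 Vector2d &point0, Vector2d &point1, Vector3d &point_3d)
{
    // stack the constraints u*P.row(2)-P.row(0), v*P.row(2)-P.row(1) from both views
    Matrix4d design_matrix = Matrix4d::Zero();
    design_matrix.row(0) = point0[0] * Pose0.row(2) - Pose0.row(0);
    design_matrix.row(1) = point0[1] * Pose0.row(2) - Pose0.row(1);
    design_matrix.row(2) = point1[0] * Pose1.row(2) - Pose1.row(0);
    design_matrix.row(3) = point1[1] * Pose1.row(2) - Pose1.row(1);
    // the homogeneous solution is the right singular vector of the smallest singular value
    Vector4d triangulated_point =
        design_matrix.jacobiSvd(Eigen::ComputeFullV).matrixV().rightCols<1>();
    // de-homogenize
    point_3d(0) = triangulated_point(0) / triangulated_point(3);
    point_3d(1) = triangulated_point(1) / triangulated_point(3);
    point_3d(2) = triangulated_point(2) / triangulated_point(3);
}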
Frame l is then triangulated with each of frames l+1, l+2, ..., frame_num - 2:
//3: triangulate l ----- l+1, l+2, ..., frame_num - 2
for (int i = l + 1; i < frame_num - 1; i++)
    triangulateTwoFrames(l, Pose[l], i, Pose[i], sfm_f);
Next, frames l-1, ..., 0 are solved by PnP against frame l and triangulated with it (note the loop runs all the way down to frame 0):
//4: solve pnp for l-1; triangulate l-1 ----- l
//                      l-2 ----- l
for (int i = l - 1; i >= 0; i--)
{
    //solve pnp
    Matrix3d R_initial = c_Rotation[i + 1];
    Vector3d P_initial = c_Translation[i + 1];
    if (!solveFrameByPnP(R_initial, P_initial, i, sfm_f))
        return false;
    c_Rotation[i] = R_initial;
    c_Translation[i] = P_initial;
    c_Quat[i] = c_Rotation[i];
    Pose[i].block<3, 3>(0, 0) = c_Rotation[i];
    Pose[i].block<3, 1>(0, 3) = c_Translation[i];
    //triangulate
    triangulateTwoFrames(i, Pose[i], l, Pose[l], sfm_f);
}
Then all remaining points, i.e., those with sfm_f[j].state == false, are triangulated from their first and last observations:
//5: triangulate all other points
for (int j = 0; j < feature_num; j++)
{
    if (sfm_f[j].state == true)
        continue;
    if ((int)sfm_f[j].observation.size() >= 2)
    {
        Vector2d point0, point1;
        int frame_0 = sfm_f[j].observation[0].first;
        point0 = sfm_f[j].observation[0].second;
        int frame_1 = sfm_f[j].observation.back().first;
        point1 = sfm_f[j].observation.back().second;
        Vector3d point_3d;
        triangulatePoint(Pose[frame_0], Pose[frame_1], point0, point1, point_3d);
        sfm_f[j].state = true;
        sfm_f[j].position[0] = point_3d(0);
        sfm_f[j].position[1] = point_3d(1);
        sfm_f[j].position[2] = point_3d(2);
        //cout << "trangulated : " << frame_0 << " " << frame_1 << " 3d point : " << j << " " << point_3d.transpose() << endl;
    }
}
Next comes the BA-based optimization of the poses and the feature coordinates, whose code is omitted here. To emphasize once more: all poses here are expressed relative to the coordinate system of frame l.
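Although the BA code is omitted in this post, the final step of construct() (condensed here from initial_sfm.cpp; the copying of c_Quat/c_Translation into the raw double arrays c_rotation/c_translation is elided) looks roughly as follows. The rotation of frame l and the translations of frames l and frame_num - 1 are held constant to remove the gauge freedom (reference orientation, origin, and scale), and every observation adds a reprojection residual tying its frame pose to its 3D point:
ceres::Problem problem;
ceres::LocalParameterization *local_parameterization = new ceres::QuaternionParameterization();
for (int i = 0; i < frame_num; i++)
{
    // c_rotation / c_translation are the double arrays filled from c_Quat / c_Translation
    problem.AddParameterBlock(c_rotation[i], 4, local_parameterization);
    problem.AddParameterBlock(c_translation[i], 3);
    if (i == l)
        problem.SetParameterBlockConstant(c_rotation[i]);    // fix the rotation gauge
    if (i == l || i == frame_num - 1)
        problem.SetParameterBlockConstant(c_translation[i]); // fix the translation/scale gauge
}
for (int i = 0; i < feature_num; i++)
{
    if (sfm_f[i].state != true)
        continue;
    for (int j = 0; j < int(sfm_f[i].observation.size()); j++)
    {
        int f = sfm_f[i].observation[j].first;
        ceres::CostFunction *cost_function = ReprojectionError3D::Create(
            sfm_f[i].observation[j].second.x(), sfm_f[i].observation[j].second.y());
        problem.AddResidualBlock(cost_function, NULL,
                                 c_rotation[f], c_translation[f], sfm_f[i].position);
    }
}
ceres::Solver::Options options;
options.linear_solver_type = ceres::DENSE_SCHUR;
options.max_solver_time_in_seconds = 0.2;
ceres::Solver::Summary summary;
ceres::Solve(options, &problem, &summary);
Back in Estimator::initialStructure(), the return value of construct() is checked: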
if (!sfm.construct(frame_count + 1, Q, T, l,
                   relative_R, relative_T,
                   sfm_f, sfm_tracked_points))
{
    cout << "global SFM failed!" << endl;
    marginalization_flag = MARGIN_OLD;
    return false;
}
The Q and T here are poses relative to the frame-l camera coordinate system. Next, PnP is run for all frames: the previous steps only produced the poses of the keyframes inside the sliding window, but since visual initialization does not necessarily succeed on the first attempt, the number of image frames may exceed the sliding-window size (frames are classified as keyframes by parallax, which drives the marginalization strategy), so the poses of the image frames not included in the sliding window must also be solved.
//solve pnp for all frames
map<double, ImageFrame>::iterator frame_it;
map<int, Vector3d>::iterator it;
frame_it = all_image_frame.begin();
for (int i = 0; frame_it != all_image_frame.end(); frame_it++)
{
    // provide initial guess
    cv::Mat r, rvec, t, D, tmp_r;
    if ((frame_it->first) == Headers[i])
    {
        frame_it->second.is_key_frame = true;
        frame_it->second.R = Q[i].toRotationMatrix() * RIC[0].transpose();
        frame_it->second.T = T[i];
        i++;
        continue;
    }
    if ((frame_it->first) > Headers[i])
    {
        i++;
    }
    Matrix3d R_inital = (Q[i].inverse()).toRotationMatrix();
    Vector3d P_inital = -R_inital * T[i];
    cv::eigen2cv(R_inital, tmp_r);
    cv::Rodrigues(tmp_r, rvec);
    cv::eigen2cv(P_inital, t);
    frame_it->second.is_key_frame = false;
    vector<cv::Point3f> pts_3_vector;
    vector<cv::Point2f> pts_2_vector;
    for (auto &id_pts : frame_it->second.points)
    {
        int feature_id = id_pts.first;
        for (auto &i_p : id_pts.second)
        {
            it = sfm_tracked_points.find(feature_id);
            if (it != sfm_tracked_points.end())
            {
                Vector3d world_pts = it->second;
                cv::Point3f pts_3(world_pts(0), world_pts(1), world_pts(2));
                pts_3_vector.push_back(pts_3);
                Vector2d img_pts = i_p.second.head<2>();
                cv::Point2f pts_2(img_pts(0), img_pts(1));
                pts_2_vector.push_back(pts_2);
            }
        }
    }
    cv::Mat K = (cv::Mat_<double>(3, 3) << 1, 0, 0, 0, 1, 0, 0, 0, 1);
    if (pts_3_vector.size() < 6)
    {
        cout << "Not enough points for solve pnp pts_3_vector size " << pts_3_vector.size() << endl;
        return false;
    }
    if (!cv::solvePnP(pts_3_vector, pts_2_vector, K, D, rvec, t, 1))
    {
        cout << " solve pnp fail!" << endl;
        return false;
    }
    cv::Rodrigues(rvec, r);
    MatrixXd R_pnp, tmp_R_pnp;
    cv::cv2eigen(r, tmp_R_pnp);
    R_pnp = tmp_R_pnp.transpose();
    MatrixXd T_pnp;
    cv::cv2eigen(t, T_pnp);
    T_pnp = R_pnp * (-T_pnp);
    frame_it->second.R = R_pnp * RIC[0].transpose();
    frame_it->second.T = T_pnp;
}
Note the following lines:
frame_it->second.R = R_pnp * RIC[0].transpose();
frame_it->second.T = T_pnp;
This R is the attitude of the body (IMU) frame expressed in the frame-l camera coordinate system.
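Concretely, R_pnp is the camera-to-world rotation R_wc of the frame in the frame-l system, and RIC[0] = R_bc is the camera-to-body extrinsic rotation; from R_wc = R_wb * R_bc it follows that

R_wb = R_wc * R_bc^T = R_pnp * RIC[0].transpose()

so frame_it->second.R is the body orientation, while frame_it->second.T keeps the camera position; the camera-to-IMU translation offset is accounted for later, during visual-inertial alignment.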
For reference, a summary of the main steps. R and t are recovered from the F matrix of the epipolar constraint, mainly through relativePose(relative_R, relative_T, l). relativePose first obtains, via the FeatureManager, the feature matches corres between frame i in the sliding window and the last frame; when corres is large enough, it checks that the newest keyframe has both enough feature matches and sufficiently large parallax with some keyframe in the sliding window (whose id becomes l = i). Once both conditions hold, R and t between those two frames are recovered by the five-point algorithm and 3D feature points are later triangulated; this uses solveRelativeRT(corres, relative_R, relative_T), defined in solve_5pts.cpp, which recovers R and t from the F matrix by directly calling OpenCV. Worth noting: the pose obtained by relativePose is that of frame l, where frame l is the first frame, scanning from the first frame through the whole sliding window, that satisfies the average-parallax requirement; frame l then serves as the reference frame for the global SFM below.

At this point we have the extracted 2D feature coordinates and the rotation and translation between frame l and the last frame (note the 3D feature coordinates are not yet available). With this information the global SFM class GlobalSFM sfm can be built, calling sfm.construct(frame_count + 1, Q, T, l, relative_R, relative_T, sfm_f, sfm_tracked_points) with frame l as the reference frame. Before the PnP solve, the code requires the current frame index to be greater than l, which guarantees that frame l itself skips the PnP step; frame l and the last frame are first triangulated to obtain co-visible feature points, which then feed the PnP of frame l+1 against the last frame; PnP then solves the pose R_initial, P_initial of frame l+1, and so on, with all poses stored in Pose, so that one pass of the loop yields the poses of frames l+1, l+2, ..., n-1. After exiting the loop of step 2, we have the poses of frames l+1, l+2, ..., n-1 and the triangulations of frames l+1, l+2, ... against frame n-1.

Then frame l is triangulated against each frame i between frame l and the last frame to obtain 3D coordinates; next, PnP solves the poses of frames l-1, l-2, ..., 0 with respect to frame l and triangulates the corresponding feature points; finally, all remaining feature points are triangulated. At this point the poses of all cameras in the sliding window and the 3D coordinates of the feature points are available. Step 6 is the BA optimization, which uses ceres to optimize the poses and feature points; see Lecture 10 of 视觉SLAM十四讲 (14 Lectures on Visual SLAM), which uses the same optimization approach.