2021-06-23 基于AprilTag的位姿估计，原理，完整代码（相机坐标系、世界坐标系）

1.AprilTag概述

AprilTag是一种视觉基准系统，类似的视觉基准标签还有：ArUco、ARTag、TopoTag等。而AprilTag因其对旋转、光照、模糊具有良好的鲁棒性，因此被广泛应用于目标跟踪、室内定位、增强现实等领域。本文主要展示使用AprilTag标签，分别获取AprilTag标签在相机坐标系下和世界坐标系下位置。

生成AprilTag不同家族标签图像教程

2.AprilTag 相关资料

AprilTag论文三部曲：

AprilTag: A robust and flexible visual fiducial system. --2011
AprilTag 2: Efficient and robust fiducial detection. --2016
Flexible Layouts for Fiducial Tags. --2019（AprilTag3）

一些应用AprilTag的论文：

UAV Autonomous Landing Technology Based on AprilTags Vision Positioning Algorithm.
Pose Estimation for Multicopters Based on Monocular Vision and AprilTag.
Image Digital Zoom Based Single Target Apriltag Recognition Algorithm in Large Scale Changes on the Distance.

上述资料打包下载地址：https://cloud.189.cn/web/share?code=JRJRJbNfe2ay（访问码：7l96）

AprilTag论文源码地址：https://github.com/AprilRobotics/apriltag

3.AprilTag 原理

检测标签

检测线段
- 计算每个像素处的梯度大小和方向
- 利用聚类算法，将这些像素聚集成一个具有相似梯度方向和大小的分量
- 使用加权最小二乘法，每一点的权值就是该点的梯度的强度，把上述分量拟合成线段
检测矩形（基于深度优先搜索）
- 深度为1：考虑所有的线段
- 深度为2-4：考虑与上一条线段的末端足够近的线段，和服从逆时针方向的线段
单应性变换与外参估计

单应矩阵计算：

由于世界坐标和图像坐标之间的关系，能够通过一个单应矩阵H(homography)联系在一起，公式如下所示：

$s \begin{bmatrix} u\\ v\\ 1\\ \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \\ \end{bmatrix}\begin{bmatrix} x_w\\ y_w\\ 1\\ \end{bmatrix} \tag{1-1}$

将上式(1-1)展开可得：

进一步变换消去深度信息s，可得:
$\begin{bmatrix} x_w & y_w & 1 & 0 & 0 & 0 &-ux_w & -uy_w & -u \\ 0 & 0 & 0 & xw & y_w & 1 & -vx_w & -vy_w & -v \\ \end{bmatrix} \begin{bmatrix} h_{00}\\h_{01}\\h_{02}\\h_{10}\\h_{11}\\h_{12}\\h_{20}\\h_{21}\\h_{22} \end{bmatrix}=0 \tag{1-3}$

最终产生一个的矩阵求解问题，论文中根据从AprilTag标签图像坐标系上提取的四个角点，和在世界坐标系中假设的四个点对应，形成四个点对，然后采用直接线性法(DLT)求解该线性方程组。

外参估计：

到此为止，我们已经得到了单应矩阵H，然而估计标签的位姿还需要额外的信息：相机内参、AprilTag标签的物理尺寸。而这两个信息都能通过事先标定得知。

因此，接下来我们把单应矩阵写成如下公式(2-1)所示的形式：

$s = 比例因子（表示投影过程重点缩放平移关系） \quad \\ P = 3\times4的相机投影矩阵（表示三维世界坐标系到二维图像坐标系的变换）$

$\begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \\ \end{bmatrix} = sPE = s\begin{bmatrix} 1/f_x & 0 & 0 & 0\\ 0 & 1/f_y & 0 & 0\\ 0 & 0 & 1 & 0\\ \end{bmatrix} \begin{bmatrix} R_{00} & R_{01} & T_{x} \\ R_{10} & R_{11} & T_{y} \\ R_{20} & R_{21} & T_{z} \\ 0 & 0 & 1 \end{bmatrix} \tag{2-1}$

将上式(1-1)展开可得：

现在我们已知了单应矩阵和相机内参，因此很容易求得和。但比例因子还很难求得。所以，我们利用约束条件：旋转矩阵中列向量的模长均为1，能够计算得到的大小；而由标签必须位于摄像机前方可以推导出约束条件比例因子必须保证，从而得到的符号。

由于旋转矩阵为正交矩阵，因此被截断的列向量可以通过其与前两组列向量的内积为求解。

至此，我们得到了从世界坐标系转换到相机坐标系的旋转矩阵和平移矩阵，即相机的外参矩阵。

解码内容

建立两种阈值模型（超过该阈值即认为”1“，少于该阈值即认为是”0“），一种基于“黑色”像素强度的空间变化，一种基于“白色“像素强度的空间变化模型：

该模型有四个参数，能用最小二乘回归求解。最后阈值大小由两种模型预测的平均值取得。根据不同黑白色块解码标签得到不同的码型，将解码得到的码型与储存在本地的码型库相对比，排除其它码型的干扰，判断是否为正确的标签。

解算位姿

现在我们已知了由世界坐标系至相机坐标系的旋转矩阵和平移矩阵，现在可以根据这两个矩阵，估计相机在世界坐标系下的位置和相机坐标系下标签的位置。

1. 相机在世界坐标系下的位置

根据相机坐标系与世界坐标系之间的关系可知：

因为相机在相机坐标系下的坐标为，因此可得：

从而得到相机在世界坐标系下的位置。

2.相机坐标系下标签的位置

因为标签中心在世界坐标系下的位置为，因此可得：

从而得到标签在相机坐标系下的位置。

4.为ApriTag论文源码 opencv_demo.cc 添加位姿估计的代码

运行环境

Ubuntu 18.04

Opencv 3.14

IDE: Clion

推荐在Linux环境下，先安装基于C++的OpenCV lib，再编译执行下列代码。

添加的改进代码如下：

首先在头文件中添加了用于计算线性代数和估计位姿的库（已注释的行）

#include 
#include "opencv2/opencv.hpp"
#include   //necessary before including the opencv eigen lib
#include  //include linear calculation lib

extern "C" {
#include "apriltag.h"
#include "apriltag_pose.h" //pose estimation lib
#include "tag36h11.h"
#include "tag25h9.h"
#include "tag16h5.h"
#include "tagCircle21h7.h"
#include "tagCircle49h12.h"
#include "tagCustom48h12.h"
#include "tagStandard41h12.h"
#include "tagStandard52h13.h"
#include "common/getopt.h"

其次，为了便于使用，添加了命令行参数代码，用于选择世界坐标系还是相机坐标系为基准

getopt_add_string(getopt, 'a', "axis", "world", "Choose axis to use (world/camera)");
const char *axisname = getopt_get_string(getopt, "axis");
unsigned int coordinate;
if(!strcmp(axisname,"world"))
    coordinate = 1;
else
    coordinate = 0;

接下来，输入相机的内参矩阵和打印出的标签大小（单位为m）

apriltag_detection_info_t info;
info.tagsize = 0.146-0.012*4;   //The size of Tag
info.det = NULL;
info.fx = 848.469;   //focal length of x
info.fy = 847.390;   //focal length of y
info.cx = cap.get(CV_CAP_PROP_FRAME_WIDTH)/2;
info.cy = cap.get(CV_CAP_PROP_FRAME_HEIGHT)/2;

根据解算单应矩阵得到的旋转矩阵和平移矩阵估计相机在世界坐标系中的位置（此处产生了一个问题：由apriltag源代码解算得到的旋转矩阵，在估计相机在世界坐标系中的位置时，不用乘以负号，与上一节公式的推导结果不同，如果谁知道原因还请赐教！）

if (coordinate) 
{
    Mat rvec(3, 3, CV_64FC1, pose.R->data); //rotation matrix
    Mat tvec(3, 1, CV_64FC1, pose.t->data); //translation matrix
    Mat Pos(3, 3, CV_64FC1);
    Pos = rvec.inv() * tvec;              //注意，此处的旋转矩阵的逆矩阵没有乘上负号
    cout << "x: " << Pos.ptr(0)[0] << endl;
    cout << "y: " << Pos.ptr(1)[0] << endl;
    cout << "z: " << Pos.ptr(2)[0] << endl;
    cout << "-----------world--------------" << endl;
}

根据解算单应矩阵得到的旋转矩阵和平移矩阵估计AprilTag标签在相机坐标系中的位置

 else
 {
     cout << "x: " << pose.t ->data[0] << endl;
     cout << "y: " << pose.t ->data[1] << endl;
     cout << "z: " << pose.t ->data[2] << endl;
     cout << "-----------camera-------------" << endl;
 }

完整代码见第六节

5.为Apriltag-cpp添加位姿估计代码

Apriltag-cpp源码的github地址：https://github.com/swatbotics/apriltags-cpp

该代码由第三方开发者用C++重构了AprilTag论文源代码。

在OpenCV环境下编译前，只需要在camtest.cpp的第283行添加如下代码，（注意这里旋转矩阵需要和公式中推导的一样，在前面添加负号），然后编译执行即可。

cv::Mat R;
cv::Rodrigues(r, R);
cv::Mat Pos(3,3,CV_64F);
Pos = -R.inv() * t; //注意此处的 -R
std::cout << "x: " << Pos.ptr(0)[0] << std::endl;
std::cout << "y: " << Pos.ptr(1)[0] << std::endl;
std::cout << "z: " << Pos.ptr(2)[0] << std::endl;
std::cout << "-----------world--------------" << std::endl;

即可解算出世界坐标系下相机的位置。如果需要计算相机坐标系下标签的位置，则与上一节中的代码同理。

6.opencv_demo.cc的完整代码：

GitHub地址：https://github.com/Reza666-cloud/apriltag-with-pose-estimation/blob/master/example/opencv_demo.cc

#include 
#include "opencv2/opencv.hpp"
#include   //necessary before including the opencv eigen lib
#include  //include linear calculation lib

extern "C" {
#include "apriltag.h"
#include "apriltag_pose.h" //pose estimation lib
#include "tag36h11.h"
#include "tag25h9.h"
#include "tag16h5.h"
#include "tagCircle21h7.h"
#include "tagCircle49h12.h"
#include "tagCustom48h12.h"
#include "tagStandard41h12.h"
#include "tagStandard52h13.h"
#include "common/getopt.h"
}

using namespace std;
using namespace cv;


int main(int argc, char *argv[])
{
    getopt_t *getopt = getopt_create();

    getopt_add_bool(getopt, 'h', "help", 0, "Show this help");
    getopt_add_bool(getopt, 'd', "debug", 0, "Enable debugging output (slow)");
    getopt_add_bool(getopt, 'q', "quiet", 0, "Reduce output");
    getopt_add_string(getopt, 'f', "family", "tag36h11", "Tag family to use");
    getopt_add_int(getopt, 't', "threads", "1", "Use this many CPU threads");
    getopt_add_double(getopt, 'x', "decimate", "2.0", "Decimate input image by this factor");
    getopt_add_double(getopt, 'b', "blur", "0.0", "Apply low-pass blur to input");
    getopt_add_bool(getopt, '0', "refine-edges", 1, "Spend more time trying to align edges of tags");
    getopt_add_string(getopt, 'a', "axis", "world", "Choose axis to use (world/camera)");


    if (!getopt_parse(getopt, argc, argv, 1) ||
            getopt_get_bool(getopt, "help")) {
        printf("Usage: %s [options]\n", argv[0]);
        getopt_do_usage(getopt);
        exit(0);
    }

    // Initialize camera
    VideoCapture cap(0);
    if (!cap.isOpened()) {
        cerr << "Couldn't open video capture device" << endl;
        return -1;
    }

    // Initialize tag detector with options
    apriltag_family_t *tf = NULL;
    const char *famname = getopt_get_string(getopt, "family");
    if (!strcmp(famname, "tag36h11")) {
        tf = tag36h11_create();
    } else if (!strcmp(famname, "tag25h9")) {
        tf = tag25h9_create();
    } else if (!strcmp(famname, "tag16h5")) {
        tf = tag16h5_create();
    } else if (!strcmp(famname, "tagCircle21h7")) {
        tf = tagCircle21h7_create();
    } else if (!strcmp(famname, "tagCircle49h12")) {
        tf = tagCircle49h12_create();
    } else if (!strcmp(famname, "tagStandard41h12")) {
        tf = tagStandard41h12_create();
    } else if (!strcmp(famname, "tagStandard52h13")) {
        tf = tagStandard52h13_create();
    } else if (!strcmp(famname, "tagCustom48h12")) {
        tf = tagCustom48h12_create();
    } else {
        printf("Unrecognized tag family name. Use e.g. \"tag36h11\".\n");
        exit(-1);
    }

    const char *axisname = getopt_get_string(getopt, "axis");
    unsigned int coordinate;
    if(!strcmp(axisname,"world"))
        coordinate = 1;
    else
        coordinate = 0;

//  input camera intrinsic matrix
    apriltag_detection_info_t info;
    info.tagsize = 0.146-0.012*4;   //The size of Tag
    info.det = NULL;
    info.fx = 848.469;   //focal length of x
    info.fy = 847.390;   //focal length of y
    info.cx = cap.get(CV_CAP_PROP_FRAME_WIDTH)/2;
    info.cy = cap.get(CV_CAP_PROP_FRAME_HEIGHT)/2;

    apriltag_detector_t *td = apriltag_detector_create();
    apriltag_detector_add_family(td, tf);
    td->quad_decimate = getopt_get_double(getopt, "decimate");
    td->quad_sigma = getopt_get_double(getopt, "blur");
    td->nthreads = getopt_get_int(getopt, "threads");
    td->debug = getopt_get_bool(getopt, "debug");
    td->refine_edges = getopt_get_bool(getopt, "refine-edges");

    Mat frame, gray;
    while (true) {
        cap >> frame;
        cvtColor(frame, gray, COLOR_BGR2GRAY);

        // Make an image_u8_t header for the Mat data
        image_u8_t im = { .width = gray.cols,
            .height = gray.rows,
            .stride = gray.cols,
            .buf = gray.data
        };

        zarray_t *detections = apriltag_detector_detect(td, &im);

        // Draw detection outlines
        for (int i = 0; i < zarray_size(detections); i++) {
            apriltag_detection_t *det;
            zarray_get(detections, i, &det);
            info.det = det;
            apriltag_pose_t pose;
            estimate_tag_pose(&info, &pose);
            //calculate cam position in world coordinate
            if (coordinate)
            {
                Mat rvec(3, 3, CV_64FC1, pose.R->data); //rotation matrix
                Mat tvec(3, 1, CV_64FC1, pose.t->data); //translation matrix
                Mat Pos(3, 3, CV_64FC1);
                Pos = rvec.inv() * tvec;
                cout << "x: " << Pos.ptr(0)[0] << endl;
                cout << "y: " << Pos.ptr(1)[0] << endl;
                cout << "z: " << Pos.ptr(2)[0] << endl;
                cout << "-----------world--------------" << endl;
            }
            else
            {
                cout << "x: " << pose.t ->data[0] << endl;
                cout << "y: " << pose.t ->data[1] << endl;
                cout << "z: " << pose.t ->data[2] << endl;
                cout << "-----------camera-------------" << endl;
            }

            //draw the line and show tag ID
            line(frame, Point(det->p[0][0], det->p[0][1]),
                     Point(det->p[1][0], det->p[1][1]),
                     Scalar(0, 0xff, 0), 2);
            line(frame, Point(det->p[0][0], det->p[0][1]),
                     Point(det->p[3][0], det->p[3][1]),
                     Scalar(0, 0, 0xff), 2);
            line(frame, Point(det->p[1][0], det->p[1][1]),
                     Point(det->p[2][0], det->p[2][1]),
                     Scalar(0xff, 0, 0), 2);
            line(frame, Point(det->p[2][0], det->p[2][1]),
                     Point(det->p[3][0], det->p[3][1]),
                     Scalar(0xff, 0, 0), 2);

            stringstream ss;
            ss << det->id;
            String text = ss.str();
            int fontface = FONT_HERSHEY_SCRIPT_SIMPLEX;
            double fontscale = 1.0;
            int baseline;
            Size textsize = getTextSize(text, fontface, fontscale, 2,
                                            &baseline);
            putText(frame, text, Point(det->c[0]-textsize.width/2,
                                       det->c[1]+textsize.height/2),
                    fontface, fontscale, Scalar(0xff, 0x99, 0), 2);
        }
        apriltag_detections_destroy(detections);

        imshow("Tag Detections", frame);
        if (waitKey(30) >= 0)
            break;
    }

    apriltag_detector_destroy(td);

    if (!strcmp(famname, "tag36h11")) {
        tag36h11_destroy(tf);
    } else if (!strcmp(famname, "tag25h9")) {
        tag25h9_destroy(tf);
    } else if (!strcmp(famname, "tag16h5")) {
        tag16h5_destroy(tf);
    } else if (!strcmp(famname, "tagCircle21h7")) {
        tagCircle21h7_destroy(tf);
    } else if (!strcmp(famname, "tagCircle49h12")) {
        tagCircle49h12_destroy(tf);
    } else if (!strcmp(famname, "tagStandard41h12")) {
        tagStandard41h12_destroy(tf);
    } else if (!strcmp(famname, "tagStandard52h13")) {
        tagStandard52h13_destroy(tf);
    } else if (!strcmp(famname, "tagCustom48h12")) {
        tagCustom48h12_destroy(tf);
    }

    getopt_destroy(getopt);
    return 0;
}

参考资料：

三维空间坐标系转换：https://blog.csdn.net/fireflychh/article/details/82352710

相机位姿估计：https://www.cnblogs.com/singlex/p/pose_estimation_0.html

AprilTag原理：https://www.cnblogs.com/brt2/p/13547075.html

https://zhuanlan.zhihu.com/p/53367734