KinectFusion is the pioneering work of RGBD SLAM reconstruction and a classic algorithm in 3D reconstruction. This post is a write-up of my hands-on work with KinectFusion, partly to push myself to finish applying the code. My level is limited and this is purely for learning; writing things down helps me settle down and sort out my understanding.
Code references:
Python version: https://github.com/JingwenWang95/KinectFusion
C++ version: https://github.com/Nerei/kinfu_remake
Prerequisites:
RGB-D images, mesh reconstruction, quaternions, the camera model
OpenCV documentation on camera calibration and 3D reconstruction: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#decomposeprojectionmatrix
The RGBD SLAM reconstruction dataset used most often is the TUM RGB-D dataset: https://vision.in.tum.de/data/datasets/rgbd-dataset/download
The dataset contains pairs of RGB color images and depth images in PNG format. Depth values are scaled by a factor of 5000: a depth-image pixel value of 5000 means 1 m from the camera, 10000 means 2 m, and a value of 0 marks a missing measurement. rgb.txt and depth.txt record the capture timestamp and file name of every image (the file names are themselves timestamps). Because the RGB camera and the depth sensor are separate devices, an RGB image and a depth image are almost never captured at exactly the same instant, so a preprocessing step is needed to establish a one-to-one association between RGB and depth frames.
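As a quick illustration of the scale factor (not part of the repo; the file name below is just a placeholder), converting one TUM depth PNG to metric depth looks like this:

import imageio
import numpy as np

depth_raw = np.array(imageio.imread("depth/1305031102.160407.png"))  # hypothetical file name
depth_m = depth_raw.astype(np.float32) / 5000.0  # pixel value 5000 -> 1 m
depth_m[depth_raw == 0] = np.nan                 # 0 means no measurement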
Besides the depth and color images needed for reconstruction, the TUM dataset also provides the ground-truth camera trajectory, i.e. the position and orientation of the camera at the moment each frame was captured, in a fixed coordinate system. In groundtruth.txt, timestamp is the time in the Unix epoch, $t_x, t_y, t_z$ give the position of the camera's optical center with respect to the world origin defined by the motion-capture system, and $q_x, q_y, q_z, q_w$ give the orientation of the optical center relative to the world origin as a quaternion.
In practice, the focal lengths $f_x, f_y$ and the optical center $c_x, c_y$ of the camera are essential for associating the depth and RGB images. The camera intrinsic matrix is:
$$
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$
The TUM dataset was captured with three devices, corresponding to three sets of camera intrinsics: fr1, fr2 and fr3.
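For example, using the fr1 values that get_calib() returns later in this post, the intrinsic matrix is assembled as follows (a minimal sketch):

import numpy as np

fx, fy, cx, cy = 517.306408, 516.469215, 318.643040, 255.313989  # fr1 intrinsics from get_calib()
K = np.array([[fx, 0., cx],
              [0., fy, cy],
              [0., 0., 1.]])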
For a beginner (like me), understanding each step of the program means building from the ground up. All kinds of confusion come up in practice, and CSDN and cnblogs posts are a good way to work through it. For the TUM dataset, I recommend 半闲居士's series 一起做RGB-D SLAM.
TSDF: TSDF stands for truncated signed distance function, a grid-style map. First choose the 3D volume to be modeled, say 3×3×3 m^3, then split it at a chosen resolution into many small voxels and store information inside each one. The TSDF volume lives entirely in GPU memory rather than main memory; since the computation for each voxel is independent of the others, the GPU's parallelism can be exploited to compute and update all voxels in parallel (see https://blog.csdn.net/qinqinxiansheng/article/details/119449196).
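Concretely, each voxel stores the signed distance to the nearest observed surface, truncated to a narrow band. A minimal sketch of how one depth measurement turns into a voxel's value (the names are mine, but this mirrors the clamp used in integrate() later on):

import torch

def truncated_sdf(depth_meas, voxel_cam_z, sdf_trunc):
    # depth_meas, voxel_cam_z: torch tensors (measured depth and voxel depth in the camera frame)
    # positive in front of the surface, negative behind, clipped to the truncation band
    sdf = depth_meas - voxel_cam_z
    return torch.clamp(sdf / sdf_trunc, max=1.0)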
Image pyramid:
The core of an RGBD SLAM algorithm is to reconstruct the 3D scene from multiple RGB-D frames captured by the camera.
A depth map is essentially 2.5D information: a color image (pixels indexed by u = (x, y)) together with a depth image (the depth D at each pixel). Capturing a depth map is a transformation from world coordinates to camera coordinates. From the measured depth, the algorithm back-projects the RGB-D frame into a point cloud and then computes a vertex map and a normal map from it (see the sketch after this list).
Depth Map
|— Vertex Map
|— Normal Map
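The back-projection is just the inverse pinhole model: a pixel $(u, v)$ with depth $D$ maps to the camera-frame point $D \cdot K^{-1}(u, v, 1)^\top$, and normals come from cross products of neighboring vertex differences. A rough numpy sketch (the repo's compute_vertex / compute_normal in icp.py do the same thing in torch, with Sobel gradients instead of plain differences):

import numpy as np

def depth_to_vertex_map(depth, K):
    # back-project a depth map (H, W) into a camera-frame vertex map (H, W, 3)
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def vertex_to_normal_map(vertex):
    # normals from the cross product of finite differences of the vertex map
    du = np.diff(vertex, axis=1, append=vertex[:, -1:])
    dv = np.diff(vertex, axis=0, append=vertex[-1:, :])
    n = np.cross(du, dv)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)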
Camera pose estimation
Camera pose estimation is a crucial step of the reconstruction. Here the ICP algorithm is used to register different frames: the camera pose $T_{g,k}$ is estimated by finding point-to-point correspondences between depth frames.
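Concretely, the tracker minimizes the rigid transform over the point-to-plane ICP error, which is the residual assembled later in icp.py:
$$
E(T_{g,k}) = \sum_i \Big( \mathbf{n}_i^\top \big( T_{g,k}\,\mathbf{p}_i - \mathbf{q}_i \big) \Big)^2
$$
where $\mathbf{p}_i$ is a vertex of the current frame, $\mathbf{q}_i$ its associated vertex in the reference frame, and $\mathbf{n}_i$ the corresponding normal.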
Volumetric Integration
Volumetric integration uses the TSDF (truncated signed distance function) representation. A large 3D volume is allocated in advance to hold the reconstructed scene; its content is the accumulation of the spatial slices observed in different frames, and the voxels are updated incrementally as new depth maps are read in.
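Each new frame is fused with a running weighted average per voxel, which is exactly the update performed by integrate() in fusion.py:
$$
F_k(p) = \frac{W_{k-1}(p)\,F_{k-1}(p) + w_k\,f_k(p)}{W_{k-1}(p) + w_k},
\qquad
W_k(p) = W_{k-1}(p) + w_k
$$
where $f_k(p)$ is the truncated signed distance measured for voxel $p$ in frame $k$ and $w_k$ is the observation weight (obs_weight in the code).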
Raycasting
Used to render the surface of the reconstructed model; the rendered depth map is also what the tracker aligns the next frame against.
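The idea is to march along each camera ray, sample the TSDF, and take the first positive-to-negative sign change as the surface. A minimal per-ray sketch (render_model in fusion.py does this for all rays at once on the GPU, with a slightly different refinement step):

import torch

def first_zero_crossing(tsdf_samples, z_vals):
    # depth of the first +/- sign change along one ray, or None if the ray misses the surface
    sign_change = (tsdf_samples[:-1] > 0) & (tsdf_samples[1:] < 0)
    idx = torch.nonzero(sign_change)
    if idx.numel() == 0:
        return None
    i = idx[0, 0]
    # linearly interpolate between the two samples that bracket the surface
    t = tsdf_samples[i] / (tsdf_samples[i] - tsdf_samples[i + 1])
    return z_vals[i] + t * (z_vals[i + 1] - z_vals[i])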
KinectFusion needs GPU acceleration to run; Jingwen's version of KinectFusion uses the PyTorch framework instead of writing CUDA directly.
Before running the reconstruction algorithm itself, the author does some preprocessing of the dataset. preprocess.py is executed before the reconstruction program; its purpose is to associate the RGB and depth file names and to save them in a format that is easier to process.
preprocess.py
import os
import math
import shutil
import numpy as np
import argparse
from tum_rgbd import get_calib
from utils import load_config
def read_file_list(filename):
"""
    Reads a trajectory from a text file, following the TUM camera trajectory data format.
File format:
The file format is "stamp d1 d2 d3 ...", where stamp denotes the time stamp (to be matched)
and "d1 d2 d3.." is arbitary data (e.g., a 3D position and 3D orientation) associated to this timestamp.
Input:
filename -- File name
Output:
dict -- dictionary of (stamp,data) tuples
"""
file = open(filename)
data = file.read()
lines = data.replace(","," ").replace("\t"," ").split("\n")
list = [[v.strip() for v in line.split(" ") if v.strip()!=""] for line in lines if len(line)>0 and line[0]!="#"]
list = [(float(l[0]),l[1:]) for l in list if len(l)>1]
return dict(list)
# The depth camera and the color camera capture frames at slightly different times, so they need to be aligned; the maximum allowed time difference is 0.02 s.
# RGB and depth images taken at (nearly) the same time are written on the same line of the association file, which marks them as one RGB-D pair.
def associate(first_list, second_list, offset=0.0, max_difference=0.02):
"""
Associate two dictionaries of (stamp,data). As the time stamps never match exactly, we aim
to find the closest match for every input tuple.
Input:
first_list -- first dictionary of (stamp,data) tuples
second_list -- second dictionary of (stamp,data) tuples
    offset -- time offset between both dictionaries (e.g., to model the delay between the sensors)
max_difference -- search radius for candidate generation
Output:
matches -- list of matched tuples ((stamp1,data1),(stamp2,data2))
"""
first_keys = list(first_list)
second_keys = list(second_list)
potential_matches = [(abs(a - (b + offset)), a, b)
for a in first_keys
for b in second_keys
if abs(a - (b + offset)) < max_difference]
potential_matches.sort()
matches = []
for diff, a, b in potential_matches:
if a in first_keys and b in second_keys:
first_keys.remove(a)
second_keys.remove(b)
matches.append((a, b))
matches.sort()
return matches
# compute the matched file pairs from the two input files and save them
def get_association(file_a, file_b, out_file):
first_list = read_file_list(file_a)
second_list = read_file_list(file_b)
matches = associate(first_list, second_list)
with open(out_file, "w") as f:
for a, b in matches:
line = "%f %s %f %s\n" % (a, " ".join(first_list[a]), b, " ".join(second_list[b]))
f.write(line)
# camera pose information: convert a translation + quaternion into a 4x4 homogeneous transformation matrix
def tum2matrix(pose):
"""
    Return a homogeneous transformation matrix from a TUM pose (translation + quaternion).
"""
    # the first 3 entries are the translation
t = pose[:3]
# under TUM format q is in the order of [x, y, z, w], need change to [w, x, y, z]
quaternion = [pose[6], pose[3], pose[4], pose[5]]
q = np.array(quaternion, dtype=np.float64, copy=True)
n = np.dot(q, q)
if n < np.finfo(np.float64).eps:
return np.identity(4)
q *= math.sqrt(2.0 / n)
q = np.outer(q, q)
return np.array([
[1.0-q[2, 2]-q[3, 3], q[1, 2]-q[3, 0], q[1, 3]+q[2, 0], t[0]],
[ q[1, 2]+q[3, 0], 1.0-q[1, 1]-q[3, 3], q[2, 3]-q[1, 0], t[1]],
[ q[1, 3]-q[2, 0], q[2, 3]+q[1, 0], 1.0-q[1, 1]-q[2, 2], t[2]],
[0., 0., 0., 1.]])
# read camera poses from an association file
def get_poses_from_associations(fname):
poses = []
with open(fname) as f:
for line in f.readlines():
pose_str = line.strip("\n").split(" ")[-7:]
pose = [float(p) for p in pose_str]
poses += [tum2matrix(pose)]
return poses
if __name__ == "__main__":
parser = argparse.ArgumentParser()
    # input: the dataset to be processed; outputs are saved into the processed/ folder
# standard configs
parser.add_argument('--config', type=str, default="../configs/fr1_desk.yaml", help='Path to config file.')
args = load_config(parser.parse_args())
out_dir = os.path.join(args.data_root, "processed")
# create association files
get_association(os.path.join(args.data_root, "depth.txt"), os.path.join(args.data_root, "groundtruth.txt"), os.path.join(args.data_root, "dep_traj.txt"))
get_association(os.path.join(args.data_root, "rgb.txt"), os.path.join(args.data_root, "dep_traj.txt"), os.path.join(args.data_root, "rgb_dep_traj.txt"))
if not os.path.exists(out_dir):
os.makedirs(out_dir)
out_rgb_dir = os.path.join(out_dir, "rgb")
if not os.path.exists(out_rgb_dir):
os.makedirs(out_rgb_dir)
out_dep_dir = os.path.join(out_dir, "depth")
if not os.path.exists(out_dep_dir):
os.makedirs(out_dep_dir)
# rename image files and save c2w poses
    # copy the image files under new, sequentially numbered names
poses = []
with open(os.path.join(args.data_root, "rgb_dep_traj.txt")) as f:
for i, line in enumerate(f.readlines()):
line_list = line.strip().split(" ")
rgb_file = line_list[1]
shutil.copyfile(os.path.join(args.data_root, rgb_file), os.path.join(out_rgb_dir, "%04d.png" % i))
dep_file = line_list[3]
shutil.copyfile(os.path.join(args.data_root, dep_file), os.path.join(out_dep_dir, "%04d.png" % i))
poses += [tum2matrix([float(x) for x in line_list[5:]])]
np.savez(os.path.join(out_dir, "raw_poses.npz"), c2w_mats=poses)
    # save projection matrices (P = K @ w2c)
K = np.eye(3)
    intri = get_calib()[args.data_type]  # camera intrinsics for this sequence type (fr1/fr2/fr3)
K[0, 0] = intri[0]
K[1, 1] = intri[1]
K[0, 2] = intri[2]
K[1, 2] = intri[3]
camera_dict = np.load(os.path.join(out_dir, "raw_poses.npz"))
poses = camera_dict["c2w_mats"]
P_mats = []
for c2w in poses:
w2c = np.linalg.inv(c2w)
P = K @ w2c[:3, :]
P_mats += [P]
np.savez(os.path.join(out_dir, "cameras.npz"), world_mats=P_mats)
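After the script has run, the processed/ folder can be sanity-checked with a couple of lines (a quick sketch; replace the path with whatever data_root your config points to):

import numpy as np

out_dir = "path/to/tum_sequence/processed"   # <data_root>/processed
cams = np.load(out_dir + "/cameras.npz")
poses = np.load(out_dir + "/raw_poses.npz")
print(cams["world_mats"].shape)   # (N, 3, 4): per-frame P = K @ w2c
print(poses["c2w_mats"].shape)    # (N, 4, 4): per-frame camera-to-world pose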
Let's start from the main function, kinfu.py.
This is the entry point of the whole pipeline: it takes an RGB-D dataset as input, reconstructs a mesh, and saves it as a .ply file.
import os
import argparse  # parses command-line arguments and options
import numpy as np
import torch
import cv2
import trimesh  # used to build and export the mesh
from matplotlib import pyplot as plt
from fusion import TSDFVolumeTorch  # external module used for volumetric integration
from dataset.tum_rgbd import TUMDataset, TUMDatasetOnline  # loaders for the RGB-D dataset
from tracker import ICPTracker  # tracks the camera pose (estimated from the depth images via ICP)
from utils import load_config, get_volume_setting, get_time  # helpers for configs, volume settings and timing
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# standard configs
    # set when launching the script: the dataset config and the directory where results are saved
parser.add_argument('--config', type=str, default="configs/fr1_desk.yaml", help='Path to config file.')
parser.add_argument("--save_dir", type=str, default=None, help="Directory of saving results.")
args = load_config(parser.parse_args())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = torch.device("cpu")
    # load the dataset using the helpers in tum_rgbd
    # each sample provides the camera-to-world matrix, the camera intrinsics, the RGB image and the depth image
dataset = TUMDataset(os.path.join(args.data_root), device, near=args.near, far=args.far, img_scale=0.25)
    # image height and width
H, W = dataset.H, dataset.W
    # the reconstruction volume is defined in the config file
vol_dims, vol_origin, voxel_size = get_volume_setting(args)
    # TSDF volume that fuses the per-frame depth measurements into one volumetric model (from surfaces to a volume), running on the GPU
tsdf_volume = TSDFVolumeTorch(vol_dims, vol_origin, voxel_size, device, margin=3, fuse_color=True)
    # tracks the camera pose by aligning depth frames
icp_tracker = ICPTracker(args, device)
    # per-frame time, estimated poses, ground-truth poses
t, poses, poses_gt = list(), list(), list()
curr_pose, depth1, color1 = None, None, None
for i in range(0, len(dataset), 1):
t0 = get_time()
sample = dataset[i]
color0, depth0, pose_gt, K = sample # use live image as template image (0)
# depth0[depth0 <= 0.5] = 0.
        # initialize the pose
if i == 0: # initialize
curr_pose = pose_gt
else: # tracking
# 1. render depth image (1) from tsdf volume
depth1, color1, vertex01, normal1, mask1 = tsdf_volume.render_model(curr_pose, K, H, W, near=args.near, far=args.far, n_samples=args.n_steps)
            # depth0 is the live depth frame; depth1 is the depth rendered from the TSDF model at the current pose estimate
T10 = icp_tracker(depth0, depth1, K) # transform from 0 to 1
curr_pose = curr_pose @ T10
# fusion
        # fuse the current depth frame into the TSDF volume
tsdf_volume.integrate(depth0,
K,
curr_pose,
obs_weight=1.,
color_img=color0
)
t1 = get_time()
t += [t1 - t0]
print("processed frame: {:d}, time taken: {:f}s".format(i, t1 - t0))
poses += [curr_pose.cpu().numpy()]
poses_gt += [pose_gt.cpu().numpy()]
avg_time = np.array(t).mean()
print("average processing time: {:f}s per frame, i.e. {:f} fps".format(avg_time, 1. / avg_time))
# compute tracking ATE
    # the absolute trajectory error (ATE) is used to evaluate tracking quality
poses_gt = np.stack(poses_gt, 0)
poses = np.stack(poses, 0)
traj_gt = np.array(poses_gt)[:, :3, 3]
traj = np.array(poses)[:, :3, 3]
rmse = np.sqrt(np.mean(np.linalg.norm(traj_gt - traj, axis=-1) ** 2))
print("RMSE: {:f}".format(rmse))
# plt.plot(traj[:, 0], traj[:, 1])
# plt.plot(traj_gt[:, 0], traj_gt[:, 1])
# plt.legend(['Estimated', 'GT'])
# plt.show()
# save results
if args.save_dir is not None:
if not os.path.exists(args.save_dir):
os.makedirs(args.save_dir)
        # extract and export the reconstructed mesh
verts, faces, norms, colors = tsdf_volume.get_mesh()
partial_tsdf = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=norms, vertex_colors=colors)
partial_tsdf.export(os.path.join(args.save_dir, "mesh.ply"))
np.savez(os.path.join(args.save_dir, "traj.npz"), poses=poses)
np.savez(os.path.join(args.save_dir, "traj_gt.npz"), poses=poses_gt)
tum_rgbd.py
This module's main job is to provide four arrays per frame while the program runs: the RGB image, the depth image, the camera-to-world transformation matrix and the camera intrinsics, i.e. rgb, depth, c2w, K.
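The relation behind load_K_Rt_from_P below: cameras.npz stores the projection matrix $P = K\,[R \mid t]$ (world-to-camera), and the loader recovers the intrinsics and the camera-to-world pose from it,
$$
P = K\,[R \mid t], \qquad
T_{wc} = \begin{bmatrix} R^\top & C \\ \mathbf{0}^\top & 1 \end{bmatrix}, \qquad C = -R^\top t,
$$
where $C$ is the camera center that cv2.decomposeProjectionMatrix returns in homogeneous coordinates.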
import torch
from os import path
from tqdm import tqdm
import imageio
import cv2
import numpy as np
import open3d as o3d
def get_calib():  # The TUM dataset was recorded with three cameras. This returns the intrinsics fx, fy, cx, cy used when the images were captured; the focal length and optical center are assumed fixed for each camera.
return {
"fr1": [517.306408, 516.469215, 318.643040, 255.313989],
"fr2": [520.908620, 521.007327, 325.141442, 249.701764],
"fr3": [535.4, 539.2, 320.1, 247.6]
}
# Note: this step converts w2c (Tcw) into c2w (Twc),
# i.e. the world-to-camera matrix becomes a camera-to-world matrix,
# and also recovers the camera intrinsics and the rotation/translation
def load_K_Rt_from_P(P):
"""
modified from IDR https://github.com/lioryariv/idr
"""
    # see the OpenCV documentation on decomposeProjectionMatrix
out = cv2.decomposeProjectionMatrix(P)
K = out[0]
R = out[1]
t = out[2]
    # normalize the intrinsic matrix
K = K/K[2,2]
intrinsics = np.eye(4)
intrinsics[:3, :3] = K
pose = np.eye(4, dtype=np.float32)
pose[:3, :3] = R.transpose() # convert from w2c to c2w
pose[:3, 3] = (t[:3] / t[3])[:, 0]
return intrinsics, pose
# dataset class for TUM data
class TUMDataset(torch.utils.data.Dataset):
"""
TUM dataset loader, pre-load images in advance
"""
    # the loading parameters are defined here
def __init__(
self,
rootdir,
device,
        near: float = 0.2,  # nearest valid depth
        far: float = 5.,  # farthest valid depth
img_scale: float = 1., # image scale factor
start: int = -1,
end: int = -1,
):
super().__init__()
assert path.isdir(rootdir), f"'{rootdir}' is not a directory"
self.device = device
        self.c2w_all = []  # camera-to-world transformation matrices
        self.K_all = []  # camera intrinsics
        self.rgb_all = []  # RGB images
        self.depth_all = []  # depth images
# root should be tum_sequence
        data_path = path.join(rootdir, "processed")  # the re-organized RGB-D data
        cam_file = path.join(data_path, "cameras.npz")  # camera matrices file
print("LOAD DATA", data_path)
# world_mats, normalize_mat
        cam_dict = np.load(cam_file)  # camera matrices
        world_mats = cam_dict["world_mats"]  # K @ w2c
        d_min = []  # per-frame minimum depth
        d_max = []  # per-frame maximum depth
# TUM saves camera poses in OpenCV convention
        # tqdm shows progress information
for i, world_mat in enumerate(tqdm(world_mats)):
            # ignore all the frames before
if start > 0 and i < start:
continue
# ignore all the frames after
if 0 < end < i:
break
intrinsics, c2w = load_K_Rt_from_P(world_mat)
c2w = torch.tensor(c2w, dtype=torch.float32)
# read images
            # load the depth and color images and keep them as arrays
rgb = np.array(imageio.imread(path.join(data_path, "rgb/{:04d}.png".format(i)))).astype(np.float32)
depth = np.array(imageio.imread(path.join(data_path, "depth/{:04d}.png".format(i)))).astype(np.float32)
depth /= 5000. # TODO: put depth factor to args
d_max += [depth.max()]
d_min += [depth.min()]
# depth = cv2.bilateralFilter(depth, 5, 0.2, 15)
# print(depth[depth > 0.].min())
            # anything outside the depth range is marked as invalid
invalid = (depth < near) | (depth > far)
depth[invalid] = -1.
# downscale the image size if needed
if img_scale < 1.0:
full_size = list(rgb.shape[:2])
rsz_h, rsz_w = [round(hw * img_scale) for hw in full_size]
# TODO: figure out which way is better: skimage.rescale or cv2.resize
rgb = cv2.resize(rgb, (rsz_w, rsz_h), interpolation=cv2.INTER_AREA)
depth = cv2.resize(depth, (rsz_w, rsz_h), interpolation=cv2.INTER_NEAREST)
intrinsics[0, 0] *= img_scale
intrinsics[1, 1] *= img_scale
intrinsics[0, 2] *= img_scale
intrinsics[1, 2] *= img_scale
self.c2w_all.append(c2w)
self.K_all.append(torch.from_numpy(intrinsics[:3, :3]))
self.rgb_all.append(torch.from_numpy(rgb))
self.depth_all.append(torch.from_numpy(depth))
print("Depth min: {:f}".format(np.array(d_min).min()))
print("Depth max: {:f}".format(np.array(d_max).max()))
self.n_images = len(self.rgb_all)
self.H, self.W, _ = self.rgb_all[0].shape
def __len__(self):
return self.n_images
def __getitem__(self, idx):
return self.rgb_all[idx].to(self.device), self.depth_all[idx].to(self.device), \
self.c2w_all[idx].to(self.device), self.K_all[idx].to(self.device)
class TUMDatasetOnline(torch.utils.data.Dataset):
"""
Online TUM dataset loader, load images when __getitem__() is called
"""
def __init__(
self,
rootdir,
device,
near: float = 0.2,
far: float = 5.,
img_scale: float = 1., # image scale factor
start: int = -1,
end: int = -1,
):
super().__init__()
assert path.isdir(rootdir), f"'{rootdir}' is not a directory"
self.device = device
self.img_scale = img_scale
self.near = near
self.far = far
self.c2w_all = []
self.K_all = []
self.rgb_files_all = []
self.depth_files_all = []
# root should be tum_sequence
data_path = path.join(rootdir, "processed")
cam_file = path.join(data_path, "cameras.npz")
print("LOAD DATA", data_path)
# world_mats, normalize_mat
cam_dict = np.load(cam_file)
world_mats = cam_dict["world_mats"] # K @ w2c
# TUM saves camera poses in OpenCV convention
for i, world_mat in enumerate(world_mats):
            # ignore all the frames before
if start > 0 and i < start:
continue
# ignore all the frames after
if 0 < end < i:
break
intrinsics, c2w = load_K_Rt_from_P(world_mat)
c2w = torch.tensor(c2w, dtype=torch.float32)
self.c2w_all.append(c2w)
self.K_all.append(torch.from_numpy(intrinsics[:3, :3]))
self.rgb_files_all.append(path.join(data_path, "rgb/{:04d}.png".format(i)))
self.depth_files_all.append(path.join(data_path, "depth/{:04d}.png".format(i)))
self.n_images = len(self.rgb_files_all)
H, W, _ = np.array(imageio.imread(self.rgb_files_all[0])).shape
self.H = round(H * img_scale)
self.W = round(W * img_scale)
def __len__(self):
return self.n_images
def __getitem__(self, idx):
K = self.K_all[idx].to(self.device)
c2w = self.c2w_all[idx].to(self.device)
# read images
rgb = np.array(imageio.imread(self.rgb_files_all[idx])).astype(np.float32)
depth = np.array(imageio.imread(self.depth_files_all[idx])).astype(np.float32)
depth /= 5000.
# depth = cv2.bilateralFilter(depth, 5, 0.2, 15)
depth[depth < self.near] = 0.
depth[depth > self.far] = -1.
# downscale the image size if needed
if self.img_scale < 1.0:
full_size = list(rgb.shape[:2])
rsz_h, rsz_w = [round(hw * self.img_scale) for hw in full_size]
rgb = cv2.resize(rgb, (rsz_w, rsz_h), interpolation=cv2.INTER_AREA)
depth = cv2.resize(depth, (rsz_w, rsz_h), interpolation=cv2.INTER_NEAREST)
K[0, 0] *= self.img_scale
K[1, 1] *= self.img_scale
K[0, 2] *= self.img_scale
K[1, 2] *= self.img_scale
rgb = torch.from_numpy(rgb).to(self.device)
depth = torch.from_numpy(depth).to(self.device)
return rgb, depth, c2w, K
fusion.py
This file implements the TSDF volume (integration of depth frames and ray casting of the model); the image pyramid also shows up here, in render_pyramid.
import os
import numpy as np
from skimage import measure
import torch
import cv2
import open3d as o3d
import imageio
def integrate(
depth_im,
cam_intr,
cam_pose,
    obs_weight,  # observation weight; the KinectFusion algorithm has its own weighting scheme
world_c, # world coordinates grid [nx*ny*nz, 4]
vox_coords, # voxel coordinates grid [nx*ny*nz, 3]
weight_vol, # weight volume [nx, ny, nz]
tsdf_vol, # tsdf volume [nx, ny, nz]
sdf_trunc,
im_h,
im_w,
color_vol=None,
color_im=None,
):
    world2cam = torch.inverse(cam_pose)  # cam_pose is the camera-to-world transform (T_wc); its inverse gives the world-to-camera transform
    cam_c = torch.matmul(world2cam, world_c.transpose(1, 0)).transpose(1, 0).float()  # [nx*ny*nz, 4], transform every voxel's world coordinate into the camera frame
    # Convert camera coordinates to pixel coordinates
fx, fy = cam_intr[0, 0], cam_intr[1, 1]
cx, cy = cam_intr[0, 2], cam_intr[1, 2]
pix_z = cam_c[:, 2]
    # project all the voxels back to the image plane
pix_x = torch.round((cam_c[:, 0] * fx / cam_c[:, 2]) + cx).long() # [nx*ny*nz]
pix_y = torch.round((cam_c[:, 1] * fy / cam_c[:, 2]) + cy).long() # [nx*ny*nz]
    # Eliminate pixels outside the view frustum
valid_pix = (pix_x >= 0) & (pix_x < im_w) & (pix_y >= 0) & (pix_y < im_h) & (pix_z > 0) # [n_valid]
valid_vox_x = vox_coords[valid_pix, 0]
valid_vox_y = vox_coords[valid_pix, 1]
valid_vox_z = vox_coords[valid_pix, 2]
depth_val = depth_im[pix_y[valid_pix], pix_x[valid_pix]] # [n_valid]
# Integrate tsdf
depth_diff = depth_val - pix_z[valid_pix]
dist = torch.clamp(depth_diff / sdf_trunc, max=1)
valid_pts = (depth_val > 0.) & (depth_diff >= -sdf_trunc) # all points 1. inside frustum 2. with valid depth 3. outside -truncate_dist
valid_vox_x = valid_vox_x[valid_pts]
valid_vox_y = valid_vox_y[valid_pts]
valid_vox_z = valid_vox_z[valid_pts]
valid_dist = dist[valid_pts]
w_old = weight_vol[valid_vox_x, valid_vox_y, valid_vox_z]
tsdf_vals = tsdf_vol[valid_vox_x, valid_vox_y, valid_vox_z]
w_new = w_old + obs_weight
tsdf_vol[valid_vox_x, valid_vox_y, valid_vox_z] = (w_old * tsdf_vals + obs_weight * valid_dist) / w_new
weight_vol[valid_vox_x, valid_vox_y, valid_vox_z] = w_new
if color_vol is not None and color_im is not None:
old_color = color_vol[valid_vox_x, valid_vox_y, valid_vox_z]
new_color = color_im[pix_y[valid_pix], pix_x[valid_pix]]
new_color = new_color[valid_pts]
color_vol[valid_vox_x, valid_vox_y, valid_vox_z, :] = (w_old[:, None] * old_color + obs_weight * new_color) / w_new[:, None]
return weight_vol, tsdf_vol, color_vol
class TSDFVolumeTorch:
"""
Volumetric TSDF Fusion of RGB-D Images.
"""
def __init__(self, voxel_dim, origin, voxel_size, device, margin=3, fuse_color=False):
"""
Args:
voxel_dim (ndarray): [3,] stores volume dimensions: Nx, Ny, Nz
origin (ndarray): [3,] world coordinate of voxel [0, 0, 0]
voxel_size (float): The volume discretization in meters.
"""
self.device = device
# Define voxel volume parameters
self.voxel_size = float(voxel_size)
self.sdf_trunc = margin * self.voxel_size
self.integrate_func = integrate
self.fuse_color = fuse_color
# Adjust volume bounds
if isinstance(voxel_dim, list):
voxel_dim = torch.Tensor(voxel_dim).to(self.device)
elif isinstance(voxel_dim, np.ndarray):
voxel_dim = torch.from_numpy(voxel_dim).to(self.device)
if isinstance(origin, list):
origin = torch.Tensor(origin).to(self.device)
elif isinstance(origin, np.ndarray):
origin = torch.from_numpy(origin).to(self.device)
self.vol_dim = voxel_dim.long()
self.vol_origin = origin
self.num_voxels = torch.prod(self.vol_dim).item()
# Get voxel grid coordinates
xv, yv, zv = torch.meshgrid(
torch.arange(0, self.vol_dim[0]),
torch.arange(0, self.vol_dim[1]),
torch.arange(0, self.vol_dim[2]),
)
self.vox_coords = torch.stack([xv.flatten(), yv.flatten(), zv.flatten()], dim=1).long().to(self.device)
# Convert voxel coordinates to world coordinates
self.world_c = self.vol_origin + (self.voxel_size * self.vox_coords)
self.world_c = torch.cat([
self.world_c, torch.ones(len(self.world_c), 1, device=self.device)], dim=1).float()
self.reset()
def reset(self):
"""Set volumes
"""
self.tsdf_vol = torch.ones(*self.vol_dim).to(self.device)
self.weight_vol = torch.zeros(*self.vol_dim).to(self.device)
if self.fuse_color:
# [nx, ny, nz, 3]
self.color_vol = torch.zeros(*self.vol_dim, 3).to(self.device)
else:
self.color_vol = None
def data_transfer(self, data):
if isinstance(data, np.ndarray):
data = torch.from_numpy(data)
return data.float().to(self.device)
@torch.no_grad()
def integrate(self, depth_im, cam_intr, cam_pose, obs_weight, color_img=None):
"""Integrate an RGB-D frame into the TSDF volume.
Args:
depth_im (torch.Tensor): A depth image of shape (H, W).
cam_intr (torch.Tensor): The camera intrinsics matrix of shape (3, 3).
cam_pose (torch.Tensor): The camera pose (i.e. extrinsics) of shape (4, 4). T_wc
obs_weight (float): The weight to assign to the current observation.
"""
cam_pose = self.data_transfer(cam_pose)
cam_intr = self.data_transfer(cam_intr)
depth_im = self.data_transfer(depth_im)
if color_img is not None:
color_img = self.data_transfer(color_img)
else:
color_img = None
im_h, im_w = depth_im.shape
# fuse
weight_vol, tsdf_vol, color_vol = self.integrate_func(
depth_im,
cam_intr,
cam_pose,
obs_weight,
self.world_c,
self.vox_coords,
self.weight_vol,
self.tsdf_vol,
self.sdf_trunc,
im_h, im_w,
self.color_vol,
color_img,
)
self.weight_vol = weight_vol
self.tsdf_vol = tsdf_vol
self.color_vol = color_vol
def get_volume(self):
return self.tsdf_vol, self.weight_vol, self.color_vol
def get_mesh(self):
"""Compute a mesh from the voxel volume using marching cubes.
"""
tsdf_vol, weight_vol, color_vol = self.get_volume()
verts, faces, norms, vals = measure.marching_cubes(tsdf_vol.cpu().numpy(), level=0)
verts_ind = np.round(verts).astype(int)
verts = verts * self.voxel_size + self.vol_origin.cpu().numpy() # voxel grid coordinates to world coordinates
if self.fuse_color:
rgb_vals = color_vol[verts_ind[:, 0], verts_ind[:, 1], verts_ind[:, 2]].cpu().numpy()
return verts, faces, norms, rgb_vals.astype(np.uint8)
else:
return verts, faces, norms
def to_o3d_mesh(self):
"""Convert to o3d mesh object for visualization
"""
verts, faces, norms, colors = self.get_mesh()
mesh = o3d.geometry.TriangleMesh()
mesh.vertices = o3d.utility.Vector3dVector(verts.astype(float))
mesh.triangles = o3d.utility.Vector3iVector(faces.astype(np.int32))
mesh.vertex_colors = o3d.utility.Vector3dVector(colors / 255.)
return mesh
def get_normals(self):
"""Compute normal volume
"""
nx, ny, nz = self.vol_dim
device = self.device
# dx = torch.cat([torch.zeros(1, ny, nz).to(device), (self.tsdf_vol[2:, :, :] - self.tsdf_vol[:-2, :, :]) / (2 * self.voxel_size), torch.zeros(1, ny, nz).to(device)], dim=0)
# dy = torch.cat([torch.zeros(nx, 1, nz).to(device), (self.tsdf_vol[:, 2:, :] - self.tsdf_vol[:, :-2, :]) / (2 * self.voxel_size), torch.zeros(nx, 1, nz).to(device)], dim=1)
# dz = torch.cat([torch.zeros(nx, ny, 1).to(device), (self.tsdf_vol[:, :, 2:] - self.tsdf_vol[:, :, :-2]) / (2 * self.voxel_size), torch.zeros(nx, ny, 1).to(device)], dim=2)
# norms = torch.stack([dx, dy, dz], -1)
dx = torch.cat([(self.tsdf_vol[1:, :, :] - self.tsdf_vol[:-1, :, :]) / self.voxel_size, torch.zeros(1, ny, nz).to(device)], dim=0)
dy = torch.cat([(self.tsdf_vol[:, 1:, :] - self.tsdf_vol[:, :-1, :]) / self.voxel_size, torch.zeros(nx, 1, nz).to(device)], dim=1)
dz = torch.cat([(self.tsdf_vol[:, :, 1:] - self.tsdf_vol[:, :, :-1]) / self.voxel_size, torch.zeros(nx, ny, 1).to(device)], dim=2)
norms = torch.stack([dx, dy, dz], -1)
n = torch.norm(norms, dim=-1)
# remove large values
outliers_mask = n > 1. / (2 * self.voxel_size)
norms[outliers_mask] = 0.
# normalize
eps = 1e-7
non_zero_grad = n > eps
norms[non_zero_grad, :] = norms[non_zero_grad, :] / n[non_zero_grad][:, None]
return norms # [nx, ny, nz, 3]
def get_nn(self, field_vol, coords_w):
"""Get nearest-neigbor values from a given volume
"""
field_dim = field_vol.shape
assert len(field_dim) == 3 or len(field_dim) == 4
vox_coord_float = (coords_w - self.vol_origin[None, :]) / self.voxel_size
vox_coord = torch.floor(vox_coord_float)
vox_offset = vox_coord_float - vox_coord # [N, 3]
vox_coord[vox_offset >= 0.5] += 1.
vox_coord[:, 0] = torch.clamp(vox_coord[:, 0], 0., self.vol_dim[0] - 1)
vox_coord[:, 1] = torch.clamp(vox_coord[:, 1], 0., self.vol_dim[1] - 1)
vox_coord[:, 2] = torch.clamp(vox_coord[:, 2], 0., self.vol_dim[2] - 1)
vox_coord = vox_coord.long()
vx, vy, vz = vox_coord[:, 0], vox_coord[:, 1], vox_coord[:, 2]
v_nn = field_vol[vx, vy, vz]
return v_nn
def tril_interp(self, field_vol, coords_w):
"""Get tri-linear interpolated value from a given volume
"""
field_dim = field_vol.shape
assert len(field_dim) == 3 or len(field_dim) == 4
n_pts = coords_w.shape[0]
vox_coord = torch.floor((coords_w - self.vol_origin[None, :]) / self.voxel_size).long() # [N, 3]
# for border points, don't do interpolation
non_border_mask = (vox_coord[:, 0] < self.vol_dim[0] - 1) & (vox_coord[:, 1] < self.vol_dim[1] - 1) & \
(vox_coord[:, 2] < self.vol_dim[2] - 1)
v_interp = torch.zeros(n_pts) if len(field_dim) == 3 else torch.zeros(n_pts, field_vol.shape[-1])
v_interp = v_interp.to(self.device)
vx_, vy_, vz_ = vox_coord[~non_border_mask, 0], vox_coord[~non_border_mask, 1], vox_coord[~non_border_mask, 2]
v_interp[~non_border_mask] = field_vol[vx_, vy_, vz_]
# get interpolated values for normal points
vx, vy, vz = vox_coord[non_border_mask, 0], vox_coord[non_border_mask, 1], vox_coord[non_border_mask, 2] # [N]
vox_idx = vz + vy * self.vol_dim[-1] + vx * self.vol_dim[-1] * self.vol_dim[-2]
vertices_coord = self.world_c[vox_idx][:, :3] # [N, 3]
r = (coords_w[non_border_mask] - vertices_coord) / self.voxel_size
rx, ry, rz = r[:, 0], r[:, 1], r[:, 2]
if len(field_dim) == 4:
rx = rx.unsqueeze(1)
ry = ry.unsqueeze(1)
rz = rz.unsqueeze(1)
# get values at eight corners
v000 = field_vol[vx, vy, vz]
v001 = field_vol[vx, vy, vz+1]
v010 = field_vol[vx, vy+1, vz]
v011 = field_vol[vx, vy+1, vz+1]
v100 = field_vol[vx+1, vy, vz]
v101 = field_vol[vx+1, vy, vz+1]
v110 = field_vol[vx+1, vy+1, vz]
v111 = field_vol[vx+1, vy+1, vz+1]
v_interp[non_border_mask] = v000 * (1 - rx) * (1 - ry) * (1 - rz) \
+ v001 * (1 - rx) * (1 - ry) * rz \
+ v010 * (1 - rx) * ry * (1 - rz) \
+ v011 * (1 - rx) * ry * rz \
+ v100 * rx * (1 - ry) * (1 - rz) \
+ v101 * rx * (1 - ry) * rz \
+ v110 * rx * ry * (1 - rz) \
+ v111 * rx * ry * rz
return v_interp
def get_pts_inside(self, pts, margin=0):
vox_coord = torch.floor((pts - self.vol_origin[None, :]) / self.voxel_size).long() # [N, 3]
valid_pts_mask = (vox_coord[..., 0] >= margin) & (vox_coord[..., 0] < self.vol_dim[0] - margin) \
& (vox_coord[..., 1] >= margin) & (vox_coord[..., 1] < self.vol_dim[1] - margin) \
& (vox_coord[..., 2] >= margin) & (vox_coord[..., 2] < self.vol_dim[2] - margin)
return valid_pts_mask
# use simple root finding
@torch.no_grad()
def render_model(self, c2w, intri, imh, imw, near=0.5, far=5., n_samples=192):
"""
Perform ray-casting for frame-to-model tracking
:param c2w: camera pose, [4, 4]
:param intri: camera intrinsics, [3, 3]
:param imh: image height
:param imw: image width
:param near: near bound for ray-casting
:param far: far bound for ray-casting
:param n_samples: number of samples along the ray
:return: rendered depth, color, vertex, normal and valid mask, [H, W, C]
"""
rays_o, rays_d = self.get_rays(c2w, intri, imh, imw) # [h, w, 3]
z_vals = torch.linspace(near, far, n_samples).to(rays_o) # [n_samples]
ray_pts_w = (rays_o[:, :, None, :] + rays_d[:, :, None, :] * z_vals[None, None, :, None]).to(self.device) # [h, w, n_samples, 3]
# need to query the tsdf and feature grid
tsdf_vals = torch.ones(imh, imw, n_samples).to(self.device)
# filter points that are outside the volume
valid_ray_pts_mask = self.get_pts_inside(ray_pts_w)
valid_ray_pts = ray_pts_w[valid_ray_pts_mask] # [n_valid, 3]
tsdf_vals[valid_ray_pts_mask] = self.tril_interp(self.tsdf_vol, valid_ray_pts)
# surface prediction by finding zero crossings
sign_matrix = torch.cat([torch.sign(tsdf_vals[..., :-1] * tsdf_vals[..., 1:]),
torch.ones(imh, imw, 1).to(self.device)], dim=-1) # [h, w, n_samples]
cost_matrix = sign_matrix * torch.arange(n_samples, 0, -1).float().to(self.device)[None, None, :] # [h, w, n_samples]
# Get first sign change and mask for values where
# a.) a sign changed occurred and
# b.) not a neg to pos sign change occurred
# c.) ignore border points
values, indices = torch.min(cost_matrix, -1)
mask_sign_change = values < 0
hs, ws = torch.meshgrid(torch.arange(imh), torch.arange(imw))
mask_pos_to_neg = tsdf_vals[hs, ws, indices] > 0
inside_vol = self.get_pts_inside(ray_pts_w[hs, ws, indices])
hit_surface_mask = mask_sign_change & mask_pos_to_neg & inside_vol
hit_pts = ray_pts_w[hs, ws, indices][hit_surface_mask] # [n_surf_pts, 3]
# compute normals
norms = self.get_normals()
surf_tsdf = self.tril_interp(self.tsdf_vol, hit_pts) # [n_surf_pts]
# surf_norms = self.tril_interp(norms, hit_pts) # [n_surf_pts, 3]
surf_norms = self.get_nn(norms, hit_pts)
updated_hit_pts = hit_pts - surf_tsdf[:, None] * self.sdf_trunc * surf_norms
valid_mask = self.get_pts_inside(updated_hit_pts)
hit_pts[valid_mask, :] = updated_hit_pts[valid_mask, :]
# get depth values
w2c = torch.inverse(c2w).to(self.device)
hit_pts_c = (w2c[:3, :3] @ hit_pts.transpose(1, 0)).transpose(1, 0) + w2c[:3, 3][None, :]
hit_pts_z = hit_pts_c[:, -1]
depth_rend = torch.zeros(imh, imw).to(self.device)
# depth_rend[hit_surface_mask] = z_vals[indices[hit_surface_mask]]
depth_rend[hit_surface_mask] = hit_pts_z
# vertex map
vertex_rend = torch.zeros(imh, imw, 3).to(self.device)
vertex_rend[hit_surface_mask] = hit_pts_c
# normal map
surf_norms_c = (w2c[:3, :3] @ surf_norms.transpose(1, 0)).transpose(1, 0) # [h, w, 3]
normal_rend = torch.zeros(imh, imw, 3).to(self.device)
normal_rend[hit_surface_mask] = surf_norms_c
if self.color_vol is not None:
# hit_colors = self.color_vol[cx, cy, cz, :]
hit_colors = self.tril_interp(self.color_vol, hit_pts)
# set color
color_rend = torch.zeros(imh, imw, 3).to(self.device)
color_rend[hit_surface_mask] = hit_colors
else:
color_rend = None
return depth_rend, color_rend, vertex_rend, normal_rend, hit_surface_mask
def render_pyramid(self, c2w, intri, imh, imw, n_pyr=4, near=0.5, far=5., n_samples=192):
K = intri.clone()
dep_pyr, rgb_pyr, vtx_pyr, nrm_pyr, mask_pyr = [], [], [], [], []
for l in range(n_pyr):
            dep, rgb, vtx, nrm, mask = self.render_model(c2w, K, imh, imw, near=near, far=far, n_samples=n_samples)  # render_model returns 5 values
dep_pyr += [dep]
rgb_pyr += [rgb]
vtx_pyr += [vtx]
nrm_pyr += [nrm]
mask_pyr += [mask]
imh = imh // 2
imw = imw // 2
K /= 2
return dep_pyr, rgb_pyr, vtx_pyr, nrm_pyr, mask_pyr
# get voxel index given world coordinate
# used for testing
def get_voxel_idx(self, x):
"""
:param x: [N, 3] query points
:return: [N] voxel indices
"""
assert len(x.shape) == 2, print("only accept flattened input!!!")
x.to(self.device)
vox_coord = torch.floor((x - self.vol_origin[None, :]) / self.voxel_size) # [N, 3]
vx, vy, vz = vox_coord[:, 0], vox_coord[:, 1], vox_coord[:, 2]
# very important! get voxel index from voxel coordinate
vox_idx = vz + vy * self.vol_dim[-1] + vx * self.vol_dim[-1] * self.vol_dim[-2]
return vox_idx.long()
def get_rays(self, c2w, intrinsics, H, W):
device = self.device
c2w = c2w.to(device)
fx = intrinsics[0, 0]
fy = intrinsics[1, 1]
cx = intrinsics[0, 2]
cy = intrinsics[1, 2]
i, j = torch.meshgrid(torch.linspace(0, W - 1, W), torch.linspace(0, H - 1, H)) # pytorch's meshgrid has indexing='ij'
i = i.t().to(device).reshape(H * W) # [hw]
j = j.t().to(device).reshape(H * W) # [hw]
dirs = torch.stack([(i - cx) / fx, (j - cy) / fy, torch.ones_like(i)], -1).to(device) # [hw, 3]
# permute for bmm
dirs = dirs.transpose(1, 0) # [3, hw]
rays_d = (c2w[:3, :3] @ dirs).transpose(1, 0) # [hw, 3]
rays_o = c2w[:3, 3].expand(rays_d.shape)
return rays_o.reshape(H, W, 3), rays_d.reshape(H, W, 3)
This is my favorite part, and in my view the core of the whole algorithm.
tracker.py is the driver of the tracking process; icp.py does the actual work.
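The update that icp.py applies at every iteration is a damped Gauss-Newton step on the point-to-plane residuals $r$ with Jacobian $J$ (see lev_mar_H and forward_update_pose below):
$$
\delta\xi = -\big(J^\top J + \lambda\,\mathrm{tr}(J^\top J)\,I\big)^{-1} J^\top r,
\qquad
T_{10} \leftarrow \exp\!\big(\widehat{\delta\xi}\big)\,T_{10},
$$
where $\widehat{\delta\xi}$ is the 4×4 matrix form of the twist $\delta\xi \in \mathfrak{se}(3)$ and $\lambda$ is the damping factor.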
import torch
import torch.nn as nn
from icp import ICP
class ICPTracker(nn.Module):
def __init__(self,
args,
device,
):
super(ICPTracker, self).__init__()
self.n_pyr = args.n_pyramids
self.scales = list(range(self.n_pyr))
self.n_iters = args.n_iters
self.dampings = args.dampings
        # KinectFusion uses an image pyramid during tracking (coarse to fine)
self.construct_image_pyramids = ImagePyramids(self.scales, pool='avg')
self.construct_depth_pyramids = ImagePyramids(self.scales, pool='max')
self.device = device
# initialize tracker at different levels
self.icp_solvers = []
for i in range(self.n_pyr):
self.icp_solvers += [ICP(self.n_iters[i], damping=self.dampings[i])]
@torch.no_grad()
def forward(self, depth0, depth1, K):
H, W = depth0.shape
dpt0_pyr = self.construct_depth_pyramids(depth0.view(1, 1, H, W))
dpt0_pyr = [d.squeeze() for d in dpt0_pyr]
dpt1_pyr = self.construct_depth_pyramids(depth1.view(1, 1, H, W))
dpt1_pyr = [d.squeeze() for d in dpt1_pyr]
# optimization steps
        pose10 = torch.eye(4).to(self.device)  # initialize from identity; torch.eye creates the identity matrix
for i in reversed(range(self.n_pyr)):
Ki = get_scaled_K(K, i)
pose10 = self.icp_solvers[i](pose10, dpt0_pyr[i], dpt1_pyr[i], Ki)
return pose10
class ImagePyramids(nn.Module):
""" Construct the pyramids in the image / depth space
"""
def __init__(self, scales, pool='avg'):
super(ImagePyramids, self).__init__()
if pool == 'avg':
            self.multiscales = [nn.AvgPool2d(1 << i, 1 << i) for i in scales]
        elif pool == 'max':
            self.multiscales = [nn.MaxPool2d(1 << i, 1 << i) for i in scales]
        else:
            raise NotImplementedError()

    def forward(self, x):
        return [f(x) for f in self.multiscales]
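get_scaled_K, which forward() above calls at every pyramid level, is not shown in this snippet; here is a sketch consistent with that call site, assuming each pyramid level halves the image resolution (and therefore the intrinsics):

def get_scaled_K(K, level):
    # intrinsics for pyramid level `level`; level 0 is full resolution
    if level == 0:
        return K
    Ks = K.clone()
    Ks[0, 0] /= 2 ** level   # fx
    Ks[1, 1] /= 2 ** level   # fy
    Ks[0, 2] /= 2 ** level   # cx
    Ks[1, 2] /= 2 ** level   # cy
    return Ks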
icp.py
import torch
import torch.nn as nn
import torch.nn.functional as F
# forward ICP
class ICP(nn.Module):
def __init__(self,
        max_iter=3,  # maximum number of iterations
        damping=1e-3,  # damping added to the Hessian
):
"""
:param max_iter, maximum number of iterations
:param damping, damping added to Hessian matrix
"""
super(ICP, self).__init__()
self.max_iterations = max_iter
self.damping = damping
def forward(self, pose10, depth0, depth1, K):
"""
In all cases we refer to 0 as template, and always warp pixels from 0 to 1
:param pose10: initial pose estimate
:param depth0: template depth image (0)
:param depth1: depth image (1)
        :param K: intrinsic matrix
:return: refined 0-to-1 transformation pose10
"""
# create vertex and normal for current frame
vertex0 = compute_vertex(depth0, K)
normal0 = compute_normal(vertex0)
mask0 = depth0 > 0.
vertex1 = compute_vertex(depth1, K)
normal1 = compute_normal(vertex1)
for idx in range(self.max_iterations):
# compute residuals
residuals, J_F_p = self.compute_residuals_jacobian(vertex0, vertex1, normal0, normal1, mask0, pose10, K)
JtWJ = self.compute_jtj(J_F_p) # [B, 6, 6]
JtR = self.compute_jtr(J_F_p, residuals)
pose10 = self.GN_solver(JtWJ, JtR, pose10, damping=self.damping)
return pose10
@staticmethod
def compute_residuals_jacobian(vertex0, vertex1, normal0, normal1, mask0, pose10, K):
"""
:param vertex0: vertex map 0
:param vertex1: vertex map 1
:param normal0: normal map 0
:param normal1: normal map 1
:param mask0: valid mask of template depth image
:param pose10: current estimate of pose10
:param K: intrinsics
:return: residuals and Jacobians
"""
R = pose10[:3, :3]
t = pose10[:3, 3]
H, W, C = vertex0.shape
rot_vertex0_to1 = (R @ vertex0.view(-1, 3).permute(1, 0)).permute(1, 0).view(H, W, 3)
vertex0_to1 = rot_vertex0_to1 + t[None, None, :]
normal0_to1 = (R @ normal0.view(-1, 3).permute(1, 0)).permute(1, 0).view(H, W, 3)
fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
x_, y_, z_ = vertex0_to1[..., 0], vertex0_to1[..., 1], vertex0_to1[..., 2] # [h, w]
u_ = (x_ / z_) * fx + cx # [h, w]
v_ = (y_ / z_) * fy + cy # [h, w]
inviews = (u_ > 0) & (u_ < W-1) & (v_ > 0) & (v_ < H-1)
# projective data association
r_vertex1 = warp_features(vertex1, u_, v_) # [h, w, 3]
r_normal1 = warp_features(normal1, u_, v_) # [h, w, 3]
mask1 = r_vertex1[..., -1] > 0.
diff = vertex0_to1 - r_vertex1 # [h, w, 3]
# point-to-plane residuals
res = (r_normal1 * diff).sum(dim=-1) # [h, w]
# point-to-plane jacobians
J_trs = r_normal1.view(-1, 3) # [hw, 3]
J_rot = -torch.bmm(J_trs.unsqueeze(dim=1), batch_skew(vertex0_to1.view(-1, 3))).squeeze() # [hw, 3]
        # compose jacobians
J_F_p = torch.cat((J_rot, J_trs), dim=-1).view(H, W, 6) # follow the order of [rot, trs] [hw, 1, 6]
# occlusion
occ = ~inviews | (diff.norm(p=2, dim=-1) > 0.10)
invalid_mask = occ | ~mask0 | ~mask1
J_F_p[invalid_mask] = 0.
res[invalid_mask] = 0.
res = res.view(-1, 1) # [hw, 1]
J_F_p = J_F_p.view(-1, 1, 6) # [hw, 1, 6]
return res, J_F_p
@staticmethod
def compute_jtj(jac):
# J in the dimension of (HW, C, 6)
jacT = jac.transpose(-1, -2) # [HW, 6, C]
jtj = torch.bmm(jacT, jac).sum(0) # [6, 6]
return jtj # [6, 6]
@staticmethod
def compute_jtr(jac, res):
# J in the dimension of (HW, C, 6)
# res in the dimension of [HW, C]
jacT = jac.transpose(-1, -2) # [HW, 6, C]
jtr = torch.bmm(jacT, res.unsqueeze(-1)).sum(0) # [6, 1]
return jtr # [6, 1]
@staticmethod
def GN_solver(JtJ, JtR, pose0, damping=1e-6):
# Add a small diagonal damping. Without it, the training becomes quite unstable
# Do not see a clear difference by removing the damping in inference though
Hessian = lev_mar_H(JtJ, damping)
# Hessian = JtJ
updated_pose = forward_update_pose(Hessian, JtR, pose0)
return updated_pose
def warp_features(Feat, u, v, mode='bilinear'):
"""
Warp the feature map (F) w.r.t. the grid (u, v). This is the non-batch version
"""
assert len(Feat.shape) == 3
H, W, C = Feat.shape
u_norm = u / ((W - 1) / 2) - 1 # [h, w]
v_norm = v / ((H - 1) / 2) - 1 # [h, w]
uv_grid = torch.cat((u_norm.view(1, H, W, 1), v_norm.view(1, H, W, 1)), dim=-1)
Feat_warped = F.grid_sample(Feat.unsqueeze(0).permute(0, 3, 1, 2), uv_grid, mode=mode, padding_mode='border', align_corners=True).squeeze()
return Feat_warped.permute(1, 2, 0)
def compute_vertex(depth, K):
H, W = depth.shape
fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
device = depth.device
i, j = torch.meshgrid(torch.linspace(0, W - 1, W), torch.linspace(0, H - 1, H)) # pytorch's meshgrid has indexing='ij'
i = i.t().to(device) # [h, w]
j = j.t().to(device) # [h, w]
vertex = torch.stack([(i - cx) / fx, (j - cy) / fy, torch.ones_like(i)], -1).to(device) * depth[..., None] # [h, w, 3]
return vertex
def compute_normal(vertex_map):
""" Calculate the normal map from a depth map
:param the input depth image
-----------
:return the normal map
"""
H, W, C = vertex_map.shape
img_dx, img_dy = feature_gradient(vertex_map, normalize_gradient=False) # [h, w, 3]
normal = torch.cross(img_dx.view(-1, 3), img_dy.view(-1, 3))
normal = normal.view(H, W, 3) # [h, w, 3]
mag = torch.norm(normal, p=2, dim=-1, keepdim=True)
normal = normal / (mag + 1e-8)
# filter out invalid pixels
depth = vertex_map[:, :, -1]
# 0.5 and 5.
invalid_mask = (depth <= depth.min()) | (depth >= depth.max())
zero_normal = torch.zeros_like(normal)
normal = torch.where(invalid_mask[..., None], zero_normal, normal)
return normal
def feature_gradient(img, normalize_gradient=True):
""" Calculate the gradient on the feature space using Sobel operator
:param the input image
-----------
:return the gradient of the image in x, y direction
"""
H, W, C = img.shape
# to filter the image equally in each channel
wx = torch.FloatTensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]).view(1, 1, 3, 3).type_as(img)
wy = torch.FloatTensor([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]).view(1, 1, 3, 3).type_as(img)
img_permuted = img.permute(2, 0, 1).view(-1, 1, H, W) # [c, 1, h, w]
img_pad = F.pad(img_permuted, (1, 1, 1, 1), mode='replicate')
img_dx = F.conv2d(img_pad, wx, stride=1, padding=0).squeeze().permute(1, 2, 0) # [h, w, c]
img_dy = F.conv2d(img_pad, wy, stride=1, padding=0).squeeze().permute(1, 2, 0) # [h, w, c]
if normalize_gradient:
mag = torch.sqrt((img_dx ** 2) + (img_dy ** 2) + 1e-8)
img_dx = img_dx / mag
img_dy = img_dy / mag
return img_dx, img_dy # [h, w, c]
def batch_skew(w):
""" Generate a batch of skew-symmetric matrices.
function tested in 'test_geometry.py'
:input
:param skew symmetric matrix entry Bx3
---------
:return
:param the skew-symmetric matrix Bx3x3
"""
B, D = w.shape
assert(D == 3)
o = torch.zeros(B).type_as(w)
w0, w1, w2 = w[:, 0], w[:, 1], w[:, 2]
return torch.stack((o, -w2, w1, w2, o, -w0, -w1, w0, o), 1).view(B, 3, 3)
def lev_mar_H(JtWJ, damping):
# Add a small diagonal damping. Without it, the training becomes quite unstable
# Do not see a clear difference by removing the damping in inference though
diag_mask = torch.eye(6).to(JtWJ)
diagJtJ = diag_mask * JtWJ
traceJtJ = torch.sum(diagJtJ)
epsilon = (traceJtJ * damping) * diag_mask
Hessian = JtWJ + epsilon
return Hessian
def forward_update_pose(H, Rhs, pose):
"""
:param H:
:param Rhs:
:param pose:
:return:
"""
xi = least_square_solve(H, Rhs).squeeze()
pose = exp_se3(xi) @ pose
return pose
def exp_se3(xi):
"""
:param x: Cartesian vector of Lie Algebra se(3)
:return: exponential map of x
"""
w = xi[:3].squeeze() # rotation
v = xi[3:6].squeeze() # translation
w_hat = torch.tensor([[0., -w[2], w[1]],
[w[2], 0., -w[0]],
[-w[1], w[0], 0.]]).to(xi)
w_hat_second = torch.mm(w_hat, w_hat).to(xi)
theta = torch.norm(w)
theta_2 = theta ** 2
theta_3 = theta ** 3
sin_theta = torch.sin(theta)
cos_theta = torch.cos(theta)
eye_3 = torch.eye(3).to(xi)
eps = 1e-8
if theta <= eps:
e_w = eye_3
j = eye_3
else:
e_w = eye_3 + w_hat * sin_theta / theta + w_hat_second * (1. - cos_theta) / theta_2
k1 = (1 - cos_theta) / theta_2
k2 = (theta - sin_theta) / theta_3
j = eye_3 + k1 * w_hat + k2 * w_hat_second
T = torch.eye(4).to(xi)
T[:3, :3] = e_w
T[:3, 3] = torch.mv(j, v)
# T[:3, 3] = v
return T
def invH(H):
""" Generate (H+damp)^{-1}, with predicted damping values
:param approximate Hessian matrix JtWJ
-----------
:return the inverse of Hessian
"""
# GPU is much slower for matrix inverse when the size is small (compare to CPU)
# works (50x faster) than inversing the dense matrix in GPU
if H.is_cuda:
invH = torch.inverse(H.cpu()).cuda()
else:
invH = torch.inverse(H)
return invH
def least_square_solve(H, Rhs):
"""
Solve for JTJ @ xi = -JTR
"""
inv_H = invH(H) # [B, 6, 6] square matrix
xi = -inv_H @ Rhs
return xi
Other links:
An introduction to the principles of KinectFusion.