CPU and GPU source implementations of IoU

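This column mainly covers source-code implementations related to deep learning and autonomous driving. For the complete code, please refer to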

Introduction

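IoU (Intersection over Union) is a standard measure of how accurately objects are detected on a given dataset. In object detection it is typically used to quantify how well a predicted bounding box matches the ground-truth box.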

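IoU measures the overlap ratio between the predicted box and the ground-truth box, i.e. the ratio of the area of their intersection to the area of their union. The ideal case is complete overlap, where the ratio is 1.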

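IoU is computed as follows:

Compute the intersection area of the two boxes. The intersection rectangle takes the larger of the two left and top coordinates and the smaller of the two right and bottom coordinates; its area is 0 if the boxes do not overlap.
Compute the union area of the two boxes, i.e. the sum of the two box areas minus the intersection area.
Divide the intersection area by the union area; the result is the IoU value.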
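
The three steps above can be sketched for a single pair of boxes in plain Python. This is a minimal sketch, assuming boxes are given as (x0, y0, x1, y1) with x0 < x1 and y0 < y1; the function name `iou_pair` is hypothetical and not part of the original code.

```python
def iou_pair(box1, box2):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    # Step 1: intersection rectangle, with width and height clamped to zero
    # when the boxes do not overlap along that axis.
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter_area = inter_w * inter_h
    # Step 2: union area = sum of both areas minus the intersection.
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = area1 + area2 - inter_area
    # Step 3: ratio of intersection to union.
    return inter_area / union_area


# Two unit squares shifted by 0.5 along x: intersection 0.5, union 1.5, IoU 1/3.
print(iou_pair((0, 0, 1, 1), (0.5, 0, 1.5, 1)))
```

Clamping the width and the height separately, rather than clamping their product, avoids reporting a positive area for boxes that are disjoint along both axes.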
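The advantages of IoU are that it reflects how well the predicted box matches the ground-truth box and that it is scale-invariant, i.e. insensitive to the absolute size of the boxes. It also has drawbacks: it cannot reflect how far apart two boxes are. If two boxes do not intersect, the IoU is 0 regardless of the distance between them, so it provides no signal to learn from during training.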
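
Both properties can be checked numerically. The following is a small sketch, assuming the (x0, y0, x1, y1) box convention used in the code of this article; `iou_pair` and `scale` are hypothetical helper names introduced only for this illustration.

```python
def iou_pair(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def scale(box, k):
    """Multiply every coordinate of a box by k."""
    return tuple(k * v for v in box)


a, b = (0, 0, 2, 2), (1, 1, 3, 3)

# Scale invariance: enlarging both boxes by the same factor leaves the IoU unchanged.
small = iou_pair(a, b)
large = iou_pair(scale(a, 100), scale(b, 100))

# Disjoint boxes: the IoU is 0 whether the gap is 1 unit or 49 units.
near = iou_pair((0, 0, 1, 1), (2, 0, 3, 1))
far = iou_pair((0, 0, 1, 1), (50, 0, 51, 1))

print(small, large, near, far)
```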

Source implementation:

CPU implementation:

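import torch
from torch import Tensor


def iou_core(box1: Tensor, box2: Tensor, area_sum: Tensor):
    # box1, box2: 1-D tensors (x0, y0, x1, y1); area_sum: sum of the two box areas.
    overlap_w = torch.min(box1[2], box2[2]) - torch.max(box1[0], box2[0])
    overlap_h = torch.min(box1[3], box2[3]) - torch.max(box1[1], box2[1])
    # No overlap along either axis means the intersection is empty.
    if overlap_w <= 0 or overlap_h <= 0:
        return 0
    overlap_area = overlap_h * overlap_w
    # union = area1 + area2 - intersection
    return overlap_area / (area_sum - overlap_area)


def iou_cpu(box1: Tensor, box2: Tensor):
    # box1: (N, 4), box2: (M, 4); returns an (N, M) matrix of pairwise IoU values.
    box1_num = box1.size(0)
    box2_num = box2.size(0)
    if box1.size(1) != 4 or box2.size(1) != 4:
        return -1  # each box must have exactly 4 coordinates

    box1_area = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    box2_area = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    # A box is valid only if x0 < x1 and y0 < y1, matching the check in the GPU version.
    # Testing area >= 0 alone would accept a box whose width and height are both negative.
    box1_valid = (box1[:, 0] < box1[:, 2]) & (box1[:, 1] < box1[:, 3])
    box2_valid = (box2[:, 0] < box2[:, 2]) & (box2[:, 1] < box2[:, 3])

    result = torch.zeros(size=(box1_num, box2_num))
    for i in range(box1_num):
        for j in range(box2_num):
            if box1_valid[i] and box2_valid[j]:
                result[i, j] = iou_core(box1[i], box2[j], box1_area[i] + box2_area[j])
            else:
                result[i, j] = 9999  # sentinel marking an invalid box
    return result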

GPU implementation:

__device__ float iou_core(const float* box1 ,const float* box2){
    // Each box is stored as 4 consecutive floats: x0, y0, x1, y1.
    float box1_x0 = *(box1 + 0);
    float box1_y0 = *(box1 + 1);
    float box1_x1 = *(box1 + 2);
    float box1_y1 = *(box1 + 3);
    float box2_x0 = *(box2 + 0);
    float box2_y0 = *(box2 + 1);
    float box2_x1 = *(box2 + 2);
    float box2_y1 = *(box2 + 3);
    // Degenerate or inverted boxes are reported with a sentinel value.
    if(!(box1_x0 < box1_x1 && box1_y0 < box1_y1 && box2_x0 < box2_x1 && box2_y0 < box2_y1)){
        return 9999;
    }

    // fmaxf/fminf are the CUDA device math functions for floats.
    float inter_x0 = fmaxf(box1_x0, box2_x0);
    float inter_x1 = fminf(box1_x1, box2_x1);
    float inter_y0 = fmaxf(box1_y0, box2_y0);
    float inter_y1 = fminf(box1_y1, box2_y1);
    // Clamp width and height separately: if both were negative, their product
    // would be a positive "area" for two boxes that do not overlap at all.
    float inter_w = fmaxf(inter_x1 - inter_x0, 0.0f);
    float inter_h = fmaxf(inter_y1 - inter_y0, 0.0f);
    float inter_area = inter_w * inter_h;

    float box1_area = (box1_x1 - box1_x0)*(box1_y1-box1_y0);
    float box2_area = (box2_x1 - box2_x0)*(box2_y1-box2_y0);
    float iou = inter_area / (box1_area + box2_area - inter_area);
    return iou;
}

// THREADS_PER_BLOCK (and DIVUP in the launcher) are defined elsewhere in the project and are not shown here.
__global__ void iou_gpu_kernel(const int box1_num,
const float* box1_ptr,
const int box2_num,
const float* box2_ptr,
float* result_ptr){
    // Each thread handles one (box1, box2) pair: x indexes box1, y indexes box2.
    const int box1_idx = blockIdx.x * THREADS_PER_BLOCK + threadIdx.x;
    const int box2_idx = blockIdx.y * THREADS_PER_BLOCK + threadIdx.y;
    // Threads that fall outside the result matrix have nothing to do.
    if(box1_idx>=box1_num || box2_idx>=box2_num){
        return;
    }
    const float* box1 = box1_ptr + box1_idx * 4;
    const float* box2 = box2_ptr + box2_idx * 4;
    float iou = iou_core(box1, box2);
    // The result is a row-major (box1_num x box2_num) matrix.
    *(result_ptr + box1_idx * box2_num + box2_idx) = iou;
}

void iou_gpu_launch(const int box1_num,
const float* box1_ptr,
const int box2_num,
const float* box2_ptr,
float* result_ptr){
    dim3 blocks(DIVUP(box1_num, THREADS_PER_BLOCK),DIVUP(box2_num, THREADS_PER_BLOCK)); // number of blocks in the grid
    dim3 threads(THREADS_PER_BLOCK,THREADS_PER_BLOCK); // number of threads in each block
    // Debug printing is omitted here so that it does not distort the timing test.
    iou_gpu_kernel<<<blocks,threads>>>(box1_num,box1_ptr,box2_num,box2_ptr,result_ptr);
    cudaDeviceSynchronize(); // wait for the GPU to finish
}
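
To make the launch geometry concrete, the index arithmetic of the kernel can be emulated on the host. The following Python sketch is only an illustration and not part of the original code. It assumes THREADS_PER_BLOCK = 16 (the original value is not shown) and that DIVUP is the usual ceiling division; the names `divup` and `covered_pairs` are hypothetical.

```python
THREADS_PER_BLOCK = 16  # assumed value; the original definition is not shown


def divup(n, d):
    """Ceiling division, the assumed meaning of the DIVUP macro."""
    return (n + d - 1) // d


def covered_pairs(box1_num, box2_num):
    """Enumerate the (box1_idx, box2_idx) pairs that pass the kernel's bounds check."""
    grid_x = divup(box1_num, THREADS_PER_BLOCK)
    grid_y = divup(box2_num, THREADS_PER_BLOCK)
    pairs = []
    launched = 0
    for block_x in range(grid_x):
        for block_y in range(grid_y):
            for thread_x in range(THREADS_PER_BLOCK):
                for thread_y in range(THREADS_PER_BLOCK):
                    launched += 1
                    i = block_x * THREADS_PER_BLOCK + thread_x
                    j = block_y * THREADS_PER_BLOCK + thread_y
                    if i >= box1_num or j >= box2_num:
                        continue  # same guard as in iou_gpu_kernel
                    pairs.append((i, j))
    return pairs, launched


pairs, launched = covered_pairs(10, 4)
print(len(pairs), launched)
```

Under these assumptions, the 10 x 4 inputs of the timing test need a single 16 x 16 block, and only 40 of its 256 threads compute an IoU; the rest exit at the bounds check.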

Timing test:

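import torch
from iou import iou_gpu, iou_cpu
from utils import TicToc

device = torch.device('cuda:0')
# Boxes are (x0, y0, x1, y1). The last box of input1 has x0 > x1, so it is invalid
# and both implementations return the sentinel 9999 for it.
input1 = torch.tensor([[0, 0, 1, 1],
                       [0, 2, 1, 3],
                       [0.2, 0, 1, 1],
                       [0.1, 2, 1, 3],
                       [0.11, 0, 1, 1],
                       [0, 2.4, 1, 3],
                       [0.2, 0.1, 1, 1],
                       [0.7, 2.5, 1, 3],
                       [0, 0, 6, 1],
                       [1.5, 2, 1, 3]], dtype=torch.float32).to(device)
input2 = torch.tensor([[0.5, 0, 1.5, 1],
                       [0, 0.5, 1, 1.5],
                       [0.5, 0.5, 1.5, 1.5],
                       [0, 0.5, 1, 2.5]], dtype=torch.float32).to(device)

# Copy the inputs to the CPU once, so the transfer is not counted in the CPU timing.
input1_cpu = input1.to('cpu')
input2_cpu = input2.to('cpu')

tictic = TicToc('iou fun')
# Time 1000 calls of the GPU version.
for i in range(1000):
    result = iou_gpu(input1, input2)
tictic.toc()
# Restart the timer and time 1000 calls of the CPU version.
tictic.tic()
for i in range(1000):
    result2 = iou_cpu(input1_cpu, input2_cpu)
tictic.toc()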
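
The `TicToc` helper is imported from the `utils` module, which is not shown in this article. A minimal stand-in, assuming only the interface used above (a name passed to the constructor, `tic()` to restart the timer, `toc()` to report), might look like the following. The print format and the return value of `toc()` are assumptions.

```python
import time


class TicToc:
    """Hypothetical stand-in for utils.TicToc; the original is not shown."""

    def __init__(self, name):
        self.name = name
        # Start timing at construction, since the script calls toc() before tic().
        self.tic()

    def tic(self):
        """Restart the timer."""
        self.start = time.perf_counter()

    def toc(self):
        """Print and return the seconds elapsed since the last tic()."""
        elapsed = time.perf_counter() - self.start
        print(f"{self.name}: {elapsed * 1000:.3f} ms")
        return elapsed


timer = TicToc('demo')
total = sum(range(100000))  # some work to time
elapsed = timer.toc()
```

Wall-clock timing of GPU code is only meaningful once the kernel has finished. Here that holds because `iou_gpu_launch` calls `cudaDeviceSynchronize()` before returning.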

Detailed procedure:

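In practice, an IoU threshold such as 0.5 or 0.7 is usually set, and a prediction is considered successful when its IoU exceeds the threshold. Changing the threshold changes which predictions count as correct, so evaluation metrics (such as the ROC curve or the F1 score) can be computed at different thresholds and used to choose the best model.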
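
As an illustration of thresholding, the sketch below counts how many predictions would be accepted at the two thresholds mentioned above. The IoU values are made up for the example, and `count_matches` is a hypothetical helper, not part of the original code.

```python
def count_matches(ious, threshold):
    """Number of predictions whose IoU with the ground truth exceeds the threshold."""
    return sum(1 for v in ious if v > threshold)


# Illustrative IoU values (made up), one per predicted box.
ious = [0.92, 0.74, 0.66, 0.55, 0.31, 0.0]

for threshold in (0.5, 0.7):
    print(threshold, count_matches(ious, threshold))
```

With these values, four predictions pass at a threshold of 0.5 and two pass at 0.7, which shows how a stricter threshold lowers the number of predictions counted as correct.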

