All the code has been uploaded to my GitHub repository: https://github.com/zgcr/pytorch-ImageNet-CIFAR-COCO-VOC-training
If you find it useful, please give it a star!
All the code below has been tested under PyTorch 1.4 and confirmed to run correctly.
Once training is finished, the model output has to be decoded before testing. A forward pass through the RetinaNet class gives us the cls heads and reg heads, but the reg heads predict tx,ty,tw,th, so we need the coordinates of the corresponding anchor boxes to convert them into predicted box coordinates. The conversion rule is simply the inverse of the box-to-(tx,ty,tw,th) encoding described in Part 4 of this series (从零实现RetinaNet(四)).
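Concretely, writing w_a and h_a for an anchor's width and height, (x_a, y_a) for its center, and using the (0.1, 0.1, 0.2, 0.2) scale factors from the encoding step, the inverse transform computed by the code below is:

x_ctr = 0.1 * tx * w_a + x_a
y_ctr = 0.1 * ty * h_a + y_a
w = w_a * exp(0.2 * tw)
h = h_a * exp(0.2 * th)

The predicted box corners then follow as (x_ctr - 0.5*w, y_ctr - 0.5*h, x_ctr + 0.5*w, y_ctr + 0.5*h).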
The code that converts the regression predictions into box predictions is as follows:
def snap_tx_ty_tw_th_reg_heads_to_x1_y1_x2_y2_bboxes(
        self, reg_heads, anchors):
    """
    snap reg heads to pred bboxes
    reg_heads:[anchor_nums,4],4:[tx,ty,tw,th]
    anchors:[anchor_nums,4],4:[x_min,y_min,x_max,y_max]
    """
    anchors_wh = anchors[:, 2:] - anchors[:, :2]
    anchors_ctr = anchors[:, :2] + 0.5 * anchors_wh
    device = anchors.device
    factor = torch.tensor([[0.1, 0.1, 0.2, 0.2]]).to(device)
    # undo the (0.1,0.1,0.2,0.2) scaling applied during target encoding
    reg_heads = reg_heads * factor
    pred_bboxes_wh = torch.exp(reg_heads[:, 2:]) * anchors_wh
    pred_bboxes_ctr = reg_heads[:, :2] * anchors_wh + anchors_ctr
    pred_bboxes_x_min_y_min = pred_bboxes_ctr - 0.5 * pred_bboxes_wh
    pred_bboxes_x_max_y_max = pred_bboxes_ctr + 0.5 * pred_bboxes_wh
    pred_bboxes = torch.cat(
        [pred_bboxes_x_min_y_min, pred_bboxes_x_max_y_max], dim=1)
    pred_bboxes = pred_bboxes.int()
    # clip the predicted boxes to the input image boundary
    pred_bboxes[:, 0] = torch.clamp(pred_bboxes[:, 0], min=0)
    pred_bboxes[:, 1] = torch.clamp(pred_bboxes[:, 1], min=0)
    pred_bboxes[:, 2] = torch.clamp(pred_bboxes[:, 2], max=self.image_w - 1)
    pred_bboxes[:, 3] = torch.clamp(pred_bboxes[:, 3], max=self.image_h - 1)
    # pred bboxes shape:[anchor_nums,4]
    return pred_bboxes
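As a quick sanity check (a tiny standalone example of mine, not part of the repository code): regression outputs of all zeros should reproduce the anchor box exactly, since exp(0) = 1 and the center offsets vanish.

import torch

# one anchor: (x_min,y_min,x_max,y_max)=(100,100,200,200), so w=h=100, ctr=(150,150)
anchors = torch.tensor([[100., 100., 200., 200.]])
reg_heads = torch.zeros(1, 4)  # tx=ty=tw=th=0
anchors_wh = anchors[:, 2:] - anchors[:, :2]
anchors_ctr = anchors[:, :2] + 0.5 * anchors_wh
reg = reg_heads * torch.tensor([[0.1, 0.1, 0.2, 0.2]])
pred_wh = torch.exp(reg[:, 2:]) * anchors_wh
pred_ctr = reg[:, :2] * anchors_wh + anchors_ctr
print(torch.cat([pred_ctr - 0.5 * pred_wh, pred_ctr + 0.5 * pred_wh], dim=1))
# tensor([[100., 100., 200., 200.]])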
The standard approach to NMS post-processing is as follows. First, sort all candidate detections by classification score in descending order and record which classes appear among them. Then iterate over those detected classes. For each class, extract all candidates belonging to it (since we sorted at the start, the extracted candidates are still in score order), move the first candidate into the keep set, compute the IoU between every remaining candidate and that one, and discard all candidates whose IoU exceeds the threshold; for RetinaNet this threshold is 0.5. Repeat the same procedure on whatever candidates survive, again moving the first one into the keep set, until no candidates remain, at which point NMS for that class is done. Once every class has been processed, NMS is complete.
In other object detection codebases I have noticed that many implementations do not run NMS per class (i.e., candidates of all classes are run through NMS together). I tried this class-agnostic variant as well and found it consistently about 0.2~0.5 mAP worse than the standard per-class method, so the implementation below sticks with the standard approach.
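As an aside, if you would rather not hand-roll the IoU loop, torchvision ships a compiled torchvision.ops.nms, and the usual coordinate-offset trick turns its class-agnostic NMS into an exact per-class NMS: shift each class's boxes into a disjoint coordinate range so that boxes of different classes can never overlap, then run a single NMS call. A minimal sketch (per_class_nms is my own hypothetical wrapper, not code from this repository):

import torch
from torchvision.ops import nms

def per_class_nms(boxes, scores, classes, iou_threshold=0.5):
    """boxes:[N,4] x_min,y_min,x_max,y_max; scores:[N]; classes:[N]"""
    if boxes.numel() == 0:
        return torch.empty((0, ), dtype=torch.int64, device=boxes.device)
    # offset each class into its own coordinate range; boxes of different
    # classes then have zero IoU, so one nms call equals per-class NMS
    offsets = classes.to(boxes) * (boxes.max() + 1)
    return nms(boxes + offsets[:, None], scores, iou_threshold)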
The NMS post-processing code is implemented as follows:
def nms(self, one_image_scores, one_image_classes, one_image_pred_bboxes):
    """
    one_image_scores:[anchor_nums],classification predict scores
    one_image_classes:[anchor_nums],class indexes for predict scores
    one_image_pred_bboxes:[anchor_nums,4],4:x_min,y_min,x_max,y_max
    """
    # sort all candidates by classification score in descending order
    sorted_one_image_scores, sorted_one_image_scores_indexes = torch.sort(
        one_image_scores, descending=True)
    sorted_one_image_classes = one_image_classes[
        sorted_one_image_scores_indexes]
    sorted_one_image_pred_bboxes = one_image_pred_bboxes[
        sorted_one_image_scores_indexes]
    sorted_pred_bboxes_w_h = (sorted_one_image_pred_bboxes[:, 2:] -
                              sorted_one_image_pred_bboxes[:, :2])
    sorted_pred_bboxes_areas = (sorted_pred_bboxes_w_h[:, 0] *
                                sorted_pred_bboxes_w_h[:, 1])
    # record which classes appear among the candidates
    detected_classes = torch.unique(sorted_one_image_classes, sorted=True)
    keep_scores, keep_classes, keep_pred_bboxes = [], [], []
    for detected_class in detected_classes:
        # extract this class's candidates (still sorted by score)
        single_class_mask = sorted_one_image_classes == detected_class
        single_class_scores = sorted_one_image_scores[single_class_mask]
        single_class_pred_bboxes = sorted_one_image_pred_bboxes[
            single_class_mask]
        single_class_pred_bboxes_areas = sorted_pred_bboxes_areas[
            single_class_mask]
        single_class = sorted_one_image_classes[single_class_mask]
        single_keep_scores, single_keep_classes, single_keep_pred_bboxes = [], [], []
        while single_class_scores.numel() > 0:
            # keep the highest scoring remaining candidate
            top1_score, top1_class, top1_pred_bbox = single_class_scores[
                0:1], single_class[0:1], single_class_pred_bboxes[0:1]
            single_keep_scores.append(top1_score)
            single_keep_classes.append(top1_class)
            single_keep_pred_bboxes.append(top1_pred_bbox)
            top1_areas = single_class_pred_bboxes_areas[0]
            if single_class_scores.numel() == 1:
                break
            single_class_scores = single_class_scores[1:]
            single_class = single_class[1:]
            single_class_pred_bboxes = single_class_pred_bboxes[1:]
            single_class_pred_bboxes_areas = single_class_pred_bboxes_areas[1:]
            # intersection of the kept box with every remaining box
            overlap_area_top_left = torch.max(
                single_class_pred_bboxes[:, :2], top1_pred_bbox[:, :2])
            overlap_area_bot_right = torch.min(
                single_class_pred_bboxes[:, 2:], top1_pred_bbox[:, 2:])
            overlap_area_sizes = torch.clamp(
                overlap_area_bot_right - overlap_area_top_left, min=0)
            overlap_area = overlap_area_sizes[:, 0] * overlap_area_sizes[:, 1]
            # compute union_area
            union_area = top1_areas + single_class_pred_bboxes_areas - overlap_area
            union_area = torch.clamp(union_area, min=1e-4)
            # compute ious between the top1 pred_bbox and the other pred_bboxes
            ious = overlap_area / union_area
            # discard every candidate whose IoU with the kept box is too high
            single_class_scores = single_class_scores[ious < self.nms_threshold]
            single_class = single_class[ious < self.nms_threshold]
            single_class_pred_bboxes = single_class_pred_bboxes[
                ious < self.nms_threshold]
            single_class_pred_bboxes_areas = single_class_pred_bboxes_areas[
                ious < self.nms_threshold]
        single_keep_scores = torch.cat(single_keep_scores, dim=0)
        single_keep_classes = torch.cat(single_keep_classes, dim=0)
        single_keep_pred_bboxes = torch.cat(single_keep_pred_bboxes, dim=0)
        keep_scores.append(single_keep_scores)
        keep_classes.append(single_keep_classes)
        keep_pred_bboxes.append(single_keep_pred_bboxes)
    keep_scores = torch.cat(keep_scores, dim=0)
    keep_classes = torch.cat(keep_classes, dim=0)
    keep_pred_bboxes = torch.cat(keep_pred_bboxes, dim=0)
    return keep_scores, keep_classes, keep_pred_bboxes
With the two pieces above in place, we can now decode. The full decoding flow is: first convert the reg head's tx,ty,tw,th predictions into box coordinate predictions (using the anchor coordinate information), then use a classification score threshold to filter out candidates whose scores are too low; for RetinaNet this threshold is 0.05. Next, apply NMS post-processing to the remaining candidates to obtain the kept detections. Finally, we also set a max_detection_num that caps how many detections appear in the final output. For the COCO dataset this value is 100: no single COCO image has more than 100 annotated objects, and the COCO evaluation protocol itself caps detections at 100 per image.
The decoding code is implemented as follows:
import torch
import torch.nn as nn


class RetinaDecoder(nn.Module):
    def __init__(self,
                 image_w,
                 image_h,
                 min_score_threshold=0.05,
                 nms_threshold=0.5,
                 max_detection_num=100):
        super(RetinaDecoder, self).__init__()
        self.image_w = image_w
        self.image_h = image_h
        self.min_score_threshold = min_score_threshold
        self.nms_threshold = nms_threshold
        self.max_detection_num = max_detection_num

    def forward(self, cls_heads, reg_heads, batch_anchors):
        device = cls_heads[0].device
        # concatenate the per-FPN-level heads and anchors along the anchor dim
        cls_heads = torch.cat(cls_heads, dim=1)
        reg_heads = torch.cat(reg_heads, dim=1)
        batch_anchors = torch.cat(batch_anchors, dim=1)
        batch_scores, batch_classes, batch_pred_bboxes = [], [], []
        for per_image_cls_heads, per_image_reg_heads, per_image_anchors in zip(
                cls_heads, reg_heads, batch_anchors):
            pred_bboxes = self.snap_tx_ty_tw_th_reg_heads_to_x1_y1_x2_y2_bboxes(
                per_image_reg_heads, per_image_anchors)
            # best class and its score for every anchor
            scores, score_classes = torch.max(per_image_cls_heads, dim=1)
            # filter out low scoring candidates; the score tensor itself must
            # be filtered last because the mask depends on it
            score_classes = score_classes[
                scores > self.min_score_threshold].float()
            pred_bboxes = pred_bboxes[
                scores > self.min_score_threshold].float()
            scores = scores[scores > self.min_score_threshold].float()
            # pad per-image outputs to max_detection_num with -1
            single_image_scores = (-1) * torch.ones(
                (self.max_detection_num, ), device=device)
            single_image_classes = (-1) * torch.ones(
                (self.max_detection_num, ), device=device)
            single_image_pred_bboxes = (-1) * torch.ones(
                (self.max_detection_num, 4), device=device)
            if scores.shape[0] != 0:
                scores, score_classes, pred_bboxes = self.nms(
                    scores, score_classes, pred_bboxes)
                sorted_keep_scores, sorted_keep_scores_indexes = torch.sort(
                    scores, descending=True)
                sorted_keep_classes = score_classes[
                    sorted_keep_scores_indexes]
                sorted_keep_pred_bboxes = pred_bboxes[
                    sorted_keep_scores_indexes]
                # keep at most max_detection_num detections per image
                final_detection_num = min(self.max_detection_num,
                                          sorted_keep_scores.shape[0])
                single_image_scores[0:final_detection_num] = \
                    sorted_keep_scores[0:final_detection_num]
                single_image_classes[0:final_detection_num] = \
                    sorted_keep_classes[0:final_detection_num]
                single_image_pred_bboxes[0:final_detection_num, :] = \
                    sorted_keep_pred_bboxes[0:final_detection_num, :]
            single_image_scores = single_image_scores.unsqueeze(0)
            single_image_classes = single_image_classes.unsqueeze(0)
            single_image_pred_bboxes = single_image_pred_bboxes.unsqueeze(0)
            batch_scores.append(single_image_scores)
            batch_classes.append(single_image_classes)
            batch_pred_bboxes.append(single_image_pred_bboxes)
        batch_scores = torch.cat(batch_scores, dim=0)
        batch_classes = torch.cat(batch_classes, dim=0)
        batch_pred_bboxes = torch.cat(batch_pred_bboxes, dim=0)
        # batch_scores shape:[batch_size,max_detection_num]
        # batch_classes shape:[batch_size,max_detection_num]
        # batch_pred_bboxes shape:[batch_size,max_detection_num,4]
        return batch_scores, batch_classes, batch_pred_bboxes
    def nms(self, one_image_scores, one_image_classes, one_image_pred_bboxes):
        """
        one_image_scores:[anchor_nums],classification predict scores
        one_image_classes:[anchor_nums],class indexes for predict scores
        one_image_pred_bboxes:[anchor_nums,4],4:x_min,y_min,x_max,y_max
        """
        # sort all candidates by classification score in descending order
        sorted_one_image_scores, sorted_one_image_scores_indexes = torch.sort(
            one_image_scores, descending=True)
        sorted_one_image_classes = one_image_classes[
            sorted_one_image_scores_indexes]
        sorted_one_image_pred_bboxes = one_image_pred_bboxes[
            sorted_one_image_scores_indexes]
        sorted_pred_bboxes_w_h = (sorted_one_image_pred_bboxes[:, 2:] -
                                  sorted_one_image_pred_bboxes[:, :2])
        sorted_pred_bboxes_areas = (sorted_pred_bboxes_w_h[:, 0] *
                                    sorted_pred_bboxes_w_h[:, 1])
        # record which classes appear among the candidates
        detected_classes = torch.unique(sorted_one_image_classes, sorted=True)
        keep_scores, keep_classes, keep_pred_bboxes = [], [], []
        for detected_class in detected_classes:
            # extract this class's candidates (still sorted by score)
            single_class_mask = sorted_one_image_classes == detected_class
            single_class_scores = sorted_one_image_scores[single_class_mask]
            single_class_pred_bboxes = sorted_one_image_pred_bboxes[
                single_class_mask]
            single_class_pred_bboxes_areas = sorted_pred_bboxes_areas[
                single_class_mask]
            single_class = sorted_one_image_classes[single_class_mask]
            single_keep_scores, single_keep_classes, single_keep_pred_bboxes = [], [], []
            while single_class_scores.numel() > 0:
                # keep the highest scoring remaining candidate
                top1_score, top1_class, top1_pred_bbox = single_class_scores[
                    0:1], single_class[0:1], single_class_pred_bboxes[0:1]
                single_keep_scores.append(top1_score)
                single_keep_classes.append(top1_class)
                single_keep_pred_bboxes.append(top1_pred_bbox)
                top1_areas = single_class_pred_bboxes_areas[0]
                if single_class_scores.numel() == 1:
                    break
                single_class_scores = single_class_scores[1:]
                single_class = single_class[1:]
                single_class_pred_bboxes = single_class_pred_bboxes[1:]
                single_class_pred_bboxes_areas = single_class_pred_bboxes_areas[
                    1:]
                # intersection of the kept box with every remaining box
                overlap_area_top_left = torch.max(
                    single_class_pred_bboxes[:, :2], top1_pred_bbox[:, :2])
                overlap_area_bot_right = torch.min(
                    single_class_pred_bboxes[:, 2:], top1_pred_bbox[:, 2:])
                overlap_area_sizes = torch.clamp(
                    overlap_area_bot_right - overlap_area_top_left, min=0)
                overlap_area = overlap_area_sizes[:, 0] * overlap_area_sizes[:, 1]
                # compute union_area
                union_area = top1_areas + single_class_pred_bboxes_areas - overlap_area
                union_area = torch.clamp(union_area, min=1e-4)
                # compute ious between the top1 pred_bbox and the other pred_bboxes
                ious = overlap_area / union_area
                # discard every candidate whose IoU with the kept box is too high
                single_class_scores = single_class_scores[
                    ious < self.nms_threshold]
                single_class = single_class[ious < self.nms_threshold]
                single_class_pred_bboxes = single_class_pred_bboxes[
                    ious < self.nms_threshold]
                single_class_pred_bboxes_areas = single_class_pred_bboxes_areas[
                    ious < self.nms_threshold]
            single_keep_scores = torch.cat(single_keep_scores, dim=0)
            single_keep_classes = torch.cat(single_keep_classes, dim=0)
            single_keep_pred_bboxes = torch.cat(single_keep_pred_bboxes, dim=0)
            keep_scores.append(single_keep_scores)
            keep_classes.append(single_keep_classes)
            keep_pred_bboxes.append(single_keep_pred_bboxes)
        keep_scores = torch.cat(keep_scores, dim=0)
        keep_classes = torch.cat(keep_classes, dim=0)
        keep_pred_bboxes = torch.cat(keep_pred_bboxes, dim=0)
        return keep_scores, keep_classes, keep_pred_bboxes
    def snap_tx_ty_tw_th_reg_heads_to_x1_y1_x2_y2_bboxes(
            self, reg_heads, anchors):
        """
        snap reg heads to pred bboxes
        reg_heads:[anchor_nums,4],4:[tx,ty,tw,th]
        anchors:[anchor_nums,4],4:[x_min,y_min,x_max,y_max]
        """
        anchors_wh = anchors[:, 2:] - anchors[:, :2]
        anchors_ctr = anchors[:, :2] + 0.5 * anchors_wh
        device = anchors.device
        factor = torch.tensor([[0.1, 0.1, 0.2, 0.2]]).to(device)
        # undo the (0.1,0.1,0.2,0.2) scaling applied during target encoding
        reg_heads = reg_heads * factor
        pred_bboxes_wh = torch.exp(reg_heads[:, 2:]) * anchors_wh
        pred_bboxes_ctr = reg_heads[:, :2] * anchors_wh + anchors_ctr
        pred_bboxes_x_min_y_min = pred_bboxes_ctr - 0.5 * pred_bboxes_wh
        pred_bboxes_x_max_y_max = pred_bboxes_ctr + 0.5 * pred_bboxes_wh
        pred_bboxes = torch.cat(
            [pred_bboxes_x_min_y_min, pred_bboxes_x_max_y_max], dim=1)
        pred_bboxes = pred_bboxes.int()
        # clip the predicted boxes to the input image boundary
        pred_bboxes[:, 0] = torch.clamp(pred_bboxes[:, 0], min=0)
        pred_bboxes[:, 1] = torch.clamp(pred_bboxes[:, 1], min=0)
        pred_bboxes[:, 2] = torch.clamp(pred_bboxes[:, 2], max=self.image_w - 1)
        pred_bboxes[:, 3] = torch.clamp(pred_bboxes[:, 3], max=self.image_h - 1)
        # pred bboxes shape:[anchor_nums,4]
        return pred_bboxes
With that, the decode stage is fully implemented.
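Finally, a quick smoke test of the decoder on random inputs (the per-level anchor counts below are made-up small numbers purely for illustration; in the real network they are determined by the FPN feature map sizes):

import torch

decoder = RetinaDecoder(image_w=600, image_h=600)
batch_size, num_classes = 2, 80
level_anchor_nums = [900, 225, 64, 16, 4]  # hypothetical per-level counts
cls_heads = [torch.rand(batch_size, n, num_classes) for n in level_anchor_nums]
reg_heads = [torch.randn(batch_size, n, 4) for n in level_anchor_nums]
batch_anchors = []
for n in level_anchor_nums:
    # build valid anchors with x_min < x_max and y_min < y_max
    x1y1 = torch.rand(batch_size, n, 2) * 300.
    wh = torch.rand(batch_size, n, 2) * 200. + 1.
    batch_anchors.append(torch.cat([x1y1, x1y1 + wh], dim=2))
scores, classes, boxes = decoder(cls_heads, reg_heads, batch_anchors)
print(scores.shape, classes.shape, boxes.shape)
# torch.Size([2, 100]) torch.Size([2, 100]) torch.Size([2, 100, 4])

The -1 padding up to max_detection_num keeps the batch outputs fixed-size, which simplifies downstream evaluation code.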