小样本目标检测论文翻译总结 2021 CVPR Dense Relation Distillation with Context-aware Aggregation

Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

Abstract

Conventional deep learning based methods for object detection require a large amount of bounding box annotations for training, and such high-quality annotated data is expensive to obtain. Few-shot object detection, which learns to adapt to novel classes with only a few annotated examples, is very challenging, since the fine-grained features of a novel object can easily be overlooked with only a few data available. In this work, aiming to fully exploit features of annotated novel objects and capture fine-grained features of query objects, we propose Dense Relation Distillation with Context-aware Aggregation (DCNet) to tackle the few-shot detection problem. Built on a meta-learning based framework, the Dense Relation Distillation module aims to fully exploit support features, where support features and the query feature are densely matched, covering all spatial locations in a feed-forward fashion. The abundant usage of the guidance information endows the model with the capability to handle common challenges such as appearance changes and occlusions. Moreover, to better capture scale-aware features, the Context-aware Aggregation module adaptively harnesses features from different scales for a more comprehensive feature representation. Extensive experiments illustrate that our proposed approach achieves state-of-the-art results on the PASCAL VOC and MS COCO datasets. Code will be made available at https://github.com/hzhupku/DCNet.

传统的基于深度学习的目标检测方法需要大量的边界框注释来进行训练,而获得这种高质量的注释数据是很昂贵的。小样本目标检测在只有少数注释实例的情况下学习适应新的类别,是非常具有挑战性的,因为在只有少数数据可用时,新物体的细粒度特征很容易被忽略。在这项工作中,我们提出了带有上下文感知聚合的密集关系蒸馏方法(DCNet)来解决小样本检测问题,目的是充分利用已注释新对象的特征并捕捉查询对象的细粒度特征。密集关系蒸馏模块建立在基于元学习的框架之上,旨在充分利用支持特征:支持特征与查询特征密集匹配,以前馈的方式覆盖所有空间位置。对指导信息的充分利用使模型有能力处理外观变化和遮挡等常见挑战。此外,为了更好地捕捉尺度感知的特征,上下文感知聚合模块自适应地利用不同尺度的特征,以获得更全面的特征表示。大量实验表明,我们提出的方法在 PASCAL VOC 和 MS COCO 数据集上取得了最先进的结果。代码将在 https://github.com/hzhupku/DCNet 发布。

Word
a large amount of: 大量的
is expensive to: 是昂贵的
adapt to: 适应于
overlooked: v. 忽略, 俯视, 漏看, 假装不见
Built on: 建立在, 构建在, 建在, 建于
the meta-learning based framework: 基于元学习的框架
target at: 目标在
exploit: v. 开发, 利用, 剥削, 利用…谋私利 n. 功绩
in a feed-forward fashion: 以前馈的方式 以前馈方式, 前馈式的, 前馈式
abundant: adj. 大量的, 丰盛的, 充裕的
guidance information: 指导信息
endows: 赋予, 赠与, 授予, 捐赠
handle common challenges: 处理常见的挑战, 处理共同的挑战, 处理常见挑战, 处理常见的难题
Moreover: adv. 此外, 而且
Extensive experiments illustrate that: 大量实验表明
state-of-the-art: 最先进的, 最先进的技术, 先进的, 先进的技术

1. Introduction

With the success of deep convolutional neural networks, object detection has made great progress in recent years [20, 23, 8]. The success of deep CNNs, however, heavily relies on large-scale datasets such as ImageNet [2] that enable the training of deep models. When labeled data becomes scarce, CNNs can severely overfit and fail to generalize. In contrast, human beings have exhibited strong performance in learning a new concept with only a few examples available. Some object categories naturally have scarce examples, or their bounding box annotations are laborious to obtain, as with medical data. These problems have triggered increasing attention to learning models with limited examples. Few-shot learning aims to train models that generalize well with only a few examples provided. However, most existing few-shot learning works focus on the image classification problem [29, 26, 27] and only a few address the few-shot object detection problem. Object detection not only requires class prediction but also demands localization of the object, making it much more difficult than the few-shot classification task.
随着深度卷积神经网络的成功,近年来目标检测取得了很大的进展 [20, 23, 8]。然而,深度 CNN 的成功在很大程度上依赖于大规模数据集(如 ImageNet [2]),这些数据集使深度模型的训练成为可能。当标记数据变得稀少时,CNN 会严重过拟合,无法泛化。与此相反,人类在只有几个例子的情况下学习一个新概念时表现出了强大的能力。一些物体类别的样本天然稀少,或者边界框注释的获取很费力(如医疗数据),这些问题引发了人们对有限样本下学习模型的日益关注。小样本学习的目的是训练模型,使其在只提供少量例子的情况下也能很好地泛化。然而,大多数现有的小样本学习工作侧重于图像分类问题 [29, 26, 27],只有少数工作关注小样本目标检测问题。目标检测不仅需要预测类别,还需要对物体进行定位,这使得它比小样本分类任务困难得多。

Word
however: adv. 然而, 不过, 无论如何, 不管怎样 conj. 不管用什么方法, 〈古〉虽然
heavily relies on: 严重依赖
large-scale datasets: 大规模数据集
labeled data: 标记数据
scarce: adv. 勉强, 刚, 几乎不, 简直不 adj. 缺乏的, 不足的, 稀少的
fail to: v. 未能, 使…失望
generalize: v. 归纳, 概括, 笼统地讲, 概括地谈论
While in contrast: 而相反
exhibited strong performance: 表现强劲
laborious to obtain: 很难获得
triggered : v. 触发, 扳动扳机射击, 松开扳柄 n. (枪上的)扳机, 【物】触发器, 【电】起动线路, 起动装置
These problems have triggered increasing attentions to deal with learning models with limited examples.
这些问题引发了越来越多的关注,以处理具有有限示例的学习模型。
aims to:旨在;计划做
a few examples:几个例子
focus on:专注于

Prior studies in few-shot object detection mainly fall into two groups. Most of them [13, 35, 34] adopt a meta-learning [5] based framework to perform feature reweighting for class-specific prediction. Wang et al. [31] adopt a two-stage fine-tuning approach, fine-tuning only the last layer of the detector, and achieve state-of-the-art performance. Wu et al. [33] also use a similar strategy and focus on the scale variation problem in few-shot detection.

之前的小样本目标检测研究主要分为两类。其中大多数 [13, 35, 34] 采用基于元学习 [5] 的框架,对特定类别的预测进行特征重加权。而 Wang 等人 [31] 采用两阶段微调方法,只微调检测器的最后一层,并取得了最先进的性能。Wu 等人 [33] 也采用了类似的策略,并专注于小样本检测中的尺度变化问题。

Word
Most of them:他们中的大多数
adopt :采用 通过, 通过了, 采用了
focus on:专注于
the scale variation problem:尺度变化问题

However, the aforementioned methods often suffer from several drawbacks due to the challenging nature of few-shot object detection. Firstly, relations between support features and the query feature are hardly fully explored in previous few-shot detection works, where a global pooling operation on support features is mostly adopted to modulate the query branch, which is prone to loss of detailed local context. Specifically, appearance changes and occlusions are common for objects, as shown in Fig. 1. Without enough discriminative information provided, the model is obstructed from learning critical features for class and bounding box predictions. Secondly, although the scale variation problem has been widely studied in prior works [17, 15, 33], it remains a serious obstacle in few-shot detection tasks. Under few-shot settings, a feature extractor with scale-aware modifications is inclined to overfitting, leading to deteriorated performance for both base and novel classes.

然而,由于小样本目标检测任务本身的挑战性,上述方法通常存在几个缺点。首先,之前的小样本检测工作几乎没有充分探索支持特征和查询特征之间的关系:它们主要采用对支持特征的全局池化操作来调制查询分支,这很容易丢失详细的局部上下文。具体来说,物体的外观变化和遮挡很常见,如图 1 所示。如果没有提供足够的判别信息,模型就难以学习用于类别和边界框预测的关键特征。其次,尽管尺度变化问题在之前的工作中得到了广泛研究 [17, 15, 33],但它仍然是小样本检测任务中的一个严重障碍。在小样本设置下,带有尺度感知修改的特征提取器容易过拟合,导致基类和新类的性能都下降。

Word
aforementioned :adj. 前面提到的, 上述的
suffer from :遭受
drawbacks :缺点
due to:由于 原因是, 因为, 因
hardly :副词: 毫不, 丝毫, 丝, 几乎不, 简直不
fully explored:充分探索
modulate :调制,调控, 调变, 调解
is prone to:容易出现
is obstructed from:受阻于
it remains a serious obstacle:它仍然是一个严重的障碍
modifications :n. 修改, 修正, 改进, 缓和
is inclined to:倾向于
leading to:导致

In order to alleviate the above issues, we first propose the dense relation distillation module to fully exploit the support set. Given a query image and a few support images from novel classes, the shared feature learner extracts the query feature and support features for the subsequent matching procedure. Intuitively, the criterion that determines whether a query object and a support object belong to the same category mainly measures how much feature similarity they share. When appearance changes or occlusions occur, local detailed features are dominant for matching candidate objects with template ones. Hence, instead of obtaining global representations of the support set, we propose a dense relation distillation mechanism where query and support features are matched at the pixel level. Specifically, key and value maps are produced from the features, which respectively encode visual semantics for matching and contain detailed appearance information for decoding. With local information of the support set effectively retrieved for guidance, the performance can be significantly boosted, especially in extremely low-shot scenarios.

为了缓解上述问题,我们首先提出了密集关系蒸馏模块来充分利用支持集。给定一个查询图像和一些来自新类的支持图像,共享特征学习器提取查询特征和支持特征,用于后续的匹配过程。直观地说,判断查询对象和支持对象是否属于同一类别的标准,主要是衡量它们共享多少特征相似度。当出现外观变化或遮挡时,局部细节特征对于匹配候选目标和模板目标起主导作用。因此,我们没有获取支持集的全局表示,而是提出了一种密集关系蒸馏机制,让查询和支持特征在像素级别进行匹配。具体来说,由特征生成键图和值图,它们分别编码用于匹配的视觉语义,以及包含用于解码的详细外观信息。通过有效检索支持集的局部信息来提供指导,可以显著提高性能,尤其是在极低样本的场景中。

Word
In order to:为了 为了使, 为了让, 为
alleviate :v. 减轻, 缓解, 缓和
propose :建议 ,提议, 提出, 提出建议
subsequent :adj. 随后的, 后来的, 之后的, 接后的
Intuitively:直观地说,直观地讲, 直观地看, 直观上
criteria :n. 标准, 尺度
in common:共同点
are dominant for:占主导地位
Hence:因此
mechanism :n. 机制, 机械装置, 方法, 机件
Specifically:adv. 具体来说, 明确地, 具体地, 特意
instead of:prep. 代替, 作为…的替换
respectively:adv. 分别, 各自, 顺序为, 依次为
significantly boosted:显著提高,显著提升, 显著提高了, 显著提升了
especially :adv. 尤其, 特别, 专门, 非常

Furthermore, for the purpose of mitigating the scale variation problem, we design the context-aware feature aggregation module to capture essential cues for different scales during RoI pooling. Since directly modifying the feature extractor could result in overfitting, we choose to perform the adjustment from a more flexible perspective. Recognition of objects at different scales requires different levels of contextual information, while a fixed pooling resolution may bring about loss of substantial context information. Hence, an adaptive aggregation mechanism that allocates specific attention to local and global features simultaneously can help preserve contextual information for objects of different scales. Therefore, instead of performing RoI pooling with one fixed resolution, we choose three different pooling resolutions to capture richer context features. Then an attention mechanism is introduced to adaptively aggregate the output features into a more comprehensive representation.

此外,为了减轻尺度变化问题,我们设计了上下文感知特征聚合模块,以在 RoI 池化期间捕获不同尺度的基本线索。由于直接修改特征提取器可能会导致过拟合,因此我们选择从更灵活的角度进行调整。不同尺度物体的识别需要不同层次的上下文信息,而固定的池化分辨率可能会导致大量上下文信息的丢失。因此,同时将特定注意力分配给局部和全局特征的自适应聚合机制可以帮助保留不同尺度对象的上下文信息。因此,我们选择三种不同的池化分辨率来捕获更丰富的上下文特征,而不是使用一种固定分辨率执行 RoI 池化。然后引入注意力机制来自适应地聚合输出特征以呈现更全面的表示。

Word
Furthermore:adv. 此外, 而且, 再者
mitigating :v. 缓解, 减轻, 缓和, 平息 adj. 可考虑从轻处置的情节(或因素)
capture essential cues:捕捉重要线索
perspective:n. 观点, 远景, 景观, 透视法 adj. (按照)透视画法的, 透视的
allocates :v. 配置, 部署, 分派, 划拨(经费等)
simultaneously :adv. 同时, 联立, 急切地
richer :更丰富,更为丰富, 更加丰富, 更丰富的
aggregate :n. 骨料, 合计, 总数 v. 合计, 总计 adj. 总数的, 总计的

The contributions of this paper can be summarized as follows:

  1. We propose a dense relation distillation module for the few-shot detection problem, which aims to fully exploit support information to assist the detection process for objects from novel classes.
  2. We propose an adaptive context-aware feature aggregation module to better capture global and local features to alleviate the scale variation problem, boosting the performance of few-shot detection.
  3. Extensive experiments illustrate that our approach achieves a consistent improvement on the PASCAL VOC and MS COCO datasets. In particular, our approach achieves better performance than the state-of-the-art methods on both datasets.

本文的贡献可以总结如下:

  1. 我们为小样本检测问题提出了一个密集关系蒸馏模块,其目标是充分利用支持信息来辅助新类别对象的检测过程。
  2. 我们提出了一个自适应上下文感知特征聚合模块,以更好地捕捉全局和局部特征,以缓解尺度变化问题,提高小样本检测的性能。
  3. 大量实验表明,我们的方法在 PASCAL VOC 和 MS COCO 数据集上取得了一致的改进。特别是,我们的方法在两个数据集上实现了比最先进的方法更好的性能。

Word
targets at:目标在
assist :v. 协助, 帮助, 援助, 促进 n. (曲棍球等)助攻, (棒球等)助杀
alleviate :v. 减轻, 缓解, 缓和
Specially:adv. 特别, 特意, 尤其, 专门地
our approach achieves better performance than the state-of-the-art methods on the two datasets.
我们的方法在两个数据集上实现了比最先进的方法更好的性能。

2. Related Work 相关工作

2.1 General Object Detection 一般目标检测

Deep learning based object detection can be mainly divided into two categories: one-stage and two-stage detectors. The one-stage YOLO series of detectors [20, 21, 22] provides a proposal-free framework, using a single convolutional network to directly perform class and bounding box predictions. SSD [18] uses default boxes to adjust to various object shapes. On the other hand, RCNN and its variants [7, 9, 6, 23, 8] fall into the second category. These methods first extract class-agnostic region proposals of the potential objects from a given image. The generated boxes are then further refined and classified into different categories by subsequent modules. Moreover, many works have been proposed to handle scale variance [17, 15, 24, 25]. Compared to one-stage methods, two-stage methods are slower but exhibit better performance. In our work, we adopt Faster R-CNN as the base detector.

基于深度学习的目标检测主要可以分为两类:单阶段检测器和两阶段检测器。单阶段检测器 YOLO 系列 [20, 21, 22] 提供了一个无提议(proposal-free)框架,使用单个卷积网络直接执行类别和边界框预测。SSD [18] 使用默认框来适应各种目标形状。另一方面,RCNN 及其变体 [7, 9, 6, 23, 8] 属于第二类。这些方法首先从给定图像中提取潜在目标的类别无关区域提议,然后由后续模块对生成的框进一步细化并分类。此外,许多工作被提出来处理尺度变化 [17, 15, 24, 25]。与单阶段方法相比,两阶段方法速度较慢,但表现出更好的性能。在我们的工作中,我们采用 Faster R-CNN 作为基础检测器。

Word
are proposed to:拟将,建议将, 建议, 拟在

2.2 Few-shot Learning 小样本学习

Few-shot learning aims to learn transferable knowledge that can be generalized to new classes with scarce examples. Bayesian inference is utilized in [4] to generalize knowledge from a pretrained model to perform one-shot learning. Meta-learning based methods have become prevalent in few-shot learning. Metric learning based methods [16, 29, 26, 27] have achieved state-of-the-art performance in few-shot classification tasks. Matching Network [29] encodes inputs into deep neural features and performs weighted nearest neighbor matching to classify query images. Our proposed method is also based on a matching mechanism. Prototypical Network [26] represents each class with one prototype, which is a feature vector. Relation Network [27] learns a distance metric to compare the target image with a few labeled images. Optimization based methods [19, 5] are proposed for fast adaptation to new few-shot tasks. [11] proposes a cross-attention mechanism to learn correlations between support and query images. The above methods focus on the few-shot classification task, while the few-shot object detection problem is relatively under-explored.

小样本学习旨在学习可迁移的知识,这些知识可以推广到样本稀缺的新类。[4] 中利用贝叶斯推理从预训练模型中泛化知识以执行单样本学习。如今,基于元学习的方法在小样本学习中很流行。基于度量学习的方法 [16, 29, 26, 27] 在小样本分类任务中取得了最先进的性能。匹配网络 [29] 将输入编码为深度神经特征,并执行加权最近邻匹配以对查询图像进行分类。我们提出的方法也是基于匹配机制。原型网络 [26] 用一个原型表示每个类,原型是一个特征向量。关系网络 [27] 学习距离度量,将目标图像与少量标记图像进行比较。基于优化的方法 [19, 5] 则被提出用于快速适应新的小样本任务。[11] 提出了一种交叉注意力机制来学习支持图像和查询图像之间的相关性。上述方法都侧重于小样本分类任务,而小样本目标检测问题相对未得到充分探索。

Word
aims to:旨在
inference :名词: 推理, 推论, 论断, 意味
is utilized in:被用于
under-explored:未充分开发的

2.3 Few-shot Object Detection 小样本目标检测

Few-shot object detection aims to detect objects from novel classes with only a few annotated training examples provided. LSTD [1] and RepMet [14] adopt a general transfer learning framework which reduces overfitting by adapting pre-trained detectors to few-shot scenarios. Recently, Meta YOLO [13] designed a novel few-shot detection model with YOLO v2 [21] that learns generalizable meta features and automatically reweights the features for novel classes by producing class-specific activating coefficients from support examples. Meta R-CNN [35] and FsDetView [34] perform a similar process with Faster R-CNN as the base detector. TFA [31] simply performs a two-stage fine-tuning approach, only fine-tuning the classifier in the second stage, and achieves better performance. MPSR [33] proposes multi-scale positive sample refinement to handle the scale variance problem. CoAE [12] proposes a non-local RPN and approaches one-shot detection from the view of tracking, comparing itself with other tracking methods, while our method performs cross-attention on features extracted by the backbone in a more straightforward way and targets the few-shot detection task. FSOD [3] proposes an attention-RPN, a multi-relation detector and a contrastive training strategy to detect novel objects. In our work, we adopt a similar meta-learning based framework as Meta R-CNN and further improve the performance. Moreover, with our proposed method, the class-specific prediction procedure can be successfully removed, simplifying the overall process.

小样本目标检测旨在仅使用少数带注释的训练示例来检测新类别中的目标。LSTD [1] 和 RepMet [14] 采用通用的迁移学习框架,通过让预训练的检测器适应小样本场景来减少过拟合。最近,Meta YOLO [13] 基于 YOLO v2 [21] 设计了一种新颖的小样本检测模型,该模型学习可泛化的元特征,并通过从支持示例中生成类别特定的激活系数来自动对新类的特征重新加权。Meta R-CNN [35] 和 FsDetView [34] 以 Faster R-CNN 为基础检测器执行类似的过程。TFA [31] 采用简单的两阶段微调方法,仅在第二阶段微调分类器,并获得更好的性能。MPSR [33] 提出了多尺度正样本细化来处理尺度变化问题。CoAE [12] 提出了 non-local RPN,并通过与其他跟踪方法比较,从跟踪的角度研究单样本检测;而我们的方法以更直接的方式对主干提取的特征进行交叉注意力,并针对小样本检测任务。FSOD [3] 提出了注意力 RPN、多关系检测器和对比训练策略来检测新对象。在我们的工作中,我们采用了与 Meta R-CNN 类似的基于元学习的框架,并进一步提高了性能。此外,使用我们提出的方法,可以去掉类别特定的预测过程,从而简化整个流程。

Word
automatically :adv. 自动地, 自然地, 无意识地, 不自觉地
achieves :v. 实现, 做到, 获得(胜利等), 取得预期效果

3. Method 方法

3.1 Preliminaries 准备工作

Problem Definition. Following the setting in [13, 35], object classes are divided into base classes $C_{base}$ with abundant annotated data and novel classes $C_{novel}$ with only a few annotated samples, where $C_{base}$ and $C_{novel}$ have no intersection. We aim to obtain a few-shot detection model with the ability to detect objects from both base and novel classes in testing by leveraging generalizable knowledge from base classes. The number of instances per category for novel classes is set as $k$ (i.e., $k$-shot).

问题定义。 按照 [13, 35] 中的设置,目标类别被分为具有丰富注释数据的基类 $C_{base}$ 和只有少量注释样本的新类 $C_{novel}$,其中 $C_{base}$ 与 $C_{novel}$ 没有交集。我们的目标是通过利用基类的可泛化知识,获得一个在测试时能够同时检测基类和新类目标的小样本检测模型。新类每个类别的实例数设为 $k$(即 $k$-shot)。

Word
are divided into:被分成
abundant annotated data:丰富的注释数据
a few annotated samples: 几个带注释的样本
intersection:n. 交叉, 相交, 十字路口, 交叉路口 交集
aim to:目标是
leveraging :利用
We aim to obtain a few-shot detection model with the ability to detect objects from both base and novel classes in testing by leveraging generalizable knowledge from base classes.
我们的目标是通过利用基类的泛化知识,获得一个能够在测试中从基类和新类中检测对象的小样本检测模型。

We align the training scheme with the episodic paradigm [29] for the few-shot scenario. Given a $k$-shot learning task, each episode is constructed by sampling: 1) a support set containing image-mask pairs for different classes, $S = \{x_i, y_i\}_{i=1}^N$, where $x_i \in \mathbb{R}^{h \times w \times 3}$ is an RGB image, $y_i \in \mathbb{R}^{h \times w}$ is a binary mask for objects of class $i$ in the support image, generated from the bounding box annotations, and $N$ is the number of classes in the training set; 2) a query image $q$ and annotations $m$ for the training classes in the query image. The input to the model is the support pairs and the query image; the output is the detection prediction for the query image.

我们将训练方案与小样本场景的情景范式 [29] 保持一致。给定一个 $k$-shot 学习任务,每个情节(episode)通过采样构建:1) 一个包含不同类别图像-掩码对的支持集 $S = \{x_i, y_i\}_{i=1}^N$,其中 $x_i \in \mathbb{R}^{h \times w \times 3}$ 是 RGB 图像,$y_i \in \mathbb{R}^{h \times w}$ 是由边界框注释生成的、支持图像中第 $i$ 类对象的二值掩码,$N$ 是训练集中的类别数;2) 一张查询图像 $q$ 以及查询图像中训练类别的注释 $m$。模型的输入是支持对和查询图像,输出是对查询图像的检测预测。

Word
the episodic paradigm:事件范式
is constructed by:构建的是, 构建了, 构建的
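To make the episodic sampling concrete, here is a minimal PyTorch-style sketch of how one episode could be assembled. The `dataset.sample_support` / `dataset.sample_query` helpers and the (x1, y1, x2, y2) box format are illustrative assumptions, not the paper's code; only the box-to-binary-mask conversion follows the definition of $y_i$ above.

```python
import torch

def boxes_to_mask(boxes, height, width):
    """Rasterize the bounding boxes of one class into the binary mask y_i
    defined above. `boxes` is an iterable of (x1, y1, x2, y2) tuples."""
    mask = torch.zeros(height, width)
    for x1, y1, x2, y2 in boxes:
        mask[int(y1):int(y2), int(x1):int(x2)] = 1.0
    return mask

def build_episode(dataset, class_ids):
    """Sample one episode: a support (image, mask) pair per class plus a query.

    dataset.sample_support(c) and dataset.sample_query() are hypothetical
    helpers that draw from the k-shot annotation pool."""
    support = []
    for c in class_ids:
        image, boxes = dataset.sample_support(c)  # annotated instances of class c
        h, w = image.shape[-2:]
        support.append((image, boxes_to_mask(boxes, h, w)))
    query_image, query_annotations = dataset.sample_query()
    return support, query_image, query_annotations
```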

Basic Object Detection. The choice of base detector varies. [13] utilizes YOLO v2 [21], a one-stage detector, while [35] adopts Faster R-CNN [23], a two-stage detector, and provides consistently better results. Therefore, we also adopt Faster R-CNN as our base detector, which consists of a feature extractor, a region proposal network (RPN) and the detection head (RoI head).

基本目标检测。基础检测器的选择是多样的。[13] 使用单阶段检测器 YOLO v2 [21],而 [35] 采用两阶段检测器 Faster R-CNN [23],并始终提供更好的结果。因此,我们也采用 Faster R-CNN 作为我们的基础检测器,它由特征提取器、区域建议网络(RPN)和检测头(RoI 头)组成。

Word
The choice of base detectors is varied.:基检测器的选择是多样的。
utilizes :利用
adopts :采用
provides:提供
Therefore:adv. 因此

Feature Reweighting for Detection. We choose Meta R-CNN [35] as our baseline method. Formally, let $I$ denote an input query image and $\{I_{si}, M_{si}\}_{i=1}^N$ denote the support images and masks converted from bounding-box annotations, where $N$ is the number of training classes. RoI features $z^j|_{j=1}^n$ are generated by the RoI pooling layer ($n$ is the number of RoIs), and class-specific vectors $w_i \in \mathbb{R}^C, i = 1, 2, \ldots, N$ are produced with a reweighting module which shares its backbone parameters with the feature extractor, where $C$ is the feature dimension. The class-specific feature $z_i$ is then obtained with

$$z_i = z \otimes w_i, \quad i = 1, 2, \ldots, N,$$

where $\otimes$ denotes channel-wise multiplication. Class-specific prediction is then performed to output the detection results. Based on this methodology, we further make a significant improvement and simplify the prediction procedure by removing the class-specific prediction.

用于检测的特征重加权。我们选择 Meta R-CNN [35] 作为基线方法。形式上,令 $I$ 表示输入查询图像,$\{I_{si}, M_{si}\}_{i=1}^N$ 表示支持图像和由边界框注释转换而来的掩码,其中 $N$ 是训练类别数。RoI 特征 $z^j|_{j=1}^n$ 由 RoI 池化层生成($n$ 是 RoI 的数量),类别特定向量 $w_i \in \mathbb{R}^C, i = 1, 2, \ldots, N$ 由重加权模块生成,该模块与特征提取器共享主干参数,其中 $C$ 是特征维度。然后通过

$$z_i = z \otimes w_i, \quad i = 1, 2, \ldots, N$$

得到类别特定特征 $z_i$,其中 $\otimes$ 表示逐通道乘法。随后执行类别特定预测以输出检测结果。基于这一方法,我们进一步做出显著改进,并通过移除类别特定预测简化了预测流程。

Word
Formally:adv. 形式上, 正式地, 遵照一定格式地
class-specific:特定于类
Based on :基于
simplify :v. 使简化, 使简易
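For concreteness, the reweighting $z_i = z \otimes w_i$ is a per-channel scaling of each RoI feature by a class vector, broadcast over all $N$ classes. A minimal sketch with illustrative tensor shapes:

```python
import torch

def reweight_rois(roi_feats, class_vectors):
    """Class-specific reweighting z_i = z (x) w_i from Meta R-CNN.

    roi_feats:     (n, C, s, s) pooled RoI features
    class_vectors: (N, C)       one reweighting vector per training class
    returns:       (N, n, C, s, s) class-specific features
    """
    w = class_vectors[:, None, :, None, None]  # broadcast to (N, 1, C, 1, 1)
    return w * roi_feats[None]                 # (N, n, C, s, s)

z = torch.randn(4, 2048, 8, 8)    # n = 4 RoIs, C = 2048
w = torch.randn(15, 2048)         # N = 15 training classes
print(reweight_rois(z, w).shape)  # torch.Size([15, 4, 2048, 8, 8])
```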

3.2 DCNet

As illustrated in Fig. 2, we present the Dense Relation Distillation (DRD) module together with the Context-aware Feature Aggregation (CFA) module to fully exploit support features and capture essential context information. The two proposed components form the final model, DCNet. We will first depict the architecture of the proposed DRD module and then present the details of the CFA module.

如图 2 所示,我们提出了密集关系蒸馏(DRD)模块和上下文感知特征聚合(CFA)模块,以充分利用支持特征并捕获必要的上下文信息。这两个组件构成了最终模型 DCNet。我们将首先描述所提出的 DRD 模块的架构,然后介绍 CFA 模块的细节。

Word
As illustrated in Fig. 2:如图 2 所示
present :n. 目前, 现在, 礼物, 礼品 adj. 存在, 出席, 在场, 出现 v. 出现, 提出, 显示, 提交
fully exploit :充分利用
form :动词: 形成, 构成, 构, 结, 组织, 组, 树立, 缔, 发展, 养 。名词: 形式, 表格, 形态, 格式, 形, 形状, 式, 体, 态, 样式, 结成, 程式, 形像, 单子
bring out :na. 公布, 说出, 出版, 上演

3.2.1 Dense Relation Distillation Module 密集关系蒸馏模块

Key and Value Embedding. Given a query image and support set, query and support features are produced by feeding them into the shared feature extractor. The input of the dense relation distillation (DRD) module is the query feature and support features. Both parts are first encoded into pairs of key and value maps through the dedicated deep encoders. The query encoder and support encoder adopt the same structure while not sharing parameters.

键和值嵌入。 给定查询图像和支持集,通过将它们输入共享特征提取器来生成查询和支持特征。密集关系蒸馏(DRD)模块的输入是查询特征和支持特征。这两个部分首先通过专用的深度编码器编码成键和值映射对。查询编码器和支持编码器采用相同的结构,但不共享参数。

Word
are produced by:是由 是由以下方面产生的,
feeding them into:将它们送入,把它们送入, 把它们送进, 把它们喂给
Both parts are first encoded into:两部分首先被编码成
adopt :动词: 采用, 采取, 通过, 采纳, 收养, 抱, 取, 树立, 认, 义

The encoder takes one or multiple features as input and outputs two feature maps for each input feature, key and value, with two parallel $3 \times 3$ convolution layers, which also reduce the dimension of the input feature to save computation cost. Specifically, key maps are used for measuring the similarities between the query feature and support features, which helps determine where to retrieve relevant support values. Key maps are therefore learned to encode visual semantics for matching, while value maps store detailed information for recognition. Hence, for the query feature, the output is a pair of key and value maps: $k_q \in \mathbb{R}^{C/8 \times H \times W}$, $v_q \in \mathbb{R}^{C/2 \times H \times W}$, where $C$ is the feature dimension, $H$ is the height, and $W$ is the width of the input feature map. For support features, each feature is independently encoded into key and value maps, and the output is $k_s \in \mathbb{R}^{N \times C/8 \times H \times W}$, $v_s \in \mathbb{R}^{N \times C/2 \times H \times W}$, where $N$ is the number of target classes (also the number of support samples). The generated key and value maps are further fed into the relation distillation part, where the key maps of query and support are densely matched for addressing target objects.

编码器以一个或多个特征作为输入,通过两个并行的 $3 \times 3$ 卷积层为每个输入特征输出两个特征图:键(key)和值(value),同时降低输入特征的维度以节省计算成本。具体来说,键图用于衡量查询特征与支持特征之间的相似性,帮助确定从何处检索相关的支持值;因此,键图学习编码用于匹配的视觉语义,而值图存储用于识别的细节信息。因此,对于查询特征,输出是一对键图和值图:$k_q \in \mathbb{R}^{C/8 \times H \times W}$、$v_q \in \mathbb{R}^{C/2 \times H \times W}$,其中 $C$ 是特征维度,$H$ 和 $W$ 分别是输入特征图的高和宽。对于支持特征,每个特征被独立编码为键图和值图,输出为 $k_s \in \mathbb{R}^{N \times C/8 \times H \times W}$、$v_s \in \mathbb{R}^{N \times C/2 \times H \times W}$,其中 $N$ 是目标类别数(也是支持样本数)。生成的键图和值图被进一步送入关系蒸馏部分,在那里查询与支持的键图被密集匹配以定位目标对象。

Word
parallel :平行 ,平行的, 平行线, 纬度
save computation cost:节省计算成本, 节省计算费用, 节约计算成本, 节省了计算成本
Specifically:adv. 具体来说, 明确地, 具体地, 特意
determine :确定, 决定, 判断, 决定了
retrieve :v. 找回, 取回, 挽回, 索回 n. 恢复
Therefore:adv. 因此
Hence:adv. 因此, 由此 int. 〈诗〉去
independently :adv. 独立地, 自由地
fed into:un. 注入
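A minimal sketch of such a key/value encoder, assuming the channel reductions to $C/8$ and $C/2$ stated above; padding and names are illustrative, and the query and support branches would each hold their own non-shared copy:

```python
import torch
import torch.nn as nn

class KeyValueEncoder(nn.Module):
    """Two parallel 3x3 convolutions producing a key map (C/8 channels) and a
    value map (C/2 channels) from one input feature map."""

    def __init__(self, channels):
        super().__init__()
        self.key_conv = nn.Conv2d(channels, channels // 8, 3, padding=1)
        self.value_conv = nn.Conv2d(channels, channels // 2, 3, padding=1)

    def forward(self, feat):  # feat: (B, C, H, W)
        return self.key_conv(feat), self.value_conv(feat)

enc = KeyValueEncoder(1024)
k, v = enc(torch.randn(1, 1024, 32, 32))
print(k.shape, v.shape)  # (1, 128, 32, 32) (1, 512, 32, 32)
```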

Relation Distillation. After acquiring the key/value maps of the query and support features, relation distillation is performed. As illustrated in Fig. 2, soft weights for the value maps of the support features are computed by measuring the similarities between the key maps of the query feature and the support features. The pixel-wise similarity is computed in a non-local manner, formulated as:

$$F(k_{qi}, k_{sj}) = \phi(k_{qi})^T \phi'(k_{sj}),$$

where $i$ and $j$ are the indices of the query and support locations, and $\phi$, $\phi'$ denote two different linear transformations whose parameters are learned via back propagation during training, forming a dynamically learned similarity function. After computing the similarity of pixel features, we perform softmax normalization to output the final weight $W$:

$$W_{ij} = \frac{\exp(F(k_{qi}, k_{sj}))}{\sum \exp(F(k_{qi}, k_{sj}))}.$$

Then the values of the support features are retrieved by a weighted summation with the soft weights produced, and the result is concatenated with the value map of the query feature. Hence, the final output is formulated as:

$$y = \mathrm{concat}[v_q, W * v_s],$$

where $*$ denotes matrix inner-product. Note that there are $N$ support features, which brings $N$ key-value pairs. We perform summation over the $N$ output results to obtain the final result, which is a refined query feature, activated by support features wherever classes of objects co-exist in the query and support images.

关系蒸馏。 在获得查询和支持特征的键/值图后,进行关系蒸馏。如图 2 所示,支持特征值图的软权重是通过衡量查询特征与支持特征的键图之间的相似性来计算的。像素级相似性以非局部方式计算,公式为

$$F(k_{qi}, k_{sj}) = \phi(k_{qi})^T \phi'(k_{sj}),$$

其中 $i$ 和 $j$ 分别是查询和支持位置的索引,$\phi$、$\phi'$ 表示两个不同的线性变换,其参数在训练过程中通过反向传播学习,构成一个动态学习的相似度函数。在计算像素特征的相似度后,我们执行 softmax 归一化以输出最终权重 $W$:

$$W_{ij} = \frac{\exp(F(k_{qi}, k_{sj}))}{\sum \exp(F(k_{qi}, k_{sj}))}.$$

然后用得到的软权重对支持特征的值进行加权求和,并将结果与查询特征的值图拼接。因此,最终输出公式为

$$y = \mathrm{concat}[v_q, W * v_s],$$

其中 $*$ 表示矩阵内积。注意共有 $N$ 个支持特征,对应 $N$ 组键值对。我们对 $N$ 个输出结果求和以获得最终结果,即一个细化的查询特征,它在查询与支持图像中存在同类目标的位置被支持特征激活。
Figure 2. The overall framework of our proposed DCNet. For training, the input for each episode consists of a query image and N support image-mask pairs from N classes. The shared feature extractor first produces the query feature and support features. Then, the dense relation distillation (DRD) module performs dense feature matching to activate co-existing features of the input query. With proposals produced by the RPN, the context-aware feature aggregation (CFA) module adaptively harnesses features generated with different scales of pooling operations, capturing different levels of features for a more comprehensive representation.

图 2. 我们提出的 DCNet 的整体框架。在训练时,每个情节(episode)的输入包括一个查询图像和来自 N 个类别的 N 个支持图像-掩码对。共享特征提取器首先生成查询特征和支持特征。然后,密集关系蒸馏(DRD)模块执行密集特征匹配,以激活输入查询中的共存特征。对于 RPN 生成的提议,上下文感知特征聚合(CFA)模块自适应地利用不同尺度池化操作生成的特征,捕获不同层次的特征以获得更全面的表示。

Word
acquiring :获取
is performed: 被执行
features are computed via measuring the similarities:通过测量相似性来计算特征
denote :动词: 表示, 意味着
are retrieved by:被检索
Hence:因此
is formulated as:被公式化为
produces :产生
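Putting the three formulas above together, the distillation step can be sketched as follows. Realizing $\phi$ and $\phi'$ as $1 \times 1$ convolutions, taking the softmax over support locations, and looping over the $N$ supports before the summation are our assumptions about details the text leaves open; class and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseRelationDistillation(nn.Module):
    """Dense relation distillation over key/value maps (see the formulas above)."""

    def __init__(self, key_channels):
        super().__init__()
        # phi and phi' realized as 1x1 convolutions (learned linear transforms)
        self.phi = nn.Conv2d(key_channels, key_channels, 1)
        self.phi_prime = nn.Conv2d(key_channels, key_channels, 1)

    def forward(self, k_q, v_q, k_s, v_s):
        # k_q: (1, Ck, H, W), v_q: (1, Cv, H, W)
        # k_s: (N, Ck, H, W), v_s: (N, Cv, H, W) -- one map per support class
        _, ck, h, w = k_q.shape
        q = self.phi(k_q).view(ck, h * w)                     # (Ck, HW)
        retrieved = torch.zeros_like(v_q).view(-1, h * w)     # (Cv, HW)
        for n in range(k_s.size(0)):                          # sum over the N supports
            s = self.phi_prime(k_s[n:n + 1]).view(ck, h * w)  # (Ck, HW)
            sim = q.t() @ s                                   # F(k_qi, k_sj)
            weight = F.softmax(sim, dim=1)                    # softmax over support locations
            value = v_s[n].view(-1, h * w)                    # (Cv, HW)
            retrieved = retrieved + (weight @ value.t()).t()  # retrieved support values
        retrieved = retrieved.view(1, -1, h, w)
        return torch.cat([v_q, retrieved], dim=1)             # concat[v_q, W * v_s]
```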

Previous trials [13, 35, 34] utilize class-wise vectors generated by global pooling of support features to modulate the query feature, which guides the feature learning from a holistic view. However, since appearance changes or occlusions are common in natural images, the holistic feature may be misleading when objects of the same class vary much between query and support samples. Also, when most parts of the objects are unseen due to occlusions, the retrieval of local detailed features becomes essential, which former methods completely neglect. Hence, equipped with the dense relation distillation module, pixel-level relevant information can be distilled from support features. As long as there exist some common characteristics, the pixels of query features belonging to the objects co-existing between query and support samples will be further activated, providing a robust modulated feature to facilitate the prediction of class and bounding box.

先前的试验 [13, 35, 34] 利用支持特征的全局池化生成的类向量来调制查询特征,从整体角度指导特征学习。然而,由于外观变化或遮挡在自然图像中很常见,当同一类的对象在查询和支持样本之间差异很大时,整体特征可能会产生误导。此外,当物体的大部分由于遮挡而看不见时,局部细节特征的检索变得很重要,而以前的方法完全忽略了这一点。因此,配备了密集关系蒸馏模块,可以从支持特征中提取像素级相关信息。只要存在一些共同特征,属于查询样本和支持样本之间共存对象的查询特征的像素将被进一步激活,提供一个鲁棒的调制特征,以促进类和边界框的预测。

Word
vectors generated by:生成的向量
guide :名: 指南, 导轨, 先导, 讲解员, 向, 路标 动词: 指导, 引导, 导向, 导, 向导, 指引, 导引, 带领, 引, 向, 南针, 羑
a holistic view:整体观
However:然而
holistic :整体的
Also:adv. 也, 同样, 此外, 而且 conj. 同“also beautiful”
due to :介词: 由于, 由, 基于, 承, 定于
retrieval :名词: 恢复, 取回
substantial:adj. 大量的, 价值巨大的, 重大的, 大而坚固的
Hence:因此
equipped with:配备
As long as :只要
modulated :调制的

Our distillation method can be seen as an extension of the non-local self-attention mechanism [28, 30]. However, instead of performing self-attention, we specially design the relation distillation model to realize information retrieval from support features to modulate the query feature, which can be treated as a cross attention.

我们的蒸馏方法可以看作是非局部自注意力机制 [28, 30] 的扩展。然而,我们并没有执行自注意力,而是专门设计了关系蒸馏模型,从支持特征中检索信息来调制查询特征,这可以被视为一种交叉注意力。

Word
instead of :介词: 顶, 而不是
be treated as:把……当作……对待;当做……来看;被当作

3.2.2 Context-aware Feature Aggregation 上下文感知特征聚合

After performing dense relation distillation, the DRD module has fulfilled its duty. The refined query feature is subsequently fed into the RPN, where region proposals are output. Taking proposals and the feature as input, the RoI Align module performs feature extraction for the final class prediction and bounding-box regression. Normally, the pooling operation is implemented with a fixed resolution of 8, as in our original implementation, which is likely to cause information loss during training. For general object detection, this kind of information loss can be remedied with a large scale of training data, while the problem becomes severe in few-shot detection scenarios with only a few training samples available, which is inclined to induce misleading detection results. Moreover, with scale variation amplified due to the few-shot nature, the model tends to lose the ability to generalize to novel classes while adequately adapting to different scales. To this end, we propose the Context-aware Feature Aggregation (CFA) module. Instead of using a fixed resolution of 8, we empirically choose the three resolutions 4, 8 and 12 and perform parallel pooling operations to obtain a more comprehensive feature representation. The larger resolution tends to focus on local detailed context information, especially for smaller objects, while the smaller resolution aims at capturing holistic information to benefit the recognition of larger objects, providing a simple and flexible way to alleviate the scale variation problem.

在进行了密集关系蒸馏之后,DRD 模块就完成了它的任务。细化的查询特征随后被输入到 RPN 中,输出区域提议。以提议和特征为输入,RoI Align 模块为最终的类别预测和边界框回归执行特征提取。通常情况下,池化操作采用固定的分辨率 8(如我们最初的实现),这很可能导致训练过程中的信息丢失。对于一般的目标检测,这种信息丢失可以通过大规模的训练数据来弥补,而在只有少量训练数据可用的小样本检测场景中,问题变得严重,容易导致误导性的检测结果。此外,由于小样本性质放大了尺度变化,模型往往难以在充分适应不同尺度的同时保持对新类的泛化能力。为此,我们提出了上下文感知特征聚合(CFA)模块。我们没有使用固定的分辨率 8,而是凭经验选择 4、8 和 12 三种分辨率并进行并行池化操作,以获得更全面的特征表示。较大的分辨率倾向于关注局部详细的上下文信息,特别有利于较小的物体;而较小的分辨率则侧重于捕获整体信息,有利于较大物体的识别,为缓解尺度变化问题提供了一种简单灵活的方法。

Word
module has fulfilled its duty:模块已经完成任务
subsequently :副词: 后来, 以后, 既而
fed into :送入 输送到, 被送入, 送到
feature is subsequently fed into RPN where region proposals are output. :特征随后被输入到 RPN 中,在那里输出区域提议。
Taking proposals and feature as input:将提案和特征作为输入
Normally:adv. 通常, 正常地, 正常情况下, 平常地
be remedied with:补救
severe :adj. 极为恶劣的, 十分严重的, 严厉的, 苛刻的
is inclined to:倾向于
Moreover:而且
with scale variation amplified放大了尺度变化
tends to:倾向于
adequate :形容词: 充足, 足够, 适当, 够
To this end, :un. 直到最后, 为此目的
propose :提供
empirically :凭经验
targets at:目标在
alleviate :动词: 缓和, 和缓

Each generated feature contains a different level of semantic information. With the intention of efficiently aggregating features generated from different scales of RoI pooling, we further propose an attention mechanism to adaptively fuse the pooling results. As illustrated in Fig. 3, we add an attention branch for each feature, which consists of two blocks. The first block contains a global average pooling. The second one contains two consecutive fc layers. Afterwards, we add a softmax normalization to the generated weights to balance the contribution of each feature. The final output of the aggregated feature is then the weighted summation of the three features.

每个生成的特征包含不同层次的语义信息。为了有效地聚合从不同尺度的 RoI 池化生成的特征,我们进一步提出了一种注意力机制来自适应地融合池化结果。如图 3 所示,我们为每个特征添加一个由两个块组成的注意力分支。第一个块包含全局平均池化,第二个块包含两个连续的全连接(fc)层。之后,我们对生成的权重进行 softmax 归一化,以平衡每个特征的贡献。聚合特征的最终输出就是三个特征的加权求和。

Word
aggregate :融合
fuse:融合
consecutive :adj. 连续不断的
contribution :n. 贡献, 捐款, 捐赠, 稿件
Afterwards:adv. 后来, 以后
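A sketch of the CFA module as described, using torchvision's roi_align. The paper does not state how features pooled at resolutions 4/8/12 are brought to a common size before the weighted summation; resizing every branch to the middle resolution with bilinear interpolation, and the hidden width of the two fc layers, are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import roi_align

class ContextAwareAggregation(nn.Module):
    """Pool each RoI at resolutions 4/8/12, score each pooled feature with a
    small branch (global average pooling + two fc layers), softmax-normalize
    the three scores, and return the weighted sum of the three features."""

    def __init__(self, channels, resolutions=(4, 8, 12), out_size=8):
        super().__init__()
        self.resolutions = resolutions
        self.out_size = out_size
        self.attn = nn.ModuleList([
            nn.Sequential(nn.Linear(channels, channels // 4), nn.ReLU(),
                          nn.Linear(channels // 4, 1))
            for _ in resolutions])

    def forward(self, feat, boxes, spatial_scale):
        # feat: (B, C, H, W); boxes: list of per-image (n_i, 4) tensors
        pooled, scores = [], []
        for res, attn in zip(self.resolutions, self.attn):
            p = roi_align(feat, boxes, output_size=res, spatial_scale=spatial_scale)
            # bring all branches to a common size so they can be summed
            p = F.interpolate(p, size=self.out_size, mode='bilinear',
                              align_corners=False)
            pooled.append(p)                         # (n, C, out, out)
            scores.append(attn(p.mean(dim=(2, 3))))  # GAP -> two fc -> (n, 1)
        weights = torch.softmax(torch.cat(scores, dim=1), dim=1)  # (n, 3)
        pooled = torch.stack(pooled, dim=1)                       # (n, 3, C, out, out)
        return (weights[:, :, None, None, None] * pooled).sum(dim=1)
```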

Figure 3. Illustration of context-aware feature aggregation. Attention mechanism is adopted to adaptively aggregate different features, where the weights are normalized with softmax function.
图 3. 上下文感知特征聚合的图示。采用注意力机制自适应聚合不同的特征,其中权重使用 softmax 函数进行归一化。

3.3. Learning Strategy 学习策略

As illustrated in Fig. 4, we follow the training paradigm in [13, 35, 34], which consists of meta-training and meta fine-tuning. In the meta-training phase, abundant annotated data from the base classes is provided. We jointly train the feature extractor, the dense relation distillation module, the context-aware feature aggregation module and the other basic components of the detection model. In the meta fine-tuning phase, we train the model on both base and novel classes. As only $k$ labeled bounding-boxes are available for the novel classes, to balance samples between base and novel classes, we also include $k$ boxes for each base class. The training procedure is the same as in the meta-training phase, but with fewer iterations for the model to converge.

如图 4 所示,我们遵循 [13, 35, 34] 中的训练范式,其中包括元训练和元微调。在元训练阶段,提供了大量来自基类的注释数据。我们联合训练特征提取器、密集关系蒸馏模块、上下文感知特征聚合模块和检测模型的其他基本组件。在元微调阶段,我们在基类和新类上训练模型。由于新类只有 $k$ 个标注的边界框可用,为了平衡来自基类和新类的样本,我们也为每个基类包含 $k$ 个框。训练过程与元训练阶段相同,但模型收敛所需的迭代次数更少。

Word
jointly :副词: 共同
components :成分
fewer iterations:更少的迭代
converge:v. 集中, 汇集, 聚集, (向某一点)相交
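The balanced sampling for meta fine-tuning (keeping $k$ annotated instances per class, for base and novel classes alike) can be sketched as follows; the annotation structure is illustrative, not the paper's data format.

```python
import random
from collections import defaultdict

def build_finetune_set(annotations, base_classes, novel_classes, k, seed=0):
    """Balanced meta fine-tuning pool: keep k annotated instances per class,
    for base and novel classes alike. `annotations` maps class id -> list of
    (image_id, box) records."""
    rng = random.Random(seed)
    finetune_set = defaultdict(list)
    for cls in list(base_classes) + list(novel_classes):
        records = annotations[cls]
        finetune_set[cls] = rng.sample(records, k) if len(records) > k else list(records)
    return finetune_set
```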

Figure 4. Demonstration of the learning strategy of the meta-learning based few-shot detection framework. The meta learner aims to acquire meta information and help the model generalize to novel classes.
图 4. 基于元学习的小样本检测框架的学习策略示意。元学习器旨在获取元信息,帮助模型泛化到新类。

4. Experiments 实验

In this section, we first introduce the implementation details and experimental configurations in Sec. 4.1. Then we present our detailed experimental analysis on the PASCAL VOC dataset in Sec. 4.2, together with ablation studies and qualitative results. Finally, results on the COCO dataset are presented in Sec. 4.3.

在本节中,我们首先在 4.1 节介绍实现细节和实验配置,然后在 4.2 节给出我们在 PASCAL VOC 数据集上的详细实验分析,以及消融研究和定性结果。最后,在 4.3 节给出 COCO 数据集上的结果。

Word
In this section, :在这个部分,
configurations:n. 配置, 结构, 外形, 组合
Finally:adv. 终于, 最终, (用于列举)最后, 彻底地

4.1. Datasets and Settings 数据集和设置

Following the instructions in [13], we construct the few-shot detection datasets for a fair comparison with other state-of-the-art methods. Moreover, to achieve more stable few-shot detection results, we perform 10 random runs with different randomly sampled shots. Hence, all results in the experiments are averaged over the 10 random runs.

按照 [13] 中的设置,我们构建了小样本检测数据集,以便与其他最先进的方法进行公平比较。此外,为了获得更稳定的小样本检测结果,我们使用不同的随机采样样本执行 10 次随机运行。因此,实验中的所有结果都是 10 次随机运行的平均值。

Word
instructions : n. 指令, 教导, 教训
Moreover:adv. 此外, 而且
is averaged results:是平均结果

PASCAL VOC. For the PASCAL VOC dataset, we train our model on the VOC 2007 trainval and VOC 2012 trainval sets and test the model on the VOC 2007 test set. The evaluation metric is the mean Average Precision (mAP). Both trainval sets are split by object categories, where 5 are randomly chosen as novel classes and the remaining 15 are base classes. We use the same splits as [13], where the novel classes for the three splits are {"bird", "bus", "cow", "motorbike" ("mbike"), "sofa"}, {"aeroplane" ("aero"), "bottle", "cow", "horse", "sofa"}, and {"boat", "cat", "motorbike", "sheep", "sofa"}, respectively. For the few-shot object detection experiments, the few-shot dataset consists of images where $k$ object instances are available for each category, and $k$ is set to 1/3/5/10.

COCO. The MS COCO dataset has 80 object categories, where the 20 categories overlapping with PASCAL VOC are set as novel classes. 5000 images from the validation set, denoted as minival, are used for evaluation, while the remaining images in the train and validation sets are used for training. The process of constructing the few-shot dataset is similar to that for PASCAL VOC, and $k$ is set to 10/30.

PASCAL VOC. 对于 PASCAL VOC 数据集,我们在 VOC 2007 trainval 和 VOC 2012 trainval 集上训练模型,并在 VOC 2007 测试集上测试模型。评估指标是平均精度均值(mAP)。两个 trainval 集都按目标类别划分,其中 5 个类别被随机选为新类,其余 15 个为基类。我们使用与 [13] 相同的拆分,三个拆分的新类分别是 {"bird", "bus", "cow", "motorbike"("mbike"), "sofa"}、{"aeroplane"("aero"), "bottle", "cow", "horse", "sofa"} 和 {"boat", "cat", "motorbike", "sheep", "sofa"}。对于小样本目标检测实验,小样本数据集由每个类别有 $k$ 个目标实例的图像组成,$k$ 设为 1/3/5/10。

COCO. MS COCO 数据集有 80 个目标类别,其中与 PASCAL VOC 重叠的 20 个类别被设为新类。验证集中记作 minival 的 5000 张图像用于评估,训练集和验证集中的其余图像用于训练。构建小样本数据集的过程与 PASCAL VOC 数据集类似,$k$ 设为 10/30。

Word
evaluation metric:评价指标
denoted as: 记作, 表示为
minival :COCO 验证集中用于评估的 5000 张图像子集

Implementation Details. We perform the training and testing process on images at a single scale. The shorter side of the query image is resized to 800 pixels and the longer side is kept under 1333 pixels while maintaining the aspect ratio. The support image is resized to a square image of 256 × 256. We adopt ResNet-101 [10] as the feature extractor and RoI Align [8] as the RoI feature extractor. The weights of the backbone are pre-trained on ImageNet [2]. After training on base classes, only the last fully-connected layer (for classification) is removed and replaced by a new, randomly initialized one. It is worth noting that all parts of the model participate in the learning process in the second, meta fine-tuning phase without any freezing. We train our model with a mini-batch size of 4 on 2 GPUs. We utilize the SGD optimizer with a momentum of 0.9 and a weight decay of 0.0001. For meta-training on PASCAL VOC, models are trained for 240k, 8k, and 4k iterations with learning rates of 0.005, 0.0005 and 0.00005, respectively. For meta fine-tuning on PASCAL VOC, models are trained for 1300, 400 and 300 iterations with learning rates of 0.005, 0.0005 and 0.00005, respectively. As for the MS COCO dataset, during meta-training, models are trained for 56k, 14k and 10k iterations with learning rates of 0.005, 0.0005 and 0.00005, respectively. During meta fine-tuning, models are trained for 2800, 700 and 500 iterations for 10-shot fine-tuning and 5600, 1400 and 1000 iterations for 30-shot fine-tuning.

Baseline Method. Since we adopt Faster R-CNN as the base detector, we choose Meta R-CNN [35] as the baseline method. Moreover, we implement it ourselves for a fairer comparison.

实现细节。 我们在单一尺度的图像上执行训练和测试过程。查询图像的短边调整为 800 像素,长边不超过 1333 像素,同时保持纵横比。支持图像被调整为 256 × 256 的方形图像。我们采用 ResNet-101 [10] 作为特征提取器,RoI Align [8] 作为 RoI 特征提取器。主干网络的权重在 ImageNet [2] 上进行了预训练。在基类上训练后,只有最后一个全连接层(用于分类)被移除,并替换为一个随机初始化的新层。值得注意的是,在第二阶段(元微调)中,模型的所有部分都参与学习过程,没有任何冻结操作。我们使用 2 个 GPU、以 4 的 mini-batch 大小训练模型。我们使用动量为 0.9、权重衰减为 0.0001 的 SGD 优化器。对于 PASCAL VOC 的元训练,模型分别以 0.005、0.0005 和 0.00005 的学习率训练 240k、8k 和 4k 次迭代。对于 PASCAL VOC 的元微调,模型分别以 0.005、0.0005 和 0.00005 的学习率训练 1300、400 和 300 次迭代。对于 MS COCO 数据集,在元训练期间,模型分别以 0.005、0.0005 和 0.00005 的学习率训练 56k、14k 和 10k 次迭代;在元微调期间,10-shot 微调训练 2800、700 和 500 次迭代,30-shot 微调训练 5600、1400 和 1000 次迭代。

基线方法。 由于我们采用 Faster R-CNN 作为基础检测器,因此我们选择 Meta R-CNN [35] 作为基线方法。此外,我们自行实现了该方法,以便进行更公平的比较。

Word
the aspect ratio:纵横比
all parts of:的所有部分
participate :v. 参与, 参加
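The optimizer settings quoted above map directly onto a standard SGD plus step-decay configuration. A sketch assuming the PASCAL VOC meta-training schedule (240k/8k/4k iterations), with a placeholder module standing in for the detector:

```python
import torch

model = torch.nn.Linear(8, 8)  # placeholder standing in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=0.0001)
# 0.005 for the first 240k iterations, 0.0005 for the next 8k,
# then 0.00005 for the final 4k iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[240_000, 248_000], gamma=0.1)
```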

4.2 Experiments on PASCAL VOC PASCAL VOC实验

In this section, we conduct experiments on the PASCAL VOC dataset. We first compare our method with the state-of-the-art methods. Then we carry out ablation studies to perform a comprehensive analysis of the components of our proposed DCNet. Finally, some qualitative results are presented to provide an intuitive view of the validity of our method. For all experiments, we run 10 trials with random support data and report the averaged performance.

在本节中,我们在 PASCAL VOC 数据集上进行实验。我们首先将我们的方法与最先进的方法进行比较。然后我们进行消融研究,对我们提出的 DCNet 的组件进行综合分析。最后,提出了一些定性结果,以提供我们方法有效性的直观视图。对于所有实验,我们使用随机支持数据运行 10 次试验并报告平均性能。

Word
conduct :v. 实施, 执行, 表现, 引导 n. 举止, 管理方法, 经营方式, 实施办法
qualitative results:定性结果
intuitive view:直观的看法
validity :有效性 真实性, 效力, 有效

4.2.1 Comparisons with State-of-the-art Methods 与最先进方法的比较

In Table 1, we compare our method with former state-of-the-art methods, which mostly report results over multiple random runs. Our proposed DCNet achieves state-of-the-art results on almost all splits with different shot numbers and outperforms previous methods by a large margin. Specifically, in extremely low-shot settings (i.e. 1-shot), our method outperforms others by about 10% on splits 1 and 3, providing convincing proof that our DCNet is able to capture local detailed information to overcome the variations brought by the randomly sampled training shots.

在表 1 中,我们将我们的方法与以前最先进的方法进行了比较,这些方法大多报告多次随机运行的结果。我们提出的 DCNet 在几乎所有拆分、所有样本数设置下都取得了最先进的结果,并以很大的优势超过了以前的方法。具体来说,在极低样本设置(即 1-shot)中,我们的方法在拆分 1 和拆分 3 上比其他方法高出约 10%,有力地证明了 DCNet 能够捕捉局部细节信息,以克服随机采样训练样本带来的变化。

Word
convincing :形容词: 使人信服
by a large margin:大幅度
proof :n. 证明, 证据, 检验, 证实 adj. 能抵御, 能防范, 可防护, 防…的 v. 给(织物等)做防护处理, 使防水(或防火等), 印…的校样

Table 1. Few-shot object detection performance on VOC 2007 test set of PASCAL VOC dataset. We report the mAP with IoU threshold 0.5 (AP50) under three different splits for five novel classes. * denotes the results averaged over multiple random runs.
表 1. PASCAL VOC 数据集的 VOC 2007 测试集上的小样本目标检测性能。我们报告了在五个新类别的三个不同分割下 IoU 阈值为 0.5 (AP50) 的 mAP。 * 表示多次随机运行的平均结果。

4.2.2 Ablation Study 消融研究

We present results of comprehensive ablation studies to analyze the effectiveness of the various components of the proposed DCNet. All ablation studies are conducted on the PASCAL VOC 2007 test set with the first novel split. All results are averaged over 10 random runs.

我们展示了综合消融研究的结果,以分析所提出的 DCNet 各个组件的有效性。所有消融研究都在 PASCAL VOC 2007 测试集上使用第一种新类拆分(split 1)进行。所有结果都是 10 次随机运行的平均值。

Impact of the dense relation distillation module. We conduct experiments to validate the superiority of the proposed dense relation distillation (DRD) module. Specifically, we implement the baseline meta-learning based few-shot detection method, Meta R-CNN, with class-specific prediction for the final box classification and regression, while the DRD module requires no extra class-specific processing. As shown in lines 1 and 2 of Table 2, where DCNet w/o CFA equals Faster R-CNN equipped with the DRD module, our proposed DRD module achieves consistent improvement on all novel splits with all shot numbers, which effectively demonstrates the supremacy of the relation distillation mechanism over the baseline method. Moreover, the improvement over the baseline is significant when the shot number is low, which proves that the DRD module successfully exploits useful information from limited support data.

密集关系蒸馏模块的影响。 我们进行实验以验证所提出的密集关系蒸馏(DRD)模块的优越性。具体来说,我们实现了基于元学习的小样本检测基线方法 Meta R-CNN,它对最终的框分类和回归进行类别特定的预测,而 DRD 模块不需要额外的类别特定处理。如表 2 的第 1 行和第 2 行所示(DCNet w/o CFA 相当于配备 DRD 模块的 Faster R-CNN),我们提出的 DRD 模块在所有新类拆分、所有样本数设置下都取得了一致的改进,有效证明了关系蒸馏机制相对基线方法的优越性。此外,当样本数较低时,相对基线的改进尤为显著,这证明 DRD 模块成功地利用了有限支持数据中的有用信息。

Word
validate :证实
the superiority of:的优越性
Specifically:adv. 具体来说, 明确地, 具体地, 特意
extra:额外的 额外, 剩余的, 补充
the supremacy of:至高无上

Table 2. Ablation study to evaluate the effectiveness of different components in our proposed method. The mAP with IoU threshold 0.5 (AP50) is reported. * denotes the CFA module with the attentive aggregation fashion. † denotes our implementation.
表 2. 评估我们提出的方法中不同组件有效性的消融研究。报告了 IoU 阈值为 0.5 (AP50) 的 mAP。 * 表示具有注意力聚合方式的 CFA 模块。 † 表示我们的实现。

Impact of the context-aware feature aggregation module. We carry out experiments to evaluate the validity of the proposed context-aware feature aggregation (CFA) module. Specifically, RoI features generated from the parallel branches are aggregated with a simple summation. From lines 1 and 3 of the table, with the introduction of the CFA module, Meta R-CNN achieves notable gains over the baseline. Since the CFA module aims to preserve detailed information in a scale-aware manner, different levels of detailed features can be retrieved to assist the prediction process.

上下文感知特征聚合模块的影响。 我们进行了实验来评估所提出的上下文感知特征聚合(CFA)模块的有效性。具体来说,从并行分支生成的 RoI 特征通过简单的求和进行聚合。从表的第 1 行和第 3 行来看,随着 CFA 模块的引入,Meta R-CNN 相对基线取得了显著的提升。由于 CFA 模块的目标是以尺度感知的方式保留详细信息,因此可以检索不同层次的细节特征以帮助预测过程。

Word
carry out :. 实行, 开展, 完成
notable :n. 名人, 重要人物 adj. 值得注意的, 显著的, 重要的
assist :帮助 协助, 辅助, 辅佐

Impact of different RoI pooling resolutions. To further evaluate the impact of different RoI pooling resolutions, we perform explicit experiments to show the detailed performance. As shown in Table 3, solely adopting a larger pooling resolution yields better performance. However, the best performance is obtained only when aggregating the features generated with all three resolutions.

不同 RoI 池化分辨率的影响。 为了进一步评估不同 RoI 池化分辨率的影响,我们进行了明确的实验以显示详细的性能。如表 3 所示,单独采用更大的池化分辨率可以获得更好的性能。但是,只有聚合所有三种分辨率生成的特征时,才能获得最佳性能。

Word
explicit :adj. 清楚明白的, 易于理解的, 明确的, 直言的
solely :adv. 只, 仅, 唯, 单独地
yield :n. 产量, 产出, 利润 v. 屈服, 让步, 放弃, 提供

Table 3. The impact of different RoI pooling resolutions. The experiments are conducted on the VOC 2007 test set of the PASCAL VOC dataset with novel split 1; AP50 on the 10-shot task, averaged over 10 random runs, is reported.
表 3. 不同 RoI 池化分辨率的影响。实验在 PASCAL VOC 数据集的 VOC 2007 测试集上使用新类拆分 1 进行,报告 10-shot 任务上 10 次随机运行平均的 AP50。

Impact of the attentive aggregation fashion for the CFA module. Based on the plain CFA module, we further propose an attention-based aggregation mechanism to adaptively fuse different RoI features. As presented in lines 3 and 4 of Table 2, the attention aggregation mechanism can further boost the performance of the model, endowing the plain CFA module with a more comprehensive feature representation and effectively balancing the contributions of the extracted features. Finally, with the combination of the DRD module and the CFA module, we present DCNet, which achieves the best performance according to Table 2.

注意力聚合方式对 CFA 模块的影响。 基于普通的 CFA 模块,我们进一步提出了一种基于注意力的聚合机制,以自适应地融合不同的 RoI 特征。如表 2 的第 3 行和第 4 行所示,注意力聚合机制可以进一步提升模型的性能,使普通 CFA 模块获得更全面的特征表示,有效平衡各提取特征的贡献。最后,结合 DRD 模块和 CFA 模块,我们提出了 DCNet,根据表 2,它取得了最佳性能。

4.2.3 Qualitative Results 定性结果

To further comprehend the effect of the dense relation distillation (DRD) module, we visualize features before and after the DRD module. As shown in Fig. 5 (a), after relation distillation, query features can be activated to facilitate the subsequent detection procedure. Moreover, different from former meta-learning based methods which perform prediction in a class-wise manner, our proposed DRD module can model relations between query and support features of all classes at the same time, as shown in the second row of Fig. 5 (a). The DRD module enables the model to focus more on the query objects under the guidance of support information. Additionally, we also visualize the effect of the CFA module in Fig. 5 (b). With a relatively large or small query object as input, DCNet w/o CFA suffers from false classifications or missed detections, while the introduction of the CFA module effectively resolves this issue.

为了进一步理解密集关系蒸馏(DRD)模块的效果,我们将 DRD 模块前后的特征可视化。如图 5 (a) 所示,在关系蒸馏之后,查询特征可以被激活,以促进后续的检测过程。此外,与以前按类别进行预测的基于元学习的方法不同,我们提出的 DRD 模块可以同时建模查询特征与所有类别支持特征之间的关系,如图 5 (a) 第二行所示。DRD 模块使模型在支持信息的指导下更加关注查询对象。此外,我们还在图 5 (b) 中可视化了 CFA 模块的效果。以相对较大或较小的查询对象作为输入时,DCNet w/o CFA 会出现误分类或漏检,而引入 CFA 模块可以有效解决这个问题。

Word
facilitate :v. 促进, 促使, 使便利
subsequent :adj. 随后的, 后来的, 之后的, 接后的

4.3. Experiments on MS COCO MS COCO 上的实验

We evaluate the 10/30-shot setups on the MS COCO benchmark and report the averaged performance with the standard COCO metrics over 10 runs with random shots. The results on novel classes can be seen in Table 4. Despite the challenging nature of the COCO dataset with its large number of categories, our proposed DCNet achieves state-of-the-art performance on most of the metrics.

我们在 MS COCO 基准上评估 10/30-shot 设置,并报告使用标准 COCO 指标在 10 次随机运行上的平均性能。新类别的结果见表 4。尽管 COCO 数据集类别众多、极具挑战性,我们提出的 DCNet 仍在大多数指标上实现了最先进的性能。

Table 4. Few-shot object detection performance on the COCO minival set of the MS COCO dataset. We report the mean Average Precision and mean Average Recall on the 20 novel classes of COCO. * denotes the results averaged over multiple random runs.
表 4. MS COCO 数据集的 COCO minival 上的小样本目标检测性能。我们报告了 COCO 的 20 个新类上的平均精度均值和平均召回率均值。* 表示多次随机运行的平均结果。

Figure 5. (a). Visualizations of features before and after dense relation distillation module. (b). Visualizations of effect of context-aware feature aggregation module.
图 5. (a)。密集关系蒸馏模块前后特征的可视化。 (b)。上下文感知特征聚合模块效果的可视化。

5. Conclusions 结论

In this paper, we have presented the Dense Relation Distillation Network with Context-aware Aggregation (DCNet) to tackle the few-shot object detection problem. The dense relation distillation module adopts a dense matching strategy between query and support features to fully exploit support information. Furthermore, the context-aware feature aggregation module adaptively harnesses features from different scales to produce a more comprehensive feature representation. The ablation experiments demonstrate the effectiveness of each component of DCNet. Our proposed DCNet achieves state-of-the-art results on two benchmark datasets, i.e. PASCAL VOC and MS COCO.

在本文中,我们提出了带有上下文感知聚合的密集关系蒸馏网络(DCNet)来解决小样本目标检测问题。密集关系蒸馏模块采用查询和支持特征之间的密集匹配策略,以充分利用支持信息。此外,上下文感知特征聚合模块自适应地利用不同尺度的特征,以产生更全面的特征表示。消融实验证明了 DCNet 每个组件的有效性。我们提出的 DCNet 在 PASCAL VOC 和 MS COCO 两个基准数据集上取得了最先进的结果。
