《A Model of Saliency-based Visual Attention for Rapid Scene Analysis》翻译和笔记

原文链接:A Model of Saliency-based Visual Attention for Rapid Scene Analysis

以机翻为主,人工校对。

摘要

A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.

受灵长类动物早期视觉系统的行为和神经结构启发,本文提出了一种视觉注意力系统:将多尺度的图像特征组合成单一的地形式显著图,再由动态神经网络按照显著性递减的顺序依次选择被关注的位置。该系统以高效的计算方式快速选出需要详细分析的显著位置,从而分解了复杂的场景理解问题。

I. 前言

Primates have a remarkable ability to interpret complex scenes in real time, despite the limited speed of the neuronal hardware available for such tasks. Intermediate and higher visual processes appear to select a subset of the available sensory information before further processing [1], most likely to reduce the complexity of scene analysis [2]. This selection appears to be implemented in the form of a spatially circumscribed region of the visual field, the so-called "focus of attention", which scans the scene both in a rapid, bottom-up, saliency-driven and task-independent manner as well as in a slower, top-down, volition-controlled and task-dependent manner [2].

尽管可用于此类任务的神经元硬件速度有限,灵长类动物仍具有出色的实时解释复杂场景的能力。中级和高级视觉过程似乎会在进一步处理之前选择可用感官信息的一个子集[1],其目的很可能是降低场景分析的复杂性[2]。这种选择似乎是以视野中一块空间上受限的区域(即所谓的"注意焦点")的形式实现的:该区域既能以快速、自下而上、显著性驱动且与任务无关的方式扫描场景,也能以较慢、自上而下、受意志控制且依赖任务的方式扫描场景[2]。

Models of attention include “dynamic routing” models, in which information from only a small region of the visual field can progress through the cortical visual hierarchy. The attended region is selected through dynamic modifications of cortical connectivity, or through the establishment of specific temporal patterns of activity, under both top-down (task-dependent) and bottom-up (scene-dependent) control [3], [2], [1].

注意力模型包括"动态路由"模型:在这类模型中,只有来自视野中一小块区域的信息才能在皮层视觉层级中继续向上传递。被关注的区域通过动态修改皮层连接,或通过建立特定的活动时间模式来选择,这一过程同时受自上而下(依赖任务)和自下而上(依赖场景)的控制[3],[2],[1]。

The model proposed here (Fig. 1) builds on a second biologically-plausible architecture, proposed by Koch and Ullman [4] and at the basis of several models [5], [6]. It is related to the so-called "feature integration theory", proposed to explain human visual search strategies [7]. Visual input is first decomposed into a set of topographic feature maps. Different spatial locations then compete for saliency within each map, such that only locations which locally stand out from their surround can persist. All feature maps feed, in a purely bottom-up manner, into a master "saliency map", which topographically codes for local conspicuity over the entire visual scene. In primates, such a map is believed to be located in the posterior parietal cortex [8] as well as in the various visual maps in the pulvinar nuclei of the thalamus [9]. The model's saliency map is endowed with internal dynamics which generate attentional shifts. This model consequently represents a complete account for bottom-up saliency, and does not require any top-down guidance to shift attention. This framework provides a massively parallel method for the fast selection of a small number of interesting image locations to be analyzed by more complex and time-consuming object recognition processes. Extending this approach, in "guided search" feedback from higher cortical areas (e.g., knowledge about targets to be found) was used to weight the importance of different features [10], such that only those with high weights could reach higher processing levels.

图1 模型的大致结构

这里提出的模型(图1)建立在第二种生物学上合理的体系结构之上,该结构由Koch和Ullman提出[4],也是若干后续模型[5],[6]的基础。它与为解释人类视觉搜索策略而提出的所谓"特征整合理论"相关[7]。视觉输入首先被分解为一组地形式特征图。随后,不同的空间位置在每张特征图内部竞争显著性,只有在局部明显突出于周围环境的位置才能保留下来。所有特征图以纯自下而上的方式汇入一张主"显著图",该图以地形方式编码整个视觉场景中的局部显眼程度。在灵长类动物中,这样的图被认为位于后顶叶皮层[8]以及丘脑枕核的各种视觉图中[9]。模型的显著图具有内部动力学,可以产生注意力的转移。因此,该模型完整刻画了自下而上的显著性,不需要任何自上而下的引导即可转移注意力。该框架提供了一种大规模并行的方法,可以快速选出少量值得关注的图像位置,再交由更复杂、更耗时的物体识别过程做详细分析。作为对这一方法的扩展,"引导搜索"利用来自更高皮层区域的反馈(例如关于待寻找目标的知识)来加权不同特征的重要性[10],使得只有权重较高的特征才能到达更高的处理层级。

II. 模型

Input is provided in the form of static color images, usually digitized at $640 \times 480$ resolution. Nine spatial scales are created using dyadic Gaussian pyramids [11], which progressively lowpass filter and subsample the input image, yielding horizontal and vertical image reduction factors ranging from 1:1 (scale 0) to 1:256 (scale 8) in eight octaves.

输入为静态彩色图像,通常以 $640 \times 480$ 的分辨率数字化。使用并矢高斯金字塔[11]构建9个空间尺度:对输入图像逐级进行低通滤波和下采样,在8个倍频程内得到从1:1(尺度0)到1:256(尺度8)的水平与垂直图像缩减因子。
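
下面给出一个构建并矢高斯金字塔的最小示意代码(Python + OpenCV;函数名 `gaussian_pyramid` 为本文自拟,`cv2.pyrDown` 默认先做高斯平滑再做 2 倍下采样,与上文"逐级低通滤波并下采样"的描述一致;这只是帮助理解的草图,并非论文的原始实现):

```python
import cv2
import numpy as np

def gaussian_pyramid(img, num_levels=9):
    """并矢高斯金字塔:每一层先低通滤波再 2 倍下采样。
    返回列表 pyr,pyr[0] 为原图(尺度0),pyr[8] 为约 1:256 的缩小图(尺度8)。"""
    pyr = [img.astype(np.float32)]
    for _ in range(num_levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))   # 高斯平滑 + 下采样 1/2
    return pyr

# 用法示例(假设有一张 640x480 的输入图像):
# img = cv2.imread("scene.png").astype(np.float32) / 255.0
# pyr = gaussian_pyramid(img)   # len(pyr) == 9
```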

Each feature is computed by a set of linear "center-surround" operations akin to visual receptive fields (Fig. 1): Typical visual neurons are most sensitive in a small region of the visual space (the center), while stimuli presented in a broader, weaker antagonistic region concentric with the center (the surround) inhibit the neuronal response. Such architecture, sensitive to local spatial discontinuities, is particularly well-suited to detecting locations which locally stand out from their surround, and is a general computational principle in the retina, lateral geniculate nucleus and primary visual cortex [12]. Center-surround is implemented in the model as the difference between fine and coarse scales: The center is a pixel at scale $c \in \{2,3,4\}$, and the surround is the corresponding pixel at scale $s = c + \delta$, with $\delta \in \{3,4\}$. Across-scale difference between two maps, denoted $\ominus$ below, is obtained by interpolation to the finer scale and point-by-point subtraction. Using several scales not only for $c$, but also for $\delta = s - c$, yields truly multiscale feature extraction, by including different size ratios between the center and surround regions (contrary to previously used fixed ratios [5]).

每个特征都通过一组类似于视觉感受野的线性"中心-周围"运算来计算(图1):典型的视觉神经元在视觉空间的一小块区域(中心)最为敏感,而落在与中心同心、范围更大且作用较弱的拮抗区域(周围)中的刺激则会抑制神经元的响应。这种对局部空间不连续性敏感的结构特别适合检测在局部明显突出于周围环境的位置,也是视网膜、外侧膝状体和初级视觉皮层中的一条通用计算原理[12]。中心-周围在模型中实现为细尺度与粗尺度之间的差:中心是尺度 $c \in \{2,3,4\}$ 上的一个像素,周围是尺度 $s = c + \delta$(其中 $\delta \in \{3,4\}$)上的对应像素。两张图之间的跨尺度差(下文记作 $\ominus$)通过把粗尺度图插值到细尺度并逐点相减得到。不仅对 $c$、也对 $\delta = s - c$ 使用多个取值,从而在中心区域与周围区域之间包含不同的尺寸比(与以前使用的固定比例[5]相反),实现了真正的多尺度特征提取。

A. Extraction of early visual features 提取早期视觉特征

With $r$, $g$ and $b$ being the red, green and blue channels of the input image, an intensity image $I$ is obtained as $I = (r+g+b)/3$. $I$ is used to create a Gaussian pyramid $I(\sigma)$, where $\sigma \in [0..8]$ is the scale. The $r$, $g$ and $b$ channels are normalized by $I$ in order to decouple hue from intensity. However, because hue variations are not perceivable at very low luminance (and hence are not salient), normalization is only applied at the locations where $I$ is larger than 1/10 of its maximum over the entire image (other locations yield zero $r$, $g$ and $b$). Four broadly tuned color channels are created: $R = r - (g+b)/2$ for red, $G = g - (r+b)/2$ for green, $B = b - (r+g)/2$ for blue, and $Y = (r+g)/2 - |r-g|/2 - b$ for yellow (negative values are set to zero). Four Gaussian pyramids $R(\sigma)$, $G(\sigma)$, $B(\sigma)$ and $Y(\sigma)$ are created from these color channels.

r r r g g g b b b 分别表示输入图像的红色,绿色和蓝色通道,获得的亮度图像为 I = ( r + g + b ) / 3 I =(r + g + b)/3 I=(r+g+b)/3 I I I 用于创建高斯金字塔 I ( σ ) I(\sigma) I(σ),其中 σ ∈ [ 0 , . . , 8 ] \sigma \in [0,..,8] σ[0,..,8]是尺度因子。 r r r g g g b b b 通道通过 I I I 进行归一化,以使色调与亮度脱钩。 但是,由于在非常低的亮度下无法感知色相变化(因此也不显着),因此仅在 I I I 大于整个图像最大值的1/10的位置进行归一化(其他位置产生 r r r g g g b b b)。 创建了四个扩展颜色通道: R = r − ( g + b ) / 2 R = r-(g + b)/ 2 R=r(g+b)/2 表示红色, G = g − ( r + b ) / 2 G = g-(r + b)/2 G=g(r+b)/2 表示绿色, B = b − ( r + g ) / 2 B=b-(r + g)/2 B=b(r+g)/2 代表蓝色, Y = ( r + g ) / 2 − ∣ r − g ∣ / 2 − b Y=(r + g)/2-| r-g |/2-b Y=(r+g)/2rg/2b 代表黄色(负值设置为零)。 四个高斯金字塔 R ( σ ) R(\sigma) R(σ) G ( σ ) G(\sigma) G(σ) B ( σ ) B(\sigma) B(σ) R ( σ ) R(\sigma) R(σ) 从这些颜色通道创建。

Center-surround differences (defined previously) between a "center" fine scale $c$ and a "surround" coarser scale $s$ yield the feature maps. The first set of feature maps is concerned with intensity contrast, which in mammals is detected by neurons sensitive either to dark centers on bright surrounds, or to bright centers on dark surrounds [12]. Here, both types of sensitivities are simultaneously computed (using a rectification) in a set of six maps $\mathcal{I}(c,s)$, with $c \in \{2,3,4\}$ and $s = c + \delta$, $\delta \in \{3,4\}$:

特征图由"中心"细尺度 $c$ 与"周围"粗尺度 $s$ 之间的中心-周围差(即前文定义的 $\ominus$)生成。第一组特征图与亮度对比度有关:在哺乳动物中,亮度对比度由对"亮背景上的暗中心"或"暗背景上的亮中心"敏感的神经元检测[12]。这里,两类敏感性在一组六张图 $\mathcal{I}(c,s)$ 中同时计算(使用取绝对值的整流),其中 $c \in \{2,3,4\}$,$s = c + \delta$,$\delta \in \{3,4\}$(公式1):
$$\mathcal{I}(c,s)=\left|I(c)\ominus I(s)\right| \tag{1}$$
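
跨尺度差 $\ominus$ 与公式(1)的示意实现如下(`across_scale_diff`、`intensity_feature_maps` 为自拟函数名,`I_pyr` 假定是前文示意的 `gaussian_pyramid` 的输出;论文未规定具体插值核,这里取双线性插值,属本文假设):

```python
import cv2
import numpy as np

def across_scale_diff(center_map, surround_map):
    """跨尺度差 ⊖:把粗尺度图插值到细尺度大小,再逐点相减取绝对值。"""
    h, w = center_map.shape[:2]
    surround_up = cv2.resize(surround_map, (w, h), interpolation=cv2.INTER_LINEAR)
    return np.abs(center_map - surround_up)

def intensity_feature_maps(I_pyr):
    """公式(1):6 张亮度特征图 I(c, s),c ∈ {2,3,4},s = c + δ,δ ∈ {3,4}。"""
    maps = {}
    for c in (2, 3, 4):
        for delta in (3, 4):
            s = c + delta
            maps[(c, s)] = across_scale_diff(I_pyr[c], I_pyr[s])
    return maps
```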

A second set of maps is similarly constructed for the color channels, which in cortex are represented using a so-called "color double-opponent" system: In the center of their receptive field, neurons are excited by one color (e.g., red) and inhibited by another (e.g., green), while the converse is true in the surround. Such spatial and chromatic opponency exists for the red/green, green/red, blue/yellow and yellow/blue color pairs in human primary visual cortex [13]. Accordingly, maps $\mathcal{RG}(c,s)$ are created in the model to simultaneously account for red/green and green/red double opponency (Eq. 2), and $\mathcal{BY}(c,s)$ for blue/yellow and yellow/blue double opponency (Eq. 3):

类似地,为颜色通道构建第二组特征图;在皮层中,颜色用所谓的"颜色双拮抗"系统表示:在感受野中心,神经元被一种颜色(例如红色)激发、被另一种颜色(例如绿色)抑制,而在周围区域则正好相反。在人类初级视觉皮层中,红/绿、绿/红、蓝/黄和黄/蓝颜色对都存在这样的空间与色彩拮抗[13]。因此,模型中创建了 $\mathcal{RG}(c,s)$ 图来同时刻画红/绿和绿/红双拮抗(公式2),以及 $\mathcal{BY}(c,s)$ 图来刻画蓝/黄和黄/蓝双拮抗(公式3):

$$\mathcal{RG}(c,s)=\left|\left(R(c)-G(c)\right)\ominus\left(R(s)-G(s)\right)\right| \tag{2}$$

$$\mathcal{BY}(c,s)=\left|\left(B(c)-Y(c)\right)\ominus\left(B(s)-Y(s)\right)\right| \tag{3}$$
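
公式(2)(3)的颜色拮抗特征图可以沿用前面的 `across_scale_diff` 写成如下示意代码(函数名为自拟,各参数为对应颜色通道的高斯金字塔):

```python
def color_feature_maps(R_pyr, G_pyr, B_pyr, Y_pyr):
    """公式(2)(3):红/绿与蓝/黄双拮抗特征图,各 6 张,共 12 张。"""
    rg_maps, by_maps = {}, {}
    for c in (2, 3, 4):
        for delta in (3, 4):
            s = c + delta
            rg_maps[(c, s)] = across_scale_diff(R_pyr[c] - G_pyr[c],
                                                R_pyr[s] - G_pyr[s])
            by_maps[(c, s)] = across_scale_diff(B_pyr[c] - Y_pyr[c],
                                                B_pyr[s] - Y_pyr[s])
    return rg_maps, by_maps
```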

Local orientation information is obtained from $I$ using oriented Gabor pyramids $O(\sigma,\theta)$, where $\sigma \in [0..8]$ represents the scale and $\theta \in \{0\degree, 45\degree, 90\degree, 135\degree\}$ is the preferred orientation [11]. (Gabor filters, which are the product of a cosine grating and a 2D Gaussian envelope, approximate the receptive field sensitivity profile (impulse response) of orientation-selective neurons in primary visual cortex [12].) Orientation feature maps, $O(c,s,\theta)$, encode, as a group, local orientation contrast between the center and surround scales:

局部方向信息由 $I$ 通过带方向的 Gabor 金字塔 $O(\sigma,\theta)$ 获得,其中 $\sigma \in [0..8]$ 表示尺度,$\theta \in \{0\degree, 45\degree, 90\degree, 135\degree\}$ 为偏好方向[11]。(Gabor 滤波器是余弦光栅与二维高斯包络的乘积,近似初级视觉皮层中方向选择性神经元的感受野敏感度分布(脉冲响应)[12]。)方向特征图 $O(c,s,\theta)$ 作为一组,编码中心尺度与周围尺度之间的局部方向对比(公式4):
$$O(c,s,\theta)=\left|O(c,\theta)\ominus O(s,\theta)\right| \tag{4}$$
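
公式(4)的一个简化示意如下。注意:论文使用的是文献[11]的 Gabor 金字塔,这里只是用 `cv2.getGaborKernel` 在亮度金字塔的每一层上做卷积来近似 $O(\sigma,\theta)$,核大小与滤波器参数均为本文假设,不代表原实现:

```python
import cv2
import numpy as np

def orientation_feature_maps(I_pyr, thetas=(0, 45, 90, 135)):
    """公式(4)的近似:先对每层做 Gabor 滤波得到 O(σ, θ),
    再做跨尺度差得到 O(c, s, θ),共 6 × 4 = 24 张。"""
    gabor_pyrs = {}
    for theta in thetas:
        kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0,
                                    theta=np.deg2rad(theta),
                                    lambd=5.0, gamma=1.0, psi=0.0)
        gabor_pyrs[theta] = [np.abs(cv2.filter2D(level, cv2.CV_32F, kernel))
                             for level in I_pyr]

    maps = {}
    for theta in thetas:
        for c in (2, 3, 4):
            for delta in (3, 4):
                s = c + delta
                maps[(c, s, theta)] = across_scale_diff(gabor_pyrs[theta][c],
                                                        gabor_pyrs[theta][s])
    return maps
```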

In total, 42 feature maps are computed: Six for intensity, 12 for color and 24 for orientation.

总共需要计算42个特征图:亮度6张(3个中心尺度 × 2个尺度差)、颜色12张(红/绿与蓝/黄各6张)、方向24张(4个方向各6张)。

B. The Saliency Map 把特征图组合为显著图

The purpose of the saliency map is to represent the conspicuity - or "saliency" - at every location in the visual field by a scalar quantity, and to guide the selection of attended locations, based on the spatial distribution of saliency. A combination of the feature maps provides bottom-up input to the saliency map, modeled as a dynamical neural network.

显著图的目的是用一个标量来表示视野中每个位置的显眼程度(即"显著性"),并基于显著性的空间分布来引导被关注位置的选择。特征图的组合为显著图提供自下而上的输入;显著图本身被建模为一个动态神经网络。

One difficulty in combining different feature maps is that they represent a priori not comparable modalities, with different dynamic ranges and extraction mechanisms. Also, because all 42 feature maps are combined, salient objects appearing strongly in only a few maps may be masked by noise or less salient objects present in a larger number of maps.

组合不同特征图的一个困难在于,它们代表的是先验上无法直接比较的模态,具有不同的动态范围和提取机制。此外,由于要合并全部42张特征图,只在少数几张图中强烈出现的显著目标,可能会被大量其他图中的噪声或显著性较低的目标所掩盖。

In the absence of top-down supervision, we propose a map normalization operator, $\mathcal{N}(\cdot)$, which globally promotes maps in which a small number of strong peaks of activity (conspicuous locations) is present, while globally suppressing maps which contain numerous comparable peak responses. $\mathcal{N}(\cdot)$ consists of (Fig. 2): 1) Normalizing the values in the map to a fixed range $[0..M]$, in order to eliminate modality-dependent amplitude differences; 2) finding the location of the map's global maximum $M$ and computing the average $\bar m$ of all its other local maxima; 3) globally multiplying the map by $(M - \bar m)^2$.

在缺乏自上而下监督的情况下,我们提出一种特征图归一化算子 $\mathcal{N}(\cdot)$:在全局上增强只含少量强激活峰(显眼位置)的特征图,同时在全局上抑制含有大量大小相近峰值响应的特征图。$\mathcal{N}(\cdot)$ 包含以下步骤(图2):1)将图中的值归一化到固定范围 $[0..M]$,以消除依赖于模态的幅值差异;2)找到图的全局最大值 $M$ 的位置,并计算其余所有局部极大值的平均值 $\bar m$;3)将整张图乘以 $(M-\bar m)^2$。

图2 归一化操作

Only local maxima of activity are considered such that $\mathcal{N}(\cdot)$ compares responses associated with meaningful "activation spots" in the map and ignores homogeneous areas. Comparing the maximum activity in the entire map to the average over all activation spots measures how different the most active location is from the average. When this difference is large, the most active location stands out, and we strongly promote the map. When the difference is small, the map contains nothing unique and is suppressed. The biological motivation behind the design of $\mathcal{N}(\cdot)$ is that it coarsely replicates cortical lateral inhibition mechanisms, in which neighboring similar features inhibit each other via specific, anatomically-defined connections [15].

只考虑活动的局部极大值,使得 $\mathcal{N}(\cdot)$ 只比较图中有意义的"激活点"所对应的响应,而忽略均匀区域。把整张图中的最大活动值与所有激活点的平均值相比较,可以衡量最活跃位置与平均水平的差异:差异大时,最活跃的位置十分突出,我们就大力增强这张图;差异小时,这张图不含任何独特内容,便被抑制。$\mathcal{N}(\cdot)$ 设计背后的生物学动机是,它粗略复现了皮层的侧向抑制机制,即相邻的相似特征通过特定的、解剖学上确定的连接相互抑制[15]。
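
归一化算子 $\mathcal{N}(\cdot)$ 的一个示意实现如下(局部极大值用滑动窗口极大值来判定,窗口大小 `neighborhood` 与固定范围 $M=1$ 均为本文假设,论文未给出这些实现细节):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(fmap, M=1.0, neighborhood=7):
    """归一化算子 N(·) 的示意实现:
    1) 把取值缩放到 [0, M];2) 找全局最大值 M 与其余局部极大值的均值 m_bar;
    3) 整图乘以 (M - m_bar)^2。"""
    fmap = fmap - fmap.min()
    if fmap.max() > 0:
        fmap = fmap / fmap.max() * M                      # 步骤 1
    # 步骤 2:滑动窗口极大值找局部极大值点(忽略全零的均匀区域)
    local_max = (fmap == maximum_filter(fmap, size=neighborhood)) & (fmap > 0)
    peaks = fmap[local_max]
    global_max = fmap.max()
    others = peaks[peaks < global_max]
    m_bar = others.mean() if others.size > 0 else 0.0
    return fmap * (M - m_bar) ** 2                        # 步骤 3
```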

Feature maps are combined into three "conspicuity maps", $\bar{\mathcal{I}}$ for intensity (Eq. 5), $\bar{\mathcal{C}}$ for color (Eq. 6), and $\bar{\mathcal{O}}$ for orientation (Eq. 7), at the scale ($\sigma = 4$) of the saliency map. They are obtained through across-scale addition, $\oplus$, which consists of reduction of each map to scale 4 and point-by-point addition:

特征图在显著图的尺度($\sigma = 4$)上组合为三张"醒目图(conspicuity maps)":$\bar{\mathcal{I}}$ 代表亮度(公式5),$\bar{\mathcal{C}}$ 代表颜色(公式6),$\bar{\mathcal{O}}$ 代表方向(公式7)。它们通过跨尺度加法 $\oplus$ 获得:先将每张特征图缩放到尺度4,再逐点相加:

$$\bar{\mathcal{I}}=\oplus_{c=2}^{4}\oplus_{s=c+3}^{c+4}\mathcal{N}\left(\mathcal{I}(c,s)\right) \tag{5}$$

$$\bar{\mathcal{C}}=\oplus_{c=2}^{4}\oplus_{s=c+3}^{c+4}\left[\mathcal{N}\left(\mathcal{RG}(c,s)\right)+\mathcal{N}\left(\mathcal{BY}(c,s)\right)\right] \tag{6}$$

For orientation, four intermediary maps are first created by combination of the six feature maps for a given θ \theta θ, and are then combined into a single orientation conspicuity map:

对于方向,先对每个给定的 $\theta$ 将6张特征图组合成一张中间图(共4张中间图),再把它们组合成单独一张方向醒目图:

$$\bar{\mathcal{O}}=\sum_{\theta \in \{0\degree,45\degree,90\degree,135\degree\}}\mathcal{N}\left(\oplus_{c=2}^{4}\oplus_{s=c+3}^{c+4}\mathcal{N}\left(O(c,s,\theta)\right)\right) \tag{7}$$

The motivation for the creation of three separate channels, $\bar{\mathcal{I}}$, $\bar{\mathcal{C}}$ and $\bar{\mathcal{O}}$, and their individual normalization is the hypothesis that similar features compete strongly for saliency, while different modalities contribute independently to the saliency map. The three conspicuity maps are normalized and summed into the final input $\mathcal{S}$ to the saliency map:

创建三个独立通道 $\bar{\mathcal{I}}$、$\bar{\mathcal{C}}$、$\bar{\mathcal{O}}$ 并分别对其归一化的动机,是假设相似特征之间会强烈竞争显著性,而不同模态则各自独立地贡献于显著图。三张醒目图经归一化后求和,得到显著图的最终输入 $\mathcal{S}$(公式8):
$$\mathcal{S}=\frac{1}{3}\left(\mathcal{N}(\bar{\mathcal{I}})+\mathcal{N}(\bar{\mathcal{C}})+\mathcal{N}(\bar{\mathcal{O}})\right) \tag{8}$$
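
跨尺度加法 $\oplus$ 以及公式(5)-(8)的组合流程可以写成如下示意代码(沿用前文自拟的 `normalize_map` 等函数;仅用于说明数据流,并非论文原实现):

```python
import cv2
import numpy as np

def across_scale_add(maps_dict, pyr, target_scale=4):
    """跨尺度加法 ⊕:把每张特征图缩放到尺度 4 的大小后逐点相加。
    pyr 提供尺度 4 的目标尺寸;maps_dict 形如 {(c, s): 2D 数组}。"""
    h, w = pyr[target_scale].shape[:2]
    acc = np.zeros((h, w), dtype=np.float32)
    for m in maps_dict.values():
        acc += cv2.resize(m, (w, h), interpolation=cv2.INTER_LINEAR)
    return acc

def saliency_input(I_maps, RG_maps, BY_maps, O_maps, I_pyr):
    """公式(5)-(8):先分别归一化并跨尺度相加得到三张醒目图,
    再归一化求平均得到显著图输入 S。"""
    I_bar = across_scale_add({k: normalize_map(v) for k, v in I_maps.items()}, I_pyr)
    C_bar = across_scale_add({k: normalize_map(RG_maps[k]) + normalize_map(BY_maps[k])
                              for k in RG_maps}, I_pyr)
    thetas = sorted({k[2] for k in O_maps})
    O_bar = np.zeros_like(I_bar)
    for th in thetas:                       # 公式(7):每个方向先 ⊕ 再 N(·),最后求和
        per_theta = {k: normalize_map(v) for k, v in O_maps.items() if k[2] == th}
        O_bar += normalize_map(across_scale_add(per_theta, I_pyr))
    return (normalize_map(I_bar) + normalize_map(C_bar) + normalize_map(O_bar)) / 3.0
```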

At any given time, the maximum of the saliency map (SM) defines the most salient image location, to which the focus of attention (FOA) should be directed. We could now simply select the most active location as defining the point where the model should next attend to. However, in a neuronally-plausible implementation, we model the SM as a 2D layer of leaky integrate-and-fire neurons at scale 4. These model neurons consist of a single capacitance which integrates the charge delivered by synaptic input, of a leakage conductance, and of a voltage threshold. When threshold is reached, a prototypical spike is generated, and the capacitive charge is shunted to zero [14]. The SM feeds into a biologically-plausible 2D "winner-take-all" (WTA) neural network [4], [1] at scale $\sigma = 4$, in which synaptic interactions among units ensure that only the most active location remains, while all other locations are suppressed.

在任意给定时刻,显著图(SM)的最大值定义了最显著的图像位置,即注意焦点(FOA)应指向的位置。我们本可以简单地把最活跃的位置选为模型下一个关注点。但在一种更符合神经机制的实现中,我们把SM建模为尺度4上的一层二维泄漏积分-发放(leaky integrate-and-fire)神经元。每个模型神经元由一个积分突触输入电荷的电容、一个泄漏电导和一个电压阈值组成;达到阈值时产生一个典型的脉冲,电容上的电荷被分流清零[14]。SM馈入一个生物学上可行的二维"赢者通吃"(WTA)神经网络[4],[1](尺度 $\sigma = 4$),其中各单元之间的突触相互作用确保只有最活跃的位置得以保留,而其他所有位置均被抑制。

The neurons in the SM receive excitatory inputs from $\mathcal{S}$ and are all independent. The potential of SM neurons at more salient locations hence increases faster (these neurons are used as pure integrators and do not fire). Each SM neuron excites its corresponding WTA neuron. All WTA neurons also evolve independently of each other, until one (the "winner") first reaches threshold and fires. This triggers three simultaneous mechanisms (Fig. 3): 1) The FOA is shifted to the location of the winner neuron; 2) the global inhibition of the WTA is triggered and completely inhibits (resets) all WTA neurons; 3) local inhibition is transiently activated in the SM, in an area with the size and new location of the FOA; this not only yields dynamical shifts of the FOA, by allowing the next most salient location to subsequently become the winner, but it also prevents the FOA from immediately returning to a previously attended location. Such an "inhibition of return" has been demonstrated in human visual psychophysics [16]. In order to slightly bias the model to subsequently jump to salient locations spatially close to the currently attended location, a small excitation is transiently activated in the SM, in a near surround of the FOA ("proximity preference" rule of Koch and Ullman [4]).

SM中的神经元从 $\mathcal{S}$ 接收兴奋性输入,且彼此独立。因此,位于更显著位置的SM神经元电位上升更快(这些神经元只作纯积分器使用,不发放)。每个SM神经元都会激励其对应的WTA神经元。所有WTA神经元同样彼此独立地演化,直到其中一个("赢者")率先达到阈值并发放。这会同时触发三个机制(图3):1)FOA移动到赢者神经元所在的位置;2)触发WTA的全局抑制,完全抑制(复位)所有WTA神经元;3)在SM中,以FOA的新位置为中心、以FOA的大小为范围,瞬时激活局部抑制;这不仅允许下一个最显著的位置随后成为赢者、从而产生FOA的动态转移,还能防止FOA立即返回先前关注过的位置。这种"返回抑制"已在人类视觉心理物理学中得到证明[16]。为了让模型略微倾向于随后跳向空间上靠近当前关注位置的显著位置,还会在SM中FOA的近邻区域瞬时激活一个小的兴奋(Koch和Ullman的"邻近偏好"规则[4])。
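
下面用一个极简的"取最大值 + 返回抑制"循环来示意 SM/WTA 的选点行为。注意:真实模型是泄漏积分-发放神经元的连续时间动力学(时间常数等参数见[17]),这里只保留其产生的关注顺序;FOA 半径按下文"宽高较小者的六分之一"设定:

```python
import numpy as np

def attention_scan(S, num_fixations=5, foa_radius=None):
    """简化的注意扫描:反复取显著图最大值(赢者通吃),
    并对已关注的圆盘区域做返回抑制(示意实现)。"""
    S = S.astype(np.float32).copy()
    h, w = S.shape
    if foa_radius is None:
        foa_radius = min(h, w) // 6          # FOA 半径:宽高较小者的 1/6
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(num_fixations):
        y, x = np.unravel_index(np.argmax(S), S.shape)   # 最活跃位置胜出
        fixations.append((int(y), int(x)))
        inhibit = (yy - y) ** 2 + (xx - x) ** 2 <= foa_radius ** 2
        S[inhibit] = 0.0                     # 返回抑制:暂时压制已关注区域
    return fixations
```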

图3 模型处理自然图像的操作示例。并行特征提取产生三张醒目图:颜色 $\bar{\mathcal{C}}$、亮度 $\bar{\mathcal{I}}$ 和方向 $\bar{\mathcal{O}}$。它们组合后形成显著图(SM)的输入 $\mathcal{S}$。最显著的位置是橙色电话亭,它在 $\bar{\mathcal{C}}$ 中表现强烈,成为第一个被关注的位置(模拟时间92毫秒)。在返回抑制反馈于显著图中抑制该位置之后,依次选出下一个最显著的位置。

Since we do not model any top-down attentional component, the FOA is a simple disk whose radius is fixed to one sixth of the smaller of the input image width or height. The time constants, conductances, and firing thresholds of the simulated neurons were chosen (see ref. [17] for details) so that the FOA jumps from one salient location to the next in approximately 30-70 ms (simulated time), and that an attended area is inhibited for approximately 500-900 ms (Fig. 3), as has been observed psychophysically [16]. The difference in the relative magnitude of these delays proved sufficient to ensure thorough scanning of the image, and prevented cycling through only a limited number of locations. All parameters are fixed in our implementation [17], and the system proved stable in time for all images studied.

由于我们没有对任何自上而下的注意成分建模,FOA是一个简单的圆盘,其半径固定为输入图像宽和高中较小者的六分之一。模拟神经元的时间常数、电导和发放阈值经过选择(详见参考文献[17]),使FOA在大约30~70 ms(模拟时间)内从一个显著位置跳到下一个显著位置,并且被关注过的区域会被抑制大约500~900 ms(图3),这与心理物理学的观察一致[16]。这两类时延在相对大小上的差异足以保证对图像的彻底扫描,并避免只在有限几个位置之间循环。在我们的实现中所有参数都是固定的[17],并且对所研究的全部图像,系统随时间都保持稳定。

C. Comparison with spatial frequency content models 与空间频率内容模型的比较

Reinagel and Zador [18] recently used an eye-tracking device to analyze the local spatial frequency distributions along eye scan paths generated by humans while free-viewing grayscale images. They found the spatial frequency content at the fixated locations to be significantly higher than, on average, at random locations. Although eye trajectories can differ from attentional trajectories under volitional control [1], visual attention is often thought of as a pre-oculomotor mechanism, strongly influencing free-viewing. It was hence interesting to investigate whether our model would reproduce the findings of Reinagel and Zador.

Reinagel和Zador[18]最近使用眼动追踪设备,分析了人在自由观看灰度图像时沿眼动扫描路径的局部空间频率分布。他们发现,注视位置处的空间频率含量在平均意义上显著高于随机位置。尽管在意志控制下眼动轨迹可能与注意轨迹不同[1],但视觉注意通常被认为是一种眼动之前的机制,对自由观看有很强的影响。因此,考察我们的模型能否重现Reinagel和Zador的发现是一件很有意思的事。

We constructed a simple measure of spatial frequency content (SFC): At a given image location, a $16 \times 16$ image patch is extracted from each of the $I(2)$, $R(2)$, $G(2)$, $B(2)$ and $Y(2)$ maps, and 2D Fast Fourier Transforms (FFTs) are applied to the patches. For each patch, a threshold is applied to compute the number of non-negligible FFT coefficients; the threshold corresponds to the FFT amplitude of a just perceivable grating (1% contrast). The SFC measure is the average of the numbers of non-negligible coefficients in the five corresponding patches. The size and scale of the patches were chosen such that the SFC measure is sensitive to approximately the same frequency and resolution ranges as our model; also, our SFC measure is computed in the RGB channels as well as in intensity, like the model. Using this measure, an SFC map is created at scale 4 for comparison with the saliency map (Fig. 4).

图4 示例:彩色图像(a)、相应的显著图输入(b)、空间频率内容(SFC)图(c),以及显著图输入高于其最大值98%的位置(d,黄色圆圈)和SFC高于其最大值98%的图像块(d,红色方块)。显著图对噪声非常鲁棒,而SFC则不然。

我们构造了一个简单的空间频率含量(SFC)度量:在给定的图像位置,从 $I(2)$、$R(2)$、$G(2)$、$B(2)$、$Y(2)$ 每张图中各提取一个 $16\times 16$ 的图像块,并对这些图像块做二维快速傅里叶变换(FFT)。对每个图像块,用一个阈值统计不可忽略的FFT系数个数;该阈值对应于刚好可感知的光栅(1%对比度)的FFT幅值。SFC度量是五个对应图像块中不可忽略系数个数的平均值。图像块的大小和尺度经过选择,使SFC度量敏感的频率与分辨率范围与我们的模型大致相同;同样,与模型一样,我们的SFC度量既在RGB通道中计算,也在亮度中计算。利用该度量,在尺度4上生成一张SFC图,用于与显著图比较(图4)。
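
SFC 度量的一个示意实现如下。"刚好可感知光栅(1%对比度)"对应的 FFT 幅值阈值在论文中没有给出具体公式,这里用图像块 DC 分量(像素和)的 1% 作为粗略近似,属本文假设;函数名亦为自拟:

```python
import numpy as np

def sfc_at(patch16, contrast_threshold=0.01):
    """对一个 16x16 图像块统计"不可忽略的 FFT 系数个数"。
    阈值取 DC 分量(块内像素和)的 1%,粗略近似 1% 对比度光栅的幅值(本文假设)。"""
    F = np.fft.fft2(patch16)
    amp = np.abs(F)
    thresh = contrast_threshold * (np.abs(patch16).sum() + 1e-12)
    return int((amp > thresh).sum())

def sfc_measure(maps_at_scale2, y, x, half=8):
    """在给定位置 (y, x),对 I(2)、R(2)、G(2)、B(2)、Y(2) 各取一个 16x16 块,
    返回五个块上不可忽略系数个数的平均值(假定 (y, x) 距图像边界至少 8 像素)。"""
    counts = []
    for m in maps_at_scale2:                 # 形如 [I2, R2, G2, B2, Y2]
        patch = m[y - half:y + half, x - half:x + half]
        counts.append(sfc_at(patch))
    return float(np.mean(counts))
```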

III. 结果和讨论

Although the concept of a saliency map has been widely used in focus-of-attention models [1], [3], [7], little detail is usually provided about its construction and dynamics. Here we examine how the feedforward feature extraction stages, the map combination strategy, and the temporal properties of the saliency map all contribute to the overall system performance.

尽管显著图的概念已在注意焦点模型中被广泛使用[1],[3],[7],但关于其构建和动力学的细节通常很少给出。在这里,我们考察前馈特征提取阶段、特征图组合策略以及显著图的时间特性分别如何影响系统的整体性能。

A. General performance 一般性能

The model was extensively tested with artificial images to ensure proper functioning. For example, several objects of same shape but varying contrast with the background were attended to in order of decreasing contrast. The model proved very robust to the addition of noise to such images (Fig. 5), particularly if the properties of the noise (e.g., its color) were not directly conflicting with the main feature of the target.

图5 噪声对检测性能的影响。以 $768\times 512$ 的场景为例,其中目标(两个人)因其独特的颜色对比而显著。图中给出在找到目标之前错误检测次数的平均值 $\pm$ 标准误(对50次噪声采样)随噪声密度变化的函数。系统对不直接干扰目标主要特征的噪声非常鲁棒(左:亮度噪声与彩色目标)。当噪声具有与目标相似的特性时,它会削弱目标的显著性,系统会先去关注在其他特征(此处为亮度的粗尺度变化)上显著的对象。

该模型用人工合成图像进行了广泛测试,以确保其正常运行。例如,几个形状相同但与背景对比度不同的物体,会按对比度递减的顺序依次被关注。事实证明,向这类图像添加噪声时模型非常鲁棒(图5),尤其是当噪声的属性(例如其颜色)不直接与目标的主要特征冲突时。

The model was able to reproduce human performance for a number of pop-out tasks [7], using images of the type shown in Fig. 2. When a target differed from an array of surrounding distractors by its unique orientation (like in Fig. 2), color, intensity or size, it was always the first attended location, irrespectively of the number of distractors. Contrarily, when the target differed from the distractors only by a conjunction of features (e.g., it was the only red-horizontal bar in a mixed array of red-vertical and green-horizontal bars), the search time necessary to find the target increased linearly with the number of distractors. Both results have been widely observed in humans [7], and are discussed in Section III-B.

使用图2所示类型的图像,该模型能够重现人类在许多"弹出(pop-out)"任务[7]中的表现。当目标凭借独特的方向(如图2)、颜色、亮度或大小区别于周围的一组干扰物时,无论干扰物有多少,它总是第一个被关注的位置。相反,当目标仅凭特征的组合区别于干扰物时(例如,它是红色竖条和绿色横条混合阵列中唯一的红色横条),找到目标所需的搜索时间随干扰物数量线性增加。这两种结果都在人类实验中被广泛观察到[7],并将在第III-B节讨论。

We also tested the model with real images, ranging from natural outdoor scenes to artistic paintings, and using $\mathcal{N}(\cdot)$ to normalize the feature maps (Fig. 3 and ref. [17]). With many such images, it is difficult to objectively evaluate the model, because no objective reference is available for comparison, and observers may disagree on which locations are the most salient. However, in all images studied, most attended locations were objects of interest, such as faces, flags, persons, buildings or vehicles.

我们还用真实图像测试了模型,范围涵盖自然户外场景到艺术绘画,并使用 $\mathcal{N}(\cdot)$ 对特征图进行归一化(图3和参考文献[17])。对于许多这类图像,很难客观地评估模型,因为没有可供比较的客观参照,而且观察者对哪些位置最显著也可能意见不一。不过,在所研究的全部图像中,大多数被关注的位置都是感兴趣的对象,例如人脸、旗帜、人物、建筑物或车辆。

Model predictions were compared to the measure of local SFC, in an experiment similar to that of Reinagel and Zador [18], using natural scenes with salient traffic signs (90 images), a red soda can (104 images), or a vehicle's emergency triangle (64 images). Similar to Reinagel and Zador's findings, the SFC at attended locations was significantly higher than the average SFC, by a factor decreasing from $2.5 \pm 0.05$ at the first attended location to $1.6 \pm 0.05$ at the 8th attended location. Although this result does not necessarily indicate similarity between human eye fixations and the model's attentional trajectories, it indicates that the model, like humans, is attracted to "informative" image locations, according to the common assumption that regions with richer spectral content are more informative. The SFC map was similar to the saliency map for most images (e.g., Fig. 4.1). However, both maps differed substantially for images with strong, extended variations of illumination or color (e.g., due to speckle noise): While such areas exhibited uniformly high SFC, they had low saliency because of their uniformity (Figs. 4.2, 4.3). In such images, the saliency map was usually in better agreement with our subjective perception of saliency. Quantitatively, for the 258 images studied here, the SFC at attended locations was significantly lower than the maximum SFC, by a factor decreasing from $0.90 \pm 0.02$ at the first attended location to $0.55 \pm 0.05$ at the 8th attended location. While the model was attending to locations with high SFC, these were not necessarily the locations with highest SFC. It consequently seems that saliency is more than just a measure of local SFC. The model, which implements within-feature spatial competition, captured subjective saliency better than the purely local SFC measure.

在一个与Reinagel和Zador[18]类似的实验中,我们将模型预测与局部SFC度量进行了比较,使用的自然场景包含显著的交通标志(90张图像)、红色汽水罐(104张图像)或车载紧急三角牌(64张图像)。与Reinagel和Zador的发现类似,被关注位置的SFC显著高于平均SFC,其倍数从第一个被关注位置的 $2.5\pm 0.05$ 逐渐下降到第八个被关注位置的 $1.6\pm 0.05$。尽管这一结果并不必然说明人眼注视与模型注意轨迹之间的相似性,但它表明模型和人类一样,会被"信息丰富"的图像位置吸引,这基于一个常见假设:频谱内容越丰富的区域信息量越大。对大多数图像而言,SFC图与显著图相似(例如图4.1)。
然而,对于光照或颜色存在强烈、大范围变化的图像(例如由散斑噪声引起),两种图差异很大:这些区域虽然SFC一致偏高,但正因为其均匀性而显著性很低(图4.2、4.3)。在这类图像中,显著图通常与我们对显著性的主观感知更一致。定量地看,对这里研究的258张图像,被关注位置的SFC显著低于最大SFC,其比值从第一个被关注位置的 $0.90\pm 0.02$ 下降到第八个被关注位置的 $0.55\pm 0.05$。也就是说,虽然模型关注的是SFC较高的位置,但这些位置并不一定是SFC最高的位置。因此,显著性似乎不仅仅是局部SFC的度量。实现了特征内空间竞争的模型,比纯局部的SFC度量更好地刻画了主观显著性。

B. Strengths and limitations 优势和局限性

We have proposed a model whose architecture and components mimic the properties of primate early vision. Despite its simple architecture and feedforward feature extraction mechanisms, the model is capable of strong performance with complex natural scenes. For example, it quickly detected salient traffic signs of varied shapes (round, triangular, square, rectangular), colors (red, blue, white, orange, black) and textures (letter markings, arrows, stripes, circles), although it had not been designed for this purpose. Such strong performance reinforces the idea that a unique saliency map, receiving input from early visual processes, could effectively guide bottom-up attention in primates [4], [10], [5], [8]. From a computational viewpoint, the major strength of this approach lies in the massively parallel implementation, not only of the computationally expensive early feature extraction stages, but also of the attention focusing system. More than previous models based extensively on relaxation techniques [5], our architecture could easily allow for real-time operation on dedicated hardware.

我们提出了一个体系结构和组成部分都模仿灵长类动物早期视觉特性的模型。尽管体系结构简单、特征提取机制是前馈的,该模型在复杂自然场景中仍表现出很强的性能。例如,它能快速检测出形状(圆形、三角形、正方形、矩形)、颜色(红、蓝、白、橙、黑)和纹理(字母标记、箭头、条纹、圆圈)各异的显著交通标志,尽管它并不是为此目的设计的。如此强的性能支持了这样一种观点:一张接收早期视觉过程输入的统一显著图,可以有效引导灵长类动物自下而上的注意[4],[10],[5],[8]。从计算角度看,这种方法的主要优势在于大规模并行实现,不仅包括计算量很大的早期特征提取阶段,也包括注意聚焦系统。与以前大量依赖松弛技术的模型[5]相比,我们的体系结构更容易在专用硬件上实现实时运行。

The type of performance which can be expected from this model critically depends on one factor: Only object features explicitly represented in at least one of the feature maps can lead to pop-out, that is, rapid detection independently of the number of distracting objects [7]. Without modifying the preattentive feature extraction stages, our model cannot detect conjunctions of features. While our system immediately detects a target which differs from surrounding distractors by its unique size, intensity, color or orientation (properties which we have implemented because they have been very well characterized in primary visual cortex), it will fail at detecting targets salient for unimplemented feature types (e.g., T junctions or line terminators, for which the existence of specific neural detectors remains controversial). For simplicity, we also have not implemented any recurrent mechanism within the feature maps, and hence cannot reproduce phenomena like contour completion and closure, important for certain types of human pop-out [19]. In addition, at present our model does not include any magnocellular motion channel, known to play a strong role in human saliency [5].

该模型所能达到的性能类型关键取决于一个因素:只有在至少一张特征图中被显式表示的物体特征才能产生"弹出"效应,即检测速度与干扰物数量无关的快速检测[7]。在不修改前注意特征提取阶段的情况下,我们的模型无法检测特征的组合。虽然我们的系统能立即检测出凭独特的大小、亮度、颜色或方向区别于周围干扰物的目标(我们实现这些属性,是因为它们在初级视觉皮层中已有非常充分的刻画),但对于凭未实现的特征类型而显著的目标(例如T形交点或线段端点,其专门神经检测器是否存在仍有争议),它将检测失败。为简单起见,我们也没有在特征图内部实现任何循环机制,因此无法重现轮廓补全与闭合等现象,而这些现象对某些类型的人类弹出效应很重要[19]。此外,目前我们的模型不包含任何大细胞运动通道,而该通道已知在人类显著性中起重要作用[5]。

A critical model component is the normalization $\mathcal{N}(\cdot)$, which provided a general mechanism for computing saliency in any situation. The resulting saliency measure implemented by the model, although often related to local SFC, was closer to human saliency because it implemented spatial competition between salient locations. Our feed-forward implementation of $\mathcal{N}(\cdot)$ is faster and simpler than previously proposed iterative schemes [5]. Neuronally, spatial competition effects similar to $\mathcal{N}(\cdot)$ have been observed in the non-classical receptive field of cells in striate and extrastriate cortex [15].

模型的一个关键组成部分是归一化算子 $\mathcal{N}(\cdot)$,它为在任何情况下计算显著性提供了通用机制。模型由此实现的显著性度量虽然常与局部SFC相关,但由于在显著位置之间引入了空间竞争,因而更接近人类显著性。我们对 $\mathcal{N}(\cdot)$ 的前馈实现比以前提出的迭代方案[5]更快、更简单。在神经层面,类似于 $\mathcal{N}(\cdot)$ 的空间竞争效应已在纹状皮层和纹外皮层细胞的非经典感受野中被观察到[15]。

In conclusion, we have presented a conceptually simple computational model for saliency-driven focal visual attention. The biological insight guiding its architecture proved efficient in reproducing some of the performances of primate visual systems. The efficiency of this approach for target detection critically depends on the features types implemented. The framework presented here can consequently be easily tailored to arbitrary tasks through the implementation of dedicated feature maps.

总之,我们提出了一个概念上简单的、由显著性驱动的焦点视觉注意计算模型。指导其体系结构的生物学洞见被证明能够有效复现灵长类视觉系统的某些性能。这种方法在目标检测上的效率关键取决于所实现的特征类型。因此,通过实现专门的特征图,可以很容易地把本文提出的框架定制到任意任务上。


笔记 需要用来写论文,暂时不公开。
