ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information

ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information (Reading Notes)

  • Abstract
  • 1. Introduction
    • 1.1. Object Detection in WAMI
    • 1.2. Spatial and Temporal Information
    • 1.3. Contributions
  • 2. Related Work
    • 2.1. Frame Differencing and Background Subtraction
    • 2.2. Region Proposal Networks
    • 2.3. Spatio-Temporal CNNs
  • 3. ClusterNet & FoveaNet: Two-Stage CNN
    • 3.1. Region Proposal: Exploiting Motion
    • 3.2. FoveaNet: Predicting Object Locations
  • 4. Experimental Setup
  • 5. Results
    • 5.1. Single-Frame & Background Subtraction
    • 5.2. ClusterNet & FoveaNet
  • 6. Conclusion


Abstract

Object detection in wide area motion imagery (WAMI) has drawn the attention of the computer vision research community for a number of years. WAMI poses a number of unique challenges, including extremely small object sizes, both sparse and densely-packed objects, and extremely large search spaces (large video frames). Nearly all state-of-the-art methods in WAMI object detection report that appearance-based classifiers fail on this challenging data and instead rely almost entirely on motion information in the form of background subtraction or frame differencing. In this work, we experimentally verify the failure of appearance-based classifiers in WAMI, such as Faster R-CNN and a heatmap-based fully convolutional neural network (CNN), and propose a novel two-stage spatio-temporal CNN which effectively and efficiently combines both appearance and motion information to significantly surpass the state-of-the-art in WAMI object detection. To reduce the large search space, the first stage (ClusterNet) takes in a set of extremely large video frames, combines the motion and appearance information within the convolutional architecture, and proposes regions of objects of interest (ROOBI). These ROOBI can contain from one to clusters of several hundred objects due to the large video frame size and varying object density in WAMI. The second stage (FoveaNet) then estimates the centroid locations of all objects in that given ROOBI simultaneously via heatmap estimation. The proposed method exceeds state-of-the-art results on the WPAFB 2009 dataset by 5-16% for moving objects and nearly 50% for stopped objects, and is the first method proposed in wide area motion imagery to detect completely stationary objects.

1. Introduction

Object detection is a large and active area of research in computer vision. In wide area motion imagery (WAMI), performing object detection has drawn the attention of the computer vision community for a number of years [14, 19, 23, 28, 31]. Numerous applications exist in both the civilian and military domains. In urban planning, applications include automatic traffic monitoring, driver behavior analysis, and road verification for assisting both scene understanding and land use classification. Civilian and military security is another area to benefit, with applications including military reconnaissance, detection of abnormal or dangerous behavior, border protection, and surveillance of restricted areas. With increases in the use and affordability of drones and other unmanned aerial platforms, the desire for building a robust system to detect objects in wide-area and low-resolution aerial videos has developed considerably in recent years.

1.1. Object Detection in WAMI

The goal of object detection in images or videos is to place a bounding box (i.e. the tightest fitting rectangle which contains all elements of an object while containing as few extraneous elements as possible) around all objects in the scene. Object detection in WAMI differs from the typical object detection problem in three major ways: 1) Ground-truth (i.e. human-generated) annotations are single (x, y) coordinates placed at the objects' centers, rather than bounding boxes. Therefore, scale and orientation invariance must be learned in order to locate objects' centers, but this information cannot be provided during supervised training. 2) In typical object detection datasets, images or video frames most often contain only one to three objects, with no more than 15 objects, and these objects take up a large percentage of the image or video frame [13]. In WAMI, video frames can contain thousands of small objects, each object accounting for less than 0.000007% of the total pixels in a given frame. A quantitative analysis of this is shown in Fig. 2. 3) The majority of object detection frameworks deal with images at 256 x 256 up to 500 x 500 pixel resolutions. Video frames in WAMI are significantly larger, typically on the order of several, to hundreds of, megapixels. This creates an extremely large search space, especially given that the typical object size in WAMI is on the order of 9 × 18 pixels. An example WAMI video frame with ground-truth annotations is shown in Fig. 1.
[Figures 1-3 from the paper: an example annotated WAMI video frame, the quantitative analysis of object sizes, and the WAMI-specific challenges.]

1.2. Spatial and Temporal Information

For the past several years, object detection has been dominated by detectors relying solely on spatial and appearance information (e.g. Faster R-CNN [20], ResNet [5], YOLO 9000 [18]). These methods extract low-to-high level spatial and appearance features from images to predict and classify objects. However, it has been stated in numerous recent works [21, 24, 28, 29] that these appearance- and machine-learning-based methods fail in WAMI due to several unique challenges. 1) Extremely small objects, averaging 9 × 18 pixels in size. 2) High intra-class variation, ranging from deep black to bright white and from semi-trucks to small cars, with the typical vehicle color (i.e. silver/gray) exactly matching the background; in addition, dramatic changes in camera gain cause significant changes in objects' appearance between consecutive frames. 3) Lacking color and with low resolution, videos are single-channel gray-scale with often blurred/unclear object boundaries. 4) Low frame rates of roughly 1.25 Hz make exploiting temporal information a challenge. Moving objects travel a significant distance between consecutive frames, most often with no overlap to the previous frame. Also, since the aerial recording platform is moving, background objects have significant motion, causing strong parallax effects and frame-registration errors which lead to false-positive detections. Moving mosaic seams, where multiple cameras are stitched together to form a single sensor, sweep across the video, leading to even more false positives. Several of these challenges are shown in Fig. 3.
Due to the aforementioned reasons, all state-of-the-art object detection methods in WAMI are motion-based [17, 28, 29], using background subtraction or frame differencing to find the objects in the videos. However, as with the appearance-based methods, motion-based approaches suffer from their own costly drawbacks. Frame differencing and background subtraction, at their core, rely heavily on video frame registration. Small errors in frame registration can induce large failures in the final results, and attempting to remove false positives is often a big part of these methods. In addition to frame registration, background subtraction requires computing median background images over a large number of frames for the entire video. This, combined with ignoring appearance information, leads to an inefficient use of information across multiple video frames. Yet the biggest drawback is the complete inability to detect stopped vehicles. All state-of-the-art methods, due to their sole reliance on temporal information, cannot detect vehicles which are not moving relative to the background.
Some recent works [3, 6, 8, 22, 27] have attempted to combine spatial and temporal information in various ways for object detection and action recognition. These methods include connecting detections across frames using tracking methods or optical flow; using a sliding window or out-of-the-box detector to perform detection and then simply classifying the result using some temporal information; and combining the outputs of a single-frame CNN and an optical-flow input to a CNN. However, all of these methods rely either on a single-frame detector, which uses no temporal information, or on a sliding window that checks all possible locations in a video frame for object proposals, and thus do not fully exploit temporal information for the task of object detection in video. This is discussed further in Section 2.3.
1.3. Contributions
The proposed two-stage, spatio-temporal convolutional neural network (CNN) predicts the locations of multiple objects simultaneously, without using single-frame detectors or sliding-window classifiers. We show that, consistent with findings in several other works, single-frame detectors fail on this challenging WAMI data, and it is known that sliding-window classifiers are terribly inefficient. The novelty of this paper is as follows: 1) Our method effectively utilizes both spatial and temporal information from a set of video frames to locate multiple objects simultaneously in WAMI. 2) This approach removes the need for computing background-subtracted images, thus reducing the computational burden and the effect of errors in frame registration. 3) The two-stage network shows the potential to reduce the extremely large search space present in WAMI data with a minimal effect on accuracy. 4) The proposed method is capable of detecting completely stationary vehicles in WAMI, where no other published work can do so. 5) The proposed method significantly outperforms the state-of-the-art in WAMI, with a 5-16% relative improvement in F1 score on moving object detection and a nearly 50% relative improvement for stopped vehicles, while reducing the average error distance of true-positive detections from the previous state-of-the-art's 5.5 pixels to roughly 2 pixels.

2. Related Work

2.1. Frame Differencing and Background Subtraction

As stated in Section 1.2, due to the difficulties in WAMI and the reported failures of appearance- and machine-learning-based methods, all state-of-the-art methods in WAMI are based on either frame differencing or background subtraction. Both methods require video frames to be registered to a single coordinate system. Reilly et al. [19] detect Harris corners in two frames, compute SIFT features around those corners, and match the points using the descriptors. A frame-to-frame homography is then fit, using RANSAC or a similar method, and used to warp images to a common reference frame. Frame differencing is the process of computing pixel-wise differences in intensities between consecutive frames. Both two-frame and three-frame differencing methods have been proposed in the literature, with a number of variations [9, 16, 24, 28, 30]. Background subtraction methods focus on obtaining a background model for each frame, then subtract each video frame from its corresponding background model. These methods suffer heavily from false positives introduced by the issues discussed in Section 1.2 and cannot detect stationary vehicles. Slowing vehicles also cause a major problem, as they are prone to cause split detections in frame differencing [29], while registration errors and parallax effects are increased in background subtraction models, which use more frames than frame differencing. Sudden and dramatic changes in camera gain cause illumination changes, which in turn cause problems for background modeling and frame differencing methods that require consistent global illumination [24].
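As a concrete illustration of this baseline pipeline, the following is a minimal sketch of SIFT-plus-RANSAC registration followed by two-frame differencing, assuming OpenCV; the ratio-test, RANSAC, and differencing thresholds are illustrative values, not those used in the papers cited above.

```python
# Minimal sketch of the registration + frame-differencing baseline described above.
# Assumes OpenCV >= 4.4 (cv2.SIFT_create); thresholds are illustrative only.
import cv2
import numpy as np

def register(frame, reference):
    """Warp grayscale `frame` into the coordinate system of `reference` via a RANSAC homography."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(frame, None)
    kp2, des2 = sift.detectAndCompute(reference, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = reference.shape
    return cv2.warpPerspective(frame, H, (w, h))

def frame_difference(prev_frame, cur_frame, thresh=30):
    """Two-frame differencing after registration; returns a binary motion mask."""
    registered = register(prev_frame, cur_frame)
    diff = cv2.absdiff(registered, cur_frame)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask
```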

2.2. Region Proposal Networks

Region proposal networks (RPN), such as Faster R-CNN [20], which has in some ways become the standard in object detection, have shown the ability to generate object proposals with high accuracy and efficiency. Unfortunately, Faster R-CNN fails in WAMI for four main reasons. 1) Faster R-CNN acts only on single frames, and thus does not exploit the available temporal information, which proves to be extremely important. 2) WAMI video frames are extremely large and cannot be sent in their entirety to a Faster R-CNN network on any reasonable number of GPUs. This requires spatially chipping video frames into smaller sections and checking these sections individually, dramatically hurting the computational-efficiency benefit supposed to be provided by an RPN. 3) If one changed the RPN stage of Faster R-CNN to extremely downsample the images in the earliest layers in order to fit the large WAMI video frames within GPU memory, object proposals would become impossible. The extremely small object size, combined with areas of high object density, means that any significant amount of downsampling in the network immediately makes object locations indistinguishable, as they are often separated by only a few pixels or even less. 4) WAMI data is ill-suited for Faster R-CNN as the ground-truth locations are single points, not bounding boxes. We experimentally verify that Faster R-CNN fails in WAMI, even when given the benefit of spatially chipping the video frames to manageable sizes.

2.3. Spatio-Temporal CNNs

In the past few years, partially due to the enormous success of deep learning methods on a vast array of problems, several works have been proposed for combining spatial and temporal information in various ways within deep learning frameworks. Baccouche et al. [3] and Ji et al. [6] both propose using 3D CNNs for action recognition. Simonyan and Zisserman [27] propose a "two-stream" CNN, one branch receiving individual video frames as input and the other receiving optical-flow image stacks, where the outputs of the two streams are combined at the end of the network. Kang et al. [8] propose several methods to connect object detections in individual frames across time, including using tracking algorithms, optical-flow-guided propagation, and a long short-term memory (LSTM) sub-network. Rozantsev et al. [22] detect flying drones using sliding-window proposals, input to two CNNs multiple times to align each frame, and then perform binary classification of the object or non-object in the sliding window.
Our proposed work differs from all of the above in several key ways. Baccouche et al. and Ji et al. both use stacks of frames as input to a 3D CNN. However, these works do not perform object detection. Both first assume an object of interest is already detected and perfectly centered in each input video frame. To accomplish this, these works use out-of-the-box single-frame human detector algorithms to find the objects of interest in their videos. Our method proposes to solve this object detection problem, where single-frame detectors fail, in the challenging WAMI domain. Simonyan and Zisserman keep spatial and temporal information separate during feature extraction, simply combining the extracted features at the end of the network. As stated, single-frame detectors fail in WAMI. Also, due to the extremely large object displacements between consecutive frames, the optical-flow stream would likely struggle significantly. The work by Kang et al. also relies on first acquiring single-frame object detections before applying their tracking or LSTM methods. The work by Rozantsev et al. is the only one of these methods which does not rely on single-frame detections, instead opting for a sliding window to first generate its object proposals before using a 3D CNN for classification. However, sliding-window-based methods are extremely inefficient. Our work proposes to generate all object proposals simultaneously using a multi-frame, two-stage CNN for videos in WAMI, in a more computationally efficient manner than background subtraction or sliding windows, effectively combining both spatial and temporal information in a deep-learning-based algorithm.

3. ClusterNet & FoveaNet: Two-Stage CNN

We propose a new region proposal network which combines spatial and temporal information within a deep CNN to propose object locations. Where in Faster R-CNN each 3 × 3 region of the RPN's output map proposes nine possible objects, our network generalizes this to propose regions of objects of interest (ROOBI) containing varying numbers of objects, from a single object to potentially over 300 objects, for each 4 × 4 region of the RPN's output map. We then focus the second stage of the network on each proposed ROOBI to predict the locations of all object(s) in that ROOBI simultaneously, again combining spatial and temporal information in this network. This two-stage approach is loosely inspired by biological vision, where a large field of vision takes in information and cues, one of the strongest being motion detection, determine where to focus the much smaller fovea centralis.

3.1. Region Proposal: Exploiting Motion

To reduce the extremely large search space in WAMI, several works proposed using road-overlay maps. This dramatically reduces the search area but severely limits the applicability of the method. Road maps must be known in advance and must be fit perfectly to each video frame, in addition to removing the possibility of detecting objects which do not fall on the road. Instead, we propose a method to learn this search space reduction without any prior knowledge of road maps. We created a fully-convolutional neural network, shown in Fig. 4, which dramatically downsamples the very large WAMI video frames using convolutional strides and max pooling.

[Image: the two-stage ClusterNet/FoveaNet architecture (Fig. 4 in the paper).]

To exploit temporal information, rather than sending an individual frame to the CNN, we input consecutive adjoining frames together with the frame we want to generate proposals for. These adjoining and central frames are input to a 2D convolutional network. The advantage of using a 2D CNN over a 3D CNN as in [3, 6] is the preservation of the temporal relationship between frames. Each frame learns its own convolutional filter set, and these are then combined to produce feature maps which maximize information related to the frame we care most about (in our case we chose to train the network to maximize the central frame). Instead of a sliding temporal convolution, our method uses the following equation,
$$f_m = \sum_{n \in N} K_n * V_n + b_m, \quad m \in M$$

where $V_n$ is the $n^{th}$ video frame stored in the input stack, $K_n$ is the $k_h \times k_w$ convolution kernel applied to frame $n$ to generate the desired feature map $f_m$, $M$ is the set of feature maps, $n \in N$ indexes the frames in the temporal set input to the network, and $b_m$ is the learned bias of feature map $m$. This formulation differs from a standard 2D single-frame CNN and from a 3D CNN, and allows us to choose which frame we want to maximize via backpropagation of a Euclidean or cross-entropy loss between the output score map and the ground-truth heatmap.
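A minimal sketch of this first fusion layer, written in PyTorch for concreteness (the original work used Caffe); the frame count, number of feature maps, and kernel size below are illustrative assumptions rather than the paper's exact values.

```python
# First ClusterNet layer as given by the equation above: each of the N input frames
# has its own 2D kernel set (K_n), and the per-frame responses are summed, plus a
# bias (b_m), into one shared set of feature maps.
import torch
import torch.nn as nn

class MultiFrameFusionConv(nn.Module):
    def __init__(self, num_frames=5, out_maps=32, k=7):
        super().__init__()
        # One single-channel convolution per input frame (K_n in the equation).
        self.per_frame = nn.ModuleList([
            nn.Conv2d(1, out_maps, kernel_size=k, padding=k // 2, bias=False)
            for _ in range(num_frames)
        ])
        self.bias = nn.Parameter(torch.zeros(out_maps))  # b_m, shared across frames

    def forward(self, frames):
        # frames: (batch, num_frames, H, W) grayscale stack centered on the frame of interest
        fused = sum(conv(frames[:, n:n + 1]) for n, conv in enumerate(self.per_frame))
        return fused + self.bias.view(1, -1, 1, 1)
```

Summing per-frame convolutions this way is mathematically the same as a single 2D convolution over the frames stacked as channels; writing it per-frame simply mirrors the equation and makes the per-frame filter sets explicit.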
All further layers in the network beyond the first perform the task of refining this information to the desired output. As shown by Schwartz-Ziv and Tishby [26], the amazing success of deep neural networks lies in their "information bottleneck" ability to refine information through the layers, guided by backpropagation, reducing the high-entropy input to a low-entropy output. Therefore we chose to provide our temporal information to the network in the earliest layer, providing the maximum possible information at the earliest stage and allowing the remainder of the layers to refine this information to the desired output proposals.
We formulated the problem in two different ways. In the first, we estimate ROOBIs, or object locations in the second stage, via a heatmap-based formulation using the Euclidean loss between the network output and a heatmap created as

$$H(x, y) = \sum_{n \in N} \exp\left(-\frac{(x - x_n/d)^2 + (y - y_n/d)^2}{\sigma^2}\right)$$

where $n \in N$ indexes the ground-truth $(x_n, y_n)$ coordinates, $d$ is the amount of downsampling, and $\sigma$ is the variance of the Gaussian blur fit to each transformed object location. This gives the loss a smooth gradient for estimating object/region locations rather than single points in space. $H$ is then clipped at 1 so that regions containing a single object and regions containing clusters of several hundred objects are weighted equally. In the second formulation, segmentation maps are created by thresholding the Gaussian heatmaps for our two classes; this segmentation formulation, using a softmax cross-entropy loss, applies when object locations are mutually exclusive. Thus, if object locations do not overlap, a single output can predict the locations of a large number of object classes. If locations are not mutually exclusive, the Gaussian heatmap formulation can be used, where each object class has a corresponding heatmap and the network produces that many outputs. Experiments with the two formulations produced very similar results, demonstrating that either can be used depending on the specific problem, giving our method greater flexibility and a wider range of possible applications.
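A small numpy sketch of this ground-truth heatmap construction; the downsampling factor d and sigma below are illustrative assumptions rather than the paper's values.

```python
# Place a Gaussian at each downsampled annotation, sum, and clip at 1.
import numpy as np

def make_heatmap(points, out_h, out_w, d=4, sigma=2.0):
    """points: ground-truth (x, y) centroids in full-resolution pixel coordinates."""
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    heatmap = np.zeros((out_h, out_w), dtype=np.float32)
    for (x, y) in points:
        cx, cy = x / d, y / d                                  # map annotation to output resolution
        heatmap += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / sigma ** 2)
    return np.clip(heatmap, 0.0, 1.0)                          # clip at 1: single objects and clusters weighted equally
```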

3.2. FoveaNet: Predicting Object Locations

The FoveaNet stage of our two-stage CNN works on the principle of the effective receptive field of neurons in ClusterNet. Each output neuron in the final 1 × 1 convolutional layer essentially gives a vote: whether there is a vehicle or cluster of vehicles within that given region, or whether there are none. These neurons vote based on the information of the neurons they are connected to in the previous layer, which in turn are connected back to the layer before them, and so on back to the initial input. FoveaNet calculates the region of input information each neuron in the final layer is using to make its final vote. For any ClusterNet output value above a set threshold, this input region, across all input frames, is sent through FoveaNet for high-resolution analysis, as FoveaNet has only a single downsample in the network. The effect is that ClusterNet allows us to ignore large regions of the search space while focusing a small, high-resolution fovea centralis over regions which contain at least one to several hundred vehicles, as illustrated in Fig. 5. FoveaNet then predicts the locations of all vehicles within that region to a high degree of accuracy for the given temporal frame of interest.
[Image: ClusterNet proposing ROOBIs over the full frame and FoveaNet analyzing each proposed region at high resolution (Fig. 5 in the paper).]
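The hand-off between the two stages can be sketched as follows; this is an illustrative approximation in which the effective receptive field is treated as a fixed stride and crop size (hypothetical values), rather than the exact per-neuron computation described above.

```python
# Every above-threshold cell of the ClusterNet output map claims its input region
# (across all frames in the stack) for high-resolution analysis by FoveaNet.
import numpy as np

def propose_roobis(cluster_map, frame_stack, stride=64, crop=256, thresh=0.5):
    """cluster_map: (h, w) ClusterNet scores; frame_stack: (N, H, W) registered frames."""
    _, H, W = frame_stack.shape
    roobis = []
    for i, j in zip(*np.nonzero(cluster_map > thresh)):
        cy, cx = int((i + 0.5) * stride), int((j + 0.5) * stride)   # cell centre in input coordinates
        y0, x0 = max(0, cy - crop // 2), max(0, cx - crop // 2)
        y1, x1 = min(H, y0 + crop), min(W, x0 + crop)
        roobis.append(frame_stack[:, y0:y1, x0:x1])                 # high-resolution chip sent to FoveaNet
    return roobis
```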

Since our FoveaNet input can be much smaller thanks to ClusterNet reducing the search space, we opted to use large kernels within the convolutional layers of FoveaNet, decreasing in size to the final 1 × 1 convolutional layer; see Fig. 4. This was inspired by the recent work by Peng et al. [15] as well as a large amount of experimentation. Among the options of large kernels ascending in size, descending in size, or fixed in size, as well as small kernels, we found the proposed network to consistently perform the best.

4. Experimental Setup

Experiments were performed on the WPAFB 2009 dataset [1]. This dataset is the benchmark by which all methods in WAMI are compared, as it is one of the most varied and challenging, as well as one of the only publicly available datasets with human-annotated vehicle locations. The video is taken from a single sensor, comprised of six slightly-overlapping cameras, covering an area of over 19 sq. km. at a frame rate of roughly 1.25 Hz. The average vehicle in these single-channel images makes up only approximately 9 × 18 of the over 315 million pixels per frame, with each pixel corresponding to roughly 1/4 meter. With almost 2.4 million vehicle detections spread across only 1,025 frames of video, that averages out to well over two thousand vehicles to detect in every frame.
Frames are registered to compensate for camera motion following the method by Reilly et al. [19], as discussed in Section 2.1. After registration, eight areas of interest (AOI) were cropped out in accordance with those used in testing other state-of-the-art methods [4, 17, 28, 29], allowing for a proper comparison of results. AOIs 01-04 are 2278 × 2278 pixels, covering different types of surroundings and varying levels of traffic. AOI 34 is 4260 × 2604, AOI 40 is 3265 × 2542, and AOI 41 is 3207 × 2892. AOI 42 is simply a sub-region of AOI 41 but was included to test our method against the one proposed by Prokaj et al. [17] on persistent detections, where slowing and stopped vehicles were not removed from the ground truth, even though Prokaj et al. use tracking methods to maintain detections. All other AOIs have any vehicle which moved fewer than 15 pixels (2/3 of a car length) over the course of 5 frames removed, so as to be consistent in testing against other methods for moving object detection. All cropped AOIs are shown with their ground truth and our results in the supplemental materials.
Data was split into training and testing sets in the following way. For training, only tiles which contain vehicles were included. The splits were as follows: AOIs 02, 03, and 34 were trained on AOIs 40, 41, and 42; AOIs 01 and 40 were trained on AOIs 34, 41, and 42; and AOIs 04, 41, and 42 were trained on AOIs 34 and 40. ClusterNet and FoveaNet were trained separately from scratch using Caffe [7]. ClusterNet used stochastic gradient descent with Nesterov momentum, a base learning rate of 0.01, and a batch size of 8, decreasing the learning rate by a factor of 0.1 upon validation loss plateaus. FoveaNet used Adam [11] with a base learning rate of 0.00001 and a batch size of 32. Training and testing were performed on a single Titan X GPU.
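For reference, these two training configurations could be expressed as follows in PyTorch (the paper itself used Caffe); the stand-in models, momentum value, and plateau patience are assumptions for illustration only.

```python
import torch
import torch.nn as nn

clusternet = nn.Sequential(nn.Conv2d(5, 32, 3))   # placeholder for the real ClusterNet
foveanet = nn.Sequential(nn.Conv2d(5, 32, 3))     # placeholder for the real FoveaNet

# ClusterNet: SGD with Nesterov momentum, base lr 0.01, lr x0.1 on validation plateaus, batch size 8.
cluster_opt = torch.optim.SGD(clusternet.parameters(), lr=0.01, momentum=0.9, nesterov=True)
cluster_sched = torch.optim.lr_scheduler.ReduceLROnPlateau(cluster_opt, factor=0.1, patience=5)

# FoveaNet: Adam, base lr 1e-5, batch size 32.
fovea_opt = torch.optim.Adam(foveanet.parameters(), lr=1e-5)
```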
To turn the final network output back into single (x, y) coordinates for comparison against the ground truth, the output is thresholded (either at set levels for creating precision-recall curves, or by Otsu thresholding to find the best threshold level during deployment). Connected components are obtained, weak responses (i.e. < 100 pixels) are removed, and large responses (i.e. > 900 pixels, assumed to be merged detections) are split into multiple detections by finding circular centers in a bounding box surrounding that connected component. The centroid of each connected component is considered a positive detection. It should be noted that merged detections are quite rare; completely removing this component saw an F1 score decrease of less than 0.01 across all AOIs. Completely removing the small-detection removal step saw a decrease in F1 score of 0.01 to 0.05 depending on the AOI tested; however, this parameter is quite robust, as values in the range of 60 to 180 pixels change the F1 score by less than 0.01 across all AOIs. Quantitative results are compared in terms of precision, recall, and the F1 measure.
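A minimal sketch of this post-processing, assuming OpenCV/numpy; the exact splitting by fitting circular centers is approximated here by evenly tiling the bounding box, so that step is illustrative rather than the paper's procedure.

```python
import cv2
import numpy as np

def heatmap_to_points(heatmap, min_area=100, max_area=900):
    hm = (np.clip(heatmap, 0.0, 1.0) * 255).astype(np.uint8)
    _, binary = cv2.threshold(hm, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu at deployment
    num, _, stats, centroids = cv2.connectedComponentsWithStats(binary)
    points = []
    for i in range(1, num):                                  # label 0 is the background
        area = stats[i, cv2.CC_STAT_AREA]
        if area < min_area:
            continue                                         # weak response: discard
        if area <= max_area:
            points.append(tuple(centroids[i]))               # one detection at the centroid
        else:                                                # merged detection: split it
            x, y = stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]
            w, h = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
            n_split = max(2, int(round(area / max_area)) + 1)
            for k in range(n_split):
                points.append((x + (k + 0.5) * w / n_split, y + h / 2.0))
    return points
```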
For consistency with [28], detections are considered true positives if they fall within 20 pixels (5 meters) of a ground-truth coordinate. If multiple detections fall within this radius, the closest detection is taken, and the rest are marked as false positives if they have no other ground-truth coordinate within 20 pixels. Any detection not within 20 pixels of a ground-truth coordinate is also marked as a false positive, and any ground-truth coordinate with no detection within 20 pixels is counted as a false negative.
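The scoring rule can be sketched as below with a greedy approximation of the matching step (each detection claims its nearest unclaimed ground-truth point within the radius); purely illustrative.

```python
import numpy as np

def evaluate(detections, ground_truth, radius=20.0):
    """detections, ground_truth: lists of (x, y) coordinates for one frame."""
    unmatched_gt = list(ground_truth)
    tp = fp = 0
    for dx, dy in detections:
        if unmatched_gt:
            dists = [np.hypot(dx - gx, dy - gy) for gx, gy in unmatched_gt]
            j = int(np.argmin(dists))
            if dists[j] <= radius:
                tp += 1
                unmatched_gt.pop(j)          # each ground-truth point may be claimed once
                continue
        fp += 1                              # no ground-truth point within the radius
    fn = len(unmatched_gt)                   # ground truth with no detection nearby
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```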

5. Results

5.1. Single-Frame & Background Subtraction

To demonstrate the impact of temporal information, we ran three sets of experiments: explicit, implicit, and no temporal information. For the explicit case, we computed median and background-subtracted images for all frames following the method of Reilly et al. [19]. We then trained and tested FoveaNet with an input consisting of two copies of the central video frame combined with that frame's computed background-subtracted image, with each AOI split into 128 × 128 pixel crops. This shows that our deep network, given both appearance and temporal information simultaneously, can perform better than background subtraction alone. For the implicit case, we trained and tested the proposed method using three to five frames as input to the network, to demonstrate that the network can learn temporal information directly from the input images without computing median or background-subtracted images. For the no-temporal-information case, we trained and tested FoveaNet with a single frame as input and also experimented with Faster R-CNN. We tried many Faster R-CNN configurations, with VGG-16 and ResNet-50, pre-trained and trained from scratch, with proposal sizes adjusted for the WAMI data split into 256 × 256 pixel chips and ground-truth bounding boxes set to 20 × 20 pixels centered on the target objects. The best precision-recall curves for all of these experiments, tested on AOI 41, are shown in Fig. 6.
[Figure 6: the best precision-recall curves for the explicit, implicit, and no-temporal-information experiments on AOI 41.]

5.2. ClusterNet & FoveaNet

The results of our proposed two-stage method, compared against 13 different state-of-the-art methods on seven different AOIs, are shown in Fig. 9 and Table 2. Results on AOI 42, where stationary objects are not removed, are shown in Fig. 8 and Fig. 7. Our final results measure the computational-efficiency improvement provided by ClusterNet and its effect on detection scores. If FoveaNet had to examine every region of a given input, obtaining the predicted object (x, y) locations would take roughly 3 seconds per frame. Table 1 shows the average speed-up from ClusterNet as well as the average change in F1 score across all AOIs.
Table 1: Speed-up percentage and reduction in F1-measure when using ClusterNet at different threshold levels. Higher thresholds exclude a larger portion of the input space, but raising the threshold too high negatively affects the F1 score.

Table 2: F1 scores of current state-of-the-art methods. Where an F1 value was not reported in the original work, the values reported in [28] and/or [29] are used. Note that AOI 42 reports persistent-detection results (no vehicles removed from the ground truth) and is compared against the only other persistent-detection WAMI method currently in the literature.

Figure 7: Persistent detection results on AOI 42. Video frame numbers are marked in yellow at the bottom of each column. Top row: four highlighted image regions. In the first, a black car in the shadow of a building is nearly invisible to the naked eye; in the last frame, a white vehicle is almost completely occluded by a building due to motion parallax. Middle row: ground-truth heatmaps. Bottom row: output heatmaps without any post-processing.
Figure 8: Precision-recall curves for persistent detection on AOI 42 (i.e. no coordinates removed from the ground truth).
Figure 9: Moving-object detection results on the seven cropped AOIs, compared against 13 state-of-the-art methods. Where values were not reported in the original works, the values reported in [28] and/or [29] are used.

6. Conclusion

We have presented a two-stage convolutional neural network for detecting small objects in large scenes and validated it on wide area motion imagery. Our method successfully exploits both appearance and motion cues to detect the locations of hundreds of objects simultaneously. We compared it against 13 state-of-the-art methods, with F1 scores showing a relative improvement of 5-16% for moving objects and nearly 50% for persistent detection. Removing the computational burden of computing median and background-subtracted images, together with ClusterNet's reduction of the search space, are key contributions toward an online method. For future work, one of the final hurdles is removing the frame-registration computation used to compensate for camera motion.
