Representation Flow for Action Recognition (Translation)

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Approach
    • 3.1 Review of Optical Flow Methods
    • 3.2 Representation Flow Layer
      • Representation Flow within a CNN
      • Computing Flow-of-Flow
    • 3.3 Activity Recognition Model
  • 4 Experiments
      • Implementation details
      • Where to compute flow?
      • What to learn?
      • How many iterations for flow?
      • Two-stream fusion?
      • Flow-of-flow
      • Flow of 3D CNN Feature
      • Comparison to other motion representations
      • Computation time
      • Comparison to state-of-the-arts
  • 5 Conclusion


This paper was published at CVPR 2019; this post is a paragraph-by-paragraph translation of it.
Paper download: download link.
I am new to computer vision, so corrections and suggestions are welcome.

Abstract

  • In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. Our representation flow layer is a fully-differentiable layer designed to capture the ‘flow’ of any representation channel within a convolutional neural network for action recognition. Its parameters for iterative flow optimization are learned in an end-to-end fashion together with the other CNN model parameters, maximizing the action recognition performance. Furthermore, we newly introduce the concept of learning ‘flow of flow’ representations by stacking multiple representation flow layers. We conducted extensive experimental evaluations, confirming its advantages over previous recognition models using traditional optical flows in both computational speed and performance. The code is publicly available.

1 Introduction

  • Activity recognition is an important problem in computer vision with many societal applications including surveillance, robot perception, smart environment/city, and more. Use of video convolutional neural networks (CNNs) has become the standard method for this task, as they can learn more optimal representations for the problem. Two-stream networks [20], taking both RGB frames and optical flow as input, provide state-of-the-art results and have been extremely popular. 3-D spatio-temporal CNN models, e.g., I3D [3], with XYT convolutions also found that such two-stream design (RGB + optical flow) increases their accuracy. Abstracting both appearance information and explicit motion flow benefits the recognition.

  • However, optical flow is expensive to compute. It often requires hundreds of optimization iterations every frame, and causes learning of two separate CNN streams (i.e., RGB-stream and flow-stream). This requires significant computation cost and a great increase in the number of model parameters to learn. Further, this means that the model needs to compute optical flow every frame even during inference and run two parallel CNNs, limiting its real-time applications.

  • There were previous works to learn representations capturing motion information without using optical flow as input, such as motion feature networks [15] and ActionFlowNet[16]. However, although they were more advantageous in terms of the number of model parameters and computation speed, they suffered from inferior performance compared to two-stream models on public datasets such as Kinetics [13] and HMDB [14]. We hypothesize that the iterative optimization performed by optical flow methods produces an important feature that other methods fail to capture.

  • In this paper, we propose a CNN layer inspired by optical flow algorithms to learn motion representations for action recognition without having to compute optical flow. Our representation flow layer is a fully-differentiable layer designed to capture ‘flow’ of any representation channels within the
    model. Its parameters for iterative flow optimization are learned together with other model parameters, maximizing the action recognition performance. This is also done without having/training multiple network streams, reducing the number of parameters in the model. Further, we newly introduce the concept of learning ‘flow of flow’ representations by stacking multiple representation flow layers. We conduct extensive action classification experimental evaluation of where to compute optical flow and various hyperparameters, learning parameters, and fusion techniques.

  • Our contribution is the introduction of a new differentiable CNN layer that unrolls the iterations of the TV-L1 optical flow method. This allows for learning of the optical flow parameters, application to any CNN feature maps (i.e., intermediate representations), and lower computational cost while maintaining performance.

2 Related Works

  • Capturing motion and temporal information has been studied for activity recognition. Early, hand-crafted approaches such as dense trajectories [24] captured motion information by tracking points through time. Many algorithms have been developed to compute optical flow as a way to capture motion in video [8]. Other works have explored learning the ordering of frames to summarize a video in a single ‘dynamic image’ used for activity recognition [1].
  • Convolutional neural networks (CNNs) have been applied to activity recognition. Initial approaches explored methods to combine temporal information based on pooling or temporal convolution [12, 17]. Other works have explored using attention to capture sub-events of activities [18]. Two-stream
    networks have been very popular: they take input of a single RGB frame (captures appearance information) and a stack of optical flow frames (captures motion information). Often, the two network streams of the model are separately trained and the final predictions are averaged together [20]. There were other two-stream CNN works exploring different ways to ‘fuse’ or combine the motion CNN with the appearance CNN [7, 6]. There were also large 3D XYT CNNs learning spatio-temporal patterns [26, 3], enabled by large video datasets such as Kinetics [13]. However, these approaches
    still rely on optical flow input to maximize their accuracies.
  • While optical flow is known to be an important feature, flows optimized for activity recognition are often different from the true optical flow [19], suggesting that end-to-end learning of motion representations is beneficial. Recently, there have been works on learning such motion representations using convolutional models. Fan et al. [5] implemented the TV-L1 method using deep learning libraries to increase its computational speed and allow for learning some parameters. The result was fed to a two-stream CNN for the recognition. Several works explored learning a CNN to predict optical flow, which also can be used for action recognition [4, 9, 11, 16, 21]. Lee et al. [15] shifted features from sequential frames to capture motion in a non-iterative fashion. Sun et al. [21] proposed an optical flow guided feature (OFF) by computing the gradients of representations and temporal differences, but it lacked the iterative optimization necessary for accurate flow computation. Further, it requires a three-stream model taking RGB, optical flow, and RGB differences to achieve state-of-the-art performance.
  • Unlike prior works, our proposed model with representation flow layers relies only on RGB input, learning far fewer parameters while correctly representing motion with the iterative optimization. It is significantly faster than the video CNNs requiring optical flow input, while still performing as good as or even better than the two-stream models. It clearly outperforms existing motion representation methods including TVNet [5] and OFF [21] in both speed and accuracy, which we experimentally confirm.

3 Approach

  • Our method is a fully-differentiable convolutional layer inspired by optical flow algorithms. Unlike traditional optical flow methods, all the parameters of our method can be learned end-to-end, maximizing action recognition performance. Furthermore, our layer is designed to compute the ‘flow’ of any representation channels, instead of limiting its input to be traditional RGB frames.

3.1 Review of Optical Flow Methods

  • Before describing our layer, we briefly review how optical flow is computed. Optical flow methods are based on the brightness consistency assumption. That is, given sequential images $I_1$, $I_2$, a point $x, y$ in $I_1$ is located at $x + \Delta x$, $y + \Delta y$ in $I_2$, or $I_1(x, y) = I_2(x + \Delta x, y + \Delta y)$. These methods assume small movements between frames, so this can be approximated with a Taylor series: $I_2 = I_1 + \frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y$, where $u = [\Delta x, \Delta y]$. These equations are solved for $u$ to obtain the flow, but can only be approximated due to the two unknowns.
  • The standard, variational methods for approximating optical flow (e.g., the Brox [2] and TV-L1 [27] methods) take sequential images $I_1$, $I_2$ as input. Variational optical flow methods estimate the flow field, $u$, using an iterative optimization method. The tensor $u \in \mathbb{R}^{2 \times W \times H}$ is the $x$ and $y$ directional flow for every location in the image. Taking two sequential images as input, $I_1$, $I_2$, the methods first compute the gradient in both $x$ and $y$ directions: $\nabla I_2$. The initial flow is set to 0, $u = 0$. Then $\rho$, which captures the motion residual between two frames based on the current flow estimate $u$, can be computed. For efficiency, the constant part of $\rho$, $\rho_c$, is precomputed:
    $$\rho_c = I_2 - \nabla_x I_2 \cdot u_x - \nabla_y I_2 \cdot u_y - I_1 \quad (1)$$
  • The iterative optimization is then performed, each step updating $u$:
    $$\rho = \rho_c + \nabla_x I_2 \cdot u_x + \nabla_y I_2 \cdot u_y \quad (2)$$
    $$v = \begin{cases} u + \lambda\theta\,\nabla I_2 & \rho < -\lambda\theta\,|\nabla I_2|^2 \\ u - \lambda\theta\,\nabla I_2 & \rho > \lambda\theta\,|\nabla I_2|^2 \\ u - \rho\,\frac{\nabla I_2}{|\nabla I_2|^2} & \text{otherwise} \end{cases} \quad (3)$$
    $$u = v + \theta \cdot \operatorname{divergence}(p) \quad (4)$$
    $$p = \frac{p + \frac{\tau}{\theta}\,\nabla u}{1 + \frac{\tau}{\theta}\,|\nabla u|} \quad (5)$$
  • Here $\theta$ controls the weight of the TV-L1 regularization term, $\lambda$ controls the smoothness of the output, and $\tau$ controls the time-step. These hyperparameters are manually set. $p$ is the dual vector field, which is used to minimize the energy. The divergence of $p$, or backward difference, is computed as:
    $$\operatorname{divergence}(p) = p_{x,i,j} - p_{x,i-1,j} + p_{y,i,j} - p_{y,i,j-1} \quad (6)$$
    where $p_x$ is the $x$ direction and $p_y$ is the $y$ direction, and $p$ contains all the spatial locations in the image.
  • The goal is to minimize the total variational energy:
    $$E = |\nabla u| + \lambda\,|\nabla I_1 * u + I_1 - I_2| \quad (7)$$
  • Approaches run this iterative optimization for multiple input scales, from small to large, and use the previous flow estimate $u$ to warp $I_2$ at the larger scale, providing a coarse-to-fine optical flow estimation. These standard approaches require multiple scales and warpings to obtain a good flow estimate, taking thousands of iterations.
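
To make the update equations above concrete, below is a minimal NumPy sketch of one inner TV-L1 iteration (Eqs. 2-5), assuming a single scale, no warping, and zero boundary conditions for the backward differences; the variable and parameter names simply mirror the notation above, and this is an illustration rather than the authors' released code.

```python
import numpy as np

def tvl1_step(I1, I2, u, p, lam=0.15, theta=0.3, tau=0.25):
    """One inner TV-L1 update (Eqs. 2-5) for grayscale images I1, I2 of shape (H, W).

    u: (2, H, W) flow field; p: (2, 2, H, W) dual variables (an x/y pair per flow component).
    lam, theta, tau are typical TV-L1 defaults, normally set by hand.
    """
    gy, gx = np.gradient(I2)                       # image gradients of I2
    grad_sq = gx**2 + gy**2 + 1e-12
    rho = (I2 - I1) + gx * u[0] + gy * u[1]        # Eq. 2, with rho_c = I2 - I1 (no warping)

    # Eq. 3: thresholding step producing the auxiliary variable v.
    v = u.copy()
    case1 = rho < -lam * theta * grad_sq
    case2 = rho > lam * theta * grad_sq
    mid = ~(case1 | case2)
    for d, g in enumerate((gx, gy)):
        v[d][case1] += lam * theta * g[case1]
        v[d][case2] -= lam * theta * g[case2]
        v[d][mid] -= (rho[mid] / grad_sq[mid]) * g[mid]

    # Eq. 4: flow update from the divergence (backward differences) of p, Eq. 6.
    for d in range(2):
        div = np.zeros_like(I1)
        div[:, 0] = p[d, 0][:, 0]
        div[:, 1:] = p[d, 0][:, 1:] - p[d, 0][:, :-1]
        div[0, :] += p[d, 1][0, :]
        div[1:, :] += p[d, 1][1:, :] - p[d, 1][:-1, :]
        u[d] = v[d] + theta * div

    # Eq. 5: dual variable update from the gradient of u.
    for d in range(2):
        uy, ux = np.gradient(u[d])
        norm = 1 + (tau / theta) * np.sqrt(ux**2 + uy**2)
        p[d, 0] = (p[d, 0] + (tau / theta) * ux) / norm
        p[d, 1] = (p[d, 1] + (tau / theta) * uy) / norm
    return u, p
```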

3.2 Representation Flow Layer

  • Inspired by the optical flow algorithm, we design a fully-differentiable, learnable, convolutional representation flow layer by extending the general algorithm outlined above. The main differences are that (i) we allow the layer to capture flow of any CNN feature map, and that (ii) we learn its parameters, including $\theta$, $\lambda$, and $\tau$, as well as the divergence weights. We also make several key changes to reduce computation time: (1) we only use a single scale, (2) we do not perform any warping, and (3) we compute the flow on a CNN tensor with a smaller spatial size. Multiple scales and warping are computationally expensive, each requiring many iterations. By learning the flow parameters, we can eliminate the need for these additional steps. Our method is applied on lower-resolution CNN feature maps, instead of the RGB input, and is trained in an end-to-end fashion. This not only benefits its speed, but also allows the model to learn a motion representation optimized for activity recognition.
  • We note that the brightness consistency assumption can similarly be applied to CNN feature maps. Instead of capturing pixel brightness, we capture feature value consistency. This same assumption holds as CNNs are designed to be spatially invariant; i.e., they produce roughly the same feature value for the same object as it moves.
  • Given the input $F_1$, $F_2$, a single channel from sequential CNN feature maps (or the input image), we compute the feature-map gradient by convolving the input feature maps with the Sobel filter:
    $$\nabla F_{2x} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * F_2, \qquad \nabla F_{2y} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * F_2 \quad (8)$$
  • We set $u = 0$, $p = 0$ initially, each having width and height matching the input, then we can compute $\rho_c = F_2 - F_1$. Next, following Algorithm 1, we repeatedly apply the operations in Eqs. 2-5 for a fixed number of iterations to enable the iterative optimization. To compute the divergence, we zero-pad $p$ on the first column (x-direction) or row (y-direction), then convolve it with weights $w_x$, $w_y$ to compute Eq. 6:
    $$\operatorname{divergence}(p) = p_x * w_x + p_y * w_y \quad (9)$$
    where initially $w_x = \begin{bmatrix} -1 & 1 \end{bmatrix}$ and $w_y = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$. Note that these parameters are also differentiable and can be learned with backpropagation. We compute $\nabla u$ as
    $$\nabla u_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * u_x, \qquad \nabla u_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * u_y \quad (10)$$
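
As a rough illustration of Eqs. 8-10, the Sobel gradients and the divergence can be written as small convolutions in PyTorch; the sketch below assumes single-channel feature maps of shape (N, 1, H, W), and the kernel tensors could be registered as nn.Parameter to make them learnable as described above.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for Eq. 8 / Eq. 10 and the initial divergence kernels for Eq. 9.
sobel_x = torch.tensor([[ 1., 0., -1.],
                        [ 2., 0., -2.],
                        [ 1., 0., -1.]]).view(1, 1, 3, 3)
sobel_y = torch.tensor([[ 1.,  2.,  1.],
                        [ 0.,  0.,  0.],
                        [-1., -2., -1.]]).view(1, 1, 3, 3)
w_x = torch.tensor([-1., 1.]).view(1, 1, 1, 2)    # divergence weights, x direction
w_y = torch.tensor([-1., 1.]).view(1, 1, 2, 1)    # divergence weights, y direction

def spatial_grad(f):
    """Eq. 8 / Eq. 10: x and y gradients of a feature map via Sobel convolution."""
    gx = F.conv2d(f, sobel_x, padding=1)
    gy = F.conv2d(f, sobel_y, padding=1)
    return gx, gy

def divergence(p_x, p_y):
    """Eq. 9: backward difference computed by zero-padding then convolving with w_x, w_y."""
    p_x = F.pad(p_x, (1, 0, 0, 0))                # pad one column on the left
    p_y = F.pad(p_y, (0, 0, 1, 0))                # pad one row on the top
    return F.conv2d(p_x, w_x) + F.conv2d(p_y, w_y)
```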

Representation Flow within a CNN

  • Representation Flow within a CNN Algorithm 1 and Fig. 2 describe the process of our representation flow layer. Our flow layer with multiple iterations could also be interpreted as having a sequence of convolutional layers sharing parameters (i.e., each blue box in Fig. 2), with each layer’s behavior dependent on its previous layer. As a result of this formulation, the layer becomes fully differentiable and allows for the learning of all parameters, including $(\tau, \lambda, \theta)$ and the divergence weights $(w_x, w_y)$. This enables our learned representation flow layer to be optimized for its task (i.e., action recognition).
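
A condensed sketch of the unrolled layer described by Algorithm 1 might look as follows, reusing the spatial_grad() and divergence() helpers sketched in Section 3.2 and treating tau, lambda, and theta as learnable scalars; the initial values are ordinary TV-L1 defaults, and all names here are illustrative rather than taken from the released implementation.

```python
import torch
import torch.nn as nn

class RepresentationFlow(nn.Module):
    def __init__(self, n_iter=20):
        super().__init__()
        self.n_iter = n_iter
        # Typical TV-L1 defaults; learned jointly with the rest of the network.
        self.tau = nn.Parameter(torch.tensor(0.25))
        self.lam = nn.Parameter(torch.tensor(0.15))
        self.theta = nn.Parameter(torch.tensor(0.3))

    def forward(self, f1, f2):
        """f1, f2: (N, 1, H, W) single-channel feature maps of consecutive frames."""
        gx, gy = spatial_grad(f2)                     # Eq. 8
        grad_sq = gx ** 2 + gy ** 2 + 1e-12
        zeros = torch.zeros_like(f1)
        u_x, u_y = zeros.clone(), zeros.clone()
        p_xx, p_xy = zeros.clone(), zeros.clone()
        p_yx, p_yy = zeros.clone(), zeros.clone()
        rho_c = f2 - f1
        for _ in range(self.n_iter):                  # unrolled iterations share parameters
            rho = rho_c + gx * u_x + gy * u_y          # Eq. 2
            thr = self.lam * self.theta * grad_sq
            v_x = torch.where(rho < -thr, u_x + self.lam * self.theta * gx,
                  torch.where(rho > thr, u_x - self.lam * self.theta * gx,
                              u_x - rho * gx / grad_sq))   # Eq. 3, x component
            v_y = torch.where(rho < -thr, u_y + self.lam * self.theta * gy,
                  torch.where(rho > thr, u_y - self.lam * self.theta * gy,
                              u_y - rho * gy / grad_sq))   # Eq. 3, y component
            u_x = v_x + self.theta * divergence(p_xx, p_xy)  # Eq. 4
            u_y = v_y + self.theta * divergence(p_yx, p_yy)
            gux, guy = spatial_grad(u_x)                      # Eq. 10
            nx = 1 + (self.tau / self.theta) * torch.sqrt(gux ** 2 + guy ** 2 + 1e-12)
            p_xx = (p_xx + (self.tau / self.theta) * gux) / nx   # Eq. 5
            p_xy = (p_xy + (self.tau / self.theta) * guy) / nx
            gux, guy = spatial_grad(u_y)
            ny = 1 + (self.tau / self.theta) * torch.sqrt(gux ** 2 + guy ** 2 + 1e-12)
            p_yx = (p_yx + (self.tau / self.theta) * gux) / ny
            p_yy = (p_yy + (self.tau / self.theta) * guy) / ny
        return torch.cat([u_x, u_y], dim=1)            # (N, 2, H, W) x/y flow
```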

Computing Flow-of-Flow

  • Computing Flow-of-Flow Standard optical flow algorithms compute the flow for two sequential images. An optical flow image contains information about the direction and magnitude of the motion. Applying the flow algorithm directly on two flow images means that we are tracking pixels/locations showing similar motion in two consecutive frames. In practice, this typically leads to a worse performance due to inconsistent optical flow results and non-rigid motion. On the other hand, our representation flow layer is ‘learned’ from the data, and is able to suppress such inconsistency and better abstract/represent motion by having multiple regular convolutional layers between the flow layers. Fig. 6 illustrates this design, whose benefits we confirm in the experiment section. By stacking multiple representation flow layers, our model is able to capture longer temporal intervals and consider locations with motion consistency.
  • CNN feature maps may have hundreds or thousands of channels and our representation flow layer computes the flow for each channel, which can take significant time and memory. To address this, we apply a convolutional layer to reduce the number of channels from $C$ to $C'$ before the flow layer (note that $C'$ is still significantly more than traditional optical flow algorithms, which were only applied to single-channel, greyscale images). For numerical stability, we normalize this feature map to be in $[0, 255]$, matching standard image values. We found that the CNN features were quite small on average ($< 0.5$) and the TV-L1 algorithm default hyperparameters are designed for standard image values in $[0, 255]$, thus we found this normalization step important. Using the normalized feature, we compute the flow and stack the $x$ and $y$ flows, resulting in $2C'$ channels. Finally, we apply another convolutional layer to convert from $2C'$ channels to $C$ channels. This is passed to the remaining CNN layers for the prediction. We average predictions from many frames to classify each video, as shown in Fig. 3.
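
A possible wrapper around such a flow layer, following the channel-reduction scheme described above, is sketched below; the reduced channel count, the per-channel min-max normalization to [0, 255], and the class names are assumptions made for illustration (RepresentationFlow is the layer sketched earlier).

```python
import torch
import torch.nn as nn

class FlowBlock(nn.Module):
    """Reduce channels, compute per-channel representation flow, expand back."""
    def __init__(self, c_in, c_reduced=32, n_iter=20):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_reduced, kernel_size=1)        # C -> C'
        self.flow = RepresentationFlow(n_iter=n_iter)
        self.expand = nn.Conv2d(2 * c_reduced, c_in, kernel_size=1)    # 2C' -> C

    def forward(self, f1, f2):
        # f1, f2: (N, C, H, W) feature maps from consecutive frames.
        f1, f2 = self.reduce(f1), self.reduce(f2)
        # Rescale to roughly [0, 255] so the TV-L1-style hyperparameters,
        # tuned for standard image values, stay in a sensible range.
        lo = torch.amin(torch.minimum(f1, f2), dim=(2, 3), keepdim=True)
        hi = torch.amax(torch.maximum(f1, f2), dim=(2, 3), keepdim=True)
        f1 = 255 * (f1 - lo) / (hi - lo + 1e-6)
        f2 = 255 * (f2 - lo) / (hi - lo + 1e-6)
        # Flow per channel (a real implementation would fold channels into the
        # batch dimension for speed), stacked into 2C' channels.
        flows = [self.flow(f1[:, c:c + 1], f2[:, c:c + 1]) for c in range(f1.shape[1])]
        return self.expand(torch.cat(flows, dim=1))                     # back to C channels
```

Stacking two such blocks with ordinary convolutional layers in between would give the flow-conv-flow design described above.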

3.3 Activity Recognition Model

  • We place the representation flow layer inside a standard activity recognition model taking a $T \times C \times W \times H$ tensor as input to a CNN. Here, $C$ is 3 as our model uses direct RGB frames as an input. $T$ is the number of frames the model processes, and $W$ and $H$ are the spatial dimensions. The CNN outputs a prediction per-timestep and these are temporally averaged to produce a probability for each class. The model is trained to minimize cross-entropy:
    $$L(v, c) = -\sum_i \mathbb{1}(c = i)\,\log(p_i) \quad (11)$$
    where $p = M(v)$, $v$ is the video, the function $M$ is the classification CNN, and $c$ represents which of the $K$ classes $v$ belongs to. That is, the parameters in our flow layers are trained together with the other layers, so that it maximizes the final classification accuracy.
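
Eq. 11 with the temporal averaging described above can be sketched as follows; backbone and classifier stand in for the per-timestep feature extractor and the final fully-connected layer, and are placeholders rather than the actual model definition.

```python
import torch
import torch.nn.functional as F

def classification_loss(video, labels, backbone, classifier):
    """video: (N, T, 3, H, W) RGB clip; labels: (N,) ground-truth class indices."""
    n, t = video.shape[:2]
    frames = video.flatten(0, 1)                   # fold time into the batch: (N*T, 3, H, W)
    feats = backbone(frames)                       # per-timestep features (N*T, D)
    logits = classifier(feats).view(n, t, -1)      # per-timestep class scores (N, T, K)
    probs = logits.softmax(dim=-1).mean(dim=1)     # temporally averaged class probabilities
    return F.nll_loss(torch.log(probs + 1e-8), labels)   # cross-entropy of Eq. 11
```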

4 Experiments

Implementation details

  • Implementation details We implemented our representation flow layer in PyTorch and our code and models are available. As training CNNs on videos is computationally expensive, we used a subset of the Kinetics dataset [13] with 100k videos from 150 classes: Tiny-Kinetics. This allowed testing many models more quickly, while still having sufficient data to train large CNNs. For most experiments, we used ResNet-34 [10] with input of size $16 \times 112 \times 112$ (i.e., 16 frames with a spatial size of 112). To further reduce the computation time for many studies, we used this smaller input, which reduces performance, but allowed us to use larger batch sizes and run many experiments more quickly. Our final models are trained on standard $224 \times 224$ images. Check the Appendix for specific training details.

Where to compute flow?

  • Where to compute flow? To determine where in the network to compute the flow, we compare applying our flow layer on the RGB input, after the first conv. layer, and after each of the 5 residual blocks. The results are shown in Table 1. We find that computing the flow on the input provides poor performance, similar to the performance of the flow-only networks, but there is a significant jump after even 1 layer, suggesting that computing the flow of a feature is beneficial, capturing both the appearance and motion information. However, after 4 layers, the performance begins to decline as the spatial information is too abstracted/compressed (due to pooling and large spatial receptive field size), and sequential features become very similar, containing less motion information. Note that our HMDB performance in this table is quite low compared to state-of-the-art methods due to being trained from scratch using few frames and low spatial resolution ($112 \times 112$). For the following experiments, unless otherwise noted, we apply the layer after the 3rd residual block. In Fig. 7, we visualize the learned motion representations computed after block 3.

What to learn?

  • What to learn? As our method is fully differentiable, we can learn any of the parameters, such as the kernels used to compute image gradients, the kernels for the divergence computation, and even $\tau$, $\lambda$, $\theta$. In Table 2, we compare the effects of learning different parameters. We find that learning the Sobel kernel values reduces performance due to noisy gradients, particularly when the batch size is limited, but learning the divergence and $\tau$, $\lambda$, $\theta$ is beneficial.

How many iterations for flow?

  • How many iterations for flow? To confirm that the iterations are important and determine how many we need, we experiment with various numbers of iterations. We compare the number of iterations needed for both learning (divergence $+$ $\tau$, $\lambda$, $\theta$) and not learning parameters. The flow is computed after 3 residual blocks. The results are shown in Table 3. We find that learning provides better performance with fewer iterations (similar to the finding in [5]), and that iteratively computing the feature is important. We use 10 or 20 iterations in the remaining experiments as they provide good performance and are fast.

Two-stream fusion?

  • Two-stream fusion? Two-stream CNNs fusing both RGB and optical flow features have been heavily studied [20, 7]. Based on these works, we compare various ways of fusing RGB and our flow representation, shown in Fig. 4. We compare no fusion, late fusion (i.e., separate RGB and flow CNNs) and addition/multiplication/concatenation fusion. In Table 4, we compare different fusion methods for different locations in the network. We find that fusing RGB information is very important when computing flow directly from RGB input. However, it is not as beneficial when computing the flow of representations, as the CNN has already abstracted much appearance information away. We found that concatenation of the RGB and flow features performs poorly compared to the others. We do not use two-stream fusion in any other experiments, as we found that computing the representation flow after the 3rd residual block provides sufficient performance even without any fusion.

Flow-of-flow

  • Flow-of-flow We can stack our layer multiple times, computing the flow-of-flow (FoF). This has the advantage of combining more temporal information into a single feature. Our results are shown in Table 5. Applying the TV-L1 algorithm twice gives quite poor performance, as optical flow features do not really satisfy the brightness consistency assumption, as they capture magnitude and direction of motion (shown in Fig. 5). Applying our representation flow layer twice performs significantly better than TV-L1 twice, but still worse than our baseline of not doing so. However, we can add a convolutional layer between the first and second flow layer, flow-conv-flow (FcF) (Fig. 6), allowing the model to better learn longer-term flow representations. We find this performs best, as this intermediate layer is able to smooth the flow and produce a better input for the representation flow layer. However, we find adding a third flow layer reduces performance as the motion representation becomes unreliable, due to the large spatial receptive field size. In Fig. 7, we visualize the learned flow-of-flow, which is a smoother, acceleration-like feature with abstract motion patterns.
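
As a toy illustration of the flow-conv-flow (FcF) stacking evaluated here, assuming the FlowBlock wrapper sketched in Section 3.2, three consecutive feature maps could be processed as below; the intermediate convolution is meant to smooth the first-order flow before the second flow layer, and the exact layer configuration is an assumption for illustration.

```python
import torch.nn as nn

class FlowConvFlow(nn.Module):
    """Flow layer -> conv -> flow layer (FcF), sketched with the FlowBlock wrapper above."""
    def __init__(self, channels, c_reduced=32):
        super().__init__()
        self.flow1 = FlowBlock(channels, c_reduced)      # first-order representation flow
        self.smooth = nn.Sequential(                     # intermediate conv smooths the flow
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.flow2 = FlowBlock(channels, c_reduced)      # flow of the smoothed flow features

    def forward(self, f1, f2, f3):
        # f1, f2, f3: feature maps from three consecutive frames, each (N, C, H, W).
        g12 = self.smooth(self.flow1(f1, f2))
        g23 = self.smooth(self.flow1(f2, f3))
        return self.flow2(g12, g23)
```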

Flow of 3D CNN Feature

  • Flow of 3D CNN Feature Since 3D convolutions capture some temporal information, we test computing our flow representation on features from a 3D CNN. As 3D CNNs are expensive to train, we follow the method of I3D [3] to inflate a ResNet-18 pretrained on ImageNet to a 3D CNN for videos. We also compare to the (2+1)D method of spatial conv. followed by temporal conv from [26], which produces a similar feature combining spatial and temporal information. We find our flow layer increases performance even with 3D and (2+1)D CNNs already capturing some temporal information: Tables 6 and 7. These experiments used 10 iterations and learning the flow parameters. In these experiments, FcF was not used.
  • We also compared to the OFF [21] using (2+1)D and 3D CNNs. We observe that this method does not result in meaningful performance increases using CNNs that capture temporal information, while our approach does.

Comparison to other motion representations

  • Comparison to other motion representations We compare to existing CNN-based motion representation methods to confirm the usefulness of our representation flow. For these experiments, when available, we used code provided by the authors and otherwise implemented the methods ourselves. To better compare to existing works, we used $(16\times)\ 224 \times 224$ images. Table 8 shows the results. MFNet [15] captures motion by spatially shifting CNN feature maps, then summing the results, TVNet [5] applies a convolutional optical flow method to RGB inputs, and ActionFlowNet [16] trains a CNN to jointly predict optical flow and activity classes. We also compare to OFF [21] using only RGB inputs. Note that the HMDB performance in [21] was reported using their three-stream model (i.e., RGB + RGB-diff + optical flow inputs), and here we compare to the version only using RGB. Our method, which applies the iterative flow computation on CNN feature maps, performs the best.

Computation time

  • Computation time We compare our representation flow to state-of-the-art two-stream approaches in terms of run-time and number of parameters. All timings were measured using a single Pascal Titan X GPU, for a batch of videos with size $32 \times 224 \times 224$. The flow/two-stream CNNs include the time to run the TV-L1 algorithm (OpenCV GPU version) to compute the optical flow. All CNNs were based on the ResNet-34 architecture. As also shown in Table 9, our method is significantly faster than two-stream models relying on TV-L1 or other optical flow methods, while performing similarly or better. The number of parameters our model has is half of its two-stream competitors (e.g., 21M vs. 42M, in the case of 2D CNNs).

Comparison to state-of-the-arts

  • Comparison to state-of-the-arts We also compared our action recognition accuracies with the state-of-the-art methods on Kinetics and HMDB. For this, we train our models using $32 \times 224 \times 224$ inputs with the full Kinetics dataset, using 8 V100s. We used the 2D ResNet-50 as the architecture. Based on our experiments, we applied our representation flow layer after the 3rd residual block, learned the hyperparameters and divergence kernels, and used 20 iterations. We also compare our flow-of-flow model. Following [22], the evaluation is performed using a running average of the parameters over time. Our results, shown in Table 9, confirm that this approach clearly outperforms existing models using RGB-only inputs, and is competitive against expensive two-stream networks. Our model performs the best among those not using optical flow inputs (i.e., among the models only taking ∼600ms per video). The models requiring optical flow were more than 10 times slower, including two-stream versions of [3, 25, 26].

5 Conclusion

  • We introduced a learnable representation flow layer inspired by optical flow algorithms. We experimentally compared various forms of our layer to confirm that the iterative optimization and learnable parameters are important. Our model clearly outperformed existing methods in both speed and accuracy on standard datasets. We also introduced the concept of ‘flow of flow’ to compute longer-term motion representations and showed it benefits performance.

"
一些方法与代码还未理解、研究,等待后续补充。
"
