【论文翻译】Mean Shift: A Robust Approach toward Feature Space Analysis

论文题目:Mean Shift: A Robust Approach toward Feature Space Analysis
论文来源: Mean Shift: A Robust Approach toward Feature Space Analysis
翻译人:BDML@CQUT实验室

Mean Shift: A Robust Approach toward Feature Space Analysis

均值偏移:面向特征空间分析的鲁棒性方法

Dorin Comaniciu^1    Peter Meer^2

^1 Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540, [email protected]

^2 Electrical and Computer Engineering Department, Rutgers University, 94 Brett Road, Piscataway, NJ 08854-8058, [email protected]

Abstract

A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and thus its utility in detecting the modes of the density. The equivalence of the mean shift procedure to the Nadaraya–Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity preserving smoothing and image segmentation are described as applications. In these algorithms the only user set parameter is the resolution of the analysis, and either gray level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.

Keywords: mean shift; clustering; image segmentation; image smoothing; feature space; low-level
vision

摘要

本文提出了一种通用的非参数技术,用于分析复杂的多峰特征空间并在其中描绘任意形状的聚类。该技术的基本计算模块是一个古老的模式识别程序,即均值偏移。对于离散数据,我们证明了递归均值偏移过程收敛到基础密度函数最近的驻点,从而证明了其在检测密度模态方面的效用。我们还建立了均值偏移过程与核回归中的Nadaraya–Watson估计量以及位置鲁棒M估计量之间的等价性。作为应用,描述了两个底层视觉任务的算法:保持不连续性的平滑和图像分割。在这些算法中,用户需要设置的唯一参数是分析的分辨率,灰度图像或彩色图像均可作为输入。大量的实验结果证明了它们的出色性能。

关键字:均值偏移;聚类;图像分割;图像平滑;特征空间;底层视觉

1 Introduction

Low-level computer vision tasks are misleadingly difficult. Incorrect results can be easily obtained since the employed techniques often rely upon the user correctly guessing the values for the tuning parameters. To improve performance the execution of low-level tasks should be task driven, i.e., supported by independent high level information. This approach, however, requires that first the low-level stage provides a reliable enough representation of the input, and that the feature extraction process is controlled only by very few tuning parameters corresponding to intuitive measures in the input domain.

Feature space based analysis of images is a paradigm which can achieve the above stated goals. A feature space is a mapping of the input obtained through the processing of the data in small subsets at a time. For each subset a parametric representation of the feature of interest is obtained and the result is mapped into a point in the multidimensional space of the parameter. After the entire input is processed, significant features correspond to denser regions in the feature space, i.e., to clusters, and the goal of the analysis is the delineation of these clusters.

The nature of the feature space is application dependent. The subsets employed in the mapping can range from individual pixels as in the color space representation of an image, to a set of quasi-randomly chosen data points as in the probabilistic Hough transform. Both the advantage and the disadvantage of the feature space paradigm are arising from the global nature of the derived representation of the input. On one hand, all the evidence for the presence of a significant feature is pooled together providing an excellent tolerance to a noise level which may render local decisions unreliable. On the other hand, features with lesser support in the feature space may not be detected in spite of being salient for the task to be executed. This disadvantage, however, can be largely avoided by either augmenting the feature space with additional (spatial) parameters from the input domain, or by robust post processing of the input domain guided by the results of the feature space analysis.

Analysis of the feature space is application independent. While there are a plethora of published clustering techniques, most of them are not adequate to analyze feature spaces derived from real data. Methods which rely upon a priori knowledge of the number of clusters present (including those which use optimization of a global criterion to find this number), as well as methods which implicitly assume the same shape (most often elliptical) for all the clusters in the space, are not able to handle the complexity of a real feature space. For a recent survey of such methods see [29, Sec.8].

In Figure 1 a typical example is shown. The color image in Figure 1a is mapped into the three-dimensional Luv color space (to be discussed in Section 4). There is a continuous transition between the clusters arising from the dominant colors, and a decomposition of the space into elliptical tiles will introduce severe artifacts. Enforcing a Gaussian mixture model over such data is doomed to fail, e.g., [49], and even the use of a robust approach with contaminated Gaussian densities [67] cannot be satisfactory for such complex cases. Note also that the mixture models require the number of clusters as a parameter which raises its own challenges. For example, the method described in [45] proposes several different ways to determine this number.

Arbitrarily structured feature spaces can be analyzed only by nonparametric methods since these methods do not have embedded assumptions. Numerous nonparametric clustering methods were described in the literature and they can be classified into two large classes: hierarchical clustering and density estimation. Hierarchical clustering techniques either aggregate or divide the data based on some proximity measure. See [28, Sec.3.2] for a survey of hierarchical clustering methods. The hierarchical methods tend to be computationally expensive and the definition of a meaningful stopping criterion for the fusion (or division) of the data is not straightforward.

The rationale behind the density estimation based nonparametric clustering approach is that the feature space can be regarded as the empirical probability density function (p.d.f.) of the represented parameter. Dense regions in the feature space thus correspond to local maxima of the p.d.f., that is, to the modes of the unknown density. Once the location of a mode is determined, the cluster associated with it is delineated based on the local structure of the feature space [25, 60, 63].

Our approach to mode detection and clustering is based on the mean shift procedure, proposed in 1975 by Fukunaga and Hostetler [21] and largely forgotten till Cheng’s paper [7] rekindled the interest in it. In spite of its excellent qualities, the mean shift procedure does not seem to be known in the statistical literature. While the book [54, Sec.6.2.2] discusses [21], the advantages of employing a mean shift type procedure in density estimation were only recently rediscovered [8].

As will be proven in the sequel a computational module based on the mean shift procedure is an extremely versatile tool for feature space analysis and can provide reliable solutions for many vision tasks. In Section 2 the mean shift procedure is defined and its properties are analyzed. In Section 3 the procedure is used as the computational module for robust feature space analysis and implementational issues are discussed. In Section 4 the feature space analysis technique is applied to two low level vision tasks: discontinuity preserving filtering and image segmentation. Both algorithms can have as input either gray level or color images and the only parameter to be tuned by the user is the resolution of the analysis. The applicability of the mean shift procedure is not restricted to the presented examples. In Section 5 other applications are mentioned and the procedure is put into a more general context.

1 引言

底层计算机视觉任务看似简单,实则困难。由于所采用的技术通常依赖于用户正确猜测调整参数的值,因此很容易得到不正确的结果。为了提高性能,底层任务的执行应该是任务驱动的,即由独立的高层信息来支持。但是,这种方法要求底层阶段首先提供对输入的足够可靠的表示,并且特征提取过程仅由极少数对应于输入域中直观度量的调整参数来控制。

基于特征空间的图像分析是一种能够实现上述目标的范式。特征空间是对输入的一种映射,它通过每次处理一小部分数据而获得:对每个数据子集,求出感兴趣特征的参数化表示,并将结果映射为参数多维空间中的一个点。在整个输入处理完毕后,显著的特征对应于特征空间中较密集的区域,即聚类,而分析的目标就是描绘这些聚类。

特征空间的性质取决于具体应用。映射所采用的数据子集可以小到单个像素(如图像的颜色空间表示),也可以是一组准随机选取的数据点(如概率霍夫变换)。特征空间范式的优点和缺点都源于所得输入表示的全局性。一方面,支持某个显著特征存在的所有证据被汇集在一起,从而对可能使局部决策不可靠的噪声水平具有出色的容忍度。另一方面,在特征空间中支持度较低的特征即使对所要执行的任务很重要,也可能检测不到。不过,这一缺点在很大程度上可以避免:或者用输入域中额外的(空间)参数来扩充特征空间,或者在特征空间分析结果的引导下对输入域进行鲁棒的后处理。

特征空间的分析与具体应用无关。尽管已发表的聚类技术数量众多,但其中大多数并不适合分析由真实数据得出的特征空间。依赖于聚类数目先验知识的方法(包括那些通过优化全局准则来确定该数目的方法),以及隐含地假设空间中所有聚类形状相同(通常为椭圆形)的方法,都无法应对真实特征空间的复杂性。有关此类方法的最新综述,请参见[29,第8节]。

图1:一个特征空间的例子。(a) 一幅400×276的彩色图像。(b) 对应的L*u*v颜色空间中的110,400个数据点。

图1给出了一个典型示例。图1a中的彩色图像被映射到三维的L*u*v颜色空间(将在第4节中讨论)。由主要颜色产生的各个聚类之间存在连续的过渡,把该空间分解为若干椭圆形区域会引入严重的伪影。对此类数据强加高斯混合模型注定会失败,例如[49];即使采用带有污染高斯密度的鲁棒方法[67],对于如此复杂的情况也难以令人满意。还要注意,混合模型需要把聚类数目作为参数,而这本身就带来了新的问题,例如[45]中描述的方法就提出了几种不同的方式来确定这一数目。

任意结构的特征空间只能用非参数方法来分析,因为这类方法没有内嵌的假设。文献中描述了许多非参数聚类方法,它们可以分为两大类:层次聚类和密度估计。层次聚类技术基于某种邻近度度量来聚合或划分数据,相关综述参见[28,第3.2节]。层次方法的计算开销往往很大,而且为数据的合并(或划分)定义一个有意义的停止准则并不容易。

基于密度估计的非参数聚类方法的基本思想是,可以把特征空间看作所表示参数的经验概率密度函数(p.d.f.)。特征空间中的密集区域因而对应于该p.d.f.的局部最大值,即未知密度的模态。一旦确定了某个模态的位置,就可以基于特征空间的局部结构来描绘与之关联的聚类[25, 60, 63]。

我们的模态检测和聚类方法基于均值偏移过程。该过程由Fukunaga和Hostetler[21]于1975年提出,此后基本被遗忘,直到Cheng的论文[7]重新唤起了人们对它的兴趣。尽管具有出色的性质,均值偏移过程在统计文献中似乎并不为人所知。虽然[54,第6.2.2节]一书讨论了[21],但在密度估计中采用均值偏移类型过程的优势直到最近才被重新发现[8]。

正如后文将证明的那样,基于均值偏移过程的计算模块是用于特征空间分析的极为通用的工具,可以为许多视觉任务提供可靠的解决方案。第2节定义了均值偏移过程并分析了其性质。第3节把该过程用作鲁棒特征空间分析的计算模块,并讨论了实现方面的问题。第4节把特征空间分析技术应用于两个底层视觉任务:保持不连续性的滤波和图像分割。两种算法都可以接受灰度或彩色图像作为输入,用户需要调整的唯一参数是分析的分辨率。均值偏移过程的适用性并不局限于所给出的例子,第5节提到了其他应用,并把该过程置于更一般的背景之中。

2 The Mean Shift Procedure

Kernel density estimation (known as the Parzen window technique in the pattern recognition literature [17, Sec.4.3]) is the most popular density estimation method. Given $n$ data points $X_i$, $i=1,\dots,n$ in the $d$-dimensional space $R^d$, the multivariate kernel density estimator with kernel $K(X)$ and a symmetric positive definite $d\times d$ bandwidth matrix $H$, computed at the point $X$, is given by
$$\hat{f}(X)=\frac{1}{n}\sum^n_{i=1}K_H(X-X_i)\tag{1}$$
where
$$K_H(X)=|H|^{-1/2}K(H^{-1/2}X)\tag{2}$$
The $d$-variate kernel $K(X)$ is a bounded function with compact support satisfying [62, p.95]
$$\int_{R^d}K(X)dX=1\qquad \lim_{\|X\|\to\infty}\|X\|^dK(X)=0\\ \int_{R^d}XK(X)dX=0\qquad \int_{R^d}XX^TK(X)dX=c_KI\tag{3}$$
where $c_K$ is a constant. The multivariate kernel can be generated from a symmetric univariate kernel $K_1(x)$ in two different ways
$$K^P(X)=\prod^d_{i=1}K_1(x_i)\qquad K^S(X)=a_{k,d}K_1(\|X\|)\tag{4}$$
where $K^P(X)$ is obtained from the product of the univariate kernels, and $K^S(X)$ from rotating $K_1(x)$ in $R^d$, i.e., $K^S(X)$ is radially symmetric. The constant $a^{-1}_{k,d}=\int_{R^d}K_1(\|X\|)dX$ assures that $K^S(X)$ integrates to one, though this condition can be relaxed in our context. Either type of multivariate kernel obeys (3), but for our purposes the radially symmetric kernels are often more suitable.

We are interested only in a special class of radially symmetric kernels satisfying
$$K(X)=c_{k,d}k(\|X\|^2)\tag{5}$$
in which case it suffices to define the function $k(x)$, called the profile of the kernel, only for $x\geq0$. The normalization constant $c_{k,d}$, which makes $K(X)$ integrate to one, is assumed strictly positive.

Using a fully parameterized $H$ increases the complexity of the estimation [62, p.106] and in practice the bandwidth matrix $H$ is chosen either as diagonal, $H=\mathrm{diag}[h_1^2,\dots,h_d^2]$, or proportional to the identity matrix, $H=h^2I$. The clear advantage of the latter case is that only one bandwidth parameter $h>0$ must be provided; however, as can be seen from (2), the validity of an Euclidean metric for the feature space should then be confirmed first. Employing only one bandwidth parameter, the kernel density estimator (1) becomes the well known expression
$$\hat{f}(X)=\frac{1}{nh^d}\sum^n_{i=1}K\left(\frac{X-X_i}{h}\right)\tag{6}$$
The quality of a kernel density estimator is measured by the mean of the square error between the density and its estimate, integrated over the domain of definition. In practice, however, only an asymptotic approximation of this measure (denoted as AMISE) can be computed. Under the asymptotics the number of data points $n\to\infty$ while the bandwidth $h\to0$ at a rate slower than $n^{-1}$. For both types of multivariate kernels the AMISE measure is minimized by the Epanechnikov kernel [51, p.139], [62, p.104] having the profile
$$k_E(x)=\begin{cases}1-x & 0\leq x \leq 1 \\ 0 & x>1\end{cases}\tag{7}$$
which yields the radially symmetric kernel
$$K_E(X)=\begin{cases}\frac{1}{2}c_d^{-1}(d+2)(1-\|X\|^2) & \|X\|\leq 1 \\ 0 & \text{otherwise}\end{cases}\tag{8}$$
where $c_d$ is the volume of the unit $d$-dimensional sphere. Note that the Epanechnikov profile is not differentiable at the boundary. The profile
$$k_N(x)=\exp\left(-\frac{1}{2}x\right)\qquad x\geq0\tag{9}$$
yields the multivariate normal kernel
$$K_N(X)=(2\pi)^{-d/2}\exp\left(-\frac{1}{2}\|X\|^2\right)\tag{10}$$
for both types of composition (4). The normal kernel is often symmetrically truncated to have a
kernel with finite support.

While these two kernels will suffice for most applications we are interested in, all the results
presented below are valid for arbitrary kernels within the conditions to be stated. Employing the
profile notation the density estimator (6) can be rewritten as
$$\hat{f}_{h,K}(X)=\frac{c_{k,d}}{nh^d}\sum^n_{i=1}k\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\tag{11}$$
The first step in the analysis of a feature space with the underlying density $f(x)$ is to find the modes of this density. The modes are located among the zeros of the gradient $\nabla f(x)=0$, and the mean shift procedure is an elegant way to locate these zeros without estimating the density.

2 均值偏移步骤

核密度估计(在模式识别文献中称为Parzen窗口技术[17,第4.3节])是最流行的密度估计方法。给定 $d$ 维空间 $R^d$ 中的 $n$ 个数据点 $X_i$,$i=1,\dots,n$,在点 $X$ 处计算的、具有核 $K(X)$ 和对称正定 $d\times d$ 带宽矩阵 $H$ 的多元核密度估计器由下式给出:
$$\hat{f}(X)=\frac{1}{n}\sum^n_{i=1}K_H(X-X_i)\tag{1}$$
其中
$$K_H(X)=|H|^{-1/2}K(H^{-1/2}X)\tag{2}$$
$d$ 变量核 $K(X)$ 是一个具有紧支撑的有界函数,满足[62, p.95]
$$\int_{R^d}K(X)dX=1\qquad \lim_{\|X\|\to\infty}\|X\|^dK(X)=0\\ \int_{R^d}XK(X)dX=0\qquad \int_{R^d}XX^TK(X)dX=c_KI\tag{3}$$
其中 $c_K$ 为常数。可以用两种不同的方式从对称单变量核 $K_1(x)$ 生成多变量核
$$K^P(X)=\prod^d_{i=1}K_1(x_i)\qquad K^S(X)=a_{k,d}K_1(\|X\|)\tag{4}$$
其中,$K^P(X)$ 由单变量核的乘积获得,而 $K^S(X)$ 则通过在 $R^d$ 中旋转 $K_1(x)$ 得到,即 $K^S(X)$ 是径向对称的。常数 $a^{-1}_{k,d}=\int_{R^d}K_1(\|X\|)dX$ 保证 $K^S(X)$ 积分为1,不过在我们的上下文中可以放宽此条件。两种类型的多元核均满足(3),但就我们的目的而言,径向对称核通常更为合适。

我们只对满足以下条件的一类特殊的径向对称核感兴趣:
$$K(X)=c_{k,d}k(\|X\|^2)\tag{5}$$
在这种情况下,只需对 $x\geq0$ 定义称为核的轮廓(profile)的函数 $k(x)$ 即可。使 $K(X)$ 积分为1的归一化常数 $c_{k,d}$ 被假定为严格正的。

使用完全参数化的 $H$ 会增加估计的复杂度[62, p.106],实际中带宽矩阵 $H$ 要么选为对角阵 $H=\mathrm{diag}[h_1^2,\dots,h_d^2]$,要么选为与单位矩阵成正比,即 $H=h^2I$。后一种情况的明显优势是只需提供一个带宽参数 $h>0$;不过由(2)可以看出,此时应首先确认欧几里得度量对该特征空间是否有效。仅使用一个带宽参数时,核密度估计器(1)就变成众所周知的表达式
$$\hat{f}(X)=\frac{1}{nh^d}\sum^n_{i=1}K\left(\frac{X-X_i}{h}\right)\tag{6}$$
核密度估计量的质量用密度与其估计值之间的平方误差在定义域上积分后的均值来衡量。但在实践中,只能计算该度量的渐近近似(记为AMISE)。在渐近条件下,数据点数量 $n\to\infty$,而带宽 $h\to0$,且其趋于零的速率慢于 $n^{-1}$。对于两种类型的多元核,AMISE度量都由具有如下轮廓的Epanechnikov核[51, p.139],[62, p.104]最小化
$$k_E(x)=\begin{cases}1-x & 0\leq x \leq 1 \\ 0 & x>1\end{cases}\tag{7}$$
由它产生径向对称核
$$K_E(X)=\begin{cases}\frac{1}{2}c_d^{-1}(d+2)(1-\|X\|^2) & \|X\|\leq 1 \\ 0 & \text{otherwise}\end{cases}\tag{8}$$
其中 $c_d$ 是单位 $d$ 维球体的体积。注意,Epanechnikov轮廓在边界处不可微。而轮廓
$$k_N(x)=\exp\left(-\frac{1}{2}x\right)\qquad x\geq0\tag{9}$$
产生多元正态核
$$K_N(X)=(2\pi)^{-d/2}\exp\left(-\frac{1}{2}\|X\|^2\right)\tag{10}$$
对(4)中两种类型的组合都是如此。正态核通常被对称地截断,以得到具有有限支撑的核。

尽管这两个核足以满足我们感兴趣的大多数应用,但下面给出的所有结果对满足所述条件的任意核都成立。使用轮廓记号,密度估计器(6)可以重写为
$$\hat{f}_{h,K}(X)=\frac{c_{k,d}}{nh^d}\sum^n_{i=1}k\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\tag{11}$$
分析具有基础密度 $f(x)$ 的特征空间的第一步是找到该密度的模态。模态位于梯度的零点之中,即满足 $\nabla f(x)=0$ 的点,而均值偏移过程提供了一种无需估计密度本身即可定位这些零点的优雅方法。
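下面是按式(5)–(11)用轮廓函数计算核密度估计的一个简化Python示意(非论文原始代码,仅作说明)。其中函数名,以及把归一化常数 $c_{k,d}$ 默认取为1的做法,都是为了演示而作的假设;实际使用时应按所选核与维数代入正确的常数。

```python
import numpy as np

def epanechnikov_profile(x):
    # 式(7):k_E(x) = 1 - x (0 <= x <= 1),否则为 0
    return np.where((x >= 0) & (x <= 1), 1.0 - x, 0.0)

def normal_profile(x):
    # 式(9):k_N(x) = exp(-x/2)
    return np.exp(-0.5 * x)

def kde(x, data, h, profile=normal_profile, c=1.0):
    """按式(11)在点 x 处计算核密度估计;c 对应归一化常数 c_{k,d}(此处默认1,仅示意)。"""
    n, d = data.shape
    u = np.sum(((x - data) / h) ** 2, axis=1)   # ||(x - X_i)/h||^2
    return c / (n * h ** d) * profile(u).sum()
```

例如,`kde(np.zeros(2), np.random.randn(500, 2), h=0.5)` 给出二维标准正态样本在原点附近的(未归一化)密度估计。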

2.1 Density Gradient Estimation

The density gradient estimator is obtained as the gradient of the density estimator by exploiting the
linearity of (11)
$$\hat{\nabla} f_{h,K}(X)\equiv\nabla\hat f_{h,K}(X)=\frac{2c_{k,d}}{nh^{d+2}}\sum^n_{i=1}(X-X_i)k'\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\tag{12}$$
We define the function
$$g(x)=-k'(x)\tag{13}$$
assuming that the derivative of the kernel profile $k$ exists for all $x\in[0,\infty)$, except for a finite set of points. Using now $g(x)$ for profile, the kernel $G(X)$ is defined as
$$G(X)=c_{g,d}g(\|X\|^2)\tag{14}$$
where $c_{g,d}$ is the corresponding normalization constant. The kernel $K(X)$ was called the shadow of $G(X)$ in [7] in a slightly different context. Note that the Epanechnikov kernel is the shadow of the uniform kernel, i.e., the $d$-dimensional unit sphere; while the normal kernel and its shadow have the same expression.

Introducing $g(x)$ into (12) yields
$$\hat{\nabla}f_{h,K}(X)=\frac{2c_{k,d}}{nh^{d+2}}\sum^n_{i=1}(X_i-X)g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\\ =\frac{2c_{k,d}}{nh^{d+2}}\left[\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\right]\left[\frac{\sum^n_{i=1}X_ig\left(\left\|\frac{X-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)}-X\right]\tag{15}$$
where $\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)$ is assumed to be a positive number. This condition is easy to satisfy for all the profiles met in practice. Both terms of the product in (15) have special significance. From (11), the first term is proportional to the density estimate at $X$ computed with the kernel $G$
$$\hat{f}_{h,G}(X)=\frac{c_{g,d}}{nh^{d}}\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\tag{16}$$
The second term is the mean shift
$$\pmb{m}_{h,G}(X)=\frac{\sum^n_{i=1}X_ig\left(\left\|\frac{X-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)}-X\tag{17}$$
i.e., the difference between the weighted mean, using the kernel $G$ for weights, and $X$, the center of the kernel (window). From (16) and (17) the expression (15) becomes
$$\hat{\nabla}f_{h,K}(X)=\hat f_{h,G}(X)\frac{2c_{k,d}}{h^2c_{g,d}}\pmb{m}_{h,G}(X)\tag{18}$$
yielding
$$\pmb{m}_{h,G}(X)=\frac{1}{2}h^2c\,\frac{\hat{\nabla}f_{h,K}(X)}{\hat{f}_{h,G}(X)}\tag{19}$$
The expression (19) shows that at location $X$ the mean shift vector computed with kernel $G$ is proportional to the normalized density gradient estimate obtained with kernel $K$. The normalization is by the density estimate in $X$ computed with the kernel $G$. The mean shift vector thus always points toward the direction of maximum increase in the density. This is a more general formulation of the property first remarked by Fukunaga and Hostetler [20, p.535], [21], and also discussed in [7].

The relation captured in (19) is intuitive: the local mean is shifted toward the region in which the majority of the points reside. Since the mean shift vector is aligned with the local gradient estimate, it can define a path leading to a stationary point of the estimated density. The modes of the density are such stationary points. The mean shift procedure, obtained by successive

​ – computation of the mean shift vector m h , G ( X ) \pmb{m}_{h,G}(X) mmmh,G(X),

​ – translation of the kernel (window) G ( X ) G(X) G(X) by m h , G ( X ) \pmb{m}_{h,G}(X) mmmh,G(X),

is guaranteed to converge at a nearby point where the estimate (11) has zero gradient, as will be shown in the next section. The presence of the normalization by the density estimate is a desirable feature. The regions of low density values are of no interest for the feature space analysis, and in such regions the mean shift steps are large. Similarly, near local maxima the steps are small, and the analysis more refined. The mean shift procedure thus is an adaptive gradient ascent method.

2.1 密度梯度估计

利用(11)的线性可将密度梯度估算器作为密度估算器的梯度获得
$$\hat{\nabla} f_{h,K}(X)\equiv\nabla\hat f_{h,K}(X)=\frac{2c_{k,d}}{nh^{d+2}}\sum^n_{i=1}(X-X_i)k'\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\tag{12}$$
我们定义函数
$$g(x)=-k'(x)\tag{13}$$
这里假设核轮廓 $k$ 的导数对所有 $x\in[0,\infty)$ 都存在(有限个点除外)。现在以 $g(x)$ 作为轮廓,把核 $G(X)$ 定义为
$$G(X)=c_{g,d}g(\|X\|^2)\tag{14}$$
其中 $c_{g,d}$ 是对应的归一化常数。在[7]中,核 $K(X)$ 在稍有不同的上下文中被称为 $G(X)$ 的阴影。注意,Epanechnikov核是均匀核(即 $d$ 维单位球上的均匀核)的阴影,而正态核与其阴影具有相同的表达式。

g ( x ) g(x) g(x) 带入(12)
∇ ^ f h , K ( X ) = 2 c k , d n h d + 2 ∑ i = 1 n ( X i − X ) g ( ∥ X − X i h ∥ 2 ) = 2 c k , d n h d + 2 [ ∑ i = 1 n g ( ∥ X − X i h ∥ 2 ) ] [ ∑ i = 1 n X i g ( ∥ X − X i h ∥ 2 ) ∑ i = 1 n g ( ∥ X − X i h ∥ 2 ) − x ] (15) \hat{\nabla}f_{h,K}(X)=\frac{2c_{k,d}}{nh^{d+2}}\sum^n_{i=1}(X_i-X)g\left(\left\|\frac{X-X_i}{h}\right\|^2\right) \\=\frac{2c_{k,d}}{nh^{d+2}}\left[ \sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right) \right] \left[ \frac{\sum^n_{i=1}X_ig\left(\left\|\frac{X-X_i}{h}\right\|^2\right)} {\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)}-x \right] \tag{15} ^fh,K(X)=nhd+22ck,di=1n(XiX)g(hXXi2)=nhd+22ck,d[i=1ng(hXXi2)]i=1ng(hXXi2)i=1nXig(hXXi2)x(15)
假定 ∑ i = 1 n g ( ∥ X − X i h ∥ 2 ) \sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right) i=1ng(hXXi2) 为正数,对于在实践中满足的所有配置文件,此条件很容易满足。(15)中产品的两个术语都具有特殊的意义。根据(11),第一项与使用核G计算的x处的密度估计值成比例
f ^ h , G ( X ) = c g , d n h d ∑ i = 1 n g ( ∥ X − X i h ∥ 2 ) (16) \hat{f}_{h,G}(X)=\frac{c_{g,d}}{nh^{d}}\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)\tag{16} f^h,G(X)=nhdcg,di=1ng(hXXi2)(16)
第二项是均值偏移
m h , G ( X ) = ∑ i = 1 n X i g ( ∥ X − X i h ∥ 2 ) ∑ i = 1 n g ( ∥ X − X i h ∥ 2 ) − x (17) \pmb{m}_{h,G}(X)=\frac{\sum^n_{i=1}X_ig\left(\left\|\frac{X-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}g\left(\left\|\frac{X-X_i}{h}\right\|^2\right)}-x \tag{17} mmmh,G(X)=i=1ng(hXXi2)i=1nXig(hXXi2)x(17)
即,使用内核 G G G 进行加权的加权平均值与 X X X 内核中心(窗口)之间的差。从(16)和(17)表达式(15)变为
∇ ^ f h , K ( X ) = f ^ h , G ( X ) 2 c k , d h 2 c g , d m h , G ( X ) (18) \hat{\nabla}f_{h,K}(X)=\hat f_{h,G}(X)\frac{2c_{k,d}}{h^2c_{g,d}}\pmb{m}_{h,G}(X)\tag{18} ^fh,K(X)=f^h,G(X)h2cg,d2ck,dmmmh,G(X)(18)

m h , G ( X ) = 1 2 h 2 c ∇ ^ f h , K ( X ) f h , G ^ ( X ) (19) \pmb{m}_{h,G}(X)=\frac{1}{2}h^2c\frac{\hat{\nabla}f_{h,K}(X)}{\hat{f_{h,G}}(X)} \tag{19} mmmh,G(X)=21h2cfh,G^(X)^fh,K(X)(19)
表达式(19)表明,在位置 X X X 处,由核 G G G 计算出的平均位移矢量与由核 K K K 获得的归一化密度梯度估计值成正比。归一化是通过由核 G G G 得出的 X X X 中的密度估计值。总是指向密度最大增加的方向。这是由Fukunaga和Hostetler首先提出的性质的更一般的表述[20, p.535],也在[7]中讨论过。

(19)所体现的关系是直观的:局部均值被移向大多数点所在的区域。由于均值偏移向量与局部梯度估计的方向一致,它可以定义一条通向估计密度驻点的路径,而密度的模态正是这样的驻点。均值偏移过程由以下两步的逐次迭代构成:

– 计算均值偏移向量 $\pmb{m}_{h,G}(X)$;

– 将核(窗口)$G(X)$ 平移 $\pmb{m}_{h,G}(X)$。

该过程保证收敛到估计(11)梯度为零的邻近点,这将在下一节中证明。由密度估计进行归一化是一个理想的特性:低密度值的区域对特征空间分析没有意义,而在这类区域中均值偏移的步长较大;类似地,在局部最大值附近步长较小,分析也更加精细。因此,均值偏移过程是一种自适应的梯度上升方法。
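下面是式(17)和式(20)所描述的单步均值偏移的一个简化Python示意(非论文原始实现)。这里假设核 $G$ 取均匀核(对应 $K$ 为Epanechnikov核),函数名 `mean_shift_step` 以及窗口内无样本点时原地不动的处理方式均为演示用的假设。

```python
import numpy as np

def mean_shift_step(y, data, h):
    """按式(20)计算窗口中心 y 的下一个位置:以均匀核 G 加权的样本均值。"""
    u = np.sum(((y - data) / h) ** 2, axis=1)   # ||(y - X_i)/h||^2
    w = (u <= 1.0).astype(float)                # 均匀核的权重 g(u)
    if w.sum() == 0:
        return np.asarray(y, dtype=float)       # 窗口内没有样本点时保持不动(演示用的约定)
    return (w[:, None] * data).sum(axis=0) / w.sum()
```

均值偏移向量即 `mean_shift_step(y, data, h) - y`,对应式(17)。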

2.2 Sufficient Condition for Convergence

Denote by $\{Y_j\}_{j=1,2,\dots}$ the sequence of successive locations of the kernel $G$, where from (17)
$$Y_{j+1}=\frac{\sum^n_{i=1}X_ig\left(\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}g\left(\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}\qquad j=1,2,\dots\tag{20}$$
is the weighted mean at $Y_j$ computed with kernel $G$ and $Y_1$ is the center of the initial position of the kernel. The corresponding sequence of density estimates computed with kernel $K$, $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$, is given by
$$\hat{f}_{h,K}(j)=\hat{f}_{h,K}(Y_j)\qquad j=1,2,\dots\tag{21}$$
As stated by the following theorem, a kernel $K$ that obeys some mild conditions suffices for the convergence of the sequences $\{Y_j\}_{j=1,2,\dots}$ and $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$.

Theorem 1: If the kernel $K$ has a convex and monotonically decreasing profile, the sequences $\{Y_j\}_{j=1,2,\dots}$ and $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ converge, and $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ is also monotonically increasing.

The proof is given in the Appendix. The theorem generalizes the result derived differently in [13], where $K$ was the Epanechnikov kernel and $G$ the uniform kernel. The theorem remains valid when each data point $X_i$ is associated with a nonnegative weight $w_i$. An example of nonconvergence when the kernel $K$ is not convex is shown in [10, p.16].

The convergence property of the mean shift was also discussed in [7, Sec.IV]. (Note, however, that almost all the discussion there is concerned with the "blurring" process in which the input is recursively modified after each mean shift step.) The convergence of the procedure as defined in this paper was attributed in [7] to the gradient ascent nature of (19). However, as shown in [4, Sec.1.2], moving in the direction of the local gradient guarantees convergence only for infinitesimal steps. The step size of a gradient based algorithm is crucial for the overall performance. If the step size is too large the algorithm will diverge, while if the step size is too small the rate of convergence may be very slow. A number of costly procedures have been developed for step size selection [4, p.24]. The guaranteed convergence (as shown by Theorem 1) is due to the adaptive magnitude of the mean shift vector, which also eliminates the need for additional procedures to choose the adequate step sizes. This is a major advantage over the traditional gradient based methods.

For discrete data, the number of steps to convergence depends on the employed kernel. When G G G is the uniform kernel, convergence is achieved in a finite number of steps, since the number of locations generating distinct mean values is finite. However, when the kernel G G G imposes a weighting on the data points (according to the distance from its center), the mean shift procedure is infinitely convergent. The practical way to stop the iterations is to set a lower bound for the magnitude of the mean shift vector.

2.2收敛的充分条件

用 $\{Y_j\}_{j=1,2,\dots}$ 表示核 $G$ 的连续位置序列,由(17)可得
$$Y_{j+1}=\frac{\sum^n_{i=1}X_ig\left(\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}g\left(\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}\qquad j=1,2,\dots\tag{20}$$
它是用核 $G$ 在 $Y_j$ 处计算的加权均值,而 $Y_1$ 是核初始位置的中心。用核 $K$ 计算的相应密度估计序列 $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ 由下式给出
$$\hat{f}_{h,K}(j)=\hat{f}_{h,K}(Y_j)\qquad j=1,2,\dots\tag{21}$$
如以下定理所述,只要核 $K$ 满足一些温和的条件,就足以保证序列 $\{Y_j\}_{j=1,2,\dots}$ 和 $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ 收敛。

定理1:如果核 $K$ 具有凸且单调递减的轮廓,则序列 $\{Y_j\}_{j=1,2,\dots}$ 和 $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ 收敛,并且 $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ 单调递增。

证明在附录中给出。该定理推广了[13]中以不同方式得出的结果,那里 $K$ 是Epanechnikov核,而 $G$ 是均匀核。当每个数据点 $X_i$ 与一个非负权重 $w_i$ 相关联时,该定理仍然成立。[10, p.16]给出了当核 $K$ 非凸时不收敛的一个例子。

[7,第IV节]中也讨论了均值偏移的收敛性。(但请注意,那里几乎所有的讨论都针对"模糊(blurring)"过程,即在每个均值偏移步骤之后对输入本身进行递归修改。)本文所定义过程的收敛性在[7]中被归因于(19)的梯度上升性质。但是,如[4,第1.2节]所示,沿局部梯度方向移动只有在步长无穷小时才能保证收敛。基于梯度的算法的步长对整体性能至关重要:如果步长太大,算法将发散;而如果步长太小,收敛速度可能会非常慢。人们为此开发了许多代价高昂的步长选择程序[4, p.24]。定理1所保证的收敛性来自均值偏移向量的自适应幅度,这也消除了为选取合适步长而引入额外程序的需要。与传统的基于梯度的方法相比,这是一个主要优势。

对于离散数据,收敛所需的步数取决于所采用的核。当 $G$ 是均匀核时,由于能产生不同均值的位置数目是有限的,因此在有限步内即可收敛。但是,当核 $G$ 按数据点到窗口中心的距离对其施加权重时,均值偏移过程是无限收敛的。停止迭代的实用方法是为均值偏移向量的模设置一个下限。
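结合上述停止准则,可以把前面示意中的单步迭代组合成完整的均值偏移过程。下面是一个简化的Python示意(非论文原始实现),沿用前文示意中的 `mean_shift_step`;阈值 `tol` 和最大迭代次数 `max_iter` 都是演示用的假设参数。

```python
import numpy as np

def mean_shift(y0, data, h, tol=1e-3, max_iter=500):
    """从 y0 出发反复平移窗口,直到均值偏移向量的模小于 tol(即2.2节末尾的实用停止准则)。"""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        y_next = mean_shift_step(y, data, h)    # 式(20)的一步迭代,见前面的示意
        if np.linalg.norm(y_next - y) < tol:    # 均值偏移向量足够小则停止
            return y_next
        y = y_next
    return y
```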

2.3 Mean Shift Based Mode Detection

Let us denote by $Y_c$ and $\hat{f}_{h,K}(Y_c)$ the convergence points of the sequences $\{Y_j\}_{j=1,2,\dots}$ and $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$, respectively. The implications of Theorem 1 are the following.

First, the magnitude of the mean shift vector converges to zero. Indeed, from (17) and (20) the $j$-th mean shift vector is
$$\pmb{m}_{h,G}(Y_j)=Y_{j+1}-Y_j\tag{22}$$
and, at the limit, $\pmb{m}_{h,G}(Y_c)=Y_c-Y_c=0$. In other words, the gradient of the density estimate (11) computed at $Y_c$ is zero
$$\nabla \hat{f}_{h,K}(Y_c)=0,\tag{23}$$
due to (19). Hence, $Y_c$ is a stationary point of $\hat{f}_{h,K}$.

Second, since $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ is monotonically increasing, the mean shift iterations satisfy the conditions required by the Capture Theorem [4, p.45], which states that the trajectories of such gradient methods are attracted by local maxima if they are unique (within a small neighborhood) stationary points. That is, once $Y_j$ gets sufficiently close to a mode of $\hat{f}_{h,K}$, it converges to it. The set of all locations that converge to the same mode defines the basin of attraction of that mode.

The theoretical observations from above suggest a practical algorithm for mode detection:

– run the mean shift procedure to find the stationary points of $\hat{f}_{h,K}$,

– prune these points by retaining only the local maxima.

The local maxima points are defined according to the Capture Theorem, as unique stationary points within some small open sphere. This property can be tested by perturbing each stationary point by a random vector of small norm, and letting the mean shift procedure converge again. Should the point of convergence be unchanged (up to a tolerance), the point is a local maximum.

2.3基于均值偏移的模式检测

我们分别用 $Y_c$ 和 $\hat{f}_{h,K}(Y_c)$ 表示序列 $\{Y_j\}_{j=1,2,\dots}$ 和 $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ 的收敛点。定理1的含义如下。

首先,均值偏移向量的模收敛到零。实际上,由(17)和(20),第 $j$ 个均值偏移向量为
$$\pmb{m}_{h,G}(Y_j)=Y_{j+1}-Y_j\tag{22}$$
并且在极限处 $\pmb{m}_{h,G}(Y_c)=Y_c-Y_c=0$。换句话说,在 $Y_c$ 处计算的密度估计(11)的梯度为零
$$\nabla \hat{f}_{h,K}(Y_c)=0,\tag{23}$$
这由(19)可得。因此,$Y_c$ 是 $\hat{f}_{h,K}$ 的驻点。

其次,由于 $\{\hat{f}_{h,K}(j)\}_{j=1,2,\dots}$ 单调递增,均值偏移迭代满足捕获定理(Capture Theorem)[4, p.45]所要求的条件。该定理指出,如果局部最大值是(在一个小邻域内)唯一的驻点,则这类梯度方法的轨迹会被局部最大值吸引。也就是说,一旦 $Y_j$ 足够接近 $\hat{f}_{h,K}$ 的某个模态,它就会收敛到该模态。收敛到同一模态的所有位置的集合定义了该模态的吸引域。

以上理论观察为模态检测提供了一种实用的算法:

– 运行均值偏移过程,找到 $\hat{f}_{h,K}$ 的驻点;

– 对这些点进行剪枝,只保留局部最大值。

根据捕获定理,局部最大值点被定义为某个小开球内唯一的驻点。可以这样检验该性质:用一个小范数的随机向量扰动每个驻点,然后让均值偏移过程再次收敛;如果收敛点保持不变(在一定容差内),则该点是局部最大值。
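下面是2.3节模态检测思路的一个简化Python示意(非论文原始实现),沿用前文示意中的 `mean_shift`。把相互距离小于 $h$ 的收敛点合并为同一模态,以及用 `0.1*h` 的扰动幅度做局部最大值检验,都是演示用的假设性选择。

```python
import numpy as np

def detect_modes(data, h, tol=1e-3):
    """从每个样本点运行均值偏移,并把相互靠近的收敛点合并为同一个模态。"""
    modes = []
    for x in data:
        y = mean_shift(x, data, h, tol=tol)
        for m in modes:
            if np.linalg.norm(y - m) < h:       # 视为落入同一吸引域(简化的合并准则)
                break
        else:
            modes.append(y)
    return np.array(modes)

def is_local_maximum(y, data, h, tol=1e-3):
    """2.3节的扰动检验:扰动收敛点后再次运行均值偏移,看是否回到原点附近。"""
    y = np.asarray(y, dtype=float)
    y_pert = y + 0.1 * h * np.random.randn(y.size)
    return np.linalg.norm(mean_shift(y_pert, data, h, tol=tol) - y) < 10 * tol
```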

2.4 Smooth Trajectory Property

The mean shift procedure employing a normal kernel has an interesting property. Its path toward the mode follows a smooth trajectory, the angle between two consecutive mean shift vectors being always less than 90 degrees.

Using the normal kernel (10), the $j$-th mean shift vector is given by
$$\pmb{m}_{h,N}(Y_j)=Y_{j+1}-Y_j=\frac{\sum^n_{i=1}X_i\exp\left(-\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}\exp\left(-\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}-Y_j.\tag{24}$$
The following theorem holds true for all j = 1 , 2 , . . . j=1,2,... j=1,2,..., according to the proof given in the Appendix.

Theorem 2: The cosine of the angle between two consecutive mean shift vectors is strictly positive when a normal kernel is employed, i.e.,
$$\frac{\pmb{m}_{h,N}(Y_j)^\top\pmb{m}_{h,N}(Y_{j+1})}{\left\|\pmb{m}_{h,N}(Y_j)\right\|\left\|\pmb{m}_{h,N}(Y_{j+1})\right\|}>0.\tag{25}$$
As a consequence of Theorem 2 the normal kernel appears to be the optimal one for the mean shift procedure. The smooth trajectory of the mean shift procedure is in contrast with the standard steepest ascent method [4, p.21] (local gradient evaluation followed by line maximization) whose convergence rate on surfaces with deep narrow valleys is slow due to its zigzagging trajectory.

In practice, the convergence of the mean shift procedure based on the normal kernel requires a large number of steps, as was discussed at the end of Section 2.2. Therefore, in most of our experiments we have used the uniform kernel, for which the convergence is finite, and not the normal kernel. Note, however, that the quality of the results almost always improves when the normal kernel is employed.

2.4 平滑轨迹特性

采用正态核的均值偏移过程有一个有趣的性质:它通向模态的路径是一条平滑的轨迹,两个连续的均值偏移向量之间的夹角总是小于90度。

使用正态核(10),第 $j$ 个均值偏移向量由下式给出:
$$\pmb{m}_{h,N}(Y_j)=Y_{j+1}-Y_j=\frac{\sum^n_{i=1}X_i\exp\left(-\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}\exp\left(-\left\|\frac{Y_j-X_i}{h}\right\|^2\right)}-Y_j.\tag{24}$$
根据附录中给出的证明,以下定理对所有 $j=1,2,\dots$ 成立。

定理2:当使用正态核时,两个连续均值偏移向量之间夹角的余弦严格为正,即
$$\frac{\pmb{m}_{h,N}(Y_j)^\top\pmb{m}_{h,N}(Y_{j+1})}{\left\|\pmb{m}_{h,N}(Y_j)\right\|\left\|\pmb{m}_{h,N}(Y_{j+1})\right\|}>0.\tag{25}$$
定理2的一个结果是,正态核似乎是均值偏移过程的最佳选择。均值偏移过程的平滑轨迹与标准最速上升法[4, p.21](先做局部梯度估计,再沿该方向做线搜索最大化)形成对比,后者由于其曲折的轨迹,在具有又深又窄的谷的曲面上收敛很慢。

在实践中,基于正态核的均值偏移过程需要大量步骤才能收敛,如2.2节末尾所述。因此,在我们的大多数实验中使用的是能在有限步内收敛的均匀核,而不是正态核。但请注意,使用正态核时,结果的质量几乎总是更好。
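下面是用正态核记录均值偏移轨迹、并数值检查式(25)中相邻偏移向量夹角余弦的一个简化Python示意(非论文原始实现)。权重直接按式(24)给出的 $\exp(-\|\cdot\|^2)$ 计算;函数名与各默认参数均为演示用的假设。

```python
import numpy as np

def mean_shift_path_normal(y0, data, h, tol=1e-4, max_iter=200):
    """用正态核(式(24)的权重)迭代窗口位置,并返回完整轨迹。"""
    path = [np.asarray(y0, dtype=float)]
    for _ in range(max_iter):
        u = np.sum(((path[-1] - data) / h) ** 2, axis=1)
        w = np.exp(-u)                                  # 式(24)中的权重
        path.append((w[:, None] * data).sum(axis=0) / w.sum())
        if np.linalg.norm(path[-1] - path[-2]) < tol:
            break
    return np.array(path)

def consecutive_cosines(path):
    """返回轨迹中相邻两个均值偏移向量夹角的余弦;按定理2,它们应严格为正。"""
    v = np.diff(path, axis=0)
    dots = np.sum(v[:-1] * v[1:], axis=1)
    norms = np.linalg.norm(v[:-1], axis=1) * np.linalg.norm(v[1:], axis=1)
    return dots / np.maximum(norms, 1e-12)
```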

2.5 Relation to Kernel Regression

Important insight can be gained when the relation (19) is obtained approaching the problem differently. Considering the univariate case suffices for this purpose.

Kernel regression is a nonparametric method to estimate complex trends from noisy data. See [62, Chap.5] for an introduction to the topic, [24] for a more in-depth treatment. Let the $n$ measured data points be $(X_i,Z_i)$ and assume that the values $X_i$ are the outcomes of a random variable $x$ with probability density function $f(x)$, $x_i=X_i$, $i=1,\dots,n$, while the relation between $Z_i$ and $X_i$ is
$$Z_i=m(X_i)+\epsilon_i \qquad i=1,\dots,n\tag{26}$$
where $m(x)$ is called the regression function, and $\epsilon_i$ is an independently distributed, zero-mean error, $E[\epsilon_i]=0$.

A natural way to estimate the regression function is by locally fitting a degree $p$ polynomial to the data. For a window centered at $x$ the polynomial coefficients then can be obtained by weighted least squares, the weights being computed from a symmetric function $g(x)$. The size of the window is controlled by the parameter $h$, $g_h(x)=h^{-1}g(x/h)$. The simplest case is that of fitting a constant to the data in the window, i.e., $p=0$. It can be shown, [24, Sec.3.1], [62, Sec.5.2], that the estimated constant is the value of the Nadaraya–Watson estimator
$$\hat{m}(x;h)=\frac{\sum^n_{i=1}g_h(x-X_i)Z_i}{\sum^n_{i=1}g_h(x-X_i)}\tag{27}$$
introduced in the statistical literature 35 years ago. The asymptotic conditional bias of the estimator has the expression [24, p.109], [62, p.125],
$$E[(\hat{m}(x;h)-m(x))\mid X_1,\dots,X_n]\approx h^2\frac{m''(x)f(x)+2m'(x)f'(x)}{2f(x)}\mu_2[g]\tag{28}$$
where $\mu_2[g]=\int u^2g(u)du$. Defining $m(x)=x$ reduces the Nadaraya–Watson estimator to (20) (in the univariate case), while (28) becomes
$$E[(\hat{x}-x)\mid X_1,\dots,X_n]\approx h^2\frac{f'(x)}{f(x)}\mu_2[g]\tag{29}$$
which is similar to (19). The mean shift procedure thus exploits to its advantage the inherent bias of the zero-order kernel regression.

The connection to the kernel regression literature opens many interesting issues, however, most of these are more of a theoretical than practical importance.

2.5 与核回归的关系

如果以另一种方式来推导关系式(19),可以获得重要的洞察。就此目的而言,考虑单变量情况就足够了。

核回归是一种从含噪数据中估计复杂趋势的非参数方法。有关该主题的介绍请参见[62,第5章],更深入的论述请参见[24]。设 $n$ 个测得的数据点为 $(X_i,Z_i)$,并假设 $X_i$ 是具有概率密度函数 $f(x)$ 的随机变量 $x$ 的取值,$x_i=X_i$,$i=1,\dots,n$,而 $Z_i$ 与 $X_i$ 之间的关系为
$$Z_i=m(X_i)+\epsilon_i \qquad i=1,\dots,n\tag{26}$$
其中 $m(x)$ 称为回归函数,$\epsilon_i$ 是独立分布的零均值误差,$E[\epsilon_i]=0$。

估计回归函数的一种自然方法是对数据局部拟合一个 $p$ 次多项式。对于以 $x$ 为中心的窗口,多项式系数可以用加权最小二乘法求出,权重由对称函数 $g(x)$ 计算;窗口的大小由参数 $h$ 控制,$g_h(x)=h^{-1}g(x/h)$。最简单的情况是对窗口中的数据拟合一个常数,即 $p=0$。可以证明[24,第3.1节],[62,第5.2节],估计出的常数就是Nadaraya–Watson估计量的值
$$\hat{m}(x;h)=\frac{\sum^n_{i=1}g_h(x-X_i)Z_i}{\sum^n_{i=1}g_h(x-X_i)}\tag{27}$$
该估计量是35年前在统计文献中引入的。其渐近条件偏差的表达式为[24, p.109],[62, p.125]
$$E[(\hat{m}(x;h)-m(x))\mid X_1,\dots,X_n]\approx h^2\frac{m''(x)f(x)+2m'(x)f'(x)}{2f(x)}\mu_2[g]\tag{28}$$
其中 $\mu_2[g]=\int u^2g(u)du$。令 $m(x)=x$ 可把Nadaraya–Watson估计量简化为(20)(在单变量情况下),而(28)变为
$$E[(\hat{x}-x)\mid X_1,\dots,X_n]\approx h^2\frac{f'(x)}{f(x)}\mu_2[g]\tag{29}$$
这与(19)相似。因此,均值偏移过程把零阶核回归的固有偏差转化成了自身的优势。

与核回归文献的联系引出了许多有趣的问题,不过其中大多数更多是具有理论意义而非实际意义。
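下面是式(27)中Nadaraya–Watson估计量(单变量、$p=0$ 的核回归)的一个简化Python示意(非论文原始实现)。以高斯函数作为权函数 $g$ 只是演示用的假设;由于分子分母相除,$g_h$ 中的因子 $h^{-1}$ 在比值中抵消。

```python
import numpy as np

def nadaraya_watson(x, X, Z, h, g=lambda u: np.exp(-0.5 * u ** 2)):
    """按式(27)在点 x 处计算零阶核回归估计;X、Z 为一维观测数组,g 为对称权函数。"""
    w = g((x - X) / h)
    return np.sum(w * Z) / np.sum(w)
```

例如,对 `X = np.linspace(0, 1, 200)`、`Z = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)`,`nadaraya_watson(0.3, X, Z, h=0.05)` 给出 0.3 处趋势的平滑估计。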

2.6 Relation to Location M-estimators

The M-estimators are a family of robust techniques which can handle data in the presence of severe contaminations, i.e., outliers. See [26], [32] for introductory surveys. In our context only the problem of location estimation has to be considered.

Given the data $X_i$, $i=1,\dots,n$, and the scale $h$, we define $\hat{\pmb\theta}$, the location estimator, as
$$\hat{\pmb\theta}=\arg\min_{\theta}J({\pmb\theta})=\arg\min_{\theta}\sum^n_{i=1}\rho\left(\left\|\frac{{\pmb\theta}-X_i}{h}\right\|^2\right)\tag{30}$$
where $\rho(u)$ is a symmetric, nonnegative valued function, with a unique minimum at the origin and nondecreasing for $u\geq0$. The estimator is obtained from the normal equations
$$\nabla_\theta J(\hat{\pmb\theta})=2h^{-2}\sum^n_{i=1}(\hat{\pmb\theta}-X_i)w\left(\left\|\frac{\hat{\pmb\theta}-X_i}{h}\right\|^2\right)=0\tag{31}$$
where $w(u)=\frac{d\rho(u)}{du}$. Therefore the iterations to find the location M-estimate are based on
$$\hat{\pmb\theta}=\frac{\sum^n_{i=1}X_iw\left(\left\|\frac{\hat{\pmb\theta}-X_i}{h}\right\|^2\right)}{\sum^n_{i=1}w\left(\left\|\frac{\hat{\pmb\theta}-X_i}{h}\right\|^2\right)}\tag{32}$$
which is identical to (20) when $w(u)\equiv g(u)$. Taking into account (13), the minimization (30) becomes
$$\hat{\pmb\theta}=\arg\max_{\theta}\sum^n_{i=1}k\left(\left\|\frac{\pmb\theta-X_i}{h}\right\|^2\right)\tag{33}$$
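下面按式(32)给出位置M估计的迭代重加权的一个简化Python示意(非论文原始实现)。取 $w(u)=\exp(-u/2)$ 对应正态核轮廓的导数,此时迭代与式(20)的均值偏移完全一致;以样本均值作初始值以及各默认参数,都是演示用的假设。

```python
import numpy as np

def location_m_estimate(data, h, w=lambda u: np.exp(-0.5 * u), tol=1e-4, max_iter=200):
    """按式(32)迭代求位置 M 估计;当 w(u) ≡ g(u) 时与均值偏移迭代(20)相同。"""
    theta = data.mean(axis=0)                   # 初始值:样本均值(演示用的选择)
    for _ in range(max_iter):
        u = np.sum(((theta - data) / h) ** 2, axis=1)
        wi = w(u)
        theta_next = (wi[:, None] * data).sum(axis=0) / wi.sum()
        if np.linalg.norm(theta_next - theta) < tol:
            return theta_next
        theta = theta_next
    return theta
```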
