Andrew Ng Machine Learning (18) ex7: K-means Clustering and Principal Component Analysis (MATLAB + Python)

  For the full index of this Andrew Ng Machine Learning series → Andrew Ng Machine Learning series index.

  • 1. K-means Clustering
    • 1.1 Implementing K-means
      • 1.1.1 Finding the closest centroids
      • 1.1.2 Computing centroid means
    • 1.2 K-means on an example dataset
    • 1.3 Random initialization
    • 1.4 Image compression with K-means
      • 1.4.1 K-means on pixels
    • 1.5 Optional exercise: use your own image
  • 2. Principal Component Analysis
    • 2.1 Example dataset
    • 2.2 Implementing PCA
    • 2.3 Dimensionality reduction with PCA
      • 2.3.1 Projecting the data onto the principal components
      • 2.3.2 Reconstructing an approximation of the data
      • 2.3.3 Visualizing the projections
    • 2.4 Face image dataset
      • 2.4.1 PCA on faces
      • 2.4.2 Dimensionality reduction
    • 2.5 Optional exercise: PCA for visualization
  • 3. MATLAB Implementation
    • 3.1 ex7.m
    • 3.2 ex7_pca.m
  • 4. Python Implementation
    • 4.1 ex7.py
    • 4.2 ex7_pca.py

  Background notes for this exercise → Clustering and Dimensionality Reduction.

  Exercise handout and the provided MATLAB starter code → extraction code: cm24.

  Complete code for this exercise (MATLAB + Python versions) → GitHub link.

1. K-means Clustering

  In this exercise, we will implement the K-means algorithm and use it for image compression. We will first start with an example 2D dataset that helps build an intuition of how the K-means algorithm works. After that, we will use K-means for image compression by reducing the number of colors that occur in an image down to only the most common colors in that image. We will use ex7.m for this part of the exercise.

1.1 Implementing K-means

  The K-means algorithm is a method for automatically clustering similar data examples together. Concretely, we are given a training set $\{x^{(1)},...,x^{(m)}\}$ (where $x^{(i)} \in \mathbb{R}^{n}$) and want to group the data into a few cohesive "clusters". The intuition behind K-means is an iterative procedure that starts by guessing the initial centroids, and then refines this guess by repeatedly assigning examples to their closest centroids and recomputing the centroids based on those assignments.
  The K-means algorithm is as follows:

% Initialize centroids
centroids = kMeansInitCentroids(X, K);
for iter = 1:iterations
    % Cluster assignment step: Assign each data point to the
    % closest centroid. idx(i) corresponds to cˆ(i), the index
    % of the centroid assigned to example i
    idx = findClosestCentroids(X, centroids);
    % Move centroid step: Compute means based on centroid
    % assignments
    centroids = computeMeans(X, idx, K);
end

  The inner loop of the algorithm repeatedly carries out two steps: (i) assigning each training example $x^{(i)}$ to its closest centroid, and (ii) recomputing the mean of each centroid using the points assigned to it. The K-means algorithm will always converge to some final set of means for the centroids. Note, however, that the converged solution may not always be ideal and depends on the initial setting of the centroids. Therefore, in practice the K-means algorithm is usually run a few times with different random initializations, and one way to choose among these different solutions is to pick the one with the lowest cost function value (distortion); a small sketch of this "best of several runs" idea is given below.
  We will implement the two phases of the K-means algorithm separately in the next sections.
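
  To make the "run several times, keep the lowest distortion" idea concrete, here is a minimal NumPy sketch. It is not part of the provided starter code; compute_cost, best_of_n_runs and run_kmeans are hypothetical names standing in for the functions implemented later in this exercise:

import numpy as np

def compute_cost(X, idx, centroids):
    # Distortion: mean squared distance from each example to its assigned centroid
    return np.mean(np.sum((X - centroids[idx]) ** 2, axis=1))

def best_of_n_runs(X, K, n_runs, run_kmeans, max_iters=10):
    # run_kmeans(X, initial_centroids, max_iters) is assumed to return (centroids, idx) with 0-based idx
    best_cost, best_result = np.inf, None
    for _ in range(n_runs):
        initial = X[np.random.permutation(X.shape[0])[:K]]   # K random examples as initial centroids
        centroids, idx = run_kmeans(X, initial, max_iters)
        cost = compute_cost(X, idx, centroids)
        if cost < best_cost:
            best_cost, best_result = cost, (centroids, idx)
    return best_result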

1.1.1 Finding the closest centroids

  In the "cluster assignment" phase of the K-means algorithm, the algorithm assigns every training example $x^{(i)}$ to its closest centroid, given the current positions of the centroids. Specifically, for every example $i$ we set
$c^{(i)} := j \quad \text{that minimizes} \quad \left\| x^{(i)}-\mu_{j} \right\|^{2}$

where $c^{(i)}$ is the index of the centroid that is closest to $x^{(i)}$, and $\mu_{j}$ is the position (value) of the $j$-th centroid. Note that $c^{(i)}$ corresponds to idx(i) in the starter code.
  Our task is to complete the code in findClosestCentroids.m. This function takes the data matrix $X$ and the locations of all centroids inside centroids, and should output a one-dimensional array idx that holds the index (a value in $\{1,...,K\}$, where $K$ is the total number of centroids) of the closest centroid to every training example.
  We can implement this using a loop over every training example and every centroid.
  Completing findClosestCentroids.m requires filling in the following code:

for i = 1:size(X,1)
  dist = pdist([X(i,:); centroids]);  % pdist returns the pairwise Euclidean distances between the rows of the stacked matrix
  dist = dist(1:K);                   % the first K entries are the distances from example i to each of the K centroids
  [~, idx(i)] = min(dist);            % index of the closest centroid
end

  Once we have completed the code in findClosestCentroids.m, the script ex7.m will run it and we should see the output [1 3 2], corresponding to the centroids assigned to the first 3 examples.

Finding closest centroids.

Closest centroids for the first 3 examples: 
 1 3 2
(the closest centroids should be 1, 3, 2 respectively)
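
  For reference, the same cluster-assignment step can also be written without explicit loops. Below is a minimal NumPy sketch (not part of the provided starter code; it returns 0-based indices, unlike the 1-based idx expected by the MATLAB scripts):

import numpy as np

def find_closest_centroids(X, centroids):
    # X: (m, n) examples, centroids: (K, n); returns the 0-based index of the nearest centroid for every example
    dists = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)  # (m, K) squared distances via broadcasting
    return np.argmin(dists, axis=1)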

1.1.2 Computing centroid means

  Given the assignment of every point to a centroid, the second phase of the algorithm recomputes, for each centroid, the mean of the points that were assigned to it. Specifically, for every centroid $k$ we set
$\mu_{k} := \frac{1}{\left| C_{k} \right|}\sum_{i\in C_{k}} x^{(i)}$

where $C_{k}$ is the set of examples that are assigned to centroid $k$. Concretely, if two examples, say $x^{(3)}$ and $x^{(5)}$, are assigned to centroid $k=2$, then we should update $\mu_{2} = \frac{1}{2}\left(x^{(3)}+x^{(5)}\right)$.
  We should now complete the code in computeCentroids.m. We can implement this function using a loop over the centroids; we can also use a loop over the examples, but if we use a vectorized implementation instead, the code may run faster.
  Completing computeCentroids.m requires filling in the following code:

for i = 1:K
  centroids(i,:) = mean(X(idx == i, :), 1);  % mean (along dimension 1) of all examples currently assigned to centroid i
end

  Once we have completed the code in computeCentroids.m, the script ex7.m will run it and output the centroids after the first step of K-means.

Computing centroids means.

Centroids computed after initial finding of closest centroids: 
 2.428301 3.157924 
 5.813503 2.633656 
 7.119387 3.616684 

(the centroids should be
   [ 2.428301 3.157924 ]
   [ 5.813503 2.633656 ]
   [ 7.119387 3.616684 ]

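  For comparison, a compact NumPy version of this centroid-update step might look like the following (a sketch; it assumes 0-based cluster indices from the assignment step and skips a centroid that has no members instead of producing a NaN mean):

import numpy as np

def compute_centroids(X, idx, K):
    # Recompute each centroid as the mean of the examples currently assigned to it
    centroids = np.zeros((K, X.shape[1]))
    for k in range(K):
        members = X[idx == k]
        if members.shape[0] > 0:          # guard against empty clusters
            centroids[k] = members.mean(axis=0)
    return centroids
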
1.2 K-means on an example dataset

  After we have completed the two functions findClosestCentroids and computeCentroids, the next step in ex7.m will run the K-means algorithm on a toy 2D dataset to help us understand how K-means works. Our functions are called from inside the runkMeans.m script; it is worth looking through that function to understand how it works. Notice that the code calls the two functions we implemented inside a loop.
  When we run the next step, the K-means code will produce a visualization that steps through the progress of the algorithm at each iteration. Press enter multiple times to see how each step of the K-means algorithm changes the centroids and cluster assignments. At the end, we should obtain the plot shown in Figure 1.


Figure 1. The expected output

  The console output produced during the run is shown below.


Running K-Means clustering on example dataset.

K-Means iteration 1/10...
Press enter to continue.
K-Means iteration 2/10...
Press enter to continue.
K-Means iteration 3/10...
Press enter to continue.
K-Means iteration 4/10...
Press enter to continue.
K-Means iteration 5/10...
Press enter to continue.
K-Means iteration 6/10...
Press enter to continue.
K-Means iteration 7/10...
Press enter to continue.
K-Means iteration 8/10...
Press enter to continue.
K-Means iteration 9/10...
Press enter to continue.
K-Means iteration 10/10...
Press enter to continue.

K-Means Done.

1.3 Random initialization

  The initial assignments of the centroids for the example dataset in ex7.m were designed so that we see the same figure as in Figure 1. In practice, a good strategy for initializing the centroids is to select random examples from the training set.
  In this part of the exercise, we should complete the function kMeansInitCentroids.m with the following code:

% Initialize the centroids to be random examples
% Randomly reorder the indices of examples
randidx = randperm(size(X, 1));
% Take the first K examples as centroids
centroids = X(randidx(1:K), :);

  The code above first randomly permutes the indices of the examples (using randperm). Then, it selects the first $K$ examples based on the random permutation of the indices. This allows the examples to be selected at random without the risk of selecting the same example twice. A NumPy equivalent is sketched below.
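
  The same idea in NumPy, for comparison (a minimal sketch; np.random.permutation plays the role of randperm):

import numpy as np

def kmeans_init_centroids(X, K):
    # Randomly reorder the example indices and take the first K examples as the initial centroids
    randidx = np.random.permutation(X.shape[0])
    return X[randidx[:K]]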

1.4 Image compression with K-means

  In this exercise, we will apply K-means to image compression. In a straightforward 24-bit color representation of an image, each pixel is represented as three 8-bit unsigned integers (ranging from 0 to 255) that specify the red, green and blue intensity values; this encoding is often referred to as the RGB encoding. Our image contains thousands of colors, and in this part of the exercise we will reduce the number of colors to 16.
  By making this reduction, it is possible to represent (compress) the photo in an efficient way. Specifically, we only need to store the RGB values of the 16 selected colors, and for each pixel in the image we now only need to store the index of the color at that location (where only 4 bits are needed to represent 16 possibilities).
  In this exercise, we will use the K-means algorithm to select the 16 colors that will be used to represent the compressed image. Concretely, we will treat every pixel in the original image as a data example and use the K-means algorithm to find the 16 colors that best group (cluster) the pixels in the 3-dimensional RGB space. Once we have computed the centroids on the image, we will then use the 16 colors to replace the pixels in the original image.

1.4.1 K-means on pixels

  In MATLAB, an image can be read in as follows:

% Load 128x128 color image (bird_small.png)
A = imread('bird_small.png');
% You will need to have installed the image package to use
% imread. If you do not have the image package installed, you
% should instead change the following line to
%
% load('bird_small.mat'); % Loads the image into the variable A

  This creates a three-dimensional matrix A whose first two indices identify a pixel position and whose last index represents red, green, or blue. For example, A(50, 33, 3) gives the blue intensity of the pixel at row 50 and column 33.
  The code inside ex7.m first loads the image, then reshapes it into an $m \times 3$ matrix of pixel colors (where $m = 16384 = 128 \times 128$) and calls our K-means function on it.
  The original 128x128 image is shown in Figure 2.


Figure 2. The original 128x128 image

  After finding the top $K = 16$ colors to represent the image, we can now assign each pixel position to its closest centroid using the findClosestCentroids function. This allows us to represent the original image using the centroid assignments of each pixel. Notice that we have significantly reduced the number of bits required to describe the image. The original image required 24 bits for each of the 128×128 pixel locations, resulting in a total size of 128 × 128 × 24 = 393,216 bits. The new representation requires some overhead storage in the form of a dictionary of 16 colors, each of which requires 24 bits, but the image itself then only requires 4 bits per pixel location. The final number of bits used is therefore 16 × 24 + 128 × 128 × 4 = 65,920 bits, which corresponds to compressing the original image by a factor of about 6.
  Finally, we can view the effects of the compression by reconstructing the image based only on the centroid assignments: we replace each pixel location with the centroid (mean color) assigned to it. Figure 3 shows the reconstruction we obtained. Even though the resulting image retains most of the characteristics of the original, we also see some compression artifacts. A NumPy sketch of this reconstruction step is given below.
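
  Putting the pieces together, compression and reconstruction amount to replacing every pixel by its centroid color. A hedged NumPy/matplotlib sketch (assuming centroids is the 16x3 array learned by K-means and find_closest_centroids is defined as sketched earlier):

import numpy as np
import matplotlib.pyplot as plt

A = plt.imread('bird_small.png').astype(float)   # (128, 128, 3); PNG pixel values are already in [0, 1]
X = A.reshape(-1, 3)                             # one row of RGB values per pixel

idx = find_closest_centroids(X, centroids)       # nearest of the 16 colors for every pixel (assumed helpers)
A_recovered = centroids[idx].reshape(A.shape)    # replace each pixel by its centroid color

plt.subplot(1, 2, 1); plt.imshow(A);           plt.title('Original')
plt.subplot(1, 2, 2); plt.imshow(A_recovered); plt.title('Compressed, with 16 colors')
plt.show()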


Figure 3. Original and reconstructed image (when using K-means to compress the image)

1.5 Optional exercise: use your own image

  In this exercise, modify the provided code to run on one of your own images. Note that if your image is very large, K-means can take a long time to run; we therefore recommend resizing the image to a manageable size before running the code. You can also try varying $K$ to see the effect on the compression. A small resizing sketch is given below.
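
  If you try your own photo, one quick way to shrink it first is with Pillow (a sketch; 'my_photo.jpg' is a placeholder filename, not a file shipped with the exercise):

from PIL import Image
import numpy as np

img = Image.open('my_photo.jpg').convert('RGB')   # placeholder path to your own image
img = img.resize((128, 128))                      # shrink the image so K-means stays fast
X = np.asarray(img, dtype=float).reshape(-1, 3) / 255.0   # pixel rows scaled to [0, 1], ready for K-means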

2. Principal Component Analysis

  In this exercise, we will use principal component analysis (PCA) to perform dimensionality reduction. We will first experiment with an example 2D dataset to get an intuition of how PCA works, and then use it on a bigger dataset of 5000 face images.
  The provided script ex7_pca.m will help us step through the first half of the exercise.

2.1 Example dataset

  To understand intuitively how PCA works, we first start with a 2D dataset that has one direction of large variation and one of smaller variation. The script ex7_pca.m will plot the training data (Figure 4). In this part of the exercise, we will visualize what happens when PCA is used to reduce the data from 2D to 1D. In practice, we might want to reduce data from, say, 256 to 50 dimensions, but using lower-dimensional data in this example allows us to visualize the algorithm better.


Figure 4. Example Dataset 1

2.2 Implementing PCA

  In this part of the exercise, we will implement PCA. PCA consists of two computational steps: first, we compute the covariance matrix of the data; then, we use MATLAB's svd function to compute the eigenvectors $U_{1}, U_{2},...,U_{n}$. These will correspond to the principal components of variation in the data.
  Before using PCA, it is important to first normalize the data by subtracting the mean value of each feature from the dataset, and scaling each dimension so that they are in the same range. In the provided script ex7_pca.m, the features are normalized using the featureNormalize function.
  After normalizing the data, we can run PCA to compute the principal components. Our task is to complete the code in pca.m to compute the principal components of the dataset. First, we should compute the covariance matrix of the data:
$\Sigma = \frac{1}{m}X^{T}X$

where $X$ is the data matrix with examples in rows, and $m$ is the number of examples. Note that $\Sigma$ is an $n \times n$ matrix and not the summation operator.
  After computing the covariance matrix, we can run SVD on it to compute the principal components. In MATLAB, SVD can be run with the command [U, S, V] = svd(Sigma), where U will contain the principal components and S will contain a diagonal matrix.
  Completing pca.m requires filling in the following code:

Sigma = X'*X./m;
[U,S,~] = svd(Sigma);

  Once we have completed pca.m, the ex7_pca.m script will run PCA on the example dataset and plot the corresponding principal components found (Figure 5). The script will also output the top principal component (eigenvector) found, and we should expect to see an output of about [-0.707 -0.707].


Figure 5. Computed eigenvectors of the dataset

Running PCA on example dataset.

Top eigenvector: 
 U(:,1) = -0.707107 -0.707107 

(you should expect to see -0.707107 -0.707107)
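
  For reference, the same two steps (covariance matrix, then SVD) can be written in a few lines of NumPy (a sketch, assuming X is the already-normalized data matrix):

import numpy as np

def pca(X):
    # Columns of U are the principal components; S holds the corresponding singular values
    m = X.shape[0]
    Sigma = (X.T @ X) / m            # n x n covariance matrix of the zero-mean data
    U, S, _ = np.linalg.svd(Sigma)
    return U, S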

2.3 Dimensionality reduction with PCA

  After computing the principal components, we can use them to reduce the feature dimension of the dataset by projecting each example onto a lower-dimensional space, $x^{(i)} \rightarrow z^{(i)}$ (e.g., projecting the data from 2D to 1D). In this part of the exercise, we will use the eigenvectors returned by PCA and project the example dataset into a 1-dimensional space.
  In practice, if we were using a learning algorithm such as linear regression or a neural network, we could now use the projected data instead of the original data. By using the projected data, we can train the model faster because the input has fewer dimensions.

2.3.1 Projecting the data onto the principal components

  We should now complete the code in projectData.m. Specifically, we are given a dataset $X$, the principal components $U$, and the desired number of dimensions to reduce to, $K$. We should project each example in $X$ onto the top $K$ components in $U$. Note that the top $K$ components in $U$ are given by the first $K$ columns of $U$, that is, U_reduce = U(:, 1:K).
  Completing projectData.m requires filling in the following code:

Z = X*U(:,1:K);

  Once we have completed the code in projectData.m, ex7_pca.m will project the first example onto the first dimension and we should see a value of about 1.481 (or possibly -1.481, if we obtained $-U_{1}$ instead of $U_{1}$).

Dimension reduction on example dataset.

Projection of the first example: 1.481274

(this value should be about 1.481274)
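
  In NumPy the projection step is a single matrix product (a sketch, equivalent to the MATLAB fill-in above):

def project_data(X, U, K):
    # Project every row of X onto the first K principal components (the first K columns of U)
    return X @ U[:, :K]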

2.3.2 Reconstructing an approximation of the data

  After projecting the data onto the lower-dimensional space, we can approximately recover the data by projecting it back onto the original high-dimensional space. Our task is to complete recoverData.m to project each example in Z back onto the original space and return the recovered approximation in X_rec.
  Completing recoverData.m requires filling in the following code:

X_rec = Z*U(:,1:K)';

  Once we have completed the code in recoverData.m, ex7_pca.m will recover an approximation of the first example and we should see a value of about [-1.047 -1.047].

Approximation of the first example: -1.047419 -1.047419

(this value should be about  -1.047419 -1.047419)
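
  The approximate reconstruction is the matching product going the other way (a sketch):

def recover_data(Z, U, K):
    # Map the K-dimensional projections back into the original n-dimensional space
    return Z @ U[:, :K].T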

2.3.3 Visualizing the projections

  After completing both projectData and recoverData, ex7_pca.m will now perform both a projection and an approximate reconstruction to show how the projection affects the data. In Figure 6, the original data points are shown as blue circles, while the projected data points are shown as red circles. The projection effectively only retains the information in the direction given by $U_{1}$.


Figure 6. The normalized and projected data after PCA

2.4 Face image dataset

  In this part of the exercise, we will run PCA on face images to see how it can be used in practice for dimensionality reduction. The dataset ex7faces.mat contains a dataset $X$ of 32×32 grayscale face images, where each row of X corresponds to one face image (a row vector of length 1024).
  The next step in ex7_pca.m will load and visualize the first 100 of these face images (Figure 7).


Figure 7. The faces dataset

2.4.1 PCA on faces

  To run PCA on the face dataset, we first normalize the dataset by subtracting the mean of each feature from the data matrix $X$. The script ex7_pca.m will do this and then run the PCA code. After running PCA, we obtain the principal components of the dataset. Notice that each principal component in U (each column) is a vector of length $n$ (for the face dataset, $n = 1024$). It turns out that we can visualize these principal components by reshaping each of them into a 32×32 matrix that corresponds to the pixels in the original dataset. The script ex7_pca.m displays the first 36 principal components that describe the largest variations (Figure 8). If desired, the code can be changed to display more principal components to see how they capture more and more detail; a short visualization sketch is given after this paragraph.
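
  For example, a single principal component can be viewed as an "eigenface" simply by reshaping it to 32×32 (a matplotlib sketch; U is assumed to be the matrix returned by PCA on the face data, with one component per column):

import matplotlib.pyplot as plt

eigenface = U[:, 0].reshape(32, 32)   # the first principal component viewed as a 32x32 image
plt.imshow(eigenface.T, cmap='gray')  # transposed to match the pixel layout used by the displayData helper in Section 4.2
plt.title('First principal component (eigenface)')
plt.show()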


Figure 8. Principal components on the face dataset

2.4.2 Dimensionality reduction

  Now that we have computed the principal components of the face dataset, we can use them to reduce its dimension. This allows us to use a learning algorithm with a smaller input size (e.g., 100 dimensions) instead of the original 1024 dimensions, which can help speed up the learning algorithm.
  The next part of ex7_pca.m projects the face dataset onto only the first 100 principal components. Concretely, each face image is now described by a vector $z^{(i)} \in \mathbb{R}^{100}$.
  To understand what is lost in the dimensionality reduction, we can recover the data using only the projected dataset. In ex7_pca.m, an approximate recovery of the data is performed and the original and projected face images are displayed side by side (Figure 9). From the reconstruction, we can observe that the general structure and appearance of the face are kept while the fine details are lost. This is a remarkable reduction (more than 10×) in the dataset size that can help speed up a learning algorithm significantly. For example, if we were training a neural network to perform person recognition (given a face image, predict the identity of the person), we could use the dimension-reduced input of only 100 dimensions instead of the original pixels. One way to quantify how much is kept is sketched below.
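
  One possible way to judge how much is lost when keeping only $K = 100$ components is the fraction of variance retained, computed from the singular values returned by PCA (a sketch; S is assumed to be the vector of singular values from the pca function sketched above):

import numpy as np

def variance_retained(S, K):
    # Fraction of the total variance kept when projecting onto the first K principal components
    return np.sum(S[:K]) / np.sum(S)

# e.g. a value of variance_retained(S, 100) close to 1 means little of the variance in the face data is lost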


Figure 9. Original images of faces and the images reconstructed from only the top 100 principal components

2.5 Optional exercise: PCA for visualization

  In the earlier K-means image compression exercise, we used the K-means algorithm in the 3-dimensional RGB space. In the last part of the ex7_pca.m script, the scatter3 function (a 3D scatter-plot function) is used to visualize this final pixel assignment in 3D space, with each data point colored according to the cluster it has been assigned to, as shown in Figure 10. You can drag your mouse on the figure to rotate and inspect the data in 3 dimensions.

Figure 10. The original data in 3D

  It turns out that visualizing datasets in 3 dimensions or more can be cumbersome. Therefore, it is often desirable to display the data in 2D only, even at the cost of losing some information. In practice, PCA is often used to reduce the dimensionality of data for visualization purposes. In the next part of ex7_pca.m, the script applies our implementation of PCA to the 3-dimensional data to reduce it to 2 dimensions and visualizes the result in a 2D scatter plot, as shown in Figure 11. The PCA projection can be thought of as a rotation that selects the view that maximizes the spread of the data, which often corresponds to the "best" view.


Figure 11. 2D visualization produced using PCA

3. MATLAB Implementation

3.1 ex7.m

%% Machine Learning Online Class
%  Exercise 7 | Principle Component Analysis and K-Means Clustering
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  exercise. You will need to complete the following functions:
%
%     pca.m
%     projectData.m
%     recoverData.m
%     computeCentroids.m
%     findClosestCentroids.m
%     kMeansInitCentroids.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% ================= Part 1: Find Closest Centroids ====================
%  To help you implement K-Means, we have divided the learning algorithm 
%  into two functions -- findClosestCentroids and computeCentroids. In this
%  part, you should complete the code in the findClosestCentroids function. 
%
fprintf('Finding closest centroids.\n\n');

% Load an example dataset that we will be using
load('ex7data2.mat');

% Select an initial set of centroids
K = 3; % 3 Centroids
initial_centroids = [3 3; 6 2; 8 5];

% Find the closest centroids for the examples using the
% initial_centroids
idx = findClosestCentroids(X, initial_centroids);  % findClosestCentroids() returns the index of the closest centroid for every example

fprintf('Closest centroids for the first 3 examples: \n')
fprintf(' %d', idx(1:3));
fprintf('\n(the closest centroids should be 1, 3, 2 respectively)\n');

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ===================== Part 2: Compute Means =========================
%  After implementing the closest centroids function, you should now
%  complete the computeCentroids function.
%
fprintf('\nComputing centroids means.\n\n');

%  Compute means based on the closest centroids found in the previous part.
centroids = computeCentroids(X, idx, K);  % computeCentroids() returns the new centroids, computed as the mean of the data points assigned to each centroid

fprintf('Centroids computed after initial finding of closest centroids: \n')
fprintf(' %f %f \n' , centroids');
fprintf('\n(the centroids should be\n');
fprintf('   [ 2.428301 3.157924 ]\n');
fprintf('   [ 5.813503 2.633656 ]\n');
fprintf('   [ 7.119387 3.616684 ]\n\n');

fprintf('Program paused. Press enter to continue.\n');
pause;


%% =================== Part 3: K-Means Clustering ======================
%  After you have completed the two functions computeCentroids and
%  findClosestCentroids, you have all the necessary pieces to run the
%  kMeans algorithm. In this part, you will run the K-Means algorithm on
%  the example dataset we have provided. 
%
fprintf('\nRunning K-Means clustering on example dataset.\n\n');

% Load an example dataset
load('ex7data2.mat');

% Settings for running K-Means
K = 3;
max_iters = 10;

% For consistency, here we set centroids to specific values
% but in practice you want to generate them automatically, such as by
% settings them to be random examples (as can be seen in
% kMeansInitCentroids).
initial_centroids = [3 3; 6 2; 8 5];

% Run K-Means algorithm. The 'true' at the end tells our function to plot the progress of K-Means
[centroids, idx] = runkMeans(X, initial_centroids, max_iters, true);  % runkMeans() runs the K-Means algorithm on the data matrix X, where each row of X is a single example
fprintf('\nK-Means Done.\n\n');

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============= Part 4: K-Means Clustering on Pixels ===============
%  In this exercise, you will use K-Means to compress an image. To do this,
%  you will first run K-Means on the colors of the pixels in the image and
%  then you will map each pixel onto its closest centroid.
%  
%  You should now complete the code in kMeansInitCentroids.m
%

fprintf('\nRunning K-Means clustering on pixels from an image.\n\n');

%  Load an image of a bird
A = double(imread('bird_small.png'));  % imread returns uint8 pixel data; convert it to double so it can be used in numeric computations

% If imread does not work for you, you can try instead
%   load ('bird_small.mat');

A = A / 255; % Divide by 255 so that all values are in the range 0 - 1

% Size of the image
img_size = size(A);

% Reshape the image into an Nx3 matrix where N = number of pixels.
% Each row will contain the Red, Green and Blue pixel values
% This gives us our dataset matrix X that we will use K-Means on.
X = reshape(A, img_size(1) * img_size(2), 3);  % X is 16384x3: one row of RGB values per pixel

% Run your K-Means algorithm on this data
% You should try different values of K and max_iters here
K = 16; 
max_iters = 10;

% When using K-Means, it is important the initialize the centroids randomly. 
% You should complete the code in kMeansInitCentroids.m before proceeding
initial_centroids = kMeansInitCentroids(X, K);

% Run K-Means
[centroids, idx] = runkMeans(X, initial_centroids, max_iters);

fprintf('Program paused. Press enter to continue.\n');
pause;


%% ================= Part 5: Image Compression ======================
%  In this part of the exercise, you will use the clusters of K-Means to
%  compress an image. To do this, we first find the closest clusters for
%  each example. After that, we 

fprintf('\nApplying K-Means to compress an image.\n\n');

% Find closest cluster members
idx = findClosestCentroids(X, centroids);

% Essentially, now we have represented the image X as in terms of the indices in idx. 

% We can now recover the image from the indices (idx) by mapping each pixel
% (specified by its index in idx) to the centroid value
X_recovered = centroids(idx,:);

% Reshape the recovered image into proper dimensions
X_recovered = reshape(X_recovered, img_size(1), img_size(2), 3);

% Display the original image 
subplot(1, 2, 1);
imagesc(A); % imagesc(A) displays the matrix A as an image, mapping element values to colors at the corresponding axes positions
title('Original');

% Display compressed image side by side
subplot(1, 2, 2);
imagesc(X_recovered)
title(sprintf('Compressed, with %d colors.', K));

3.2 ex7_pca.m

%% Machine Learning Online Class
%  Exercise 7 | Principle Component Analysis and K-Means Clustering
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  exercise. You will need to complete the following functions:
%
%     pca.m
%     projectData.m
%     recoverData.m
%     computeCentroids.m
%     findClosestCentroids.m
%     kMeansInitCentroids.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% ================== Part 1: Load Example Dataset  ===================
%  We start this exercise by using a small dataset that is easily to
%  visualize
%
fprintf('Visualizing example dataset for PCA.\n\n');

%  The following command loads the dataset. You should now have the 
%  variable X in your environment
load ('ex7data1.mat');

%  Visualize the example dataset
plot(X(:, 1), X(:, 2), 'bo');
axis([0.5 6.5 2 8]); axis square;

fprintf('Program paused. Press enter to continue.\n');
pause;


%% =============== Part 2: Principal Component Analysis ===============
%  You should now implement PCA, a dimension reduction technique. You
%  should complete the code in pca.m
%
fprintf('\nRunning PCA on example dataset.\n\n');

%  Before running PCA, it is important to first normalize X
[X_norm, mu, sigma] = featureNormalize(X);

%  Run PCA
[U, S] = pca(X_norm);  % pca() runs principal component analysis on the dataset

%  Compute mu, the mean of the each feature

%  Draw the eigenvectors centered at mean of data. These lines show the
%  directions of maximum variations in the dataset.
hold on;
drawLine(mu, mu + 1.5 * S(1,1) * U(:,1)', '-k', 'LineWidth', 2);
drawLine(mu, mu + 1.5 * S(2,2) * U(:,2)', '-k', 'LineWidth', 2);
hold off;

fprintf('Top eigenvector: \n');
fprintf(' U(:,1) = %f %f \n', U(1,1), U(2,1));
fprintf('\n(you should expect to see -0.707107 -0.707107)\n');

fprintf('Program paused. Press enter to continue.\n');
pause;


%% =================== Part 3: Dimension Reduction ===================
%  You should now implement the projection step to map the data onto the 
%  first k eigenvectors. The code will then plot the data in this reduced 
%  dimensional space.  This will show you what the data looks like when 
%  using only the corresponding eigenvectors to reconstruct it.
%
%  You should complete the code in projectData.m
%
fprintf('\nDimension reduction on example dataset.\n\n');

%  Plot the normalized dataset (returned from pca)
plot(X_norm(:, 1), X_norm(:, 2), 'bo');
axis([-4 3 -4 3]); axis square

%  Project the data onto K = 1 dimension
K = 1;
Z = projectData(X_norm, U, K);
fprintf('Projection of the first example: %f\n', Z(1));
fprintf('\n(this value should be about 1.481274)\n\n');

X_rec  = recoverData(Z, U, K);
fprintf('Approximation of the first example: %f %f\n', X_rec(1, 1), X_rec(1, 2));
fprintf('\n(this value should be about  -1.047419 -1.047419)\n\n');

%  Draw lines connecting the projected points to the original points
hold on;
plot(X_rec(:, 1), X_rec(:, 2), 'ro');
for i = 1:size(X_norm, 1)
    drawLine(X_norm(i,:), X_rec(i,:), '--k', 'LineWidth', 1);
end
hold off

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =============== Part 4: Loading and Visualizing Face Data =============
%  We start the exercise by first loading and visualizing the dataset.
%  The following code will load the dataset into your environment
%
fprintf('\nLoading face dataset.\n\n');

%  Load Face dataset
load ('ex7faces.mat')

%  Display the first 100 faces in the dataset
displayData(X(1:100, :));

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =========== Part 5: PCA on Face Data: Eigenfaces  ===================
%  Run PCA and visualize the eigenvectors which are in this case eigenfaces
%  We display the first 36 eigenfaces.
%
fprintf(['\nRunning PCA on face dataset.\n' ...
         '(this might take a minute or two ...)\n\n']);

%  Before running PCA, it is important to first normalize X by subtracting 
%  the mean value from each feature
[X_norm, mu, sigma] = featureNormalize(X);

%  Run PCA
[U, S] = pca(X_norm);

%  Visualize the top 36 eigenvectors found
displayData(U(:, 1:36)');  % display the top 36 eigenvectors

fprintf('Program paused. Press enter to continue.\n');
pause;


%% ============= Part 6: Dimension Reduction for Faces =================
%  Project images to the eigen space using the top k eigenvectors 
%  If you are applying a machine learning algorithm 
fprintf('\nDimension reduction for face dataset.\n\n');

K = 100;  % project the face dataset onto the top 100 principal components
Z = projectData(X_norm, U, K);

fprintf('The projected data Z has a size of: ')
fprintf('%d ', size(Z));

fprintf('\n\nProgram paused. Press enter to continue.\n');
pause;

%% ==== Part 7: Visualization of Faces after PCA Dimension Reduction ====
%  Project images to the eigen space using the top K eigen vectors and 
%  visualize only using those K dimensions
%  Compare to the original input, which is also displayed

fprintf('\nVisualizing the projected (reduced dimension) faces.\n\n');

K = 100;
X_rec  = recoverData(Z, U, K);

% Display normalized data
subplot(1, 2, 1);
displayData(X_norm(1:100,:));
title('Original faces');
axis square;

% Display reconstructed data from only k eigenfaces
subplot(1, 2, 2);
displayData(X_rec(1:100,:));
title('Recovered faces');
axis square;

fprintf('Program paused. Press enter to continue.\n');
pause;


%% === Part 8(a): Optional (ungraded) Exercise: PCA for Visualization ===
%  One useful application of PCA is to use it to visualize high-dimensional
%  data. In the last K-Means exercise you ran K-Means on 3-dimensional 
%  pixel colors of an image. We first visualize this output in 3D, and then
%  apply PCA to obtain a visualization in 2D.

close all; close all; clc

% Reload the image from the previous exercise and run K-Means on it
% For this to work, you need to complete the K-Means assignment first
A = double(imread('bird_small.png'));

% If imread does not work for you, you can try instead
%   load ('bird_small.mat');

A = A / 255;
img_size = size(A);
X = reshape(A, img_size(1) * img_size(2), 3);
K = 16; 
max_iters = 10;
initial_centroids = kMeansInitCentroids(X, K);
[centroids, idx] = runkMeans(X, initial_centroids, max_iters);

%  Sample 1000 random indexes (since working with all the data is
%  too expensive. If you have a fast computer, you may increase this.
sel = floor(rand(1000, 1) * size(X, 1)) + 1;

%  Setup Color Palette
palette = hsv(K);  % hsv(m) returns a colormap with m colors; palette is 16x3
colors = palette(idx(sel), :);

%  Visualize the data and centroid memberships in 3D
figure;
scatter3(X(sel, 1), X(sel, 2), X(sel, 3), 10, colors);  % scatter3 draws a 3D scatter plot
title('Pixel dataset plotted in 3D. Color shows centroid memberships');
fprintf('Program paused. Press enter to continue.\n');
pause;

%% === Part 8(b): Optional (ungraded) Exercise: PCA for Visualization ===
% Use PCA to project this cloud to 2D for visualization

% Subtract the mean to use PCA
[X_norm, mu, sigma] = featureNormalize(X);

% PCA and project the data to 2D
[U, S] = pca(X_norm);
Z = projectData(X_norm, U, 2);  % projectData() projects the data onto the top 2 principal components

% Plot in 2D
figure;
plotDataPoints(Z(sel, :), idx(sel), K);
title('Pixel dataset plotted in 2D, using PCA for dimensionality reduction');

4. Python Implementation

4.1 ex7.py

import numpy as np
import matplotlib.pylab as plt
import scipy.io as sio
import matplotlib.colors as pltcolor


# ================= Part 1: Find Closest Centroids ====================
# Find the closest centroid for each example
def findCloseCenter(x, center):
    k, l = np.shape(center)
    xtemp = np.tile(x, k)  # np.tile(x, k) repeats x horizontally k times, giving shape (m, n*k)
    centertemp = center.flatten()
    xtemp = np.power(xtemp-centertemp, 2)
    xt = np.zeros((np.size(xtemp, 0), k))
    for i in range(k):
        for j in range(l):
            xt[:, i] = xt[:, i]+xtemp[:, i*l+j]
    idx = np.argmin(xt, 1)+1
    return idx


print('Finding closest centroids.')
datainfo = sio.loadmat('ex7data2.mat')
X = datainfo['X']

K = 3
init_center = np.array([[3, 3], [6, 2], [8, 5]])
idX = findCloseCenter(X, init_center)
print('Closest centroids for the first 3 examples: ', idX[0:3])
print('(the closest centroids should be 1, 3, 2 respectively)')
_ = input('Press [Enter] to continue.')

# ===================== Part 2: Compute Means =========================
# Recompute the centroids as the mean of their assigned examples
def computeCenter(x, idx, k):
    m, n = np.shape(x)
    center = np.zeros((k, n))
    for i in range(k):
        pos = np.where(idx == i+1)
        center[i, :] = np.sum(x[pos], 0)/np.size(x[pos], 0)
    return center

print('Computing centroids means.')
center = computeCenter(X, idX, K)
print('Centroids computed after initial finding of closest centroids: ')
print(center)
print('the centroids should be: ')
print('[[ 2.428301 3.157924 ], [ 5.813503 2.633656 ], [ 7.119387 3.616684 ]]')
_ = input('Press [Enter] to continue.')

# =================== Part 3: K-Means Clustering ======================
# Draw a line between two points
def drawLine(p1, p2):
    x = np.array([p1[0], p2[0]])
    y = np.array([p1[1], p2[1]])
    plt.plot(x, y)

# Plot the data points, colored by cluster assignment
def plotDataPoints(x, idx, k):
    colors = ['red', 'green', 'blue']
    plt.scatter(x[:, 0], x[:, 1], c=idx, cmap=pltcolor.ListedColormap(colors), s=40)

# Plot the centroids and the progress of the algorithm
def plotProgresskMeans(x, center, previous, idx, k, i):
    plotDataPoints(x, idx, k)
    plt.plot(center[:, 0], center[:, 1], 'x', ms=10, mew=1)
    for j in range(np.size(center, 0)):
        drawLine(center[j, :], previous[j, :])
    plt.title('Iteration number %d' % (i+1))



# Run K-means clustering
def runkMeans(x, init_center, max_iter, plot_progress=False):
    m, n = np.shape(x)
    k = np.size(init_center, 0)
    center = init_center
    previous_center = center
    idx = np.zeros((m,))

    if plot_progress:
        plt.ion()
        fig = plt.figure()

    for i in range(max_iter):
        print('K-Means iteration %d/%d...' % (i+1, max_iter))
        idx = findCloseCenter(x, center)
        if plot_progress:
            plotProgresskMeans(x, center, previous_center, idx, k, i)
            previous_center = center
            fig.canvas.draw()
            _ = input('Press [Enter] to continue.')
        center = computeCenter(x, idx, k)
    if plot_progress:
        plt.show(block=True)
        plt.ioff()
    return center, idx

print('Running K-Means clustering on example dataset.')
max_iter = 10
K = 3
init_center = np.array([[3, 3], [6, 2], [8, 5]])
center, idx = runkMeans(X, init_center, max_iter, True)
print('K-Means Done.')
_ = input('Press [Enter] to continue.')

# ============= Part 4: K-Means Clustering on Pixels ===============
# Randomly initialize the centroids by picking K random examples
def kMeansInitCenter(x, k):
    randidx = np.random.permutation(np.size(x, 0))
    center = x[randidx[0: k], :]
    return center

print('Running K-Means clustering on pixels from an image.')
A = plt.imread('bird_small.png')
m, n, l = np.shape(A)
A_x = np.reshape(A, (m*n, l))
A_k = 16
A_max_iter = 10
A_init_center = kMeansInitCenter(A_x, A_k)
A_center, A_idx = runkMeans(A_x, A_init_center, A_max_iter)
print(A_center)
_ = input('Press [Enter] to continue.')

# ================= Part 5: Image Compression ======================
print('Applying K-Means to compress an image.')
A_idx = findCloseCenter(A_x, A_center)
X_recovered = A_center[A_idx-1, :]
X_back = X_recovered.reshape(m, n, l)
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.imshow(A)
ax1.set_title('Original')
ax2 = fig.add_subplot(122)
ax2.imshow(X_back)
ax2.set_title('Compressed, with %d colors.' % A_k)
plt.show(block=False)

4.2 ex7_pca.py

import numpy as np
import matplotlib.pylab as plt
import scipy.io as sio
import numpy.linalg as la
import math
import matplotlib.cm as cm
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.colors as pltcolor

# ================== Part 1: Load Example Dataset  ===================
print('Visualizing example dataset for PCA.')
datainfo = sio.loadmat('ex7data1.mat')
X = datainfo['X']
plt.plot(X[:, 0], X[:, 1], 'bo')
plt.axis([0.5, 6.5, 2, 8])
plt.axis('equal')
_ = input('Press [Enter] to continue.')

# =============== Part 2: Principal Component Analysis ===============
# Feature normalization: subtract the mean and divide by the standard deviation
def featureNormalize(x):
    mu = np.mean(x, 0)
    sigma = np.std(x, 0, ddof=1)
    x_norm = (x-mu)/sigma
    return x_norm, mu, sigma

# PCA: covariance matrix followed by SVD
def pca(x):
    m, n = np.shape(x)
    sigma = 1/m*x.T.dot(x)
    u, s, _ = la.svd(sigma)
    return u, s

# Draw a line between two points
def drawLine(p1, p2, lc='k-', lwidth=2):
    x = np.array([p1[0], p2[0]])
    y = np.array([p1[1], p2[1]])
    plt.plot(x, y, lc, lw=lwidth)

print('Running PCA on example dataset.')
x_norm, mu, sigma = featureNormalize(X)
u, s = pca(x_norm)
drawLine(mu, mu+1.5*s[0]*u[:, 0])
drawLine(mu, mu+1.5*s[1]*u[:, 1])
plt.show()

print('Top eigenvector: ')
print(' U(:,1) = %f %f ' %(u[0, 0], u[1, 0]))
print('(you should expect to see -0.707107 -0.707107)')
_ = input('Press [Enter] to continue.')

# =================== Part 3: Dimension Reduction ===================
# Project the data onto the top k principal components
def projectData(x, u, k):
    z = x.dot(u[:, 0:k])
    return z

# Recover an approximation of the original data from the projection
def recoverData(z, u, k):
    x_rec = np.asmatrix(z).dot(u[:, 0:k].T)
    return np.asarray(x_rec)

print('Dimension reduction on example dataset.')
plt.plot(x_norm[:, 0], x_norm[:, 1], 'bo')
plt.axis([-4, 3, -4, 3])
plt.axis('equal')

k = 1
z = projectData(x_norm, u, k)
print('Projection of the first example: ', z[0])
print('(this value should be about 1.481274)')
x_rec = recoverData(z, u, k)
print('Approximation of the first example: %f %f' % (x_rec[0, 0], x_rec[0, 1]))
plt.plot(x_rec[:, 0], x_rec[:, 1], 'ro')
for i in range(np.size(x_norm, 0)):
    drawLine(x_norm[i, :], x_rec[i, :], 'k--', 1)
plt.show()
_ = input('Press [Enter] to continue.')

# =============== Part 4: Loading and Visualizing Face Data =============
# Display the data as a grid of grayscale images
def displayData(x):
    width = round(math.sqrt(np.size(x, 1)))
    m, n = np.shape(x)
    height = int(n/width)
    # number of images per row and column of the display grid
    drows = math.floor(math.sqrt(m))
    dcols = math.ceil(m/drows)

    pad = 1
    # set up a blank canvas to place the images on
    darray = -1*np.ones((pad+drows*(height+pad), pad+dcols*(width+pad)))

    curr_ex = 0
    for j in range(drows):
        for i in range(dcols):
            if curr_ex >= m:
                break
            max_val = np.max(np.abs(x[curr_ex, :]))
            darray[pad+j*(height+pad):pad+j*(height+pad)+height, pad+i*(width+pad):pad+i*(width+pad)+width]\
                = x[curr_ex, :].reshape((height, width))/max_val
            curr_ex += 1
        if curr_ex >= m:
            break

    plt.imshow(darray.T, cmap='gray')


print('Loading face dataset.')
datainfo = sio.loadmat('ex7faces.mat')
X = datainfo['X']
displayData(X[0:100, :])
plt.show()
_ = input('Press [Enter] to continue.')

# =========== Part 5: PCA on Face Data: Eigenfaces  ===================
print('Running PCA on face dataset\n(this might take a minute or two ...)')
x_norm, mu, sigma = featureNormalize(X)
u, s = pca(x_norm)
displayData(u[:, 0:36].T)
plt.show()
_ = input('Press [Enter] to continue.')

# ============= Part 6: Dimension Reduction for Faces =================
print('Dimension reduction for face dataset.')
K = 100
Z = projectData(x_norm, u, K)
print('the project data Z has a size of ', np.shape(Z))
_ = input('Press [Enter] to continue.')

# ==== Part 7: Visualization of Faces after PCA Dimension Reduction ====
print('Visualizing the projected (reduced dimension) faces.')
X_rec = recoverData(Z, u, K)
fig = plt.figure()
plt.subplot(121)
displayData(x_norm[0:100, :])
plt.title('Original faces')
plt.subplot(122)
displayData(X_rec[0:100, :])
plt.title('Recovered faces')
plt.show()
_ = input('Press [Enter] to continue.')

# === Part 8(a): Optional (ungraded) Exercise: PCA for Visualization ===
# Randomly initialize the centroids by picking K random examples
def kMeansInitCenter(x, k):
    randidx = np.random.permutation(np.size(x, 0))
    center = x[randidx[0: k], :]
    return center

# Find the closest centroid for each example
def findCloseCenter(x, center):
    k, l = np.shape(center)
    xtemp = np.tile(x, k)
    centertemp = center.flatten()
    xtemp = np.power(xtemp-centertemp, 2)
    xt = np.zeros((np.size(xtemp, 0), k))
    for i in range(k):
        for j in range(l):
            xt[:, i] = xt[:, i]+xtemp[:, i*l+j]
    idx = np.argmin(xt, 1)+1
    return idx

# Recompute the centroids as the mean of their assigned examples
def computeCenter(x, idx, k):
    m, n = np.shape(x)
    center = np.zeros((k, n))
    for i in range(k):
        pos = np.where(idx == i+1)
        center[i, :] = np.sum(x[pos], 0)/np.size(x[pos], 0)
    return center

# Run K-means clustering
def runkMeans(x, init_center, max_iter):
    m, n = np.shape(x)
    k = np.size(init_center, 0)
    center = init_center
    idx = np.zeros((m,))

    for i in range(max_iter):
        idx = findCloseCenter(x, center)
        center = computeCenter(x, idx, k)
    return center, idx

A = plt.imread('bird_small.png')
img_size = np.shape(A)
X = A.reshape(img_size[0]*img_size[1], img_size[2])
K = 16
max_iter = 10
init_center = kMeansInitCenter(X, K)
center, idx = runkMeans(X, init_center, max_iter)

sel = np.floor(np.random.random((1000,)) * np.size(X, 0)).astype(int)  # 1000 random pixel indices (0-based, so no +1 as in the MATLAB version)
colors = cm.rainbow(np.linspace(0, 1, K))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[sel, 0], X[sel, 1], X[sel, 2], c=idx[sel], cmap=pltcolor.ListedColormap(colors), marker='o')
ax.set_title('Pixel dataset plotted in 3D. Color shows centroid memberships')
plt.show()
_ = input('Press [Enter] to continue.')

# === Part 8(b): Optional (ungraded) Exercise: PCA for Visualization ===
X_norm, mu, sigma = featureNormalize(X)
u, s = pca(X_norm)
Z = projectData(X_norm, u, 2)

colors = cm.rainbow(np.linspace(0, 1, K))
plt.scatter(Z[sel, 0], Z[sel, 1], c=idx[sel], cmap=pltcolor.ListedColormap(colors), marker='o')
plt.title('Pixel dataset plotted in 2D, using PCA for dimensionality reduction')
plt.show()
