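To study more effectively and review what I have learned, I summarize the key points of each lecture and post an update after finishing each assignment.
Unofficial English notes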
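Summary
1. Unsupervised learning: introduction
(1) Clustering (learning from unlabeled data)
(2) Unsupervised learning: the training set has no labels y, so the algorithm has to find structure in the data on its own
(3) Applications of clustering
a. Market segmentation
b. Social network analysis
c. Organizing computing clusters
d. Astronomical data analysis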
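2. The K-means algorithm
(1) Example: group an unlabeled data set into two clusters
(2) Randomly choose two points as the cluster centroids
a. Choose as many cluster centroids as the number of clusters you want
(3) Cluster assignment step
a. Go through every point, check whether it is closer to the red or the blue centroid, and assign it to the closer one
(4) Move centroid step
a. Move each centroid to the mean of the data points assigned to it
b. Repeat (3) and (4) until convergence, i.e. until the assignments and the centroids stop changing
(5) The K-means algorithm in general
a. Randomly initialize K cluster centroids μ1, μ2, ..., μK
b. Repeat: for each example x(i), set c(i) to the index of its closest centroid; then, for each k = 1, ..., K, set μk to the mean of the points assigned to cluster k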
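The loop above can be sketched in plain Python as follows. This is a minimal illustration, not the course's code: the names `closest_centroids`, `move_centroids` and `run_kmeans` are hypothetical, cluster indices are 0-based instead of MATLAB's 1-based, and letting an empty cluster keep its old centroid is an assumption of this sketch.

```python
def closest_centroids(X, centroids):
    """Cluster assignment step: 0-based index of the nearest centroid for each example."""
    idx = []
    for x in X:
        # squared distances have the same minimizer as Euclidean distances
        dists = [sum((p - q) ** 2 for p, q in zip(x, c)) for c in centroids]
        idx.append(dists.index(min(dists)))
    return idx

def move_centroids(X, idx, centroids):
    """Move centroid step: each centroid becomes the mean of its assigned examples."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [x for x, c in zip(X, idx) if c == k]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))  # assumption: an empty cluster stays put
    return new_centroids

def run_kmeans(X, initial_centroids, max_iters=10):
    """Alternate the two steps until the centroids stop moving or max_iters is reached."""
    centroids = [list(c) for c in initial_centroids]
    idx = [0] * len(X)
    for _ in range(max_iters):
        idx = closest_centroids(X, centroids)
        new_centroids = move_centroids(X, idx, centroids)
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, idx

X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, idx = run_kmeans(X, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(idx)  # [0, 0, 0, 1, 1, 1]
```

Stopping when the centroids no longer move is one common convergence test; a fixed iteration count, as in the assignment code, works too.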
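3. The K-means optimization objective
(1) Notation: c(i) is the index of the cluster (1, 2, ..., K) to which example x(i) is currently assigned; μk is cluster centroid k; μc(i) is the centroid of the cluster to which x(i) has been assigned
(2) Optimization objective: choose c(1), ..., c(m) and μ1, ..., μK to minimize the distortion J = (1/m) Σ_{i=1..m} ||x(i) - μc(i)||². The cluster assignment step minimizes J over the c(i) with the centroids fixed, and the move centroid step minimizes J over the μk with the assignments fixed, so J never increases from one iteration to the next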
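The distortion J can be computed straight from its definition. A minimal Python sketch, assuming `idx` holds 0-based cluster indices; the function name `distortion` is hypothetical.

```python
def distortion(X, idx, centroids):
    """K-means cost J = (1/m) * sum over i of ||x(i) - mu_c(i)||^2 (idx is 0-based)."""
    m = len(X)
    total = 0.0
    for x, k in zip(X, idx):
        # squared distance from example x to the centroid of its assigned cluster
        total += sum((p - q) ** 2 for p, q in zip(x, centroids[k]))
    return total / m

X = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
J = distortion(X, idx=[0, 0, 1], centroids=[(1.0, 0.0), (10.0, 0.0)])
print(J)
```

Here the first two points are each at distance 1 from their centroid and the third sits on its centroid, so J = (1 + 1 + 0) / 3.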
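4. Choosing the number of clusters K
(1) Elbow method: run K-means for a range of K, plot the cost J against K, and pick the K at the "elbow" where J stops dropping quickly (in practice the curve often has no clear elbow)
(2) Choose K according to the downstream purpose, e.g. how many T-shirt sizes to offer when clustering customers by height and weight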
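To draw the elbow curve, run K-means for each candidate K and record the final cost. The sketch below is a self-contained Python illustration under these assumptions: the initial centroids are K random training examples, each K gets 10 random restarts with the lowest J kept (a hypothetical choice), the function names are hypothetical, and the three-blob data set is made up for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans_cost(X, K, rng, max_iters=20):
    """One K-means run from K random training examples; returns the final distortion J."""
    centroids = [list(x) for x in rng.sample(X, K)]
    for _ in range(max_iters):
        # cluster assignment step
        idx = [min(range(K), key=lambda k: sq_dist(x, centroids[k])) for x in X]
        # move centroid step (an empty cluster keeps its old centroid)
        for k in range(K):
            members = [x for x, c in zip(X, idx) if c == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    idx = [min(range(K), key=lambda k: sq_dist(x, centroids[k])) for x in X]
    return sum(sq_dist(x, centroids[k]) for x, k in zip(X, idx)) / len(X)

def elbow_curve(X, max_k, restarts=10, seed=0):
    """Lowest J over several random restarts, for each K = 1..max_k."""
    rng = random.Random(seed)
    return {K: min(kmeans_cost(X, K, rng) for _ in range(restarts))
            for K in range(1, max_k + 1)}

# made-up toy data: three well-separated blobs
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
costs = elbow_curve(X, max_k=5)
for K, J in costs.items():
    print(K, round(J, 3))
```

Plotting `costs` against K should show a steep drop up to K = 3 and a much flatter tail after it for this toy data; real data usually gives a smoother curve.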
Assignment
1. Finding the closest centroids
load('ex7data2.mat');
K = 3;
initial_centroids = [3 3; 6 2; 8 5];
idx = findClosestCentroids(X, initial_centroids);
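% findClosestCentroids function (findClosestCentroids.m)
function idx = findClosestCentroids(X, centroids)
% Returns, for every example (row) of X, the index of its closest centroid
K = size(centroids, 1);
m = size(X, 1);
idx = zeros(m, 1);
distance = zeros(m, K);   % distance(i, j): distance from example i to centroid j
for i = 1:m
  for j = 1:K
    distance(i, j) = sqrt(sum((X(i, :) - centroids(j, :)) .^ 2));
  end   % pdist2 (Statistics Toolbox) could compute all distances at once
  [~, idx(i)] = min(distance(i, :));   % keep only the index of the minimum
end
end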
2. Computing the centroid means
centroids = computeCentroids(X, idx, K);
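The body of `computeCentroids` is not shown above. One way to write the move centroid step, sketched in Python: the name `compute_centroids` is hypothetical, `idx` is 0-based here, and giving an empty cluster the zero vector is an arbitrary choice of this sketch (in practice an empty cluster is usually removed or re-initialized).

```python
def compute_centroids(X, idx, K):
    """Move centroid step: centroid k is the mean of the examples with idx == k."""
    n = len(X[0])
    sums = [[0.0] * n for _ in range(K)]
    counts = [0] * K
    for x, k in zip(X, idx):
        counts[k] += 1
        for j in range(n):
            sums[k][j] += x[j]
    # hypothetical choice: an empty cluster gets the zero vector
    return [[s / counts[k] for s in sums[k]] if counts[k] else [0.0] * n
            for k in range(K)]

X = [(1.0, 1.0), (3.0, 3.0), (10.0, 0.0)]
centroids = compute_centroids(X, idx=[0, 0, 1], K=3)
print(centroids)
```

In Octave the same step is commonly a loop over k that sets `centroids(k, :) = mean(X(idx == k, :), 1);`.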
3. Running K-means
load('ex7data2.mat');
% Settings for running K-Means
K = 3;
max_iters = 10;
initial_centroids = [3 3; 6 2; 8 5];
[centroids, idx] = runkMeans(X, initial_centroids, max_iters, true);
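% runkMeans function (runkMeans.m)
function [centroids, idx] = runkMeans(X, initial_centroids, max_iters, plot_progress)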
if ~exist('plot_progress', 'var') || isempty(plot_progress)
plot_progress = false;
end
if plot_progress
figure;
hold on;
end
[m n] = size(X);
K = size(initial_centroids, 1);
centroids = initial_centroids;
previous_centroids = centroids;
idx = zeros(m, 1);
for i=1:max_iters
fprintf('K-Means iteration %d/%d...\n', i, max_iters);
if exist('OCTAVE_VERSION')
fflush(stdout);
end
% Cluster assignment step
idx = findClosestCentroids(X, centroids);
if plot_progress
plotProgresskMeans(X, centroids, previous_centroids, idx, K, i);
previous_centroids = centroids;
fprintf('Press enter to continue.\n');
pause;
end
% Move centroid step
centroids = computeCentroids(X, idx, K);
end
if plot_progress
hold off;
end
end
4. K-means clustering on image pixels
A = double(imread('bird_small.png'));
A = A / 255;
img_size = size(A);
X = reshape(A, img_size(1) * img_size(2), 3);
K = 16;
max_iters = 10;
initial_centroids = kMeansInitCentroids(X, K);
[centroids, idx] = runkMeans(X, initial_centroids, max_iters);
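% kMeansInitCentroids function (kMeansInitCentroids.m)
function centroids = kMeansInitCentroids(X, K)
% Use K distinct, randomly chosen examples of X as the initial centroids
randidx = randperm(size(X, 1));   % random permutation of the example indices
centroids = X(randidx(1:K), :);   % take the examples at the first K indices
end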
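The same random initialization in Python, as a sketch: shuffling the indices plays the role of `randperm`, and the first K shuffled examples become the centroids. The function name `kmeans_init_centroids` and its `seed` parameter are hypothetical.

```python
import random

def kmeans_init_centroids(X, K, seed=None):
    """Return K distinct examples of X, chosen at random, as initial centroids."""
    rng = random.Random(seed)
    perm = list(range(len(X)))
    rng.shuffle(perm)                       # plays the role of randperm(size(X, 1))
    return [list(X[i]) for i in perm[:K]]   # like X(randidx(1:K), :)

X = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.1), (0.3, 0.8), (0.7, 0.4)]
initial_centroids = kmeans_init_centroids(X, K=3, seed=42)
print(initial_centroids)
```

Because different initializations can end in different local optima, it is common to repeat the initialization and K-means several times and keep the run with the lowest cost J.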
5. Image compression
% Assign each pixel to its closest centroid, then replace it with that centroid's color
idx = findClosestCentroids(X, centroids);
X_recovered = centroids(idx,:);
X_recovered = reshape(X_recovered, img_size(1), img_size(2), 3);
subplot(1, 2, 1);
imagesc(A);
title('Original');
subplot(1, 2, 2);
imagesc(X_recovered)
title(sprintf('Compressed, with %d colors.', K));
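A quick size calculation shows what the 16-color result saves. This sketch assumes a 128×128 image with 24-bit RGB color (8 bits per channel), and that the compressed form stores the 16-color palette plus a 4-bit palette index per pixel; these figures are assumptions, not read from the file, and the function names are hypothetical.

```python
import math

def original_bits(height, width, bits_per_pixel=24):
    """Size of the uncompressed image: every pixel stores its full RGB color."""
    return height * width * bits_per_pixel

def compressed_bits(height, width, K, bits_per_color=24):
    """Palette of K colors plus one ceil(log2 K)-bit palette index per pixel."""
    index_bits = math.ceil(math.log2(K))
    return K * bits_per_color + height * width * index_bits

orig = original_bits(128, 128)          # assumed 128 x 128 image, 24-bit color
comp = compressed_bits(128, 128, K=16)  # 16 colors -> 4 bits per pixel
print(orig, comp, round(orig / comp, 2))
```

Under these assumptions the compressed representation is about six times smaller than the original.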