Now we will see how to apply the K-Means algorithm through three examples.
Consider a set of data with only one feature, i.e. one-dimensional data.
For example, take our T-shirt problem, where only people's heights are used to decide the T-shirt size.
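K-Means alternates two steps: assign every sample to its nearest center, then move every center to the mean of its samples. The sketch below runs that loop (Lloyd's algorithm) on one-feature data in plain NumPy, so it needs no OpenCV. The helper name `kmeans_1d`, the random choice of starting centers and the sample heights are illustrative assumptions. The `max_iter`/`epsilon` stop only roughly mirrors the `TERM_CRITERIA_MAX_ITER`/`TERM_CRITERIA_EPS` criteria passed to `lmc_cv.kmeans` in the code that follows; OpenCV's own implementation differs in details such as initialisation and repeated attempts.

```python
import numpy as np


def kmeans_1d(values, k, max_iter=10, epsilon=1.0, seed=0):
    """Plain Lloyd's k-means on 1-D data (illustrative sketch, not OpenCV's code)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=np.float32)
    # start from k samples chosen at random (assumed initialisation)
    centers = rng.choice(values, size=k, replace=False)
    for _ in range(max_iter):
        # assignment step: each value joins its nearest center
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # update step: each center moves to the mean of its members
        # (an empty cluster keeps its old center)
        new_centers = np.array(
            [values[labels == j].mean() if np.any(labels == j) else centers[j]
             for j in range(k)],
            dtype=np.float32)
        shift = np.max(np.abs(new_centers - centers))
        centers = new_centers
        # stop early once no center moved by epsilon or more
        if shift < epsilon:
            break
    # final assignment against the final centers
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels, centers
```

Called as `kmeans_1d([150, 152, 155, 158, 160, 180, 182, 185, 188, 190], 2)`, it should separate the five shorter heights from the five taller ones, ending with centers at 155 and 185.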
```python
import numpy as np
import cv2 as lmc_cv  # assumption: lmc_cv is the author's alias for OpenCV's cv2 module
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """
    0: Data with Only One Feature with K-Means Clustering in OpenCV.
    """
    if 0 == method:
        # two groups of 25 random values, stacked into one 50x1 float32 column
        x = np.random.randint(25, 100, 25)
        y = np.random.randint(175, 255, 25)
        z = np.hstack((x, y))
        z = z.reshape((50, 1))
        z = np.float32(z)
        pyplot.figure('Data Histogram', figsize=(16, 9))
        pyplot.hist(z, 256, [0, 256])
        # Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        # Set flags (Just to avoid line break in the code)
        flags = lmc_cv.KMEANS_RANDOM_CENTERS
        # Apply KMeans: K = 2 clusters, 10 attempts
        compactness, labels, centers = lmc_cv.kmeans(z, 2, None, criteria, 10, flags)
        # split the data to different clusters depending on their labels.
        cluster_a = z[labels == 0]
        cluster_b = z[labels == 1]
        # plot 'A' in red, 'B' in blue, 'centers' in yellow
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.hist(cluster_a, 256, [0, 256], color='r')
        pyplot.hist(cluster_b, 256, [0, 256], color='b')
        pyplot.hist(centers, 32, [0, 256], color='y')
        pyplot.show()
```
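`lmc_cv.kmeans` returns three values. Per the OpenCV documentation, `compactness` is the sum of squared distances from each point to its corresponding center, `labels` holds the cluster index of every sample, and `centers` is the array of cluster centers. The hypothetical helper below recomputes that sum from `labels` and `centers` with NumPy; it is a sketch for checking a result or comparing runs with different K, not part of OpenCV.

```python
import numpy as np


def compactness(data, labels, centers):
    """Sum of squared distances from each sample to the center of its cluster."""
    data = np.asarray(data, dtype=np.float32)
    data = data.reshape(len(data), -1)           # N samples x F features
    labels = np.asarray(labels).ravel()          # accept (N,) or (N, 1) labels
    centers = np.asarray(centers, dtype=np.float32)
    centers = centers.reshape(len(centers), -1)  # K centers x F features
    diff = data - centers[labels]                # each sample minus its own center
    return float(np.sum(diff * diff))
```

For the one-feature data `[[1], [3], [10], [12]]` with labels `[[0], [0], [1], [1]]` and centers `[[2], [11]]`, every sample is 1 away from its center, so the helper returns 4.0.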
In the previous example we used only height for the T-shirt problem. Here we will use both height and weight, i.e. two features.
Remember that in the previous case we arranged our data as a single column vector. In general, each feature is arranged in a column, while each row corresponds to one input sample.
In this case, for example, we create test data of size 50x2, holding the heights and weights of 50 people.
The first column holds the heights of all 50 people and the second column holds their weights.
The first row contains two elements: the first is the height of the first person and the second is his weight.
Similarly, the remaining rows hold the heights and weights of the other people.
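A small NumPy sketch of that layout, with made-up height and weight ranges (placeholder numbers, not real measurements). The array is converted to float32, the type this article converts its data to before calling `lmc_cv.kmeans`.

```python
import numpy as np

rng = np.random.default_rng(0)
heights = rng.integers(150, 190, 50)  # hypothetical heights in cm
weights = rng.integers(45, 95, 50)    # hypothetical weights in kg

# one row per person, one column per feature
data = np.column_stack((heights, weights)).astype(np.float32)

print(data.shape)        # (50, 2)
print(data[0])           # height and weight of the first person
print(data[:, 0].shape)  # (50,): the height column for all 50 people
```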
Check the image below:
```python
import numpy as np
import cv2 as lmc_cv  # assumption: lmc_cv is the author's alias for OpenCV's cv2 module
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """
    1: Data with Multiple Features with K-Means Clustering in OpenCV.
    """
    if 1 == method:
        # 50 samples with 2 features each: 25 drawn from [25, 50), 25 from [60, 85)
        x = np.random.randint(25, 50, (25, 2))
        y = np.random.randint(60, 85, (25, 2))
        z = np.vstack((x, y))
        # convert to np.float32
        z = np.float32(z)
        # define criteria and apply kmeans()
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        ret, label, center = lmc_cv.kmeans(z, 2, None, criteria, 10, lmc_cv.KMEANS_RANDOM_CENTERS)
        # Now separate the data. Note the ravel(): label has shape (50, 1)
        cluster_a = z[label.ravel() == 0]
        cluster_b = z[label.ravel() == 1]
        # Plot the data: A in the default color, B in red, centers as yellow squares
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.scatter(cluster_a[:, 0], cluster_a[:, 1])
        pyplot.scatter(cluster_b[:, 0], cluster_b[:, 1], c='r')
        pyplot.scatter(center[:, 0], center[:, 1], s=80, c='y', marker='s')
        pyplot.show()
```
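The `ravel()` in the separation step matters because `lmc_cv.kmeans` returns the labels as an N x 1 column. In the one-feature example the data were also N x 1, so `z[labels == 0]` worked directly; with two features the data are N x 2, and NumPy rejects a boolean mask whose shape does not match. A toy illustration with made-up points:

```python
import numpy as np

# toy data: 4 samples with 2 features each
data = np.float32([[1, 1], [2, 1], [9, 8], [8, 9]])
# labels shaped (N, 1), as lmc_cv.kmeans returns them
labels = np.int32([[0], [0], [1], [1]])

# a (4, 1) mask does not match the (4, 2) data, so NumPy raises IndexError
try:
    data[labels == 0]
except IndexError as err:
    print('IndexError:', err)

# flattening the labels gives a (4,) mask that selects whole rows
flat = labels.ravel()
cluster_a = data[flat == 0]  # [[1, 1], [2, 1]]
cluster_b = data[flat == 1]  # [[9, 8], [8, 9]]
```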
Color quantization is the process of reducing the number of colors in an image.
One reason to do so is to reduce memory usage. Another is that some devices can display only a limited number of colors.
In those cases, too, color quantization is performed. Here we use k-means clustering for color quantization.
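To put a number on the memory argument, here is a rough back-of-the-envelope sketch. It assumes the quantized image is stored as a palette plus one packed index per pixel, so K colors need ceil(log2 K) bits per pixel instead of 24. The 640x480 size and the helper `indexed_size_bytes` are hypothetical, and headers and compression are ignored.

```python
import math


def indexed_size_bytes(width, height, n_colors, bytes_per_color=3):
    """Approximate size of a palette image: packed per-pixel indices plus the palette."""
    bits_per_index = max(1, math.ceil(math.log2(n_colors)))
    index_bytes = math.ceil(width * height * bits_per_index / 8)
    return index_bytes + n_colors * bytes_per_color


width, height = 640, 480
rgb_bytes = width * height * 3  # uncompressed 24-bit RGB: 921600 bytes
for k in (2, 4, 8):
    print('K=%d: %d bytes indexed vs %d bytes RGB'
          % (k, indexed_size_bytes(width, height, k), rgb_bytes))
```

For K = 8 that is about 115 KB instead of about 922 KB. Note that the demo code in this article still keeps its quantized result as a full uint8 RGB array, so the saving only materializes if the image is then written in an indexed format.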
There is nothing new to explain here. There are 3 features, R, G and B, so we reshape the image into an array of size Mx3 (M is the number of pixels in the image).
After the clustering, we give each pixel the value of its cluster centroid (also an R, G, B triple), so the resulting image has the specified number of colors.
Finally, we reshape the result back to the shape of the original image.
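The reshape, assign and reshape-back steps can be sketched on a tiny hypothetical 2x2 image with plain NumPy. To keep the sketch independent of OpenCV, the cluster labels are assumed rather than computed; with the labels fixed, each centroid is the mean color of its pixels, which is how k-means defines a centroid. Indexing `np.uint8(centers)[flat]` is the same idea as `center[label.flatten()]` in the code that follows.

```python
import numpy as np

# hypothetical 2x2 RGB image (uint8, rows x cols x 3): two reddish and two bluish pixels
image = np.uint8([[[250, 10, 10], [240, 20, 15]],
                  [[10, 10, 240], [20, 15, 250]]])

# 1. one pixel per row: an M x 3 float32 array, here M = 2 * 2 = 4
pixels = np.float32(image.reshape((-1, 3)))

# 2. assumed labels (N x 1, the shape lmc_cv.kmeans returns);
#    each centroid is the mean R, G, B of its pixels
labels = np.int32([[0], [0], [1], [1]])
flat = labels.ravel()
centers = np.array([pixels[flat == j].mean(axis=0) for j in range(2)], dtype=np.float32)

# 3. give every pixel its centroid color, then restore the image shape
quantized = np.uint8(centers)[flat].reshape(image.shape)
```

Converting the centroids with `np.uint8` truncates fractional values, so a mean of 12.5 becomes 12 in the quantized image.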
Below is the code:
```python
import numpy as np
import cv2 as lmc_cv  # assumption: lmc_cv is the author's alias for OpenCV's cv2 module
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """
    2: Color Quantization with K-Means Clustering in OpenCV.
    """
    if 2 == method:
        stacking_images = []
        image_file_name = ['D:/99-Research/TestData/image/Castle01.jpg',
                           # further image paths are truncated in the source
                           ]
        for i in range(len(image_file_name)):
            image = lmc_cv.imread(image_file_name[i])
            image = lmc_cv.cvtColor(image, lmc_cv.COLOR_BGR2RGB)
            stacking_image = image.copy()
            # one pixel per row: an M x 3 array of R, G, B values
            z = image.reshape((-1, 3))
            # convert to np.float32
            z = np.float32(z)
            # define criteria, number of clusters and apply kmeans()
            criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
            for clusters_number in range(1, 4):
                # K = 2, 4, 8 colors
                ret, label, center = lmc_cv.kmeans(z, 2 ** clusters_number, None, criteria, 10,
                                                   lmc_cv.KMEANS_RANDOM_CENTERS)
                # Now convert back into uint8, and make original image
                center = np.uint8(center)
                res = center[label.flatten()]
                result_image = res.reshape(image.shape)
                # stacking images side-by-side
                stacking_image = np.hstack((stacking_image, result_image))
            # keep the strip (original, k=2, k=4, k=8) for this image
            stacking_images.append(stacking_image)
        # display the images
        for i in range(len(stacking_images)):
            pyplot.figure('Color Quantization with K-Means Clustering %d' % (i + 1))
            pyplot.subplot(1, 1, 1)
            pyplot.imshow(stacking_images[i], 'gray')
            pyplot.title('Color Quantization with K-Means Clustering: k=2 k=4 k=8')
            # save each figure as 01.png, 02.png, ...
            pyplot.savefig('%02d.png' % (i + 1))
        # show the Matplotlib figures (these are not OpenCV windows, so waitKey would not apply)
        pyplot.show()
```