3 min readSep 24, 2019

K-Means

K-means clustering falls under unsupervised learning, which is used when the data is unlabeled data. K- Means algorithm is used to find groups in data with the number of groups represented by the variable k that has been formed organically. It iteratively assign each data point to one of k groups based on the features that are provided. Data points are clustered based on feature similarity. Results of K-Means clustering algorithm are:-

1. K cluster centroids are used to label new data.

2. Label for the training data.

Each centroid of a cluster is a collection of feature values which define the resulting groups. A cluster refer to a collection of data points aggregated together because of certain similarities.

Algorithms

The algorithm inputs are the number of cluster k and data set. The data set is a collection of features for each data point. The algorithm starts with initial estimates for the K centroids, which can either be randomly generated or randomly selected from the data set.

It iterates between two steps:

1. Data Assignment Step: Each centroid defines a single clusters. Data point is assigned to its nearest centroid, based on the Squared Euclidean distance.

Argmin dist(ci,x)2

Ci € C

Ci collection of centroid in set C, data pointx is assigned to a cluster where distance(-) is the standard Euclidean

distance.

2. Centroid update step: Centroid are recomputed. By taking out the mean of all data points assigned to that centroid’s cluster. Algorithm iterates between both the steps until the criteria is met.

Ci= (1/|SI|)∑X € Sixi

The result is assured in this algorithm. The result may be a local optimum.

Uses:

This versatile algorithm that can be used for any type grouping.

Ø Behavioral segmentation

Ø Inventory categorization

Ø Sorting sensor measurements

Ø Detecting anomalies

Working of k-means

Algorithms:

The k-means algorithm in data mining starts with a first group of randomly selected centriods that is used as beginning point of each clusters and then performs iterative calculations to optimize the positions of centroids.

It halts and optimizing cluster when,

1. The centroids have established.

2. No change in values because of successful clustering.

3. The defined number of iteration has been achieved.

K-means Clustering Method:

If k is given, the K-means algorithm can be executed in the following steps:

Partition of objects into k non-empty subsets
Identifying the cluster centroids (mean point) of the current partition.
Assigning each point to a specific cluster
Compute the distances from each point and allot points to the cluster where the distance from the centroid is minimum.
After re-allotting the points, find the centroid of the new cluster formed.

Mathematical Formulation for K-means Algorithm:

D= {x1,x2,…,xi,…,xm} à data set of m records

xi= (xi1,xi2,…,xin) à each record is an n-dimensional vector

Mathematical Formulation for K-means Algorithm:

Example:

Written by Shailesh Varma

No responses yet