K-Means
K-means clustering falls under unsupervised learning, which is used when the data is unlabeled data. K- Means algorithm is used to find groups in data with the number of groups represented by the variable k that has been formed organically. It iteratively assign each data point to one of k groups based on the features that are provided. Data points are clustered based on feature similarity. Results of K-Means clustering algorithm are:-
1. K cluster centroids are used to label new data.
2. Label for the training data.
Each centroid of a cluster is a collection of feature values which define the resulting groups. A cluster refer to a collection of data points aggregated together because of certain similarities.
Algorithms
The algorithm inputs are the number of cluster k and data set. The data set is a collection of features for each data point. The algorithm starts with initial estimates for the K centroids, which can either be randomly generated or randomly selected from the data set.
It iterates between two steps:
1. Data Assignment Step: Each centroid defines a single clusters. Data point is assigned to its nearest centroid, based on the Squared Euclidean distance.
Argmin dist(ci,x)2
Ci € C
Ci collection of centroid in set C, data pointx is assigned to a cluster where distance(-) is the standard Euclidean
distance.
2. Centroid update step: Centroid are recomputed. By taking out the mean of all data points assigned to that centroid’s cluster. Algorithm iterates between both the steps until the criteria is met.
Ci= (1/|SI|)∑X € Sixi
The result is assured in this algorithm. The result may be a local optimum.
Uses:
This versatile algorithm that can be used for any type grouping.
Ø Behavioral segmentation
Ø Inventory categorization
Ø Sorting sensor measurements
Ø Detecting anomalies
Working of k-means
Algorithms:
The k-means algorithm in data mining starts with a first group of randomly selected centriods that is used as beginning point of each clusters and then performs iterative calculations to optimize the positions of centroids.
It halts and optimizing cluster when,
1. The centroids have established.
2. No change in values because of successful clustering.
3. The defined number of iteration has been achieved.
K-means Clustering Method:
If k is given, the K-means algorithm can be executed in the following steps:
- Partition of objects into k non-empty subsets
- Identifying the cluster centroids (mean point) of the current partition.
- Assigning each point to a specific cluster
- Compute the distances from each point and allot points to the cluster where the distance from the centroid is minimum.
- After re-allotting the points, find the centroid of the new cluster formed.
Mathematical Formulation for K-means Algorithm:
D= {x1,x2,…,xi,…,xm} à data set of m records
xi= (xi1,xi2,…,xin) à each record is an n-dimensional vector