Now we compute the real center of each cluster. These form the new positions of the two centroids. The centroids in Fig. 3.6(b) are based on the clusters shown in Fig. 3.6(a). In Fig. 3.6(b), we again assign all instances to the centroid that is closest, resulting in the two new clusters shown there. All instances with an open dot are assigned to one centroid, whereas all instances with a closed dot are assigned to the other one. Now we compute the real centers of these two new clusters. This results in a relocation of the centroids, as shown in Fig. 3.6(c). Again we assign the instances to the centroid that is closest. However, now nothing changes and the location of the centroids remains the same. After converging, the k-means algorithm outputs the two clusters and related statistics.
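To make the two alternating steps concrete, the following Python sketch assigns every instance to its closest centroid and then moves each centroid to the real center of its cluster, repeating until nothing changes. This is a minimal illustration under our own assumptions (instances are numeric feature vectors compared by Euclidean distance; the names distance and kmeans are ours), not a definitive implementation.

import math
import random

def distance(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(instances, k, seed=0):
    # Pick k instances as the initial centroids (one possible initialization).
    centroids = random.Random(seed).sample(instances, k)
    while True:
        # Assignment step: attach each instance to its closest centroid.
        clusters = [[] for _ in range(k)]
        for inst in instances:
            i = min(range(k), key=lambda j: distance(inst, centroids[j]))
            clusters[i].append(inst)
        # Update step: move each centroid to the real center of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [[sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            return clusters, centroids  # converged: nothing changes anymore
        centroids = new_centroids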
The quality of a particular clustering can be defined as the average distance from an instance to its corresponding centroid. k-means clustering is only a heuristic and does not guarantee that it finds the k clusters minimizing this average distance. In fact, the result depends on the initialization. Therefore, it is good practice to execute the algorithm repeatedly with different initializations and select the clustering with the smallest average distance.
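Building on the kmeans sketch above, the following hypothetical helpers run the algorithm with several different initializations and keep the best result, where "best" means the smallest average distance from an instance to its corresponding centroid.

def avg_distance(clusters, centroids):
    # Average distance from an instance to its corresponding centroid.
    dists = [distance(inst, centroids[i])
             for i, c in enumerate(clusters) for inst in c]
    return sum(dists) / len(dists)

def best_of(instances, k, runs=10):
    # Execute k-means with `runs` different initializations; keep the best.
    results = [kmeans(instances, k, seed=s) for s in range(runs)]
    return min(results, key=lambda r: avg_distance(*r))

For example, best_of(instances, 2) returns the clusters and centroids of the best out of ten runs.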
There are many variants of the algorithm just described; we refer to the standard literature for details [5, 15, 52, 67, 129]. One of the problems when using the k-means algorithm is determining the number of clusters k: for k-means this is fixed from the beginning. Note that the average distance from an instance to its corresponding centroid decreases as k is increased. In the extreme case, every instance has its own cluster and the average distance from an instance to its corresponding centroid is zero. This is not very useful. Therefore, a frequently used approach is to start with a small number of clusters and then gradually increase k as long as there are significant improvements, as sketched below.
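A simple version of this approach, reusing avg_distance and best_of from above, stops as soon as increasing k no longer reduces the average distance by more than a given fraction; the 10% threshold and the name choose_k are illustrative choices, not prescribed values.

def choose_k(instances, max_k=10, min_improvement=0.10):
    # Increase k as long as the average distance improves significantly.
    best_k, prev = 1, None
    for k in range(1, max_k + 1):
        quality = avg_distance(*best_of(instances, k))
        if prev is not None and (prev - quality) < min_improvement * prev:
            break  # improvement no longer significant; stop here
        best_k, prev = k, quality
    return best_k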
Another popular clustering technique is Agglomerative Hierarchical Clustering (AHC). Here, a variable number of clusters is generated. Figure 3.7 illustrates the idea. The approach works as follows. Assign each instance to a dedicated singleton cluster. Now search for the two clusters that are closest to one another. Merge these two clusters into a new cluster. For example, the initial clusters consisting of just a and just b are merged into a new cluster ab. Now search again for the two clusters