Operations Research Transactions

Previous Articles     Next Articles

A survey on the initialization methods for the k-means algorithm

XU Dachuan1  XU Yicheng1,2  ZHANG Dongmei3,*   

  1. 1. College of Applied Sciences, Beijing University of Technology, Beijing 100124, China;  2.  Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China; 3. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
  • Received:2017-10-16 Online:2018-06-15 Published:2018-06-15

Abstract:

The k-means problem which is one of the classical NP-hard problems keeps attracting wide attention in combinatorial optimization and computer science area since been proposed. Given a set of observations consisting of N d-dimensional real vectors, the objective is to partition the N observations into k(\leqslant N) sets, so as to minimize the within-cluster sum of squares, where the center of a cluster is defined as the mean of all the observations in it. As one of the heuristic algorithms for solving k-means problem, k-means algorithm is quite popular in practice because of its excellent perform of convergence. The k-means algorithm can be described as: given initial solution for k-means problem, repeat assignment (assign observations to its nearest center) and update (compute the new centers in the new clusters) step until convergence to a solution. Although the algorithm is usually considered as linearly convergency, its shortcomings are obvious. The k-means algorithm is unable to ensure the global optimality and the output is rely badly on the choice of initial solutions. Therefore, variants of initialization methods are proposed to improve the performance of the k-means algorithm. In this paper we mainly filter and list some of them for readers.

Key words: k-means algorithm, initialization method