Operations Research Transactions

Previous Articles     Next Articles

A survey on algorithms for k-means problem and its variants

XU Dachuan1,*  XU Yicheng1 ZHANG Dongmei2   

  1. 1. College of Applied Sciences, Beijing University of Technology, Beijing 100124, China; 2. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
  • Received:2017-04-05 Online:2017-06-15 Published:2017-06-15

Abstract:

The k-means is one of the most classical problems in both computer science and combinatorial optimization. The k-means clustering is popular as one of the most significant and easiest methods of cluster analysis in data mining. The k-means problem can be described as: Given a set of observations with n elements, where each element is a d-dimensional real vector. The objective is to partition the n observations into k sets, so as to minimize within-cluster sum of squares, where we call the center of the cluster the mean of the observations in it. The k-means is NP-hard in theory however, there are efficient heuristic algorithms employed widely in market segmentation, machine vision, geostatistics, astronomy, agriculture and so on. With more and more complicated the problem we meet in practical, and larger and larger the data grows, further research is needed. This article aims at listing the classical algorithms for solving k-means as well as the variety of the variants and generalization of it, and summarize some problems in k-means which have not been solved for readers.

Key words: clustering problems, k-means, NP-hard