运筹学学报 >
2018 , Vol. 22 >Issue 2: 31 - 40
DOI: https://doi.org/10.15960/j.cnki.issn.1007-6093.2018.02.003
k-均值算法的初始化方法综述
收稿日期: 2017-10-16
网络出版日期: 2018-06-15
基金资助
国家自然科学基金(No. 11531014), 山东省济南市科技发展计划项目(No. 201401211)
A survey on the initialization methods for the k-means algorithm
Received date: 2017-10-16
Online published: 2018-06-15
k-均值问题自提出以来一直吸引组合优化和计算机科学领域的广泛关注, 是经典的NP-难问题之一. 给定N个d维实向量构成的观测集, 目标是把这N个观测点划分到k(\leq N)个集合中, 使得所有集合中的点到对应的聚类中心距离的平方和最小, 一个集合的聚类中心指的是该集合 中所有观测点的均值. k-均值算法作为解决k-均值问题的启发式算法,在实际应用中因其出色的收敛速度而倍受欢迎. k-均值算法可描述为: 给定问题的初始化分组, 交替进行指派(将观测点分配到离其最近的均值点)和更新(计算新的聚类的均值点)直到收敛到某一解. 该算法通常被认为几乎是线性收敛的. 但缺点也很明显, 无法保证得到的是全局最优解, 并且算法结果好坏过于依赖初始解的选取. 于是学者们纷纷提出不同的初始化方法来提高k-均值算法的质量. 现筛选和罗列了关于选取初始解的k-均值算法的初始化方法供读者参考.
徐大川, 许宜诚, 张冬梅 . k-均值算法的初始化方法综述[J]. 运筹学学报, 2018 , 22(2) : 31 -40 . DOI: 10.15960/j.cnki.issn.1007-6093.2018.02.003
The k-means problem which is one of the classical NP-hard problems keeps attracting wide attention in combinatorial optimization and computer science area since been proposed. Given a set of observations consisting of N d-dimensional real vectors, the objective is to partition the N observations into k(\leqslant N) sets, so as to minimize the within-cluster sum of squares, where the center of a cluster is defined as the mean of all the observations in it. As one of the heuristic algorithms for solving k-means problem, k-means algorithm is quite popular in practice because of its excellent perform of convergence. The k-means algorithm can be described as: given initial solution for k-means problem, repeat assignment (assign observations to its nearest center) and update (compute the new centers in the new clusters) step until convergence to a solution. Although the algorithm is usually considered as linearly convergency, its shortcomings are obvious. The k-means algorithm is unable to ensure the global optimality and the output is rely badly on the choice of initial solutions. Therefore, variants of initialization methods are proposed to improve the performance of the k-means algorithm. In this paper we mainly filter and list some of them for readers.
Key words: k-means algorithm; initialization method
/
| 〈 |
|
〉 |