运筹学学报

• 运筹学 • 上一篇    下一篇

k-平均问题及其变形的算法综述

徐大川1,*  许宜诚1  张冬梅2   

  1. 1. 北京工业大学应用数理学院, 北京 100124; 2. 山东建筑大学计算机科学与技术学院, 济南 250101
  • 收稿日期:2017-04-05 出版日期:2017-06-15 发布日期:2017-06-15
  • 通讯作者: 徐大川 E-mail: xudc@bjut.edu.cn
  • 基金资助:

    国家自然科学基金(No. 11531014), 山东省高等学校科研计划项目(No. J15LN23)

A survey on algorithms for k-means problem and its variants

XU Dachuan1,*  XU Yicheng1 ZHANG Dongmei2   

  1. 1. College of Applied Sciences, Beijing University of Technology, Beijing 100124, China; 2. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
  • Received:2017-04-05 Online:2017-06-15 Published:2017-06-15

摘要:

k-平均问题是计算机科学和组合优化领域的经典问题之一. k-平均聚类作为最受重视而且最简单易懂的一种聚类分析方法流行于数据挖掘领域. k-平均问题可描述为: 给定n个元素的观测集, 其中每个观测点都是d维实向量, 目标是把这n个观测点划分到k(\le n)个集合中, 使得所有集合中的点到对应的聚类中心的距离的平方和最小, 其中一个集合的聚类中心指的是该集合中所有观测点的均值. k-平均问题在理论上是NP-难的, 但有高效的启发式算法, 广泛应用在市场划分、机器视觉、地质统计学、天文学和农业等实际背景中. 随着实际问题中遇到的k-平均问题更加复杂, 数据量更加庞大, 还需学者进行 更深一步的研究. 罗列出k-平均问题及其诸多变形及推广问题的经典算法, 并总结$k$-平均中尚待研究的若干问题.

关键词: 聚类问题, k-平均, NP-难

Abstract:

The k-means is one of the most classical problems in both computer science and combinatorial optimization. The k-means clustering is popular as one of the most significant and easiest methods of cluster analysis in data mining. The k-means problem can be described as: Given a set of observations with n elements, where each element is a d-dimensional real vector. The objective is to partition the n observations into k sets, so as to minimize within-cluster sum of squares, where we call the center of the cluster the mean of the observations in it. The k-means is NP-hard in theory however, there are efficient heuristic algorithms employed widely in market segmentation, machine vision, geostatistics, astronomy, agriculture and so on. With more and more complicated the problem we meet in practical, and larger and larger the data grows, further research is needed. This article aims at listing the classical algorithms for solving k-means as well as the variety of the variants and generalization of it, and summarize some problems in k-means which have not been solved for readers.

Key words: clustering problems, k-means, NP-hard