带惩罚μ-相似Bregman散度k-均值问题的初始化算法

展开
  • 1. 山东建筑大学计算机科学与技术学院, 山东济南 250101
    2. 山东大学软件学院, 山东济南 250101
    3. 曲阜师范大学数学科学学院, 山东曲阜 273165
张冬梅   E-mail: zhangdongmei@sdjzu.edu.cn

收稿日期: 2021-07-21

  网络出版日期: 2022-03-14

基金资助

国家自然科学基金(11871081);国家自然科学基金(61972228);国家自然科学基金(11801310);山东省自然科学基金(ZR2021MF006)

The seeding algorithm for $\mu$-similar Bregman divergences $k$-means problem with penalties

Expand
  • 1. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, Shandong, China
    2. School of Software, Shandong University, Jinan 250101, Shandong, China
    3. School of Mathematical Sciences, Qufu Normal University, Qufu 273165, Shandong, China

Received date: 2021-07-21

  Online published: 2022-03-14

摘要

$k$-均值问题是聚类中的经典问题,亦是NP-难问题。如果允许数据点不聚类,而是支付惩罚费用,则引出带惩罚的$k$-均值问题。本文将带惩罚的$k$-均值问题从欧氏距离推广到更一般的$\mu$-相似Bregman散度,研究了带惩罚$\mu$-相似Bregman散度$k$-均值问题的初始化算法。本文给出的初始化算法,近似比与$\mu$和数据点惩罚最大值与最小值的比例$r$相关。

本文引用格式

刘文杰, 张冬梅, 张鹏, 邹娟 . 带惩罚μ-相似Bregman散度k-均值问题的初始化算法[J]. 运筹学学报, 2022 , 26(1) : 99 -112 . DOI: 10.15960/j.cnki.issn.1007-6093.2022.01.007

Abstract

The $k$-means problem is a classic NP-hard problem in clustering analysis. If somedata points are allowed not to be clustered but to pay some penalty, the $k$-means problem with penalty is induced. In this paper, weextend the $k$-means problem with penalty from Euclidean distance tothe more general $\mu $-similar Bregman divergences, and study theinitialization algorithm for the $\mu $-similar Bregman divergences $k$-means problem with penalty. The seeding algorithm is given wherethe approximation ratio is only related to $\mu $ and the ratio ofthe maximum value to the minimum value of the data point penalty.

参考文献

1 Aloise D , Deshpande A , Hansen P , et al. NP-hardness of Euclidean sum-of-squares clustering[J]. Machine Learning, 2009, 75, 245- 248.
2 Drineas P , Frieze A , Kannan R , et al. Clustering large graphs via the singular value decomposition[J]. Machine Learning, 2004, 56, 9- 33.
3 Awasthi P, Charikar M, Krishnaswamy R, et al. The hardness of approximation of Euclidean k-means[C]//Proceedings of SoCG, 2015: 754-767.
4 Bachem O, Lucic M, Hassani S H, et al. Approximate k-means++ in sublinear time[C]//Proceedings of AAAI, 2016: 1459-1467.
5 Bl?mer J, Lammersen C, Schmidt M, et al. Theoretical analysis of the k-means algorithm|A survey[M]//Algorithm Engineering|Selected Results and Surveys, Cham: Springer, 2016: 81-116.
6 Kanungo T , Mount D M , Netanyahu N S , et al. A local search approximation algorithm for k-means clustering[J]. Computational Geometry: Theory and Applications, 2004, 28, 89- 112.
7 Lee E , Schmidt M , Wright J . Improved and simplified inapproximability for k-means[J]. Information Processing Letters, 2017, 120, 40- 43.
8 Xu D , Xu Y , Zhang D . A survey on algorithm for k-means problem and its variants[J]. Operations Research Transactions, 2017, 21, 101- 109.
9 Ahmadian S, Norouzi-Fard A, Svensson O, et al. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms[C]//Proceedings of FOCS, 2017: 61-72.
10 Lloyd S . Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28, 21- 33.
11 Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding[C]//Proceedings of SODA, 2007: 1027-1035.
12 Aggarwal A, Deshpande A, Kannan R. Adaptive sampling for k-means clustering[C]//Proceedings of APPROX and RANDOM, 2009: 15-28.
13 Bachem O, Lucic M, Hassani S H, et al. Fast and provably good seedings for k-means[C]//Proceedings of NIPS, 2016: 55-63.
14 Bachem O, Lucic M, Krause A. Distributed and provably good seedings for k-means in constant rounds[C]//Proceedings of ICML, 2017: 292-300.
15 Bahmani B, Moseley B, Vattani A, et al. Scalable k-means++[C]//Proceedings of the VLDB Endowment, 2012: 622-633.
16 Ostrovsky R , Rabani Y , Schulman L , et al. The effectiveness of Lloyd-type methods for the k-means problem[J]. Journal of the ACM, 2012, 59, 1- 22.
17 Tseng G C . Penalized and weighted k-means for clustering with scattered objects and prior information in high-throughput biological data[J]. Bioinformatics, 2007, 23, 2247- 2255.
18 Zhang D, Hao C, Wu C, et al. A local search approximation algorithm for the k-means problem with penalties[C]//Proceedings of COCOON, 2017: 568-574.
19 Chang X Y, Wang Y, Li R, et al. Sparse k-means with l/l0 penalty for high-dimensional data clustering[EB/OL]. (2014-03-31)[2021-07-08]. https://arxiv.org/pdf/1403.7890.pdf.
20 Ackermann M R. Algorithms for the Bregman k-Median problem[D]. Paderborn: University of Paderborn, 2009.
21 Banerjee A , Merugu S , Dhillon I S , et al. Clustering with Bregman divergences[J]. Journal of Machine Learning Research, 2005, 6, 1705- 1749.
22 Li M , Xu D , Yue J , et al. The seeding algorithm for k-means problem with penalties[J]. Journal of Combinatorial Optimization, 2020, 39, 15- 32.
文章导航

/