带惩罚μ-相似Bregman散度k-均值问题的初始化算法

刘文杰; 张冬梅; 张鹏; 邹娟

doi:10.15960/j.cnki.issn.1007-6093.2022.01.007

运筹学学报 >

2022 , Vol. 26 >Issue 1: 99 - 112

DOI: https://doi.org/10.15960/j.cnki.issn.1007-6093.2022.01.007

带惩罚μ-相似Bregman散度k-均值问题的初始化算法

刘文杰 ,
张冬梅 ,
张鹏 ,
邹娟

展开

1. 山东建筑大学计算机科学与技术学院, 山东济南 250101
2. 山东大学软件学院, 山东济南 250101
3. 曲阜师范大学数学科学学院, 山东曲阜 273165

张冬梅 E-mail: zhangdongmei@sdjzu.edu.cn

收稿日期: 2021-07-21

网络出版日期: 2022-03-14

基金资助

国家自然科学基金(11871081);国家自然科学基金(61972228);国家自然科学基金(11801310);山东省自然科学基金(ZR2021MF006)

收起

The seeding algorithm for $\mu$-similar Bregman divergences $k$-means problem with penalties

Wenjie LIU ,
Dongmei ZHANG ,
Peng ZHANG ,
Juan ZOU

Expand

1. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, Shandong, China
2. School of Software, Shandong University, Jinan 250101, Shandong, China
3. School of Mathematical Sciences, Qufu Normal University, Qufu 273165, Shandong, China

Received date: 2021-07-21

Online published: 2022-03-14

Fold

摘要

$k$-均值问题是聚类中的经典问题，亦是NP-难问题。如果允许数据点不聚类，而是支付惩罚费用，则引出带惩罚的$k$-均值问题。本文将带惩罚的$k$-均值问题从欧氏距离推广到更一般的$\mu$-相似Bregman散度，研究了带惩罚$\mu$-相似Bregman散度$k$-均值问题的初始化算法。本文给出的初始化算法，近似比与$\mu$和数据点惩罚最大值与最小值的比例$r$相关。

关键词： 近似算法; $k$-均值; 惩罚; $\mu$-相似Bregman散度; 初始化算法

本文引用格式

刘文杰 , 张冬梅 , 张鹏 , 邹娟 . 带惩罚μ-相似Bregman散度k-均值问题的初始化算法[J]. 运筹学学报, 2022 , 26(1) : 99 -112 . DOI: 10.15960/j.cnki.issn.1007-6093.2022.01.007

Abstract

The $k$-means problem is a classic NP-hard problem in clustering analysis. If somedata points are allowed not to be clustered but to pay some penalty, the $k$-means problem with penalty is induced. In this paper, weextend the $k$-means problem with penalty from Euclidean distance tothe more general $\mu $-similar Bregman divergences, and study theinitialization algorithm for the $\mu $-similar Bregman divergences $k$-means problem with penalty. The seeding algorithm is given wherethe approximation ratio is only related to $\mu $ and the ratio ofthe maximum value to the minimum value of the data point penalty.

Key words： approximation algorithm; $k$-means; penalty; $\mu$-similar Bregman divergences; seeding algorithm

参考文献

1	Aloise D , Deshpande A , Hansen P , et al. NP-hardness of Euclidean sum-of-squares clustering[J]. Machine Learning, 2009, 75, 245- 248.
2	Drineas P , Frieze A , Kannan R , et al. Clustering large graphs via the singular value decomposition[J]. Machine Learning, 2004, 56, 9- 33.
3	Awasthi P, Charikar M, Krishnaswamy R, et al. The hardness of approximation of Euclidean k-means[C]//Proceedings of SoCG, 2015: 754-767.
4	Bachem O, Lucic M, Hassani S H, et al. Approximate k-means++ in sublinear time[C]//Proceedings of AAAI, 2016: 1459-1467.
5	Bl?mer J, Lammersen C, Schmidt M, et al. Theoretical analysis of the k-means algorithm\|A survey[M]//Algorithm Engineering\|Selected Results and Surveys, Cham: Springer, 2016: 81-116.
6	Kanungo T , Mount D M , Netanyahu N S , et al. A local search approximation algorithm for k-means clustering[J]. Computational Geometry: Theory and Applications, 2004, 28, 89- 112.
7	Lee E , Schmidt M , Wright J . Improved and simplified inapproximability for k-means[J]. Information Processing Letters, 2017, 120, 40- 43.
8	Xu D , Xu Y , Zhang D . A survey on algorithm for k-means problem and its variants[J]. Operations Research Transactions, 2017, 21, 101- 109.
9	Ahmadian S, Norouzi-Fard A, Svensson O, et al. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms[C]//Proceedings of FOCS, 2017: 61-72.
10	Lloyd S . Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28, 21- 33.
11	Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding[C]//Proceedings of SODA, 2007: 1027-1035.
12	Aggarwal A, Deshpande A, Kannan R. Adaptive sampling for k-means clustering[C]//Proceedings of APPROX and RANDOM, 2009: 15-28.
13	Bachem O, Lucic M, Hassani S H, et al. Fast and provably good seedings for k-means[C]//Proceedings of NIPS, 2016: 55-63.
14	Bachem O, Lucic M, Krause A. Distributed and provably good seedings for k-means in constant rounds[C]//Proceedings of ICML, 2017: 292-300.
15	Bahmani B, Moseley B, Vattani A, et al. Scalable k-means++[C]//Proceedings of the VLDB Endowment, 2012: 622-633.
16	Ostrovsky R , Rabani Y , Schulman L , et al. The effectiveness of Lloyd-type methods for the k-means problem[J]. Journal of the ACM, 2012, 59, 1- 22.
17	Tseng G C . Penalized and weighted k-means for clustering with scattered objects and prior information in high-throughput biological data[J]. Bioinformatics, 2007, 23, 2247- 2255.
18	Zhang D, Hao C, Wu C, et al. A local search approximation algorithm for the k-means problem with penalties[C]//Proceedings of COCOON, 2017: 568-574.
19	Chang X Y, Wang Y, Li R, et al. Sparse k-means with l_∞/l₀ penalty for high-dimensional data clustering[EB/OL]. (2014-03-31)[2021-07-08].
20	Ackermann M R. Algorithms for the Bregman k-Median problem[D]. Paderborn: University of Paderborn, 2009.
21	Banerjee A , Merugu S , Dhillon I S , et al. Clustering with Bregman divergences[J]. Journal of Machine Learning Research, 2005, 6, 1705- 1749.
22	Li M , Xu D , Yue J , et al. The seeding algorithm for k-means problem with penalties[J]. Journal of Combinatorial Optimization, 2020, 39, 15- 32.

Options

文章导航

模态框（Modal）标题

摘要

本文引用格式

Abstract

参考文献