The seeding algorithm for $\mu$-similar Bregman divergences $k$-means problem with penalties

Wenjie LIU; Dongmei ZHANG; Peng ZHANG; Juan ZOU

doi:10.15960/j.cnki.issn.1007-6093.2022.01.007

Operations Research Transactions >

2022 , Vol. 26 >Issue 1: 99 - 112

DOI: https://doi.org/10.15960/j.cnki.issn.1007-6093.2022.01.007

The seeding algorithm for $\mu$-similar Bregman divergences $k$-means problem with penalties

Wenjie LIU ,
Dongmei ZHANG ,
Peng ZHANG ,
Juan ZOU

Expand

1. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, Shandong, China
2. School of Software, Shandong University, Jinan 250101, Shandong, China
3. School of Mathematical Sciences, Qufu Normal University, Qufu 273165, Shandong, China

Received date: 2021-07-21

Online published: 2022-03-14

Fold

Abstract

The $k$-means problem is a classic NP-hard problem in clustering analysis. If somedata points are allowed not to be clustered but to pay some penalty, the $k$-means problem with penalty is induced. In this paper, weextend the $k$-means problem with penalty from Euclidean distance tothe more general $\mu $-similar Bregman divergences, and study theinitialization algorithm for the $\mu $-similar Bregman divergences $k$-means problem with penalty. The seeding algorithm is given wherethe approximation ratio is only related to $\mu $ and the ratio ofthe maximum value to the minimum value of the data point penalty.

Key words： approximation algorithm; $k$-means; penalty; $\mu$-similar Bregman divergences; seeding algorithm

Cite this article

Wenjie LIU , Dongmei ZHANG , Peng ZHANG , Juan ZOU . The seeding algorithm for $\mu$-similar Bregman divergences $k$-means problem with penalties[J]. Operations Research Transactions, 2022 , 26(1) : 99 -112 . DOI: 10.15960/j.cnki.issn.1007-6093.2022.01.007

References

1	Aloise D , Deshpande A , Hansen P , et al. NP-hardness of Euclidean sum-of-squares clustering[J]. Machine Learning, 2009, 75, 245- 248.
2	Drineas P , Frieze A , Kannan R , et al. Clustering large graphs via the singular value decomposition[J]. Machine Learning, 2004, 56, 9- 33.
3	Awasthi P, Charikar M, Krishnaswamy R, et al. The hardness of approximation of Euclidean k-means[C]//Proceedings of SoCG, 2015: 754-767.
4	Bachem O, Lucic M, Hassani S H, et al. Approximate k-means++ in sublinear time[C]//Proceedings of AAAI, 2016: 1459-1467.
5	Bl?mer J, Lammersen C, Schmidt M, et al. Theoretical analysis of the k-means algorithm\|A survey[M]//Algorithm Engineering\|Selected Results and Surveys, Cham: Springer, 2016: 81-116.
6	Kanungo T , Mount D M , Netanyahu N S , et al. A local search approximation algorithm for k-means clustering[J]. Computational Geometry: Theory and Applications, 2004, 28, 89- 112.
7	Lee E , Schmidt M , Wright J . Improved and simplified inapproximability for k-means[J]. Information Processing Letters, 2017, 120, 40- 43.
8	Xu D , Xu Y , Zhang D . A survey on algorithm for k-means problem and its variants[J]. Operations Research Transactions, 2017, 21, 101- 109.
9	Ahmadian S, Norouzi-Fard A, Svensson O, et al. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms[C]//Proceedings of FOCS, 2017: 61-72.
10	Lloyd S . Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28, 21- 33.
11	Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding[C]//Proceedings of SODA, 2007: 1027-1035.
12	Aggarwal A, Deshpande A, Kannan R. Adaptive sampling for k-means clustering[C]//Proceedings of APPROX and RANDOM, 2009: 15-28.
13	Bachem O, Lucic M, Hassani S H, et al. Fast and provably good seedings for k-means[C]//Proceedings of NIPS, 2016: 55-63.
14	Bachem O, Lucic M, Krause A. Distributed and provably good seedings for k-means in constant rounds[C]//Proceedings of ICML, 2017: 292-300.
15	Bahmani B, Moseley B, Vattani A, et al. Scalable k-means++[C]//Proceedings of the VLDB Endowment, 2012: 622-633.
16	Ostrovsky R , Rabani Y , Schulman L , et al. The effectiveness of Lloyd-type methods for the k-means problem[J]. Journal of the ACM, 2012, 59, 1- 22.
17	Tseng G C . Penalized and weighted k-means for clustering with scattered objects and prior information in high-throughput biological data[J]. Bioinformatics, 2007, 23, 2247- 2255.
18	Zhang D, Hao C, Wu C, et al. A local search approximation algorithm for the k-means problem with penalties[C]//Proceedings of COCOON, 2017: 568-574.
19	Chang X Y, Wang Y, Li R, et al. Sparse k-means with l_∞/l₀ penalty for high-dimensional data clustering[EB/OL]. (2014-03-31)[2021-07-08].
20	Ackermann M R. Algorithms for the Bregman k-Median problem[D]. Paderborn: University of Paderborn, 2009.
21	Banerjee A , Merugu S , Dhillon I S , et al. Clustering with Bregman divergences[J]. Journal of Machine Learning Research, 2005, 6, 1705- 1749.
22	Li M , Xu D , Yue J , et al. The seeding algorithm for k-means problem with penalties[J]. Journal of Combinatorial Optimization, 2020, 39, 15- 32.

Options

Outlines

模态框（Modal）标题

Abstract

Cite this article

References