带惩罚的相同容量k-均值问题的局部搜索算法

展开
  • 1. 北京工业大学数学学院运筹学与信息工程系, 北京 100124
    2. 山东师范大学数学与统计学院, 山东济南 250014
    3. 浙江师范大学数学与计算机科学学院, 浙江金华 321004
刘茜   E-mail: lqqsh@163.com

收稿日期: 2021-06-10

  网络出版日期: 2022-03-14

基金资助

国家自然科学基金(12131003);山东省自然科学基金(ZR2020MA029);山东省自然科学基金(ZR2021MA100)

A local search analysis for the uniform capacitated $k$-means problem with penalty

Expand
  • 1. Department of Operations Research and Information Engineering, College of Mathematics, Beijing University of Technology, Beijing 100124, China
    2. School of Mathematics and Statistics, Shandong Normal University, Jinan 250014, Shandong, China
    3. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, Zhejing, China

Received date: 2021-06-10

  Online published: 2022-03-14

摘要

经典$k$-均值问题是一类应用广泛的聚类问题,它是指给定$\mathbb{R}^d$中观测点集合$D$和整数$k$,目的是在空间中寻找$k$个点作为中心集合$S$,使得集合$D$中的每个观测点到$S$中离它最近的中心的距离平方求和最小。这是个NP-难问题。经典$k$-均值问题有很多推广,本文研究的带惩罚的相同容量$k$-均值问题就是其中之一。与经典$k$-均值问题相比,惩罚性质是指每个观测点都给定惩罚费用,当某个观测点到最近中心的距离大于惩罚费用时,其对目标函数的贡献就用该观测点的惩罚费用来代替最近的距离的平方,相同容量约束要求每个中心至多连接$U$个观测点。针对这种问题,我们设计了局部搜索算法,该算法在至多选取$(3+\alpha)k$个中心的情况下,可以达到$\beta$-近似,其中,参数$\alpha>34$,$\beta>\frac{\alpha+34}{\alpha-34}$。

本文引用格式

剧嘉琛, 刘茜, 张昭, 周洋 . 带惩罚的相同容量k-均值问题的局部搜索算法[J]. 运筹学学报, 2022 , 26(1) : 113 -124 . DOI: 10.15960/j.cnki.issn.1007-6093.2022.01.008

Abstract

The classical $k$-means problem has numerous application in the real world. Given a set of data points $D$ in ${\mathbb R}^d$ and an integer $k$, the aim is to select $k$ points in ${\mathbb R}^d$ as a central set $S$, such that the sum of the square of the distance between each point and its closest center in $S$ is minimized. It is a NP-hard problem and has many generalizations, one of which is the uniform capacitated $k$-means problem with penalty. Compared with the classical $k$-means problem, the penalty implies that each data point has a penalty cost, when the distance from some data points to their nearest center is greater than the penalty cost, they contribute the penalty cost. As for the uniform capacity, it suggests that each center is connected at most $U$ times. For this variant problem, we design a local search algorithm which selects at most $(3+\alpha)k$ centers and reaches $\beta$-approximate, where $\alpha>34$, $\beta>\frac{\alpha+34}{\alpha-34}$.

参考文献

1 Drineas P , Frieze A , Kannan R , et al. Clustering large graphs via the singular value decomposition[J]. Machine Learning, 2004, 56 (1): 9- 33.
2 Lloyd S . Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28 (2): 129- 137.
3 Agarwal P K, Mustafa N H. k-means projective clustering[C]//Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2004: 155-165.
4 Gibou F, Fedkiw R. A fast hybrid k-means level set algorithm for segmentation[C]//Proceedings of the 4th Annual Hawaii International Conference on Statistics and Mathematics, 2005: 281-291.
5 Herwig R , Poustka A J , Muller C , et al. Large-scale clustering of cDNA-fingerprinting data[J]. Genome Research, 1999, 9 (11): 1093- 1105.
6 Arthur D, Vassilvitskii S. k-means++: the advantages of careful seeding[C]//Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007: 1027-1035.
7 Khan S S , Ahmad A . Cluster center initialization algorithm for k-means clustering[J]. Pattern Recognition Letters, 2004, 25 (11): 1293- 1302.
8 Ahmadian S, Norouzi-Fard A, Svensson O, et al. Better guarantees for k-means and euclidean k-median by primal-dual algorithms[C]//Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science, 2017: 61-72.
9 Krishnaswamy R, Li S, Sandeep S. Constant approximation for k-median and k-means with outliers via iterative rounding[C]//Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018: 646-659.
10 Zhang D , Hao C , Wu C , et al. Local search approximation algorithms for the k-means problem with penalties[J]. Journal of Combinatorial Optimization, 2019, 37 (2): 439- 453.
11 Feng Q, Zhang Z, Shi F, et al. An improved approximation algorithm for the k-means problem with penalties[C]//Proceedings of the 2nd Annual International Workshop on Frontiers in Algorithmics, 2019: 170-181.
12 Wang Y, Möhring R H, Wu C, et al. Outliers detection is not so hard: approximation algorithms for robust clustering problems using local search techniques[EB/OL]. (2020-12-30)[2021-05-28]. https://arxiv.org/abs/2012.10884v2.
13 Geetha S , Poonthalir G , Vanathi P T . Improved k-means algorithm for capacitated clustering problem[J]. Journal of Computer Science, 2009, 8 (4): 52- 59.
14 Koskosidis Y A , Powell W B . Clustering algorithms for consolidation of customer orders into vehicle shipments[J]. Transportation Research Part B: Methodological, 1992, 26 (5): 365- 379.
15 Mulvey J M , Beck M P . Solving capacitated clustering problems[J]. European Journal of Operational Research, 1984, 18 (3): 339- 348.
16 Osman I H , Christofides N . Capacitated clustering problems by hybrid simulated annealing and tabu search[J]. International Transactions in Operational Research, 1994, 1 (3): 317- 336.
17 Shieh H M , May M D . Solving the capacitated clustering problem with genetic algorithms[J]. Journal of the Chinese Institute of Industrial Engineers, 2001, 18 (3): 1- 12.
18 Xu Y , Xu D , Du D , et al. Improved approximation algorithm for universal facility location problem with linear penalties[J]. Theoretical Computer Science, 2019, 774 (25): 143- 151.
19 Xu Y , Xu D , Du D , et al. Local search algorithm for universal facility location problem with linear penalties[J]. Journal of Global Optimization, 2017, 67 (1-2): 367- 378.
20 Li S. On uniform capacitated k-median beyond the natural LP relaxation[C]//Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, 2015: 696-707.
21 Li S. Approximating capacitated k-median with (1+ ε)k open facilities[C]//Proceedings of the 27th annual ACM-SIAM symposium on Discrete algorithms, 2016: 786-796.
22 Demirci G, Li S. Constant approximation for capacitated k-median with (1+ε)-capacity violation[C]//Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming, 2016: 262-274.
23 Han L , Xu D , Du D , et al. A local search approximation algorithm for the uniform capacitated k-facility location problem[J]. Journal of Combinatorial Optimization, 2018, 35 (2): 409- 423.
24 Han L , Xu D , Du D , et al. An approximation algorithm for the uniform capacitated k-means problem[J]. Journal of Combinatorial Optimization, 2020,
25 Cygan M, Hajiaghayi M T, Khuller S. LP rounding for k-centers with non-uniform hard capacities[C]//Proceedings of 53rd IEEE Symposium on Foundations of Computer Science, 2012: 273-282.
26 Ding H, Xu J. Solving the chromatic cone clustering problem via minimum spanning sphere[C]//Proceedings of 38th International Colloquium on Automata, Languages and Programming, 2011: 773-784.
27 Lammersen C, Schmidt M, Sohler C. Probabilistic k-median clustering in data streams[C]//Proceedings of 10th International Workshop on Approximation and Online Algorithms, 2012: 70-81.
28 Li J, Yi K, Zhang Q. Clustering with diversity[C]//Proceedings of 37th International Colloquium on Automata Languages and Programming, 2010: 188-200.
29 Sweeney L . k-Anonymity: a model for protecting privacy[J]. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 2002, 10 (5): 557- 570.
30 Ding H, Xu J. A unified framework for clustering constrained data without locality property[C]//Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, 2015: 1471-1490.
31 Papadimitriou C H , Stegilitz K . Combinatorial Optimization: Algorithms and Complexity[M]. New York: Dover Publication, 1982.
32 Korupolu M R , Plaxton C G , Rajaraman R . Analysis of a local search heuristic for facility location problems[J]. Journal of Algorithms, 2000, 37 (1): 146- 188.
文章导航

/