带惩罚的相同容量k-均值问题的局部搜索算法

doi:10.15960/j.cnki.issn.1007-6093.2022.01.008

运筹学学报 ›› 2022, Vol. 26 ›› Issue (1): 113-124.doi: 10.15960/j.cnki.issn.1007-6093.2022.01.008

带惩罚的相同容量k-均值问题的局部搜索算法

剧嘉琛¹, 刘茜^2*^,*(), 张昭³, 周洋²

1. 北京工业大学数学学院运筹学与信息工程系, 北京 100124
2. 山东师范大学数学与统计学院, 山东济南 250014
3. 浙江师范大学数学与计算机科学学院, 浙江金华 321004

收稿日期:2021-06-10 出版日期:2022-03-15 发布日期:2022-03-14
通讯作者: 刘茜 E-mail:lqqsh@163.com
作者简介:刘茜 E-mail: lqqsh@163.com
基金资助:
国家自然科学基金(12131003);山东省自然科学基金(ZR2020MA029);山东省自然科学基金(ZR2021MA100)

A local search analysis for the uniform capacitated $k$-means problem with penalty

Jiachen JU¹, Qian LIU^2*^,*(), Zhao ZHANG³, Yang ZHOU²

1. Department of Operations Research and Information Engineering, College of Mathematics, Beijing University of Technology, Beijing 100124, China
2. School of Mathematics and Statistics, Shandong Normal University, Jinan 250014, Shandong, China
3. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, Zhejing, China

Received:2021-06-10 Online:2022-03-15 Published:2022-03-14
Contact: Qian LIU E-mail:lqqsh@163.com

摘要/Abstract

摘要：

经典$k$-均值问题是一类应用广泛的聚类问题，它是指给定$\mathbb{R}^d$中观测点集合$D$和整数$k$，目的是在空间中寻找$k$个点作为中心集合$S$，使得集合$D$中的每个观测点到$S$中离它最近的中心的距离平方求和最小。这是个NP-难问题。经典$k$-均值问题有很多推广，本文研究的带惩罚的相同容量$k$-均值问题就是其中之一。与经典$k$-均值问题相比，惩罚性质是指每个观测点都给定惩罚费用，当某个观测点到最近中心的距离大于惩罚费用时，其对目标函数的贡献就用该观测点的惩罚费用来代替最近的距离的平方，相同容量约束要求每个中心至多连接$U$个观测点。针对这种问题，我们设计了局部搜索算法，该算法在至多选取$（3+\alpha）k$个中心的情况下，可以达到$\beta$-近似，其中，参数$\alpha>34$，$\beta>\frac{\alpha+34}{\alpha-34}$。

关键词: $k$-均值问题, 惩罚, 相同容量, 双准则, 局部搜索, 近似算法

Abstract:

The classical $k$-means problem has numerous application in the real world. Given a set of data points $D$ in ${\mathbb R}^d$ and an integer $k$, the aim is to select $k$ points in ${\mathbb R}^d$ as a central set $S$, such that the sum of the square of the distance between each point and its closest center in $S$ is minimized. It is a NP-hard problem and has many generalizations, one of which is the uniform capacitated $k$-means problem with penalty. Compared with the classical $k$-means problem, the penalty implies that each data point has a penalty cost, when the distance from some data points to their nearest center is greater than the penalty cost, they contribute the penalty cost. As for the uniform capacity, it suggests that each center is connected at most $U$ times. For this variant problem, we design a local search algorithm which selects at most $(3+\alpha)k$ centers and reaches $\beta$-approximate, where $\alpha>34$, $\beta>\frac{\alpha+34}{\alpha-34}$.

Key words: $k$-means problems, penalty, uniform capacity, bi-criteria, local search, approximation algorithms

中图分类号:

O221.7

剧嘉琛, 刘茜, 张昭, 周洋. 带惩罚的相同容量k-均值问题的局部搜索算法[J]. 运筹学学报, 2022, 26(1): 113-124.

Jiachen JU, Qian LIU, Zhao ZHANG, Yang ZHOU. A local search analysis for the uniform capacitated $k$-means problem with penalty[J]. Operations Research Transactions, 2022, 26(1): 113-124.

图/表 1

参考文献 32

1	Drineas P , Frieze A , Kannan R , et al. Clustering large graphs via the singular value decomposition[J]. Machine Learning, 2004, 56 (1): 9- 33.
2	Lloyd S . Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28 (2): 129- 137. doi: 10.1109/TIT.1982.1056489
3	Agarwal P K, Mustafa N H. k-means projective clustering[C]//Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2004: 155-165.
4	Gibou F, Fedkiw R. A fast hybrid k-means level set algorithm for segmentation[C]//Proceedings of the 4th Annual Hawaii International Conference on Statistics and Mathematics, 2005: 281-291.
5	Herwig R , Poustka A J , Muller C , et al. Large-scale clustering of cDNA-fingerprinting data[J]. Genome Research, 1999, 9 (11): 1093- 1105. doi: 10.1101/gr.9.11.1093
6	Arthur D, Vassilvitskii S. k-means++: the advantages of careful seeding[C]//Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007: 1027-1035.
7	Khan S S , Ahmad A . Cluster center initialization algorithm for k-means clustering[J]. Pattern Recognition Letters, 2004, 25 (11): 1293- 1302. doi: 10.1016/j.patrec.2004.04.007
8	Ahmadian S, Norouzi-Fard A, Svensson O, et al. Better guarantees for k-means and euclidean k-median by primal-dual algorithms[C]//Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science, 2017: 61-72.
9	Krishnaswamy R, Li S, Sandeep S. Constant approximation for k-median and k-means with outliers via iterative rounding[C]//Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018: 646-659.
10	Zhang D , Hao C , Wu C , et al. Local search approximation algorithms for the k-means problem with penalties[J]. Journal of Combinatorial Optimization, 2019, 37 (2): 439- 453. doi: 10.1007/s10878-018-0278-6
11	Feng Q, Zhang Z, Shi F, et al. An improved approximation algorithm for the k-means problem with penalties[C]//Proceedings of the 2nd Annual International Workshop on Frontiers in Algorithmics, 2019: 170-181.
12	Wang Y, Möhring R H, Wu C, et al. Outliers detection is not so hard: approximation algorithms for robust clustering problems using local search techniques[EB/OL]. (2020-12-30)[2021-05-28]. https://arxiv.org/abs/2012.10884v2.
13	Geetha S , Poonthalir G , Vanathi P T . Improved k-means algorithm for capacitated clustering problem[J]. Journal of Computer Science, 2009, 8 (4): 52- 59.
14	Koskosidis Y A , Powell W B . Clustering algorithms for consolidation of customer orders into vehicle shipments[J]. Transportation Research Part B: Methodological, 1992, 26 (5): 365- 379. doi: 10.1016/0191-2615(92)90032-R
15	Mulvey J M , Beck M P . Solving capacitated clustering problems[J]. European Journal of Operational Research, 1984, 18 (3): 339- 348. doi: 10.1016/0377-2217(84)90155-3
16	Osman I H , Christofides N . Capacitated clustering problems by hybrid simulated annealing and tabu search[J]. International Transactions in Operational Research, 1994, 1 (3): 317- 336. doi: 10.1016/0969-6016(94)90032-9
17	Shieh H M , May M D . Solving the capacitated clustering problem with genetic algorithms[J]. Journal of the Chinese Institute of Industrial Engineers, 2001, 18 (3): 1- 12. doi: 10.1080/10170660109509453
18	Xu Y , Xu D , Du D , et al. Improved approximation algorithm for universal facility location problem with linear penalties[J]. Theoretical Computer Science, 2019, 774 (25): 143- 151.
19	Xu Y , Xu D , Du D , et al. Local search algorithm for universal facility location problem with linear penalties[J]. Journal of Global Optimization, 2017, 67 (1-2): 367- 378. doi: 10.1007/s10898-015-0394-0
20	Li S. On uniform capacitated k-median beyond the natural LP relaxation[C]//Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, 2015: 696-707.
21	Li S. Approximating capacitated k-median with (1+ ε)k open facilities[C]//Proceedings of the 27th annual ACM-SIAM symposium on Discrete algorithms, 2016: 786-796.
22	Demirci G, Li S. Constant approximation for capacitated k-median with (1+ε)-capacity violation[C]//Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming, 2016: 262-274.
23	Han L , Xu D , Du D , et al. A local search approximation algorithm for the uniform capacitated k-facility location problem[J]. Journal of Combinatorial Optimization, 2018, 35 (2): 409- 423. doi: 10.1007/s10878-017-0179-0
24	Han L , Xu D , Du D , et al. An approximation algorithm for the uniform capacitated k-means problem[J]. Journal of Combinatorial Optimization, 2020,
25	Cygan M, Hajiaghayi M T, Khuller S. LP rounding for k-centers with non-uniform hard capacities[C]//Proceedings of 53rd IEEE Symposium on Foundations of Computer Science, 2012: 273-282.
26	Ding H, Xu J. Solving the chromatic cone clustering problem via minimum spanning sphere[C]//Proceedings of 38th International Colloquium on Automata, Languages and Programming, 2011: 773-784.
27	Lammersen C, Schmidt M, Sohler C. Probabilistic k-median clustering in data streams[C]//Proceedings of 10th International Workshop on Approximation and Online Algorithms, 2012: 70-81.
28	Li J, Yi K, Zhang Q. Clustering with diversity[C]//Proceedings of 37th International Colloquium on Automata Languages and Programming, 2010: 188-200.
29	Sweeney L . k-Anonymity: a model for protecting privacy[J]. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 2002, 10 (5): 557- 570. doi: 10.1142/S0218488502001648
30	Ding H, Xu J. A unified framework for clustering constrained data without locality property[C]//Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, 2015: 1471-1490.
31	Papadimitriou C H , Stegilitz K . Combinatorial Optimization: Algorithms and Complexity[M]. New York: Dover Publication, 1982.
32	Korupolu M R , Plaxton C G , Rajaraman R . Analysis of a local search heuristic for facility location problems[J]. Journal of Algorithms, 2000, 37 (1): 146- 188. doi: 10.1006/jagm.2000.1100

[1]	李建平, 蔡力健, 李陈筠然, 潘鹏翔. 限制性多源点偏心距增广问题[J]. 运筹学学报, 2022, 26(1): 60-68.
[2]	陈婳, 张国川. 经典一维装箱问题近似算法的研究进展[J]. 运筹学学报, 2022, 26(1): 69-84.
[3]	刘文杰, 张冬梅, 张鹏, 邹娟. 带惩罚μ-相似Bregman散度k-均值问题的初始化算法[J]. 运筹学学报, 2022, 26(1): 99-112.
[4]	李小玮, 成夏炎, 李荣珩. 带有线性惩罚的鲁棒k-种产品设施选址问题的近似算法[J]. 运筹学学报, 2021, 25(4): 31-44.
[5]	包晓光, 路超, 黄冬梅, 余炜. 混合图上最小-最大圈覆盖问题的近似算法[J]. 运筹学学报, 2021, 25(1): 107-113.
[6]	任建峰, 田晓云. 带有异常点的平方度量设施选址问题[J]. 运筹学学报, 2021, 25(1): 114-122.
[7]	张玉忠. 工件可拒绝排序问题综述[J]. 运筹学学报, 2020, 24(2): 111-130.
[8]	刘晓霞, 余山杉, 罗文昌. 工作有到达时间且拒绝工件总个数受限的单机平行分批排序问题的近似算法[J]. 运筹学学报, 2020, 24(1): 131-139.
[9]	王翼展, 张安, 陈永, 陈光亭. 堆取料机调度问题的一个近似算法[J]. 运筹学学报, 2020, 24(1): 147-154.
[10]	张国川, 陈林. 负载均衡问题[J]. 运筹学学报, 2019, 23(3): 1-14.
[11]	姜燕君, 徐大川, 张冬梅. 平方度量动态设施选址问题的近似算法[J]. 运筹学学报, 2018, 22(3): 49-58.
[12]	余善恩, 徐文洋, 刘光宇. 圆形区域分散布局问题研究[J]. 运筹学学报, 2018, 22(3): 132-138.
[13]	王一水, 徐大川, 吴晨晨. 关联聚类问题的半定规划舍入算法[J]. 运筹学学报, 2018, 22(1): 67-76.
[14]	陈中文, 赵奇, 卞凯. 非线性半定规划的逐次线性化柔性惩罚法[J]. 运筹学学报, 2017, 21(2): 84-100.
[15]	陈光亭, 陈蕾, 张安, 陈永. 可转包两台流水作业机排序的近似算法[J]. 运筹学学报, 2016, 20(4): 109-114.

带惩罚的相同容量k-均值问题的局部搜索算法

A local search analysis for the uniform capacitated $k$-means problem with penalty

RichHTML

PDF

可视化

摘要/Abstract

引用本文

使用本文

图/表 1

参考文献 32

相关文章 15

编辑推荐

Metrics

本文评价