运筹学学报 ›› 2022, Vol. 26 ›› Issue (1): 113-124.doi: 10.15960/j.cnki.issn.1007-6093.2022.01.008

• • 上一篇    下一篇

带惩罚的相同容量k-均值问题的局部搜索算法

剧嘉琛1, 刘茜2*,*(), 张昭3, 周洋2   

  1. 1. 北京工业大学数学学院运筹学与信息工程系, 北京 100124
    2. 山东师范大学数学与统计学院, 山东济南 250014
    3. 浙江师范大学数学与计算机科学学院, 浙江金华 321004
  • 收稿日期:2021-06-10 出版日期:2022-03-15 发布日期:2022-03-14
  • 通讯作者: 刘茜 E-mail:lqqsh@163.com
  • 作者简介:刘茜   E-mail: lqqsh@163.com
  • 基金资助:
    国家自然科学基金(12131003);山东省自然科学基金(ZR2020MA029);山东省自然科学基金(ZR2021MA100)

A local search analysis for the uniform capacitated $k$-means problem with penalty

Jiachen JU1, Qian LIU2*,*(), Zhao ZHANG3, Yang ZHOU2   

  1. 1. Department of Operations Research and Information Engineering, College of Mathematics, Beijing University of Technology, Beijing 100124, China
    2. School of Mathematics and Statistics, Shandong Normal University, Jinan 250014, Shandong, China
    3. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, Zhejing, China
  • Received:2021-06-10 Online:2022-03-15 Published:2022-03-14
  • Contact: Qian LIU E-mail:lqqsh@163.com

摘要:

经典$k$-均值问题是一类应用广泛的聚类问题,它是指给定$\mathbb{R}^d$中观测点集合$D$和整数$k$,目的是在空间中寻找$k$个点作为中心集合$S$,使得集合$D$中的每个观测点到$S$中离它最近的中心的距离平方求和最小。这是个NP-难问题。经典$k$-均值问题有很多推广,本文研究的带惩罚的相同容量$k$-均值问题就是其中之一。与经典$k$-均值问题相比,惩罚性质是指每个观测点都给定惩罚费用,当某个观测点到最近中心的距离大于惩罚费用时,其对目标函数的贡献就用该观测点的惩罚费用来代替最近的距离的平方,相同容量约束要求每个中心至多连接$U$个观测点。针对这种问题,我们设计了局部搜索算法,该算法在至多选取$(3+\alpha)k$个中心的情况下,可以达到$\beta$-近似,其中,参数$\alpha>34$,$\beta>\frac{\alpha+34}{\alpha-34}$。

关键词: $k$-均值问题, 惩罚, 相同容量, 双准则, 局部搜索, 近似算法

Abstract:

The classical $k$-means problem has numerous application in the real world. Given a set of data points $D$ in ${\mathbb R}^d$ and an integer $k$, the aim is to select $k$ points in ${\mathbb R}^d$ as a central set $S$, such that the sum of the square of the distance between each point and its closest center in $S$ is minimized. It is a NP-hard problem and has many generalizations, one of which is the uniform capacitated $k$-means problem with penalty. Compared with the classical $k$-means problem, the penalty implies that each data point has a penalty cost, when the distance from some data points to their nearest center is greater than the penalty cost, they contribute the penalty cost. As for the uniform capacity, it suggests that each center is connected at most $U$ times. For this variant problem, we design a local search algorithm which selects at most $(3+\alpha)k$ centers and reaches $\beta$-approximate, where $\alpha>34$, $\beta>\frac{\alpha+34}{\alpha-34}$.

Key words: $k$-means problems, penalty, uniform capacity, bi-criteria, local search, approximation algorithms

中图分类号: