k-均值问题的差分隐私算法综述

展开
  • 1. 北京工业大学北京科学与工程计算研究院, 北京 100124
    2. 山东建筑大学计算机科学与技术学院, 山东济南 250101
张冬梅, E-mail: zhangdongmei@sdjzu.edu.cn

收稿日期: 2021-12-20

  网络出版日期: 2022-09-07

基金资助

国家自然科学基金(11871081);国家自然科学基金(12131003)

A survey of differential privacy algorithms for the $k$-means problem

Expand
  • 1. Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, China
    2. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, Shandong, China

Received date: 2021-12-20

  Online published: 2022-09-07

摘要

$k$-均值问题是机器学习和组合优化领域十分重要的问题。它是经典的NP-难问题, 被广泛的应用于数据挖掘、企业生产决策、图像处理、生物医疗科技等领域。随着时代的发展, 人们越来越注重于个人的隐私保护:在决策通常由人工智能算法做出的情况下, 如何保证尽可能多地从数据中挖掘更多信息,同时不泄露个人隐私。近十年来不断有专家学者研究探索带隐私保护的$k$-均值问题, 得到了许多具有理论指导意义和实际应用价值的结果, 本文主要介绍关于$k$-均值问题的差分隐私算法供读者参考。

本文引用格式

袁藩, 徐大川, 张冬梅 . k-均值问题的差分隐私算法综述[J]. 运筹学学报, 2022 , 26(3) : 1 -16 . DOI: 10.15960/j.cnki.issn.1007-6093.2022.03.001

Abstract

The $k$-means problem is a very important problem in the field of machine learning and combinatorial optimization. It is a classic NP-hard problem, which is widely used in data mining, business production decision-making, image processing, biomedical technology, and more. As people in these fields pay more and more attention to personal privacy protection, which raises a question: In the case that decisions are usually made by artificial intelligence algorithms, how to ensure that as much information as possible is extracted from the data without revealing personal privacy? In the past ten years, experts and scholars have continuously studied and explored the $k$-means problem with privacy protection, and has also obtained many results with theoretical guiding significance and practical application value. In this paper we mainly introduce these differential privacy algorithms of the $k$-means problem for readers.

参考文献

1 Dasgupta S. The hardness of $k$-means clustering[R]. San Diego: Department of Computer Science and Engineering, University of California, 2008: CS2008-0916.
2 AhmadianS,Norouzi-FardA,SvenssonO,et al.Better guarantees for $k$-means and euclidean $k$-median by primal-dual algorithms[J].SIAM Journal on Computing,2019,49(4):FOCS17-97-FOCS17-156.
3 张冬梅,李敏,徐大川,等.$k$-均值问题的理论与算法综述[J].中国科学: 数学,2020,50(9):1387-1404.
4 LuR,ZhuH,LiuX,et al.Toward efficient and privacy-preserving computing in big data era[J].IEEE Network,2014,28(4):46-50.
5 WangT,ZhengZ,RehmaniM H,et al.Privacy preservation in big data from the communication perspective–-A survey[J].IEEE Communications Surveys Tutorials,2018,21(1):753-778.
6 Yao X, Zhou X, Ma J. Differential privacy of big data: An overview[C]//Proceedings of the 2nd IEEE International Conference on Big Data Security on Cloud, High Performance and Smart Computing and Intelligent Data and Security, 2016: 7-12.
7 JainP,GyanchandaniM,KhareN.Differential privacy: its technological prescriptive using big data[J].Journal of Big Data,2018,5(1):1-24.
8 Skarkala M E, Maragoudakis M, Gritzalis S, et al. Privacy preservation by $k$-anonymization of weighted social networks[C]//Proceedings of the 4th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012: 423-428.
9 Zheleva E, Getoor L. Privacy in social networks: A survey[M]//Social Network Data Analytics, Boston: Springer, 2011: 277-306.
10 Task C, Clifton C. What should we protect? Defining differential privacy for social network analysis[M]//State of the Art Applications of Social Network Analysis. Cham: Springer, 2014: 139-161.
11 Gowtham M, Ahila S S. Privacy enhanced data communication protocol for wireless body area network[C]//Proceedings of the 4th International Conference on Advanced Computing and Communication Systems, 2017: 1-5.
12 LiM,LouW,RenK.Data security and privacy in wireless body area networks[J].IEEE Wireless communications,2010,17(1):51-58.
13 Abadi M, Chu A, Goodfellow I, et al. Deep learning with differential privacy[C]//Proceedings of the 16th ACM SIGSAC Conference on Computer and Communications Security, 2016: 308-318.
14 KaissisG A,MakowskiM R,RückertD,et al.Secure, privacy-preserving and federated machine learning in medical imaging[J].Nature Machine Intelligence,2020,2(6):305-311.
15 KaissisG,ZillerA,Passerat-PalmbachJ,et al.End-to-end privacy preserving deep learning on multi-institutional medical imaging[J].Nature Machine Intelligence,2021,3(6):473-484.
16 Chaturvedi A, Nguy$\rm \tilde{\hat{e}}$n H L, Zakynthinou L. Differentially private decomposable submodular maximization[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 6984-6992.
17 Mitrovic M, Bun M, Krause A, et al. Differentially private submodular maximization: data summarization in disguise[C]//Proceedings of the 34th International Conference on Machine Learning, 2017: 2478-2487.
18 Rafiey A, Yoshida Y. Fast and private submodular and $k$-submodular functions maximization with matroid constraints[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 7887-7897.
19 Gupta A, Ligett K, McSherry F, et al. Differentially private combinatorial optimization[C]//Proceedings of 21st Annual ACM-SIAM Symposium on Discrete Algorithms, 2010: 1106-1125.
20 Matou?ekJ.On approximate geometric $k$-clustering[J].Discrete and Computational Geometry,2000,24,61-84.
21 Dwork C. Differential privacy[C]//Proceedings of the 33rd International Colloquium on Automata, Languages, and Programming, 2006: 1-12.
22 Dwork C, McSherry F, Nissim K, et al. Calibrating noise to sensitivity in private data analysis[C]//Proceedings of the 3rd Theory of Cryptography Conference, 2006: 265-284.
23 Mcsherry F, Talwar K. Mechanism design via differential privacy[C]//Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, 2007: 94-103.
24 McSherry F. Privacy integrated queries: an extensible platform for privacy-preserving data analysis[C]//Proceedings of the 35th ACM SIGMOD International Conference on Management of Data, 2009: 19-30.
25 Fisher C. Over 267 million facebook users reportedly had data exposed online[EB/OL]. [2021-12-20]. https://www.engadget.com/2019/12/19/facebook-data-exposed-online/.
26 Daniel V, Kershner I. Personal data of all 6.5 million israeli voters is exposed[EB/OL]. [2021-12-20]. https://www.nytimes.com/2020/02/10/world/middleeast/israeli-voters-leak.html.
27 Bassily R, Smith A, Thakurta A. Private empirical risk minimization: Efficient algorithms and tight error bounds[C]//Proceedings of the IEEE 55th Annual Symposium on Foundations of Computer Science, 2014: 464-473.
28 Balcan M F, Dick T, Liang Y, et al. Differentially private clustering in high-dimensional euclidean spaces[C]//Proceedings of the 34th International Conference on Machine Learning, 2017: 322-331.
29 DworkC,RothA.The algorithmic foundations of differential privacy[J].Foundations and Trends in Theoretical Computer Science,2014,9(3/4):211-407.
30 Wang Y, Wang Y X, Singh A. Differentially private subspace clustering[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems, 2015: 1000-1008.
31 Su D, Cao J, Li N, et al. Differentially private $k$-means clustering[C]//Proceedings of the 6th ACM Conference on Data and Application Security and Privacy, 2016: 26-37.
32 Feldman D, Xiang C, Zhu R, et al. Coresets for differentially private $k$-means clustering and applications to privacy in mobile sensor networks[C]//Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, 2017: 3-16.
33 Huang Z, Liu J. Optimal differentially private algorithms for $k$-means clustering[C]//Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2018: 395-408.
34 Nock R, Canyasse R, Boreli R, et al. $k$-variates++: more pluses in the $k$-means++[C]//Proceedings of the 33rd International Conference on Machine Learning, 2016: 145-154.
35 Jones M, Nguyen H L, Nguyen T D. Differentially private clustering via maximum coverage[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 11555-11563.
36 Nguyen H L, Chaturvedi A, Xu E Z. Differentially private $k$-means via exponential mechanism and max cover[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 9101-9108.
37 Arthur D, Vassilvitskii S. $k$-means++: the advantages of careful seeding[C]//Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007: 1027-1035.
38 Nissim K, Stemmer U, Vadhan S P. Locating a small cluster privately[C]//Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2016: 413-427.
39 Blum A, Dwork C, McSherry F, et al. Practical privacy: the SuLQ framework[C]//Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2005: 128-138.
40 Nissim K, Raskhodnikova S, Smith A. Smooth sensitivity and sampling in private data analysis[C]//Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 2007: 75-84.
41 Zhang J, Xiao X, Yang Y, et al. PrivGene: differentially private model fitting using genetic algorithms[C]//Proceedings of the 13th ACM SIGMOD International Conference on Management of Data, 2013: 665-676.
42 Ghazi B, Kumar R, Manurangsi P. Differentially private clustering: Tight approximation ratios[C]//Proceedings of the 34th Conference on Neural Information Processing Systems, 2020: 33-93.
43 Makarychev K, Makarychev Y, Razenshteyn I. Performance of johnson-lindenstrauss transform for $k$-means and $k$-medians clustering[C]//Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019: 1027-1038.
44 KirszbraunM.ü ber die zusammenziehende und lipschitzsche transformationen[J].Fundamenta Mathematicae,1934,22(1):77-108.
45 Feldman D, Fiat A, Kaplan H, et al. Private coresets[C]//Proceedings of the 41st Annual ACM Symposium on Theory of Computing, 2009: 361-370.
46 Stemmer U. Locally private $k$-means clustering[C]//Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, 2020: 548-559.
47 Nissim K, Stemmer U. Clustering algorithms for the centralized and local models[C]//Proceedings of the 29th Algorithmic Learning Theory, 2018: 619-653.
48 Stemmer U, Kaplan H. Differentially private $k$-means with constant multiplicative error[C]//Proceedings of the 31st Annual Conference on Neural Information Processing Systems, 2018: 5431-5441.
49 Hsu J, Khanna S, Roth A. Distributed private heavy hitters[C]//Proceedings of the 39th International Colloquium on Automata, Languages, and Programming, 2012: 461-472.
50 Chang A, Ghazi B, Kumar R, et al. Locally private $k$-means in one round[C]//IProceedings of the 38th International Conference on Machine Learning, 2021: 1441-1451.
51 Har-PeledS,MendelM.Fast construction of nets in low-dimensional metrics and their applications[J].SIAM Journal on Computing,2006,35(5):1148-1184.
52 Chaturvedi A, Jones M, Nguyen H L. Locally private $k$-means clustering with constant multiplicative approximation and near-optimal additive error[EB/OL]. [2021-12-19]. https://doi.org/10.48550/arXiv.2105.15007.
53 BassilyR,NissimK,StemmerU,et al.Practical locally private heavy hitters[J].Journal of Machine Learning Research,2020,21(16):1-42.
文章导航

/