A survey of differential privacy algorithms for the $k$-means problem

Expand
  • 1. Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, China
    2. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, Shandong, China

Received date: 2021-12-20

  Online published: 2022-09-07

Abstract

The $k$-means problem is a very important problem in the field of machine learning and combinatorial optimization. It is a classic NP-hard problem, which is widely used in data mining, business production decision-making, image processing, biomedical technology, and more. As people in these fields pay more and more attention to personal privacy protection, which raises a question: In the case that decisions are usually made by artificial intelligence algorithms, how to ensure that as much information as possible is extracted from the data without revealing personal privacy? In the past ten years, experts and scholars have continuously studied and explored the $k$-means problem with privacy protection, and has also obtained many results with theoretical guiding significance and practical application value. In this paper we mainly introduce these differential privacy algorithms of the $k$-means problem for readers.

Cite this article

Fan YUAN, Dachuan XU, Dongmei ZHANG . A survey of differential privacy algorithms for the $k$-means problem[J]. Operations Research Transactions, 2022 , 26(3) : 1 -16 . DOI: 10.15960/j.cnki.issn.1007-6093.2022.03.001

References

1 Dasgupta S. The hardness of $k$-means clustering[R]. San Diego: Department of Computer Science and Engineering, University of California, 2008: CS2008-0916.
2 AhmadianS,Norouzi-FardA,SvenssonO,et al.Better guarantees for $k$-means and euclidean $k$-median by primal-dual algorithms[J].SIAM Journal on Computing,2019,49(4):FOCS17-97-FOCS17-156.
3 张冬梅,李敏,徐大川,等.$k$-均值问题的理论与算法综述[J].中国科学: 数学,2020,50(9):1387-1404.
4 LuR,ZhuH,LiuX,et al.Toward efficient and privacy-preserving computing in big data era[J].IEEE Network,2014,28(4):46-50.
5 WangT,ZhengZ,RehmaniM H,et al.Privacy preservation in big data from the communication perspective–-A survey[J].IEEE Communications Surveys Tutorials,2018,21(1):753-778.
6 Yao X, Zhou X, Ma J. Differential privacy of big data: An overview[C]//Proceedings of the 2nd IEEE International Conference on Big Data Security on Cloud, High Performance and Smart Computing and Intelligent Data and Security, 2016: 7-12.
7 JainP,GyanchandaniM,KhareN.Differential privacy: its technological prescriptive using big data[J].Journal of Big Data,2018,5(1):1-24.
8 Skarkala M E, Maragoudakis M, Gritzalis S, et al. Privacy preservation by $k$-anonymization of weighted social networks[C]//Proceedings of the 4th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012: 423-428.
9 Zheleva E, Getoor L. Privacy in social networks: A survey[M]//Social Network Data Analytics, Boston: Springer, 2011: 277-306.
10 Task C, Clifton C. What should we protect? Defining differential privacy for social network analysis[M]//State of the Art Applications of Social Network Analysis. Cham: Springer, 2014: 139-161.
11 Gowtham M, Ahila S S. Privacy enhanced data communication protocol for wireless body area network[C]//Proceedings of the 4th International Conference on Advanced Computing and Communication Systems, 2017: 1-5.
12 LiM,LouW,RenK.Data security and privacy in wireless body area networks[J].IEEE Wireless communications,2010,17(1):51-58.
13 Abadi M, Chu A, Goodfellow I, et al. Deep learning with differential privacy[C]//Proceedings of the 16th ACM SIGSAC Conference on Computer and Communications Security, 2016: 308-318.
14 KaissisG A,MakowskiM R,RückertD,et al.Secure, privacy-preserving and federated machine learning in medical imaging[J].Nature Machine Intelligence,2020,2(6):305-311.
15 KaissisG,ZillerA,Passerat-PalmbachJ,et al.End-to-end privacy preserving deep learning on multi-institutional medical imaging[J].Nature Machine Intelligence,2021,3(6):473-484.
16 Chaturvedi A, Nguy$\rm \tilde{\hat{e}}$n H L, Zakynthinou L. Differentially private decomposable submodular maximization[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 6984-6992.
17 Mitrovic M, Bun M, Krause A, et al. Differentially private submodular maximization: data summarization in disguise[C]//Proceedings of the 34th International Conference on Machine Learning, 2017: 2478-2487.
18 Rafiey A, Yoshida Y. Fast and private submodular and $k$-submodular functions maximization with matroid constraints[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 7887-7897.
19 Gupta A, Ligett K, McSherry F, et al. Differentially private combinatorial optimization[C]//Proceedings of 21st Annual ACM-SIAM Symposium on Discrete Algorithms, 2010: 1106-1125.
20 Matou?ekJ.On approximate geometric $k$-clustering[J].Discrete and Computational Geometry,2000,24,61-84.
21 Dwork C. Differential privacy[C]//Proceedings of the 33rd International Colloquium on Automata, Languages, and Programming, 2006: 1-12.
22 Dwork C, McSherry F, Nissim K, et al. Calibrating noise to sensitivity in private data analysis[C]//Proceedings of the 3rd Theory of Cryptography Conference, 2006: 265-284.
23 Mcsherry F, Talwar K. Mechanism design via differential privacy[C]//Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, 2007: 94-103.
24 McSherry F. Privacy integrated queries: an extensible platform for privacy-preserving data analysis[C]//Proceedings of the 35th ACM SIGMOD International Conference on Management of Data, 2009: 19-30.
25 Fisher C. Over 267 million facebook users reportedly had data exposed online[EB/OL]. [2021-12-20]. https://www.engadget.com/2019/12/19/facebook-data-exposed-online/.
26 Daniel V, Kershner I. Personal data of all 6.5 million israeli voters is exposed[EB/OL]. [2021-12-20]. https://www.nytimes.com/2020/02/10/world/middleeast/israeli-voters-leak.html.
27 Bassily R, Smith A, Thakurta A. Private empirical risk minimization: Efficient algorithms and tight error bounds[C]//Proceedings of the IEEE 55th Annual Symposium on Foundations of Computer Science, 2014: 464-473.
28 Balcan M F, Dick T, Liang Y, et al. Differentially private clustering in high-dimensional euclidean spaces[C]//Proceedings of the 34th International Conference on Machine Learning, 2017: 322-331.
29 DworkC,RothA.The algorithmic foundations of differential privacy[J].Foundations and Trends in Theoretical Computer Science,2014,9(3/4):211-407.
30 Wang Y, Wang Y X, Singh A. Differentially private subspace clustering[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems, 2015: 1000-1008.
31 Su D, Cao J, Li N, et al. Differentially private $k$-means clustering[C]//Proceedings of the 6th ACM Conference on Data and Application Security and Privacy, 2016: 26-37.
32 Feldman D, Xiang C, Zhu R, et al. Coresets for differentially private $k$-means clustering and applications to privacy in mobile sensor networks[C]//Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, 2017: 3-16.
33 Huang Z, Liu J. Optimal differentially private algorithms for $k$-means clustering[C]//Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2018: 395-408.
34 Nock R, Canyasse R, Boreli R, et al. $k$-variates++: more pluses in the $k$-means++[C]//Proceedings of the 33rd International Conference on Machine Learning, 2016: 145-154.
35 Jones M, Nguyen H L, Nguyen T D. Differentially private clustering via maximum coverage[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 11555-11563.
36 Nguyen H L, Chaturvedi A, Xu E Z. Differentially private $k$-means via exponential mechanism and max cover[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 9101-9108.
37 Arthur D, Vassilvitskii S. $k$-means++: the advantages of careful seeding[C]//Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007: 1027-1035.
38 Nissim K, Stemmer U, Vadhan S P. Locating a small cluster privately[C]//Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2016: 413-427.
39 Blum A, Dwork C, McSherry F, et al. Practical privacy: the SuLQ framework[C]//Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2005: 128-138.
40 Nissim K, Raskhodnikova S, Smith A. Smooth sensitivity and sampling in private data analysis[C]//Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 2007: 75-84.
41 Zhang J, Xiao X, Yang Y, et al. PrivGene: differentially private model fitting using genetic algorithms[C]//Proceedings of the 13th ACM SIGMOD International Conference on Management of Data, 2013: 665-676.
42 Ghazi B, Kumar R, Manurangsi P. Differentially private clustering: Tight approximation ratios[C]//Proceedings of the 34th Conference on Neural Information Processing Systems, 2020: 33-93.
43 Makarychev K, Makarychev Y, Razenshteyn I. Performance of johnson-lindenstrauss transform for $k$-means and $k$-medians clustering[C]//Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019: 1027-1038.
44 KirszbraunM.ü ber die zusammenziehende und lipschitzsche transformationen[J].Fundamenta Mathematicae,1934,22(1):77-108.
45 Feldman D, Fiat A, Kaplan H, et al. Private coresets[C]//Proceedings of the 41st Annual ACM Symposium on Theory of Computing, 2009: 361-370.
46 Stemmer U. Locally private $k$-means clustering[C]//Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, 2020: 548-559.
47 Nissim K, Stemmer U. Clustering algorithms for the centralized and local models[C]//Proceedings of the 29th Algorithmic Learning Theory, 2018: 619-653.
48 Stemmer U, Kaplan H. Differentially private $k$-means with constant multiplicative error[C]//Proceedings of the 31st Annual Conference on Neural Information Processing Systems, 2018: 5431-5441.
49 Hsu J, Khanna S, Roth A. Distributed private heavy hitters[C]//Proceedings of the 39th International Colloquium on Automata, Languages, and Programming, 2012: 461-472.
50 Chang A, Ghazi B, Kumar R, et al. Locally private $k$-means in one round[C]//IProceedings of the 38th International Conference on Machine Learning, 2021: 1441-1451.
51 Har-PeledS,MendelM.Fast construction of nets in low-dimensional metrics and their applications[J].SIAM Journal on Computing,2006,35(5):1148-1184.
52 Chaturvedi A, Jones M, Nguyen H L. Locally private $k$-means clustering with constant multiplicative approximation and near-optimal additive error[EB/OL]. [2021-12-19]. https://doi.org/10.48550/arXiv.2105.15007.
53 BassilyR,NissimK,StemmerU,et al.Practical locally private heavy hitters[J].Journal of Machine Learning Research,2020,21(16):1-42.
Outlines

/