Operations Research Transactions ›› 2024, Vol. 28 ›› Issue (2): 47-57. doi: 10.15960/j.cnki.issn.1007-6093.2024.02.003


A class of differentially private stochastic gradient descent algorithms with adaptive gradient clipping

Jiaqi ZHANG1, Jueyou LI2,*

  1. School of Mathematics and Statistics, Chongqing University, Chongqing 400044, China
  2. School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China
  • Received: 2022-06-22 Online: 2024-06-15 Published: 2024-06-07
  • Contact: Jueyou LI E-mail: lijueyou@cqnu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2023YFA1011303); National Natural Science Foundation of China (11971083); National Natural Science Foundation of China (11991024); Natural Science Foundation of Chongqing (cstc2020jcyj-msxmX0287)



Abstract:

Gradient clipping is an effective method for preventing gradient explosion, but the choice of the gradient clipping parameter usually has a large influence on the performance of the trained model. To address this issue, this paper improves the standard differentially private stochastic gradient descent (DP-SGD) algorithm. First, an adaptive gradient clipping method is proposed: building on the traditional clipping method, the clipping parameter is dynamically and adaptively adjusted using a quantile-based rule combined with an exponential averaging strategy, yielding a class of differentially private stochastic gradient descent algorithms with adaptive gradient clipping. Second, convergence and privacy analyses of the proposed adaptive algorithm are given for the case of non-convex objective functions. Finally, numerical experiments are carried out on the MNIST, Fashion-MNIST and IMDB datasets. The results show that, compared with the traditional gradient clipping algorithm, the proposed adaptive gradient clipping algorithm significantly improves model accuracy.
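To make the mechanism described above concrete, the following minimal sketch implements one plausible reading of a DP-SGD step with quantile-plus-exponential-average clipping: per-sample gradients are clipped to a threshold C, Gaussian noise calibrated to C is added, and C is then moved toward an empirical quantile of the observed gradient norms by an exponential moving average. This is not the authors' reference code; the update rule, the hyper-parameter names (q, beta, sigma, eta, C0), and the toy least-squares loss are all illustrative assumptions, and the paper's exact rule and privacy accounting may differ.

    # Minimal sketch (NOT the paper's reference implementation) of DP-SGD with
    # an adaptively updated clipping threshold. The threshold update -- an
    # exponential moving average toward an empirical quantile of per-sample
    # gradient norms -- is a hypothesized reading of the abstract's
    # "quantile + exponential averaging" strategy.
    import numpy as np

    rng = np.random.default_rng(0)

    def per_sample_grads(w, X, y):
        """Per-sample gradients of the toy loss 0.5 * (x^T w - y)^2."""
        residuals = X @ w - y                 # shape (batch,)
        return residuals[:, None] * X         # shape (batch, dim)

    def dp_sgd_adaptive_clip(X, y, steps=200, batch=64, eta=0.05,
                             q=0.5, beta=0.9, sigma=1.0, C0=1.0):
        n, d = X.shape
        w = np.zeros(d)
        C = C0                                # clipping threshold, adapted online
        for _ in range(steps):
            idx = rng.choice(n, size=batch, replace=False)
            g = per_sample_grads(w, X[idx], y[idx])
            norms = np.linalg.norm(g, axis=1)
            # Clip each per-sample gradient so its norm is at most C.
            scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
            g_clipped = g * scale[:, None]
            # Add Gaussian noise calibrated to the current sensitivity C.
            noisy_sum = g_clipped.sum(axis=0) + sigma * C * rng.standard_normal(d)
            w -= eta * noisy_sum / batch
            # Adaptive threshold: exponential average toward the q-quantile
            # of the observed per-sample gradient norms (hypothesized rule).
            C = beta * C + (1.0 - beta) * np.quantile(norms, q)
        return w

    # Toy usage on synthetic least-squares data.
    X = rng.standard_normal((1000, 10))
    w_true = rng.standard_normal(10)
    y = X @ w_true + 0.1 * rng.standard_normal(1000)
    w_hat = dp_sgd_adaptive_clip(X, y)
    print("estimation error:", np.linalg.norm(w_hat - w_true))

Note that in a full differentially private treatment the quantile statistic itself would also have to be computed privately, and the overall (epsilon, delta) budget tracked across iterations; the sketch omits both, which the paper's privacy analysis is meant to handle.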

Key words: stochastic gradient descent algorithm, differential privacy, gradient clipping, adaptivity
