An accelerated proximal stochastic gradient method with variance reduction based on Polyak step size

  • 1. School of Mathematics and Statistics, Taiyuan Normal University, Jinzhong 030619, Shanxi, China

WANG Fusheng  E-mail: fswang2005@163.com

Received date: 2023-07-01

Online published: 2024-06-07

Fund program

Basic Research Program of Shanxi Province (Free Exploration Category) (202103021224303)

Copyright

Editorial Office of Operations Research Transactions, 2024. All rights reserved; unauthorized use is prohibited.



How to cite this article

WANG Fusheng, SHI Luyu. An accelerated proximal stochastic gradient method with variance reduction based on Polyak step size [J]. Operations Research Transactions, 2024, 28(2): 131-142. DOI: 10.15960/j.cnki.issn.1007-6093.2024.02.010

Abstract

To solve stochastic composite optimization problems in large-scale machine learning, we propose a new accelerated proximal stochastic gradient algorithm with variance reduction, called Acc-Prox-SVRG-Polyak, which combines the Acc-Prox-SVRG algorithm with the Polyak step size method. Compared with existing algorithms, the new algorithm makes full use of the advantages of acceleration techniques and the Polyak step size to improve accuracy. The convergence of the algorithm is established under standard assumptions, and its complexity is analyzed. Finally, numerical experiments on standard data sets verify the effectiveness of the new algorithm.
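
The abstract does not spell out the update rule, but the ingredients it names — variance-reduced proximal gradient steps with Nesterov-style momentum (Acc-Prox-SVRG) driven by a Polyak step size — can be combined into the minimal Python sketch below. It assumes an l1 regularizer, a stochastic Polyak rule of the form eta_k = (f_i(y_k) - f_i*) / ||v_k||^2 capped at eta_max, and hypothetical user-supplied callbacks grad_i and loss_i; it illustrates the general scheme, not the authors' exact algorithm.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||x||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def acc_prox_svrg_polyak(grad_i, loss_i, n, x0, lam=1e-3,
                         n_stages=20, m=None, beta=0.9,
                         eta_max=1.0, f_star=0.0, rng=None):
    """Sketch of an accelerated proximal SVRG loop with a Polyak-type step size.

    grad_i(x, i) : gradient of the i-th component function f_i at x
    loss_i(x, i) : value of f_i at x (needed by the Polyak step size)
    n            : number of component functions
    lam          : weight of the l1 regularizer h(x) = lam * ||x||_1
    f_star       : lower bound on the component losses (0 for many ML losses)
    """
    rng = np.random.default_rng() if rng is None else rng
    m = 2 * n if m is None else m              # inner-loop length
    x_tilde = np.asarray(x0, dtype=float).copy()
    for _ in range(n_stages):
        # Full gradient at the snapshot: the variance-reduction anchor.
        mu = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        x = x_tilde.copy()
        y = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient at the extrapolated point y.
            v = grad_i(y, i) - grad_i(x_tilde, i) + mu
            # Polyak-type step size, capped by eta_max for stability.
            eta = min((loss_i(y, i) - f_star) / (np.dot(v, v) + 1e-12), eta_max)
            x_new = prox_l1(y - eta * v, eta * lam)   # proximal step on h
            y = x_new + beta * (x_new - x)            # Nesterov-style momentum
            x = x_new
        x_tilde = x                                    # update the snapshot
    return x_tilde
```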
