[1] Wang C, Wang X, Zhang C, et al. Geometric correction based color image watermarking using fuzzy least squares support vector machine and Bessel K form distribution[J]. Signal Processing, 2017, 134: 197-208. doi: 10.1016/j.sigpro.2016.12.010
[2] Zhu J, Rosset S, Hastie T, et al. 1-norm support vector machines[J]. Advances in Neural Information Processing Systems, 2004, 16: 49-56.
[3] Tong S, Chang E. Support vector machine active learning for image retrieval[C]//ACM International Conference on Multimedia, 2001: 107-118.
[4] Adalbjornsson S I, et al. Group-sparse regression using the covariance fitting criterion[J]. Signal Processing, 2017, 139: 116-130. doi: 10.1016/j.sigpro.2017.03.025
[5] Mateos G, Bazerque J A, Giannakis G B. Distributed sparse linear regression[J]. IEEE Transactions on Signal Processing, 2010, 58(10): 5262-5276. doi: 10.1109/TSP.2010.2055862
[6] Plan Y, Vershynin R. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach[J]. IEEE Transactions on Information Theory, 2013, 59(1): 482-494. doi: 10.1109/TIT.2012.2207945
[7] Shamir O. A stochastic PCA and SVD algorithm with an exponential convergence rate[C]//International Conference on Machine Learning, 2015: 144-152.
[8] Garber D, Hazan E, Jin C, et al. Faster eigenvector computation via shift-and-invert preconditioning[C]//International Conference on Machine Learning, 2016: 2626-2634.
[9] Allen-Zhu Z. Katyusha: The first direct acceleration of stochastic gradient methods[C]//Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017: 1200-1205.
[10] Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems[J]. SIAM Journal on Imaging Sciences, 2009, 2(1): 183-202. doi: 10.1137/080716542
[11] Beck A, Teboulle M. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems[J]. IEEE Transactions on Image Processing, 2009, 18(11): 2419-2434. doi: 10.1109/TIP.2009.2028250
[12] Nesterov Y. Gradient methods for minimizing composite functions[J]. Mathematical Programming, 2013, 140(1): 125-161.
[13] Nesterov Y. Smooth minimization of non-smooth functions[J]. Mathematical Programming, 2005, 103(1): 127-152.
[14] Bottou L. Stochastic gradient descent tricks[M]//Neural Networks: Tricks of the Trade. Berlin: Springer, 2012: 421-436.
[15] Klein S, Pluim J P W, Staring M, et al. Adaptive stochastic gradient descent optimisation for image registration[J]. International Journal of Computer Vision, 2009, 81(3): 227-239. doi: 10.1007/s11263-008-0168-y
[16] Tsuruoka Y, Tsujii J, Ananiadou S. Stochastic gradient descent training for $l_1$-regularized log-linear models with cumulative penalty[C]//Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009: 477-485.
[17] Zhang T. Solving large scale linear prediction problems using stochastic gradient descent algorithms[C]//International Conference on Machine Learning, 2004: 919-926.
[18] Nesterov Y. Introductory Lectures on Convex Optimization: A Basic Course[M]. New York: Springer, 2004.
[19] Ghadimi S, Lan G. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: A generic algorithmic framework[J]. SIAM Journal on Optimization, 2012, 22(4): 1469-1492.
[20] Shalev-Shwartz S, Zhang T. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization[C]//International Conference on Machine Learning, 2014: 64-72.
[21] Shalev-Shwartz S, Zhang T. Proximal stochastic dual coordinate ascent[EB/OL]. (2012-11-12)[2023-06-28]. arXiv: 1211.2717.
[22] Shalev-Shwartz S, Zhang T. Stochastic dual coordinate ascent methods for regularized loss minimization[J]. Journal of Machine Learning Research, 2013, 14: 567-599.
[23] Nitanda A. Stochastic proximal gradient descent with acceleration techniques[C]//Advances in Neural Information Processing Systems, 2014: 1574-1582.
[24] Xiao L, Zhang T. A proximal stochastic gradient method with progressive variance reduction[J]. SIAM Journal on Optimization, 2014, 24(4): 2057-2075. doi: 10.1137/140961791
[25] Robbins H, Monro S. A stochastic approximation method[J]. Annals of Mathematical Statistics, 1951, 22(3): 400-407. doi: 10.1214/aoms/1177729586
[26] Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[27] Kingma D P, Ba J. Adam: A method for stochastic optimization[EB/OL]. (2017-01-30)[2023-06-28]. arXiv: 1412.6980.
[28] Barzilai J, Borwein J M. Two-point step size gradient methods[J]. IMA Journal of Numerical Analysis, 1988, 8(1): 141-148.
[29] Zhou B, Gao L, Dai Y H. Gradient methods with adaptive step-sizes[J]. Computational Optimization and Applications, 2006, 35(1): 69-86.
[30] Dai Y H, Fletcher R. Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming[J]. Numerische Mathematik, 2005, 100(1): 21-47.
[31] Tan C, Ma S, Dai Y H, et al. Barzilai-Borwein step size for stochastic gradient descent[J]. Advances in Neural Information Processing Systems, 2016, 29: 685-693.
[32] Polyak B T. Introduction to Optimization[M]. New York: Optimization Software, 1987.
[33] Li D. Variance reduction algorithms based on the Polyak step size[D]. Tianjin: Hebei University of Technology, 2021 (in Chinese).