Operations Research Transactions, 2024, Vol. 28, Issue 2: 131-142
DOI: https://doi.org/10.15960/j.cnki.issn.1007-6093.2024.02.010
An accelerated proximal stochastic gradient method with variance reduction based on Polyak step size

Received date: 2023-07-01
Online published: 2024-06-07

Funding
Shanxi Provincial Basic Research Program (Free Exploration) Fund (202103021224303)
Wang Fusheng, Shi Luyu. An accelerated proximal stochastic gradient method with variance reduction based on Polyak step size[J]. Operations Research Transactions, 2024, 28(2): 131-142. DOI: 10.15960/j.cnki.issn.1007-6093.2024.02.010
To solve stochastic composite optimization problems in machine learning, we propose a new accelerated proximal variance-reduced gradient algorithm, Acc-Prox-SVRG-Polyak, which combines the Acc-Prox-SVRG algorithm with the Polyak step size. Compared with existing algorithms, the new algorithm fully exploits the advantages of acceleration and the Polyak step size to improve accuracy. Convergence of the algorithm is established under standard assumptions, and its complexity is analyzed. Finally, numerical experiments on standard data sets verify the effectiveness of the new algorithm.
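The precise update rules of Acc-Prox-SVRG-Polyak are given in the full paper. Purely as an illustration of the ingredients the abstract names, the Python sketch below combines a standard proximal SVRG inner loop (variance-reduced gradient around a snapshot point), a momentum-style extrapolation as the acceleration step, and a Polyak-type step size of the form eta = (f_i(y) - f*) / ||v||^2 computed from the sampled loss. The names grad_i, loss_i, beta, and f_star are assumptions introduced for this sketch, not the paper's notation, and the details may differ from the algorithm analyzed in the paper.

```python
import numpy as np

def prox_l1(x, thresh):
    """Soft-thresholding: proximal operator of thresh * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def acc_prox_svrg_polyak(grad_i, loss_i, x0, n, reg=1e-4, epochs=20,
                         m=None, beta=0.9, f_star=0.0, eps=1e-12, rng=None):
    """Illustrative accelerated proximal SVRG loop with a Polyak-type step size.

    grad_i(x, i): gradient of the i-th smooth component at x
    loss_i(x, i): value of the i-th smooth component at x (assumed >= f_star)
    n:            number of component functions
    reg:          weight of the l1 term handled by the proximal step
    beta:         momentum weight for the extrapolated point
    """
    rng = rng or np.random.default_rng(0)
    m = m or 2 * n                          # inner-loop length per epoch
    x_tilde = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        # Full gradient at the snapshot: the variance-reduction anchor.
        full_grad = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        x = x_tilde.copy()
        y = x.copy()                        # extrapolated (look-ahead) point
        for _ in range(m):
            i = int(rng.integers(n))
            # Variance-reduced stochastic gradient at the look-ahead point.
            v = grad_i(y, i) - grad_i(x_tilde, i) + full_grad
            # Polyak-type step size built from the sampled loss value.
            eta = max(loss_i(y, i) - f_star, 0.0) / (np.dot(v, v) + eps)
            x_new = prox_l1(y - eta * v, eta * reg)
            y = x_new + beta * (x_new - x)  # momentum extrapolation
            x = x_new
        x_tilde = x                         # new snapshot for the next epoch
    return x_tilde
```

Compared with a fixed learning rate, a Polyak-type rule adapts the step to how far the sampled loss is from its assumed lower bound; in practice such a step is often capped to keep the early iterations stable.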