运筹学学报 (Operations Research Transactions)
2022, Vol. 26, Issue 1: 1-22
DOI: https://doi.org/10.15960/j.cnki.issn.1007-6093.2022.01.001
Mini-batch stochastic block coordinate descent algorithm
Received date: 2021-06-14
Online published: 2022-03-14
Funding: National Natural Science Foundation of China (11731013, U19B2040, 11991022); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27000000)
Hu Jia, Guo Tiande, Han Congying. Mini-batch stochastic block coordinate descent algorithm[J]. Operations Research Transactions, 2022, 26(1): 1-22. DOI: 10.15960/j.cnki.issn.1007-6093.2022.01.001
We study the mini-batch stochastic block coordinate descent algorithm (mSBD) for a class of structured stochastic optimization problems that arise widely in machine learning, where "structured" means that the feasible region of the problem has a block structure and the nonsmooth regularized part of the objective function is separable across the variable blocks. We present the basic mSBD and a variant of it for solving non-composite and composite problems, respectively. For the non-composite problem, we analyze the convergence properties of the algorithm without assuming uniformly bounded gradient variance. For the composite problem, we establish convergence without the usual Lipschitz continuity assumption on the gradient. Finally, we verify the effectiveness of mSBD by numerical experiments.
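The abstract suggests the following general shape for the composite variant: at each iteration, draw a mini-batch of samples and a random coordinate block, form a stochastic partial gradient for that block, and apply a proximal step using the block-separable regularizer. Below is a minimal sketch of this idea on an illustrative ℓ1-regularized least-squares instance; the problem, step size, and sampling scheme here are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (a block-separable regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def msbd(A, b, n_blocks=4, lam=0.1, batch_size=32, step=1e-2, n_iters=2000, seed=0):
    """Illustrative mini-batch stochastic block coordinate descent for
    min_x (1/2m) ||Ax - b||^2 + lam * ||x||_1,  with A of shape (m, d)."""
    rng = np.random.default_rng(seed)
    m, d = A.shape
    x = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_blocks)  # fixed block partition
    for _ in range(n_iters):
        i = rng.integers(n_blocks)           # sample one coordinate block
        idx = rng.choice(m, batch_size)      # sample a mini-batch of rows
        residual = A[idx] @ x - b[idx]
        # stochastic partial gradient with respect to block i
        g = A[idx][:, blocks[i]].T @ residual / batch_size
        # proximal gradient update on block i only
        x[blocks[i]] = soft_threshold(x[blocks[i]] - step * g, step * lam)
    return x
```

On synthetic data (e.g. `A = rng.standard_normal((1000, 50))` with a sparse ground-truth signal), this sketch converges to a sparse solution; setting `lam = 0` turns the proximal step into a plain stochastic block gradient update, mimicking the non-composite case.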