运筹学学报 ›› 2022, Vol. 26 ›› Issue (1): 1-22.doi: 10.15960/j.cnki.issn.1007-6093.2022.01.001
收稿日期:
2021-06-14
出版日期:
2022-03-15
发布日期:
2022-03-14
通讯作者:
韩丛英
E-mail:hancy@ucas.ac.cn
作者简介:
韩丛英 E-mail: hancy@ucas.ac.cn基金资助:
Jia HU1, Tiande GUO1,2, Congying HAN1,2*,*()
Received:
2021-06-14
Online:
2022-03-15
Published:
2022-03-14
Contact:
Congying HAN
E-mail:hancy@ucas.ac.cn
摘要:
针对机器学习中广泛存在的一类问题:结构化随机优化问题(其中“结构化”是指问题的可行域具有块状结构,且目标函数的非光滑正则化部分在变量块之间是可分离的),我们研究了小批量随机块坐标下降算法(mSBD)。按照求解非复合问题和复合问题分别给出了基本的mSBD和它的变体,对于非复合问题,分析了算法在没有一致有界梯度方差假设情况下的收敛性质。而对于复合问题,在不需要通常的Lipschitz梯度连续性假设条件下得到了算法的收敛性。最后通过数值实验验证了mSBD的有效性。
中图分类号:
胡佳, 郭田德, 韩丛英. 小批量随机块坐标下降算法[J]. 运筹学学报, 2022, 26(1): 1-22.
Jia HU, Tiande GUO, Congying HAN. Mini-batch stochastic block coordinate descent algorithm[J]. Operations Research Transactions, 2022, 26(1): 1-22.
1 |
Tibshirani R . Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1996, 58 (1): 267- 288.
doi: 10.1111/j.2517-6161.1996.tb02080.x |
2 |
Shevade S K , Keerthi S S . A simple and efficient algorithm for gene selection using sparse logistic regression[J]. Bioinformatics, 2003, 19 (17): 2246- 2253.
doi: 10.1093/bioinformatics/btg308 |
3 | Sra S , Nowozin S , Wright S J . Optimization for Machine Learning[M]. Cambridge: MIT Press, 2012. |
4 |
Bottou L , Curtis F E , Nocedal J . Optimization methods for large-scale machine learning[J]. SIAM Review, 2018, 60 (2): 223- 311.
doi: 10.1137/16M1080173 |
5 | Zeng J, Lau T T K, Lin S, et al. Global convergence of block coordinate descent in deep learning[C]//International Conference on Machine Learning, 2019: 7313-7323. |
6 | Li J, Fang C, Lin Z. Lifted proximal operator machines[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 4181-4188. |
7 |
Luo Z Q , Tseng P . Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem[J]. SIAM Journal on Optimization, 1992, 2 (1): 43- 54.
doi: 10.1137/0802004 |
8 | Tseng P , Yun S . A coordinate gradient descent method for nonsmooth separable minimization[J]. Mathematical Programming, 2009, 117 (1): 387- 423. |
9 |
Wright S J . Accelerated block-coordinate relaxation for regularized optimization[J]. SIAM Journal on Optimization, 2012, 22 (1): 159- 186.
doi: 10.1137/100808563 |
10 |
Beck A , Tetruashvili L . On the convergence of block coordinate descent type methods[J]. SIAM Journal on Optimization, 2013, 23 (4): 2037- 2060.
doi: 10.1137/120887679 |
11 |
Leventhal D , Lewis A S . Randomized methods for linear constraints: convergence rates and conditioning[J]. Mathematics of Operations Research, 2010, 35 (3): 641- 654.
doi: 10.1287/moor.1100.0456 |
12 |
Nesterov Y . Efficiency of coordinate descent methods on huge-scale optimization problems[J]. SIAM Journal on Optimization, 2012, 22 (2): 341- 362.
doi: 10.1137/100802001 |
13 | Richtárik P , Takáč M . Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function[J]. Mathematical Programming, 2014, 144 (1): 1- 38. |
14 | Shalev-Shwartz S , Tewari A . Stochastic methods for $ \ell$1-regularized loss minimization[J]. The Journal of Machine Learning Research, 2011, 12, 1865- 1892. |
15 | Kurdyka K. On gradients of functions definable in o-minimal structures[J]//Annales de l'institut Fourier. 1998, 48(3): 769-783. |
16 | Łojasiewicz S . Une propriété topologique des sous-ensembles analytiques réels[J]. Les équations aux dérivées partielles, 1963, 117, 87- 89. |
17 |
Xu Y , Yin W . A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion[J]. SIAM Journal on Imaging Sciences, 2013, 6 (3): 1758- 1789.
doi: 10.1137/120887795 |
18 |
Xu Y , Yin W . A globally convergent algorithm for nonconvex optimization based on block coordinate update[J]. Journal of Scientific Computing, 2017, 72 (2): 700- 734.
doi: 10.1007/s10915-017-0376-0 |
19 |
Kleywegt A J , Shapiro A , Homem-de-Mello T . The sample average approximation method for stochastic discrete optimization[J]. SIAM Journal on Optimization, 2002, 12 (2): 479- 502.
doi: 10.1137/S1052623499363220 |
20 |
Robbins H , Monro S . A stochastic approximation method[J]. The Annals of Mathematical Statistics, 1951, 22, 400- 407.
doi: 10.1214/aoms/1177729586 |
21 |
Ghadimi S , Lan G . Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework[J]. SIAM Journal on Optimization, 2012, 22 (4): 1469- 1492.
doi: 10.1137/110848864 |
22 |
Ghadimi S , Lan G . Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization II: Shrinking procedures and optimal algorithms[J]. SIAM Journal on Optimization, 2013, 23 (4): 2061- 2089.
doi: 10.1137/110848876 |
23 |
Ghadimi S , Lan G . Stochastic first-and zeroth-order methods for nonconvex stochastic programming[J]. SIAM Journal on Optimization, 2013, 23 (4): 2341- 2368.
doi: 10.1137/120880811 |
24 |
Ghadimi S , Lan G , Zhang H . Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization[J]. Mathematical Programming, 2016, 155 (1-2): 267- 305.
doi: 10.1007/s10107-014-0846-1 |
25 | Lan G . An optimal method for stochastic composite optimization[J]. Mathematical Programming, 2012, 133 (1): 365- 397. |
26 |
Nedic A , Lee S . On stochastic subgradient mirror-descent algorithm with weighted averaging[J]. SIAM Journal on Optimization, 2014, 24 (1): 84- 107.
doi: 10.1137/120894464 |
27 |
Nemirovski A , Juditsky A , Lan G , et al. Robust stochastic approximation approach to stochastic programming[J]. SIAM Journal on Optimization, 2009, 19 (4): 1574- 1609.
doi: 10.1137/070704277 |
28 |
Xu Y , Yin W . Block stochastic gradient iteration for convex and nonconvex optimization[J]. SIAM Journal on Optimization, 2015, 25 (3): 1686- 1716.
doi: 10.1137/140983938 |
29 |
Dang C D , Lan G . Stochastic block mirror descent methods for nonsmooth and stochastic optimization[J]. SIAM Journal on Optimization, 2015, 25 (2): 856- 881.
doi: 10.1137/130936361 |
30 | Zhao T, Yu M, Wang Y, et al. Accelerated mini-batch randomized block coordinate descent method[C]//Advances in Neural Information Processing Systems, 2014: 3329-3337. |
31 | Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction[C]//Advances in Neural Information Processing Systems, 2013: 315-323. |
32 | Nesterov Y . Introductory Lectures on Convex Optimization: A Basic Course[M]. New York: Springer Science & Business Media, 2003. |
33 | Lei Y , Hu T , Li G , et al. Stochastic gradient descent for nonconvex learning without bounded gradient assumptions[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 31 (10): 4394- 4400. |
34 |
Ying Y , Zhou D X . Unregularized online learning algorithms with general loss functions[J]. Applied and Computational Harmonic Analysis, 2017, 42 (2): 224- 244.
doi: 10.1016/j.acha.2015.08.007 |
35 | Rockafellar R T . Convex Analysis[M]. Princeton: Princeton university press, 2015. |
36 |
Grimmer B . Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity[J]. SIAM Journal on Optimization, 2019, 29 (2): 1350- 1365.
doi: 10.1137/18M117306X |
37 | Robbins H, Siegmund D. A convergence theorem for non negative almost supermartingales and some applications[M]//Optimizing Methods in Statistics. Pittsburgh: Academic Press, 1971: 233-257. |
38 |
Beck A , Teboulle M . A fast iterative shrinkage-thresholding algorithm for linear inverse problems[J]. SIAM Journal on Imaging Sciences, 2009, 2 (1): 183- 202.
doi: 10.1137/080716542 |
[1] | 徐姿, 张慧灵. 非凸极小极大问题的优化算法与复杂度分析[J]. 运筹学学报, 2021, 25(3): 74-86. |
[2] | 李红武, 谢敏, 张榕. 一类非光滑凸优化问题的邻近梯度算法[J]. 运筹学学报, 2021, 25(1): 61-72. |
[3] | 苗小楠, 顾剑, 肖现涛. 求解非光滑方程组的三次正则化方法[J]. 运筹学学报, 2019, 23(2): 17-30. |
[4] | 王伟祥, 尚有林, 王朵. 求解带箱子集约束的非光滑全局优化问题的填充函数方法[J]. 运筹学学报, 2019, 23(1): 28-34. |
[5] | 简金宝, 劳译娴, 晁绵涛, 马国栋. 线性约束两分块非凸优化的ADMM-SQP算法[J]. 运筹学学报, 2018, 22(2): 79-92. |
[6] | 梁娜, 杜守强. 求解对称张量绝对值方程问题的非光滑牛顿法[J]. 运筹学学报, 2017, 21(3): 95-102. |
[7] | 陈元媛, 高岩, 刘志敏, 杜守强. 一类特殊优化问题的光滑梯度法[J]. 运筹学学报, 2017, 21(2): 119-125. |
[8] | 张勇, 叶万洲. 一种求解弹性l_2-l_q正则化问题的算法[J]. 运筹学学报, 2016, 20(4): 11-20. |
[9] | 张清叶, 高岩. 求解非光滑凸规划的一种混合束方法[J]. 运筹学学报, 2016, 20(2): 113-120. |
[10] | 韩艳丽, 高岩. 基于生存理论的线性微分博弈系统识别域的判别[J]. 运筹学学报, 2016, 20(1): 105-111. |
[11] | 李丽花, 高岩, 王隔霞. 一类变结构动态系统的非光滑最优性条件[J]. 运筹学学报, 2013, 17(3): 65-72. |
[12] | 赵克全,杨新民. 一类非光滑优化问题解集的性质[J]. 运筹学学报, 2012, 16(3): 109-118. |
[13] | 高岩. 非光滑非线性互补问题的牛顿法[J]. 运筹学学报, 2011, 15(2): 53-58. |
[14] | 白乙拉, 吕巍. 一类非光滑分布参数系统的可辨识性及最优性条件[J]. 运筹学学报, 2011, 15(2): 119-126. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||