小批量随机块坐标下降算法

doi:10.15960/j.cnki.issn.1007-6093.2022.01.001

摘要/Abstract

摘要：

针对机器学习中广泛存在的一类问题：结构化随机优化问题（其中“结构化”是指问题的可行域具有块状结构，且目标函数的非光滑正则化部分在变量块之间是可分离的），我们研究了小批量随机块坐标下降算法（mSBD）。按照求解非复合问题和复合问题分别给出了基本的mSBD和它的变体，对于非复合问题，分析了算法在没有一致有界梯度方差假设情况下的收敛性质。而对于复合问题，在不需要通常的Lipschitz梯度连续性假设条件下得到了算法的收敛性。最后通过数值实验验证了mSBD的有效性。

关键词: 块坐标下降, 随机近似, 随机(复合)优化, Hölder连续, 非光滑, 非凸优化

Abstract:

We study the mini-batch stochastic block coordinate descent algorithm (mSBD) for a class of problems, i.e., structured stochastic optimization problems (where "structured" means that feasible region of the problem has a block structure and the nonsmooth regularized part of the objective function is separable across the variable blocks), widely used in machine learning. We give the basic mSBD and its variant according to solving non-composite and composite problems respectively. For the noncomposite problem, we analyze the convergence properties of the algorithm without the assumption of uniformly bounded gradient variance. For the composite problem, we obtain the convergence of the algorithm without the usual Lipschitz gradient continuity assumption. Finally, we verify the effectiveness of mSBD by numerical experiments.

Key words: block coordinate descent, stochastic approximation, stochastic (composite) optimization, Hölder continuity, nonsmooth, nonconvex optimization

中图分类号:

O221.2

胡佳, 郭田德, 韩丛英. 小批量随机块坐标下降算法[J]. 运筹学学报, 2022, 26(1): 1-22.

Jia HU, Tiande GUO, Congying HAN. Mini-batch stochastic block coordinate descent algorithm[J]. Operations Research Transactions, 2022, 26(1): 1-22.

图/表 3

参考文献 38

1	Tibshirani R . Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1996, 58 (1): 267- 288. doi: 10.1111/j.2517-6161.1996.tb02080.x
2	Shevade S K , Keerthi S S . A simple and efficient algorithm for gene selection using sparse logistic regression[J]. Bioinformatics, 2003, 19 (17): 2246- 2253. doi: 10.1093/bioinformatics/btg308
3	Sra S , Nowozin S , Wright S J . Optimization for Machine Learning[M]. Cambridge: MIT Press, 2012.
4	Bottou L , Curtis F E , Nocedal J . Optimization methods for large-scale machine learning[J]. SIAM Review, 2018, 60 (2): 223- 311. doi: 10.1137/16M1080173
5	Zeng J, Lau T T K, Lin S, et al. Global convergence of block coordinate descent in deep learning[C]//International Conference on Machine Learning, 2019: 7313-7323.
6	Li J, Fang C, Lin Z. Lifted proximal operator machines[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 4181-4188.
7	Luo Z Q , Tseng P . Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem[J]. SIAM Journal on Optimization, 1992, 2 (1): 43- 54. doi: 10.1137/0802004
8	Tseng P , Yun S . A coordinate gradient descent method for nonsmooth separable minimization[J]. Mathematical Programming, 2009, 117 (1): 387- 423.
9	Wright S J . Accelerated block-coordinate relaxation for regularized optimization[J]. SIAM Journal on Optimization, 2012, 22 (1): 159- 186. doi: 10.1137/100808563
10	Beck A , Tetruashvili L . On the convergence of block coordinate descent type methods[J]. SIAM Journal on Optimization, 2013, 23 (4): 2037- 2060. doi: 10.1137/120887679
11	Leventhal D , Lewis A S . Randomized methods for linear constraints: convergence rates and conditioning[J]. Mathematics of Operations Research, 2010, 35 (3): 641- 654. doi: 10.1287/moor.1100.0456
12	Nesterov Y . Efficiency of coordinate descent methods on huge-scale optimization problems[J]. SIAM Journal on Optimization, 2012, 22 (2): 341- 362. doi: 10.1137/100802001
13	Richtárik P , Takáč M . Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function[J]. Mathematical Programming, 2014, 144 (1): 1- 38.
14	Shalev-Shwartz S , Tewari A . Stochastic methods for $ \ell$₁-regularized loss minimization[J]. The Journal of Machine Learning Research, 2011, 12, 1865- 1892.
15	Kurdyka K. On gradients of functions definable in o-minimal structures[J]//Annales de l'institut Fourier. 1998, 48(3): 769-783.
16	Łojasiewicz S . Une propriété topologique des sous-ensembles analytiques réels[J]. Les équations aux dérivées partielles, 1963, 117, 87- 89.
17	Xu Y , Yin W . A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion[J]. SIAM Journal on Imaging Sciences, 2013, 6 (3): 1758- 1789. doi: 10.1137/120887795
18	Xu Y , Yin W . A globally convergent algorithm for nonconvex optimization based on block coordinate update[J]. Journal of Scientific Computing, 2017, 72 (2): 700- 734. doi: 10.1007/s10915-017-0376-0
19	Kleywegt A J , Shapiro A , Homem-de-Mello T . The sample average approximation method for stochastic discrete optimization[J]. SIAM Journal on Optimization, 2002, 12 (2): 479- 502. doi: 10.1137/S1052623499363220
20	Robbins H , Monro S . A stochastic approximation method[J]. The Annals of Mathematical Statistics, 1951, 22, 400- 407. doi: 10.1214/aoms/1177729586
21	Ghadimi S , Lan G . Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework[J]. SIAM Journal on Optimization, 2012, 22 (4): 1469- 1492. doi: 10.1137/110848864
22	Ghadimi S , Lan G . Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization II: Shrinking procedures and optimal algorithms[J]. SIAM Journal on Optimization, 2013, 23 (4): 2061- 2089. doi: 10.1137/110848876
23	Ghadimi S , Lan G . Stochastic first-and zeroth-order methods for nonconvex stochastic programming[J]. SIAM Journal on Optimization, 2013, 23 (4): 2341- 2368. doi: 10.1137/120880811
24	Ghadimi S , Lan G , Zhang H . Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization[J]. Mathematical Programming, 2016, 155 (1-2): 267- 305. doi: 10.1007/s10107-014-0846-1
25	Lan G . An optimal method for stochastic composite optimization[J]. Mathematical Programming, 2012, 133 (1): 365- 397.
26	Nedic A , Lee S . On stochastic subgradient mirror-descent algorithm with weighted averaging[J]. SIAM Journal on Optimization, 2014, 24 (1): 84- 107. doi: 10.1137/120894464
27	Nemirovski A , Juditsky A , Lan G , et al. Robust stochastic approximation approach to stochastic programming[J]. SIAM Journal on Optimization, 2009, 19 (4): 1574- 1609. doi: 10.1137/070704277
28	Xu Y , Yin W . Block stochastic gradient iteration for convex and nonconvex optimization[J]. SIAM Journal on Optimization, 2015, 25 (3): 1686- 1716. doi: 10.1137/140983938
29	Dang C D , Lan G . Stochastic block mirror descent methods for nonsmooth and stochastic optimization[J]. SIAM Journal on Optimization, 2015, 25 (2): 856- 881. doi: 10.1137/130936361
30	Zhao T, Yu M, Wang Y, et al. Accelerated mini-batch randomized block coordinate descent method[C]//Advances in Neural Information Processing Systems, 2014: 3329-3337.
31	Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction[C]//Advances in Neural Information Processing Systems, 2013: 315-323.
32	Nesterov Y . Introductory Lectures on Convex Optimization: A Basic Course[M]. New York: Springer Science & Business Media, 2003.
33	Lei Y , Hu T , Li G , et al. Stochastic gradient descent for nonconvex learning without bounded gradient assumptions[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 31 (10): 4394- 4400.
34	Ying Y , Zhou D X . Unregularized online learning algorithms with general loss functions[J]. Applied and Computational Harmonic Analysis, 2017, 42 (2): 224- 244. doi: 10.1016/j.acha.2015.08.007
35	Rockafellar R T . Convex Analysis[M]. Princeton: Princeton university press, 2015.
36	Grimmer B . Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity[J]. SIAM Journal on Optimization, 2019, 29 (2): 1350- 1365. doi: 10.1137/18M117306X
37	Robbins H, Siegmund D. A convergence theorem for non negative almost supermartingales and some applications[M]//Optimizing Methods in Statistics. Pittsburgh: Academic Press, 1971: 233-257.
38	Beck A , Teboulle M . A fast iterative shrinkage-thresholding algorithm for linear inverse problems[J]. SIAM Journal on Imaging Sciences, 2009, 2 (1): 183- 202. doi: 10.1137/080716542

名称	样本数	维数
E2006-tfidf	16087	150360

[1]	徐姿, 张慧灵. 非凸极小极大问题的优化算法与复杂度分析[J]. 运筹学学报, 2021, 25(3): 74-86.
[2]	李红武, 谢敏, 张榕. 一类非光滑凸优化问题的邻近梯度算法[J]. 运筹学学报, 2021, 25(1): 61-72.
[3]	苗小楠, 顾剑, 肖现涛. 求解非光滑方程组的三次正则化方法[J]. 运筹学学报, 2019, 23(2): 17-30.
[4]	王伟祥, 尚有林, 王朵. 求解带箱子集约束的非光滑全局优化问题的填充函数方法[J]. 运筹学学报, 2019, 23(1): 28-34.
[5]	简金宝, 劳译娴, 晁绵涛, 马国栋. 线性约束两分块非凸优化的ADMM-SQP算法[J]. 运筹学学报, 2018, 22(2): 79-92.
[6]	梁娜, 杜守强. 求解对称张量绝对值方程问题的非光滑牛顿法[J]. 运筹学学报, 2017, 21(3): 95-102.
[7]	陈元媛, 高岩, 刘志敏, 杜守强. 一类特殊优化问题的光滑梯度法[J]. 运筹学学报, 2017, 21(2): 119-125.
[8]	张勇, 叶万洲. 一种求解弹性l_2-l_q正则化问题的算法[J]. 运筹学学报, 2016, 20(4): 11-20.
[9]	张清叶, 高岩. 求解非光滑凸规划的一种混合束方法[J]. 运筹学学报, 2016, 20(2): 113-120.
[10]	韩艳丽, 高岩. 基于生存理论的线性微分博弈系统识别域的判别[J]. 运筹学学报, 2016, 20(1): 105-111.
[11]	李丽花, 高岩, 王隔霞. 一类变结构动态系统的非光滑最优性条件[J]. 运筹学学报, 2013, 17(3): 65-72.
[12]	赵克全,杨新民. 一类非光滑优化问题解集的性质[J]. 运筹学学报, 2012, 16(3): 109-118.
[13]	高岩. 非光滑非线性互补问题的牛顿法[J]. 运筹学学报, 2011, 15(2): 53-58.
[14]	白乙拉, 吕巍. 一类非光滑分布参数系统的可辨识性及最优性条件[J]. 运筹学学报, 2011, 15(2): 119-126.