Mini-batch stochastic block coordinate descent algorithm

  • 1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
    2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China
韩丛英 (HAN Congying), E-mail: hancy@ucas.ac.cn

Received date: 2021-06-14

Online published: 2022-03-14

Funding

National Natural Science Foundation of China (Nos. 11731013, U19B2040, 11991022); Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA27000000)


Abstract

We study the mini-batch stochastic block coordinate descent algorithm (mSBD) for a class of problems that arise widely in machine learning: structured stochastic optimization problems, where "structured" means that the feasible region of the problem has a block structure and the nonsmooth regularized part of the objective function is separable across the variable blocks. We present the basic mSBD and a variant of it for solving non-composite and composite problems, respectively. For the non-composite problem, we analyze the convergence properties of the algorithm without the assumption of uniformly bounded gradient variance; for the composite problem, we establish convergence without the usual Lipschitz gradient continuity assumption. Finally, we verify the effectiveness of mSBD through numerical experiments.
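The abstract only names the algorithm, so the following is a minimal sketch of one mSBD iteration under assumed choices: a lasso-type instance of the structured composite problem (a least-squares loss averaged over rows of a data matrix, plus a block-separable l1 regularizer), a uniformly sampled block, and a fixed step size. None of these choices are taken from the paper; the sketch is only meant to make the "mini-batch gradient estimate + single-block proximal update" structure concrete.

# Sketch of mini-batch stochastic block coordinate descent (mSBD) on a
# composite problem of the form  min_x  E_xi[ f(x; xi) ] + sum_i r_i(x_i),
# illustrated with f(x; a, b) = 0.5 * (a^T x - b)^2 and r_i = lam * ||.||_1.
# The data, block partition, batch size and step size are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, d features.
n, d, lam = 200, 20, 0.1
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Partition the d coordinates into blocks of size 5 (block-separable l1 term).
blocks = np.array_split(np.arange(d), d // 5)

def soft_threshold(v, t):
    # Proximal map of t * ||.||_1, i.e. the prox of each block's regularizer.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def msbd_step(x, batch_size=10, step_size=0.05):
    # 1. Sample a mini-batch and form the averaged stochastic gradient of f.
    idx = rng.integers(0, n, size=batch_size)
    g = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size

    # 2. Pick one coordinate block uniformly at random.
    blk = blocks[rng.integers(0, len(blocks))]

    # 3. Proximal gradient step on that block only; other blocks are unchanged.
    x = x.copy()
    x[blk] = soft_threshold(x[blk] - step_size * g[blk], step_size * lam)
    return x

x = np.zeros(d)
for _ in range(2000):
    x = msbd_step(x)
obj = 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
print(f"objective after 2000 mSBD steps: {obj:.4f}")

Each iteration touches only the sampled mini-batch and a single coordinate block, so the per-iteration cost is independent of the number of samples and scales with the block size rather than the full dimension, which is the point of combining mini-batch sampling with block coordinate updates.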

How to cite this article

胡佳, 郭田德, 韩丛英. Mini-batch stochastic block coordinate descent algorithm[J]. 运筹学学报 (Operations Research Transactions), 2022, 26(1): 1-22. DOI: 10.15960/j.cnki.issn.1007-6093.2022.01.001

