面向非凸非光滑最优化问题的临近类方法寻找“钝化”局部最优解

doi:10.15960/j.cnki.issn.1007-6093.2026.02.001

Abstract

Abstract: We propose a general Flexible proxImal-based block-wise First-order Algorithm framework called FIFA for a stochastic composite minimization problem with two nonconvex function components in the objective while only one of them is assumed to be differentiable. Under some per-block Lipschitz-like conditions based on Bregman distance, but without the global Lipschitz continuity of the gradient of the differentiable function, we prove that any accumulation point of the sequence is a stationary point of the model. We further show that the stationarity is the "best" one if the global Lipschitz continuity is additionally assumed, and that it is even the local minimizer for some special cases. Convergence analysis without the global Lipschitz continuity and the enhanced stationarity analysis make our results different from existing results in both the convex and nonconvex contexts.

Key words: non-Lipschitz continuous, non-convex, proximal gradient, Bregman, first order method

CLC Number:

O221.2

WANG Xiangfeng, ZENG Shangzhi, ZHANG Jin, ZHOU Jinchuan. Proximal-based methods can guarantee blunt local minimizer for nonconvex nonsmooth optimization problem[J]. Operations Research Transactions, 2026, 30(2): 1-23.

References

[1] Polyak B T. Introduction to Optimization [M]. New York: Optimization Software, 1987.
[2] Nesterov Y. Efficiency of coordiate descent methods on huge-scale optimization problems [J]. SIAM Journal on Optimization, 2012, 22(2): 341-362.
[3] Hong M, Wang X, Razaviyayn M, et al. Iteration complexity analysis of block coordinate descent methods [J]. Mathematical Programming, 2017, 163(1): 85-114.
[4] Schmidt M, Le-Roux N, Bach F. Minimizing finite sums with the stochastic average gradient [J]. Mathematical Programming, 2017, 162(1-2): 83-112.
[5] Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction [C]//Advances in Neural Information Processing Systems, 2013: 315-323.
[6] Defazio A, Bach F, Lacoste-Julien S. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives [C]//Advances in Neural Information Processing Systems, 2014: 1646-1654.
[7] Goodfellow I J, Bengio Y, Courville A C. Deep Learning [M]//Cambridge: MIT Press, 2016.
[8] Kingma D P, Ba J. Adam: A method for stochastic optimization [C]//Proceedings of the 3rd International Conference on Learning Representations, 2015.
[9] Fercoq O, Richtarik P. Optimization in high dimensions via accelerated, parallel, and proximal coordinate descent [J]. SIAM Review, 2016, 58(4): 739-771.
[10] Friedman J, Hastie T, Tibshirani R. Regularized paths for generalized linear models via coordinate descent [J]. Journal of Statistical Software, 2010, 33: 1-22.
[11] Liu J, Wright S. Asynchronous stochastic coordinate descent: Parallelism and convergence properties [J]. SIAM Journal on Optimization, 2015, 25(1): 351-376.
[12] Liu J, Wright S, Ré C, et al. An asynchronous parallel stochastic coordinate descent algorithm [J]. Journal of Machine Learning Research, 2015, 16: 285-322.
[13] Nutini J, Schmidt M, Laradji I, et al. Coordinate descent converges faster with the gausssouthwell rule than random selection [C]//Proceedings of the 32nd International Conference on Machine Learning, 2015: 1632-1641.
[14] Saha A, Tewari A. On the nonasymptotic convergence of cyclic coordinate descent method [J]. SIAM Journal on Optimization, 2013, 23(1): 576-601.
[15] Shalev-Shwartz S, Tewari A. Stochastic methods for ‘1 regularized loss minimization [J]. Journal of Machine Learning Research, 2011, 12: 1865-1892.
[16] Sun R, Hong M. Improved iteration complexity bounds of cyclic block coordinate descent for convex problems [C]//Advances in Neural Information Processing Systems, 2015: 1306-1314.
[17] Wang X, Ye J J, Yuan X, et al. Perturbation techniques for convergence analysis of proximal gradient method and other first-order algorithms via variational analysis [J]. Set-Valued and Variational Analysis, 2022, 30: 39-79.
[18] Hong M, Razaviyayn M, Luo Z Q, et al. A unified algorithmic framework for block-structured optimization involving big data: With applications in machine learning and signal processing [J]. IEEE Signal Processing Magazine, 2016, 33(1): 57-77.
[19] Wright S. Coordinate descent algorithms [J]. Mathematical Programming, 2015, 151(1): 3-34.
[20] Luo Z Q, Tseng P. Error bounds and convergence analysis of feasible descent methods: A general approach [J]. Annals of Operations Research, 1993, 46(1): 157-178.
[21] Razaviyayn M, Hong M, Luo Z Q. A unified convergence analysis of block successive minimization methods for nonsmooth optimization [J]. SIAM Journal on Optimization, 2013, 23(2): 1126-1153. [1] Polyak B T. Introduction to Optimization [M]. New York: Optimization Software, 1987.

[22] Beck A, Tetruashvili L. On the convergence of block coordinate descent type methods [J]. SIAM Journal on Optimization, 2013, 23(4): 2037-2060.
[23] Kurdyka K. On gradients of functions definable in o-minimal structures [J]. 1998, 48(3): 769- 784.
[24] Lojasiewicz S. Une propriété topologique des sous-ensembles analytiques réels [J]. Les équations aux dérivées partielles, 1963, 117: 87-89.
[25] Attouch H, Bolte J, Redont P, et al. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Lojasiewicz inequality [J]. Mathematics of Operations Research, 2010, 35(2): 438-457.
[26] Bolte J, Sabach S, Teboulle M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems [J]. Mathematical Programming, 2014, 146(1-2): 459-494.
[27] Frankel P, Garrigos G, Peypouquet J. Splitting methods with variable metric for KurdykaLojasiewicz functions and general convergence rates [J]. Journal of Optimization Theory and Applications, 2015, 165(3): 874-900.
[28] Li G, Pong T K. Calculus of the exponent of Kurdyka- Lojasiewicz inequality and its applications to linear convergence of first-order methods [J]. Foundations of Computational Mathematics, 2018, 18(5): 1199-1232.
[29] Xu Y, Yin W. A globally convergent algorithm for nonconvex optimization based on block coordinate update [J]. Journal of Scientific Computing, 2017, 72(2): 700-734.
[30] Rude S. An overview of gradient descent optimization algorithms [EB/OL]. [2023-03-01]. arXiv: 1609.04747.
[31] Imre Csiszar. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems [J]. The Annals of Statistics, 1991: 2032-2066.
[32] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets [C]//Advances in Neural Information Processing Systems, 2014: 2672-2680.
[33] Wainwright M, Jordan M. Graphical models, exponential families, and variational inference [J]. Foundations and Trends in Machine Learning, 2008, 1(1-2): 1-305.
[34] Kullback S. Information Theory and Statistics [M]. Mineola: Courier Corporation, 1997.
[35] Bauschke H, Bolte J, Teboulle M. A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications [J]. Mathematics of Operations Research, 2017, 42(2): 330-348.
[36] Bolte J, Sabach S, Teboulle M, et al. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems [J]. SIAM Journal on Optimization, 2018, 28(3): 2131-2151.
[37] Mordukhovich B. Variational Analysis and Generalized Differentiation I: Basic Theory, II: Applications [M]. Heidelberg: Springer Science & Business Media, 2006.
[38] Leyffer S, Munson T. A globally convergent filter method for MPECs [J]. Preprint Argonne National Laboratory, 2007.
[39] Rockafellar R T, Wets R. Variational Analysis [M]. Heidelberg: Springer Science & Business Media, 2009.
[40] Borwein J, Lewis A. Convex Analysis and Nonlinear Optimization: Theory and Examples [M]. Heidelberg: Springer Science & Business Media, 2010.
[41] Chen G, Teboulle M. Convergence analysis of a proximal-like minimization algorithm using Bregman functions [J]. SIAM Journal on Optimization, 1993, 3(3): 538-543.
[42] Bauschke H, Combettes P. Convex Analysis and Monotone Operator Theory in Hilbert Space [M]. New York: Springer, 2011.
[43] Lei J, Shanbhag U. A randomized block proximal variable sample-size stochastic gradient method for composite nonconvex stochastic optimization [EB/OL]. [2023-03-01]. arXiv: 1808.02543.
[44] Amahroq T, Penot J P, Syam A. On the subdifferentiability of the difference of two functions and local minimization [J]. Set-Valued Analysis, 2008, 16: 413-427.
[45] Banjac G, Goulart P. A novel approach for solving convex problems with cardinality constraints [J]. IFAC-PapersOnLine, 2017, 50(1): 13182-13187.
[46] Beck A, Eldar Y. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms [J]. SIAM Journal on Optimization, 2013, 23(3): 1480-1509.
[47] Robbins H, Siegmund D. A convergence theorem for non-negative almost supermartingales and some applications [C]//Herbert Robbins Selected Papers, 1985: 111-135.

Proximal-based methods can guarantee blunt local minimizer for nonconvex nonsmooth optimization problem

PDF

Knowledge

Abstract

Cite this article

share this article

References

Related Articles 15

Recommended Articles

Metrics

Comments

[1]	YANG Jinji, SHEN Chungen, YU Zhensheng. Convergence analysis of an adaptive proximal gradient-subgradient algorithm for square-root-loss regression problems [J]. Operations Research Transactions, 2026, 30(1): 217-234.
[2]	LIU Cong, JIAN Ailun, YUAN Gonglin. A modified conjugate gradient algorithm with its applications in image recovery problems [J]. Operations Research Transactions, 2026, 30(1): 207-216.
[3]	HUANG Yin, LUO Hezhi. A faster SDP relaxation for two-period financial derivatives liquidation problem [J]. Operations Research Transactions, 2026, 30(1): 108-120.
[4]	Zilin TAN, Honglin LUO. A second-order splitting method with its application [J]. Operations Research Transactions, 2025, 29(4): 121-140.
[5]	Chenglong BAO, Chang CHEN. A survey on the Bregman iteration in computing Landau's free functional minimization problems [J]. Operations Research Transactions, 2025, 29(3): 243-266.
[6]	Jing ZENG, Ruowen DING. Stability of solutions for a class of non-convex vector optimization problems with mapping differences [J]. Operations Research Transactions, 2023, 27(3): 121-128.
[7]	Chuanlong WANG, Jianhua NIU, Qianying SHEN. Two new accelerated proximal gradient algorithms for Toeplitz matrix completion [J]. Operations Research Transactions, 2023, 27(3): 96-108.
[8]	ZHANG Huiling, XU Yang, XU Zi. Block alternating proximal gradient algorithm for convex-nonconcave minimax problems [J]. Operations Research Transactions, 2022, 26(4): 64-74.
[9]	Jiahao LYU, Honglin LUO, Zehua YANG, Jianwen PENG. A stochastic Bregman ADMM with its application in training sparse structure SVMs [J]. Operations Research Transactions, 2022, 26(2): 16-30.
[10]	Wenjie LIU, Dongmei ZHANG, Peng ZHANG, Juan ZOU. The seeding algorithm for $\mu$-similar Bregman divergences $k$-means problem with penalties [J]. Operations Research Transactions, 2022, 26(1): 99-112.
[11]	XU Zi, ZHANG Huiling. Optimization algorithms and their complexity analysis for non-convex minimax problems [J]. Operations Research Transactions, 2021, 25(3): 74-86.
[12]	Hongwu LI, Min XIE, Rong ZHANG. A proximal gradient method for nonsmooth convex optimization problems [J]. Operations Research Transactions, 2021, 25(1): 61-72.
[13]	KONG Lingchen, CHEN Bingzhen, XIU Naihua, QI Houduo. High-dimensional constrained matrix regression problems [J]. Operations Research Transactions, 2017, 21(2): 31-38.
[14]	WAN Rui, XU Zi. On the linear convergence of the general alternating proximal gradient method for convex minimization [J]. Operations Research Transactions, 2014, 18(3): 1-12.
[15]	WEN Zaiwen YIN Wotao LIU Xin ZHANG Yin. Introduction to compressive sensing and sparse optimization [J]. Operations Research Transactions, 2012, 16(3): 49-64.