[1] Hocking R R. The analysis and selection of variables in linear regression [J]. Biometrics, 1976, 32: 1-49.
[2] Kira K, Rendell L A. The feature selection problem: Traditional methods and a new algorithm [C]//Proceedings of the AAAI Conference on Artificial Intelligence, 1992, 10: 129-134.
[3] Akaike H. Information theory and an extension of the maximum likelihood principle [M]//Selected Papers of Hirotugu Akaike. New York: Springer, 1998: 610-624.
[4] Schwarz G. Estimating the dimension of a model [J]. The Annals of Statistics, 1978, 6(2): 461-464.
[5] Tibshirani R. Regression shrinkage and selection via the lasso [J]. Journal of the Royal Statistical Society Series B-Statistical Methodology, 1996, 58(1): 267-288.
[6] Mohri M, Rostamizadeh A, Talwalkar A. Foundations of Machine Learning [M]. Cambridge: MIT Press, 2018.
[7] Donoho D L. Compressed sensing [J]. IEEE Transactions on Information Theory, 2006, 52(4): 1289-1306.
[8] Liu X, Shen C G, Wang L. A dual active-set proximal Newton algorithm for sparse approximation of correlation matrices [J]. Optimization Methods and Software, 2022, 37(5): 1820-1844.
[9] Shen C G, Xue W J, Zhang L-H, et al. An active-set proximal Newton algorithm for ℓ1-regularized optimization problems with box constraints [J]. Journal of Scientific Computing, 2020, 85: 1-34.
[10] Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems [J]. SIAM Journal on Imaging Sciences, 2009, 2(1): 183-202.
[11] Becker S, Bobin J, Candes E J. NESTA: A fast and accurate first-order method for sparse recovery [J]. SIAM Journal on Imaging Sciences, 2011, 4(1): 1-39.
[12] Van Den Berg E, Friedlander M P. Probing the Pareto frontier for basis pursuit solutions [J]. SIAM Journal on Scientific Computing, 2009, 31(2): 890-912.
[13] Byrd R H, Chin G M, Nocedal J, et al. A family of second-order methods for convex ℓ1-regularized optimization [J]. Mathematical Programming Series A, 2016, 159(1): 435-467.
[14] Efron B, Hastie T, Johnstone I, et al. Least angle regression [J]. The Annals of Statistics, 2004, 32(2): 407-499.
[15] Figueiredo M A T, Nowak R D, Wright S J. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems [J]. IEEE Journal of Selected Topics in Signal Processing, 2007, 1(4): 586-597.
[16] Hale E T, Yin W, Zhang Y. Fixed-point continuation for ℓ1-minimization: Methodology and convergence [J]. SIAM Journal on Optimization, 2008, 19(3): 1107-1130.
[17] Keskar N, Nocedal J, Oztoprak F, et al. A second-order method for convex ℓ1-regularized optimization with active-set prediction [J]. Optimization Methods and Software, 2016, 31(3): 605-621.
[18] Li X D, Sun D F, Toh K-C. A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems [J]. SIAM Journal on Optimization, 2018, 28(1): 433-458.
[19] Milzarek A, Ulbrich M. A semismooth Newton method with multidimensional filter globalization for ℓ1-optimization [J]. SIAM Journal on Optimization, 2014, 24(1): 298-333.
[20] Wen Z W, Yin W T, Goldfarb D, et al. A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation [J]. SIAM Journal on Scientific Computing, 2010, 32(4): 1832-1857.
[21] Wright S J, Nowak R D, Figueiredo M A T. Sparse reconstruction by separable approximation [J]. IEEE Transactions on Signal Processing, 2009, 57(7): 2479-2493.
[22] Xiao X T, Li Y F, Wen Z W, et al. A regularized semi-smooth Newton method with projection steps for composite convex programs [J]. Journal of Scientific Computing, 2018, 76(1): 364-389.
[23] Yuan M, Lin Y. Model selection and estimation in regression with grouped variables [J]. Journal of the Royal Statistical Society Series B-Statistical Methodology, 2006, 68(1): 49-67.
[24] Li X D, Sun D F, Toh K-C. On efficiently solving the subproblems of a level-set method for fused Lasso problems [J]. SIAM Journal on Optimization, 2018, 28(2): 1842-1866.
[25] Zhang Y J, Zhang N, Sun D F, et al. An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems [J]. Mathematical Programming, 2020, 179(1): 223-263.
[26] Zhang Y J, Zhang N, Sun D F, et al. A proximal point dual Newton algorithm for solving group graphical Lasso problems [J]. SIAM Journal on Optimization, 2020, 30(3): 2197-2220.
[27] Tibshirani R. Regression shrinkage and selection via the Lasso: A retrospective [J]. Journal of the Royal Statistical Society Series B-Statistical Methodology, 2011, 73(3): 273-282.
[28] Belloni A, Chernozhukov V, Wang L. Square-root Lasso: Pivotal recovery of sparse signals via conic programming [J]. Biometrika, 2011, 98(4): 791-806.
[29] Li X G, Zhao T, Yuan X M, et al. The flare package for high dimensional linear regression and precision matrix estimation in R [J]. Journal of Machine Learning Research, 2015, 16: 553-557.
[30] Tang P P, Wang C J, Sun D F, et al. A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems [J]. Journal of Machine Learning Research, 2020, 21(226): 1-38.
[31] Chu H T M, Toh K-C, Zhang Y J. On regularized square-root regression problems: Distributionally robust interpretation and fast computations [EB/OL]. [2022-10-12]. arXiv:2109.03632.
[32] Li X G, Jiang H M, Haupt J, et al. On fast convergence of proximal algorithms for SQRT-Lasso optimization: Don't worry about its nonsmooth loss function [C]//Uncertainty in Artificial Intelligence, 2020: 49-59.
[33] Candes E, Tao T. The Dantzig selector: Statistical estimation when p is much larger than n [J]. The Annals of Statistics, 2007, 35(6): 2313-2351.
[34] Negahban S N, Ravikumar P, Wainwright M J, et al. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers [J]. Statistical Science, 2012, 27(4): 538-557.
[35] Wainwright M J. High-dimensional Statistics: A Non-asymptotic Viewpoint [M]. Cambridge: Cambridge University Press, 2019.
[36] Beck A. First-order Methods in Optimization [M]. Philadelphia: SIAM, 2017.
[37] Bello Cruz J Y. On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions [J]. Set-Valued and Variational Analysis, 2017, 25(2): 245-263.
[38] Rockafellar R T. Convex Analysis [M]. Princeton: Princeton University Press, 1970.
[39] Liu H Y, Hu J, Li Y F, et al. Optimization: Modeling, Algorithms and Theory [M]. Beijing: Higher Education Press, 2020. (in Chinese)
[40] Nocedal J, Wright S J. Numerical Optimization [M]. New York: Springer, 2006.
[41] Liang J W, Fadili J, Peyré G. Activity identification and local linear convergence of forward-backward-type methods [J]. SIAM Journal on Optimization, 2017, 27(1): 408-437.
[42] Vaiter S, Peyré G, Fadili J. Model consistency of partly smooth regularizers [J]. IEEE Transactions on Information Theory, 2017, 64(3): 1725-1737.
[43] Hare W L, Lewis A S. Identifying active constraints via partial smoothness and prox-regularity [J]. Journal of Convex Analysis, 2004, 11(2): 251-266.
[44] Wu Z M, Li C S, Li M, et al. Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems [J]. Journal of Global Optimization, 2021, 79: 617-644.