Operations Research Transactions ›› 2026, Vol. 30 ›› Issue (1): 217-234. DOI: 10.15960/j.cnki.issn.1007-6093.2026.01.016
Convergence analysis of an adaptive proximal gradient-subgradient algorithm for square-root-loss regression problems
YANG Jinji, SHEN Chungen†, YU Zhensheng
Received: 2022-11-01
Published: 2026-03-16
Corresponding author: SHEN Chungen, E-mail: shenchungen@usst.edu.cn
Abstract: The SQRT-Lasso model has attracted wide attention because its choice of regularization parameter does not depend on an estimate of the residual variance. However, the square-root loss function has points of nondifferentiability, which complicates the design of algorithms for the SQRT-Lasso model. Building on the work of Li et al. (2020) [32], this paper improves the proofs of the local smoothness and local restricted strong convexity of the square-root loss function; to overcome the computational difficulty caused by these nondifferentiable points, an adaptive proximal gradient-subgradient algorithm (APGSA) is designed; and, under certain assumptions, the global convergence of the proposed algorithm is established with high probability. In addition, we prove that the algorithm exactly identifies the active manifold after finitely many iterations, which yields a local linear convergence rate with high probability. Finally, simulation experiments verify the effectiveness and the local linear convergence rate of APGSA.
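The model here is the square-root Lasso problem min_β ‖Xβ − y‖₂/√n + λ‖β‖₁ of [28]. To make the proximal gradient-subgradient idea concrete, the following Python sketch implements a plain fixed-step version of that iteration for this objective. It is a minimal illustration, not the APGSA of this paper: the adaptive step-size rule behind APGSA's guarantees is developed in the paper itself, and the function names, the fixed step size, and the stopping rule below are assumptions made only for this sketch. Where the residual is nonzero the square-root loss is differentiable and a gradient step is taken; at a zero residual the zero subgradient is used; the ℓ1 term is handled through its proximal map, soft-thresholding.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_subgrad_sqrt_lasso(X, y, lam, step=1e-3, max_iter=5000, tol=1e-8):
    """Fixed-step proximal gradient-subgradient sketch for
    min_b ||X b - y||_2 / sqrt(n) + lam * ||b||_1.
    Illustrative helper only; not the paper's APGSA."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(max_iter):
        r = X @ b - y
        norm_r = np.linalg.norm(r)
        if norm_r > 0.0:
            # Gradient of the square-root loss away from its nondifferentiable point.
            g = X.T @ r / (np.sqrt(n) * norm_r)
        else:
            # 0 is a valid subgradient of ||.||_2 at the origin.
            g = np.zeros(p)
        b_new = soft_threshold(b - step * g, step * lam)
        if np.linalg.norm(b_new - b) <= tol * max(1.0, np.linalg.norm(b)):
            return b_new
        b = b_new
    return b
```

A fixed step must stay small to behave well when the residual is nearly zero, and avoiding that conservatism through an adaptive step-size choice is exactly the computational difficulty the paper's APGSA is designed to address.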
YANG Jinji, SHEN Chungen, YU Zhensheng. Convergence analysis of an adaptive proximal gradient-subgradient algorithm for square-root-loss regression problems[J]. Operations Research Transactions, 2026, 30(1): 217-234.
References:
[1] Hocking R R. The analysis and selection of variables in linear regression [J]. Biometrics, 1976, 32: 1-49.
[2] Kira K, Rendell L A. The feature selection problem: Traditional methods and a new algorithm [C]//Proceedings of the AAAI Conference on Artificial Intelligence, 1992, 10: 129-134.
[3] Akaike H. Information theory and an extension of the maximum likelihood principle [M]//Selected Papers of Hirotugu Akaike, New York: Springer, 1998: 610-624.
[4] Schwarz G. Estimating the dimension of a model [J]. The Annals of Statistics, 1978, 6(2): 461-464.
[5] Tibshirani R. Regression shrinkage and selection via the lasso [J]. Journal of the Royal Statistical Society Series B-Statistical Methodology, 1996, 58(1): 267-288.
[6] Mohri M, Rostamizadeh A, Talwalkar A. Foundations of Machine Learning [M]. Cambridge: MIT Press, 2018.
[7] Donoho D L. Compressed sensing [J]. IEEE Transactions on Information Theory, 2006, 52(4): 1289-1306.
[8] Liu X, Shen C G, Wang L. A dual active-set proximal Newton algorithm for sparse approximation of correlation matrices [J]. Optimization Methods and Software, 2022, 37(5): 1820-1844.
[9] Shen C G, Xue W J, Zhang L-H, et al. An active-set proximal Newton algorithm for ℓ1-regularized optimization problems with box constraints [J]. Journal of Scientific Computing, 2020, 85: 1-34.
[10] Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems [J]. SIAM Journal on Imaging Sciences, 2009, 2(1): 183-202.
[11] Becker S, Bobin J, Candes E J. NESTA: A fast and accurate first-order method for sparse recovery [J]. SIAM Journal on Imaging Sciences, 2011, 4(1): 1-39.
[12] Van Den Berg E, Friedlander M P. Probing the Pareto frontier for basis pursuit solutions [J]. SIAM Journal on Scientific Computing, 2009, 31(2): 890-912.
[13] Byrd R H, Chin G M, Nocedal J, et al. A family of second-order methods for convex ℓ1-regularized optimization [J]. Mathematical Programming Series A, 2016, 159(1): 435-467.
[14] Efron B, Hastie T, Johnstone I, et al. Least angle regression [J]. The Annals of Statistics, 2004, 32(2): 407-499.
[15] Figueiredo M A T, Nowak R D, Wright S J. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems [J]. IEEE Journal of Selected Topics in Signal Processing, 2007, 1(4): 586-597.
[16] Hale E T, Yin W, Zhang Y. Fixed-point continuation for ℓ1-minimization: Methodology and convergence [J]. SIAM Journal on Optimization, 2008, 19(3): 1107-1130.
[17] Keskar N, Nocedal J, Oztoprak F, et al. A second-order method for convex ℓ1-regularized optimization with active-set prediction [J]. Optimization Methods and Software, 2016, 31(3): 605-621.
[18] Li X D, Sun D F, Toh K-C. A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems [J]. SIAM Journal on Optimization, 2018, 28(1): 433-458.
[19] Milzarek A, Ulbrich M. A semismooth Newton method with multidimensional filter globalization for ℓ1-optimization [J]. SIAM Journal on Optimization, 2014, 24(1): 298-333.
[20] Wen Z W, Yin W T, Goldfarb D, et al. A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation [J]. SIAM Journal on Scientific Computing, 2010, 32(4): 1832-1857.
[21] Wright S J, Nowak R D, Figueiredo M A T. Sparse reconstruction by separable approximation [J]. IEEE Transactions on Signal Processing, 2009, 57(7): 2479-2493.
[22] Xiao X T, Li Y F, Wen Z W, et al. A regularized semi-smooth Newton method with projection steps for composite convex programs [J]. Journal of Scientific Computing, 2018, 76(1): 364-389.
[23] Yuan M, Lin Y. Model selection and estimation in regression with grouped variables [J]. Journal of the Royal Statistical Society Series B-Statistical Methodology, 2006, 68(1): 49-67.
[24] Li X D, Sun D F, Toh K-C. On efficiently solving the subproblems of a level-set method for fused Lasso problems [J]. SIAM Journal on Optimization, 2018, 28(2): 1842-1866.
[25] Zhang Y J, Zhang N, Sun D F, et al. An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems [J]. Mathematical Programming, 2020, 179(1): 223-263.
[26] Zhang Y J, Zhang N, Sun D F, et al. A proximal point dual Newton algorithm for solving group graphical Lasso problems [J]. SIAM Journal on Optimization, 2020, 30(3): 2197-2220.
[27] Tibshirani R. Regression shrinkage and selection via the Lasso: A retrospective [J]. Journal of the Royal Statistical Society Series B-Statistical Methodology, 2011, 73(3): 273-282.
[28] Belloni A, Chernozhukov V, Wang L. Square-root Lasso: Pivotal recovery of sparse signals via conic programming [J]. Biometrika, 2011, 98(4): 791-806.
[29] Li X G, Zhao T, Yuan X M, et al. The flare package for high dimensional linear regression and precision matrix estimation in R [J]. Journal of Machine Learning Research, 2015, 16: 553-557.
[30] Tang P P, Wang C J, Sun D F, et al. A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems [J]. Journal of Machine Learning Research, 2020, 21(226): 1-38.
[31] Chu H T M, Toh K-C, Zhang Y J. On regularized square-root regression problems: Distributionally robust interpretation and fast computations [EB/OL]. [2022-10-12]. arXiv:2109.03632.
[32] Li X G, Jiang H M, Haupt J, et al. On fast convergence of proximal algorithms for SQRT-Lasso optimization: Don't worry about its nonsmooth loss function [C]//Uncertainty in Artificial Intelligence, 2020: 49-59.
[33] Candes E, Tao T. The Dantzig selector: Statistical estimation when p is much larger than n [J]. The Annals of Statistics, 2007, 35(6): 2313-2351.
[34] Negahban S N, Ravikumar P, Wainwright M J, et al. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers [J]. Statistical Science, 2012, 27(4): 538-557.
[35] Wainwright M J. High-dimensional Statistics: A Non-asymptotic Viewpoint [M]. Cambridge: Cambridge University Press, 2019.
[36] Beck A. First-order Methods in Optimization [M]. Philadelphia: SIAM, 2017.
[37] Bello Cruz J Y. On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions [J]. Set-Valued and Variational Analysis, 2017, 25(2): 245-263.
[38] Rockafellar R T. Convex Analysis [M]. Princeton: Princeton University Press, 1970.
[39] Liu H Y, Hu J, Li Y F, et al. Optimization: Modeling, Algorithms and Theory [M]. Beijing: Higher Education Press, 2020. (in Chinese)
[40] Nocedal J, Wright S. Numerical Optimization [M]. New York: Springer, 2006.
[41] Liang J W, Fadili J, Peyré G. Activity identification and local linear convergence of forward-backward-type methods [J]. SIAM Journal on Optimization, 2017, 27(1): 408-437.
[42] Vaiter S, Peyré G, Fadili J. Model consistency of partly smooth regularizers [J]. IEEE Transactions on Information Theory, 2017, 64(3): 1725-1737.
[43] Hare W L, Lewis A S. Identifying active constraints via partial smoothness and prox-regularity [J]. Journal of Convex Analysis, 2004, 11(2): 251-266.
[44] Wu Z M, Li C S, Li M, et al. Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems [J]. Journal of Global Optimization, 2021, 79: 617-644.
Related articles:
[1] SUN Ruiqing, ZHANG Rui, LAN Yan, LI Weidong. An LPT algorithm for maximizing the total early work [J]. Operations Research Transactions, 2025, 29(4): 249-254.
[2] SONG Rui, WANG Yiran, WU Zhongming. Convergence analysis of an inexact generalized alternating direction method of multipliers with indefinite proximal terms [J]. Operations Research Transactions, 2025, 29(4): 175-190.
[3] ZHANG Bo, WANG Hongyu, GAO Yuelin. A branch-and-bound algorithm based on D.C. relaxation for linear sum-of-products programming problems [J]. Operations Research Transactions, 2025, 29(4): 159-174.
[4] SUN Xin, GE Dongdong, FU Desheng, WEI Zhiwei, DONG Fenglian, PAN Shichang. A hybrid distribution-recursion and branch-and-bound algorithm for systematic refinery operation optimization problems [J]. Operations Research Transactions, 2025, 29(4): 141-158.
[5] TAN Zilin, LUO Honglin. Second-order splitting algorithms and their applications [J]. Operations Research Transactions, 2025, 29(4): 121-140.
[6] WANG Xiaofang, LIANG Zhi'an, GAO Caixia. Optimality conditions for weakly semi-E-convex programming problems [J]. Operations Research Transactions, 2025, 29(4): 72-82.
[7] CHI Qianqian, ZHOU Yuying. Exact penalty approximation for cone-constrained optimization problems [J]. Operations Research Transactions, 2025, 29(4): 61-71.
[8] BAO Chenglong, CHEN Chang. On Bregman iteration for minimizing the Landau free energy functional [J]. Operations Research Transactions, 2025, 29(3): 243-266.
[9] ZHOU Anwa, HE Jiayi. Real pairwise completely positive matrices [J]. Operations Research Transactions, 2025, 29(3): 160-178.
[10] HU Shenglong. Uniqueness of tensor decompositions [J]. Operations Research Transactions, 2025, 29(3): 34-60.
[11] GUO Tiande, XING Tianchi, HAN Congying, MENG Shuai. Generative methods in artificial intelligence: Mathematical models, optimization algorithms, and applications [J]. Operations Research Transactions, 2025, 29(3): 1-33.
[12] YUAN Liuyang, TANG Mengyao, CHI Xiaoni. A new class of parameter-free filled tunneling function methods [J]. Operations Research Transactions, 2025, 29(2): 214-220.
[13] ZHANG Yuru, ZHANG Xue, LAN Ru. A variable-metric extrapolated hard-thresholding algorithm for a class of linear inverse problems [J]. Operations Research Transactions, 2025, 29(2): 158-174.
[14] MA Suxia, GAO Yuelin, LIN Hongwei, ZHANG Bo. A new parameter-free filled function method for global optimization [J]. Operations Research Transactions, 2025, 29(2): 141-157.
[15] LIU Xintian, ZHU Wenxing. Discrete iterative optimization algorithms for balanced hypergraph bipartitioning [J]. Operations Research Transactions, 2025, 29(2): 128-140.