A brief review on gradient method

  • School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Received date: 2021-03-16

Online published: 2021-09-26

Abstract

The gradient method is a first-order optimization method. Owing to its simplicity and low per-iteration cost, it is widely used for large-scale problems. This paper is a review of the gradient method. Gradient methods for smooth unconstrained problems are introduced, with details of the algorithm framework and the underlying theory. The crucial factor in a gradient method is the stepsize, which determines the convergence properties of the method. This paper reviews stepsize update strategies and the corresponding convergence results from four aspects: line search, approximation techniques, stochastic techniques, and alternating and constant stepsizes. Related topics such as gradient methods for nonsmooth and constrained optimization problems, acceleration techniques, and stochastic gradient methods are also discussed.
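To make the iteration concrete, the following is a minimal Python sketch (not code from the paper) of the gradient update x_{k+1} = x_k - alpha_k * grad f(x_k) with two of the stepsize rules surveyed in the review: a constant stepsize and an Armijo backtracking line search. The quadratic test problem and all parameter values are illustrative assumptions.

import numpy as np

def gradient_method(f, grad, x0, rule="armijo", alpha0=1.0, c=1e-4,
                    shrink=0.5, tol=1e-8, max_iter=1000):
    # Gradient iteration x_{k+1} = x_k - alpha_k * grad f(x_k) for smooth
    # unconstrained minimization; alpha_k is chosen by the selected rule.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:            # first-order stationarity test
            break
        if rule == "constant":
            alpha = alpha0                      # e.g. 1/L for an L-smooth f
        else:                                   # Armijo backtracking line search
            alpha = alpha0
            while f(x - alpha * g) > f(x) - c * alpha * g.dot(g):
                alpha *= shrink                 # shrink until sufficient decrease
        x = x - alpha * g
    return x

# Toy usage on the convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose minimizer is A^{-1} b = (1.0, 0.1).
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(gradient_method(f, grad, x0=[0.0, 0.0]))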

Cite this article

SUN Cong, ZHANG Ya. A brief review on gradient method[J]. Operations Research Transactions, 2021, 25(3): 119-132. DOI: 10.15960/j.cnki.issn.1007-6093.2021.03.007
