考虑一类随机非凸-非凹的极小极大问题, 假设目标函数关于里层变量$y$ 满足Polyak-Łojasiewicz (PL)条件, 我们也称这类问题为NC-PL 极小极大问题。本文提出了一种用于求解随机NC-PL极小极大问题的方差缩减的梯度下降上升(VRGDA)算法, 且证明了该算法解得$\varepsilon$-稳定点的迭代复杂度为$\mathcal{O}$($\varepsilon^{-3})$。这也是目前求解一般化随机NC-PL问题复杂度最好的一阶算法。
In this paper, we consider a class of stochastic nonconvex-nonconcave minimax problems, i.e., NC-PL minimax problems, for which we assume that the objective function satisfies the Polyak-Łojasiewicz (PL) condition with respect to the inner variable. We propose a variance reduced gradient descent ascent (VRGDA) algorithm for solving NC-PL minimax problems under the stochastic setting. The number of iterations to obtain an $\varepsilon$-stationary point of the VRGDA algorithm for solving NC-PL minimax problems is upper bounded by $\mathcal{O}(\varepsilon^{-3})$. The VRGDA algorithm owns the best iteration complexity in first-order algorithms for solving stochastic NC-PL problems.
[1] Chen P Y, Zhang H, Sharma Y, et al. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models [C]//Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017: 15-26.
[2] Finlay C, Oberman A M. Scaleable input gradient regularization for adversarial robustness [J]. Machine Learning with Applications, 2021, 3: 100017.
[3] Snoek J, Larochelle H, Adams R P. Practical Bayesian optimization of machine learning algorithms [C]//Advances in Neural Information Processing Systems, 2012: 2951-2959.
[4] Luo L, Ye H, Huang Z, et al. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems [C]//Advances in Neural Information Processing Systems, 2020: 20566-20577.
[5] Nouiehed M, Sanjabi M, Huang T, et al. Solving a class of non-convex min-max games using iterative first order methods [C]//Advances in Neural Information Processing Systems, 2019: 14934-14942.
[6] Lin T, Jin C, Jordan M I. Near-optimal algorithms for minimax optimization [C]//Conference on Learning Theory, 2020: 2738-2779.
[7] Daskalakis C, Panageas I. The limit points of (optimistic) gradient descent in min-max optimization [C]//Advances in Neural Information Processing Systems, 2018: 9236-9246.
[8] Lin T, Jin C, Jordan M. On gradient descent ascent for nonconvex-concave minimax problems [C]//International Conference on Machine Learning, 2020: 6083-6093.
[9] Lu S, Tsaknakis I, Hong M, et al. Hybrid block successive approximation for one-sided nonconvex min-max problems: Algorithms and applications [J]. IEEE Transactions on Signal Processing, 2020, 68: 3676-3691.
[10] Pan W, Shen J, Xu Z. An efficient algorithm for nonconvex-linear minimax optimization problem and its application in solving weighted maximin dispersion problem [J]. Computational Optimization and Applications, 2021, 78(1): 287-306.
[11] Shen J, Wang Z, Xu Z. Zeroth-order single-loop algorithms for nonconvex-linear minimax problems [J]. Journal of Global Optimization, 2022: 1-30.
[12] Xu Z, Shen J, Wang Z, et al. Zeroth-order alternating randomized gradient projection algorithms for general nonconvex-concave minimax problems [EB/OL]. [2022-12-01]. arXiv: 2108.00473.
[13] Xu Z, Zhang H, Xu Y, et al. A unified single-loop alternating gradient projection algorithm for nonconvex-concave and convex-nonconcave minimax problems [J]. Mathematical Programming, 2023, 201: 635-706.
[14] Zhang J, Xiao P, Sun R, et al. A single-loop smoothed gradient descent-ascent algorithm for nonconvex-concave min-max problems [C]//Advances in Neural Information Processing Systems, 2020: 7377-7389.
[15] Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction [C]//Advances in Neural Information Processing Systems, 2013: 315-323.
[16] Defazio A, Bach F, Lacoste-Julien S. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives [C]//Advances in Neural Information Processing Systems, 2014: 1646-1654.
[17] Nguyen L M, Liu J, Scheinberg K, et al. SARAH: A novel method for machine learning problems using stochastic recursive gradient [C]//International Conference on Machine Learning, 2017: 2613-2621.
[18] Fang C, Li C J, Lin Z, et al. Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator [C]//Advances in Neural Information Processing Systems, 2018: 9236-9246.
[19] Wang Z, Ji K, Zhou Y, et al. Spiderboost and momentum: Faster variance reduction algorithms [C]//Advances in Neural Information Processing Systems, 2019: 2406-2416.
[20] Rafique H, Liu M, Lin Q, et al. Weakly-convex-concave min-max optimization: Provable algorithms and applications in machine learning [J]. Optimization Methods and Software, 2022, 37(3): 1087-1121.
[21] Wai H T, Hong M, Yang Z, et al. Variance reduced policy evaluation with smooth function approximation [C]//Advances in Neural Information Processing Systems, 2019: 5776-5787.
[22] Yang J, Orvieto A, Lucchi A, et al. Faster single-loop algorithms for minimax optimization without strong concavity [C]//International Conference on Artificial Intelligence and Statistics, 2022: 5485-5517.
[23] Xu Z, Wang Z Q, Wang J L, et al. Zeroth-order alternating gradient descent ascent algorithms for a class of nonconvex-nonconcave minimax problems [EB/OL]. [2022-11-01]. arXiv: 2211.13668.
[24] Polyak B T. Gradient methods for minimizing functionals [J]. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 1963, 3(4): 643-653.
[25] Karimi H, Nutini J, Schmidt M. Linear convergence of gradient and proximal-gradient methods under the polyak- lojasiewicz condition [C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2016: 795-811.
[26] Du S, Lee J, Li H, et al. Gradient descent finds global minima of deep neural networks [C]//International Conference on Machine Learning, 2019: 1675-1685.
[27] Fazel M, Ge R, Kakade S, et al. Global convergence of policy gradient methods for the linear quadratic regulator [C]//International Conference on Machine Learning, 2018: 1467-1476.
[28] Cai Q, Hong M, Chen Y, et al. On the global convergence of imitation learning: A case for linear quadratic regulator [EB/OL].[2022-12-01]. arXiv:1901.03674.
[29] Liu C, Zhu L, Belkin M. Loss landscapes and optimization in over-parameterized non-linear systems and neural networks [J]. Applied and Computational Harmonic Analysis, 2022, 59: 85-116.