Group Pursuit-Evasion Differential Games

doi:10.15960/j.cnki.issn.1007-6093.2024.03.003

Abstract:

With differential games and the classical pursuit-evasion problem as its main thread, this article traces the historical development of group pursuit-evasion differential games. For large-scale group pursuit-evasion problems, it discusses the prospects for applying reinforcement learning techniques from the perspective of mean-field games. It also proposes exploring solutions to inverse pursuit-evasion differential games, an approach applicable to scenarios such as unmanned underwater vehicles, ground robots, and swarms of unmanned aerial vehicles. Unlike other review articles, it devotes considerable attention to the representative academic schools of Russia and the former Soviet Union in the historical development of this field.

Key words: pursuit-evasion differential games, swarm intelligence games, mean-field games, inverse game theory, reinforcement learning

CLC number:

O225

Hongwei GAO, Binbin MENG, Jian LIU, Zhaopeng DAI. Group pursuit-evasion differential games[J]. Operations Research Transactions, 2024, 28(3): 46-62.

