A Survey of Multi-Agent Reinforcement Learning for Urban Traffic Signal Optimization

Hua Yun (华贇), Wang Xiangfeng (王祥丰), Jin Bo (金博)

doi:10.15960/j.cnki.issn.1007-6093.2023.02.003

Operations Research Transactions (运筹学学报), 2023, Vol. 27, Issue 2: 49-62



1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Wang Xiangfeng (王祥丰), E-mail: xfwang@cs.ecnu.edu.cn

Received: 2022-06-12

Published online: 2023-06-13

Funding

National Key Research and Development Program of China (2021YFA1000300, 2021YFA1000302); National Natural Science Foundation of China (12071145, 11971216)


Published English title: Multi-agent deep reinforcement learning-based urban traffic signal management




Abstract

Rapid economic growth in recent years has driven a sharp rise in travel demand, placing increasingly severe pressure on road traffic signal systems that are still dominated by traditional, non-intelligent signal control. The marked increase in road-network complexity has pushed traffic signal control from a single-point problem toward a systems engineering problem, while advances in artificial intelligence have provided more tools for urban traffic signal optimization. Swarm intelligence methods, represented by multi-agent reinforcement learning, have in recent years been widely applied to traffic signal control and optimization, including traffic light control, autonomous driving, and vehicle-road cooperation. Compared with traditional methods, multi-agent reinforcement learning can make traffic signal systems intelligent while enabling cooperation across large-scale signal systems, thereby improving the efficiency of urban traffic. In the vision of future smart-city transportation, cooperation among all participants in urban traffic is essential, so multi-agent reinforcement learning has great research value for urban traffic signal optimization. This paper systematically introduces the basic theory of multi-agent reinforcement learning and the current state of its application to urban traffic signal optimization, categorizes existing methods from the perspective of agent cooperation, and analyzes the advantages and disadvantages of each category. It also summarizes the challenges that multi-agent reinforcement learning faces in this field and points out potential future research directions, with the aim of promoting its development in intelligent urban traffic signal optimization.

Keywords: multi-agent reinforcement learning; intelligent traffic; traffic signal control; autonomous driving

Cite this article

Hua Yun, Wang Xiangfeng, Jin Bo. A survey of multi-agent reinforcement learning for urban traffic signal optimization[J]. Operations Research Transactions, 2023, 27(2): 49-62. DOI: 10.15960/j.cnki.issn.1007-6093.2023.02.003

