Operations Research Transactions, 2023, Vol. 27, Issue (2): 49-62. doi: 10.15960/j.cnki.issn.1007-6093.2023.02.003
Yun HUA, Xiangfeng WANG*, Bo JIN

Received: 2022-06-12
Online: 2023-06-15
Published: 2023-06-13
Contact: Xiangfeng WANG (E-mail: xfwang@cs.ecnu.edu.cn)
Yun HUA, Xiangfeng WANG, Bo JIN. Multi-agent deep reinforcement learning-based urban traffic signal management[J]. Operations Research Transactions, 2023, 27(2): 49-62.
[1] Zhou D, Tang M, Yang X. A deep reinforcement learning traffic signal control method combining state prediction[J]. Application Research of Computers, 2022, 39(8): 2311-2315. (in Chinese)
[2] Lin Y, Wang P, Ma M. Intelligent transportation system (ITS): Concept, challenge and opportunity[C]//International Conference on Big Data Security on Cloud, 2017.
[3] Dion F, Rakha H, Kang Y. Comparison of delay estimates at under-saturated and over-saturated pre-timed signalized intersections[J]. Transportation Research Part B: Methodological, 2004, 38(2): 99-122. doi: 10.1016/S0191-2615(03)00003-1
[4] Porche I, Lafortune S. Adaptive look-ahead optimization of traffic signals[J]. Journal of Intelligent Transportation Systems, 1999, 4(3-4): 209-254. doi: 10.1080/10248079908903749
[5] Wei H, Zheng G, Yao H, Li Z. IntelliLight: A reinforcement learning approach for intelligent traffic light control[C]//SIGKDD, 2018.
[6] Li W, Chen H, Jin B, et al. Multi-agent path finding with prioritized communication learning[C]//ICRA, 2022.
[7] Kober J, Bagnell J, Peters J. Reinforcement learning in robotics: A survey[J]. The International Journal of Robotics Research, 2013, 32(11): 1238-1274. doi: 10.1177/0278364913495721
[8] Lample G, Chaplot D. Playing FPS games with deep reinforcement learning[C]//AAAI, 2017.
[9] Palanisamy P. Multi-agent connected autonomous driving using deep reinforcement learning[C]//IJCNN, 2020.
[10] Abdoos M, Mozayani N, Bazzan A. Traffic light control in non-stationary environments based on multi-agent Q-learning[C]//ITSC, 2011.
[11] Balaji P, German X, Srinivasan D. Urban traffic signal control using reinforcement learning agents[J]. IET Intelligent Transport Systems, 2010, 4(3): 177-188. doi: 10.1049/iet-its.2009.0096
[12] Chu T, Wang J, Codecà L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(3): 1086-1095.
[13] Mannion P, Duggan J, Howley E. An experimental review of reinforcement learning algorithms for adaptive traffic signal control[M]//Autonomic Road Transport Support Systems, 2016: 47-66.
[14] Prashanth L, Bhatnagar S. Reinforcement learning with average cost for adaptive control of traffic lights at intersections[C]//ITSC, 2011.
[15] Van der Pol E, Oliehoek F. Coordinated deep reinforcement learners for traffic light control[C]//NeurIPS, 2016.
[16] Xiong Y, Zheng G, Xu K, et al. Learning traffic signal control from demonstrations[C]//CIKM, 2019.
[17] Zang X, Yao H, Zheng G, et al. MetaLight: Value-based meta-reinforcement learning for traffic signal control[C]//AAAI, 2020.
[18] Brys T, Pham T, Taylor M. Distributed learning and multi-objectivity in traffic light control[J]. Connection Science, 2014, 26(1): 65-83. doi: 10.1080/09540091.2014.885282
[19] Nishi T, Otaki K, Hayakawa K, et al. Traffic signal control based on reinforcement learning with graph convolutional neural nets[C]//ITSC, 2018.
[20] Xu L, Xia X, Luo Q. The study of reinforcement learning for traffic self-adaptive control under multiagent Markov game environment[J]. Mathematical Problems in Engineering, 2013: 1-10.
[21] Casas N. Deep deterministic policy gradient for urban traffic light control[J]. arXiv preprint arXiv:1703.09035, 2017.
[22] Wei H, Chen C, Zheng G, et al. PressLight: Learning max pressure control to coordinate traffic signals in arterial network[C]//SIGKDD, 2019.
[23] Zhao C, Hu X, Wang G. PDLight: A deep reinforcement learning traffic light control algorithm with pressure and dynamic light duration[J]. arXiv preprint arXiv:2009.13711, 2020.
[24] Wei H, Xu N, Zhang H, et al. CoLight: Learning network-level cooperation for traffic signal control[C]//CIKM, 2019.
[25] Aslani M, Mesgari M, Wiering M. Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events[J]. Transportation Research Part C: Emerging Technologies, 2017, 85: 732-752. doi: 10.1016/j.trc.2017.09.020
[26] Chen C, Wei H, Xu N, et al. Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control[C]//AAAI, 2020.
[27] Coşkun M, Baggag A, Chawla S. Deep reinforcement learning for traffic light optimization[C]//ICDM, 2018.
[28] Gao J, Shen Y, Liu J, et al. Adaptive traffic signal control: Deep reinforcement learning algorithm with experience replay and target network[J]. arXiv preprint arXiv:1705.02755, 2017.
[29] Wiering M. Multi-agent reinforcement learning for traffic light control[C]//ICML, 2000.
[30] El-Tantawy S, Abdulhai B. An agent-based learning towards decentralized and coordinated traffic signal control[C]//ITSC, 2010.
[31] Arel I, Liu C, Urbanik T, et al. Reinforcement learning-based multi-agent system for network traffic signal control[J]. IET Intelligent Transport Systems, 2010, 4(2): 128-135. doi: 10.1049/iet-its.2009.0070
[32] Watkins C, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
[33] Sutton R, McAllester D, Singh S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//NeurIPS, 1999.
[34] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.
[35] Lillicrap T, Hunt J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.
[36] Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[C]//ICML, 2015.
[37] Tang J. Research on reinforcement learning and its application in urban traffic signal control[D]. Xi'an: Xidian University, 2012. (in Chinese)
[38] Aslani M, Mesgari M, Seipel S, et al. Developing adaptive traffic signal control by actor-critic and direct exploration methods[J]. Transport, 2019, 172(5): 289-298.
[39] Zhang Z, Yang J, Zha H. Integrating independent and centralized multi-agent reinforcement learning for traffic signal network optimization[C]//AAMAS, 2020.
[40] Chen X, Xiong G, Lv Y, et al. A collaborative communication-QMIX approach for large-scale networked traffic signal control[C]//ITSC, 2021.
[41] Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation[C]//NeurIPS, 2016.
[42] Xu M, Wu J, Huang L, et al. Network-wide traffic signal control based on the discovery of critical nodes and deep reinforcement learning[J]. Journal of Intelligent Transportation Systems, 2020, 24(1): 1-10. doi: 10.1080/15472450.2018.1527694
[43] Ge H, Song Y, Wu C, et al. Cooperative deep Q-learning with Q-value transfer for multi-intersection signal control[J]. IEEE Access, 2019, 7: 40797-40809.
[44] Schlichtkrull M, Kipf T, Bloem P, et al. Modeling relational data with graph convolutional networks[C]//European Semantic Web Conference, 2018.
[45] Wang Y, Xu T, Niu X, et al. STMARL: A spatio-temporal multi-agent reinforcement learning approach for cooperative traffic light control[J]. IEEE Transactions on Mobile Computing, 2020, 21(6): 2228-2242.
[46] Huang X, Wu D, Jenkin M, et al. ModelLight: Model-based meta-reinforcement learning for traffic signal control[J]. arXiv preprint arXiv:2111.08067, 2021.
[47] Zhang H, Liu C, Zhang W, et al. GeneraLight: Improving environment generalization of traffic signal control via meta reinforcement learning[C]//CIKM, 2020.
[48] Ault J, Hanna J, Sharon G. Learning an interpretable traffic signal control policy[C]//AAMAS, 2020.
[49] Wei H, Chen C, Liu C, et al. Learning to simulate on sparse trajectory data[C]//ECML, 2021.
[50] Zheng G, Liu H, Xu K, et al. Learning to simulate vehicle trajectories from demonstrations[C]//ICDE, 2020.
[51] Zheng G, Liu C, Wei H, et al. Rebuilding city-wide traffic origin destination from road speed data[C]//ICDE, 2021.
[52] Wu Q, Zhi P, Wei Y, et al. Communicate with traffic lights and vehicles based on multi-agent reinforcement learning[C]//CSCWD, 2021.
[53] Capasso A, Maramotti P, Dell'Eva A, et al. End-to-end intersection handling using multi-agent deep reinforcement learning[C]//IEEE Intelligent Vehicles Symposium, 2021.