Operations Research Transactions ›› 2023, Vol. 27 ›› Issue (2): 49-62. DOI: 10.15960/j.cnki.issn.1007-6093.2023.02.003
Multi-agent deep reinforcement learning-based urban traffic signal management

Yun HUA, Xiangfeng WANG*, Bo JIN

Received: 2022-06-12
Online: 2023-06-15
Published: 2023-06-13
Contact: Xiangfeng WANG, E-mail: xfwang@cs.ecnu.edu.cn
Abstract:
With the rapid growth of the national economy in recent years, travel demand has risen sharply, placing ever-increasing pressure on road traffic signal systems that are still dominated by traditional, non-intelligent signal control. The marked increase in the complexity of road networks has pushed traffic signal control from an isolated-intersection problem toward a systems-engineering problem, while the rise of artificial intelligence has provided new tools for urban traffic signal optimization. In recent years, collective-intelligence methods, represented by multi-agent reinforcement learning, have been widely applied to traffic signal control and optimization, including traffic light control, autonomous driving, and vehicle-road cooperation. Compared with traditional methods, multi-agent reinforcement learning can make traffic signal systems intelligent while enabling cooperation across large-scale signal systems, thereby improving the efficiency of urban traffic. Under the vision of future smart-city transportation, cooperation among all participants in urban traffic is essential, so multi-agent reinforcement learning holds great research value for urban traffic signal optimization. This paper systematically introduces the basic theory of multi-agent reinforcement learning for urban traffic signal optimization and the current state of its application in this field, categorizes existing methods from the perspective of agent cooperation, and analyzes the strengths and weaknesses of each class of methods. In addition, this paper summarizes the challenges that multi-agent reinforcement learning faces in urban traffic signal optimization and points out potential future research directions, in order to promote the development of multi-agent reinforcement learning methods for intelligent urban traffic signal optimization.
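As a toy illustration of the reinforcement-learning approach to signal control discussed in the abstract (not taken from the paper itself), the sketch below trains a single tabular Q-learning agent at one simulated intersection. The state is the pair of discretized queue lengths plus the current green phase, the action is keep-or-switch the phase, and the reward is the negative total queue. All names, arrival rates, and hyperparameters are illustrative assumptions.

```python
import random

random.seed(0)

ARRIVAL_P = {"ns": 0.4, "ew": 0.2}   # assumed per-step arrival probabilities
DEPART = 2                            # vehicles released per green step (assumed)
MAX_Q = 5                             # queues are clipped for state discretization

def step(queues, phase, action):
    """Advance the toy intersection one time step; return new state and reward."""
    if action == 1:                   # action 1 switches the green phase
        phase = 1 - phase
    for i, road in enumerate(("ns", "ew")):
        if random.random() < ARRIVAL_P[road]:
            queues[i] += 1            # stochastic vehicle arrival
        if i == phase:                # the green approach discharges vehicles
            queues[i] = max(0, queues[i] - DEPART)
    reward = -(queues[0] + queues[1])  # penalize total queued vehicles
    return queues, phase, reward

def encode(queues, phase):
    """Discretize the state into a hashable key for the Q-table."""
    return (min(queues[0], MAX_Q), min(queues[1], MAX_Q), phase)

def train(episodes=200, horizon=100, alpha=0.1, gamma=0.95, eps=0.1):
    """Epsilon-greedy tabular Q-learning over the toy intersection."""
    q_table = {}
    for _ in range(episodes):
        queues, phase = [0, 0], 0
        for _ in range(horizon):
            s = encode(queues, phase)
            q_table.setdefault(s, [0.0, 0.0])
            if random.random() < eps:
                a = random.randrange(2)               # explore
            else:
                a = max((0, 1), key=lambda x: q_table[s][x])  # exploit
            queues, phase, r = step(queues, phase, a)
            s2 = encode(queues, phase)
            q_table.setdefault(s2, [0.0, 0.0])
            # Standard Q-learning temporal-difference update
            q_table[s][a] += alpha * (r + gamma * max(q_table[s2]) - q_table[s][a])
    return q_table

q_table = train()
```

The surveyed multi-agent methods generalize this idea: each intersection runs such an agent, and the methods differ mainly in how the agents share observations or coordinate their updates.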
Yun HUA, Xiangfeng WANG, Bo JIN. Multi-agent deep reinforcement learning-based urban traffic signal management[J]. Operations Research Transactions, 2023, 27(2): 49-62.