# AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control

NeurIPS 2020

Abstract

We propose AttendLight, an end-to-end Reinforcement Learning (RL) algorithm for the problem of traffic signal control. Previous approaches for this problem have the shortcoming that they require training for each new intersection with a different structure or traffic flow distribution. AttendLight solves this issue by training a single,...

Introduction

- With the emergence of urbanization and the increase in household car ownership, traffic congestion has become one of the major challenges in many highly populated cities.
- Traffic congestion can be mitigated by road expansion/correction, sophisticated road allowance rules, or improved traffic signal control.
- Approaches for controlling signalized intersections can be categorized into two main classes, namely conventional methods and adaptive methods.
- In the former, customarily rule-based fixed cycles and phase times are determined a priori and offline based on historical measurements as well

Highlights

- We propose the AttendLight framework, a reinforcement learning algorithm, to train a “universal” model which can be used for any intersection, with any number of roads, lanes, and phases, any traffic distribution, and any type of sensory data used to measure the traffic
- We evaluate the key feature of AttendLight that enables it to be utilized for multiple traffic signal control problems (TSCPs)
- We consider the traffic signal control problem, and for the first time, we propose a universal Reinforcement Learning (RL) model, called AttendLight, which is capable of providing efficient control for any type of intersection
- The trained model is tested on new intersections, verifying the generalizability of AttendLight
- Like TSCP, the other problems the authors mention must handle a varying number of inputs and outputs, so AttendLight can be applied to them with slight modifications

Results

- In the first regime, the authors train AttendLight on a particular intersection instance and test it on the same intersection configuration.
- In the second regime, the authors train AttendLight by running all 42 training intersection instances in parallel to collect training data, and use the REINFORCE algorithm to optimize the trainable parameters.
- The testing set includes new intersection topologies as well as new traffic data that were not observed during training.
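The multi-environment training step can be illustrated with a minimal REINFORCE sketch. This is not the authors' implementation: the state-free softmax policy over phases, the function names (`softmax`, `discounted_returns`, `reinforce_step`), and the learning rate and discount factor are all assumptions made for illustration. The only ideas taken from the summary are that episodes are gathered from several environments and then used in a single policy-gradient update.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def reinforce_step(theta, episodes, lr=0.1, gamma=0.99):
    """One REINFORCE update for a hypothetical state-free softmax policy.

    theta:    one score per phase (the trainable parameters).
    episodes: list of (actions, rewards) pairs, one per environment.
    """
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for actions, rewards in episodes:
        for a, g in zip(actions, discounted_returns(rewards, gamma)):
            # Gradient of log softmax w.r.t. theta is one_hot(a) - probs.
            for k in range(len(theta)):
                grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
    n = len(episodes)
    # Gradient ascent on the expected return, averaged over environments.
    return [t + lr * gk / n for t, gk in zip(theta, grad)]
```

Passing one `(actions, rewards)` pair per environment mirrors the idea of pooling experience from many intersections before each update; a real model would condition the policy on the observed traffic state.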

Conclusion

- The authors consider the traffic signal control problem, and for the first time, the authors propose a universal RL model, called AttendLight, which is capable of providing efficient control for any type of intersection.
- To provide such capability to the model, the authors propose a framework including two attention mechanisms that make the input and output of the model independent of the intersection structure.
- Like TSCP, the other problems the authors mention must handle a varying number of inputs and outputs, so AttendLight can be applied to them with slight modifications.
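To make the role of the two attention mechanisms concrete, here is a small sketch of how attention can absorb a varying number of lanes and phases. It is a simplification under stated assumptions, not the AttendLight architecture: there are no trainable parameters, the query is taken to be the mean of the attended items, and the helper names (`attend`, `mean_vector`, `phase_policy`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vector(items):
    """Element-wise mean of equally sized vectors (used as the query here)."""
    n = len(items)
    return [sum(x[d] for x in items) / n for d in range(len(items[0]))]


def attend(query, items):
    """Dot-product attention: weights over items and their weighted average."""
    weights = softmax([dot(query, x) for x in items])
    dim = len(items[0])
    pooled = [sum(w * x[d] for w, x in zip(weights, items)) for d in range(dim)]
    return weights, pooled


def phase_policy(phases):
    """Map any number of phases, each with any number of lanes, to probabilities.

    phases: list of phases; each phase is a list of lane feature vectors.
    """
    # First attention: pool each phase's lanes into one fixed-size vector.
    phase_vecs = [attend(mean_vector(lanes), lanes)[1] for lanes in phases]
    # Second attention: the weights over phases act as the action distribution.
    weights, _ = attend(mean_vector(phase_vecs), phase_vecs)
    return weights
```

Because both steps reduce a variable-length list with a softmax, the same function accepts an intersection with two phases or with eight, which is the property the conclusion attributes to the attention mechanisms.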

- Table 1: All intersection configurations. Columns: intersection ID, number of roads, number of lanes per road, number of phases, and (min_p |L_p|, max_p |L_p|)
- Table 2: Parameters of the synthetic data. Each parenthesis shows λ of the Poisson distribution and the probability of two vehicles arriving at each time
- Table 3: Results of all algorithms for INT1-INT5 case
- Table 4: Results of all algorithms for INT6-INT9 case
- Table 5: Results of all algorithms for INT10-INT11 case
- Table 6: Intersections used for the training of the multi-env regime
- Table 7: Intersections used for the testing of the multi-env regime case
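The caption of the synthetic-data table mentions a Poisson rate λ and a probability of two vehicles arriving together. One possible reading, sketched below, is that the number of arrival events per time step is Poisson(λ) and each event brings two vehicles with probability `p_two` and one vehicle otherwise. This interpretation, the function names, and the seed handling are assumptions; the paper's generator may differ.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's method: multiply uniform draws until the product drops to e^-lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def arrivals(lam, p_two, steps, seed=0):
    """Vehicles arriving on one lane at each time step (hypothetical generator)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(steps):
        events = sample_poisson(lam, rng)
        # Each arrival event carries two vehicles with probability p_two.
        vehicles = sum(2 if rng.random() < p_two else 1 for _ in range(events))
        counts.append(vehicles)
    return counts
```

Fixing the seed makes a generated traffic pattern reproducible, which is convenient when the same synthetic demand has to be replayed for several algorithms.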

Related work

- The selection of RL components in traffic light control is quite challenging. The most common action set for the traffic problem is the set of all possible phases. In [22] an image-like representation is used as the state and a combination of vehicle delay and waiting time is considered as the reward. A deep Q-Network algorithm was proposed in [15], where the queue length of the last four samples is defined as the state, and the reward is defined as the absolute value of the difference between the queue lengths of approaching lanes. In [16] the intersection was divided into multiple chunks, building a matrix such that each chunk contains a binary indicator for the existence of a car and its speed. Using this matrix as the state and defining the reward as the difference of the cumulative waiting time for two cycles, the authors proposed to learn the duration of each phase in a fixed cycle by a Double Dueling DQN algorithm with prioritized experience replay. Likewise, [24] defined a similar state by dividing each lane into a few chunks, and the reward is the reduction of cumulative delay in the intersection. A DQN approach was proposed to train a policy to choose the next phase. Ault et al. [4] proposed three DQN-based algorithms to obtain an interpretable policy. A simple function approximator with 256 variables was used and was shown to obtain slightly worse results than the DQN algorithm with an uninterpretable approximator. The IntelliLight algorithm was proposed in [34]. The state and reward are a combination of several components. A multi-intersection problem was considered in [31], where an RL agent was trained for every individual intersection. The main idea is to use the pressure as the reward function, thus the algorithm is called PressLight. To efficiently model the influence of the neighbor intersections, a graph attentional network [29] is utilized in CoLight [32]. See [33] for a detailed review of conventional and RL-based methods for TSCP.
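Two of the reward definitions surveyed above can be written down in a few lines. The sketch below uses a simplified notion of pressure (the absolute difference between vehicle counts on incoming and outgoing lanes) and a waiting-time reward whose sign convention is an assumption; the cited papers may normalize or weight these quantities differently, and the function names are hypothetical.

```python
def pressure(incoming, outgoing):
    """Simplified pressure: |vehicles on incoming lanes - vehicles on outgoing lanes|."""
    return abs(sum(incoming) - sum(outgoing))


def pressure_reward(incoming, outgoing):
    """Reward that is higher when the intersection pressure is lower."""
    return -pressure(incoming, outgoing)


def waiting_time_reward(prev_cycle_wait, curr_cycle_wait):
    """Difference of cumulative waiting time between two consecutive cycles.

    Assumed sign: positive when waiting time went down in the current cycle.
    """
    return prev_cycle_wait - curr_cycle_wait
```

For example, `pressure_reward([3, 2], [1, 1])` penalizes an intersection holding five vehicles upstream while only two are downstream, and `waiting_time_reward(120.0, 90.0)` rewards a cycle that cut cumulative waiting by 30 seconds.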

Study subjects and analysis

In Figure 3d, FRAP is not applicable. Across all 112 cases, AttendLight yields 46%, 39%, 34%, 16%, and 9% improvement over FixedTime, MaxPressure, SOTL, DQTSC-M, and FRAP, respectively. Thus, when AttendLight is trained on a single environment, it works well across all the available numbers of roads, lanes, and phases, and across all traffic data.

References

- Atlanta traffic data. https://github.com/gjzheng93/frap-pub/tree/master/data/template_ls. Accessed: 2020-05-14
- Hangzhou traffic data. https://github.com/traffic-signal-control/sample-code/tree/master/data. Accessed: 2020-05-14
- Implementation of FRAP algorithm. https://github.com/gjzheng93/frap-pub. Accessed: 2020-05-14
- Ault, J., Hanna, J., Sharon, G.: Learning an interpretable traffic signal control policy. arXiv preprint arXiv:1912.11023 (2019)
- Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
- Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940 (2016)
- Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d’horizon. arXiv preprint arXiv:1811.06128 (2018)
- Choi, E., Bahadori, M.T., Sun, J., Kulas, J., Schuetz, A., Stewart, W.: Retain: An interpretable predictive model for healthcare using reverse time attention mechanism. In: Advances in Neural Information Processing Systems, pp. 3504–3512 (2016)
- Dooley, E.: Here’s how much time Americans waste in traffic (2015). URL https://abcnews.go.com/US/time-americans-waste-traffic/story?id=33313765
- Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1126–1135. JMLR.org (2017)
- Fischer, T.G.: Reinforcement learning in financial markets-a survey. Tech. rep., FAU Discussion Papers in Economics (2018)
- Gershenson, C.: Self-organizing traffic lights. arXiv preprint nlin/0411066 (2004)
- Gu, S., Holly, E., Lillicrap, T., Levine, S.: Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In: 2017 IEEE international conference on robotics and automation (ICRA), pp. 3389–3396. IEEE (2017)
- Koonce, P., Rodegerdts, L.: Traffic signal timing manual. Tech. rep., United States. Federal Highway Administration (2008)
- Li, L., Lv, Y., Wang, F.Y.: Traffic signal timing via deep reinforcement learning. IEEE/CAA Journal of Automatica Sinica 3(3), 247–254 (2016)
- Liang, X., Du, X., Wang, G., Han, Z.: Deep reinforcement learning for traffic light control in vehicular networks. arXiv preprint arXiv:1803.11115 (2018)
- Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
- Nazari, M., Oroojlooy, A., Snyder, L., Takáč, M.: Reinforcement learning for solving the vehicle routing problem. In: Advances in Neural Information Processing Systems, pp. 9839–9849 (2018)
- OroojlooyJadid, A., Hajinezhad, D.: A review of cooperative multi-agent deep reinforcement learning. arXiv preprint arXiv:1908.03963 (2019)
- Pimpin, L., Retat, L., Fecht, D., de Preux, L., Sassi, F., Gulliver, J., Belloni, A., Ferguson, B., Corbould, E., Jaccard, A., et al.: Estimating the costs of air pollution to the national health service and social care: An assessment and forecast up to 2035. PLoS medicine 15(7) (2018)
- Van der Pol, E., Oliehoek, F.A.: Coordinated deep reinforcement learners for traffic light control. Proceedings of Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016) (2016)
- Seo, S., Huang, J., Yang, H., Liu, Y.: Interpretable convolutional neural networks with dual local and global attention for review rating prediction. In: Proceedings of the eleventh ACM conference on recommender systems, pp. 297–305 (2017)
- Shabestary, S.M.A., Abdulhai, B.: Deep learning vs. discrete reinforcement learning for adaptive traffic signal control. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 286–293. IEEE (2018)
- Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484 (2016)
- Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction. MIT press (2018)
- Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, pp. 1057–1063 (2000)
- Varaiya, P.: The max-pressure controller for arbitrary networks of signalized intersections. In: Advances in Dynamic Network Modeling in Complex Transportation Systems, pp. 27–66.
- Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)
- Wei, H., Chen, C., Wu, K., Zheng, G., Yu, Z., Gayah, V., Li, Z.: Deep reinforcement learning for traffic signal control along arterials (2019)
- Wei, H., Chen, C., Zheng, G., Wu, K., Gayah, V., Xu, K., Li, Z.: Presslight: Learning max pressure control to coordinate traffic signals in arterial network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1290–1298 (2019)
- Wei, H., Xu, N., Zhang, H., Zheng, G., Zang, X., Chen, C., Zhang, W., Zhu, Y., Xu, K., Li, Z.: Colight: Learning network-level cooperation for traffic signal control. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1913–1922 (2019)
- Wei, H., Zheng, G., Gayah, V., Li, Z.: A survey on traffic signal control methods. arXiv preprint arXiv:1904.08117 (2019)
- Wei, H., Zheng, G., Yao, H., Li, Z.: Intellilight: A reinforcement learning approach for intelligent traffic light control. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2496–2505 (2018)
- Zang, X., Yao, H., Zheng, G., Xu, N., Xu, K., Li, Z.: Metalight: Value-based meta-reinforcement learning for traffic signal control (2020)
- Zhang, H., Feng, S., Liu, C., Ding, Y., Zhu, Y., Zhou, Z., Zhang, W., Yu, Y., Jin, H., Li, Z.: Cityflow: A multi-agent reinforcement learning environment for large scale city traffic scenario. In: The World Wide Web Conference, pp. 3620–3624 (2019)
- Zheng, G., Xiong, Y., Zang, X., Feng, J., Wei, H., Zhang, H., Li, Y., Xu, K., Li, Z.: Learning phase competition for traffic signal control. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1963–1972 (2019)
