A policy gradient reinforcement learning algorithm for high-speed railway dynamic scheduling
Author:
Affiliation:

1. State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China; 2. Signal & Communication Research Institute, China Academy of Railway Sciences Co., Ltd., Beijing 100081, China


Corresponding author:

E-mail: spyu@mail.neu.edu.cn.

CLC number:

TP273

Funding:

Supported by the National Natural Science Foundation of China (U1834211, 61790574, 61603262, 61773269) and the Natural Science Foundation of Liaoning Province (2020-MS-093).


A policy gradient reinforcement learning algorithm for high-speed railway dynamic scheduling
Author:
Affiliation:

1. State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China; 2. Signal & Communication Research Institute, China Academy of Railway Sciences Co., Ltd., Beijing 100081, China


Abstract:

High-speed railways have developed rapidly owing to their large transport capacity, high speed, and all-weather operation. However, unexpected events such as severe weather cause train delays, and delays can propagate through the railway network; the resulting domino effect can prevent large numbers of trains from running according to the planned timetable. Current dynamic scheduling practice, which relies on manual experience, cannot meet the practical requirement of fast optimization and adjustment. This paper therefore addresses the dynamic scheduling problem of high-speed trains delayed by unexpected events. Taking the minimum total arrival and departure delay of all trains at all stations as the optimization objective, a mixed-integer nonlinear programming (MINLP) model is constructed under the condition that trains remain operable, and a policy gradient reinforcement learning method is proposed, covering construction of the interaction environment, definition of the agent's state and action sets, the policy network structure and action selection method, and the reward function. Two problem-specific improvements to the REINFORCE algorithm, error amplification and threshold setting, are also introduced. Finally, simulations examine the algorithm's convergence and the performance gains from the improvements, and compare it with Q-learning. The results show that the proposed method can effectively reschedule high-speed trains, minimize the impact of delays caused by unexpected events, and thereby improve train operation efficiency.
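The REINFORCE policy-gradient update at the core of the proposed method can be illustrated on a toy problem. Everything below is an assumed, minimal sketch: a four-state cyclic environment whose reward is the negative delay each dispatching action incurs, a tabular softmax policy, and a running-average baseline. The paper's actual interaction environment, policy network, and its error-amplification and threshold-setting improvements are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, T = 4, 3, 5         # toy sizes, not the paper's
theta = np.zeros((N_STATES, N_ACTIONS))  # tabular softmax policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(state, action):
    """Toy dynamics: action 0 always incurs the least delay."""
    delay = float(action) + 0.1 * rng.random()
    return (state + 1) % N_STATES, -delay  # reward = negative delay

def run_episode():
    traj, state = [], 0
    for _ in range(T):
        action = rng.choice(N_ACTIONS, p=softmax(theta[state]))
        state_next, reward = step(state, action)
        traj.append((state, action, reward))
        state = state_next
    return traj

ALPHA, GAMMA = 0.1, 0.99
baseline = 0.0  # running-average return, a simple variance-reduction baseline
for _ in range(2000):
    # Accumulate the discounted return G_t backwards, then apply
    #   theta <- theta + alpha * (G_t - b) * grad log pi(a_t | s_t).
    G = 0.0
    for state, action, reward in reversed(run_episode()):
        G = reward + GAMMA * G
        grad_log = -softmax(theta[state])  # gradient of log-softmax ...
        grad_log[action] += 1.0            # ... w.r.t. the logits
        theta[state] += ALPHA * (G - baseline) * grad_log
        baseline += 0.01 * (G - baseline)

# Greedy action per state after training; the low-delay action should dominate.
print([int(np.argmax(theta[s])) for s in range(N_STATES)])
```

Maximizing the expected return of negative delays is equivalent to minimizing total delay, which mirrors the paper's objective of minimizing the sum of arrival and departure delays across all stations.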

Cite this article:

Yu S P, Han X C, Yuan Z M, et al. A policy gradient reinforcement learning algorithm for high-speed railway dynamic scheduling[J]. Control and Decision, 2022, 37(9): 2407-2417.

History
  • Online publication date: 2022-07-30
  • Publication date: 2022-09-20