Multi-objective vehicle following decision algorithm based on reinforcement learning

doi:10.13195/j.kzyjc.2020.0426

Control and Decision, 2021, Vol. 36, No. 10, pp. 2497-2503.

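Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
Corresponding author: E-mail: jhou@swjtu.edu.cn.
CLC number: TP273
Funding: Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1923); Chengdu Science and Technology Project (2015-HM01-00050-SF).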



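Abstract:

To meet the comfort requirements of an adaptive cruise control system in car-following mode while also accounting for vehicle safety and driving efficiency, and to address the poor generalization and comfort of existing algorithms, a new multi-objective vehicle-following decision algorithm is proposed based on the deep deterministic policy gradient (DDPG) algorithm. A Markov decision process (MDP) model of the car-following process is built from the mutual longitudinal kinematics of the following vehicle and the leading vehicle. Combined with a minimum safety distance model, an efficient, comfortable and safe vehicle-following decision algorithm is designed. To speed up convergence, the storage scheme and sampling strategy for DDPG's experience samples are improved: samples are classified according to their importance and are then stored and sampled by class. Because car following has multiple objectives, the reward function is designed in a modular way. Finally, simulation tests show that the algorithm still completes the following task when the test environment differs from the training environment, and that it outperforms existing following algorithms.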
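The abstract names three ingredients of the environment side: a car-following MDP derived from longitudinal kinematics, a minimum safety distance model, and a modular multi-objective reward. The abstract gives none of the concrete formulas, so the sketch below is illustrative only. It assumes a state of (gap, follower speed, leader speed, previous follower acceleration), an action equal to the follower's commanded acceleration, a braking-distance form of the minimum safety distance, and hypothetical safety, efficiency and comfort modules with made-up weights. None of these names or parameters come from the paper.

```python
from dataclasses import dataclass


@dataclass
class FollowState:
    gap: float  # bumper-to-bumper distance to the leader, m
    v_f: float  # follower speed, m/s
    v_l: float  # leader speed, m/s
    a_f: float  # follower acceleration applied at the previous step, m/s^2


def min_safe_distance(v_f, v_l, t_r=0.5, b_f=6.0, b_l=6.0, d0=2.0):
    """Assumed braking-distance model: after reaction time t_r the follower brakes
    at b_f, the leader at b_l; d0 is a standstill margin (all values hypothetical)."""
    d = v_f * t_r + v_f ** 2 / (2 * b_f) - v_l ** 2 / (2 * b_l) + d0
    return max(d, d0)


def modular_reward(s, jerk, w=(1.0, 0.5, 0.1), t_h=1.5, d0=2.0, jerk_max=10.0):
    """Hypothetical safety / efficiency / comfort modules combined linearly."""
    d_min = min_safe_distance(s.v_f, s.v_l, d0=d0)
    if s.gap <= 0.0:
        r_safe = -10.0                              # collision
    elif s.gap < d_min:
        r_safe = -(d_min - s.gap) / d_min           # inside the safety envelope
    else:
        r_safe = 0.0
    d_des = d0 + t_h * s.v_f                        # assumed constant-time-headway target gap
    r_eff = -abs(s.gap - d_des) / max(d_des, 1.0)   # too far or too close both cost
    r_comf = -min(abs(jerk) / jerk_max, 1.0)        # jerk penalty, capped at -1
    parts = {"safety": r_safe, "efficiency": r_eff, "comfort": r_comf}
    total = w[0] * r_safe + w[1] * r_eff + w[2] * r_comf
    return total, parts


def step(s, a_cmd, a_l, dt=0.1, a_min=-3.0, a_max=2.0):
    """One MDP transition; the leader's acceleration a_l is exogenous."""
    a_f = min(max(a_cmd, a_min), a_max)             # actuator limits
    v_f = max(s.v_f + a_f * dt, 0.0)                # no reversing
    v_l = max(s.v_l + a_l * dt, 0.0)
    # Trapezoidal displacement: exact for constant acceleration when speed is not clipped.
    gap = s.gap + 0.5 * (s.v_l + v_l) * dt - 0.5 * (s.v_f + v_f) * dt
    jerk = (a_f - s.a_f) / dt
    nxt = FollowState(gap, v_f, v_l, a_f)
    reward, parts = modular_reward(nxt, jerk)
    done = gap <= 0.0                               # episode ends on collision
    return nxt, reward, done, parts
```

Usage: `step(FollowState(30.0, 20.0, 20.0, 0.0), 0.0, 0.0)` keeps both cars at 20 m/s with a 30 m gap; only the efficiency module is non-zero, because the gap is 2 m short of the assumed 32 m headway target. Returning the per-module `parts` alongside the scalar reward is one way to monitor each objective separately during training, which is the main practical benefit of a modular reward.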

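The abstract states that DDPG's experience samples are classified by importance, then stored and sampled by class, to speed up convergence, but it does not say how importance is judged or how the classes are mixed. The sketch below is one hypothetical reading: a two-pool replay buffer in which terminal or strongly penalized transitions go to an "important" pool and everything else to an "ordinary" pool, and each minibatch draws a fixed share (`important_ratio`, an assumed parameter) from the important pool. The threshold, capacities and ratio are illustrative, not the paper's values.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Hypothetical two-pool replay buffer: important and ordinary transitions are
    stored separately and mixed at a fixed ratio when a minibatch is sampled."""

    def __init__(self, cap_important=10_000, cap_ordinary=100_000,
                 important_ratio=0.3, reward_threshold=-0.5, seed=None):
        self.important = deque(maxlen=cap_important)  # oldest entries evicted first
        self.ordinary = deque(maxlen=cap_ordinary)
        self.important_ratio = important_ratio
        self.reward_threshold = reward_threshold
        self.rng = random.Random(seed)

    def is_important(self, reward, done):
        # Assumed importance test: terminal (e.g. collision) or strongly penalized steps.
        return done or reward <= self.reward_threshold

    def add(self, state, action, reward, next_state, done):
        pool = self.important if self.is_important(reward, done) else self.ordinary
        pool.append((state, action, reward, next_state, done))

    def __len__(self):
        return len(self.important) + len(self.ordinary)

    def sample(self, batch_size):
        # Target share of important samples, limited by what is stored.
        n_imp = min(round(batch_size * self.important_ratio), len(self.important))
        n_ord = min(batch_size - n_imp, len(self.ordinary))
        # If the ordinary pool is short, top the batch up from the important pool.
        n_imp = min(batch_size - n_ord, len(self.important))
        batch = (self.rng.sample(list(self.important), n_imp)
                 + self.rng.sample(list(self.ordinary), n_ord))
        self.rng.shuffle(batch)
        return batch
```

With 5 collision transitions and 20 ordinary ones stored, `sample(10)` returns 3 important and 7 ordinary transitions in shuffled order. The rationale for separate pools in this sketch is that rare, safety-critical transitions would otherwise make up only a tiny fraction of a uniformly sampled minibatch; a fixed ratio guarantees the critic sees them on every update.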


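Cite this article:

邓小豪, 侯进, 谭光鸿, et al. Multi-objective vehicle following decision algorithm based on reinforcement learning[J]. Control and Decision (控制与决策), 2021, 36(10): 2497-2503.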


History:
Published online: 2021-08-18
Publication date: 2021-10-20
