Multi-step Sarsa control algorithm based on RBF neural network

Affiliation:

1. College of Information Science and Engineering, Henan University of Science and Technology, Luoyang 471023, China; 2. Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110169, China

E-mail: pjx@haust.edu.cn

CLC number:

TP181

Fund Project:

Aeronautical Science Foundation of China (20185142003); National Defense Basic Research Program of China (JCKY2018419C001); Key Scientific Research Project of Higher Education Institutions of Henan Province (20A120008); Natural Science Foundation of Henan Province (202300410149).

    Abstract:

    For model-free nonlinear systems with a continuous state space, a multi-step reinforcement learning control algorithm based on the radial basis function (RBF) neural network is proposed. First, the neural network is introduced into the reinforcement learning system, and the function approximation capability of the RBF network is used to represent the state-action value function, which addresses the problem of representing a continuous state space. Then, the eligibility trace mechanism is incorporated to form the multi-step algorithm Sarsa($\lambda$), which improves learning efficiency by recording the states already visited. Finally, the softmax policy is improved by decaying its temperature parameter, which adjusts the action selection probabilities and balances exploration against exploitation. Simulation results on the MountainCar task show that the proposed algorithm achieves model-free control of the continuous nonlinear system after a small amount of training. Compared with the single-step algorithm, the multi-step algorithm needs fewer average convergence steps to complete the task and performs more stably, indicating that nonlinear value function approximation combined with a multi-step algorithm can also perform well in control tasks.
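    The three ingredients named in the abstract (RBF value approximation, eligibility traces, and a softmax policy with a decaying temperature) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the grid of fixed Gaussian centres, the shared width, the exponential temperature schedule, and every hyperparameter value are hypothetical choices, and only the output weights of the RBF network are learned here. The state bounds are those of the classic MountainCar task.

```python
import math
import random

# All names and constants below are illustrative assumptions, not values from the paper.
N_ACTIONS = 3                  # MountainCar: push left, no push, push right
POS_RANGE = (-1.2, 0.6)        # classic MountainCar position bounds
VEL_RANGE = (-0.07, 0.07)      # classic MountainCar velocity bounds
GRID = 6                       # RBF centres per state dimension (assumed)
CENTRES = [(i / (GRID - 1), j / (GRID - 1))
           for i in range(GRID) for j in range(GRID)]
WIDTH = 1.0 / (GRID - 1)       # shared Gaussian width in normalised units (assumed)


def rbf_features(pos, vel):
    """Gaussian RBF activations of the state, after scaling it to the unit square."""
    x = (pos - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0])
    y = (vel - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0])
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * WIDTH ** 2))
            for cx, cy in CENTRES]


def q_values(weights, phi):
    """Q(s, a) for every action as a weighted sum of the RBF hidden-layer outputs."""
    return [sum(w * f for w, f in zip(weights[a], phi)) for a in range(N_ACTIONS)]


def softmax_probs(q, temperature):
    """Boltzmann action probabilities; a lower temperature gives greedier choices."""
    m = max(q)  # subtract the maximum for numerical stability
    exps = [math.exp((v - m) / temperature) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def decayed_temperature(episode, t0=1.0, decay=0.95, t_min=0.01):
    """Exponential decay with a floor (an assumed schedule; the paper's may differ)."""
    return max(t_min, t0 * decay ** episode)


def choose_action(q, temperature, rng):
    """Sample an action index from the softmax distribution."""
    return rng.choices(range(N_ACTIONS), weights=softmax_probs(q, temperature))[0]


def sarsa_lambda_step(weights, traces, phi, a, reward, phi_next, a_next, done,
                      alpha=0.1, gamma=0.99, lam=0.9):
    """One Sarsa(lambda) update with accumulating eligibility traces (in place)."""
    q_sa = q_values(weights, phi)[a]
    q_next = 0.0 if done else q_values(weights, phi_next)[a_next]
    delta = reward + gamma * q_next - q_sa          # temporal-difference error
    for b in range(N_ACTIONS):
        for i in range(len(phi)):
            traces[b][i] *= gamma * lam             # decay every trace
            if b == a:
                traces[b][i] += phi[i]              # reinforce the action just taken
            weights[b][i] += alpha * delta * traces[b][i]
    return delta
```

    In a training loop, the traces would be reset to zero at the start of each episode, the temperature would be taken from `decayed_temperature(episode)`, and `sarsa_lambda_step` would be called once per environment transition. Because the traces carry credit back to earlier state-action pairs, a single temporal-difference error updates many weights at once, which is the multi-step effect the abstract contrasts with single-step Sarsa.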

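Cite this article

SI Yanna, PU Jiexin, YU Xiaosheng, et al. Multi-step Sarsa control algorithm based on RBF neural network[J]. Control and Decision, 2023, 38(4): 944-950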

History
  • Online publication date: 2023-03-22
  • Publication date: 2023-04-20