Multi-Step Sarsa Control Algorithm Based on RBF Neural Network

doi:10.13195/j.kzyjc.2021.1728

Issue 4, pp. 944-950

Affiliations: 1. Henan University of Science and Technology; 2. Northeastern University
CLC number: TP181
Funding: Aeronautical Science Foundation of China (20185142003); National Defense Basic Scientific Research Program of China (No. JCKY2018419C001); Key Scientific Research Project of Higher Education Institutions of Henan Province (20A120008); Natural Science Foundation of Henan Province (202300410149)

Abstract:

For model-free nonlinear systems with continuous state spaces, this paper proposes a multi-step reinforcement learning control algorithm based on a radial basis function (RBF) neural network. First, a neural network is introduced into the reinforcement learning system, and the function approximation capability of the RBF network is used to approximate the state-action value function, which addresses the problem of representing a continuous state space. Then, the eligibility trace mechanism is incorporated to form the multi-step algorithm Sarsa($\lambda$), which improves learning efficiency by recording the states the system has visited. Finally, the softmax policy is improved with a decaying temperature parameter, which optimizes the action selection probabilities and balances exploration against exploitation. Simulation results on the MountainCar task show that the proposed algorithm, after a small amount of training, can effectively control a continuous nonlinear system in the model-free setting. Compared with the single-step algorithm, the multi-step algorithm takes fewer average convergence steps to complete the task and performs more stably, which shows that combining nonlinear value function approximation with a multi-step algorithm can also achieve good performance in control tasks.
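The three ingredients named in the abstract (RBF value approximation, eligibility traces, and a softmax policy with a decaying temperature) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names, the exponential decay schedule in `decayed_temperature`, and the choice of accumulating traces are assumptions made for illustration. The sketch also assumes fixed RBF centers and a shared width, so only the output weights are learned; the network in the paper may adapt more parameters than that.

```python
import math


def rbf_features(state, centers, width):
    """Gaussian RBF activation of `state` for each center (assumed fixed centers)."""
    return [
        math.exp(-sum((s - c) ** 2 for s, c in zip(state, center)) / (2.0 * width ** 2))
        for center in centers
    ]


def q_values(weights, phi):
    """Q(s, a) for every action as a weighted sum of the RBF activations."""
    return [sum(w * p for w, p in zip(w_a, phi)) for w_a in weights]


def softmax_probs(q, temperature):
    """Boltzmann action probabilities; a lower temperature gives greedier choices."""
    m = max(q)  # subtract the maximum for numerical stability
    exps = [math.exp((v - m) / temperature) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def decayed_temperature(t0, decay, episode, t_min):
    """Hypothetical schedule: t0 * decay**episode, never below t_min."""
    return max(t_min, t0 * decay ** episode)


def sarsa_lambda_step(weights, traces, phi, a, reward, phi_next, a_next,
                      alpha, gamma, lam, done):
    """One Sarsa(lambda) update with accumulating traces; mutates weights and traces."""
    q_sa = sum(w * p for w, p in zip(weights[a], phi))
    q_next = 0.0 if done else sum(w * p for w, p in zip(weights[a_next], phi_next))
    delta = reward + gamma * q_next - q_sa  # temporal-difference error
    for act in range(len(weights)):
        for i in range(len(phi)):
            traces[act][i] *= gamma * lam          # decay every trace
            if act == a:
                traces[act][i] += phi[i]           # gradient of Q(s, a) w.r.t. its weights
            weights[act][i] += alpha * delta * traces[act][i]
    return delta
```

In a training loop one would typically reset the traces to zero at the start of each episode, sample actions from `softmax_probs(q_values(weights, phi), temperature)`, and lower the temperature between episodes so that early episodes explore and later ones exploit. Setting `lam` to 0 reduces the update to single-step Sarsa, the baseline the abstract compares against.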

History

Received: 2021-10-09
Last revised: 2022-09-21
Accepted: 2021-12-30
Published online: 2022-02-01
