Abstract: When solving sequential decision-making problems in continuous spaces, traditional actor-critic (AC) methods often struggle to achieve good convergence speed and quality. To overcome this weakness, an AC algorithm framework based on symmetric perturbation sampling is proposed, which uses a Gaussian distribution as the policy distribution. At each time step, the framework generates two actions through two symmetric perturbations of the current action mean and uses them to interact with the environment in parallel. The framework then selects the agent's behavior action and updates the value-function parameters based on the maximum temporal difference (TD) error, and updates the policy parameters based on either the average regular gradient or the average incremental natural gradient. Theoretical analysis and simulation results show that the framework achieves not only better convergence performance but also high computational efficiency.
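To make the per-step procedure concrete, the following is a minimal sketch of the symmetric-perturbation AC update on a hypothetical one-state, one-dimensional continuous-action task (reward peaks at action 1.0). The task, step sizes, and fixed policy standard deviation are all illustrative assumptions, not the paper's experimental setup; the actor update shown is the average regular gradient variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(action):
    # Hypothetical toy task: quadratic reward peaking at action = 1.0.
    return -(action - 1.0) ** 2 + rng.normal(scale=0.01)

theta = 0.0          # mean of the Gaussian policy (the learned parameter)
sigma = 0.5          # fixed policy standard deviation (assumed for the sketch)
w = 0.0              # value estimate of the single state (the critic)
alpha_w, alpha_theta = 0.1, 0.05   # illustrative step sizes

for _ in range(2000):
    # Two symmetric perturbations of the current action mean.
    eps = rng.normal(scale=sigma)
    a_plus, a_minus = theta + eps, theta - eps

    # Interact with the environment with both actions (in parallel in
    # the framework; sequentially in this single-process sketch).
    r_plus, r_minus = reward(a_plus), reward(a_minus)

    # TD errors; with a single state and no successor, delta = r - V(s).
    d_plus, d_minus = r_plus - w, r_minus - w

    # Critic update driven by the maximum-magnitude TD error.
    d_max = d_plus if abs(d_plus) >= abs(d_minus) else d_minus
    w += alpha_w * d_max

    # Actor update: average of the two regular policy gradients,
    # using grad log pi(a) = (a - theta) / sigma^2 for a Gaussian mean.
    g_plus = d_plus * (a_plus - theta) / sigma**2
    g_minus = d_minus * (a_minus - theta) / sigma**2
    theta += alpha_theta * 0.5 * (g_plus + g_minus)

print(theta)  # should approach the optimal action mean of 1.0
```

Averaging the two mirrored gradients cancels much of the sampling noise in the actor update, which is one intuition for the improved convergence the abstract claims; the incremental natural gradient variant would replace the regular gradient step with a natural-gradient estimate.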