Temporal difference learning with incremental nearest neighbors in continuous spaces
(连续空间增量最近邻时域差分学习)
Authors: 张春元, 朱清新, 钟声
Affiliation:

1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China;
2. College of Information Science and Technology, Hainan University, Haikou 570228, China.

Author profile: 张春元

Chinese Library Classification: TP18

Fund Project: National Natural Science Foundation of China (61100118, 60671033); Natural Science Foundation of Hainan Province (613153).

    Abstract:

    Based on locally weighted learning, a temporal difference (TD) learning framework with incremental nearest neighbors is proposed for reinforcement learning problems in continuous spaces. The framework incrementally selects a subset of the observed states online to build an instance dictionary, approximates the value function and policy of each newly observed state from its range nearest neighbor instances, and uses a TD algorithm to iteratively update the value function and eligibility trace of every instance in the dictionary. Several design schemes are given for each main component of the framework, and its convergence is analyzed theoretically. Simulation experiments on 24 scheme combinations show that the SNDN combination achieves good learning performance and computational efficiency.
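
The loop described in the abstract (grow a dictionary, estimate values from in-range neighbors, update values and eligibility traces by TD) can be illustrated with a minimal Python sketch. It is not the paper's SNDN combination or any of its 24 scheme combinations, which the abstract does not specify. The class name `IncrementalNNTD`, the `radius` threshold for adding instances, the Gaussian kernel weights, the accumulating traces, and all hyperparameter values are assumptions chosen only for illustration.

```python
import math


class IncrementalNNTD:
    """Illustrative TD(lambda) learner over an incrementally built instance dictionary."""

    def __init__(self, radius=0.5, alpha=0.2, gamma=0.9, lam=0.8):
        self.radius = radius  # assumed range used for both insertion and lookup
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.lam = lam        # eligibility-trace decay
        self.states, self.values, self.traces = [], [], []

    def _weights(self, state):
        """Normalized kernel weights of the stored instances within `radius`."""
        raw = {}
        for i, inst in enumerate(self.states):
            d = math.dist(state, inst)
            if d <= self.radius:
                raw[i] = math.exp(-((d / self.radius) ** 2))  # assumed Gaussian kernel
        total = sum(raw.values())
        return {i: w / total for i, w in raw.items()}

    def observe(self, state):
        """Add `state` as a new instance only if no stored instance is in range."""
        if not self._weights(state):
            self.states.append(tuple(state))
            self.values.append(0.0)
            self.traces.append(0.0)

    def value(self, state):
        """Locally weighted value estimate from the in-range instances."""
        return sum(w * self.values[i] for i, w in self._weights(state).items())

    def update(self, state, reward, next_state, done=False):
        """One TD(lambda) step; returns the TD error."""
        self.observe(state)
        if not done:
            self.observe(next_state)
        target = reward + (0.0 if done else self.gamma * self.value(next_state))
        delta = target - self.value(state)
        weights = self._weights(state)
        for i in range(len(self.traces)):
            # Decay every trace, then credit the neighbors of the current state.
            self.traces[i] = self.gamma * self.lam * self.traces[i] + weights.get(i, 0.0)
            self.values[i] += self.alpha * delta * self.traces[i]
        if done:
            self.traces = [0.0] * len(self.traces)  # reset traces between episodes
        return delta


# Usage on a hypothetical two-step chain: 0.0 -> 1.0 -> terminal, reward 1 at the end.
learner = IncrementalNNTD()
for _ in range(200):
    learner.update((0.0,), 0.0, (1.0,))
    learner.update((1.0,), 1.0, (2.0,), done=True)
print(len(learner.states), learner.value((0.0,)), learner.value((1.0,)))
```

In this toy chain only two instances are stored, and a query such as `learner.value((0.1,))` reuses the instance at `(0.0,)` instead of adding a new one, which is the sense in which the dictionary grows only when a new state is not already covered.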

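Cite this article

张春元, 朱清新, 钟声. Temporal difference learning with incremental nearest neighbors in continuous spaces [J]. 控制与决策 (Control and Decision), 2014, 29(12): 2121-2128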

History
  • Received: 2013-10-23
  • Last revised: 2014-02-23
  • Published online: 2014-12-20