Deep reinforcement learning framework and algorithms integrated with cognitive behavior models
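Affiliation: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

Corresponding author E-mail: nudtjhuang@hotmail.com

CLC number: TP183

Fund project: National Natural Science Foundation of China (61906202).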


Abstract:

When facing complex tasks with high-dimensional continuous state spaces or sparse rewards, it is difficult for a deep reinforcement learning (DRL) agent to learn an optimal policy from scratch. How to represent existing knowledge in a form that both humans and the learning agent can understand, and how to use that knowledge to effectively accelerate policy convergence, remain open problems. To address them, this paper proposes a DRL framework integrated with cognitive behavior models. Domain prior knowledge is represented as cognitive behavior models based on belief-desire-intention (BDI), which are used to guide the agent's policy learning. Based on this framework, we propose the deep Q-learning algorithm with the cognitive behavior model (COG-DQN) and the proximal policy optimization algorithm with the cognitive behavior model (COG-PPO), and we quantitatively design how the cognitive behavior model guides the agent's policy updates. Finally, experiments in a typical gym environment and an air combat maneuver confrontation environment verify that the proposed algorithms can efficiently use the cognitive behavior model to accelerate policy learning and effectively alleviate the impact of large, high-dimensional state spaces and sparse rewards.
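The abstract does not say how the BDI model is encoded or exactly how it enters the COG-DQN and COG-PPO updates. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. It encodes prior knowledge as prioritized, belief-conditioned plans. A DQN-style epsilon-greedy selector defers its exploratory steps to the BDI intention with an annealed probability, and a weighted negative-log-likelihood term is provided that a PPO-style learner could add to its loss. `Plan`, `BDIModel`, `guided_epsilon_greedy`, `guide_prob_schedule`, `guidance_loss`, the linear annealing schedule, and the CartPole-style `pole_angle` rule are all hypothetical.

```python
# Hypothetical sketch: all names and the guidance scheme are illustrative assumptions.
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Beliefs = Dict[str, float]  # hypothetical: named state features the agent believes


@dataclass
class Plan:
    """One BDI rule: if `context` holds on the beliefs, commit to `action`."""
    name: str
    context: Callable[[Beliefs], bool]
    action: int
    priority: int = 0  # higher priority wins when several plans apply


@dataclass
class BDIModel:
    plans: List[Plan]

    def intention(self, beliefs: Beliefs) -> Optional[int]:
        """Deliberate: return the action of the highest-priority applicable plan."""
        applicable = [p for p in self.plans if p.context(beliefs)]
        if not applicable:
            return None
        return max(applicable, key=lambda p: p.priority).action


def guide_prob_schedule(step: int, start: float = 1.0, end: float = 0.0,
                        decay_steps: int = 10_000) -> float:
    """Linearly anneal how often exploration defers to the BDI model."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def guided_epsilon_greedy(q_values: Sequence[float], beliefs: Beliefs,
                          bdi: BDIModel, epsilon: float, guide_prob: float,
                          rng: random.Random) -> int:
    """DQN-style action selection whose exploratory steps may follow the BDI model."""
    if rng.random() >= epsilon:  # exploit: greedy with respect to the learned Q-values
        return max(range(len(q_values)), key=lambda a: q_values[a])
    suggested = bdi.intention(beliefs)
    if suggested is not None and rng.random() < guide_prob:
        return suggested  # guided exploration
    return rng.randrange(len(q_values))  # plain random exploration


def guidance_loss(action_probs: Sequence[float], suggested: Optional[int],
                  weight: float) -> float:
    """Weighted NLL of the BDI-suggested action; one possible auxiliary PPO loss term."""
    if suggested is None:
        return 0.0
    return -weight * math.log(max(action_probs[suggested], 1e-8))


if __name__ == "__main__":
    # Hypothetical CartPole-style heuristic: push toward the side the pole leans.
    bdi = BDIModel([
        Plan("push_right", lambda b: b["pole_angle"] > 0, action=1, priority=1),
        Plan("push_left", lambda b: b["pole_angle"] <= 0, action=0, priority=1),
    ])
    rng = random.Random(0)
    # epsilon=1.0 always explores; guide_prob_schedule(0) == 1.0, so the BDI action is taken.
    a = guided_epsilon_greedy([0.2, 0.5], {"pole_angle": -0.05}, bdi,
                              epsilon=1.0, guide_prob=guide_prob_schedule(0), rng=rng)
    print(a, guide_prob_schedule(5_000), guidance_loss([0.25, 0.75], 1, 0.5))
```

Annealing `guide_prob` toward zero is one simple way to let the learned Q-values take over once prior knowledge has steered early exploration. The paper's own quantitative guidance design may differ.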

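Cite this article:

Chen Hao, Li Jiaxiang, Huang Jian, et al. Deep reinforcement learning framework and algorithms integrated with cognitive behavior models[J]. Control and Decision, 2023, 38(11): 3209-3218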

History
  • Published online: 2023-10-08
  • Published: 2023-11-20