Reinforcement learning and adaptive/approximate dynamic programming: A survey from theory to applications in multi-agent systems

Affiliations:

1. Department of Systems Science, Southeast University, Nanjing 211189, China; 2. State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China; 3. Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China

Corresponding author:

E-mail: yangtao@mail.neu.edu.cn

CLC number:

TP273

Funding:

National Natural Science Foundation of China (U22B2046, 62073079, 62088101, 62133003, 61991403, 62173085, 62003167); Equipment Pre-Research Joint Fund of the Ministry of Education (8091B022114).

    Abstract:

    Reinforcement learning (RL) and adaptive/approximate dynamic programming (ADP) algorithms have recently attracted considerable attention across several fields, including artificial intelligence, systems and control, and applied mathematics. This is partly due to their successful application to a series of challenging problems, such as sequential decision making and optimal coordination control of large-scale multi-agent systems. This paper first introduces preliminaries on RL and ADP algorithms, and then reviews the development of these two closely related classes of algorithms in different research fields, with emphasis on the progression from sequential decision (optimal control) problems for a single agent (controlled plant) to sequential decision (optimal coordination control) problems for multi-agent systems. Furthermore, after briefly surveying the structural evolution of ADP algorithms over recent decades and their recent shift from a model-based offline programming framework to a model-free online learning framework, the paper reviews research progress on ADP algorithms for the optimal coordination control of multi-agent systems. Finally, some interesting yet challenging open issues are suggested concerning multi-agent reinforcement learning (MARL) algorithms and the use of ADP algorithms to solve optimal coordination control problems of multi-agent systems.

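Cite this article

Wen Guanghui (温广辉), Yang Tao (杨涛), Zhou Jialing (周佳玲), et al. Reinforcement learning and adaptive/approximate dynamic programming: A survey from theory to applications in multi-agent systems[J]. Control and Decision (控制与决策), 2023, 38(5): 1200-1230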

History
  • Published online: 2023-04-18
  • Publication date: 2023-05-20