Intelligent critic design for asymmetric constrained zero-sum games without relying on initial admissible control

Affiliation:

Beijing University of Technology

CLC Number:

TP273

Fund Project:

National Natural Science Foundation of China (62222301, 61890930-5, 62021003); National Science and Technology Major Project of New Generation Artificial Intelligence (2021ZD0112302, 2021ZD0112301); Beijing Natural Science Foundation (JQ19013)

    Abstract:

    This paper investigates the continuous-time zero-sum game problem with asymmetric constraints using the adaptive critic control approach. First, a novel nonquadratic function is established to handle the asymmetric constraints, which relaxes the restriction on the control matrix. Second, the optimal control, the worst-case disturbance, and the Hamilton-Jacobi-Isaacs equation are derived. Third, an adaptive critic control method is constructed to approximate the optimal cost function, yielding the near-optimal control and the near-worst-case disturbance. Notably, for the zero-sum game problem with asymmetric constraints, a new critic learning criterion is proposed that strengthens the learning process and eliminates the dependence on an initial admissible control, which has not been considered in previous papers. Moreover, the stability of the system state and of the critic network weight estimation error is proved using the Lyapunov method. Finally, the effectiveness of the proposed algorithm is verified on two examples, an F-16 aircraft and an inverted pendulum. For comparison, simulation results under the traditional learning algorithm are also provided, further illustrating the feasibility of the proposed learning criterion.
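For orientation, the generic textbook form of a continuous-time zero-sum game is sketched below. This is an illustrative setup only, not the paper's formulation: the symbols $f$, $g$, $k$, $Q$, $\gamma$, and the control penalty $Y(u)$ are all assumptions introduced here, and the specific nonquadratic function the paper uses for asymmetric constraints is not reproduced.

```latex
% Assumed dynamics: control u (minimizing player), disturbance w (maximizing player)
\dot{x} = f(x) + g(x)\,u + k(x)\,w

% Assumed cost; Y(u) \ge 0 is a placeholder for a nonquadratic control penalty
J(x_0) = \int_{0}^{\infty} \left( x^{\top} Q x + Y(u) - \gamma^{2} w^{\top} w \right) \mathrm{d}t

% Hamilton-Jacobi-Isaacs equation for the optimal cost V^{*}
0 = \min_{u} \max_{w} \left\{ x^{\top} Q x + Y(u) - \gamma^{2} w^{\top} w
      + \left( \nabla V^{*}(x) \right)^{\top} \left( f(x) + g(x)\,u + k(x)\,w \right) \right\}

% Stationarity in w: -2\gamma^{2} w + k^{\top}(x) \nabla V^{*}(x) = 0, hence
w^{*}(x) = \frac{1}{2\gamma^{2}}\, k^{\top}(x)\, \nabla V^{*}(x)
```

Under these assumptions, the optimal control follows from the analogous stationarity condition in $u$, whose closed form depends on the choice of $Y(u)$. A critic network that approximates $V^{*}$ then gives approximate versions of both policies through its gradient.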

History
  • Received: 2024-04-01
  • Last revised: 2024-09-25
  • Accepted: 2024-09-26
  • Published online: 2024-10-31