Off-policy Q-learning based optimal tracking control for unknown linear discrete-time systems under deception attacks
CSTR:
Author:
Affiliation:

Hefei University of Technology

Author biography:

Corresponding author:

CLC number:

TP273

Fund project:

Major Science and Technology Project of Anhui Province (202103a05020001)



Abstract:

This paper proposes an off-policy Q-learning algorithm to solve the optimal tracking control problem for linear discrete-time systems with unknown dynamics under deception attacks. First, a model of the controller communication channel under attack is established from the characteristics of deception attacks, and an augmented tracking system is constructed by combining this model with a reference command generator. Within the linear quadratic tracking framework, the optimal tracking control problem is then formulated as a zero-sum game in which the deception attack and the control input act as opposing players. Second, an off-policy Q-learning algorithm based on state data is designed to learn the optimal tracking control gain; it resolves the practical issue that the control gain cannot be updated according to given requirements in applications, and it is proved that the solution is unbiased under probing noise satisfying the persistence-of-excitation condition. For the case where the system state is unmeasurable, an off-policy Q-learning algorithm based on output data is also designed. Finally, tracking control simulations of an F-16 aircraft autopilot verify the effectiveness of the designed off-policy Q-learning algorithms and the unbiasedness of their solutions with respect to probing noise.
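The data-driven loop the abstract describes can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits the deception-attack player, the reference command generator, and the output-feedback variant, and uses a hypothetical second-order system. It shows only the core off-policy Q-learning policy iteration for a linear quadratic regulator, in which one batch of behavior data collected with probing noise is reused to evaluate and improve the target gain:

```python
import numpy as np

def quad_basis(z):
    """Upper-triangular quadratic monomials of z, matching a symmetric H."""
    feats = []
    for i in range(len(z)):
        for j in range(i, len(z)):
            feats.append(z[i] * z[j] if i == j else 2 * z[i] * z[j])
    return np.array(feats)

def vec_to_sym(v, n):
    """Rebuild the symmetric matrix H from its upper-triangular parameters."""
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = v[idx]
            idx += 1
    return H

rng = np.random.default_rng(0)
# hypothetical stable 2-state system (not from the paper)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, R = np.eye(2), np.array([[1.0]])
n, m = 2, 1

# collect one batch of data with a fixed behavior policy plus probing noise
K_b = np.zeros((m, n))
X, U, Xn = [], [], []
x = rng.standard_normal(n)
for _ in range(200):
    u = -K_b @ x + 0.5 * rng.standard_normal(m)   # probing noise for excitation
    xn = A @ x + B @ u
    X.append(x); U.append(u); Xn.append(xn)
    x = xn if np.linalg.norm(xn) < 10 else rng.standard_normal(n)

# off-policy Q-learning policy iteration: reuse the same data every iteration
K = np.zeros((m, n))   # initial stabilizing gain (A itself is stable here)
for _ in range(30):
    Phi, y = [], []
    for x, u, xn in zip(X, U, Xn):
        z = np.concatenate([x, u])            # behavior action in the data
        zn = np.concatenate([xn, -K @ xn])    # target-policy action at x_{k+1}
        Phi.append(quad_basis(z) - quad_basis(zn))
        y.append(x @ Qc @ x + u @ R @ u)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = vec_to_sym(h, n + m)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # K_{j+1} = H_uu^{-1} H_ux

# model-based Riccati solution, used only to cross-check the learned gain
P = Qc.copy()
for _ in range(500):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
K_star = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

Because the probing noise enters only the behavior policy while the target policy's Q-function is evaluated on the same stored data, the noise excites the regression without biasing its solution; this mirrors, in simplified form, the unbiasedness property the abstract claims for the proposed algorithm.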

History
  • Received: 2024-07-11
  • Revised: 2024-09-24
  • Accepted: 2024-09-25
  • Published online: 2024-10-17
  • Publication date: