Optimal tracking control for unknown linear discrete-time systems under deception attacks based on off-policy Q-learning

doi:10.13195/j.kzyjc.2024.0830


CLC number: TP273
Funding: Anhui Provincial Science and Technology Major Project (202103a05020001).

Based on off-policy Q-learning: Optimal tracking control for unknown linear discrete-time systems under deception attacks

Abstract:

An off-policy Q-learning algorithm is proposed to solve the optimal tracking control problem for linear discrete-time systems with unknown dynamics under multiple deception attacks. First, a weight matrix is introduced to model the control input when the controller communication channel is subject to multiple deception attacks, and an augmented tracking system is constructed by combining the plant with a reference command generator. Within the linear quadratic tracking framework, optimal tracking control is formulated as a zero-sum game in which the deception attack and the control input are the two players. Second, an off-policy Q-learning algorithm based on state data is designed to learn the optimal tracking control gain, which addresses the problem that, in applications, the control gain cannot be updated as required. It is proved that the solution obtained by the algorithm is unbiased under probing noise that satisfies the persistence of excitation condition. For the case in which the system state cannot be measured, an off-policy Q-learning algorithm based on output data is also designed. Finally, a tracking control simulation of an F-16 aircraft autopilot verifies the effectiveness of the proposed off-policy Q-learning algorithms and their unbiasedness with respect to probing noise.
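The augmented tracking system mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's equations, so everything below is an assumption based on the standard linear quadratic tracking construction: the plant is x_{k+1} = A x_k + B u_k with output y_k = C x_k, the reference command generator is r_{k+1} = F r_k, and the attacked input is modelled additively as u_k + Gamma a_k, where `Gamma` stands in for the weight matrix. The function names `augment_tracking_system` and `step` are hypothetical.

```python
import numpy as np

def augment_tracking_system(A, B, C, F, Q):
    """Stack plant state x and reference r into X = [x; r] (assumed standard LQT form)."""
    n, m = B.shape
    p = F.shape[0]
    # Augmented dynamics: X_{k+1} = T X_k + B1 (input)
    T = np.block([[A, np.zeros((n, p))],
                  [np.zeros((p, n)), F]])
    B1 = np.vstack([B, np.zeros((p, m))])
    # Tracking error e_k = C x_k - r_k = C1 X_k
    C1 = np.hstack([C, -np.eye(p)])
    # Error cost e_k^T Q e_k written as X_k^T Q1 X_k
    Q1 = C1.T @ Q @ C1
    return T, B1, C1, Q1

def step(T, B1, X, u, a, Gamma):
    """One step of the augmented system with an additively attacked input (assumption)."""
    return T @ X + B1 @ (u + Gamma @ a)
```

In a zero-sum formulation of this kind, the control input would minimise and the attack signal would maximise a quadratic cost built from `Q1`. The Q-learning step described in the abstract learns the corresponding gains from data without using `A`, `B`, or `F`, which this sketch does not attempt.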

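Cite this article

SONG Xingxing (宋星星), CHU Zhaobi (储昭碧). Optimal tracking control for unknown linear discrete-time systems under deception attacks based on off-policy Q-learning [J]. Control and Decision (控制与决策), 2025, 40(5): 1641-1650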


History

Received: 2024-07-11
Published online: 2025-04-15
Publication date: 2025-05-20
