Abstract: This paper proposes an off-policy Q-learning algorithm for the optimal tracking control of linear discrete-time systems with unknown dynamics under deception attacks. First, based on the characteristics of deception attacks, a model of the attacked controller communication channel is established, and an augmented tracking system is constructed by combining the plant with a reference command generator. Within the linear quadratic tracking framework, the optimal tracking control problem is formulated as a zero-sum game between the deception attack and the control input. Second, an off-policy Q-learning algorithm based on state data is designed to learn the optimal tracking control gain, which addresses the difficulty of updating the control gain as required in practical applications. It is proved that the solution obtained by the algorithm is unbiased when the probing noise satisfies the persistence of excitation condition. In addition, for the case where the system state cannot be measured, an off-policy Q-learning algorithm based on output data is designed. Finally, a tracking control simulation of an F-16 aircraft autopilot verifies the effectiveness of the proposed off-policy Q-learning algorithms and their unbiasedness with respect to the probing noise.
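To make the state-data idea concrete, the following is a minimal sketch of Q-function policy iteration for a discounted zero-sum linear quadratic tracking game, fitted by least squares from data generated under pure probing noise. It is not the algorithm of this paper, whose details the abstract does not give. Everything in it is an assumption chosen for illustration: the scalar toy plant (not the F-16 model), the constant reference, the additive attack entering through the control channel, the weights `Q1`, `R`, `theta2`, the discount factor `gamma`, and the helper names `evaluate` and `improve`. The model matrices are used only to simulate data; the learning step sees only states, inputs, and costs.

```python
import numpy as np

# Hypothetical toy plant (NOT the F-16 model): x_{k+1} = a x_k + b (u_k + d_k)
a, b = 0.9, 1.0
T = np.array([[a, 0.0], [0.0, 1.0]])   # augmented state X = [x, r], constant reference r
B1 = np.array([[b], [0.0]])            # control input channel
D1 = B1.copy()                         # assumption: attack is injected into the same channel
Q1 = np.array([[1.0, -1.0], [-1.0, 1.0]])  # penalises the tracking error (x - r)^2
R, theta2, gamma = 1.0, 25.0, 0.8      # input weight, squared attenuation level, discount
n, m = 2, 1                            # augmented state size, size of u (and of d)


def stage_cost(X, u, d):
    # Controller minimises, attacker maximises this cost.
    return float(X @ Q1 @ X + R * (u @ u) - theta2 * (d @ d))


# Collect data once with a behaviour policy that is pure probing noise.
rng = np.random.default_rng(0)
X = np.array([0.5, 1.0])
data = []
for _ in range(200):
    u, d = rng.normal(size=m), rng.normal(size=m)
    X_next = T @ X + B1 @ u + D1 @ d   # the model acts only as a simulator here
    data.append((X, u, d, stage_cost(X, u, d), X_next))
    X = X_next


def evaluate(K, L):
    """Least-squares fit of the Q-function kernel H of the target policy (K, L).

    Uses z_k' H z_k = cost_k + gamma * zn' H zn, where zn applies the target
    policy at X_{k+1}; the recorded (u_k, d_k) may be arbitrary (off-policy).
    """
    rows, targets = [], []
    for Xk, u, d, c, Xn in data:
        z = np.concatenate([Xk, u, d])
        zn = np.concatenate([Xn, -K @ Xn, -L @ Xn])
        rows.append(np.kron(z, z) - gamma * np.kron(zn, zn))
        targets.append(c)
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    H = h.reshape(n + 2 * m, n + 2 * m)
    return 0.5 * (H + H.T)             # keep the symmetric representative


def improve(H):
    """Gains u = -K X, d = -L X from the stationarity of z' H z in (u, d)."""
    G = np.linalg.solve(H[n:, n:], H[n:, :n])
    return G[:m], G[m:]


K, L = np.zeros((m, n)), np.zeros((m, n))  # zero gains are admissible for this stable toy plant
for _ in range(20):
    H = evaluate(K, L)
    K, L = improve(H)

print("control gain K:", K)
print("attack gain  L:", L)
```

Because the toy plant is deterministic, the Bellman equation used in `evaluate` holds exactly for any recorded inputs, so the probing noise only supplies excitation and does not bias the fitted kernel. This mirrors, in a much simpler setting, the unbiasedness property claimed in the abstract.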