Off-policy Q-learning: Optimal tracking control for networked control systems

Authors: Li Jinna, Yin Zixuan

(1. College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; 2. School of Information and Control Engineering, Liaoning Shihua University, Fushun 113001, China; 3. State Key Lab of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China)

Corresponding author E-mail: lijinna_721@126.com.

CLC number: TP13

Fund projects:

National Natural Science Foundation of China (61673280, 61525302, 61590922, 61503257); Liaoning Province Program for Innovative Talents in Universities (LR2017006); Liaoning Natural Science Foundation Key-Field Joint Open Fund (2019-KF-03-06); Liaoning Shihua University Research Fund (2018XJJ-005).



    Abstract:

    For the tracking control problem of networked control systems with packet dropout, an off-policy Q-learning method is proposed that uses only measurable data to make the system track its target in a near-optimal manner when the system model parameters are unknown and the communication network loses packets. First, the networked control system with packet dropout is characterized, and the tracking control problem for linear discrete-time networked control systems is formulated. Then, a Smith predictor is designed to compensate for the effect of packet dropout on system performance, yielding an optimal tracking control problem with packet-dropout compensation. Finally, by integrating dynamic programming with reinforcement learning, a novel off-policy Q-learning algorithm is developed. Its merits are that it does not require the system model parameters to be known; it learns the optimal tracking control policy with predictor-based state feedback from the measurable data of the networked control system; and it guarantees the unbiasedness of the solution to the iterative Q-function-based Bellman equation. Simulation results verify the effectiveness of the proposed method and show good tracking performance for networked control systems with unknown dynamics and packet dropout.
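    For concreteness, the following is a minimal sketch of the discounted LQT formulation the abstract outlines. All symbols (A, B, C, F, the dropout horizon d, the weights Q_t and R, and the discount factor γ) are generic placeholders assumed for illustration; the paper's exact model and predictor may differ:

    % Plant and reference generator (assumed linear models)
    x_{k+1} = A x_k + B u_k, \qquad r_{k+1} = F r_k
    % Smith-predictor-style compensation: if the newest state received over the
    % network is x_{k-d} (d consecutive packets lost), predict the current state
    \hat{x}_k = A^{d} x_{k-d} + \sum_{i=1}^{d} A^{i-1} B\, u_{k-i}
    % Discounted tracking cost over the augmented state z_k = [\hat{x}_k^\top \; r_k^\top]^\top
    J = \sum_{k=0}^{\infty} \gamma^{k} \left[ (C x_k - r_k)^\top Q_t\, (C x_k - r_k) + u_k^\top R\, u_k \right]
    % Quadratic Q-function and the Bellman equation solved from data, H being the kernel to learn
    Q(z_k, u_k) = \begin{bmatrix} z_k \\ u_k \end{bmatrix}^{\top} H \begin{bmatrix} z_k \\ u_k \end{bmatrix}
                = (C x_k - r_k)^\top Q_t\, (C x_k - r_k) + u_k^\top R\, u_k + \gamma\, Q(z_{k+1}, u_{k+1})

    Once H is learned, the tracking controller follows as u_k = -H_{uu}^{-1} H_{uz} z_k, which is exactly what a data-driven scheme can recover without knowing A and B.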

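    Below is a minimal runnable sketch, in Python/NumPy, of the off-policy Q-learning loop the abstract describes: data are collected once under an exploratory behavior policy, and the Q-function Bellman equation is then solved repeatedly for an improving target policy. The toy plant, weights, and every identifier are illustrative assumptions, and the packet-dropout channel and Smith predictor are omitted to isolate the learning step; this is not the paper's exact algorithm:

    import numpy as np

    np.random.seed(0)

    # --- Toy plant, reference generator, weights (assumptions, not the paper's example) ---
    A = np.array([[1.0, 0.1],
                  [0.0, 0.9]])
    B = np.array([[0.0],
                  [0.1]])
    C = np.array([[1.0, 0.0]])
    F = np.array([[1.0]])                 # reference dynamics r_{k+1} = F r_k
    Qt, R, gamma = 10.0, 1.0, 0.9         # tracking weight, input weight, discount
    n, m, p = 2, 1, 1
    nz = n + p                            # augmented state z = [x; r]

    # Augmented dynamics z_{k+1} = T z_k + B1 u_k and stage cost z'Qz z + u'R u
    T = np.block([[A, np.zeros((n, p))],
                  [np.zeros((p, n)), F]])
    B1 = np.vstack([B, np.zeros((p, m))])
    C1 = np.hstack([C, -np.eye(p)])       # tracking error e_k = C1 z_k = C x_k - r_k
    Qz = C1.T @ (Qt * np.eye(p)) @ C1

    # --- Collect data once with an exploratory behavior policy (off-policy data) ---
    N = 400
    data = []
    z = np.array([1.0, 0.0, 1.0])
    for k in range(N):
        u = 0.5 * np.random.randn(m)      # probing behavior policy
        zn = T @ z + B1 @ u
        c = float(z @ Qz @ z + R * u @ u)
        data.append((z, u, zn, c))
        z = zn if np.linalg.norm(zn) < 50.0 else np.array([1.0, 0.0, 1.0])

    d = nz + m
    iu = np.triu_indices(d)

    def feat(z, u):
        # Quadratic features so that Q(z,u) = w'Hw = feat(z,u).theta, H symmetric
        w = np.concatenate([z, u])
        scale = np.where(iu[0] == iu[1], 1.0, 2.0)   # off-diagonal terms appear twice
        return scale * np.outer(w, w)[iu]

    # --- Off-policy Q-learning: policy iteration on the Q-function ---
    K = np.zeros((m, nz))                 # target-policy gain, u = -K z
    for it in range(20):
        # Policy evaluation: feat(z,u)'theta - gamma*feat(z',-Kz')'theta = cost
        Phi = np.array([feat(z, u) - gamma * feat(zn, -K @ zn) for z, u, zn, c in data])
        y = np.array([c for z, u, zn, c in data])
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        Hu = np.zeros((d, d))
        Hu[iu] = theta
        H = Hu + Hu.T - np.diag(np.diag(Hu))         # rebuild the symmetric kernel
        # Policy improvement: argmin_u [z;u]'H[z;u]  =>  K = H_uu^{-1} H_uz
        Knew = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])
        if np.linalg.norm(Knew - K) < 1e-8:
            break
        K = Knew

    print("learned tracking-control gain K:", K)

    Because the next-state feature is evaluated at the target-policy action -K z' rather than at the noisy behavior action actually applied, the probing noise does not bias the least-squares solution; this is the unbiasedness property the abstract attributes to the off-policy formulation.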

Cite this article:

Li Jinna, Yin Zixuan. Off-policy Q-learning: Optimal tracking control for networked control systems[J]. Control and Decision, 2019, 34(11): 2343-2349.

History:
  • Online published: 2019-10-30