Citation: LI Jin-na, YIN Zi-xuan. Off-policy Q-learning: Optimal tracking control for networked control systems[J]. Control and Decision, 2019, 34(11): 2343-2349.
DOI:10.13195/j.kzyjc.2019.0417
CLC number: TP13
Funding: National Natural Science Foundation of China (61673280, 61525302, 61590922, 61503257); Liaoning Provincial Program for Innovative Talents in Universities (LR2017006); Joint Open Fund for Key Research Areas of the Natural Science Foundation of Liaoning Province (2019-KF-03-06); Liaoning Shihua University Fund (2018XJJ-005).
Off-policy Q-learning: Optimal tracking control for networked control systems
LI Jin-na1,2,3, YIN Zi-xuan1
(1. College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; 2. School of Information and Control Engineering, Liaoning Shihua University, Fushun 113001, China; 3. State Key Lab of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China)
Abstract:
This paper develops a novel off-policy Q-learning method for solving the linear quadratic tracking (LQT) problem in discrete-time networked control systems with packet dropout. The proposed method can be implemented using only measured data, without requiring the system dynamics to be known a priori, and it tolerates bounded packet loss. First, the networked control system with packet dropout is characterized, and an optimal tracking problem for linear discrete-time networked control systems is formulated. Then, a Smith predictor is designed to predict the current state from historical data measured over the communication network, and on this basis an optimal tracking problem with packet dropout compensation is formulated. Finally, a novel off-policy Q-learning algorithm is developed by integrating dynamic programming with reinforcement learning. The merit of the proposed algorithm is that the optimal tracking control law, based on the predicted states of the system, can be learned using only measured data, without knowledge of the system dynamics. Moreover, the unbiasedness of the solution to the Q-function-based Bellman equation is guaranteed by the off-policy Q-learning approach. Simulation results show that the proposed method achieves good tracking performance for networked control systems with unknown dynamics and packet dropout.
Key words:  networked control system  off-policy Q-learning  linear quadratic tracking  packet dropout
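To make the core idea concrete, the following is a minimal sketch (not the paper's algorithm) of off-policy Q-learning for a discrete-time LQT problem without packet dropout or the Smith predictor: the plant state is augmented with the reference, a quadratic Q-function is fitted by least squares on the Bellman equation from exploratory data generated off-policy, and the feedback gain is improved from the learned Q-function blocks. All matrices (A, B, C, F), the discount factor, and the data-generation scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Augmented state z = [x; r]: plant x_{k+1} = A x_k + B u_k,
# reference r_{k+1} = F r_k (models used only to generate data here).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])
T = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 2)), F]])
B1 = np.vstack([B, np.zeros((1, 1))])

gamma = 0.95                        # discount factor for the LQT cost
Ce = np.hstack([C, -np.eye(1)])     # tracking error e = C x - r = Ce z
Q1 = Ce.T @ Ce                      # error weight in augmented coordinates
R = np.array([[1.0]])
W = np.block([[Q1, np.zeros((3, 1))], [np.zeros((1, 3)), R]])

nz, m = 3, 1
nw = nz + m
idx = [(i, j) for i in range(nw) for j in range(i, nw)]

def phi(w):
    """Quadratic features so that phi(w).theta = w' H w, H symmetric."""
    return np.array([w[i] * w[j] * (1.0 if i == j else 2.0) for i, j in idx])

def unvec(theta):
    """Rebuild the symmetric kernel H from the feature weights."""
    H = np.zeros((nw, nw))
    for k, (i, j) in enumerate(idx):
        H[i, j] = H[j, i] = theta[k]
    return H

# Off-policy data: arbitrary exploratory (z, u) pairs with their transitions
# and stage costs; this batch is collected once and reused for every policy.
N = 400
Z = rng.uniform(-2, 2, size=(N, nz))
U = rng.uniform(-2, 2, size=(N, m))
Zn = Z @ T.T + U @ B1.T
Cost = np.einsum('ni,ij,nj->n', Z, Q1, Z) + np.einsum('ni,ij,nj->n', U, R, U)

K = np.zeros((m, nz))               # initial admissible policy u = -K z
for _ in range(30):
    # Policy evaluation: least squares on the Q-function Bellman equation
    #   phi(z,u).theta = c(z,u) + gamma * phi(z', -K z').theta
    Un = -Zn @ K.T
    Als = np.array([phi(np.concatenate([Z[k], U[k]]))
                    - gamma * phi(np.concatenate([Zn[k], Un[k]]))
                    for k in range(N)])
    theta, *_ = np.linalg.lstsq(Als, Cost, rcond=None)
    H = unvec(theta)
    # Policy improvement from the H-blocks of the learned Q-function.
    K = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])

print("learned tracking gain K =", K)
```

Because the learning is off-policy, the same exploratory batch evaluates every target policy, so exploration noise never enters the Bellman equation for the target policy; this is the mechanism behind the unbiasedness claim in the abstract.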