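Multi-action parallel asynchronous deep deterministic policy gradient based decision-making method for operational indices in mineral processing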

doi:10.13195/j.kzyjc.2020.1063

Control and Decision, 2022, Vol. 37, No. 8, pp. 1989-1996.

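Authors: Li Qiaoran, Ding Jinliang
Affiliation: State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China
Corresponding author: E-mail: jlding@mail.neu.edu.cn.
CLC number: TP18
Fund projects: National Key R&D Program of China (2018YFB1701104); Liaoning Provincial Science and Technology Project (2020JH1/10100008).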

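Abstract:

To address the insufficient exploration capability of the deep deterministic policy gradient (DDPG) algorithm, a multi-action parallel asynchronous deep deterministic policy gradient (MPADDPG) algorithm is proposed and applied to reinforcement-learning-based decision-making of operational indices in mineral processing. The algorithm uses multiple actor networks, each initialized and trained differently, which improves exploration to varying degrees. By extending the critic architecture built on the deterministic policy gradient, it also reveals the relationship between exploration and exploitation. Replacing a single DDPG with multiple DDPGs mitigates the impact of any one poorly performing DDPG and improves learning stability, while the parallel asynchronous structure improves data utilization efficiency and speeds up network convergence. Finally, the actors obtain better policy gradients by influencing the critic's updates. Experimental results on operational index decision-making in mineral processing verify the effectiveness of the proposed algorithm.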
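The abstract does not say how the outputs of the multiple actors are combined or how the critic target is formed, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. Several differently initialized actors (hypothetical `LinearActor` objects, a toy stand-in for actor networks) are each updated with the standard deterministic policy gradient dQ/da · dμ/dθ; at decision time the critic scores every actor's proposed action and the highest-valued one is executed; and the critic's TD target (hypothetical `critic_target`) bootstraps from the best actor's next action. The fixed quadratic `critic`, the argmax selection rule, and all names are assumptions, and the parallel asynchronous training structure is not modeled.

```python
import random
from typing import Sequence

# Hypothetical stand-in for a learned critic Q(s, a): the best action for
# state s is a* = 2*s, so Q peaks at a = 2*s. In a real method this would
# be a neural network fitted to replay data.
def critic(s: float, a: float) -> float:
    return -(a - 2.0 * s) ** 2

def critic_grad_a(s: float, a: float) -> float:
    # dQ/da for the toy critic above
    return -2.0 * (a - 2.0 * s)

class LinearActor:
    """Deterministic policy mu(s) = w*s + b (toy substitute for an actor network)."""

    def __init__(self, rng: random.Random) -> None:
        # Each actor receives a different random initialization.
        self.w = rng.uniform(-1.0, 1.0)
        self.b = rng.uniform(-1.0, 1.0)

    def act(self, s: float) -> float:
        return self.w * s + self.b

    def dpg_step(self, s: float, lr: float) -> None:
        # Deterministic policy gradient ascent: dQ/dtheta = dQ/da * da/dtheta
        g = critic_grad_a(s, self.act(s))
        self.w += lr * g * s  # da/dw = s
        self.b += lr * g      # da/db = 1

def select_action(actors: Sequence[LinearActor], s: float) -> float:
    # Every actor proposes an action; the critic keeps the highest-valued one,
    # so a single poorly performing actor does not dictate behaviour.
    return max((actor.act(s) for actor in actors), key=lambda a: critic(s, a))

def critic_target(actors: Sequence[LinearActor], r: float, s_next: float,
                  gamma: float = 0.99) -> float:
    # Assumed TD target: bootstrap from the best action among all actors.
    return r + gamma * max(critic(s_next, actor.act(s_next)) for actor in actors)

rng = random.Random(0)
actors = [LinearActor(rng) for _ in range(4)]
for _ in range(500):
    s = rng.uniform(-1.0, 1.0)
    for actor in actors:
        actor.dpg_step(s, lr=0.1)

# Should be close to the optimum 2 * 0.5 = 1.0 after training.
print(round(select_action(actors, 0.5), 3))
```

Because the toy critic is fixed, `critic_target` is only illustrated here; in a full DDPG-style method the critic would be regressed toward such targets while the actors follow its action gradient.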

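Cite this article:

Li Qiaoran, Ding Jinliang. Multi-action parallel asynchronous deep deterministic policy gradient based decision-making method for operational indices in mineral processing[J]. Control and Decision, 2022, 37(8): 1989-1996.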

History:
Published online: 2022-06-29
Publication date: 2022-08-20