Deep reinforcement learning for multi-depot vehicle routing problem

doi:10.13195/j.kzyjc.2021.1381

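Affiliation: School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Corresponding author: E-mail: zjutwwl@zjut.edu.cn
CLC number: TP18
Funding: National Natural Science Foundation of China (61873240)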



Abstract:

The multi-depot vehicle routing problem (MDVRP) is a widely used model in supply chain applications. Most existing algorithms are heuristic; they are slow and cannot guarantee solution quality, so fast and effective solvers are of both academic and practical value. With the objective of minimizing the total route distance over all vehicles, this paper proposes a solution model based on multi-agent deep reinforcement learning. First, the MDVRP is formulated as a multi-agent reinforcement learning problem by defining the state, action, reward, and state transition function, so that the model can be trained with multi-agent reinforcement learning. Second, node neighborhoods and a masking mechanism are defined for the MDVRP, and an attention-based policy network composed of multiple agent networks is designed and trained with a policy gradient algorithm to obtain a model that solves instances quickly. Third, a 2-opt local search strategy and a sampling search strategy are used to improve solution quality. Finally, simulation experiments on problems of different scales, together with comparisons against other algorithms, verify that the proposed model and its combination with the search strategies obtain high-quality solutions quickly.
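The abstract names 2-opt local search as one of the strategies used to refine routes produced by the learned policy. As a rough illustration only, the following is a generic textbook 2-opt applied to a single closed route, not the authors' implementation. The function names `route_length` and `two_opt`, the coordinate layout, and the use of Euclidean distance are all assumptions made for this sketch; the route's first node is treated as the depot and kept fixed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def route_length(route: Sequence[int], coords: Sequence[Point]) -> float:
    """Length of the closed tour that visits `route` in order and returns to its start.

    Assumes Euclidean distances (an assumption of this sketch).
    """
    n = len(route)
    return sum(
        math.dist(coords[route[i]], coords[route[(i + 1) % n]])
        for i in range(n)
    )


def two_opt(route: List[int], coords: Sequence[Point]) -> List[int]:
    """Generic 2-opt: reverse segments until no reversal shortens the route.

    route[0] is treated as the depot and is never moved.
    """
    best = list(route)
    best_len = route_length(best, coords)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j], keeping the depot at index 0.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = route_length(candidate, coords)
                if cand_len < best_len - 1e-12:  # accept strict improvements only
                    best, best_len = candidate, cand_len
                    improved = True
    return best


if __name__ == "__main__":
    # Hypothetical instance: depot at node 0, three customers on a unit square.
    coords = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    initial = [0, 1, 2, 3]  # crosses itself
    refined = two_opt(initial, coords)
    print(route_length(initial, coords), route_length(refined, coords))
```

In a multi-depot setting, a routine like this would presumably be applied to each vehicle's route separately after the policy network has constructed a solution; how the paper combines it with sampling search is not specified in the abstract.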


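Cite this article

WANG Wanliang, CHEN Haoli, LI Guoqing, et al. Deep reinforcement learning for multi-depot vehicle routing problem[J]. Control and Decision, 2022, 37(8): 2101-2109.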


History

Published online: 2022-06-29
Publication date: 2022-08-20
