Zhejiang University of Technology
The National Natural Science Foundation of China (Program No. 61873240)
The Multi-Depot Vehicle Routing Problem (MDVRP) is a problem model that is currently widely applied in supply chains. Most existing algorithms are heuristic; they solve the problem slowly and cannot guarantee solution quality, so a fast and effective solution algorithm has both academic significance and practical value. With the goal of minimizing the total vehicle routing distance, a solution model based on multi-agent deep reinforcement learning is proposed. First, the MDVRP is formulated as a multi-agent reinforcement learning problem, including the state, action, reward, and state transition function, so that the model can be trained with multi-agent reinforcement learning. Based on definitions of node neighbors and a masking mechanism for the MDVRP, a policy network composed of multiple agent networks is designed using the attention mechanism, and it is trained with a policy gradient algorithm to obtain a model that can solve the problem quickly. Then, a 2-opt local search strategy and a sampling search strategy are used to improve solution quality. Simulation experiments on problems of different scales and comparisons with other algorithms verify that the proposed multi-agent deep reinforcement learning model, and its combination with the search strategies, can obtain high-quality solutions quickly.
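The abstract does not specify which variant of 2-opt is used, so the following is only a minimal sketch of a generic 2-opt local search applied to a single vehicle route, assuming Euclidean distances and a route that starts and ends at the same depot. The names `route_length`, `two_opt`, and `coords` are hypothetical and introduced here for illustration; they are not taken from the paper.

```python
import math


def route_length(route, coords):
    """Total Euclidean length of a route given as a list of node indices."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


def two_opt(route, coords):
    """Generic 2-opt sketch (an assumption, not the paper's exact procedure).

    The first and last entries (the depot) stay fixed. A segment is
    reversed whenever doing so shortens the route, and the scan repeats
    until no improving reversal is found.
    """
    best = list(route)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 2):
            for j in range(i + 1, len(best) - 1):
                a, b = best[i - 1], best[i]
                c, d = best[j], best[j + 1]
                # Change in length if edges (a, b) and (c, d) are
                # replaced by (a, c) and (b, d).
                delta = (
                    math.dist(coords[a], coords[c])
                    + math.dist(coords[b], coords[d])
                    - math.dist(coords[a], coords[b])
                    - math.dist(coords[c], coords[d])
                )
                if delta < -1e-12:
                    best[i:j + 1] = best[i:j + 1][::-1]
                    improved = True
    return best


# Hypothetical toy instance: node 0 is the depot, nodes 1-3 are customers.
coords = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 1.0), 3: (1.0, 0.0)}
initial = [0, 2, 1, 3, 0]  # this route crosses itself
improved_route = two_opt(initial, coords)
```

In a multi-depot setting, a routine like this would presumably be applied to each vehicle route separately after the policy network has constructed the routes, since reversing a segment within one route does not change which customers are assigned to which depot.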