Knowledge-driven reinforcement learning method for solving capacitated vehicle routing problem with two-dimensional loading constraints

CLC number: TP18

Fund Project: General Program of the National Natural Science Foundation of China (62573056)

Abstract:

Logistics distribution efficiency and cost optimization are core challenges in manufacturing supply chain management, and the related problems are often modeled as vehicle routing problems. Fragile goods such as home appliances cannot be stacked and must be laid flat in the vehicle compartment. To capture this practical constraint, two-dimensional loading constraints are added to the traditional vehicle routing model, yielding the capacitated vehicle routing problem with two-dimensional loading constraints (2L-CVRP). The problem couples two subproblems, route planning and two-dimensional packing, and is a strongly constrained combinatorial optimization problem with many local optima. Traditional exact algorithms and heuristic methods are slow and inefficient on large-scale instances, which makes them ill-suited to dynamic settings where customer locations and demands change in real time. To solve the problem quickly, this paper designs a knowledge-driven reinforcement learning algorithm that combines reinforcement learning with variable neighborhood search to minimize the vehicle travel distance in the 2L-CVRP. First, an Actor-Critic reinforcement learning framework based on the attention mechanism and pointer networks is designed with travel distance as the reward; within this framework, multiple heuristic algorithms cooperate to handle the packing constraints and repair infeasible solutions, generating the initial vehicle routes. Then, an efficient variable neighborhood search strategy driven by problem knowledge is designed to improve the initial route sequences produced by the end-to-end network. Finally, the proposed algorithm is validated on classical 2L-CVRP benchmark sets. Simulation results show that, compared with classical heuristic methods, the proposed algorithm reduces the travel distance by 21.52% on small-scale instances and improves the best-known solutions for 50% of the large-scale instances. The proposed algorithm also solves instances significantly faster than the comparison algorithms, with the advantage more pronounced on large-scale instances, which verifies its efficiency in solving the 2L-CVRP.
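To make the coupling between the two subproblems concrete, the following is a minimal sketch of how a single route might be evaluated: the route is rejected if it violates the weight capacity or if its items cannot be packed on the loading surface, and otherwise its travel distance is returned. It is an illustration under stated assumptions, not the authors' method. The bottom-left corner-point placement stands in for the unspecified packing heuristics, items are not rotated, unloading-order constraints are not modeled, and all names (`pack_bottom_left`, `evaluate_route`, the customer dictionary keys) are hypothetical.

```python
from math import hypot


def route_length(depot, stops):
    """Total Euclidean length of the tour depot -> stops -> depot."""
    points = [depot, *stops, depot]
    return sum(hypot(bx - ax, by - ay)
               for (ax, ay), (bx, by) in zip(points, points[1:]))


def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def pack_bottom_left(items, bin_w, bin_h):
    """Assumed stand-in heuristic: place each (w, h) item, in the given order,
    at the lowest then leftmost corner point where it fits.
    Returns the placements, or None if some item cannot be placed."""
    placed = []
    corners = [(0, 0)]
    for w, h in items:
        spot = None
        for x, y in sorted(corners, key=lambda c: (c[1], c[0])):
            cand = (x, y, w, h)
            inside = x + w <= bin_w and y + h <= bin_h
            if inside and not any(overlaps(cand, p) for p in placed):
                spot = cand
                break
        if spot is None:
            return None
        placed.append(spot)
        # Each placed item opens two new candidate corner points.
        corners.append((spot[0] + w, spot[1]))
        corners.append((spot[0], spot[1] + h))
    return placed


def evaluate_route(depot, customers, bin_w, bin_h, capacity):
    """Return the route's travel distance, or None if it is infeasible.
    Each customer is a dict with hypothetical keys 'xy', 'weight', 'items'."""
    if sum(c["weight"] for c in customers) > capacity:
        return None
    items = [item for c in customers for item in c["items"]]
    if pack_bottom_left(items, bin_w, bin_h) is None:
        return None
    return route_length(depot, [c["xy"] for c in customers])


if __name__ == "__main__":
    customers = [
        {"xy": (3, 0), "weight": 5, "items": [(2, 2)]},
        {"xy": (3, 4), "weight": 5, "items": [(2, 2)]},
    ]
    print(evaluate_route((0, 0), customers, bin_w=4, bin_h=2, capacity=10))
```

In a learning-based solver of the kind the abstract describes, a check like this could be used to mask or repair infeasible routes while the returned distance feeds the reward; a greedy placement can reject a route that a stronger packing procedure would accept, which is one reason to combine several heuristics.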

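Cite this article

ZHOU Meng, WANG Jingqi, WU Chuge, et al. Knowledge-driven reinforcement learning method for solving capacitated vehicle routing problem with two-dimensional loading constraints[J]. Control and Decision (控制与决策), 2026, 41(4): 931-943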

History
  • Received: 2025-08-29
  • Published online: 2026-03-24
  • Publication date: 2026-04-10