Reinforcement Learning-Driven Evolutionary Algorithm for Solving Multi-Vehicle Covering Delivery Problem
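Affiliations:

1. School of Business, Hunan Institute of Engineering; 2. School of Textile and Fashion, Hunan Institute of Engineering; 3. School of Information Science and Technology, Beijing University of Technology; 4. Engineering Research Center of Digital Community, Ministry of Education; 5. School of Business Administration, Hunan University of Technology and Business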


CLC number:

TP181; U491

Fund Project:

National Social Science Fund of China, Annual Project (24GBL247); National Natural Science Foundation of China (62403272)


    Abstract:

    Parcel lockers effectively resolve the mismatch between the handover times of customers and couriers in last-mile delivery. However, as parcel lockers are deployed at scale, making lockers more convenient for customers to use for handover has become one of the key factors affecting satisfaction with last-mile delivery services. To address this challenge, this paper proposes a multi-vehicle covering delivery problem that considers customer satisfaction. At the problem level, a customer satisfaction function is constructed from a questionnaire survey, and a mathematical model of the problem is then established. At the methodological level, a reinforcement learning-driven evolutionary algorithm is proposed. First, a hybrid heuristic is designed to generate a high-quality initial population. Second, nine neighbourhood operators and a greedy repair heuristic are devised from the problem's characteristics to search efficiently for satisfactory feasible solutions. Third, a reinforcement learning-driven search mechanism is proposed to adaptively select suitable operators. Finally, a state representation method based on a monotonically decreasing benchmark function is designed to guide the agent in learning the convergence progress of the algorithm. Simulation results show that the proposed algorithm obtains high-quality solutions on instances of different scales and outperforms the compared algorithms and solvers. Ablation experiments show that the state-function-guided reinforcement learning search mechanism improves algorithm performance by 5% on average.
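    The abstract names two coupled mechanisms without detailing them: reinforcement learning that adaptively picks one of nine neighbourhood operators, and a state derived from a monotonically decreasing benchmark function. The Python sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes tabular Q-learning with epsilon-greedy selection, a three-valued state recording whether the incumbent objective is behind, on, or ahead of a linear reference curve, and a reward equal to the normalised improvement. The toy objective, the step-size "operators", and the names `reference_curve`, `state_of`, `QOperatorSelector`, `run` and all parameter values are hypothetical.

```python
import random

# Hypothetical stand-ins for the paper's nine neighbourhood operators:
# each perturbs one coordinate of a toy solution with a different step size.
STEP_SIZES = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 0.05, 0.5, 2.0]


def reference_curve(t, t_max, f0, f_target):
    """Assumed monotonically decreasing benchmark: linear decay from f0 to f_target."""
    frac = min(t / t_max, 1.0)
    return f0 - (f0 - f_target) * frac


def state_of(best, t, t_max, f0, f_target, tol=0.01):
    """Map the incumbent's position relative to the reference curve to a state.

    0 = behind the curve, 1 = roughly on track, 2 = ahead (minimisation).
    """
    ref = reference_curve(t, t_max, f0, f_target)
    gap = (ref - best) / max(abs(f0), 1e-9)  # positive when best lies below the curve
    if gap < -tol:
        return 0
    if gap <= tol:
        return 1
    return 2


class QOperatorSelector:
    """Tabular Q-learning with epsilon-greedy choice over operators."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng if rng is not None else random.Random(0)

    def select(self, s):
        # Explore with probability epsilon, otherwise take the greedy operator.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[s]))
        row = self.q[s]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, s, a, reward, s_next):
        # Standard one-step Q-learning target.
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def objective(x):
    return sum(v * v for v in x)  # toy objective, not the paper's model


def run(t_max=2000, dim=5, seed=1):
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    best = f0 = objective(x)
    agent = QOperatorSelector(3, len(STEP_SIZES), rng=rng)
    s = state_of(best, 0, t_max, f0, 0.0)
    for t in range(1, t_max + 1):
        a = agent.select(s)
        cand = list(x)
        i = rng.randrange(dim)
        cand[i] += rng.gauss(0.0, STEP_SIZES[a])
        f = objective(cand)
        reward = max(best - f, 0.0) / max(f0, 1e-9)  # normalised improvement
        if f < best:  # accept only improving moves
            x, best = cand, f
        s_next = state_of(best, t, t_max, f0, 0.0)
        agent.update(s, a, reward, s_next)
        s = s_next
    return best, f0, agent.q


if __name__ == "__main__":
    best, f0, q = run()
    print(f"initial {f0:.3f} -> best {best:.3f}")
```

    Tying the state to a reference curve rather than to the raw objective keeps the state space small and problem-independent: the agent only learns which operators pay off when the search is lagging, on pace, or ahead. Any other monotonically decreasing curve (for example, exponential decay) could replace the linear one in this sketch.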

History
  • Received: 2026-02-13
  • Last revised: 2026-04-22
  • Accepted: 2026-04-24