基于多起点和Mask策略的深度强化学习算法求解覆盖旅行商问题
CSTR:
作者:
作者单位:

1. 江南大学 江苏省人工智能国际合作联合实验室, 江苏 无锡 214122;2. 江南大学 江苏省模式识别与 计算智能工程实验室,江苏 无锡 214122;3. 中国船舶科学研究中心,江苏 无锡 214082

作者简介:

通讯作者:

E-mail: luhengyang@jiangnan.edu.cn.

中图分类号:

TP399

基金项目:

国家自然科学基金项目(62073155,62002137,62106088,62206113);船舶总体性能创新研究开放基金项目(22422213).


Deep reinforcement learning algorithm based on multi-start and Mask strategy for solving the covering salesman problem
Author:
Affiliation:

1. International Joint Laboratory on Artificial Intelligence of Jiangsu Province,Jiangnan University,Wuxi 214122,China;2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence,Jiangnan University,Wuxi 214122,China;3. China Ship Scientific Research Center,Wuxi 214082,China

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    覆盖旅行商问题(covering salesman problem,CSP) 是旅行商问题的变体,在防灾规划、急救管理中有着广泛应用.由于传统方法求解问题实例耗时严重,近年来深度神经网络被提出用于解决该类组合优化问题,在求解速度和泛化性上有明显的优势.现有基于深度神经网络求解CSP的方法求解质量较低,特别在大规模实例上与传统的启发式方法相比存在较大差距.针对上述问题,提出一种新的基于深度强化学习求解CSP的方法,由编码器对输入特征进行编码,提出新的Mask策略对解码器使用自注意力机制构造解的过程进行约束,并提出多起点策略改善训练过程、提高求解质量.实验结果表明,所提方法对比现有基于深度神经网络的求解方法进一步缩小了最优间隙,同时有着更高的样本效率,在不同规模和不同覆盖类型的CSP中展现出更强的泛化能力,与启发式算法相比在求解速度上有10sim40倍的提升.

    Abstract:

    The covering salesman problem(CSP) is a variant of the traveling salesman problem, which is widely used in disaster planning and emergency management. Since the traditional solvers are time-consuming for solving problem instances, deep neural networks have been proposed for solving this type of combinatorial optimization problem in recent years, which have obvious advantages in terms of solving speed and generalization. However, existing methods based on deep neural networks for solving CSP problems have low solution quality, especially in large-scale instances, compared with traditional heuristics. Therefore, we propose a new method based on deep reinforcement learning to solve the CSP problem, in which the encoder encodes the input features, propose a new Mask strategy to constrain the decoder to construct a solution using the self-attention mechanism, and propose a multi-start strategy to improve the training process and the solution quality. Experimental results show that the proposed method further reduces the optimality gap compared with existing deep neural network-based solution methods, has higher sample efficiency, shows stronger generalization ability in CSP tasks of different sizes and coverage types, and has a 10-40 times improvement in solution speed compared with heuristic algorithms.

    参考文献
    相似文献
    引证文献
引用本文

方伟,接中冰,陆恒杨,等.基于多起点和Mask策略的深度强化学习算法求解覆盖旅行商问题[J].控制与决策,2024,39(4):1160-1166

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:
  • 最后修改日期:
  • 录用日期:
  • 在线发布日期: 2024-03-15
  • 出版日期: 2024-04-20
文章二维码