Abstract: In unmanned warehouse systems, it is important to solve the collision, deadlock, and path planning problems among automated guided vehicles (AGVs). This paper presents a method for modeling an AGV system in a warehouse environment with a Petri net, which effectively resolves the conflicts that arise when AGVs transport goods. On this basis, a multi-agent deep reinforcement learning framework for AGV path planning is proposed. The AGV path planning problem is regarded as a partially observable Markov decision process, and the deep deterministic policy gradient algorithm is extended to the multi-agent setting. The observation space, state space, action space, and reward function of the AGVs are designed to realize conflict-free path planning within the Petri net model. Because the reward function incorporates feedback on whether Petri net trigger (firing) conditions are satisfied, congestion during goods transport is greatly reduced, and the total amount delivered in the warehouse within a specified time is increased. In addition, because the proposed framework treats path branch points as agents, it can effectively cope with randomly generated task starting points and changes in the number of AGVs in the environment, improving the generalization ability of the neural networks. Simulation experiments were carried out on the AnyLogic software platform. The feasibility and effectiveness of the path planning method were verified by comparing cargo transportation performance under different AGV fleet sizes, and through controlled experiments with and without positive and negative Petri net condition feedback in the reward function.
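As a rough illustration of the reward shaping described in the abstract, the sketch below shows how feedback tied to a Petri net firing condition could be added to an AGV's step reward. All names and constants here (PetriNet, shaped_reward, FIRE_BONUS, BLOCK_PENALTY, etc.) are hypothetical illustrations under assumed semantics, not the paper's actual model or values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PetriNet:
    """Minimal place/transition net: places hold tokens, and a transition
    is enabled when every one of its input places holds at least one token."""
    marking: Dict[str, int]                              # tokens per place
    transitions: Dict[str, Tuple[List[str], List[str]]]  # name -> (input places, output places)

    def enabled(self, t: str) -> bool:
        inputs, _ = self.transitions[t]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, t: str) -> None:
        inputs, outputs = self.transitions[t]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


# Hypothetical reward-shaping constants (not taken from the paper).
STEP_COST = -0.1      # small per-step penalty to encourage short paths
FIRE_BONUS = +0.5     # positive feedback: the requested move satisfies the firing rule
BLOCK_PENALTY = -1.0  # negative feedback: the move would violate the firing rule (conflict)
GOAL_REWARD = +10.0   # delivery completed


def shaped_reward(net: PetriNet, transition: str, reached_goal: bool) -> float:
    """Step reward for one agent choosing `transition`, i.e. the move onto
    the next path segment, under the Petri net model of segment occupancy."""
    reward = STEP_COST
    if net.enabled(transition):
        net.fire(transition)      # occupy the next segment, free the current one
        reward += FIRE_BONUS
    else:
        reward += BLOCK_PENALTY   # next segment occupied: discourage congested routes
    if reached_goal:
        reward += GOAL_REWARD
    return reward


if __name__ == "__main__":
    # Two road segments s1 and s2, with one "free capacity" place per segment;
    # the AGV starts on s1, so s1 has no free token and s2 does.
    net = PetriNet(
        marking={"s1_free": 0, "s2_free": 1},
        transitions={
            "move_to_s1": (["s1_free"], ["s2_free"]),
            "move_to_s2": (["s2_free"], ["s1_free"]),
        },
    )
    print(shaped_reward(net, "move_to_s2", reached_goal=False))  # enabled move ->  0.4
    print(shaped_reward(net, "move_to_s2", reached_goal=False))  # now blocked -> -1.1
```

In this sketch, the positive and negative terms play the role of the "Petri net condition positive and negative feedback" in the abstract: a move that fires an enabled transition is rewarded, while a move that would violate the firing rule (for example, entering an occupied segment) is penalized, steering the learned policies away from congested routes.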