Multi-agent decision-making method based on the Monte Carlo Q-value function
Affiliation:

(1. Space Engineering University, PLA Strategic Support Force, Beijing 101416; 2. PLA Unit 63628, Sanhe, Hebei 065201; 3. PLA Unit 63919, Beijing 100089)

Corresponding author:

E-mail: haitaoyang79@126.com.

CLC number:

TP301.6

Multi-agent decision making using Monte Carlo Q-value function
Affiliation:

(1. Space Engineering University, PLA Strategic Support Force, Beijing 101416, China; 2. Unit 63628 of the PLA, Sanhe 065201, China; 3. Unit 63919 of the PLA, Beijing 100089, China)

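    Abstract (translated from the Chinese):

    Multi-agent decision making is a research hotspot in artificial intelligence. Compared with single-agent decision making, multi-agent decision making has a larger policy search space. Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for multi-agent decision making in uncertain environments and have attracted considerable attention since they were introduced, but solving Dec-POMDPs has high computational complexity and large memory requirements. To address this, a new Q-value function representation, the Monte Carlo Q-value function ($Q_{MC}$), is proposed, and it is proved theoretically that $Q_{MC}$ is an upper bound on the optimal Q-value function $Q^\ast$, which guarantees that heuristic search finds the optimal solution. An adaptive sampling method is used to balance convergence accuracy against solution time. Combining the exactness of heuristic search with the generality of Monte Carlo random sampling, a $Q_{MC}$-based Monte Carlo clustering/expansion algorithm (CEMC) is proposed. CEMC integrates Q-value function computation with policy search and avoids storing all value functions by computing them only on demand. Experimental results show that CEMC outperforms the best-performing existing heuristic methods that use compact Q-value functions in both running time and memory usage.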

    Abstract:

    Multi-agent decision making is an active research topic in artificial intelligence. Compared with single-agent decision making, multi-agent decision making has a larger policy space. Decentralized partially observable Markov decision processes (Dec-POMDPs) are a general model for multi-agent decision making under uncertainty and have attracted much attention among researchers, but solving Dec-POMDPs has high computational complexity and consumes much memory. This article presents a new Q-value function representation, the Monte Carlo Q-value function ($Q_{MC}$), which is proved to be an upper bound on the optimal Q-value function $Q^*$; this guarantees that heuristic search can find the optimal policy. An adaptive sampling method is used to balance convergence precision against solving time. An algorithm called clustering and expansion for Monte Carlo (CEMC), based on $Q_{MC}$, is proposed, which combines the precision of heuristic search with the generality of Monte Carlo random sampling. The algorithm integrates Q-value function solving with policy search and calculates value functions only as needed, which avoids storing all Q-value functions. Experiments show that the proposed method outperforms the state-of-the-art heuristic methods that use compact Q-value functions in both time and memory usage.
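The abstract does not say how the adaptive sampling rule works, so the following is only a minimal sketch of the general idea of trading estimate precision against sampling time. It is not the CEMC algorithm and not the authors' definition of the Monte Carlo Q-value function. It assumes a value is estimated by averaging sampled returns and that sampling stops once the standard error of the mean falls below a tolerance. The names `adaptive_mc_estimate`, `toy_rollout`, `tol`, `min_samples`, and `max_samples` are hypothetical and chosen for illustration.

```python
import math
import random
from typing import Callable, Tuple


def adaptive_mc_estimate(
    sample_return: Callable[[random.Random], float],
    rng: random.Random,
    tol: float = 0.05,
    min_samples: int = 30,
    max_samples: int = 100_000,
) -> Tuple[float, int]:
    """Average sampled returns until the standard error drops below tol.

    Assumed stopping rule (not taken from the paper): stop when
    std_err = sample_std / sqrt(n) < tol, or when max_samples is reached.
    Returns the estimate and the number of samples drawn.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    while n < max_samples:
        x = sample_return(rng)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            std_err = math.sqrt(m2 / (n - 1) / n)
            if std_err < tol:
                break
    return mean, n


def toy_rollout(rng: random.Random) -> float:
    """Hypothetical stand-in for one simulated episode: a 3-step noisy return."""
    return sum(1.0 + rng.gauss(0.0, 0.5) for _ in range(3))


if __name__ == "__main__":
    estimate, used = adaptive_mc_estimate(toy_rollout, random.Random(0))
    print(f"estimate={estimate:.3f} after {used} samples")
```

A tighter `tol` draws more samples and gives a more precise estimate, while a looser `tol` stops earlier; that is the precision versus time trade-off the abstract refers to, under the stated assumptions.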

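Cite this article

ZHANG Jian, PAN Yaozong, YANG Haitao, et al. Multi-agent decision making using Monte Carlo Q-value function[J]. Control and Decision, 2020, 35(3): 637-644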

History
  • Published online: 2020-02-22