Abstract: To address the insufficient exploration ability of the Deep Deterministic Policy Gradient (DDPG) algorithm, this paper proposes a Multi-actions Parallel Asynchronous Deep Deterministic Policy Gradient (MPADDPG) algorithm for continuous control with deterministic policies in reinforcement learning. The algorithm trains multiple actor networks with different initializations, which substantially broadens exploration; at the same time, the trade-off between exploration and exploitation is handled by extending the critic architecture with a deterministic action-selection policy. Using multiple DDPG learners instead of a single DDPG alleviates the poor performance that any individual learner may exhibit and improves learning stability, while the parallel asynchronous structure raises data-utilization efficiency and speeds up network convergence; finally, each actor obtains a better policy gradient by influencing the critic update. Experimental results show that the MPADDPG algorithm outperforms the baseline DDPG algorithm.
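As a rough illustration of the multi-actor idea summarized above (a minimal sketch, not the authors' exact implementation), the snippet below shows several independently initialized actor networks sharing a single critic, with the critic scoring each actor's proposed action and the highest-valued proposal being executed. The class names, network sizes, and the greedy selection rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy network: maps a state to a continuous action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q-network: estimates the value of a (state, action) pair."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def select_action(actors, critic, state):
    # Each differently initialized actor proposes an action; the shared
    # critic scores the proposals and the highest-valued one is chosen.
    with torch.no_grad():
        proposals = [actor(state) for actor in actors]
        q_values = torch.stack([critic(state, a) for a in proposals])
        best = torch.argmax(q_values)
    return proposals[best]

# Hypothetical dimensions for demonstration only.
state_dim, action_dim, n_actors = 8, 2, 4
actors = [Actor(state_dim, action_dim) for _ in range(n_actors)]  # distinct random inits
critic = Critic(state_dim, action_dim)

state = torch.randn(1, state_dim)
print(select_action(actors, critic, state))
```

In practice, each actor (or each DDPG learner) would interact with its own copy of the environment and push transitions asynchronously to a shared replay buffer, but that training loop is omitted here for brevity.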