Abstract: When solving sequential decision-making problems in continuous spaces, traditional actor-critic (AC) methods often struggle to achieve good convergence speed and quality. To overcome this weakness, an AC algorithm framework based on symmetric perturbation sampling is proposed, which uses a Gaussian distribution as the policy distribution. At each time step, the framework generates two actions through two symmetric perturbations of the current action mean and uses them to interact with the environment in parallel. The framework then selects the agent's behavior action and updates the value-function parameters based on the maximum temporal difference (TD) error, and updates the policy parameters based on either the average regular gradient or the average incremental natural gradient. Theoretical analysis and simulation results show that the framework achieves not only better convergence performance but also high computational efficiency.
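To make the per-step procedure concrete, the following is a minimal sketch of the symmetric-perturbation AC update on a hypothetical one-state, one-dimensional continuous-action task (reward peaks at action 1.0). The task, step sizes, and fixed policy standard deviation are all illustrative assumptions, not the paper's experimental setup; the actor update shown is the average regular gradient variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(action):
    # Hypothetical toy task: quadratic reward peaking at action = 1.0.
    return -(action - 1.0) ** 2 + rng.normal(scale=0.01)

theta = 0.0          # mean of the Gaussian policy (the learned parameter)
sigma = 0.5          # fixed policy standard deviation (assumed for the sketch)
w = 0.0              # value estimate of the single state (the critic)
alpha_w, alpha_theta = 0.1, 0.05   # illustrative step sizes

for _ in range(2000):
    # Two symmetric perturbations of the current action mean.
    eps = rng.normal(scale=sigma)
    a_plus, a_minus = theta + eps, theta - eps

    # Interact with the environment with both actions (in parallel in
    # the framework; sequentially in this single-process sketch).
    r_plus, r_minus = reward(a_plus), reward(a_minus)

    # TD errors; with a single state and no successor, delta = r - V(s).
    d_plus, d_minus = r_plus - w, r_minus - w

    # Critic update driven by the maximum-magnitude TD error.
    d_max = d_plus if abs(d_plus) >= abs(d_minus) else d_minus
    w += alpha_w * d_max

    # Actor update: average of the two regular policy gradients,
    # using grad log pi(a) = (a - theta) / sigma^2 for a Gaussian mean.
    g_plus = d_plus * (a_plus - theta) / sigma**2
    g_minus = d_minus * (a_minus - theta) / sigma**2
    theta += alpha_theta * 0.5 * (g_plus + g_minus)

print(theta)  # should approach the optimal action mean of 1.0
```

Averaging the two mirrored gradients cancels much of the sampling noise in the actor update, which is one intuition for the improved convergence the abstract claims; the incremental natural gradient variant would replace the regular gradient step with a natural-gradient estimate.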