Abstract: The search and rescue optimization algorithm (SAR), proposed in 2020, is a meta-heuristic optimization algorithm. It simulates human search and rescue behavior and is used to solve constrained engineering optimization problems. However, the SAR converges slowly and its individuals cannot adaptively select operators. A modified version of the SAR based on reinforcement learning, namely RLSAR, is proposed, which redesigns the local search and global search operators of the SAR and adds a path adjustment operation. The asynchronous advantage actor-critic (A3C) algorithm is used to train the reinforcement learning model so that SAR individuals acquire the ability to adaptively select operators. All agents are trained in a dynamic environment in which the number, location, and size of threat areas are randomly generated, and exploratory experiments are then conducted on the trained model from three aspects: the contribution of each action, the path length planned under different threat areas, and the execution sequence of each individual. The results show that RLSAR converges faster than the standard SAR, the differential evolution algorithm, and the squirrel search algorithm. Furthermore, it successfully plans a more economical, safe, and effective feasible path for an unmanned aerial vehicle (UAV) in a randomly generated three-dimensional dynamic environment, demonstrating that the proposed algorithm can serve as an effective path planning method for UAVs.
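As a rough illustration of the adaptive operator selection summarized above, the sketch below shows how an individual might use a learned policy to choose among the three RLSAR operations (local search, global search, path adjustment) at each iteration. This is a minimal sketch, not the authors' implementation: the state features, the linear actor, and the `OperatorPolicy` class are all illustrative assumptions, and the A3C training loop that would fit the policy weights is omitted.

```python
# Minimal sketch (assumed, not from the paper): an RLSAR-style individual that
# picks one of three operators with a learned stochastic policy. In the paper
# this policy would be trained with A3C; here the weights are just random.
import numpy as np

OPERATORS = ["local_search", "global_search", "path_adjustment"]

def softmax(x):
    z = x - x.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

class OperatorPolicy:
    """Tiny linear actor mapping a state vector to operator probabilities."""
    def __init__(self, state_dim, n_ops=len(OPERATORS), seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_ops, state_dim))

    def select(self, state, rng):
        probs = softmax(self.W @ state)
        return rng.choice(len(OPERATORS), p=probs), probs

# Hypothetical state features for one individual, e.g. normalized fitness rank,
# stagnation counter, and fraction of iterations elapsed.
rng = np.random.default_rng(42)
policy = OperatorPolicy(state_dim=3)
state = np.array([0.6, 0.2, 0.1])
op_idx, probs = policy.select(state, rng)
print(OPERATORS[op_idx], probs.round(3))
```

In an actual A3C setup, several such individuals (workers) would act in parallel, and the actor and a critic head would be updated from the reward signal (e.g., improvement in path cost and feasibility) rather than kept fixed as in this toy example.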