The multi-depot vehicle routing problem (MDVRP) arises widely in current supply chain applications. Most existing algorithms rely on heuristic methods, which solve the problem slowly and cannot guarantee solution quality. Developing an algorithm that solves the MDVRP both quickly and with high solution quality is therefore of considerable academic and practical value. This paper proposes a multi-agent deep reinforcement learning model whose objective is to minimize the total routing distance of all vehicles. First, the MDVRP is formulated as a multi-agent reinforcement learning problem by defining its state, action, reward, and transition function, so that the model can be trained with multi-agent reinforcement learning. Next, based on a definition of node neighbors and a masking mechanism for the MDVRP, an attention-based policy network composed of multiple agent networks is designed and trained with a policy gradient algorithm. The 2-opt local search strategy and the sampling search strategy are then applied to further improve the solutions. Finally, simulation experiments on problems of different scales, in which the model is compared with other algorithms, verify that the proposed multi-agent deep reinforcement learning model, both alone and combined with the search strategies, obtains high-quality solutions within a short time.
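The 2-opt post-processing step mentioned above can be illustrated with a minimal sketch. It assumes Euclidean distances, one route per vehicle stored as a list of node indices that starts and ends at that vehicle's depot, and a first-improvement scheme; the names `route_length` and `two_opt` are hypothetical, and the abstract does not specify which 2-opt variant the paper uses. Because a 2-opt move only reverses the order of customers within a single route, it leaves the set of customers on that route, and hence the vehicle's load, unchanged.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def route_length(route: Sequence[int], coords: Sequence[Point]) -> float:
    """Total Euclidean length of a route given as a sequence of node indices."""
    return sum(
        math.dist(coords[route[k]], coords[route[k + 1]])
        for k in range(len(route) - 1)
    )


def two_opt(route: List[int], coords: Sequence[Point]) -> List[int]:
    """Improve one depot-to-depot route with 2-opt segment reversals.

    Assumption: route[0] and route[-1] are the depot and are never moved.
    Passes repeat until no reversal shortens the route (a 2-opt local optimum).
    """
    best = list(route)
    improved = True
    while improved:
        improved = False
        # Positions i..j delimit the customer segment that may be reversed.
        for i in range(1, len(best) - 2):
            for j in range(i + 1, len(best) - 1):
                a, b = coords[best[i - 1]], coords[best[i]]
                c, d = coords[best[j]], coords[best[j + 1]]
                # Length change from replacing edges (a,b),(c,d) with (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)) - (
                    math.dist(a, b) + math.dist(c, d)
                )
                if delta < -1e-12:
                    best[i:j + 1] = reversed(best[i:j + 1])
                    improved = True
    return best


# Usage: depot 0 at the origin, three customers on a unit square,
# initially visited in a crossing order.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
initial = [0, 2, 1, 3, 0]
improved_route = two_opt(initial, coords)
print(improved_route, route_length(improved_route, coords))
```

In a multi-depot setting, a sketch like this would simply be run on each vehicle's route independently after the policy network (or the sampling search) has produced a complete solution.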