Abstract: Offline reinforcement learning aims to learn policies from fixed, static datasets, offering a significant avenue for moving reinforcement learning from simulated environments to real-world applications. However, offline datasets are typically collected by behavior policies of varying proficiency, yielding a multi-modal action distribution that is difficult to model. Moreover, high-return trajectories are scarce in offline datasets, which limits the efficiency of policy learning. To address these challenges, this paper proposes an offline reinforcement learning approach based on an advantage-constrained diffusion policy. First, the policy is generated through the reverse diffusion steps of a diffusion model so as to better fit the multi-modal behavior policy. Second, a method is proposed that uses advantage functions to guide policy improvement, helping the agent focus on the scarce high-return trajectories. Finally, two types of advantage functions are developed specifically for co
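To make the idea of advantage-guided improvement of a diffusion policy concrete, the following is a minimal sketch, not the paper's exact formulation: a denoising behavior-cloning loss re-weighted by an exponential of the advantage, so that rare high-return transitions contribute more to the update. The network `EpsModel`, the weighting `exp(A/beta)`, and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumed formulation) of an advantage-weighted denoising loss
# for a diffusion policy; not the paper's exact objective.
import torch
import torch.nn as nn

class EpsModel(nn.Module):
    """Predicts the noise added to an action, conditioned on state and diffusion step."""
    def __init__(self, state_dim, action_dim, hidden=256, n_steps=100):
        super().__init__()
        self.step_emb = nn.Embedding(n_steps, hidden)
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action, t):
        return self.net(torch.cat([state, noisy_action, self.step_emb(t)], dim=-1))

def advantage_weighted_diffusion_loss(eps_model, state, action, advantage,
                                      alphas_cumprod, beta=1.0):
    """Diffusion behavior-cloning loss re-weighted by exp(A/beta) so that
    high-advantage (scarce, high-return) transitions dominate the update."""
    batch = action.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (batch,), device=action.device)
    noise = torch.randn_like(action)
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    # Forward diffusion: corrupt the dataset action at a random step t.
    noisy_action = a_bar.sqrt() * action + (1.0 - a_bar).sqrt() * noise
    pred = eps_model(state, noisy_action, t)
    # Clipped exponential advantage weights (a common choice, assumed here).
    w = torch.exp(advantage / beta).clamp(max=100.0).detach()
    return (w * ((pred - noise) ** 2).mean(dim=-1)).mean()
```

In this sketch the weight plays the role of the advantage constraint: transitions with advantage near zero reduce to plain behavior cloning of the multi-modal data, while positive-advantage transitions are emphasized during policy improvement.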