To address the low sample efficiency, unstable policy updates, and insufficient hardware utilization of traditional reinforcement learning methods for charging gun assembly, we propose an enhanced soft actor-critic (SAC) algorithm that integrates hindsight experience replay (HER) and delayed policy updates (DPU). First, a charging gun assembly model is established. HER is integrated into the replay buffer to relabel goals and generate "pseudo-success" experiences from failed episodes. DPU is then applied during the gradient-update phase: the value network is updated several times before each policy update, yielding more stable value estimates. Finally, training with the SAC-HER-DPU algorithm adopts a dual-thread architecture that decouples data collection from neural network training, improving overall training efficiency. Experimental results show that the proposed algorithm converges in 33.2 hours with an average of 75 assembly steps, reducing convergence time by 21.4 hours and the average number of assembly steps by 17 compared with the baseline SAC algorithm. The method thus improves sample efficiency, policy stability, and training speed.
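The two algorithmic ingredients above can be sketched in isolation. The following is a minimal, dependency-free illustration, not the paper's implementation: the 1-D goal distance, the `tolerance` threshold, and the placeholder gradient steps are all assumptions introduced here for clarity.

```python
# Hedged sketch of HER relabeling and delayed policy updates (DPU).
# The toy 1-D goal space and placeholder update counters are illustrative
# assumptions, not the paper's actual charging-gun environment or networks.

def her_relabel(trajectory, tolerance=0.05):
    """Relabel every transition's goal with the goal actually achieved at
    the end of the episode, so a failed episode becomes 'pseudo-success'
    data for the relabeled goal (sparse reward: 0 on success, -1 otherwise)."""
    achieved = trajectory[-1]["achieved_goal"]
    relabeled = []
    for t in trajectory:
        r = 0.0 if abs(t["achieved_goal"] - achieved) <= tolerance else -1.0
        relabeled.append({**t, "goal": achieved, "reward": r})
    return relabeled


class DelayedUpdateTrainer:
    """Run `critic_steps` value-network updates per policy update, so the
    policy is trained against a more settled value estimate."""

    def __init__(self, critic_steps=2):
        self.critic_steps = critic_steps
        self.critic_updates = 0
        self.actor_updates = 0

    def train_step(self):
        for _ in range(self.critic_steps):
            self.critic_updates += 1  # placeholder for one critic gradient step
        self.actor_updates += 1       # placeholder for one actor gradient step
```

With `critic_steps=1` this degenerates to the standard interleaved SAC update; larger values delay the policy relative to the value function.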
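The dual-thread decoupling of data collection from training can likewise be sketched with a producer-consumer pattern. Everything below is a stand-in under stated assumptions: the simulated rollout dictionaries and the list append in place of a gradient step are hypothetical, and a bounded queue plays the role of the shared replay pipeline.

```python
import queue
import threading

# Hedged sketch: a collector thread produces environment transitions while a
# trainer thread consumes them, so rollouts and gradient updates overlap
# instead of alternating. The transition contents are illustrative only.

def collector(buffer, n_transitions):
    """Producer: push simulated rollout transitions into the shared buffer."""
    for i in range(n_transitions):
        buffer.put({"obs": i, "action": i % 2})  # simulated environment step

def trainer(buffer, n_transitions, trained):
    """Consumer: pull transitions and 'train' on them (placeholder step)."""
    for _ in range(n_transitions):
        transition = buffer.get()   # blocks until the collector provides data
        trained.append(transition)  # placeholder for a gradient update

buffer = queue.Queue(maxsize=64)    # bounded, so the collector cannot run away
trained = []
t_collect = threading.Thread(target=collector, args=(buffer, 100))
t_train = threading.Thread(target=trainer, args=(buffer, 100, trained))
t_collect.start()
t_train.start()
t_collect.join()
t_train.join()
```

The bounded queue provides backpressure: if training lags, the collector blocks rather than filling memory, which is one common way to keep the two threads loosely synchronized.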