Abstract: To address the challenges of perceptual uncertainty and limited policy generalization in dynamic, unstructured environments, this paper proposes a robust autonomous navigation policy optimization framework based on asymmetric reinforcement learning, termed Robust Asymmetric Navigation (RANav). The framework integrates implicit environment estimation, domain randomization, and asymmetric reinforcement learning to enhance the robot’s modeling and decision-making capabilities in dynamic settings. Specifically, a multimodal implicit environment estimation network is designed to accurately extract dynamic obstacle features and improve scene representation. A behavior-driven domain randomization mechanism is introduced to facilitate Sim-to-Real policy transfer. Finally, an asymmetric proximal policy optimization (PPO) algorithm is employed, in which privileged information is provided to the Critic network during training to improve policy learning efficiency. Extensive simulations and real-world experiments demonstrate that RANav significantly outperforms existing methods in navigation success rate, obstacle avoidance robustness, and path efficiency, confirming its strong generalization capability and deployment potential in complex, unstructured environments.
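To illustrate the asymmetric actor-critic structure referred to in the abstract, the following is a minimal sketch in PyTorch, assuming a setup where the Critic receives additional privileged simulator state during training while the Actor sees only onboard observations. All names (AsymmetricActorCritic, obs_dim, priv_dim) are illustrative and not taken from the paper.

```python
# Minimal sketch of an asymmetric actor-critic (illustrative, not the paper's implementation).
import torch
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    def __init__(self, obs_dim: int, priv_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        # Actor sees only onboard (deployable) observations.
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # Critic additionally receives privileged information available only in simulation.
        self.critic = nn.Sequential(
            nn.Linear(obs_dim + priv_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, priv: torch.Tensor):
        action_logits = self.actor(obs)                      # policy uses observations only
        value = self.critic(torch.cat([obs, priv], dim=-1))  # value uses observations + privileged state
        return action_logits, value
```

The key design point is that only the Actor is deployed on the real robot, so the privileged inputs used to train the Critic never need to be sensed at deployment time.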