Physics-Constrained and Gradient-Guided Reinforcement Learning for Secure Energy Dispatch in Microgrids

Affiliation:

College of Information Engineering, Zhejiang University of Technology

CLC number:

TP273

Fund Project:

The National Natural Science Foundation of China under Grant No. 62373328; Zhejiang Provincial Natural Science Foundation of China under Grant No. LR25F030003.

    Abstract:

    Energy dispatch in microgrids faces two coupled difficulties: temporal coupling constraints markedly enlarge the decision space, and the AC power-flow equations add nonlinear constraints that raise the computational cost. Together they make the optimization model strongly non-convex and substantially harder to solve. To address this, this paper proposes a microgrid energy dispatch method based on safe reinforcement learning and physics-constrained gradient guidance. The method builds a deep-learning-based action-correction safety layer that, during interaction with the environment, projects the agent's actions onto the feasible region, which ensures physical feasibility and improves exploration efficiency. The safety layer is further embedded in network training, which improves the accuracy of the Critic network's $Q$-value estimates and the efficiency with which the Actor network learns the physical constraints. Experiments on an electro-hydrogen coupled microgrid power-flow system built on the IEEE 14-bus model show that the proposed method outperforms the Lagrangian multiplier method (TD3-Lag) and the penalty method (TD3-Pen) in dispatch performance. Compared with a safety layer based on numerical optimization, it achieves similar performance while deploying roughly three orders of magnitude faster.
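
The abstract describes a safety layer that corrects the agent's action onto the physically feasible set before it is applied. The authors' layer is a trained network, and their AC power-flow and electro-hydrogen constraints are not given here, so the following is only a minimal sketch of the underlying idea of using constraint gradients to pull an action into the feasible set. It assumes a hypothetical toy unit (an inverter setpoint with box limits and an apparent-power limit) and uses a successive linearized projection: each violated constraint is linearized at the current action and the action is moved along that constraint's gradient onto the linearization's zero level. Being an iterative numerical correction, it is closer in spirit to the optimization-based safety layer used for comparison than to the learned network. The names `correct_action`, `INVERTER_CONSTRAINTS`, `P_MAX`, `Q_MAX` and `S_MAX` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A constraint is written as g(a) <= 0 and supplied as the pair (g, grad_g).
Constraint = Tuple[Callable[[Sequence[float]], float],
                   Callable[[Sequence[float]], Vector]]


def correct_action(action: Sequence[float],
                   constraints: Sequence[Constraint],
                   tol: float = 1e-9,
                   max_sweeps: int = 100) -> Vector:
    """Move a raw action towards {a : g_i(a) <= tol for all i} using gradients.

    For every violated constraint, g_i is linearised at the current action and
    the action is shifted along grad g_i onto the zero level of that
    linearisation: a <- a - g_i(a) / ||grad g_i(a)||^2 * grad g_i(a).
    Sweeps repeat until a full sweep finds no violation above tol.
    """
    a = list(action)
    for _ in range(max_sweeps):
        worst = 0.0
        for g, grad_g in constraints:
            violation = g(a)
            if violation <= tol:
                continue
            worst = max(worst, violation)
            d = grad_g(a)
            norm_sq = sum(x * x for x in d)
            if norm_sq == 0.0:
                continue  # no gradient information here; skip this constraint
            step = violation / norm_sq
            a = [x - step * dx for x, dx in zip(a, d)]
        if worst <= tol:
            break  # this sweep found nothing to correct
    return a


# Hypothetical toy unit: inverter setpoint a = (p, q) in per-unit with
# 0 <= p <= P_MAX, |q| <= Q_MAX and apparent-power limit p^2 + q^2 <= S_MAX^2.
P_MAX, Q_MAX, S_MAX = 0.9, 0.6, 1.0

INVERTER_CONSTRAINTS: List[Constraint] = [
    (lambda a: a[0] - P_MAX, lambda a: [1.0, 0.0]),
    (lambda a: -a[0], lambda a: [-1.0, 0.0]),
    (lambda a: a[1] - Q_MAX, lambda a: [0.0, 1.0]),
    (lambda a: -a[1] - Q_MAX, lambda a: [0.0, -1.0]),
    (lambda a: a[0] ** 2 + a[1] ** 2 - S_MAX ** 2,
     lambda a: [2.0 * a[0], 2.0 * a[1]]),
]

if __name__ == "__main__":
    raw = [1.2, 0.5]  # e.g. an actor output that breaks the P and S limits
    safe = correct_action(raw, INVERTER_CONSTRAINTS)
    print(safe, [g(safe) for g, _ in INVERTER_CONSTRAINTS])
```

For a linear constraint one step lands exactly on its boundary; for the apparent-power disc the step is a minimum-norm Newton update that converges quadratically from outside. With several active constraints the result is feasible in this example but is not guaranteed to be the exact Euclidean projection onto their intersection.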

History
  • Received: 2025-11-27
  • Last revised: 2026-03-03
  • Accepted: 2026-03-03
  • Published online: 2026-03-23