Abstract: Accurate prediction of road network traffic flow is fundamental to the efficient operation of intelligent transportation systems. Existing methods struggle to model the complex, nonlinear, and dynamic spatio-temporal dependencies in traffic flow data. To address this problem, a traffic flow prediction method based on an enhanced spatio-temporal Transformer (ESTformer) is proposed. The method uses a multi-scale temporal Transformer to capture temporal dependencies within traffic flow sequences and an enhanced spatial Transformer to capture spatial dependencies among nodes. The multi-scale temporal Transformer builds a short-term gated convolutional network to capture short-term temporal dependencies and introduces a temporal multi-head self-attention mechanism to capture long-term dynamic temporal dependencies. The enhanced spatial Transformer strengthens the feature representation of the key vector through a dual transformation and dynamically updates the key vector using a time-varying mask matrix, which improves the model's ability to capture node features and edge features simultaneously. Experiments on four real-world traffic flow datasets show that the proposed ESTformer-based method outperforms the baseline methods. Relative to the best-performing of the 13 baseline methods on each dataset, the proposed method reduces the mean absolute error (MAE) and root mean square error (RMSE) by 1.14\%-3.88\% and 0.36\%-1.78\%, respectively, at 12 time steps.
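The two evaluation metrics named above have standard definitions, sketched below in Python with NumPy. The function names are illustrative rather than taken from the paper. The `relative_improvement` helper assumes that the reported percentages are relative error reductions with respect to the baseline, which the abstract does not state explicitly.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted flow values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted flow values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction of an error metric relative to a baseline.

    Assumption: improvement = 100 * (baseline - proposed) / baseline.
    """
    return 100.0 * (baseline_error - proposed_error) / baseline_error


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    observed = [120.0, 135.0, 150.0]
    predicted = [118.0, 140.0, 149.0]
    print("MAE:", mae(observed, predicted))
    print("RMSE:", rmse(observed, predicted))
```

Because RMSE squares each residual before averaging, it penalizes large errors more heavily than MAE does, which is why the two metrics are usually reported together.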