Abstract: This paper proposes an unsupervised monocular visual odometry method that combines the principles of multi-view geometry with convolutional neural networks for image depth estimation and match screening. To address the tendency of mainstream depth estimation networks to lose the shallow features of images, a depth estimation network based on improved dense blocks is constructed, which aggregates shallow features effectively and improves the accuracy of image depth estimation. The odometry uses the depth estimation network to predict the depth of each monocular image, uses an optical flow network to obtain bidirectional optical flow, and selects high-quality matches according to the principle of forward-backward optical flow consistency. An initial pose and a calculated depth are then obtained using multi-view geometric principles and optimization methods, and a 6-degree-of-freedom pose with a fixed global scale is obtained through a scale alignment principle. In addition, to improve the network's ability to learn scene details and the information in weakly textured regions, a feature-metric loss based on feature map synthesis is incorporated into the network loss function. On the KITTI dataset, depth estimation achieves accuracy rates of 85.9%, 95.8%, and 97.2% under different thresholds, and the absolute trajectory error of the odometry on sequences 09 and 10 is 0.007 m. Experimental results demonstrate the effectiveness and accuracy of the proposed method and indicate that it outperforms existing methods on the task of visual odometry.
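The match-screening step based on forward-backward optical flow consistency can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `forward_backward_mask`, the `max_error` threshold of 1 pixel, and the nearest-neighbour lookup of the backward flow are all assumptions made for illustration. The idea is that a pixel is kept as a match only if following the forward flow and then the backward flow returns it close to where it started.

```python
import numpy as np


def forward_backward_mask(flow_fw, flow_bw, max_error=1.0):
    """Boolean mask of pixels whose forward and backward flows agree.

    flow_fw: (H, W, 2) flow from frame A to frame B, stored as (dx, dy) in pixels.
    flow_bw: (H, W, 2) flow from frame B to frame A, same layout.
    max_error: hypothetical round-trip tolerance in pixels (an assumption).
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Position in frame B where each pixel of frame A lands.
    xb = xs + flow_fw[..., 0]
    yb = ys + flow_fw[..., 1]

    # Nearest-neighbour lookup of the backward flow at the landing position
    # (a simplification; bilinear sampling would be more precise).
    xi = np.rint(xb).astype(int)
    yi = np.rint(yb).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    bw_at_target = flow_bw[np.clip(yi, 0, h - 1), np.clip(xi, 0, w - 1)]

    # For a consistent match the two flows cancel, so the round-trip
    # displacement flow_fw + flow_bw(target) is close to zero.
    round_trip = np.linalg.norm(flow_fw + bw_at_target, axis=-1)
    return inside & (round_trip < max_error)


# Toy example: every pixel moves one pixel to the right, and the backward
# flow is its exact inverse except at one deliberately corrupted location.
fw = np.zeros((4, 6, 2))
fw[..., 0] = 1.0
bw = -fw
bw[2, 3] = [5.0, 0.0]
mask = forward_backward_mask(fw, bw)
```

In this toy example the mask rejects the last column, whose pixels leave the image, and the pixel at row 2, column 2, whose landing position carries the corrupted backward flow. The surviving pixels would be the candidate matches passed on to the geometric pose estimation.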