A scene text detection method based on dual-path feature fusion

Affiliations:

1. Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China; 2. School of Computer Science and Technology, Anhui University, Hefei 230601, China

Corresponding author:

E-mail: zhaopeng_ad@163.com

CLC number:

TP391.4

Funding:

National Natural Science Foundation of China (61602004); Key Natural Science Research Project of Anhui Provincial Universities (KJ2018A0013, KJ2017A011); Natural Science Foundation of Anhui Province (1908085MF188, 1908085MF182); Key Research and Development Program of Anhui Province (1804d08020309).

    Abstract:

    Existing deep-learning-based methods for scene text detection generally use a large deep neural network as the backbone for feature extraction. Although they achieve strong detection results, the resulting models are very large and detection is slow. Simply replacing the large backbone with a lightweight one fails to extract sufficient feature information and sharply degrades detection performance. To reduce the size of the text detection model and detect text more efficiently, a dual-path feature fusion based scene text detection method (DPFFSTD) is proposed. Built on the relatively lightweight backbone EfficientNet-b3, DPFFSTD uses two branches for feature fusion to detect scene text. One branch uses a feature pyramid network to fuse features from different levels. The other branch uses atrous spatial pyramid pooling to enlarge the receptive field and obtain features at different scales. The features from the two branches are then fused, yielding richer features at only a small additional computational cost and compensating for the limited feature extraction capacity of the small backbone. Experimental results on three public benchmark datasets show that the proposed method substantially reduces the number of model parameters and substantially increases detection speed while maintaining a high level of detection performance.
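    As a rough illustration of why an atrous (dilated) branch enlarges the receptive field at little extra cost, the sketch below applies one three-tap kernel at several dilation rates to a 1-D signal. It is a simplified, pure-Python stand-in and not the paper's implementation: the 1-D setting, the function names, and the dilation rates (1, 2, 4) are all assumptions made for illustration.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d_same(signal, kernel, dilation):
    """Zero-padded ('same') 1-D dilated cross-correlation, odd-length kernel."""
    half = (len(kernel) // 2) * dilation
    padded = [0] * half + list(signal) + [0] * half
    return [
        sum(w * padded[pos + i * dilation] for i, w in enumerate(kernel))
        for pos in range(len(signal))
    ]


def aspp_like(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several dilation rates, one channel per rate.

    The rates are hypothetical; they are not the ones used in the paper.
    """
    return [dilated_conv1d_same(signal, kernel, rate) for rate in rates]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    for rate, channel in zip((1, 2, 4), aspp_like(impulse, [1, 1, 1])):
        # Print the dilation rate, the span it covers, and the response.
        print(rate, effective_kernel_size(3, rate), channel)
```

    Every channel uses the same three weights, yet the span each output sees grows from 3 to 5 to 9 input positions as the dilation rate increases. In an actual detector the multi-rate outputs would be combined with the feature pyramid branch; how that fusion is done is not specified in the abstract, so it is left out of this sketch.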

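Cite this article

ZHAO Peng, XU Benpeng, YAN Shi, et al. A scene text detection method based on dual-path feature fusion[J]. Control and Decision, 2021, 36(9): 2179-2186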

History
  • Published online: 2021-08-09
  • Publication date: 2021-09-20