Chess positioning and playing strategy of robot based on integrated depth/mono vision and reinforcement learning
Affiliations:

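1. School of Electrical and Automation Engineering, Nanjing Normal University, Nanjing 210023, China; 2. School of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China; 3. College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 4. College of Instrument Science and Engineering, Southeast University, Nanjing 210018, China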

Corresponding author:

E-mail: xiefei@njnu.edu.cn.

CLC number:

TP273

Fund Project:

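National Natural Science Foundation of China (41974033); Jiangsu Provincial Science and Technology Achievement Transformation Project (BA2020004); Jiangsu Provincial Special Fund Project for the Transformation and Upgrading of Industry and Information Industry (JITC-2000AX0676-71); Nanjing Bidding Project for Key Technology Breakthroughs in Advantageous Industries (2018003).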


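    Abstract:

    The key to implementing a Chinese chess (xiangqi) playing robot lies in recognizing and localizing the board position and in an autonomous playing strategy. First, for board recognition and localization, a method based on the fusion of monocular-camera and depth-camera vision is proposed. It uses the three-dimensional features of the raised chess pieces to convert the depth image into a board grid and obtain piece positions, then fuses these with the piece coordinates from two-dimensional image recognition, improving recognition and localization accuracy. Second, for the playing strategy, a decision method based on a deep neural network and Monte Carlo tree search (MCTS) is proposed. The search uses an MCTS with end-game feature judgment, an optimized random playing policy guides the simulated play, and a policy-value network with multi-scale and residual structure is trained. Finally, training data are obtained through self-play, and model parameters are verified and updated through agent-versus-agent matches. Experiments show that, compared with monocular-only recognition, the proposed method is more accurate and stable, reaching a recognition rate of 97%; compared with a baseline pruning search algorithm, it wins up to 82% of games while reducing computation time by 41%.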
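
The abstract does not give the search equations. For orientation only, the sketch below shows the AlphaZero-style PUCT selection rule commonly used when a policy-value network guides MCTS: each child's mean value Q is combined with an exploration bonus weighted by the network's prior P. The `Node` class, the constant `c_puct = 1.5`, and the sign-flipping backup are assumptions for illustration, not the authors' formulation, which additionally applies end-game feature checks and an optimized random rollout policy.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node (hypothetical); `prior` is P(s, a) from the policy head."""
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self) -> float:
        # Mean action value Q(s, a); 0 for an unvisited node.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + u


def select_move(parent: Node, c_puct: float = 1.5):
    # Assumes `parent` has already been expanded (non-empty children).
    return max(parent.children.items(),
               key=lambda item: puct_score(parent, item[1], c_puct))[0]


def backup(path: list, leaf_value: float) -> None:
    # `path` runs root -> leaf; `leaf_value` is scored from the viewpoint of
    # the player who made the last move on the path (assumed convention).
    # It could come from the value head, a rollout, or an end-game check.
    # The sign flips at each ply because the two players alternate.
    for node in reversed(path):
        node.visits += 1
        node.value_sum += leaf_value
        leaf_value = -leaf_value
```

With both children unvisited, Q is 0 for each, so the move with the larger prior wins the first selection; as visits accumulate, the bonus term shrinks and the backed-up values Q dominate.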

Cite this article:

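Wu Qiyu, Xie Fei, Huang Lei, et al. Chess positioning and playing strategy of robot based on integrated depth/mono vision and reinforcement learning[J]. Control and Decision, 2022, 37(12): 3278-3288.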

History
  • Published online: 2022-11-17
  • Publication date: 2022-12-20