Fusing attention mechanism with Mask R-CNN for instance segmentation of grape cluster in the field

Lei Shen, Jinya Su, Rong Huang, Wumeng Quan, Yuyang Song, Yulin Fang, Baofeng Su

Research output: Contribution to journal > Article > peer-review



Accurately detecting and segmenting grape clusters in the field is fundamental to precision viticulture. In this paper, a new backbone network, ResNet50-FPN-ED, was proposed to improve Mask R-CNN instance segmentation so that detection and segmentation performance can be improved under complex environments, cluster shape variations, leaf shading, trunk occlusion and overlapping grapes. An Efficient Channel Attention (ECA) mechanism was first introduced into the backbone network to correct the extracted features for better grape cluster detection. To obtain detailed feature map information, Dense Upsampling Convolution (DUC) was used in feature pyramid fusion to improve model segmentation accuracy. Moreover, model generalization performance was improved by training the model on two different datasets. The developed algorithm was validated on a large dataset of 682 annotated images, where the experimental results indicate that the model achieves an Average Precision (AP) of 60.1% on object detection and 59.5% on instance segmentation. In particular, on the object detection task, the AP improved by 1.4% and 1.8% over the original Mask R-CNN (ResNet50-FPN) and Faster R-CNN (ResNet50-FPN), respectively. For instance segmentation, the AP improved by 1.6% and 2.2% over the original Mask R-CNN and SOLOv2, respectively. When tested on different datasets, the improved model showed high detection and segmentation accuracy and good inter-varietal generalization performance in complex growth environments, and can therefore provide technical support for intelligent vineyard management.
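
The two building blocks named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names (`eca_weights`, `apply_eca`, `pixel_shuffle`), the nested-list tensor layout and the fixed kernel values are assumptions made for illustration only. In a trained network the 1D kernel is learned, and DUC also includes a convolution that predicts the r*r channels, which is omitted here so that only the rearrangement step is shown.

```python
import math

def eca_weights(feature_map, kernel):
    """Channel attention weights in the style of ECA.

    feature_map: list of C channels, each an H x W nested list of floats.
    kernel: odd-length list of 1D convolution weights (hypothetical values;
            these would be learned in a real model).
    """
    # Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # 1D convolution across neighboring channels, zero-padded at both ends.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [sum(k * padded[i + j] for j, k in enumerate(kernel))
             for i in range(len(pooled))]
    # Sigmoid squashes each mixed descriptor into a weight in (0, 1).
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]

def apply_eca(feature_map, kernel):
    """Rescale every channel by its attention weight."""
    weights = eca_weights(feature_map, kernel)
    return [[[w * v for v in row] for row in ch]
            for w, ch in zip(weights, feature_map)]

def pixel_shuffle(channels, r):
    """Rearrangement step of dense upsampling for a single output map.

    channels: list of r*r maps, each H x W.
    Returns one (H*r) x (W*r) map in which channel idx fills the
    sub-pixel position (idx // r, idx % r) of every r x r output cell.
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for idx, ch in enumerate(channels):
        dy, dx = divmod(idx, r)
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = ch[y][x]
    return out
```

For example, `pixel_shuffle` applied to four 1 x 1 maps with `r=2` yields a single 2 x 2 map, which is how low-resolution predictions are turned into a finer grid without interpolation.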
Original language: English
Article number: 934450
Number of pages: 17
Journal: Frontiers in Plant Science
Early online date: 22 Jul 2022
Publication status: Published - 22 Jul 2022

Bibliographical note

This work was supported by the National Key R&D Program Project of China
(Grant No. 2019YFD1002500), Guangxi Key R&D Program Project (Grant No. Gui
Ke AB21076001) and the Shaanxi Provincial Key R&D Program Project (Grant No.
We would like to thank Shan Chen, Lijie Song, Shihao Zhang and Qifan Chen for
their work on field data collection.


  • Grape
  • Instance segmentation
  • Mask R-CNN
  • Attention mechanism
  • Dense Upsampling Convolution

