Abstract
We present an end-to-end trainable multi-task model that locates and retrieves target objects from multi-object scenes. The model extends Siamese Mask R-CNN, which combines components of Siamese Neural Networks (SNNs) and Mask R-CNN to perform one-shot instance segmentation. The proposed network, called Grasping Siamese Mask R-CNN (GSMR-CNN), adds a grasp detection branch in parallel to the existing object detection head branches. This allows our model to identify a target object and a suitable grasp for it simultaneously, whereas other approaches require separate models to be trained for the same task. The inherent properties of SNNs enable the proposed model to generalize to and recognize new object categories that were not present during training, which is beyond the capabilities of standard object detectors. Moreover, the end-to-end solution shares features across tasks and therefore requires fewer model parameters. The model achieves grasp accuracy scores of 92.1% and 90.4% on the image-wise and object-wise splits of the OCID grasp dataset, respectively. In physical experiments, the model achieves a grasp success rate of 76.4% when the object is correctly identified. Code and models are available at https://github.com/valerijah/grasping_siamese_mask_rcnn
Original language | English |
---|---|
Title of host publication | Proceedings - ICRA 2023 |
Subtitle of host publication | IEEE International Conference on Robotics and Automation |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3808-3814 |
Number of pages | 7 |
ISBN (Electronic) | 9798350323658 |
Publication status | Published - 4 Jul 2023 |
Event | 2023 IEEE International Conference on Robotics and Automation, ICRA 2023 - London, United Kingdom. Duration: 29 May 2023 → 2 Jun 2023 |
Conference
Conference | 2023 IEEE International Conference on Robotics and Automation, ICRA 2023 |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 29/05/23 → 2/06/23 |
Bibliographical note
Funding Information: This research is funded by a studentship awarded by the School of Engineering at the University of Aberdeen, Scotland, UK.