GSMR-CNN: An End-to-End Trainable Architecture for Grasping Target Objects from Multi-Object Scenes

Valerija Holomjova; Andrew J. Starkey; Pascal Meißner

doi:10.1109/ICRA48891.2023.10161009


Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

We present an end-to-end trainable multi-task model that locates and retrieves target objects from multi-object scenes. The model is an extension of Siamese Mask R-CNN, which combines the components of Siamese Neural Networks (SNNs) and Mask R-CNN to perform one-shot instance segmentation. The proposed network, called Grasping Siamese Mask R-CNN (GSMR-CNN), extends Siamese Mask R-CNN by adding a grasp detection branch in parallel to the existing object detection head branches. This allows our model to identify a target object together with a suitable grasp simultaneously, whereas other approaches require training separate models to achieve the same task. The inherent SNN properties enable the proposed model to generalize and recognize new object categories that were not present during training, which is beyond the capabilities of standard object detectors. Moreover, because the end-to-end solution shares features between tasks, it requires fewer model parameters. The model achieves grasp accuracy scores of 92.1% and 90.4% on the image-wise and object-wise splits of the OCID grasp dataset, respectively. Physical experiments show that the model achieves a grasp success rate of 76.4% when it correctly identifies the object. Code and models are available at https://github.com/valerijah/grasping_siamese_mask_rcnn
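The abstract reports grasp accuracy on the OCID grasp dataset but does not define the metric. Grasp detection work on Cornell, Jacquard and OCID grasp commonly uses the "rectangle metric": a predicted oriented grasp rectangle counts as correct if its orientation is within 30° of some ground-truth rectangle and their Jaccard index (IoU) exceeds 0.25. The Python sketch below assumes that metric. The rectangle encoding `(cx, cy, w, h, theta_degrees)`, the function names and the default thresholds are illustrative assumptions, not taken from the GSMR-CNN code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Assumed (hypothetical) grasp encoding: (cx, cy, width, height, theta_degrees)
Rect = Tuple[float, float, float, float, float]


def rect_corners(rect: Rect) -> List[Point]:
    """Return the four corners of an oriented rectangle in counter-clockwise order."""
    cx, cy, w, h, theta = rect
    c, s = math.cos(math.radians(theta)), math.sin(math.radians(theta))
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; returns 0.0 for an empty or degenerate polygon."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _inside(p: Point, a: Point, b: Point) -> bool:
    # True if p lies on or to the left of the directed edge a -> b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Intersection of segment p-q with the infinite line through a and b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def clip_polygon(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` by a convex, counter-clockwise `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        if not inputs:
            break  # nothing left to clip: the polygons do not overlap
        prev = inputs[-1]
        for cur in inputs:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_intersect(prev, cur, a, b))
            prev = cur
    return output


def rect_iou(r1: Rect, r2: Rect) -> float:
    """Jaccard index (IoU) of two oriented rectangles."""
    inter = polygon_area(clip_polygon(rect_corners(r1), rect_corners(r2)))
    union = r1[2] * r1[3] + r2[2] * r2[3] - inter
    return inter / union if union > 0 else 0.0


def angle_diff(a: float, b: float) -> float:
    """Smallest orientation difference in degrees; a grasp is symmetric under 180° rotation."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def is_correct_grasp(pred: Rect, gts: Sequence[Rect],
                     iou_thresh: float = 0.25, angle_thresh: float = 30.0) -> bool:
    """Rectangle metric: the prediction matches any ground truth on both angle and IoU."""
    return any(angle_diff(pred[4], gt[4]) < angle_thresh and rect_iou(pred, gt) > iou_thresh
               for gt in gts)


def grasp_accuracy(preds: Sequence[Rect], gts_per_pred: Sequence[Sequence[Rect]]) -> float:
    """Fraction of predicted grasps that satisfy the rectangle metric."""
    if not preds:
        return 0.0
    hits = sum(is_correct_grasp(p, gts) for p, gts in zip(preds, gts_per_pred))
    return hits / len(preds)
```

Because both grasp rectangles are convex, Sutherland-Hodgman clipping yields their exact intersection polygon; an evaluator could equally delegate the polygon intersection to a geometry library such as Shapely.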

Original language	English
Title of host publication	Proceedings - ICRA 2023
Subtitle of host publication	IEEE International Conference on Robotics and Automation
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3808-3814
Number of pages	7
ISBN (Electronic)	9798350323658
DOIs	https://doi.org/10.1109/ICRA48891.2023.10161009
Publication status	Published - 4 Jul 2023
Event	2023 IEEE International Conference on Robotics and Automation, ICRA 2023 - London, United Kingdom. Duration: 29 May 2023 → 2 Jun 2023

Conference

Conference	2023 IEEE International Conference on Robotics and Automation, ICRA 2023
Country/Territory	United Kingdom
City	London
Period	29/05/23 → 2/06/23

Bibliographical note

Funding Information:
This research is funded by a studentship awarded by the School of Engineering at the University of Aberdeen, Scotland, UK.

Access to Document

10.1109/ICRA48891.2023.10161009
Licence: Unspecified

Cite this

GSMR-CNN: An End-to-End Trainable Architecture for Grasping Target Objects from Multi-Object Scenes. / Holomjova, Valerija; Starkey, Andrew J.; Meißner, Pascal.
Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Institute of Electrical and Electronics Engineers Inc., 2023. p. 3808-3814.

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Holomjova, V, Starkey, AJ & Meißner, P 2023, GSMR-CNN: An End-to-End Trainable Architecture for Grasping Target Objects from Multi-Object Scenes. in Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Institute of Electrical and Electronics Engineers Inc., pp. 3808-3814, 2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom, 29/05/23. https://doi.org/10.1109/ICRA48891.2023.10161009

@inproceedings{33f7839c46654ddca62a19e56e466ac0,

title = "GSMR-CNN: An End-to-End Trainable Architecture for Grasping Target Objects from Multi-Object Scenes",

abstract = "We present an end-to-end trainable multi-task model that locates and retrieves target objects from multi-object scenes. The model is an extension of Siamese Mask R-CNN, which combines the components of Siamese Neural Networks (SNNs) and Mask R-CNN to perform one-shot instance segmentation. The proposed network, called Grasping Siamese Mask R-CNN (GSMR-CNN), extends Siamese Mask R-CNN by adding a grasp detection branch in parallel to the existing object detection head branches. This allows our model to identify a target object together with a suitable grasp simultaneously, whereas other approaches require training separate models to achieve the same task. The inherent SNN properties enable the proposed model to generalize and recognize new object categories that were not present during training, which is beyond the capabilities of standard object detectors. Moreover, because the end-to-end solution shares features between tasks, it requires fewer model parameters. The model achieves grasp accuracy scores of 92.1% and 90.4% on the image-wise and object-wise splits of the OCID grasp dataset, respectively. Physical experiments show that the model achieves a grasp success rate of 76.4% when it correctly identifies the object. Code and models are available at https://github.com/valerijah/grasping_siamese_mask_rcnn",

author = "Valerija Holomjova and Starkey, {Andrew J.} and Pascal Mei{\ss}ner",

note = "Funding Information: This research is funded by a studentship awarded by the School of Engineering at the University of Aberdeen, Scotland, UK. ; 2023 IEEE International Conference on Robotics and Automation, ICRA 2023 ; Conference date: 29-05-2023 Through 02-06-2023",

year = "2023",

month = jul,

day = "4",

doi = "10.1109/ICRA48891.2023.10161009",

language = "English",

pages = "3808--3814",

booktitle = "Proceedings - ICRA 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

address = "United States",

}

TY - GEN

T1 - GSMR-CNN: An End-to-End Trainable Architecture for Grasping Target Objects from Multi-Object Scenes

T2 - 2023 IEEE International Conference on Robotics and Automation, ICRA 2023

AU - Holomjova, Valerija

AU - Starkey, Andrew J.

AU - Meißner, Pascal

N1 - Funding Information: This research is funded by a studentship awarded by the School of Engineering at the University of Aberdeen, Scotland, UK.

PY - 2023/7/4

Y1 - 2023/7/4

N2 - We present an end-to-end trainable multi-task model that locates and retrieves target objects from multi-object scenes. The model is an extension of Siamese Mask R-CNN, which combines the components of Siamese Neural Networks (SNNs) and Mask R-CNN to perform one-shot instance segmentation. The proposed network, called Grasping Siamese Mask R-CNN (GSMR-CNN), extends Siamese Mask R-CNN by adding a grasp detection branch in parallel to the existing object detection head branches. This allows our model to identify a target object together with a suitable grasp simultaneously, whereas other approaches require training separate models to achieve the same task. The inherent SNN properties enable the proposed model to generalize and recognize new object categories that were not present during training, which is beyond the capabilities of standard object detectors. Moreover, because the end-to-end solution shares features between tasks, it requires fewer model parameters. The model achieves grasp accuracy scores of 92.1% and 90.4% on the image-wise and object-wise splits of the OCID grasp dataset, respectively. Physical experiments show that the model achieves a grasp success rate of 76.4% when it correctly identifies the object. Code and models are available at https://github.com/valerijah/grasping_siamese_mask_rcnn

AB - We present an end-to-end trainable multi-task model that locates and retrieves target objects from multi-object scenes. The model is an extension of Siamese Mask R-CNN, which combines the components of Siamese Neural Networks (SNNs) and Mask R-CNN to perform one-shot instance segmentation. The proposed network, called Grasping Siamese Mask R-CNN (GSMR-CNN), extends Siamese Mask R-CNN by adding a grasp detection branch in parallel to the existing object detection head branches. This allows our model to identify a target object together with a suitable grasp simultaneously, whereas other approaches require training separate models to achieve the same task. The inherent SNN properties enable the proposed model to generalize and recognize new object categories that were not present during training, which is beyond the capabilities of standard object detectors. Moreover, because the end-to-end solution shares features between tasks, it requires fewer model parameters. The model achieves grasp accuracy scores of 92.1% and 90.4% on the image-wise and object-wise splits of the OCID grasp dataset, respectively. Physical experiments show that the model achieves a grasp success rate of 76.4% when it correctly identifies the object. Code and models are available at https://github.com/valerijah/grasping_siamese_mask_rcnn

UR - http://www.scopus.com/inward/record.url?scp=85168708268&partnerID=8YFLogxK

U2 - 10.1109/ICRA48891.2023.10161009

DO - 10.1109/ICRA48891.2023.10161009

M3 - Published conference contribution

AN - SCOPUS:85168708268

SP - 3808

EP - 3814

BT - Proceedings - ICRA 2023

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 29 May 2023 through 2 June 2023

ER -
