Abstract
Task-oriented grasping models aim to predict a grasp pose on an object that is suitable for a given task. Such systems generalize poorly to new tasks, although they can generalize to novel objects by recognizing affordances. This object-level generalization comes at the cost of not recognizing the category of the object being grasped, which can lead to unpredictable or risky behavior. To overcome these generalization limitations, we contribute a novel system for task-oriented grasping: the One-Shot Task-Oriented Grasping (OS-TOG) framework. OS-TOG comprises four interchangeable neural networks that interact through dependable reasoning components, yielding a single system that predicts multiple grasp candidates for a specified object and task in multi-object scenes. Embedded one-shot learning models leverage references in a database, allowing OS-TOG to generalize to novel objects and tasks more efficiently than existing alternatives. The paper also presents suitable candidates for the framework's neural components, covering the adjustments essential for their integration and evaluative comparisons to the state of the art. In physical experiments with novel objects, OS-TOG recognizes 69.4% of detected objects correctly and predicts suitable task-oriented grasps with 82.3% accuracy, with a physical grasp success rate of 82.3%.
| Original language | English |
|---|---|
| Pages (from-to) | 8232-8238 |
| Number of pages | 7 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 8 |
| Issue number | 12 |
| Early online date | Nov 2023 |
| DOIs | |
| Publication status | Published - 1 Dec 2023 |
Keywords
- computer vision for automation
- deep learning in grasping and manipulation
- grasping
- recognition