Learning with Capsules: A Survey

Fabio De Sousa Ribeiro, Kevin Duarte, Miles Anthony Everett, Georgios Leontidis, Mubarak Shah

Research output: Working paper (Preprint)


Capsule networks were proposed as an alternative approach to Convolutional Neural Networks (CNNs) for learning object-centric representations, which can be leveraged for improved generalization and sample complexity. Unlike CNNs, capsule networks are designed to explicitly model part-whole hierarchical relationships by using groups of neurons to encode visual entities, and learn the relationships between those entities. Promising early results achieved by capsule networks have motivated the deep learning community to continue trying to improve their performance and scalability across several application areas. However, a major hurdle for capsule network research has been the lack of a reliable point of reference for understanding their foundational ideas and motivations. The aim of this survey is to provide a comprehensive overview of the capsule network research landscape, which will serve as a valuable resource for the community going forward. To that end, we start with an introduction to the fundamental concepts and motivations behind capsule networks, such as equivariant inference in computer vision. We then cover the technical advances in the capsule routing mechanisms and the various formulations of capsule networks, e.g. generative and geometric. Additionally, we provide a detailed explanation of how capsule networks relate to the popular attention mechanism in Transformers, and highlight non-trivial conceptual similarities between them in the context of representation learning. Afterwards, we explore the extensive applications of capsule networks in computer vision, video and motion, graph representation learning, natural language processing, medical imaging and many others. To conclude, we provide an in-depth discussion regarding the main hurdles in capsule network research, and highlight promising research directions for future work.
Original language: English
Number of pages: 29
Publication status: Published - 6 Jun 2022

Bibliographical note

The authors would like to thank all reviewers, and especially Professor Chris Williams from the School of Informatics of the University of Edinburgh, who provided constructive feedback and ideas on how to improve this


  • deep learning
  • capsule networks
  • deep neural networks
  • convolutional neural networks
  • transformers
  • routing by agreement
  • self attention
  • representation learning
  • object centric learning
  • generative models
  • clustering
  • computer vision

