Abstract
Deducing the 3D structure of endoscopic scenes from images remains extremely challenging. In addition to deformation and view-dependent lighting, tubular structures like the colon present problems stemming from self-occluding, repetitive anatomical structures. In this paper, we propose SimCol, a synthetic dataset for camera pose estimation in colonoscopy, and a novel method that explicitly learns a bimodal distribution to predict the endoscope pose. Our dataset replicates real colonoscope motion and highlights drawbacks of existing methods. We publish 18k RGB images from simulated colonoscopy with corresponding depth and camera poses, and make our data generation environment in Unity publicly available. We evaluate different camera pose prediction methods and demonstrate that, when trained on our data, they generalize to real colonoscopy sequences, and that our bimodal approach outperforms prior unimodal work.
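The bimodal idea can be illustrated with a toy model: along a tubular structure, forward and backward camera motion can produce nearly identical views, so a network that outputs a single pose estimate may average the two hypotheses. A minimal sketch, assuming a two-component Gaussian mixture over one pose parameter (the function names and parameterization are hypothetical, not the paper's actual architecture):

```python
import math

def bimodal_pdf(x, w, mu1, sigma1, mu2, sigma2):
    """Density of a two-component Gaussian mixture over a pose parameter."""
    def gauss(x, mu, s):
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return w * gauss(x, mu1, sigma1) + (1 - w) * gauss(x, mu2, sigma2)

def predict_pose(w, mu1, sigma1, mu2, sigma2):
    """Resolve the ambiguity by picking the dominant mode instead of the mean.

    With well-separated modes, the mixture peak near mu_i has height
    roughly w_i / (sigma_i * sqrt(2*pi)), so compare component peak heights.
    """
    h1 = w / sigma1
    h2 = (1 - w) / sigma2
    return mu1 if h1 >= h2 else mu2

# Hypothetical network output: 70% confidence in forward motion (+2 cm),
# 30% in backward motion (-2 cm). The mixture mean would be a blurry
# compromise (0.8 cm); the dominant-mode estimate stays on a real hypothesis.
print(predict_pose(w=0.7, mu1=0.02, sigma1=0.005, mu2=-0.02, sigma2=0.005))
```

A unimodal (single-Gaussian) regressor trained on such ambiguous data tends toward the mixture mean, which corresponds to no plausible motion; this is the failure mode a bimodal parameterization avoids.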
| Original language | English |
|---|---|
| Publisher | ArXiv |
| Pages | 1-11 |
| Number of pages | 11 |
| Volume | 2204.04968 |
| Publication status | Published - 11 Apr 2022 |
Bibliographical note
This work was supported by the Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) [203145Z/16/Z]; the Engineering and Physical Sciences Research Council (EPSRC) [EP/P027938/1, EP/R004080/1, EP/P012841/1]; the Royal Academy of Engineering Chair in Emerging Technologies scheme; and the EndoMapper project by Horizon 2020 FET (GA 863146). For the purpose of open access, the author has applied a CC BY public copyright licence to any author accepted manuscript version arising from this submission.

The authors would like to thank Javier Morlana from the University of Zaragoza for providing the COLMAP results for real colonoscopy sequences, and both Sophia Bano from UCL and the anonymous reviewers for the constructive discussions and comments.
Keywords
- 3D reconstruction
- camera pose estimation
- endoscopy
- SLAM
- surgical AI