Abstract
Deducing the 3D structure of endoscopic scenes from images remains extremely challenging. In addition to deformation and view-dependent lighting, tubular structures like the colon present problems stemming from self-occluding, repetitive anatomical structures. In this paper, we propose SimCol, a synthetic dataset for camera pose estimation in colonoscopy, and a novel method that explicitly learns a bimodal distribution to predict the endoscope pose. Our dataset replicates real colonoscope motion and highlights drawbacks of existing methods. We publish 18k RGB images from simulated colonoscopy with corresponding depth and camera poses, and make our data generation environment in Unity publicly available. We evaluate different camera pose prediction methods and demonstrate that, when trained on our data, they generalize to real colonoscopy sequences, and that our bimodal approach outperforms prior unimodal work.
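The bimodal idea can be illustrated with a toy model: along a tubular structure, forward and backward camera motion can produce nearly identical views, so a network that outputs a single pose estimate may average the two hypotheses. A minimal sketch, assuming a two-component Gaussian mixture over one pose parameter (the function names and parameterization are hypothetical, not the paper's actual architecture):

```python
import math

def bimodal_pdf(x, w, mu1, sigma1, mu2, sigma2):
    """Density of a two-component Gaussian mixture over a pose parameter."""
    def gauss(x, mu, s):
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return w * gauss(x, mu1, sigma1) + (1 - w) * gauss(x, mu2, sigma2)

def predict_pose(w, mu1, sigma1, mu2, sigma2):
    """Resolve the ambiguity by picking the dominant mode instead of the mean.

    With well-separated modes, the mixture peak near mu_i has height
    roughly w_i / (sigma_i * sqrt(2*pi)), so compare component peak heights.
    """
    h1 = w / sigma1
    h2 = (1 - w) / sigma2
    return mu1 if h1 >= h2 else mu2

# Hypothetical network output: 70% confidence in forward motion (+2 cm),
# 30% in backward motion (-2 cm). The mixture mean would be a blurry
# compromise (0.8 cm); the dominant-mode estimate stays on a real hypothesis.
print(predict_pose(w=0.7, mu1=0.02, sigma1=0.005, mu2=-0.02, sigma2=0.005))
```

A unimodal (single-Gaussian) regressor trained on such ambiguous data tends toward the mixture mean, which corresponds to no plausible motion; this is the failure mode a bimodal parameterization avoids.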
| Original language | English |
|---|---|
| Publisher | ArXiv |
| Pages | 1-11 |
| Number of pages | 11 |
| Volume | 2204.04968 |
| Publication status | Published - 11 Apr 2022 |
Bibliographical note
This work was supported by the Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) [203145Z/16/Z]; the Engineering and Physical Sciences Research Council (EPSRC) [EP/P027938/1, EP/R004080/1, EP/P012841/1]; the Royal Academy of Engineering Chair in Emerging Technologies scheme; and the EndoMapper project by Horizon 2020 FET (GA 863146). For the purpose of open access, the author has applied a CC BY public copyright licence to any author accepted manuscript version arising from this submission.

The authors would like to thank Javier Morlana from the University of Zaragoza for providing the COLMAP results for real colonoscopy sequences, and both Sophia Bano from UCL and the anonymous reviewers for the constructive discussions and comments.
Keywords
- 3D reconstruction
- camera pose estimation
- endoscopy
- SLAM
- surgical AI