Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry
Affiliation: 1. EnCoV, Institut Pascal, UMR 6602 CNRS/Université Clermont-Auvergne, Clermont-Ferrand, France; 2. Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna 9203, Bangladesh
Abstract:

Background and objective: Surgical tool detection, segmentation, and 3D pose estimation are crucial components in Computer-Assisted Laparoscopy (CAL). Existing frameworks have two main limitations. First, they do not integrate all three components; integration is critical because, for instance, pose should not be computed when detection is negative. Second, they have highly specific requirements, such as the availability of a CAD model. We propose an integrated and generic framework whose sole requirement for 3D pose estimation is that the tool shaft is cylindrical. Our framework makes the most of deep learning and geometric 3D vision by combining a proposed Convolutional Neural Network (CNN) with algebraic geometry. We show two applications of our framework in CAL: tool-aware rendering in Augmented Reality (AR) and tool-based 3D measurement.

Methods: We name our CNN ART-Net (Augmented Reality Tool Network). It has a Single Input Multiple Output (SIMO) architecture with one encoder and multiple decoders, which achieve detection, segmentation, and geometric primitive extraction. These primitives are the tool edge-lines, mid-line, and tip; they allow the tool's 3D pose to be estimated by a fast algebraic procedure. The framework only proceeds if a tool is detected. The accuracy of segmentation and geometric primitive extraction is boosted by a new Full resolution feature map Generator (FrG). We extensively evaluate the proposed framework on the EndoVis dataset and on newly proposed datasets, which are surgery videos of different patients. We compare the segmentation results against several variants of the Fully Convolutional Network (FCN) and U-Net, and provide several ablation studies for detection, segmentation, and geometric primitive extraction.

Results: In detection, ART-Net achieves 100.0% in both average precision and accuracy. In segmentation, it achieves 81.0% mean Intersection over Union (mIoU) on the robotic EndoVis dataset (articulated tool), outperforming FCN and U-Net by 4.5 pp and 2.9 pp, respectively, and 88.2% mIoU on the remaining datasets (non-articulated tool). In geometric primitive extraction, ART-Net achieves mean Arc Length (mAL) errors of 2.45 and 2.23 for the edge-lines and mid-line, respectively, and a mean Euclidean distance error of 9.3 pixels for the tool-tip. Finally, in 3D pose evaluated on animal data, our framework achieves mean absolute errors of 1.87 mm, 0.70 mm, and 4.80 mm on the X, Y, and Z coordinates, respectively, and an angular error of 5.94° on the shaft orientation. It achieves 2.59 mm mean and 1.99 mm median location error of the tool head evaluated on patient data.

Conclusions: The proposed framework outperforms existing ones in detection and segmentation. Compared to separate networks, integrating the tasks in a single network preserves accuracy in detection and segmentation but substantially improves accuracy in geometric primitive extraction. Overall, our framework achieves similar or better accuracy in 3D pose estimation while largely improving robustness against the very challenging imaging conditions of laparoscopy. The source code of our framework and our annotated dataset will be made publicly available at https://github.com/kamruleee51/ART-Net.
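The abstract describes ART-Net only at block level. For illustration, below is a minimal PyTorch sketch of a SIMO layout as the abstract characterizes it: one shared encoder, a classification head for detection, and one decoder per dense output (segmentation, edge-lines, mid-line, tip). The class names, layer widths, and depths are assumptions made for this sketch, not the published ART-Net configuration (which additionally uses the FrG).

```python
# Minimal sketch of a Single Input Multiple Output (SIMO) network:
# one shared encoder, a detection head, and one decoder per dense
# output map. Widths and depths are illustrative, not ART-Net's.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class Decoder(nn.Module):
    """Upsampling path producing one full-resolution, one-channel map."""
    def __init__(self, c_in):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            conv_block(c_in, c_in // 2),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            conv_block(c_in // 2, c_in // 4),
            nn.Conv2d(c_in // 4, 1, 1),
        )
    def forward(self, x):
        return self.up(x)

class SIMOSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = conv_block(3, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        # Detection head: a scalar "tool present" score from pooled features.
        self.detect = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))
        # One decoder per dense output: segmentation and the three primitives.
        self.decoders = nn.ModuleDict(
            {k: Decoder(64) for k in ["seg", "edges", "midline", "tip"]})
    def forward(self, x):
        f = self.pool(self.enc1(x))
        f = self.pool(self.enc2(f))
        det = torch.sigmoid(self.detect(f))
        maps = {k: d(f) for k, d in self.decoders.items()}
        return det, maps
```

Since the framework only proceeds if a tool is detected, a caller would gate the dense outputs on the detection score, e.g. fit primitives and compute the pose only when `det > 0.5`.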
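The abstract does not spell out the fast algebraic procedure. As a rough sketch of how a cylindrical shaft of known radius constrains pose, the following NumPy function recovers the shaft axis direction and the 3D tip position from the two image edge-lines and the tip point, assuming a pinhole camera with intrinsics K and lines already fitted to the extracted primitive maps. This is a standard cylinder back-projection derivation offered for illustration, not necessarily the paper's exact procedure; the function name and argument conventions are hypothetical.

```python
# Sketch: pose of a cylindrical shaft of known radius r from its two
# image edge-lines, its tip point, and camera intrinsics K.
# Generic back-projection geometry, not taken from the paper.
import numpy as np

def cylinder_pose(l1, l2, tip_px, K, r):
    """l1, l2: homogeneous image edge-lines, shape (3,).
    tip_px: (u, v) tool-tip in pixels.
    Returns (axis_direction, tip_3d) in camera coordinates."""
    # Each image line back-projects to a plane through the camera centre
    # tangent to the cylinder; its normal in the camera frame is K^T l.
    n1 = K.T @ l1; n1 /= np.linalg.norm(n1)
    n2 = K.T @ l2; n2 /= np.linalg.norm(n2)
    # The axis is parallel to both tangent planes, hence orthogonal to
    # both normals (up to sign).
    d = np.cross(n1, n2)
    d /= np.linalg.norm(d)
    # Viewing ray through the detected tip (a mid-line point): the 3D tip
    # lies on the axis, at distance r from each tangent plane.
    v = np.linalg.inv(K) @ np.array([tip_px[0], tip_px[1], 1.0])
    v /= np.linalg.norm(v)
    # Orient normals towards the ray so both constraints read n.(lam*v)=r.
    if n1 @ v < 0: n1 = -n1
    if n2 @ v < 0: n2 = -n2
    lam = r / np.mean([n1 @ v, n2 @ v])  # depth along the tip ray
    tip_3d = lam * v
    return d, tip_3d
```

The design choice here mirrors the abstract's requirement: the two edge-lines fix the axis direction (each tangent plane must contain the axis), and the known radius fixes the depth along the tip ray, which is why a cylindrical shaft is the sole geometric requirement.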