Multi-task recurrent convolutional network with correlation loss for surgical video analysis
Affiliation: 1. Department of Computer Science and Engineering, The Chinese University of Hong Kong, China; 2. Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, China; 3. T Stone Robotics Institute, The Chinese University of Hong Kong, China; 1. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China; 2. Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China; 1. Inserm, UMR 1101, Brest F-29200, France; 2. Univ Bretagne Occidentale, Brest F-29200, France; 3. Institut Mines-Télécom Atlantique, Brest F-29200, France; 4. Service d’Ophtalmologie, CHRU Brest, Brest F-29200, France; 1. Multimedia Systems Department, Faculty of Electronics, Telecommunications, and Informatics, Gdansk University of Technology, ul. Narutowicza 11/12, Gdansk 80–233, Poland; 2. Systems Research Institute of the Polish Academy of Sciences, ul. Newelska 6, Warsaw 01–447, Poland; 3. Biomedical Engineering Department, Faculty of Electronics, Telecommunications, and Informatics, Gdansk University of Technology, ul. Narutowicza 11/12, Gdansk 80–233, Poland; 1. BME Dept, National University of Singapore (NUS); 2. EE Dept, Chinese University of Hong Kong (CUHK); 3. Department of Instrumentation and Control Engineering, NIT Trichy, India; 4. Department of Otolaryngology, Singapore General Hospital, Singapore
Abstract: Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, and both are essential components of many applications in modern operating rooms. Because the surgical process is typically well defined, the two tasks are highly correlated in clinical practice; nevertheless, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) that exploits this relatedness to boost the performance of both tasks simultaneously. Specifically, MTRCNet-CL is an end-to-end architecture with two branches that share the earlier feature encoders to extract general visual features, while each branch keeps its own higher layers dedicated to its specific task. Since temporal information is crucial for phase recognition, a long short-term memory (LSTM) network is used to model the sequential dependencies in the phase recognition branch. More importantly, we design a novel and effective correlation loss that models the relatedness between tool presence and phase identification in each video frame by minimizing the divergence between the predictions of the two branches. By combining low-level feature sharing with high-level prediction correlation, MTRCNet-CL strongly encourages interaction between the two tasks, so that each benefits the other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate the outstanding performance of the proposed method, which consistently exceeds state-of-the-art methods by a large margin, e.g., 89.1% vs. 81.0% mAP for tool presence detection and 87.4% vs. 84.5% F1 score for phase recognition.
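The idea of a correlation loss that minimizes the divergence between the two branches' predictions can be illustrated with a minimal plain-Python sketch. It rests on assumptions the abstract does not state: the phase branch's per-frame logits are mapped into the tool space by a hypothetical linear layer (`weight`, `bias`), and the divergence is taken as a symmetric KL divergence between per-tool Bernoulli (sigmoid) probabilities. It is an illustration of the concept, not necessarily the exact formulation used in MTRCNet-CL.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bernoulli_kl(p: float, q: float, eps: float = 1e-7) -> float:
    """KL(Bern(p) || Bern(q)), with probabilities clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def project_phase_to_tools(phase_logits: Sequence[float],
                           weight: Sequence[Sequence[float]],
                           bias: Sequence[float]) -> List[float]:
    """Hypothetical linear map: one row of `weight` per tool, one column per phase."""
    return [sum(w * z for w, z in zip(row, phase_logits)) + b
            for row, b in zip(weight, bias)]


def correlation_loss(tool_logits: Sequence[float],
                     phase_logits: Sequence[float],
                     weight: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> float:
    """Mean symmetric KL between tool probabilities from the two branches (one frame)."""
    mapped = project_phase_to_tools(phase_logits, weight, bias)
    p_tool = [sigmoid(z) for z in tool_logits]   # tool-branch probabilities
    p_phase = [sigmoid(z) for z in mapped]       # phase-branch view of the tools
    total = sum(bernoulli_kl(a, b) + bernoulli_kl(b, a)
                for a, b in zip(p_tool, p_phase))
    return total / len(p_tool)


if __name__ == "__main__":
    W = [[2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]  # 2 tools x 3 phases (toy values)
    b = [0.0, 0.0]
    print(correlation_loss([2.0, -1.0], [1.0, 0.0, 0.0], W, b))  # branches agree
    print(correlation_loss([3.0, -3.0], [1.0, 0.0, 0.0], W, b))  # branches disagree
```

When the phase branch's mapped prediction matches the tool branch exactly, the loss is zero; any disagreement yields a positive penalty. In training, such a term would be added to the two task losses, with a weighting factor that is likewise an assumption here.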
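The mAP figure reported for tool presence detection is the mean, over all tools, of a per-tool average precision (AP). The sketch below uses the common non-interpolated definition (the mean precision at the rank of each positive frame). This is an assumption for illustration, since the abstract does not say which AP variant the Cholec80 evaluation uses; ties in the scores are broken by input order.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive frame."""
    # Rank frames by descending confidence; sorted() is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix: Sequence[Sequence[float]],
                           label_matrix: Sequence[Sequence[int]]) -> float:
    """score_matrix[t][f] is the score of tool t on frame f; labels are 0/1."""
    aps = [average_precision(s, l) for s, l in zip(score_matrix, label_matrix)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    scores = [[0.9, 0.8, 0.3, 0.1], [0.2, 0.7]]
    labels = [[1, 0, 1, 0], [0, 1]]
    print(mean_average_precision(scores, labels))
```

In the toy example, the first tool's positives sit at ranks 1 and 3 (AP = (1 + 2/3) / 2 = 5/6) and the second tool's only positive is ranked first (AP = 1), giving mAP = 11/12.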
This article is indexed in ScienceDirect and other databases.