Unsupervised feature disentanglement for video retrieval in minimally invasive surgery
Institution: 1. Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, China; 2. T Stone Robotics Institute, The Chinese University of Hong Kong, China; 3. Robotics and Microsystems Center, School of Mechanical and Electric Engineering, Soochow University, Suzhou, China; 4. Department of Computer Science and Engineering, The Chinese University of Hong Kong, China; 5. Cornerstone Robotics Limited, Shatin, Hong Kong, China; 6. Department of Obstetrics and Gynaecology, Prince of Wales Hospital, The Chinese University of Hong Kong, China
Abstract: We propose Unsupervised Disentanglement of Scene and Motion (UDSM), a representation learning method for retrieving minimally invasive surgery videos from large databases, with the potential to advance intelligent and efficient surgical teaching systems. To extract more discriminative video representations, two encoders trained with a triplet ranking loss and an adversarial learning mechanism capture spatial and temporal information respectively, yielding disentangled and interpretable features from each frame. A temporal aggregation module then models long-range temporal dependencies at the video level and produces a set of compact binary codes that carry representative features for fast retrieval. The entire framework is trained without supervision, i.e., it learns purely from raw surgical videos without any annotation. We construct two large-scale minimally invasive surgery video datasets, one based on the public Cholec80 dataset and one on our in-house laparoscopic hysterectomy dataset, to train the model and to validate the method qualitatively and quantitatively on the surgical video retrieval task. Extensive experiments show that our approach significantly outperforms state-of-the-art video retrieval methods on both datasets, revealing a promising future for injecting intelligence into the next generation of surgical teaching systems.
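The abstract names two generic building blocks, a triplet ranking loss and retrieval over compact binary codes. The NumPy sketch below illustrates both in their textbook form only; it is not the authors' implementation. The function names, the squared Euclidean distance, the margin value, and the sign-threshold binarization are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (assumed form)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    # Penalize triplets where the positive is not closer than the negative by `margin`.
    return np.maximum(0.0, margin + d_pos - d_neg).mean()


def binarize(features):
    """Threshold real-valued features at zero to get {0, 1} codes (assumed scheme)."""
    return (np.asarray(features) > 0).astype(np.uint8)


def hamming_rank(query_code, database_codes):
    """Sort database entries by Hamming distance to the query code."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances[order]
```

In use, each video-level feature would be passed through `binarize` once and stored; a query is then answered by calling `hamming_rank` on its code, which only compares bits and so stays cheap even for large databases.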
This article is indexed in ScienceDirect and other databases.