Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore
Authors: Ang, Pei San; Teo, Desmond Chun Hwee; Dorajoo, Sreemanee Raaj; Prem Kumar, Mukundaram; Chan, Yi Hao; Choong, Chih Tzer; Phuah, Doris Sock Tin; Tan, Dorothy Hooi Myn; Tan, Filina Meixuan; Huang, Huilin; Tan, Maggie Siok Hwee; Ng, Michelle Sau Yuen; Poh, Jalene Wang Woon
Institution: 1. Vigilance and Compliance Branch, Health Products Regulation Group, Health Sciences Authority, 11 Biopolis Way, #11-01 Helios, Singapore, 138667, Singapore
Abstract:

Introduction

Substandard medicines are medicines that fail to meet their quality standards and/or specifications, and they can lead to serious safety issues affecting public health. With the increasing number of pharmaceuticals and the complexity of the pharmaceutical manufacturing supply chain, monitoring for substandard medicines via manual environmental scanning can be laborious and time-consuming.

Methods

A web crawler was developed to automatically detect and extract alerts on substandard medicines published on the Internet by regulatory agencies. The crawled data were labelled as related to substandard medicines or not. An expert-derived keyword-based classification algorithm was compared against machine learning algorithms in identifying substandard medicine alerts on two validation datasets (n = 4920 and n = 2458) drawn from a later time period than the training data. Models were compared on recall, precision and F1 score (the harmonic mean of precision and recall).
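The keyword-based comparison and the three metrics can be illustrated with a minimal Python sketch. The keyword list, the sample alert texts and the function names below are hypothetical and chosen only for illustration; the abstract does not list the expert-derived keywords or describe how the study implemented its classifier.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical keywords; the expert-derived list is not given in the abstract.
KEYWORDS = ("recall", "contamination", "out of specification", "impurity")


def keyword_flag(text: str, keywords: Iterable[str] = KEYWORDS) -> bool:
    """Flag an alert if any keyword occurs in its text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def precision_recall_f1(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 (harmonic mean of the two)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up alerts with made-up labels (True = substandard medicine alert).
alerts = [
    "Recall of tablets due to impurity",
    "New office opening hours",
    "Product recall: labelling update",
    "Sterility failure in injection batch",
]
labels = [True, False, False, True]
predictions = [keyword_flag(a) for a in alerts]
precision, recall, f1 = precision_recall_f1(labels, predictions)
```

In this toy data the keyword rule produces one false positive (a recall unrelated to quality) and one false negative (a quality failure phrased without any listed keyword), which is the kind of error a learned model might help to reduce.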

Results

The web crawler routinely extracted alerts from 46 web pages belonging to nine regulatory agencies. From October 2019 to May 2020, 12,156 unique alerts were crawled, of which 7,378 (60.7%) were set aside for validation; these contained 1,160 substandard medicine alerts (15.7%). An ensemble approach combining machine learning and keywords achieved the best recall (94% and 97%), precision (85% and 80%) and F1 scores (89% and 88%) on temporal validation.
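The reported figures can be cross-checked with a short calculation. The sketch below assumes that the first and second values in each reported pair refer to the same validation dataset; the abstract does not state this pairing explicitly, and the variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported; pairing by position is an assumption.
reported = [(0.85, 0.94), (0.80, 0.97)]
f1_values = [round(f1_score(p, r), 2) for p, r in reported]

# Dataset sizes taken from the Methods and Results text.
validation_total = 4920 + 2458
share_validation = round(100 * validation_total / 12156, 1)
share_substandard = round(100 * 1160 / validation_total, 1)

print(f1_values, validation_total, share_validation, share_substandard)
```

Under that pairing, the computed F1 values round to the reported 89% and 88%, and the two validation sets sum to the 7,378 alerts quoted, with shares matching the stated 60.7% and 15.7%.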

Conclusions

Combining robust web crawler programmes with rigorously tested filtering algorithms based on machine learning and keyword models can automate and expand horizon scanning capabilities for issues relating to substandard medicines.

This article is indexed in SpringerLink and other databases.