Identifying Mycobacterium tuberculosis complex strain families using spoligotypes |
| |
Authors: | Inna Vitol, Jeffrey Driscoll, Barry Kreiswirth, Natalia Kurepina, Kristin P. Bennett |
| |
Affiliation: | Computer Science Department, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA. vitoli@rpi.edu |
| |
Abstract: | We present a novel approach for analysis of Mycobacterium tuberculosis complex (MTC) strain genotyping data. This work is a first step in an ongoing project to develop decision support tools for tuberculosis (TB) epidemiologists that exploit both genotyping and epidemiological data. We focus on spacer oligonucleotide typing (spoligotyping), a genotyping method based on analysis of the direct repeat (DR) locus. We use mixture models to identify strain families of MTC based on their spoligotyping patterns. Our algorithm, SPOTCLUST, incorporates biological information on spoligotype evolution without attempting to derive the full phylogeny of MTC. We applied our algorithm to 535 different spoligotype patterns identified among 7166 MTC strains isolated between 1996 and 2004 from New York State TB patients. Two models were employed and validated: a 36-component model based on the global spoligotype database SpolDB3, and a randomly initialized model (RIM) containing 48 components. Our analysis both confirmed previously expert-defined families of MTC strains and suggested certain new families. SPOTCLUST, which is available online, can be further improved by incorporating data obtained using additional strain genetic markers and epidemiological information. Using New York City (NYC) patient data, we demonstrate how the resulting models can potentially form the basis of genotyping-based TB control tools. |
| |
Keywords: | Tuberculosis; Pattern recognition, automated; Automatic data processing; Public health informatics |
This article is indexed in ScienceDirect and other databases. |
|
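The mixture-model idea described in the abstract can be illustrated with a minimal sketch. It assumes a plain Bernoulli mixture over 43-spacer presence/absence patterns, fitted by EM and initialized from a set of prototype patterns (loosely analogous to starting from known families). It does not reproduce SPOTCLUST: the abstract does not specify how spoligotype evolution enters the model, so that part is left out here. The names `parse`, `fit_bernoulli_mixture`, and `assign`, the 0.9/0.1 starting probabilities, and the smoothing constant `eps` are all hypothetical choices made for illustration.

```python
import math


def parse(pattern):
    """Turn a string of '1'/'0' characters (spacer present/absent) into a list of ints."""
    return [int(c) for c in pattern]


def fit_bernoulli_mixture(patterns, prototypes, n_iter=25, eps=1e-2):
    """Fit a Bernoulli mixture by EM, one component per prototype pattern.

    Assumption: a generic Bernoulli mixture, not the SPOTCLUST model itself.
    Returns (weights, probs, resp), where resp holds the responsibilities
    computed in the final E-step.
    """
    n, d, k = len(patterns), len(patterns[0]), len(prototypes)
    weights = [1.0 / k] * k
    # Start each component near its prototype: present spacers get 0.9, absent 0.1.
    probs = [[0.9 if b else 0.1 for b in proto] for proto in prototypes]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each pattern,
        # computed in log space and normalized after subtracting the maximum.
        resp = []
        for x in patterns:
            logp = [
                math.log(weights[j])
                + sum(math.log(p if b else 1.0 - p) for b, p in zip(x, probs[j]))
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and per-spacer probabilities.
        # The eps terms keep every probability strictly inside (0, 1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + eps) / (n + k * eps)
            probs[j] = [
                (sum(r[j] * x[i] for r, x in zip(resp, patterns)) + eps)
                / (nj + 2.0 * eps)
                for i in range(d)
            ]
    return weights, probs, resp


def assign(resp):
    """Label each pattern with the index of its most probable component."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


if __name__ == "__main__":
    # Toy data (invented for illustration, not from the study).
    family_a = "1" * 20 + "0" * 23
    family_b = "0" * 23 + "1" * 20
    toy = [parse(s) for s in (family_a, "000" + family_a[3:], family_b)]
    _, _, resp = fit_bernoulli_mixture(toy, [parse(family_a), parse(family_b)])
    print(assign(resp))
```

Passing prototypes explicitly keeps the sketch deterministic. A randomly initialized variant, in the spirit of the RIM mentioned in the abstract, would instead draw the starting components at random.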