Found 20 similar articles; search time: 15 ms
1.
Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While few studies discuss how to use GO for feature selection, no simulation study addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation that generates binary-class datasets in which the differentially expressed genes between the two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors, such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes (denoted by δ), and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level, BCAL(G), where G is a graph of differentially expressed genes.
Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ⩾ 0.696; (2) it decreases classification accuracy when BCAL(G) ⩽ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 and δ < 1; (4) as the number of genes in a biological condition increases beyond 50 and δ ⩾ 0.7, the improvement from GO-based feature selection decreases; and (5) we recommend not using GO-based feature selection when a biological condition has fewer than ten genes. Our results are derived from datasets preprocessed using RMA (Robust Multi-array Average), cases where δ is between 0.3 and 2.5, and training sample sizes between 20 and 200; our conclusions are therefore limited to these specifications. Overall, this simulation is innovative and addresses the question of when SoFoCles-style feature selection should be used for classification instead of statistical-based ranking measures.
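The thresholds in conclusions (1), (2), (3) and (5) can be read as a coarse decision rule. The sketch below is only an illustration of that reading: the function name, the returned labels, and the order in which the conditions are checked are assumptions, and conclusion (4) is left as a comment because the abstract gives no cut-off for it.

```python
def go_feature_selection_advice(bcal: float, delta: float, n_genes: int) -> str:
    """Map the thresholds reported in the abstract to a coarse label.

    Hypothetical helper: the labels and the check order are assumptions.
    The abstract limits its conclusions to RMA-preprocessed data,
    0.3 <= delta <= 2.5, and 20 to 200 training samples.
    """
    if n_genes < 10:
        return "not recommended"   # conclusion (5)
    if bcal >= 0.696:
        return "likely helps"      # conclusion (1)
    if bcal <= 0.389:
        return "likely hurts"      # conclusion (2)
    if delta < 1:
        return "marginal gain"     # conclusion (3)
    # 0.389 < BCAL < 0.696 with delta >= 1 is not addressed in the abstract.
    # Conclusion (4) (more than 50 genes and delta >= 0.7 shrink the gain)
    # is not encoded because no numeric cut-off is reported for it.
    return "not covered"


print(go_feature_selection_advice(bcal=0.75, delta=0.8, n_genes=30))
```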
4.
Medical Subject Headings (MeSH) are used to index the majority of databases generated by the National Library of Medicine. Essentially, MeSH terms are designed to make information, such as scientific articles, more retrievable and accessible to users of systems such as PubMed. This paper proposes a novel method for automating the assignment of MeSH terms to biomedical publications that takes advantage of citation references to these publications. Our findings show that analysing the citation references that point to a document can provide a useful source of terms that are not present in the document. The use of these citation contexts, as they are known, can thus help to provide a richer document feature representation, which in turn can help improve text mining and information retrieval applications, in our case MeSH term classification. In this paper, we also explore new methods of selecting and utilising citation contexts. In particular, we assess the effect of weighting the importance of citation terms (found in the citation contexts) according to two aspects: (i) the section of the paper they appear in and (ii) their distance to the citation marker. We conduct intrinsic and extrinsic evaluations of citation term quality. For the intrinsic evaluation, we rely on the UMLS Metathesaurus conceptual database to explore the semantic characteristics of the mined citation terms. We also analyse the “informativeness” of these terms using a class-entropy measure. For the extrinsic evaluation, we run a series of automatic document classification experiments over MeSH terms. Our experimental evaluation shows that citation contexts contain terms that are related to the original document, and that the integration of this knowledge results in better classification performance compared to two state-of-the-art MeSH classification systems: MeSHUP and MTI.
Our experiments also demonstrate that the consideration of Section and Distance factors can lead to statistically significant improvements in citation feature quality, thus opening the way for better document feature representation in other biomedical text processing applications.
5.
Transient gene expression assays were developed to assess the function of the regulatory sequences of baculoviruses Bombyx mori nuclear polyhedrosis virus (BmNPV) and Autographa californica nuclear polyhedrosis virus (AcNPV) in insect cells of Bombyx mori and Spodoptera frugiperda, respectively. A DNA sequence encoding luciferase (luc) of the firefly Photinus pyralis was successfully employed in the expression assay as a reporter gene. Recombinant plasmids were constructed containing the luc gene under control of baculovirus-specific or heterologous promoters. Cotransfection of Bombyx mori and Spodoptera frugiperda cells with recombinant plasmids carrying virus-specific promoter sequences and BmNPV and AcNPV DNA, respectively, gave rise to efficient synthesis of luciferase (Luc), while heterologous promoters induced a low level of luc expression. We found that flanking sequences of the AcNPV DNA in the transfer plasmid contained an unknown promoter conferring efficient luc expression. The activity of this promoter was modulated by the polh promoter sequences. The assay allows highly sensitive monitoring of the transient expression of foreign genes from the transfecting plasmids prior to construction of recombinant viruses.
6.
In this paper, we present an approach to term classification based on verb selectional patterns (VSPs), where such a pattern is defined as a set of semantic classes that could be used in combination with a given domain-specific verb. VSPs have been automatically learnt based on the information found in a corpus and an ontology in the biomedical domain. Prior to the learning phase, the corpus is terminologically processed: term recognition is performed by both looking up the dictionary of terms listed in the ontology and applying the C/NC-value method for on-the-fly term extraction. Subsequently, domain-specific verbs are automatically identified in the corpus based on the frequency of occurrence and the frequency of their co-occurrence with terms. VSPs are then learnt automatically for these verbs. Two machine learning approaches are presented. The first approach has been implemented as an iterative generalisation procedure based on a partial order relation induced by the domain-specific ontology. The second approach exploits the idea of genetic algorithms. Once the VSPs are acquired, they can be used to classify newly recognised terms co-occurring with domain-specific verbs. Given a term, the most frequently co-occurring domain-specific verb is selected. Its VSP is used to constrain the search space by focusing on potential classes of the given term. A nearest-neighbour approach is then applied to select a class from the constrained space of candidate classes. The most similar candidate class is predicted for the given term. The similarity measure used for this purpose combines contextual, lexical, and syntactic properties of terms.
7.
Gene selection is an important task in bioinformatics studies, because the accuracy of cancer classification generally depends upon the genes that have biological relevance to the classification problem. In this work, a randomization test (RT) is used as a gene selection method for gene expression data. In the method, a statistic derived from the statistics of the regression coefficients in a series of partial least squares discriminant analysis (PLSDA) models is used to evaluate the significance of the genes. Informative genes are selected for classifying four gene expression datasets (prostate cancer, lung cancer, leukemia and non-small cell lung cancer (NSCLC)), and the rationality of the results is validated by multiple linear regression (MLR) modeling and principal component analysis (PCA). With the selected genes, satisfactory results can be obtained.
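To make the idea of a randomization test concrete, here is a deliberately simplified stand-in. It does not reproduce the method in the abstract, which derives its statistic from PLSDA regression coefficients; this sketch instead uses the absolute difference in class means for a single gene, and the function name, the 0/1 label coding, and the number of permutations are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Permutation p-value for one gene (simplified stand-in statistic).

    values: expression values of one gene, one per sample.
    labels: class labels coded 0/1, one per sample (assumed coding).
    """
    rng = random.Random(seed)

    def stat(lbls):
        a = [v for v, l in zip(values, lbls) if l == 1]
        b = [v for v, l in zip(values, lbls) if l == 0]
        return abs(mean(a) - mean(b))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the gene/label association
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


expr = [5.1, 5.3, 4.9, 5.2, 1.0, 1.2, 0.9, 1.1]
cls = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_p_value(expr, cls))
```

A gene whose p-value falls below a chosen threshold would be kept as informative; the threshold itself is not given in the abstract.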
8.
The aim of this study was to assess the efficacy of three‐dimensional texture analysis (3D TA) of conventional MR images for the classification of childhood brain tumours in a quantitative manner. The dataset comprised pre‐contrast T1‐ and T2‐weighted MRI series obtained from 48 children diagnosed with brain tumours (medulloblastoma, pilocytic astrocytoma and ependymoma). 3D and 2D TA were carried out on the images using first‐, second‐ and higher order statistical methods. Six supervised classification algorithms were trained with the most influential 3D and 2D textural features, and their performances in the classification of tumour types, using the two feature sets, were compared. Model validation was carried out using the leave‐one‐out cross‐validation (LOOCV) approach, as well as stratified 10‐fold cross‐validation, in order to provide additional reassurance. McNemar's test was used to test the statistical significance of any improvements demonstrated by 3D‐trained classifiers. Supervised learning models trained with 3D textural features showed improved classification performance compared with those trained with conventional 2D features. For instance, a neural network classifier showed a 12% improvement in area under the receiver operating characteristic curve (AUC) and 19% in overall classification accuracy. These improvements were statistically significant for four of the tested classifiers, as per McNemar's tests. This study shows that 3D textural features extracted from conventional T1‐ and T2‐weighted images can improve the diagnostic classification of childhood brain tumours. Long‐term benefits of accurate, yet non‐invasive, diagnostic aids include a reduction in surgical procedures, improvement in surgical and therapy planning, and support of discussions with patients' families. It remains necessary, however, to extend the analysis to a multicentre cohort in order to assess the scalability of the techniques used. Copyright © 2015 John Wiley & Sons, Ltd.
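McNemar's test compares two classifiers evaluated on the same samples by looking only at the cases where they disagree. The abstract does not say which variant was used, so the sketch below assumes the exact (binomial) two-sided form; the function name and the toy predictions are illustrative only.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples that only classifier A
    got right and c counts samples that only classifier B got right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree on correctness
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


truth = [1, 1, 1, 1, 1, 0, 0, 0]
pred_2d = [0, 0, 0, 0, 0, 0, 0, 0]   # toy output of a 2D-feature model
pred_3d = [1, 1, 1, 1, 1, 0, 0, 0]   # toy output of a 3D-feature model
print(mcnemar_exact(truth, pred_2d, pred_3d))  # (0, 5, 0.0625)
```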
11.
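Much remains unknown about the neuropathological mechanisms of neuropsychiatric disorders, and objective clinical diagnostic criteria are largely lacking, so their diagnosis and prognosis face major challenges. With the rapid development of neuroimaging technology, neuroimaging data have been widely used to explore the neuropathological mechanisms of neuropsychiatric disorders and to discover potential biomarkers. Compared with traditional univariate methods, which operate at the group level, machine learning models built on neuroimaging data enable individualized, intelligent prediction of neuropsychiatric disorders. This review covers recent...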
12.
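Osteoporotic fractures are a major cause of morbidity and mortality in the elderly, so an efficient prediction model is needed to give older adults early diagnostic and treatment advice. In this experiment, Stacking was used to build a heterogeneous classifier, EtDtb-S: 16 highly correlated features form the feature vector, extremely randomized trees (ET) and a decision-tree-based bagging ensemble (DTB) serve as the base learners, and logistic regression serves as the meta-learner. EtDtb-S was compared with single models and homogeneous classifiers on osteoporotic fracture prediction. The results show that the heterogeneous classifier improves prediction accuracy by 2.8% over the best single model and by 1.5% over the best homogeneous classifier, giving higher predictive performance.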
13.
A portable gait analysis and activity-monitoring system for the evaluation of activities of daily life could facilitate clinical and research studies. This study developed a small sensor unit comprising an accelerometer and a gyroscope to detect shank and foot segment motion and orientation during different walking conditions. The kinematic data obtained in the pre-swing phase were used to classify five walking conditions: stair ascent, stair descent, level ground, upslope and downslope. The kinematic data consisted of anterior-posterior acceleration and angular velocity measured from the shank and foot segments. A machine learning technique known as support vector machine (SVM) was applied to classify the walking conditions. SVM was also compared with other machine learning methods: artificial neural network (ANN), radial basis function network (RBF) and Bayesian belief network (BBN). The SVM technique showed higher classification performance than the other three methods. The results using SVM showed that stair ascent and stair descent could be distinguished from each other and from the other walking conditions with 100% accuracy by using a single sensor unit attached to the shank segment. For classification across the five walking conditions, performance improved from 78% using the kinematic signals from the shank sensor unit to 84% by adding signals from the foot sensor unit. The SVM technique with the portable kinematic sensor unit could automatically recognize the walking condition for quantitative analysis of the activity pattern.
14.
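Classification by machine learning is an important approach to gene function prediction. However, many prediction sets contain too few positive samples, which degrades prediction performance. To address this problem, this study experimentally compares several common imbalanced-data classification methods combined with the support vector machine (SVM) algorithm, including the voting ensemble classifier and shifting of the decision boundary. On this basis, an ensemble strategy based on weighted correction of the votes is proposed to improve prediction. The experimental results show that the voting ensemble method, which combines size-limited sampling of the majority class with the ensemble idea, outperforms the boundary-shifting method, and that the weighted-correction ensemble built on the voting ensemble achieves the best and most stable results of all the methods.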
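The two ingredients named in the abstract, size-limited sampling of the majority class and a weighted vote over the resulting classifiers, can be sketched as follows. This is a rough illustration only: the abstract does not describe how the weights are obtained, so the weights here (for example, each member's validation accuracy) are an assumption, as are the function names and the +1/-1 vote coding. Training the SVM members themselves is omitted.

```python
import random


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Pair all positives with a size-limited random draw of negatives.

    positives, negatives: lists of samples; each returned subset would be
    used to train one ensemble member (an SVM in the abstract).
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return [positives + rng.sample(negatives, k) for _ in range(n_subsets)]


def weighted_vote(votes, weights):
    """Combine member decisions (+1 or -1) using per-member weights.

    The weighting scheme is hypothetical; the abstract gives no formula.
    """
    score = sum(w * v for v, w in zip(votes, weights))
    return 1 if score > 0 else -1


subsets = balanced_subsets([1, 2], [10, 11, 12, 13, 14], n_subsets=3)
print([len(s) for s in subsets])                      # [4, 4, 4]
print(weighted_vote([1, -1, -1], [0.9, 0.3, 0.3]))    # 1
print(weighted_vote([1, -1, -1], [1.0, 1.0, 1.0]))    # -1
```

With equal weights the function reduces to a plain majority vote, which is the baseline that the weighted correction is meant to improve on.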
15.
Remodeling of tissue microvasculature commonly promotes neoplastic growth; however, there is no imaging modality in oncology yet that noninvasively quantifies microvascular changes in clinical routine. Although blood capillaries cannot be resolved in typical magnetic resonance imaging (MRI) measurements, their geometry and distribution influence the integral nuclear magnetic resonance (NMR) signal from each macroscopic MRI voxel. We have numerically simulated the expected transverse relaxation in NMR voxels with different dimensions based on the realistic microvasculature in healthy and tumor-bearing mouse brains (U87 and GL261 glioblastoma). The 3D capillary structure in entire, undissected brains was acquired using light sheet fluorescence microscopy to produce large datasets of the highly resolved cerebrovasculature. Using these data, we trained support vector machines to classify virtual NMR voxels with different dimensions based on the simulated spin dephasing attributable to field inhomogeneities caused by the underlying vasculature. In prediction tests with previously blinded virtual voxels from healthy brain tissue and GL261 tumors, stable classification accuracies above 95% were reached. Our results indicate that high classification accuracies can be stably attained with achievable training set sizes and that larger MRI voxels facilitated increasingly successful classifications, even with small training datasets. We were able to prove that, theoretically, the transverse relaxation process can be harnessed to learn endogenous contrasts for single voxel tissue type classifications on tailored MRI acquisitions. If translatable to experimental MRI, this may augment diagnostic imaging in oncology with automated voxel-by-voxel signal interpretation to detect vascular pathologies.
16.
Breast cancer is a complex disease encompassing multiple tumour entities each with a characteristic morphology and behaviour. Current clinical practice relies on the recognition of various pathology prognostic factors to guide patient management, including histological type and grade, stage and biomarker receptor status. However, there is increasing concern that these parameters are of limited value for the accurate prediction of individual patient outcome. The introduction of genome-wide microarray-based expression profiling studies has allowed better understanding of the molecular underpinning of several characteristics of breast cancer, including histological grade and metastatic potential. Expression profiling has also facilitated the identification of prognostic and predictive gene expression signatures and novel therapeutic targets. Here we review the evolution of molecular classification of breast cancer, including special types, the implications for clinical management, limitations of findings thus far and predictions for the future.
17.
Introduction: Systemic lupus erythematosus (SLE) is the prototype of systemic autoimmune diseases. Patients with SLE display a wide spectrum of clinical and serological findings that can mislead and delay the diagnosis. Diagnostic criteria have not been developed yet, whereas several sets of classification criteria are available; however, none of them has 100% sensitivity and 100% specificity, i.e. the hallmark of diagnostic criteria. Nevertheless, classification criteria are often misused as diagnostic criteria, which may affect the timeliness of diagnosis and lead to more misdiagnosed cases. Areas covered: In this review, we compare old and new classification criteria, discussing their application and pinpointing their limitations in the management of patients. Moreover, we focus on current and novel biomarkers for SLE diagnosis, highlighting their predictive value and applicability in clinical practice. Expert commentary: SLE diagnosis still represents a challenge, remaining largely based on clinical judgment. Beyond diagnosis, even the classification of SLE remains challenging. Indeed, although classification of SLE seems to be achieved more frequently with the 2012 SLICC criteria than with the previous 1997 ACR criteria, the more recent 2012 set could still be improved. Notably, diagnostic and classification criteria should be applicable to any subject in the world, and consequently they should include immunological variables validated in different populations, which is still an unmet need.
18.
Translation of electroencephalographic (EEG) recordings into control signals for brain–computer interface (BCI) systems needs to be based on a robust classification of the various types of information. EEG-based BCI features are often noisy and likely to contain outliers. This contribution describes the application of a fuzzy support vector machine (FSVM) with a radial basis function kernel for classifying motor imagery tasks, while the statistical features over the set of the wavelet coefficients were extracted to characterize the time–frequency distribution of EEG signals. In the proposed FSVM classifier, a low fraction of support vectors was used as a criterion for choosing the kernel parameter and the trade-off parameter, together with the membership parameter based solely on training data. FSVM and support vector machine (SVM) classifiers outperformed the winner of the BCI Competition 2003 and other similar studies on the same Graz dataset, in terms of the competition criterion of the mutual information (MI), while the FSVM classifier yielded a better performance than the SVM approach. FSVM and SVM classifiers also performed much better than the winner of the BCI Competition 2005 on the same Graz dataset for the subject O3 according to the competition criterion of the maximal MI steepness, while the FSVM classifier outperformed the SVM method. The proposed FSVM model has potential in reducing the effects of noise or outliers in the online classification of EEG signals in BCIs.
19.
ECG heartbeat type detection and classification are regarded as important procedures since they can significantly help to provide an accurate automated diagnosis. This paper addresses the specific problem of detecting atrial premature beats, which have been demonstrated to be a marker of stroke risk or cardiac arrhythmias. The proposed methodology consists of a stage that estimates characteristics such as the morphology of the P wave and QRS complex as well as indices of prematurity, and an unsupervised stage in which the J-means algorithm separates heartbeat feature vectors into classes. Partition initialization is carried out by a Max–Min approach. The experimental data set is taken from the MIT-BIH arrhythmia database. The results support the reliability of the method: the achieved sensitivity and specificity are high (92.9% and 99.6%, respectively) for an average of 12 discovered clusters, which can be considered an appropriate number to separate heartbeat classes from recordings.
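A Max–Min initialization is commonly implemented as a farthest-first traversal: each new centre is the point whose distance to its nearest already chosen centre is largest. The abstract does not describe the exact variant used, so the sketch below is one common form. Seeding with the first feature vector is an assumption, the function name is hypothetical, and the J-means clustering step that would follow is not shown.

```python
import math


def max_min_init(points, k):
    """Pick k initial cluster centres by farthest-first (max-min) selection.

    points: sequence of equal-length numeric tuples (feature vectors).
    Assumes k <= number of distinct points.
    """
    centers = [points[0]]  # assumption: seed with the first feature vector
    while len(centers) < k:
        # Choose the point that maximizes the distance to its closest centre.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


beats = [(0, 0), (1, 0), (10, 0), (10, 10)]
print(max_min_init(beats, 3))  # [(0, 0), (10, 10), (10, 0)]
```

Spreading the initial centres this way tends to give rare beat types, such as premature beats, their own starting cluster instead of being absorbed into a dominant class.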
20.
Summary
Rhizopus niveus has been transformed to blasticidin S resistance by vectors containing the bacterial blasticidin S resistance gene under the control of a Rhizopus promoter. Southern analysis of the total DNA from transformants indicated that the introduced DNA was rearranged, and that one of the transformants harbored extrachromosomal plasmids with rearranged DNA. Using this transformation system, the introduction of pUBSR101, a plasmid carrying the Escherichia coli lacZ gene fused to the promoter and the N-terminal region of the R. niveus aspartic proteinase-II (RNAP-II) gene, resulted in an increase of β-galactosidase activity in the cell extract, indicating expression of the lacZ fusion gene in R. niveus. This is the first report of a transformation system for filamentous fungi using the blasticidin S resistance gene as a dominant selectable marker.