Similar literature
1.
Parallel cascade identification (PCI) is a method for approximating the behavior of a nonlinear system, from input/output training data, by constructing a parallel array of cascaded dynamic linear and static nonlinear elements. PCI has previously been shown to provide an effective means for classifying protein sequences into structure/function families. In the present study, PCI is used to distinguish proteins that are binding to adenosine triphosphate or guanine triphosphate molecules from those that are nonbinding. Classification accuracy of 87.1% using the hydrophobicity scale of Rose et al. (Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985), and 88.8% using Korenberg's SARAH1 scale, are obtained, as measured by tenfold cross-validation testing. Nearest-neighbor and K-nearest-neighbor (KNN) classifiers are constructed, and the resulting accuracy is, respectively, 88.0% and 90.8% on the SARAH1–encoded test data set, as measured by the above testing protocol. Significantly improved classification accuracy is achieved by combining PCI and KNN classifiers using quadratic discriminant analysis: accuracy rises from 87.9% (PCI) and 87.4% (KNN) to 96.5% for the combination, as measured by twofold cross-validation testing on the SARAH1–encoded test data set. © 2003 Biomedical Engineering Society. PAC2003: 8714Ee, 8715Cc, 8715Aa

2.
A recent paper introduced the approach of using nonlinear system identification as a means for automatically classifying protein sequences into their structure/function families. The particular technique utilized, known as parallel cascade identification (PCI), could train classifiers on a very limited set of exemplars from the protein families to be distinguished and still achieve impressively good two-way classifications. For the nonlinear system classifiers to have numerical inputs, each amino acid in the protein was mapped into a corresponding hydrophobicity value, and the resulting hydrophobicity profile was used in place of the primary amino acid sequence. While the ensuing classification accuracy was gratifying, the use of (Rose scale) hydrophobicity values had some disadvantages. These included representing multiple amino acids by the same value, weighting some amino acids more heavily than others, and covering a narrow numerical range, resulting in a poor input for system identification. This paper introduces binary and multilevel sequence codes to represent amino acids, for use in protein classification. The new binary and multilevel sequences, which are still able to encode information such as hydrophobicity, polarity, and charge, avoid the above disadvantages and increase classification accuracy. Indeed, over a much larger test set than in the original study, parallel cascade models using numerical profiles constructed with the new codes achieved slightly higher two-way classification rates than did hidden Markov models (HMMs) using the primary amino acid sequences, and combining PCI and HMM approaches increased accuracy. © 2000 Biomedical Engineering Society. PAC00: 8714Ee, 8715Cc, 3620Fz, 8715Aa

3.
Objective: The profusion of data accumulating in the form of medical records could be of great help for developing medical decision support systems. The objective of this paper is to present a methodology for designing data-driven medical diagnostic tools, based on neural network classifiers. Methods: The proposed approach adopts the radial basis function (RBF) neural network architecture and the non-symmetric fuzzy means (NSFM) training algorithm, which presents certain advantages including better approximation capabilities and shorter computational times. The novelty in this work consists of adapting the NSFM algorithm to train RBF classifiers, and suitably tailoring the evolutionary simulated annealing (ESA) technique to optimize the produced RBF models. The integration of ESA is critical as it helps the optimization procedure to escape from local minima, which could arise from the application of the traditional simulated annealing algorithm, and thus discover improved solutions. The resulting method is evaluated in nine different medical benchmark datasets, where the common objective is to train a suitable classifier. The evaluation includes a comparison with two different schemes for training classifiers, namely a standard RBF training technique and support vector machines (SVMs). Accuracy% and the Matthews Correlation Coefficient (MCC) are used for comparing the performance of the three classifiers. Results: Results show that the use of ESA helps to greatly improve the performance of the NSFM algorithm and provide satisfactory classification accuracy. In almost all benchmark datasets, the best solution found by the ESA-NSFM algorithm outperforms the results produced by the SFM algorithm and SVMs, considering either the accuracy% or the MCC criterion. Furthermore, in the majority of datasets, the average solution of the ESA-NSFM population is statistically significantly higher in terms of accuracy% and MCC at the 95% confidence level, compared to the global optimum solution that its rivals could achieve. As far as computational times are concerned, the proposed approach was found to be faster compared to SVMs. Conclusions: The results of this study suggest that the ESA-NSFM algorithm can form the basis of a generic method for knowledge extraction from data originating from different kinds of medical records. Testing the proposed approach on a number of benchmark datasets indicates that it provides increased diagnostic accuracy in comparison with two different classifier training methods.
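The Matthews Correlation Coefficient used as an evaluation criterion in this abstract can be computed directly from a binary confusion matrix. The following is a minimal Python sketch, assuming binary 0/1 labels; the function name `matthews_corrcoef_binary` and the convention of returning 0.0 when the denominator vanishes are illustrative assumptions, not details taken from the paper.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels encoded as 0/1.

    Hypothetical helper for illustration. Returns 0.0 when any row or
    column of the confusion matrix is empty (a common convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, the coefficient ranges from -1 (total disagreement) to +1 (perfect prediction) and stays informative when the two classes are of very different sizes, which is one plausible reason to report both measures side by side.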

4.
Representation, identification, and modeling are investigated for nonlinear biomedical systems. We begin by considering the conditions under which a nonlinear system can be represented or accurately approximated by a Volterra series (or functional expansion). Next, we examine system identification through estimating the kernels in a Volterra functional expansion approximation for the system. A recent kernel estimation technique that has proved to be effective in a number of biomedical applications is investigated as to running time and demonstrated on both clean and noisy data records; it is then used to illustrate identification of cascades of alternating dynamic linear and static nonlinear systems, both single-input single-output and multivariable cascades. During the presentation, we critically examine some interesting biological applications of kernel estimation techniques.
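For orientation, a finite-order Volterra functional expansion is usually written as shown below. This is the standard textbook form rather than a formula quoted from the paper; the symbols x (input), y (output), k_n (n-th order kernel) and the truncation order N are notation assumed here.

```latex
y(t) = k_0 + \sum_{n=1}^{N} \int_{0}^{\infty}\!\cdots\!\int_{0}^{\infty}
       k_n(\tau_1,\ldots,\tau_n)\,\prod_{i=1}^{n} x(t-\tau_i)\,d\tau_i
```

The kernels k_n are the quantities that kernel estimation techniques try to recover from input/output records; the first-order term alone reduces to the familiar convolution of a linear system.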

6.
Accuracy plays a vital role in the medical field because it concerns the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results. A specific classifier may be better than others for a specific dataset, but another classifier could perform better for some other dataset. Ensembles of classifiers have proved to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model, called “HM-BagMoov”, overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results shows that the ensemble framework achieved the highest accuracy, sensitivity and F-Measure when compared with individual classifiers for all the diseases. In addition, the ensemble framework achieved the highest accuracy when compared with state-of-the-art techniques. An application named “IntelliHealth” has also been developed based on the proposed model; it may be used by hospitals and doctors for diagnostic advice.

7.
8.
Radiologists are adept at recognizing the character and extent of lung parenchymal abnormalities in computed tomography (CT) scans. However, the inconsistent differential diagnosis due to subjective aggregation necessitates the exploration of automated classification based on supervised or unsupervised learning. The robustness of supervised learning depends on the training samples. Towards optimizing emphysema classification, we introduce a physician-in-the-loop feedback approach to minimize ambiguity in the selected training samples. An experienced thoracic radiologist selected 412 regions of interest (ROIs) across 15 datasets to represent 124, 129, 139 and 20 training samples of mild, moderate, severe emphysema and normal appearance, respectively. Using multi-view (multiple metrics to capture complementary features) inductive learning, an ensemble of seven un-optimized support vector models (SVM), each based on a specific metric, was constructed in less than 6 s. The training samples were classified using the seven SVM models and consensus labels were created using majority voting. In the active relearning phase, the ensemble-expert label conflicts were resolved by the expert. The efficacy and generality of active relearning feedback was assessed in the optimized parameter space of six general purpose classifiers across the seven dissimilarity metrics. The proposed just-in-time active relearning feedback with un-optimized SVMs yielded a 15% increase in classification accuracy and a 25% reduction in the number of support vectors. The average improvement in accuracy of six classifiers in their optimized parameter space was 21%. The proposed cooperative feedback method enhances the quality of training samples used to construct automated classification of emphysematous CT scans. Such an approach could lead to substantial improvement in quantification of emphysema.
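The consensus-labelling step in this abstract (majority voting over the seven models, followed by flagging the samples where the ensemble and the expert disagree) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, each model's output is assumed to be a plain list of class labels, and ties are broken in favour of the label encountered first, a rule the abstract does not specify.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def consensus_labels(per_model_predictions):
    """Combine predictions from several models into one label per sample.

    per_model_predictions: list of lists, one inner list per model,
    each holding that model's label for every sample (same order).
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


def label_conflicts(consensus, expert):
    """Indices of samples where the ensemble and the expert disagree."""
    return [i for i, (c, e) in enumerate(zip(consensus, expert)) if c != e]


# Hypothetical predictions from seven models on three samples.
predictions = [
    ["mild", "severe", "normal"],
    ["mild", "severe", "normal"],
    ["mild", "moderate", "normal"],
    ["moderate", "severe", "normal"],
    ["mild", "severe", "mild"],
    ["mild", "moderate", "normal"],
    ["moderate", "severe", "normal"],
]
expert = ["mild", "moderate", "normal"]

consensus = consensus_labels(predictions)
to_review = label_conflicts(consensus, expert)
```

In a workflow like the one described, the indices in `to_review` would be the samples handed back to the expert for resolution before retraining.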

9.
We consider the problem of classification in noisy, high-dimensional, and class-imbalanced protein datasets. In order to design a complete classification system, we use a three-stage machine learning framework consisting of a feature selection stage, a method addressing noise and class-imbalance, and a method for combining biologically related tasks through a prior-knowledge based clustering. In the first stage, we employ Fisher's permutation test as a feature selection filter. Comparisons with the alternative criteria show that it may be favorable for typical protein datasets. In the second stage, noise and class imbalance are addressed by using minority class over-sampling, majority class under-sampling, and ensemble learning. The performance of logistic regression models, decision trees, and neural networks is systematically evaluated. The experimental results show that in many cases ensembles of logistic regression classifiers may outperform more expressive models due to their robustness to noise and low sample density in a high-dimensional feature space. However, ensembles of neural networks may be the best solution for large datasets. In the third stage, we use prior knowledge to partition unlabeled data such that the class distributions among non-overlapping clusters significantly differ. In our experiments, training classifiers specialized to the class distributions of each cluster resulted in a further decrease in classification error.

10.
Previously, we introduced a distance (similarity)-based mapping for the visualization of high-dimensional patterns and their relative relationships. The mapping preserves exactly the original distances from all points to any two reference patterns in a special two-dimensional coordinate system, the relative distance plane (RDP). We extend the RDP mapping's applicability from visualization to classification. Several of the classifiers use the RDP directly. These include the standard linear discriminant analysis (LDA), nearest neighbor classifiers, and a transvariation probabilities-based classification method that is natural in the RDP. Several reference directions can also be combined to create new coordinate systems in which arbitrary classifiers can be developed. We obtain increased confidence in the classification results by cycling through all possible reference pairs and computing a misclassification-based weighted accuracy. The classification results on several high-dimensional biomedical datasets are compared.

11.
Extreme learning machine (ELM) is an effective machine learning technique with simple theory and fast implementation, which has gained increasing interest from various research fields recently. A new method that combines ELM with a probabilistic model method is proposed in this paper to classify the electroencephalography (EEG) signals in synchronous brain–computer interface (BCI) systems. In the proposed method, the softmax function is used to convert the ELM output to classification probability. The Chernoff error bound, deduced from the Bayesian probabilistic model in the training process, is adopted as the weight in the discrimination process. Since the proposed method makes use of the knowledge from all preceding training datasets, its discriminating performance improves cumulatively. In the test experiments based on the datasets from BCI competitions, the proposed method is compared with other classification methods, including the linear discriminant analysis, support vector machine, ELM and weighted probabilistic model methods. For comparison, the mutual information, classification accuracy and information transfer rate are considered as the evaluation indicators for these classifiers. The results demonstrate that our method shows competitive performance against other methods.
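The softmax step mentioned in this abstract, which turns raw classifier outputs into class probabilities, can be sketched as below. This is a minimal Python illustration, not the authors' implementation: the function name `softmax` and the idea of passing one list of raw output scores per trial are assumptions made for the example, and the Chernoff-bound weighting described in the abstract is not reproduced.

```python
import math


def softmax(scores):
    """Convert a list of raw output scores into probabilities that sum to 1.

    Subtracting the maximum first leaves the result unchanged but avoids
    overflow in exp() for large scores.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Because softmax is monotonic, the class with the largest raw output also receives the largest probability; what the conversion adds is a normalized confidence value that can be weighted or combined across classifiers.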

12.
This paper considers some aspects of the application of control theory to endocrine regulation and control. Consideration is given to both the structural and functional aspects of various control concepts and ideas in this context. For single-input, single-output feedback control structures, emphasis is placed on loop gain and its importance in establishing the functional capability of such structures. Examples are given of both functional and nonfunctional feedback structures proposed by endocrinologists. For multi-input, multi-output structures, emphasis is placed on the concept of input and output decoupling, and the possible applicability of these concepts to hormonal control system interrelationships is illustrated. Finally, the possible application of optimal control theory to endocrine regulation and control is illustrated by means of a naive and highly simplified example involving control of the thyroid gland by the pituitary gland, and several surprising and interesting implications are shown to be implicit in the resulting control structure. Some of the material in this paper was developed while the author was an MHTP Fellow, Department of Psychiatry and Laboratory of Environmental Neurobiology, UCLA School of Medicine.

13.
Many of the current procedures for detecting coding regions on human DNA sequences combine a number of individual techniques such as discriminant analysis and neural net methods. Recent papers have used techniques from nonlinear systems identification, in particular, parallel cascade identification (PCI), as one means for classifying protein sequences into their structure/function groups. In the present paper, PCI is used in a pilot study to distinguish exon (coding) from intron (noncoding; interspersed within genes) human DNA sequences. Only the first exon and first intron sequences with known boundaries in genomic DNA from the T-cell receptor locus were used for training. Then, the parallel cascade classifiers were able to achieve classification rates of about 89% on novel sequences in a test set, and averaged about 82% when results of a blind test were included. In testing over a much wider range of human nucleotide sequences, PCI classifiers averaged 83.6% correct classifications. These results indicate that parallel cascade classifiers may be useful components in future coding region detection programs. © 2002 Biomedical Engineering Society. PAC2002: 8715Cc, 8714Gg, 8715Aa

14.
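Objective: Protein transduction domains (PTDs) are short peptides that can carry molecules across the cell membrane; this study uses support vector machines to predict PTDs among peptide fragments. Methods: Peptide sequences from the SwissProt database were described with 68 features capturing their global and local physicochemical properties and spatial structural characteristics, and PTDs were predicted using a support vector machine (SVM) and a transductive support vector machine (TSVM) combined with clustering. Results: Five rounds of cross-validation showed a prediction accuracy of (94±4)% for the TSVM and (94±5)% for the SVM. The two methods jointly predicted 1210 candidate PTD fragments. Conclusion: Both the TSVM and the SVM showed good predictive performance, and the predicted PTDs provide a basis for the targeted experimental discovery and confirmation of PTDs.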

15.
The goal of this study is to evaluate the efficacy of deep convolutional neural networks (DCNNs) in differentiating subtle, intermediate, and more obvious image differences in radiography. Three different datasets were created, which included presence/absence of the endotracheal (ET) tube (n = 300), low/normal position of the ET tube (n = 300), and chest/abdominal radiographs (n = 120). The datasets were split into training, validation, and test sets. Both untrained and pre-trained deep neural networks were employed, including AlexNet and GoogLeNet classifiers, using the Caffe framework. Data augmentation was performed for the presence/absence and low/normal ET tube datasets. Receiver operating characteristic (ROC) curves, area under the curve (AUC), and 95% confidence intervals were calculated. Statistical differences of the AUCs were determined using a non-parametric approach. The pre-trained AlexNet and GoogLeNet classifiers had perfect accuracy (AUC 1.00) in differentiating chest vs. abdominal radiographs, using only 45 training cases. For more difficult datasets, including the presence/absence and low/normal position endotracheal tubes, more training cases, pre-trained networks, and data-augmentation approaches were helpful to increase accuracy. The best-performing network for classifying presence vs. absence of an ET tube was still very accurate with an AUC of 0.99. However, for the most difficult dataset, such as low vs. normal position of the endotracheal tube, DCNNs did not perform as well, but achieved a reasonable AUC of 0.81.
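The AUC values reported in this abstract summarize how well a classifier's scores separate the two classes. One way to see what the number means is the rank-based definition: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. Below is a minimal Python sketch of that definition, assuming binary 0/1 labels; the function name `roc_auc` is hypothetical, and the study's own procedure for confidence intervals and for comparing AUCs is not reproduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 ground-truth labels.
    scores: classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 1.00 therefore means every positive case outscored every negative one, while 0.5 corresponds to chance-level ranking. The quadratic loop is fine for test sets of a few hundred images; a sort-based version would be preferable for large datasets.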

16.
Successful secondary structure predictions provide a starting point for direct tertiary structure modelling, and can also significantly improve sequence analysis and sequence-structure threading for aiding in structure and function determination. Hence improving the accuracy of secondary structure prediction is essential for the future development of the whole field of protein research. In this work we present several multi-classifiers that combine the predictions of the best current classifiers available on the Internet. Our results show that combining the predictions of a set of classifiers by creating composite classifiers is a fruitful approach. We have created multi-classifiers that are more accurate than any of the component classifiers. The multi-classifiers are based on Bayesian networks. They are validated with 9 different datasets. Their predictive accuracy outperforms the best secondary structure predictors by 1.21% on average. Our main contributions are: (i) we improved the best known predictive accuracy by 1.21%, (ii) our best results have been obtained with a new semi-naïve Bayes approach named Pazzani-EDA and (iii) our multi-classifiers combine the predictions of previously built classifiers, obtained through the Internet thanks to our development of a Java application.

17.
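Objective: To build a combined clinical-radiomics model based on 18F-FDG PET/CT for distinguishing adenocarcinoma from squamous cell carcinoma in non-small cell lung cancer. Methods: A total of 120 patients from the Chest Hospital affiliated with Shanghai Jiao Tong University with pathologically confirmed adenocarcinoma (65 cases) or squamous cell carcinoma (55 cases) were collected retrospectively. From the preprocessed CT and PET images, 1218 and 108 radiomics features were extracted, respectively, and 10 clinical factors were included. The chi-square test and the Wilcoxon test were used to screen the clinical features, and the Relief algorithm and the least absolute shrinkage and selection operator (LASSO) were used to screen the radiomics features. Clinical, radiomics, and combined models were each built with 6 machine learning classifiers. The classification ability of the models was evaluated with receiver operating characteristic (ROC) curves and the area under the curve (AUC). Results: The combined model showed the highest AUC and accuracy in both the training set and the test set, with the random forest (RF) and Bagging classifiers performing best. After five-fold cross-validation, the AUC and accuracy in the training set were 0.92±0.03 and 0.86±0.06 for RF and 0.92±0.02 and 0.83±0.02 for Bagging; in the test set they were 0.92 and 0.81 for RF and 0.91 and 0.86 for Bagging. Conclusion: Combining 1...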

18.
A new approach based on adaptive neuro-fuzzy inference system (ANFIS) was presented for detection of erythemato-squamous diseases. The domain contained records of patients with known diagnosis. Given a training set of such records, the ANFIS classifiers learned how to differentiate a new case in the domain. Six ANFIS classifiers were used to detect the six erythemato-squamous diseases when 34 features defining six disease indications were used as inputs. To improve diagnostic accuracy, a seventh ANFIS classifier (combining ANFIS) was trained using the outputs of the six ANFIS classifiers as input data. The proposed ANFIS model combined the neural network adaptive capabilities and the fuzzy logic qualitative approach. Some conclusions concerning the impacts of features on the detection of erythemato-squamous diseases were obtained through analysis of the ANFIS. The performances of the ANFIS model were evaluated in terms of training performances and classification accuracies and the results confirmed that the proposed ANFIS model has some potential in detecting the erythemato-squamous diseases. The ANFIS model achieved accuracy rates which were higher than those of the stand-alone neural network model.

19.
OBJECTIVE: The auditory brainstem response (ABR) is an evoked response obtained from brain electrical activity when an auditory stimulus is applied to the ear. An audiologist can determine the threshold level of hearing by applying stimuli at reducing levels of intensity, and can also diagnose various otological, audiological, and neurological abnormalities by examining the morphology of the waveform and the latencies of the individual waves. This is a subjective process requiring considerable expertise. The aim of this research was to develop software classification models to assist the audiologist with an automated detection of the ABR waveform and also to provide objectivity and consistency in this detection. MATERIALS AND METHODS: The dataset used in this study consisted of 550 waveforms derived from tests using a range of stimulus levels applied to 85 subjects ranging in hearing ability. Each waveform had been classified by a human expert as 'response=Yes' or 'response=No'. Individual software classification models were generated using time, frequency and cross-correlation measures. Classification employed both artificial neural networks (NNs) and the C5.0 decision tree algorithm. Accuracies were validated using six-fold cross-validation, and by randomising training, validation and test datasets. RESULTS: The result was a two-stage classification process whereby strong responses were classified to an accuracy of 95.6% in the first stage. This used a ratio of post-stimulus to pre-stimulus power in the time domain, with power measures at 200, 500 and 900 Hz in the frequency domain. In the second stage, outputs from time, frequency and cross-correlation classifiers were combined using the Dempster-Shafer method to produce a hybrid model with an accuracy of 85% (126 repeat waveforms). CONCLUSION: By combining the different approaches a hybrid system has been created that emulates the approach used by an audiologist in analysing an ABR waveform. Interpretation did not rely on one particular feature but brought together power and frequency analysis as well as consistency of subaverages. This provided a system that enhanced robustness to artefacts while maintaining classification accuracy.
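The second-stage fusion in this abstract relies on the Dempster-Shafer method. As a rough illustration of what combining two classifiers' evidence looks like under Dempster's rule, here is a minimal Python sketch over the two-outcome frame {Yes, No}. The function name, the representation of mass functions as dictionaries keyed by frozensets, and the example masses are all assumptions made for this sketch; how the study derived masses from its time, frequency and cross-correlation classifiers is not stated in the abstract.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping frozenset hypotheses to masses that sum to 1.
    Mass falling on empty intersections is treated as conflict and the
    remaining mass is renormalized.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            overlap = a & b
            if overlap:
                combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


YES = frozenset({"Yes"})
NO = frozenset({"No"})
EITHER = YES | NO  # mass left unassigned, i.e. "don't know"

# Hypothetical evidence from two classifiers about one waveform.
time_domain = {YES: 0.6, EITHER: 0.4}
frequency_domain = {YES: 0.5, NO: 0.2, EITHER: 0.3}

fused = dempster_combine(time_domain, frequency_domain)
```

A third source can be folded in by calling `dempster_combine(fused, third_source)`, since the rule is associative. The explicit "don't know" mass is what distinguishes this scheme from simply multiplying probabilities: a classifier that is unsure contributes little rather than pulling the result toward 50/50.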

20.
Brain signal variation across different subjects and sessions significantly impairs the accuracy of most brain–computer interface (BCI) systems. Herein, we present a classification algorithm that minimizes such variation, using linear programming support-vector machines (LP-SVM) and their extension to multiple kernel learning methods. The minimization is based on the decision boundaries formed in classifiers’ feature spaces and their relation to BCI variation. Specifically, we estimate subject/session-invariant features in the reproducing kernel Hilbert spaces (RKHS) induced with Gaussian kernels. The idea is to construct multiple subject/session-dependent RKHS and to perform classification with LP-SVMs. To evaluate the performance of the algorithm, we applied it to oxy-hemoglobin data sets acquired from eight sessions and seven subjects as they performed two different mental tasks. Results show that our classifiers maintain good performance when applied to random patterns across varying sessions/subjects.
