Similar literature (20 records found)
1.
Evolutionary computing for knowledge discovery in medical diagnosis (total citations: 6; self-citations: 0; citations by others: 6)
One of the major challenges in the medical domain is the extraction of comprehensible knowledge from medical diagnosis data. In this paper, a two-phase hybrid evolutionary classification technique is proposed to extract classification rules that can be used in clinical practice for better understanding and prevention of unwanted medical events. In the first phase, a hybrid evolutionary algorithm (EA) is utilized to confine the search space by evolving a pool of good candidate rules, e.g. genetic programming (GP) is applied to evolve nominal attributes for free structured rules and genetic algorithm (GA) is used to optimize the numeric attributes for concise classification rules without the need of discretization. These candidate rules are then used in the second phase to optimize the order and number of rules in the evolution for forming accurate and comprehensible rule sets. The proposed evolutionary classifier (EvoC) is validated upon hepatitis and breast cancer datasets obtained from the UCI machine-learning repository. Simulation results show that the evolutionary classifier produces comprehensible rules and good classification accuracy for the medical datasets. Results obtained from t-tests further justify its robustness and invariance to random partition of datasets.

2.
This paper concerns an application of evolutionary feature weighting for diagnosis support in neuropathology. The original data in the classification task are the microscopic images of ten classes of central nervous system (CNS) neuroepithelial tumors. These images are segmented and described by the features characterizing regions resulting from the segmentation process. The final features are in part irrelevant. Thus, we employ an evolutionary algorithm to reduce the number of irrelevant attributes, using the predictive accuracy of a classifier ('wrapper' approach) as an individual's fitness measure. The novelty of our approach consists in the application of an evolutionary algorithm for feature weighting, not only for feature selection. The weights obtained give quantitative information about the relative importance of the features. The results of computational experiments show a significant improvement of predictive accuracy of the evolutionarily found feature sets with respect to the original feature set.
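
The wrapper idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the classifier (weighted 1-nearest-neighbour), the leave-one-out fitness, the mutation-only evolutionary loop, and every name and parameter below (`loo_accuracy`, `evolve_weights`, `sigma`, `pop_size`) are assumptions chosen for illustration.

```python
import random


def loo_accuracy(weights, X, y):
    """Leave-one-out accuracy of a weighted 1-nearest-neighbour classifier."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            # Weighted squared Euclidean distance: a weight of 0 ignores a feature.
            d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def evolve_weights(X, y, generations=30, pop_size=10, sigma=0.2, seed=0):
    """Evolve feature weights in [0, 1]; fitness is the wrapper accuracy."""
    rng = random.Random(seed)
    n = len(X[0])
    # Start from the unweighted baseline plus random weight vectors.
    population = [[1.0] * n] + [
        [rng.random() for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        scored = sorted(population, key=lambda w: loo_accuracy(w, X, y), reverse=True)
        parents = scored[: pop_size // 2]
        # Children are Gaussian mutations of randomly chosen parents.
        children = [
            [min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # parents survive, so fitness never drops
    return max(population, key=lambda w: loo_accuracy(w, X, y))
```

Because the parents are carried over unchanged, the returned weights are never worse on the wrapper fitness than the unweighted baseline; the final weights can then be read as a rough indication of relative feature importance.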

3.
DNA microarray experiments, which generate thousands of gene expression measurements, are used to collect information from tissue and cell samples regarding gene expression differences that could be useful for disease diagnosis, distinction of the specific tumor type, etc. One important application of gene expression microarray data is the classification of samples into known categories. As DNA microarray technology measures the gene expression en masse, this has resulted in data with the number of features (genes) far exceeding the number of samples. As the predictive accuracy of supervised classifiers that try to discriminate between the classes of the problem decays with the existence of irrelevant and redundant features, a dimensionality reduction process is essential. We propose the application of a gene selection process, which also enables the biology researcher to focus on promising gene candidates that actively contribute to classification in these large scale microarrays. Two basic approaches for feature selection appear in machine learning and pattern recognition literature: the filter and wrapper techniques. Filter procedures are used in most of the works in the area of DNA microarrays. In this work, a comparison between a group of different filter metrics and a wrapper sequential search procedure is carried out. The comparison is performed in two well-known DNA microarray datasets by the use of four classic supervised classifiers. The study is carried out over the original-continuous and three-intervals discretized gene expression data. While two well-known filter metrics are proposed for continuous data, four classic filter measures are used over discretized data. The same wrapper approach is used for both continuous and discretized data.
The application of filter and wrapper gene selection procedures leads to considerably better accuracy results in comparison to the non-gene selection approach, coupled with interesting and notable dimensionality reductions. Although the wrapper approach is mostly more accurate than the filter metrics, this improvement comes at a considerable computational cost. We note that most of the genes selected by the proposed filter and wrapper procedures in discrete and continuous microarray data appear in the lists of relevant-informative genes detected by previous studies over these datasets. The aim of this work is to make contributions in the field of the gene selection task in DNA microarray datasets. By an extensive comparison with more popular filter techniques, we would like to make contributions in the expansion and study of the wrapper approach in this type of domain.

4.
In this study, an efficient method to perform supervised classification of surface electromyogram (EMG) signals is proposed. The method is based on the choice of a relevant representation space and its optimisation with respect to a training set. As EMG signals are the summation of compact-support waveforms (the motor unit action potentials), a natural tool for their representation is the discrete dyadic wavelet transform. The feature space was thus built from the marginals of a discrete wavelet decomposition. The mother wavelet was designed to minimise the probability of classification error estimated on the learning set (supervised classification). As a representative example, the method was applied to simulated surface EMG signals generated by motor units with different degrees of short-term synchronisation. The proposed approach was able to distinguish surface EMG signals with degrees of synchronisation that differed by 10%, with a misclassification rate of 8%. The performance of a spectral-based classification (error rate approximately 33%) and of the classification with Daubechies wavelet (21%) was significantly poorer than with the proposed wavelet optimisation. The method can be used for a number of different application fields of surface EMG classification, as the feature space is adapted to the characteristics of the signal that discriminate between classes. An erratum to this article is available.

5.
OBJECTIVE: In the present paper, we describe an application of case-based retrieval to the domain of end stage renal failure patients, treated with hemodialysis. MATERIALS AND METHODS: Defining a dialysis session as a case, retrieval of past similar cases has to operate both on static and on dynamic features, since most of the monitoring variables of a dialysis session are time series. Retrieval is then articulated as a two-step procedure: (1) classification, based on static features and (2) intra-class retrieval, in which dynamic features are considered. As regards step (2), we concentrate on a classical dimensionality reduction technique for time series allowing for efficient indexing, namely discrete Fourier transform (DFT). Thanks to specific index structures (i.e. k-d trees), range queries (on local feature similarity) can be efficiently performed on our case base, allowing the physician to examine the most similar stored dialysis sessions with respect to the current one. RESULTS: The retrieval tool has been positively tested on real patients' data, coming from the nephrology and dialysis unit of the Vigevano hospital, in Italy. CONCLUSIONS: The overall system can be seen as a means for supporting quality assessment of the hemodialysis service, providing a useful input from the knowledge management perspective.
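
Step (2) above relies on a standard property of the DFT: with unitary scaling, the distance computed on the first few coefficients never exceeds the true Euclidean distance, so a range query on the reduced features cannot miss a true match. The sketch below illustrates that filter-and-refine idea only. It uses a linear scan instead of a k-d tree, and all function names, the number of coefficients `k`, and the `radius` parameter are assumptions, not details from the paper.

```python
import cmath
import math


def dft_features(series, k):
    """Return the first k DFT coefficients, scaled by 1/sqrt(n) (unitary DFT)."""
    n = len(series)
    coeffs = []
    for f in range(k):
        total = sum(
            x * cmath.exp(-2j * math.pi * f * t / n)
            for t, x in enumerate(series)
        )
        coeffs.append(total / math.sqrt(n))
    return coeffs


def feature_distance(a, b):
    """Euclidean distance between two complex coefficient vectors."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def euclidean(a, b):
    """Euclidean distance between two equal-length real series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(case_base, query, k, radius):
    """Return names of stored series within `radius` of the query.

    The reduced distance lower-bounds the true distance (Parseval), so the
    first test only discards non-matches; survivors are checked exactly.
    """
    q_feat = dft_features(query, k)
    hits = []
    for name, series in case_base.items():
        if feature_distance(dft_features(series, k), q_feat) <= radius:
            if euclidean(series, query) <= radius:
                hits.append(name)
    return hits
```

In a real index the reduced feature vectors would be computed once and stored in a spatial structure such as a k-d tree; the linear scan here only keeps the example short.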

6.
We present a general approach for evaluating and visualizing evolutionary dynamics of self-replicators using a graph-based representation for genealogy. Through a transformation from the space of species and mutations to the space of nodes and links, evolutionary dynamics are understood as a flow in graph space. A formalism is introduced to quantify such genealogical flows in terms of the complete history of localized evolutionary events recorded at the finest level of detail. Represented in a multidimensional viewing space, collective dynamical properties of an evolving genealogy are characterized in the form of aggregate flows. We demonstrate the effectiveness of this approach by using it to compare the evolutionary exploration behavior of self-replicating loops under two different environmental settings.

7.
Brain–computer interfacing (BCI) has been the most researched technology in neuroprosthesis in the last two decades. Feature extractors and classifiers play an important role in BCI research for the generation of suitable control signals to drive an assistive device. Due to the high dimensionality of feature vectors in practical BCI systems, implementation of efficient feature selection algorithms has been an integral area of research in the past decade. This article proposes an efficient feature selection technique, realized by means of an evolutionary algorithm, which attempts to overcome some of the shortcomings of several state-of-the-art approaches in this field. The outlined scheme produces a subset of salient features which improves the classification accuracy while maintaining a trade-off with the computational speed of the complete scheme. An efficient memetic algorithm is also proposed for the optimization. Extensive experimental validations have been conducted on two real-world datasets to establish the efficacy of our approach. We have compared our approach to existing algorithms and have established the superiority of our algorithm to the rest.

8.
An accurate and computationally efficient means of classifying surface myoelectric signal patterns has been the subject of considerable research effort in recent years. Effective feature extraction is crucial to reliable classification and, in the quest to improve the accuracy of transient myoelectric signal pattern classification, an ensemble of time-frequency based representations is proposed. It is shown that feature sets based upon the short-time Fourier transform, the wavelet transform, and the wavelet packet transform provide an effective representation for classification, provided that they are subject to an appropriate form of dimensionality reduction.

9.
The chief goal of the present study is to examine the validity of a dimensional approach to the classification of depressive disorders. Each of the major diagnostic criteria for depression, including symptoms, duration and frequency, is examined with respect to a series of clinical validators. The sample comprises a cohort of 591 individuals from the total population of 18- to 19-year-olds in Zurich, Switzerland, who were followed for a period of 15 years. The results revealed that: (1) depression may be better represented on a continuum than as a discrete category; (2) there is a direct relationship between the number of symptoms of depression, the frequency and the duration of depressive episodes and indicators of validity of depression; and (3) in addition to the number of symptoms of depression, a combination of frequency and duration criteria enhances the validity of the classification of depression.

10.
OBJECTIVE: This research is motivated by the issue of classifying illnesses of chronically ill patients for decision support in clinical settings. Our main objective is to propose multi-label classification of multivariate time series contained in medical records of chronically ill patients, by means of quantization methods, such as bag of words (BoW), and multi-label classification algorithms. Our second objective is to compare supervised dimensionality reduction techniques to state-of-the-art multi-label classification algorithms. The hypothesis is that kernel methods and locality preserving projections make such algorithms good candidates to study multi-label medical time series. METHODS: We combine BoW and supervised dimensionality reduction algorithms to perform multi-label classification on health records of chronically ill patients. The considered algorithms are compared with state-of-the-art multi-label classifiers in two real world datasets. The Portavita dataset contains 525 diabetes type 2 (DT2) patients, with co-morbidities of DT2 such as hypertension, dyslipidemia, and microvascular or macrovascular issues. The MIMIC II dataset contains 2635 patients affected by thyroid disease, diabetes mellitus, lipoid metabolism disease, fluid electrolyte disease, hypertensive disease, thrombosis, hypotension, chronic obstructive pulmonary disease (COPD), liver disease and kidney disease. The algorithms are evaluated using multi-label evaluation metrics such as hamming loss, one error, coverage, ranking loss, and average precision. RESULTS: Non-linear dimensionality reduction approaches behave well on medical time series quantized using the BoW algorithm, with results comparable to state-of-the-art multi-label classification algorithms.
Chaining the projected features has a positive impact on the performance of the algorithm with respect to pure binary relevance approaches. CONCLUSIONS: The evaluation highlights the feasibility of representing medical health records using the BoW for multi-label classification tasks. The study also highlights that dimensionality reduction algorithms based on kernel methods, locality preserving projections or both are good candidates to deal with multi-label classification tasks in medical time series with many missing values and high label density.

11.
INTRODUCTION: Numeric time series are present in a very wide range of domains, including many branches of medicine. Data mining techniques have proved to be useful for knowledge discovery in this type of data and for supporting decision-making processes. OBJECTIVES: The overall objective is to classify time series based on the discovery of frequent patterns. These patterns will be discovered in symbolic sequences obtained from the time series data by means of a temporal abstraction process. METHODS: Firstly, we transform numeric time series into symbolic time sequences, where the symbols aim to represent the relevant domain concepts. These symbols can be defined using either public or expert domain knowledge. Then we apply a symbolic pattern discovery technique to the output symbolic sequences. This technique identifies the subsequences frequently found in a population group. These subsequences (patterns) are representative of population groups. Finally, we employ a classification technique based on the identified patterns in order to classify new individuals. Thanks to the inclusion of domain knowledge, the classification results can be explained using domain terminology. This makes the results easier to interpret for the domain specialist (physician). RESULTS: This method has been applied to brainstem auditory evoked potentials (BAEPs) time series. Preliminary experiments were carried out to analyse several aspects of the method including the best configuration of the pattern discovery technique parameters. We then applied the method to the BAEPs of 83 individuals belonging to four classes (healthy, conductive hearing loss, vestibular schwannoma with brainstem involvement and vestibular schwannoma with 8th-nerve involvement). According to the results of the cross-validation, overall accuracy was 99.4%, sensitivity (recall) was 97.6% and specificity was 100% (no false positives). CONCLUSION: The proposed method effectively reduces dimensionality.
Additionally, if the symbolic transformation includes the right domain knowledge, the method arguably outputs a data representation that denotes the relevant domain concepts more clearly. The method is capable of finding patterns in BAEPs time series and is very accurate at correctly predicting whether or not new patients have an auditory-related disorder.
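
The three steps described in this abstract (symbolic transformation, frequent pattern discovery, pattern-based classification) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the published method: the two-threshold symbol alphabet, the restriction to contiguous fixed-length subsequences, the overlap-count classifier, and all function names are hypothetical.

```python
from collections import Counter


def symbolize(series, low, high):
    """Map each numeric value to a symbol: L (low), N (normal) or H (high)."""
    return "".join("L" if x < low else "H" if x > high else "N" for x in series)


def subsequences(seq, length):
    """All distinct contiguous subsequences of the given length."""
    return {seq[i:i + length] for i in range(len(seq) - length + 1)}


def frequent_patterns(sequences, length, min_support):
    """Patterns found in at least a fraction `min_support` of the sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(subsequences(seq, length))
    threshold = min_support * len(sequences)
    return {p for p, c in counts.items() if c >= threshold}


def classify(seq, patterns_by_class, length):
    """Assign the class whose frequent patterns overlap most with `seq`."""
    present = subsequences(seq, length)
    return max(
        sorted(patterns_by_class),  # sorted so that ties resolve deterministically
        key=lambda label: len(present & patterns_by_class[label]),
    )
```

Because each pattern is a string over domain symbols, the patterns that drove a decision can be shown to a specialist directly, which is the interpretability benefit the abstract points to.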

12.
The genomic tree as revealed from whole proteome comparisons. (total citations: 15; self-citations: 4; citations by others: 11)
The availability of a number of complete cellular genome sequences allows the development of organisms' classification, taking into account their genome content, the loss or acquisition of genes, and overall gene similarities as signatures of common ancestry. On the basis of correspondence analysis and hierarchical classification methods, a methodological framework is introduced here for the classification of the available 20 completely sequenced genomes and partial information for Schizosaccharomyces pombe, Homo sapiens, and Mus musculus. The outcome of such an analysis leads to a classification of genomes that we call a genomic tree. Although these trees are phenograms, they carry with them strong phylogenetic signatures and are remarkably similar to 16S-like rRNA-based phylogenies. Our results suggest that duplication and deletion events that took place through evolutionary time were globally similar in related organisms. The genomic trees presented here place the Archaea in the proximity of the Bacteria when the whole gene content of each organism is considered, and when ancestral gene duplications are eliminated. Genomic trees represent an additional approach for the understanding of evolution at the genomic level and may contribute to the proper assessment of the evolutionary relationships between extant species.

13.
This study investigates the effect of feature dimensionality reduction strategies on the classification of surface electromyography (EMG) signals toward developing a practical myoelectric control system. Two dimensionality reduction strategies, feature selection and feature projection, were each tested on both EMG feature sets. A feature selection based myoelectric pattern recognition system was introduced to select the features by eliminating the redundant features of EMG recordings instead of directly choosing a subset of EMG channels. The Markov random field (MRF) method and a forward orthogonal search algorithm were each employed to evaluate the contribution of each individual feature to the classification. Our results from 15 healthy subjects indicate that, with a feature selection analysis, independent of the type of feature set, across all subjects high overall accuracies can be achieved in classification of seven different forearm motions with a small number of top ranked original EMG features obtained from the forearm muscles (average overall classification accuracy >95% with 12 selected EMG features). Compared to various feature dimensionality reduction techniques in myoelectric pattern recognition, the proposed filter-based feature selection approach is independent of the type of classification algorithms and features, and can effectively reduce the redundant information not only across different channels, but also across different features in the same channel. This may enable robust EMG feature dimensionality reduction without needing to change ongoing, practical use of classification algorithms, an important step toward clinical utility.

14.
The discrimination of ventricular tachycardias with 1:1 retrograde conduction from sinus tachycardia still remains a challenge for rate based algorithms commonly used in dual-chamber implantable cardioverter defibrillators. Morphology based analysis techniques for a classification of antegrade and retrograde atrial activation patterns can be used to cope with this problem. Here time-domain template matching techniques are known approaches. However, a time-domain representation of endocardial electrograms is not optimal for classification tasks, as the dimensionality of the underlying signal space is high and features irrelevant to signal characterization are involved in the analysis. Therefore, the aim of this study is to develop an enhanced morphological analysis tool for a classification of antegrade and retrograde atrial activation by using a transform domain representation of endocardial electrograms. For this, we applied an adapted wavelet-packet decomposition to extract discriminating features in endocardial electrograms representing antegrade and retrograde activation patterns. Further, a feed-forward neural network was utilized to produce a classification based on the extracted information. In using our hybrid method, no false classification of the physiological and pathological cardiac state was made. It is concluded that the proposed classification scheme represents a highly efficient approach for a classification of antegrade and retrograde atrial activation.

15.
The process of neurite outgrowth is critically dependent on proper microtubule assembly. However, characterizing the dynamics of microtubule assembly and their quantitative relationship to neurite outgrowth is a difficult task. The difficulty can be reduced by using time series analysis, which has broad application in characterizing the dynamics of stochastic, or “noisy,” behaviors. Here we apply time series analysis to quantitatively compare simulated microtubule assembly and neurite outgrowth in vitro. Microtubule length life histories were simulated assuming constant growth and shrinkage rates coupled with random selection of growth and shrinkage times, a formulation based on the dynamic instability model of microtubule assembly. Net length displacements of simulated microtubules were calculated at discrete, evenly spaced times, and the resulting time series were characterized by both spectral and autocorrelation analysis. Depending on the sampling rate and the dynamic parameters, simulated microtubules exhibited significant autocorrelation and periodicity. To make a comparison to neurite outgrowth, we characterized the dynamic behavior of simulated microtubule populations and found it was not significantly different from that of single microtubules. The net displacements of rat superior cervical ganglion neurite tips were measured and characterized using time series methods. Their behavior was consistent with the microtubule dynamics for appropriate simulation parameters and sampling rates. Our results show that time series analysis can provide a useful tool for quantitative characterization of microtubule dynamics and neurite outgrowth and for assessing the relationship between them.
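
The simulation described in this abstract can be sketched as follows, as a rough illustration rather than the authors' model. Constant growth and shrinkage rates come from the abstract; the exponential distribution of phase durations, the clamping of length at zero, and every name and parameter value are assumptions.

```python
import random


def simulate_length(n_samples, dt, growth_rate, shrink_rate, mean_phase, rng):
    """Sample a simulated microtubule length at evenly spaced times.

    Growth and shrinkage rates are constant; each phase lasts a random time
    (exponentially distributed here, which is an assumption).
    """
    length, growing = 0.0, True
    remaining = rng.expovariate(1.0 / mean_phase)
    samples = [length]
    for _ in range(n_samples):
        t = dt
        while t > 0:
            step = min(t, remaining)
            rate = growth_rate if growing else -shrink_rate
            length = max(0.0, length + rate * step)  # length cannot go negative
            t -= step
            remaining -= step
            if remaining <= 0:
                growing = not growing  # switch between growth and shrinkage
                remaining = rng.expovariate(1.0 / mean_phase)
        samples.append(length)
    return samples


def net_displacements(samples):
    """Net length change between consecutive sampling times."""
    return [b - a for a, b in zip(samples, samples[1:])]


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag (1.0 at lag 0)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var
```

Varying `dt` relative to `mean_phase` is one way to explore the abstract's point that the observed autocorrelation depends on the sampling rate as well as on the dynamic parameters.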

16.
We present a novel approach for automatically, accurately and reliably determining the 3D motion of the cervical spine from a series of stereo or biplane radiographic images. These images could be acquired through a variety of different imaging hardware configurations. We follow a hierarchical, anatomically-aware, multi-bone approach that takes into account the complex structure of cervical vertebrae and inter-vertebrae overlapping, as well as the temporal coherence in the imaging series. These significant innovations improve the speed, accuracy, reliability and flexibility of the tracking process. Evaluation on cervical data shows that the approach is as accurate (average precision 0.3 mm and 1°) as the expert human-operator driven method that was previously state of the art. However, unlike the previously used method, the hierarchical approach is automatic and robust, even in the presence of implanted hardware. Therefore, the method has solid potential for clinical use to evaluate the effectiveness of surgical interventions.

17.
Medical applications are often characterized by a large number of disease markers and a relatively small number of data records. We demonstrate that complete feature ranking followed by selection can lead to appreciable reductions in data dimensionality, with significant improvements in the implementation and performance of classifiers for medical diagnosis. We describe a novel approach for ranking all features according to their predictive quality using properties unique to learning algorithms based on the group method of data handling (GMDH). An abductive network training algorithm is repeatedly used to select groups of optimum predictors from the feature set at gradually increasing levels of model complexity specified by the user. Groups selected earlier are better predictors. The process is then repeated to rank features within individual groups. The resulting full feature ranking can be used to determine the optimum feature subset by starting at the top of the list and progressively including more features until the classification error rate on an out-of-sample evaluation set starts to increase due to overfitting. The approach is demonstrated on two medical diagnosis datasets (breast cancer and heart disease) and comparisons are made with other feature ranking and selection methods. Receiver operating characteristics (ROC) analysis is used to compare classifier performance. At default model complexity, dimensionality reductions of 22% and 54% could be achieved for the breast cancer and heart disease data, respectively, leading to improvements in the overall classification performance. For both datasets, considerable dimensionality reduction introduced no significant reduction in the area under the ROC curve. GMDH-based feature selection results have also proved effective with neural network classifiers.
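
The subset-selection step described in this abstract (walk down the ranked list and stop when the out-of-sample error starts to rise) can be sketched as below. The sketch covers only that cutoff rule and does not implement GMDH ranking; `select_prefix` and the caller-supplied `error_rate` function are hypothetical names.

```python
def select_prefix(ranked_features, error_rate):
    """Grow the subset down the ranking until the error rate starts to rise.

    ranked_features: feature names, best predictor first.
    error_rate: callable mapping a list of features to an out-of-sample
                error rate (hypothetical; supplied by the caller).
    """
    best_subset = []
    best_error = float("inf")
    for i in range(1, len(ranked_features) + 1):
        subset = ranked_features[:i]
        err = error_rate(subset)
        if err > best_error:
            break  # error started to increase: stop adding features
        best_subset, best_error = subset, err
    return best_subset, best_error
```

Stopping at the first increase is a greedy choice: a later, lower error further down the ranking would be missed, so a more cautious variant could scan the whole list and keep the global minimum.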

18.
OBJECTIVE: To demonstrate and compare the application of different genetic programming (GP) based intelligent methodologies for the construction of rule-based systems in two medical domains: the diagnosis of aphasia's subtypes and the classification of pap-smear examinations. MATERIAL: Past data representing (a) successful diagnosis of aphasia's subtypes from collaborating medical experts through a free interview per patient, and (b) correctly classified smears (images of cells) by cyto-technologists, previously stained using the Papanicolaou method. METHODS: Initially a hybrid approach is proposed, which combines standard genetic programming and heuristic hierarchical crisp rule-base construction. Then, genetic programming for the production of crisp rule based systems is attempted. Finally, another hybrid intelligent model is composed by a grammar driven genetic programming system for the generation of fuzzy rule-based systems. RESULTS: Results denote the effectiveness of the proposed systems, while they are also compared for their efficiency, accuracy and comprehensibility, to those of an inductive machine learning approach as well as to those of a standard genetic programming symbolic expression approach. CONCLUSION: The proposed GP-based intelligent methodologies are able to produce accurate and comprehensible results for medical experts, performing competitively with other intelligent approaches. The aim of the authors was the production of accurate but also sensible decision rules that could potentially help medical doctors to extract conclusions, even at the expense of a higher classification score achievement.

19.
Classification analysis of microarray gene expression data has been widely used to uncover biological features and to distinguish closely related cell types that often appear in the diagnosis of cancer. However, the number of dimensions of gene expression data is often very high, e.g., in the hundreds or thousands. Accurate and efficient classification of such high-dimensional data remains a contemporary challenge. In this paper, we propose a comprehensive vertical sample-based KNN/LSVM classification approach with weights optimized by genetic algorithms for high-dimensional data. Experiments on common gene expression datasets demonstrated that our approach can achieve high accuracy and efficiency at the same time. The improvement of speed is mainly related to the vertical data representation, P-tree, and its optimized logical algebra. The high accuracy is due to the combination of a KNN majority voting approach and a local support vector machine approach that makes optimal decisions at the local level. As a result, our approach could be a powerful tool for high-dimensional gene expression data analysis. (Footnote: Patents are pending on the P-tree technology. This work is partially supported by GSA Grant ACT#:K96130308.)

20.
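OBJECTIVE: To address the high dimensionality and poor predictability common in electroencephalogram (EEG) signals, an EEG classification method is proposed that combines feature extraction by multifractal detrended fluctuation analysis with a long short-term memory (LSTM) network. METHODS: First, multifractal detrended fluctuation analysis is applied to the signal samples to obtain the multifractal spectrum of each EEG sample, and the functional relationship between the generalized Hurst exponent hq and the generalized dimension Dq is computed. The multifractal spectrum is then analyzed to find the most representative coordinate values, which serve as the feature vector of the signal. Finally, the feature vectors are used to train and test an LSTM classifier. The experiments use the preprocessed epilepsy EEG dataset collected at the University of Bonn. RESULTS: Once the training samples exceed 10% of all samples, the test accuracy of the LSTM classifier remains stably above 98%; when the proportion exceeds 80%, the test accuracy reaches 100%; even with few training samples, the accuracy is above 95%. CONCLUSION: The algorithm shows good accuracy and stability.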
