Similar Articles
15 similar articles found (search time: 15 ms)
1.
Nonlinear independent component analysis is combined with diffusion-map data analysis techniques to detect good observables in high-dimensional dynamic data. This is achieved by integrating local principal component analysis of simulation bursts using the eigenvectors of a Markov matrix that describes anisotropic diffusion. The widely applicable procedure, a crucial step in model-reduction approaches, is illustrated on stochastic chemical reaction network simulations.
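A minimal numerical sketch of the diffusion-map step described above (the Gaussian kernel, the bandwidth `eps`, the anisotropy exponent `alpha`, and the toy data are illustrative assumptions, not details from the paper):

```python
import numpy as np

def diffusion_map(X, eps=1.0, alpha=1.0, n_evecs=2):
    """Eigenvectors of a Markov matrix built from an anisotropic diffusion kernel."""
    # pairwise squared distances and Gaussian kernel
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)
    # anisotropic normalization: divide out the sampling density q
    q = K.sum(1)
    K = K / (np.outer(q, q) ** alpha)
    # row-normalize to a Markov (row-stochastic) transition matrix
    P = K / K.sum(1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # skip the trivial constant eigenvector (eigenvalue 1)
    return vecs.real[:, order[1:n_evecs + 1]]

# noisy circle: the leading diffusion coordinates recover the underlying angle
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * np.random.default_rng(0).normal(size=(100, 2))
coords = diffusion_map(X, eps=0.5)
```

The leading nontrivial eigenvectors of the Markov matrix `P` supply the low-dimensional observables; "simulation bursts" would replace the toy point cloud in the actual procedure.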

2.
Compressive strength is the most significant metric for evaluating the mechanical properties of concrete. Machine Learning (ML) methods have shown promising results for predicting the compressive strength of concrete. However, at present, no in-depth studies have been devoted to the influence of dimensionality reduction on the performance of different ML models for this application. In this work, four representative ML models, i.e., Linear Regression (LR), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN), are trained and used to predict the compressive strength of concrete based on its mixture composition and curing age. For each ML model, three kinds of features are used as input: the eight original features, six Principal Component Analysis (PCA)-selected features, and six manually selected features. The performance as well as the training speed of these four ML models with the three different kinds of features is assessed and compared. Based on the obtained results, it is possible to make a relatively accurate prediction of concrete compressive strength using SVR, XGBoost, and ANN, with an R-square of over 0.9. Across the different features, the highest R-square on the test set occurs for the XGBoost model with manually selected features as inputs (R-square = 0.9339). The prediction accuracy of the SVR model with manually selected features (R-square = 0.9080) or PCA-selected features (R-square = 0.9134) is better than that of the model with the original features (R-square = 0.9003), without a dramatic change in running time, indicating that dimensionality reduction has a positive influence on the SVR model. For XGBoost, the model with PCA-selected features shows poorer performance (R-square = 0.8787) than the XGBoost model with the original or manually selected features. A possible reason for this is that the PCA-selected features are not as distinguishable as the manually selected features in this study.
In addition, the running time of the XGBoost model with PCA-selected features is longer than that of the XGBoost model with the original or manually selected features. In other words, dimensionality reduction by PCA seems to have an adverse effect on both the performance and the running time of the XGBoost model. Dimensionality reduction also has an adverse effect on the performance of the LR and ANN models, because the test-set R-squares of those two models with manually selected or PCA-selected features are lower than those of the models with the original features. Although the running time of ANN is much longer than that of the other three ML models (each under 1 s) in all three scenarios, dimensionality reduction has a clearly positive influence on the ANN model's running time without losing much prediction accuracy.
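The comparison protocol above can be sketched on synthetic data (this is an illustrative reconstruction, not the authors' code: `make_regression` stands in for the concrete dataset, and only the scikit-learn LR and SVR models are shown):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# synthetic stand-in for the 8 mixture-composition / curing-age features
X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def r2(model, use_pca=False):
    """Test-set R-square of a model, optionally preceded by PCA reduction."""
    steps = [StandardScaler()]
    if use_pca:
        steps.append(PCA(n_components=6))   # 8 original features -> 6 components
    steps.append(model)
    return make_pipeline(*steps).fit(X_tr, y_tr).score(X_te, y_te)

scores = {
    ("LR", "orig"): r2(LinearRegression()),
    ("LR", "pca"): r2(LinearRegression(), use_pca=True),
    ("SVR", "orig"): r2(SVR(C=100)),
    ("SVR", "pca"): r2(SVR(C=100), use_pca=True),
}
```

Toggling `use_pca` reproduces the eight-feature versus six-component comparison; on real mixture data the PCA step may help or hurt depending on the model, as the study reports.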

3.
A critical problem faced in many scientific fields is the adequate separation of data derived from individual sources. Often, such datasets require analysis of multiple features in a highly multidimensional space, with overlap of features and sources. The datasets generated by simultaneous recording from hundreds of neurons emitting phasic action potentials have produced the challenge of separating the recorded signals into independent data subsets (clusters) corresponding to individual signal-generating neurons. Mathematical methods have been developed over the past three decades to achieve such spike clustering, but a complete solution with fully automated cluster identification has not been achieved. We propose here a fully automated mathematical approach that identifies clusters in multidimensional space through recursion, which combats the multidimensionality of the data. Recursion is paired with an approach to dimensional evaluation, in which each dimension of a dataset is examined for its informational importance for clustering. The dimensions offering greater informational importance are given added weight during recursive clustering. To combat strong background activity, our algorithm takes an iterative approach of data filtering according to a signal-to-noise ratio metric. The algorithm finds cluster cores, which are thereafter expanded to include complete clusters. This mathematical approach can be extended from its prototype context of spike sorting to other datasets that suffer from high dimensionality and background activity.

Cluster analysis is important in many fields, ranging from biochemistry (1) to genetics (2) to neuroscience (3, 4). In neuroscience, improved sensors (4) have permitted large increases in the size and dimensionality of recorded datasets.
An essential problem remains that the brain contains millions of simultaneously active neurons, emitting action potentials (spikes) with varying frequencies and patterns related to ongoing behavior and brain state (3, 5). The identification of signals from individual neurons, in a sea of brain action potential, is critical. Commonly used four-sensor electrodes (tetrodes) (6, 7) and array recording methods (4) produce large, multidimensional datasets, which then require cluster analysis to separate signals from individual neurons. These expansive datasets present the need for fully automated methods for spike sorting (3, 4) with mathematical calculations and algorithms capable of analyzing multidimensional recordings (8). In particular, there is a need for algorithms that can process data containing overlapping clusters with unclear borders in the presence of strong background signals.

Due to its great complexity, spike sorting currently lacks a well-developed solution (3, 9). The recordings made from many neurons with varying proximity to probes provide no knowledge as to the number of clusters present (6). Also, there are often no clear boundaries between the signals of the different neurons recorded, and the density of neurons varies widely across different regions of the brain and across different recording methods (10). Overlapping clusters and strong background activity, produced by neighboring neurons, as well as the similarity of spike waveforms in given classes of neurons, present different problems for algorithms that rely on matching spike waveforms to templates, principal component analysis (PCA), density, and distance metrics (5).
Compounding the complexity of the spike-sorting problem, recordings can involve 10–20 “useful” dimensions, especially those that use tetrodes or multisensor probes (8).

Our approach individually solves the three primary challenges of spike sorting: space complexity, cluster overlap, and differing cluster densities, all in the presence of background activity. To combat feature space complexity, our algorithm employs a method of space evaluation, whereby each dimension in the feature space is independently evaluated based on its contribution to the goal of clustering. To overcome the challenge of cluster overlap and bridges, our algorithm includes a system of extensive preprocessing that removes all data except for cluster cores, which are identified by larger spike density relative to their surroundings. Later, during postprocessing, clusters are rebuilt around these cores. Finally, to take into account differing cluster densities and a wide range of signal-to-noise ratio (SNR) in regions of the data spaces, our algorithm introduces a multipass clustering method. Upon each iteration, the algorithm changes its threshold for SNR and removes successful clusters from the data space, thereby simplifying the space and making it easier to find clusters that are typically difficult to detect.
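A toy sketch of the core-then-expand idea described above (the density cut, DBSCAN as the core clusterer, and all parameter values are stand-in assumptions; the paper's recursive, dimension-weighted algorithm is not reproduced here):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
# two overlapping 2-D "units" plus uniform background "noise spikes"
X = np.vstack([
    rng.normal([0, 0], 0.3, (200, 2)),
    rng.normal([2, 0], 0.3, (200, 2)),
    rng.uniform(-2, 4, (60, 2)),
])

# 1) density filter (an SNR-like cut): distance to the 10th nearest neighbor
d_k = NearestNeighbors(n_neighbors=10).fit(X).kneighbors(X)[0][:, -1]
core_mask = d_k < np.quantile(d_k, 0.5)   # keep the denser half as "cores"

# 2) cluster only the cores, where the background has been stripped away
core_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X[core_mask])

# 3) expand: assign every point to the cluster of its nearest labeled core
labels = np.full(len(X), -1)
labels[core_mask] = core_labels
keep = np.flatnonzero(labels >= 0)        # cores DBSCAN did not mark as noise
nearest = NearestNeighbors(n_neighbors=1).fit(X[keep]).kneighbors(X)[1][:, 0]
final_labels = labels[keep[nearest]]
```

The expansion step is what rebuilds complete clusters around the cores; the paper additionally iterates this with changing SNR thresholds and dimension weights.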

4.
We have developed a mathematical approach to the study of dynamical biological networks, based on combining large-scale numerical simulation with nonlinear "dimensionality reduction" methods. Our work was motivated by an interest in the complex organization of the signaling cascade centered on the neuronal phosphoprotein DARPP-32 (dopamine- and cAMP-regulated phosphoprotein of molecular weight 32,000). Our approach has allowed us to detect robust features of the system in the presence of noise. In particular, the global network topology serves to stabilize the net state of DARPP-32 phosphorylation in response to variation of the input levels of the neurotransmitters dopamine and glutamate, despite significant perturbation to the concentrations and levels of activity of a number of intermediate chemical species. Further, our results suggest that the entire topology of the network is needed to impart this stability to one portion of the network at the expense of the rest. This could have significant implications for systems biology, in that large, complex pathways may have properties that are not easily replicated with simple modules.

5.
In this paper, we present a method for time series analysis based on empirical intrinsic geometry (EIG). EIG enables one to reveal the low-dimensional parametric manifold as well as to infer the underlying dynamics of high-dimensional time series. By incorporating concepts of information geometry, this method extends existing geometric analysis tools to support stochastic settings and parametrizes the geometry of empirical distributions. No statistical models are required as priors; hence, EIG may be applied to a wide range of real signals without definitive existing models. We show that the inferred model is noise-resilient and invariant under different observation and instrumental modalities. In addition, we show that it can be extended efficiently to newly acquired measurements in a sequential manner. These two advantages enable us to revisit the Bayesian approach and incorporate empirical dynamics and intrinsic geometry into a nonlinear filtering framework. We show applications to nonlinear and non-Gaussian tracking problems as well as to acoustic signal localization.
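One concrete ingredient of such an approach is to work with local empirical distributions of the signal rather than raw samples, since histograms computed over short windows are much less sensitive to how the latent process is observed. A minimal sketch (the window length, bin count, and toy signals are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def local_histograms(x, win=100, bins=12):
    """Sliding-window empirical distributions of a scalar time series."""
    edges = np.linspace(x.min(), x.max(), bins + 1)
    return np.array([
        np.histogram(x[i:i + win], bins=edges, density=True)[0]
        for i in range(0, len(x) - win, win // 2)
    ])

# two observation modalities of the same latent sinusoid
rng = np.random.default_rng(0)
z = np.sin(np.linspace(0, 20, 4000))
obs_a = z + 0.1 * rng.normal(size=z.size)                # additive noise
obs_b = np.tanh(2 * z) + 0.1 * rng.normal(size=z.size)   # nonlinear readout
H_a, H_b = local_histograms(obs_a), local_histograms(obs_b)
```

Distances between these local distributions, rather than between raw samples, are the kind of empirical statistic from which an intrinsic, modality-invariant metric can be built.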

6.
The link between mind, brain, and behavior has mystified philosophers and scientists for millennia. Recent progress has been made by forming statistical associations between manifest variables of the brain (e.g., electroencephalogram [EEG], functional MRI [fMRI]) and manifest variables of behavior (e.g., response times, accuracy) through hierarchical latent variable models. Within this framework, one can make inferences about the mind in a statistically principled way, such that complex patterns of brain–behavior associations drive the inference procedure. However, previous approaches were limited in the flexibility of the linking function, which has proved prohibitive for understanding the complex dynamics exhibited by the brain. In this article, we propose a data-driven, nonparametric approach that allows complex linking functions to emerge from fitting a hierarchical latent representation of the mind to multivariate, multimodal data. Furthermore, to enforce biological plausibility, we impose both spatial and temporal structure so that the types of realizable system dynamics are constrained. To illustrate the benefits of our approach, we investigate the model’s performance in a simulation study and apply it to experimental data. In the simulation study, we verify that the model can be accurately fitted to simulated data, and latent dynamics can be well recovered. In an experimental application, we simultaneously fit the model to fMRI and behavioral data from a continuous motion tracking task. We show that the model accurately recovers both neural and behavioral data and reveals interesting latent cognitive dynamics, the topology of which can be contrasted with several aspects of the experiment.
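The generative idea, one latent cognitive state driving both neural and behavioral observations, can be sketched as a purely illustrative toy (a linear-Gaussian stand-in, whereas the article's linking functions are nonparametric and data-driven):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_z = 300, 2

# latent cognitive state: a slowly rotating, noisy 2-D oscillator
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z = np.zeros((T, d_z))
Z[0] = [1.0, 0.0]
for t in range(1, T):
    Z[t] = A @ Z[t - 1] + 0.02 * rng.normal(size=d_z)

# two observation modalities driven by the SAME latent state
C_neural = rng.normal(size=(5, d_z))   # "fMRI-like" loading matrix
C_behav = rng.normal(size=(1, d_z))    # "response-like" loading
neural = Z @ C_neural.T + 0.1 * rng.normal(size=(T, 5))
behav = Z @ C_behav.T + 0.1 * rng.normal(size=(T, 1))
```

Fitting such a model jointly to both modalities is what lets brain–behavior associations, rather than either stream alone, drive inference about the latent state.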

7.
8.
Computational neuroscience has uncovered a number of computational principles used by nervous systems. At the same time, neuromorphic hardware has matured to a state where fast silicon implementations of complex neural networks have become feasible. En route to future technical applications of neuromorphic computing, the current challenge lies in the identification and implementation of functional brain algorithms. Taking inspiration from the olfactory system of insects, we constructed a spiking neural network for the classification of multivariate data, a common problem in signal and data analysis. In this model, real-valued multivariate data are converted into spike trains using “virtual receptors” (VRs). Their output is processed by lateral inhibition and drives a winner-take-all circuit that supports supervised learning. VRs are conveniently implemented in software, whereas the lateral inhibition and classification stages run on accelerated neuromorphic hardware. When trained and tested on real-world datasets, we find that the classification performance is on par with a naïve Bayes classifier. An analysis of the network dynamics shows that stable decisions in output neuron populations are reached within less than 100 ms of biological time, matching the time-to-decision reported for the insect nervous system. Through leveraging a population code, the network tolerates the variability of neuronal transfer functions and trial-to-trial variation that is inevitably present on the hardware system. Our work provides a proof of principle for the successful implementation of a functional spiking neural network on a configurable neuromorphic hardware system that can readily be applied to real-world computing problems.

The remarkable sensory and behavioral capabilities of all higher organisms are provided by the network of neurons in their nervous systems.
The computing principles of the brain have inspired many powerful algorithms for data processing, most importantly the perceptron and, building on top of that, multilayer artificial neural networks, which are being applied with great success to various data analysis problems (1). Although these networks operate with continuous values, computation in biological neuronal networks relies on the exchange of action potentials, or “spikes.”

Simulating networks of spiking neurons with software tools is computationally intensive, imposing limits on the duration of simulations and the maximum network size. To overcome this limitation, several groups around the world have started to develop hardware realizations of spiking neuron models and neuronal networks (2–10) for studying the behavior of biological networks (11). The approach of the Spikey hardware system used in the present study is to enable high-throughput network simulations by speeding up computation by a factor of 10⁴ compared with biological real time (12, 13). It has been developed as a reconfigurable multineuron computing substrate supporting a wide range of network topologies (14).

In addition to providing faster tools for neurosimulation, high-throughput spiking network computation in hardware offers the possibility of using spiking networks to solve real-world computational problems. The massive parallelism is a potential advantage over conventional computing when processing large amounts of data in parallel. However, conventional algorithms are often difficult to implement using the spiking networks for which many neuromorphic hardware substrates are designed. Novel algorithms have to be designed that embrace the inherent parallelism of a brain-like computing architecture.

A common problem in data analysis is the classification of multivariate data. Many problems in artificial intelligence relate to classification in one way or another, such as object recognition or decision making.
It is the basis for data mining and, as such, has widespread applications in industry. We interact with classification systems in many aspects of daily life, for example in the form of Web shop recommendations, driver assistance systems, or when sending a letter with a handwritten address that is deciphered automatically in the post office.

In this work, we present a neuromorphic network for supervised classification of multivariate data. We implemented the spiking network part on a neuromorphic hardware system. Using a range of datasets, we demonstrate how the classifier network supports nonlinear separation through encoding by virtual receptors, whereas lateral inhibition transforms the input data into a sparser encoding that is better suited for learning.
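A rate-based toy sketch of the encoding and readout pipeline described above (RBF units stand in for the virtual receptors; the inhibition strength, the data, and the simple template readout are illustrative assumptions, and no spiking dynamics or hardware are involved):

```python
import numpy as np

rng = np.random.default_rng(0)

def virtual_receptors(X, centers, width=1.0):
    """Encode real-valued data as firing rates of RBF 'virtual receptors'."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def lateral_inhibition(R, strength=0.8):
    """Each unit is suppressed in proportion to the population mean rate,
    yielding a sparser code."""
    return np.clip(R - strength * R.mean(axis=1, keepdims=True), 0, None)

# two-class toy data; VR centers are drawn from the data itself
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.repeat([0, 1], 50)
centers = X[rng.choice(len(X), 10, replace=False)]
R = lateral_inhibition(virtual_receptors(X, centers, width=2.0))

# winner-take-all readout: one output unit per class, "trained" by a
# simple supervised rule (mean rate pattern of each class as its weights)
W = np.stack([R[y == c].mean(0) for c in (0, 1)])
pred = (R @ W.T).argmax(1)
accuracy = (pred == y).mean()
```

The winner-take-all step is the `argmax` over output-unit drive; on hardware this competition is implemented by inhibitory circuitry rather than computed explicitly.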

9.
10.

Purpose

The purpose of this study is to propose an efficient coal workers' pneumoconiosis (CWP) clinical prediction system and put it into use for the clinical diagnosis of pneumoconiosis.

Methods

Patients with CWP and dust-exposed workers enrolled from August 2021 to December 2021 were included in this study. First, we adopted the embedded method, applying three feature selection approaches to perform the prediction analysis. Then, we used machine learning algorithms as the model backbone, combining each with the three feature selection methods to determine the optimal predictive model for CWP.

Results

Applying the three feature selection approaches together with machine learning algorithms showed that AaDO2 and several pulmonary function indicators play an important role in identifying early-stage CWP. The support vector machine (SVM) proved to be the optimal machine learning model for predicting CWP: the ROC curves obtained from the three feature selection methods with the SVM algorithm had AUC values of 97.78%, 93.7%, and 95.56%, respectively.

Conclusion

We developed the optimal model (the SVM algorithm) for the prediction of CWP as a clinical application by comparing and analyzing the performance of the different models.
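The select-then-classify pipeline evaluated above can be sketched on synthetic data (an illustrative reconstruction: `make_classification` stands in for the CWP cohort, and univariate `SelectKBest` stands in for the study's embedded feature selection):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic stand-in for the clinical features (AaDO2, lung-function indices, ...)
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=8),   # keep the 8 strongest features
                    SVC())
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.decision_function(X_te))
```

Fitting the selector inside the pipeline, rather than on the full dataset, keeps the reported AUC free of selection leakage.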

11.
Due to a lack of integration between different sensors, false alarms (FA) in the intensive care unit (ICU) are frequent and can lead to a reduced standard of care. We present a novel framework for FA reduction using a machine learning approach to combine up to 114 signal quality and physiological features extracted from the electrocardiogram, photoplethysmograph, and optionally the arterial blood pressure waveform. A machine learning algorithm was trained and evaluated on a database of 4107 expert-labeled life-threatening arrhythmias from 182 separate ICU visits. On the independent test data, FA suppression results with no true alarm (TA) suppression were 86.4% for asystole, 100% for extreme bradycardia, and 27.8% for extreme tachycardia. For the ventricular tachycardia alarms, the best FA suppression performance was 30.5% with a TA suppression rate below 1%. To reduce the TA suppression rate to zero, a reduction in FA suppression performance to 19.7% was required.
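The trade-off reported above, maximizing FA suppression subject to a cap on TA suppression, amounts to a constrained threshold search over classifier scores. A minimal sketch (the scores and alarm labels are synthetic; the study's classifier and its 114 features are not reproduced):

```python
import numpy as np

def fa_suppression(scores, is_true_alarm, max_ta_suppressed=0.0):
    """Pick the score threshold that suppresses the most false alarms
    while suppressing at most `max_ta_suppressed` fraction of true alarms.
    Alarms scoring below the threshold are suppressed."""
    best = (0.0, -np.inf)                      # (FA suppression rate, threshold)
    for thr in np.unique(scores):
        suppress = scores < thr
        ta_supp = suppress[is_true_alarm].mean()
        fa_supp = suppress[~is_true_alarm].mean()
        if ta_supp <= max_ta_suppressed and fa_supp > best[0]:
            best = (fa_supp, thr)
    return best

rng = np.random.default_rng(0)
# toy scores: true alarms score high, false alarms lower but overlapping
scores = np.r_[rng.normal(2, 1, 300), rng.normal(0, 1, 700)]
is_true = np.r_[np.ones(300, bool), np.zeros(700, bool)]
fa_rate, thr = fa_suppression(scores, is_true, max_ta_suppressed=0.0)
```

With `max_ta_suppressed=0.0` the search mirrors the zero-TA-suppression operating point; loosening the cap (e.g. to 0.01) trades a small TA loss for more FA suppression, as with the ventricular tachycardia alarms.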

12.
13.
14.
It is expected that a large amount of data related to diabetes and other chronic diseases will be generated. However, databases constructed without standardized data item sets can be limited in their usefulness. To address this, the Collaborative Committee of Clinical Informatization in Diabetes Mellitus was established in 2011 by the Japan Diabetes Society and Japan Association for Medical Informatics. The committee has developed core item sets and self‐management item sets for diabetes mellitus, hypertension, dyslipidemia, and chronic kidney disease in collaboration with the Japanese Society of Hypertension, Japan Atherosclerosis Society, Japanese Society of Nephrology, and Japanese Society of Laboratory Medicine, as well as a mapping table that aligns the self‐management item sets with the Japanese standardized codes for laboratory testing. The committee also determined detailed specifications for implementing the four self‐management item sets in personal health record (PHR) applications to facilitate risk stratification, the generation of alerts using information and communications technology systems, the avoidance of data input errors, and the generation of reminders to input the self‐management item set data. The approach developed by the committee may be useful for combining databases for various purposes (such as for clinical studies, patient education, and electronic medical record systems) and for facilitating collaboration between PHR administrators.

15.
Overnight pulse oximetry allows the relatively non‐invasive estimation of peripheral blood haemoglobin oxygen saturations (SpO2), and forms part of the typical polysomnogram (PSG) for investigation of obstructive sleep apnoea (OSA). While the raw SpO2 signal can provide detailed information about OSA‐related pathophysiology, this information is typically summarized with simple statistics such as the oxygen desaturation index (ODI, the number of desaturations per hour). As such, this study reviews the technical methods for quantifying OSA‐related patterns in oximetry data. The technical methods described in the literature can be broadly grouped into four categories: (i) describing the detailed characteristics of desaturation events; (ii) time series statistics; (iii) analysis of power spectral distribution (i.e. frequency domain analysis); and (iv) non‐linear analysis. These are described and illustrated with examples of oximetry traces. The utilization of these techniques is then described in two applications. First, the application of detailed oximetry analysis allows the accurate automated classification of PSG‐defined OSA. Second, quantifications which better characterize the severity of desaturation events are better predictors of OSA‐related epidemiological outcomes than standard clinical metrics. Finally, methodological considerations and further applications and opportunities are considered.
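As an illustration of category (i), a simplified oxygen desaturation index can be computed directly from an SpO2 trace (the running-median baseline, the thresholds, and the synthetic trace below are illustrative assumptions, not a clinical scoring standard):

```python
import numpy as np

def odi(spo2, fs=1.0, drop=3.0, min_dur=10.0):
    """Oxygen desaturation index (events per hour) from an SpO2 trace.
    An event is a drop of >= `drop` % below a running median baseline,
    sustained for at least `min_dur` seconds. Clinical scoring rules
    vary; this is a simplified illustration, not a clinical tool."""
    n = len(spo2)
    below = np.zeros(n, bool)
    w = int(60 * fs)                      # 60-s baseline window
    for i in range(n):
        baseline = np.median(spo2[max(0, i - w):i + 1])
        below[i] = spo2[i] <= baseline - drop
    events, run = 0, 0
    for b in below:
        run = run + 1 if b else 0
        if run == int(min_dur * fs):      # count each sustained run once
            events += 1
    return events / (n / fs / 3600)

# synthetic 1-h trace at 1 Hz: 97 % baseline with five 30-s dips to 92 %
spo2 = np.full(3600, 97.0)
for k in range(5):
    spo2[k * 600 + 300 : k * 600 + 330] = 92.0
rate = odi(spo2)
```

On the synthetic one-hour trace with five sustained dips, the function returns an ODI of 5 events per hour; the more detailed quantifications reviewed in the article additionally characterize depth, duration, and slope of each event.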
