Similar Literature
20 similar documents found.
1.
Many behavioral tasks require goal-directed actions to obtain delayed reward. The prefrontal cortex appears to mediate many aspects of goal-directed decision making. This article presents a model of prefrontal cortex function emphasizing the influence of goal-related activity on the choice of the next motor output. The model can be interpreted in terms of key elements of Reinforcement Learning Theory. Different neocortical minicolumns represent distinct sensory input states and distinct motor output actions. The dynamics of each minicolumn include separate phases of encoding and retrieval. During encoding, strengthening of excitatory connections forms forward and reverse associations between each state, the following action, and a subsequent state, which may include reward. During retrieval, activity spreads from reward states throughout the network. The interaction of this spreading activity with a specific input state directs selection of the next appropriate action. Simulations demonstrate how these mechanisms can guide performance in a range of goal-directed tasks, and provide a functional framework for some of the neuronal responses previously observed in the medial prefrontal cortex during performance of spatial memory tasks in rats.
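The retrieval mechanism summarized above lends itself to a compact illustration. Below is a minimal sketch, not the paper's implementation: a hand-built table of state-action-successor links stands in for the learned excitatory associations, activity spreads backward from the reward state, and the current input state then selects the action lying on the strongest path to reward. The toy state graph, the decay factor, and all names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: states and actions as nodes; encoding has stored
# state -> action -> next-state links; retrieval spreads activity
# backward from the reward state, and the current state selects the
# action on the strongest path to reward. Illustrative only.
links = {                      # learned during the encoding phase
    ("s0", "left"): "s1",
    ("s0", "right"): "s2",
    ("s1", "forward"): "reward",
    ("s2", "forward"): "s2",
}

def spread_from_reward(links, n_steps=5, decay=0.8):
    """Reverse spread: each (state, action) inherits activity from its successor."""
    activity = {"reward": 1.0}
    for _ in range(n_steps):
        for (s, a), s_next in links.items():
            inherited = decay * activity.get(s_next, 0.0)
            activity[(s, a)] = max(activity.get((s, a), 0.0), inherited)
            activity[s] = max(activity.get(s, 0.0), activity[(s, a)])
    return activity

def select_action(state, links, activity):
    """The current input state gates which reward-driven activity can win."""
    options = [(a, activity.get((s, a), 0.0)) for (s, a) in links if s == state]
    return max(options, key=lambda x: x[1])[0]

activity = spread_from_reward(links)
print(select_action("s0", links, activity))   # -> 'left' (the path to reward)
```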

2.
Adam Ponzi. Neural Networks, 2008, 21(2-3): 322-330.
A simple working memory model based on recurrent network activation is proposed, and its application to the selection and reinforcement of an action is demonstrated as a solution to the temporal credit assignment problem. Reactivation of recent salient cue states is generated and maintained as a type of salience-gated, recurrently active working memory, while lower-salience distractors are ignored. Cue reactivation during the action selection period allows the cue to select an action, while its reactivation at the reward period allows reinforcement of the action selected by the reactivated state, which is necessarily the action that led to the reward being found. Down-gating of the external input during reactivation and maintenance prevents interference. A double winner-take-all system, which selects only one cue and only one action, allows the targeting of the cue–action allocation to be modified. This targeting works both to reinforce a correct cue–action allocation and to punish the allocation when cue–action allocations change. Here we suggest a firing-rate neural network implementation of this system based on basal ganglia anatomy, with input from a cortical association layer where reactivations are generated by signals from the thalamus. Striatal medium spiny neurons represent actions. Auto-catalytic feedback from a dopamine reward signal modulates three-way Hebbian long-term potentiation and depression at the cortical–striatal synapses, which represent the cue–action associations. The model is illustrated by numerical simulations of a simple example: associating a cue signal with the correct action to obtain reward after a delay period, typical of primate cue-reward tasks. Through learning, the model shows a transition from an exploratory phase, where actions are generated randomly, to a stable directed phase, where the animal always chooses the correct action for each experienced state. When cue–action allocations change, we show that the model notices this, punishes the incorrect cue–action allocations, and discovers the correct ones.
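Stripped of the firing-rate and basal ganglia machinery, the credit-assignment loop described above can be sketched as a reward-gated three-factor update on cue-action weights: a winner-take-all memory carries the single selected cue across the delay, and a dopamine-like prediction error gates the Hebbian update at reward time. This is a schematic of the idea only; the cue-action map, softmax temperature, and learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cues, n_actions = 3, 3
W = np.full((n_cues, n_actions), 0.5)      # cue-action association weights
correct = {0: 2, 1: 0, 2: 1}               # hypothetical cue -> rewarded action map
alpha = 0.2

for trial in range(300):
    cue = rng.integers(n_cues)
    wm = cue                                # WTA working memory: only this cue persists
    # softmax action selection from the reactivated cue (exploration -> exploitation)
    p = np.exp(5 * W[wm]) / np.exp(5 * W[wm]).sum()
    action = rng.choice(n_actions, p=p)
    reward = 1.0 if action == correct[cue] else 0.0
    dopamine = reward - W[wm, action]       # reward-prediction-error-like signal
    # three-factor rule: pre (reactivated cue) x post (selected action) x dopamine
    W[wm, action] += alpha * dopamine

print(W.round(2))   # each row converges toward 1 at its rewarded action
```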

3.
The understanding of other individuals' actions is a fundamental cognitive skill for all species living in social groups. Recent neurophysiological evidence suggests that an observer may achieve this understanding by mapping visual information onto his own motor repertoire to reproduce the action effect. However, due to differences in embodiment, environmental constraints or motor skills, this mapping very often cannot be direct. In this paper, we present a dynamic network model whose layers represent the functionality of neurons in different interconnected brain areas known to be involved in action observation/execution tasks. The model aims at substantiating the idea that action understanding is a continuous process which combines sensory evidence, prior task knowledge and a goal-directed matching of action observation and action execution. The model is tested in variations of an imitation task in which an observer with dissimilar embodiment tries to reproduce the perceived or inferred end-state of a grasping-placing sequence. We also propose and test a biologically plausible learning scheme which allows a goal-directed organization of the distributed network to be established during practice. The modeling results are discussed with respect to recent experimental findings in action observation/execution studies.

4.
This paper presents two novel neural networks based on snap-drift in the context of self-organisation and sequence learning. The snap-drift neural network employs modal learning that combines two modes: fuzzy AND learning (snap) and Learning Vector Quantisation (drift). We present the snap-drift self-organising map (SDSOM) and the recurrent snap-drift neural network (RSDNN). The SDSOM uses the standard SOM architecture, where a layer of input nodes connects to the self-organising map layer, and the weight update consists of either snap (the minimum of input and weight) or drift (LVQ, as in SOM). The RSDNN uses a simple recurrent network (SRN) architecture, with the hidden layer values copied back to the input layer. A form of reinforcement learning is deployed in which the mode is swapped between snap and drift when performance drops, and in which adaptation is probabilistic: the probability of a neuron being adapted is reduced as performance increases. The algorithms are evaluated on several well-known data sets and exhibit effective learning that is faster than alternative neural network methods.
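The two update modes named above are simple enough to state directly. The sketch below implements snap (componentwise minimum of weight and input, i.e. fuzzy AND) and drift (an LVQ/SOM-style move toward the input); the fixed alternation schedule and learning rate are illustrative stand-ins for the paper's performance-driven mode swapping.

```python
import numpy as np

# Sketch of the two snap-drift weight updates named in the abstract.
# Assumes inputs normalized to [0, 1]; alpha and the mode-swap policy
# are illustrative choices, not taken from the paper.
def snap(w, x):
    """Fuzzy AND: weight becomes the componentwise minimum of weight and input."""
    return np.minimum(w, x)

def drift(w, x, alpha=0.1):
    """LVQ/SOM-style update: weight drifts toward the input."""
    return w + alpha * (x - w)

w = np.array([0.9, 0.4, 0.7])
x = np.array([0.6, 0.8, 0.2])
mode = "snap"
for step in range(10):
    w = snap(w, x) if mode == "snap" else drift(w, x)
    # in the paper the mode is swapped when performance drops; here we
    # simply alternate to show both behaviours
    mode = "drift" if mode == "snap" else "snap"
print(w.round(3))
```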

5.
Neural Networks, 1999, 12(7-8): 1143-1155.
One of the difficulties encountered in applying reinforcement learning to real-world problems is the construction of a discrete state space from a continuous sensory input signal. In the absence of a priori knowledge about the task, a straightforward approach to this problem is to discretize the input space into a grid and use a lookup table. However, this method suffers from the curse of dimensionality. Some studies use continuous function approximators such as neural networks instead of lookup tables, but when global basis functions such as sigmoid functions are used, convergence cannot be guaranteed. To overcome this problem, we propose a method in which local basis functions are incrementally assigned depending on the task requirements. Initially, only one basis function is allocated over the entire space. A basis function is then divided according to the statistical properties of the locally weighted temporal-difference (TD) error of the value function. We applied this method to an autonomous robot collision avoidance problem and evaluated the validity of the algorithm in simulation. The proposed algorithm, which we call the adaptive basis division (ABD) algorithm, achieved the task using fewer basis functions than conventional methods. Moreover, we applied the method to a goal-directed navigation problem with a real mobile robot: the action strategy was learned from a database of sensor data and then used to navigate the real machine. The robot reached the goal using fewer internal states than with conventional methods.
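A toy rendering of the division idea may help: fit a one-dimensional value function with Gaussian bases, track locally weighted error statistics per basis, and split the basis whose error variance stays largest. The split rule, thresholds, and the supervised stand-in for the TD error below are our assumptions; the paper's criterion operates on genuine TD errors during reinforcement learning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D sketch of error-driven basis splitting (the ABD idea):
# start with a single Gaussian basis over the state space and split the
# basis whose locally weighted error variance stays high.
centers, widths = [0.5], [0.5]
weights = [0.0]                      # linear value weights, V(s) = sum_i w_i * phi_i(s)
err_stats = [[] for _ in centers]

def phi(s):
    return np.array([np.exp(-((s - c) / h) ** 2) for c, h in zip(centers, widths)])

def true_value(s):                   # stand-in target: a value function with structure
    return 1.0 if s > 0.8 else 0.0

alpha = 0.1
for step in range(2000):
    s = rng.uniform(0, 1)
    f = phi(s)
    td_error = true_value(s) - np.dot(weights, f)   # supervised stand-in for TD error
    for i in range(len(centers)):
        weights[i] += alpha * td_error * f[i]
        err_stats[i].append(f[i] * td_error)        # locally weighted error sample
    # periodically split the basis with the largest weighted error variance
    if step % 500 == 499:
        var = [np.var(e[-500:]) for e in err_stats]
        i = int(np.argmax(var))
        if var[i] > 1e-3:
            c, h = centers[i], widths[i]
            centers[i], widths[i] = c - h / 2, h / 2          # shrink in place...
            centers.append(c + h / 2); widths.append(h / 2)   # ...and add a sibling
            weights.append(weights[i]); err_stats.append([])

print(len(centers), "basis functions after adaptation")
```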

6.
Zilli EA, Hasselmo ME. Hippocampus, 2008, 18(2): 193-209.
The mechanisms of goal-directed behavior have been studied using reinforcement learning theory, but these theoretical techniques have not often been used to address the role of memory systems in performing behavioral tasks. This work addresses that shortcoming by providing a way in which working memory (WM) and episodic memory may be included in the reinforcement learning framework, and then simulating the successful acquisition and performance of six behavioral tasks, drawn from or inspired by the rat experimental literature, that require WM or episodic memory for correct performance. With no delay imposed during the tasks, simulations with WM can solve all of the tasks above chance level. When a delay is imposed, simulations with both episodic memory and WM can solve all of the tasks except a disambiguation of odor sequences task.
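One way to read "including WM in the reinforcement learning framework" is to fold the memory contents into the agent's state so that an ordinary learner can bridge the delay. The sketch below does this for a hypothetical delayed cue-response task; it illustrates the framing only, not the paper's six tasks, and every name and parameter is ours.

```python
import numpy as np

rng = np.random.default_rng(2)

# Working memory folded into the RL state: the agent's state is
# (task phase, WM content). Gating the cue into WM makes the otherwise
# ambiguous post-delay state distinguishable, so plain value learning
# can solve the task. Task and parameters are illustrative.
Q = {}
actions = ["respond_A", "respond_B"]
alpha, eps = 0.2, 0.1

def q(s, a):
    return Q.setdefault((s, a), 0.0)

for episode in range(2000):
    cue = rng.choice(["A", "B"])         # cue is shown, then removed
    wm = cue                             # gate the cue into working memory
    s = ("delay_over", wm)               # at choice time the cue itself is gone
    if rng.random() < eps:
        a = rng.choice(actions)
    else:
        a = max(actions, key=lambda act: q(s, act))
    reward = 1.0 if a == "respond_" + wm else 0.0
    Q[(s, a)] = q(s, a) + alpha * (reward - q(s, a))
    # without the WM component, both cues would map onto the same state
    # ("delay_over",) and performance could not exceed chance

s = ("delay_over", "A")
print(max(actions, key=lambda act: q(s, act)))   # -> respond_A after learning
```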

7.
We examine two methods used to deal with complex machine learning problems: compressed sensing and model compression. We discuss both methods in the context of feed-forward artificial neural networks and develop the backpropagation method in compressed parameter space. We further show that compressing the weights of a layer of a multilayer perceptron is equivalent to compressing the input of the layer. Based on this theoretical framework, we use orthogonal functions, and in particular random projections, for compression, and we perform experiments in supervised and reinforcement learning to demonstrate that the presented methods reduce training time significantly.
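The weight-compression/input-compression equivalence follows from linearity: if a layer's weight vector is parameterized as w = Φᵀa, with Φ a random projection and a the small trained parameter vector, then wᵀx = aᵀ(Φx), so training a against projected inputs performs the same computation. A quick numeric check (all dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Numeric check: parameterizing a layer's weight vector as w = Phi^T a
# (a lives in a small compressed space) gives the same pre-activation
# as feeding the compressed input Phi x to the small weight vector a.
d, k = 100, 10                       # original and compressed dimensionality
Phi = rng.normal(0, 1 / np.sqrt(k), (k, d))   # random projection
a = rng.normal(size=k)               # compressed parameters (what is trained)
x = rng.normal(size=d)               # layer input

w = Phi.T @ a                        # implied full weight vector
lhs = w @ x                          # full-space pre-activation
rhs = a @ (Phi @ x)                  # compressed input, small weight vector
print(np.allclose(lhs, rhs))         # True: (Phi^T a)^T x == a^T (Phi x)
```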

8.
Neuropsychopharmacology, 2019, 85(11): 936-945.
Background: Disruptions in the decision-making processes that guide action selection are a core feature of many psychiatric disorders, including addiction. Decision making is influenced by the goal-directed and habitual systems that can be computationally characterized using model-based and model-free reinforcement learning algorithms, respectively. Recent evidence suggests an imbalance in the influence of these reinforcement learning systems on behavior in individuals with substance dependence, but it is unknown whether these disruptions are a manifestation of chronic drug use and/or are a preexisting risk factor for addiction. Methods: We trained adult male rats on a multistage decision-making task to quantify model-free and model-based processes before and after self-administration of methamphetamine or saline. Results: Individual differences in model-free, but not model-based, learning prior to any drug use predicted subsequent methamphetamine self-administration; rats with lower model-free behavior took more methamphetamine than rats with higher model-free behavior. This relationship was selective to model-free updating following a rewarded, but not unrewarded, choice. Both model-free and model-based learning were reduced in rats following methamphetamine self-administration, which was due to a decrement in the ability of rats to use unrewarded outcomes appropriately. Moreover, the magnitude of drug-induced disruptions in model-free learning was not correlated with disruptions in model-based behavior, indicating that drug self-administration independently altered both reinforcement learning strategies. Conclusions: These findings provide direct evidence that model-free and model-based learning mechanisms are involved in select aspects of addiction vulnerability and pathology, and they provide a unique behavioral platform for conducting systems-level analyses of decision making in preclinical models of mental illness.

9.
This work proposes a hierarchical, biologically-inspired architecture for learning sensor-based spatial representations of a robot environment in an unsupervised way. The first layer comprises a fixed, randomly generated recurrent neural network, the reservoir, which projects the input into a high-dimensional dynamic space. The second layer learns instantaneous slowly-varying signals from the reservoir states using Slow Feature Analysis (SFA), whereas the third layer learns a sparse coding on the SFA layer using Independent Component Analysis (ICA). While the SFA layer generates non-localized activations in space, the ICA layer presents high place selectivity, forming a localized spatial activation characteristic of the place cells found in the hippocampus of the rodent brain. We show that, using a limited number of noisy short-range distance sensors as input, the proposed system learns a spatial representation of the environment which can be used to predict the actual location of simulated and real robots, without the use of odometry. The results confirm that the reservoir layer is essential for learning spatial representations from low-dimensional input such as distance sensors. The main reason is that the reservoir state reflects the recent history of the input stream. This fading memory is essential for detecting locations, especially when locations are ambiguous and characterized by similar sensor readings.
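The fading-memory property attributed to the reservoir above is easy to reproduce: a fixed random recurrent network whose weight matrix has spectral radius below one mixes the current input with an exponentially decaying trace of past inputs. A minimal sketch, with sizes and scalings as illustrative choices and the SFA/ICA layers omitted:

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimal reservoir: a fixed random recurrent network whose state is a
# fading memory of the recent input stream, which is what lets later
# layers disambiguate locations with similar instantaneous readings.
n_in, n_res = 8, 300                 # e.g. 8 short-range distance sensors
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1 -> echo state

x = np.zeros(n_res)
states = []
for t in range(200):
    u = rng.uniform(0, 1, n_in)      # stand-in for noisy sensor readings
    x = np.tanh(W_in @ u + W @ x)    # state mixes current input with recent history
    states.append(x.copy())
# 'states' is what would feed an SFA layer in the architecture described above
```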

10.
Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only works well with binary, or close to binary, state input, where the number of active states is smaller than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In an earlier study, we demonstrated that the performance and robustness of the FERL method can be improved by scaling the free energy by a constant related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL) instead of the negative free energy, which also makes it possible to handle continuous state input. We validate the proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions.
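For an RBM with binary hidden units whose visible layer is the concatenated state-action vector, both value approximations have closed forms: the negative free energy sums softplus terms over hidden units, while the negative expected energy weights each hidden unit's input by its conditional expectation. The sketch below computes both for random placeholder weights; the sizes and initialization are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sketch of the two value-function approximations discussed above, for an
# RBM whose visible layer v is the concatenated state-action vector.
n_vis, n_hid = 20, 8
W = rng.normal(0, 0.1, (n_hid, n_vis))
b = rng.normal(0, 0.1, n_vis)        # visible biases
c = rng.normal(0, 0.1, n_hid)        # hidden biases

def neg_free_energy(v):
    """FERL value: -F(v) = b.v + sum_j log(1 + exp(c_j + W_j.v))."""
    return b @ v + np.sum(np.log1p(np.exp(c + W @ v)))

def neg_expected_energy(v):
    """EERL value: hidden units enter via their expectations sigmoid(c + W v)."""
    h = sigmoid(c + W @ v)
    return b @ v + h @ (c + W @ v)

v = rng.uniform(0, 1, n_vis)         # continuous state input is unproblematic here
print(neg_free_energy(v), neg_expected_energy(v))
```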

11.
A crucial aspect of organizing goal-directed behavior is the ability to form neural representations of relationships between environmental stimuli, actions and reinforcement. Very little is known yet about the neural encoding of response-reward relationships, a process which is deemed essential for purposeful behavior. To investigate this, tetrode recordings were made in the medial prefrontal cortex (PFC) of rats performing a Go-NoGo task. After task acquisition, a subset of neurons showed a sustained change in firing during the rewarded action sequence that was triggered by a specific visual cue. When these changes were monitored in the course of learning, they were seen to develop in parallel with the behavioral learning curve and were highly sensitive to a switch in reward contingencies. These sustained changes correlated with the reward-associated action sequence, not with sensory or reward-predicting properties of the cue or individual motor acts per se. This novel type of neural plasticity may contribute to the formation of response-reinforcer associations and of behavioral strategies for guiding goal-directed action.

12.
A Fuzzy Adaptive Resonance Theory (ART) model capable of rapid stable learning of recognition categories in response to arbitrary sequences of analog or binary input patterns is described. Fuzzy ART incorporates computations from fuzzy set theory into the ART 1 neural network, which learns to categorize only binary input patterns. The generalization to learning both analog and binary input patterns is achieved by replacing appearances of the intersection operator (∩) in ART 1 by the MIN operator (∧) of fuzzy set theory. The MIN operator reduces to the intersection operator in the binary case. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator (∨) of fuzzy set theory play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights correspond to increasing sizes of category “boxes”. Smaller vigilance values lead to larger category boxes. Learning stops when the input space is covered by boxes. With fast learning and a finite input set of arbitrary size and composition, learning stabilizes after just one presentation of each input pattern. A fast-commit slow-recode option combines fast learning with a forgetting rule that buffers system memory against noise. Using this option, rare events can be rapidly learned, yet previously learned memories are not rapidly erased in response to statistically unreliable input fluctuations.
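The operations named above (complement coding, the fuzzy MIN operator, the vigilance test, fast learning) assemble into a very small algorithm. Below is a minimal sketch of the standard Fuzzy ART loop; the choice parameter and vigilance are arbitrary values, and the fast-commit slow-recode option is omitted.

```python
import numpy as np

# Minimal Fuzzy ART sketch: complement coding, fuzzy-MIN choice and
# match functions, vigilance test, and fast learning w <- I ∧ w.
alpha, rho = 0.001, 0.75            # choice parameter, vigilance (illustrative)

def complement_code(a):
    """On-cells and off-cells: [a, 1 - a]; |I| is then constant (= dim of a)."""
    return np.concatenate([a, 1.0 - a])

def fuzzy_and(x, y):
    return np.minimum(x, y)

def train(patterns):
    categories = []                  # one weight vector per committed category
    for a in patterns:
        I = complement_code(a)
        # rank categories by the choice function T_j = |I ∧ w_j| / (alpha + |w_j|)
        order = sorted(range(len(categories)), key=lambda j:
                       -fuzzy_and(I, categories[j]).sum() / (alpha + categories[j].sum()))
        for j in order:
            match = fuzzy_and(I, categories[j]).sum() / I.sum()
            if match >= rho:         # vigilance test passed: fast learning
                categories[j] = fuzzy_and(I, categories[j])   # weights only decrease
                break
        else:                        # no category matched: commit a new one
            categories.append(I.copy())
    return categories

pats = [np.array([0.1, 0.9]), np.array([0.15, 0.85]), np.array([0.9, 0.1])]
cats = train(pats)
print(len(cats), "categories learned")   # similar patterns share a category
```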

13.
We describe a hybrid generative and predictive model of the motor cortex. The generative model is related to the hierarchically directed cortico-cortical (or thalamo-cortical) connections, and unsupervised training leads to a topographic and sparse hidden representation of its sensory and motor input. The predictive model is related to lateral intra-area and inter-area cortical connections, functions as a hetero-associative attractor network, and is trained to predict the future state of the network. Given partial input, the generative model can map sensory input to motor actions and can thereby perform learnt action sequences of the agent within the environment. The predictive model can additionally predict a longer perception and action sequence (mental simulation). The models' performance is demonstrated on a visually guided robot docking manoeuvre. We propose that the motor cortex might take over functions previously learnt by reinforcement in the basal ganglia, and we relate this to mirror neurons and imitation.

14.
A comparison of behavior-based and planning approaches to robot control is presented in this paper. We focus on miniature mobile robotic agents with limited sensory abilities. Two reactive control mechanisms for an agent are considered: a radial basis function neural network trained by an evolutionary algorithm, and a traditional reinforcement learning algorithm over a finite agent state space. A control architecture based on localization and planning is compared with these reactive methods.

15.
Artificial neural networks are useful tools for pattern recognition because they realize nonlinear mappings between input and output spaces. This ability is tuned by supervised learning methods such as back-propagation, which require the desired outputs of the neural network. However, the desired outputs are usually unknown in unpredictable environments. To solve this problem, this paper presents a self-supervised learning system for category detection. The system learns categories of objects automatically by integrating information from several sensors. We assume that these sensory inputs are always ambiguous patterns that contain noise arising from deformations of the objects. After learning, the system recognizes objects while also controlling the priority of each sensor according to the deformation of the sensory input pattern.

In simulations, the system was applied to several learning and recognition tasks using artificial or real sensory inputs, and it found the categories in all tasks. In particular, we applied the new system to the learning of five Japanese vowels together with the corresponding shapes of the mouth. As a result, the system came to yield a specific output corresponding to each vowel.


16.
We have proposed a new approach to pattern recognition in which not only the classifier but also the feature space of input variables is learned incrementally. In this paper, an extended version of Incremental Principal Component Analysis (IPCA) and a Resource Allocating Network with Long-Term Memory (RAN-LTM) are effectively combined to implement this idea. Since IPCA updates the feature space incrementally by rotating the eigen-axes and increasing the dimensions, both the values and the number of the neural classifier's input variables must change accordingly. To solve this problem, we derive an approximation of the update formula for memory items, which correspond to representative training samples stored in the long-term memory of RAN-LTM. With these memory items, RAN-LTM is efficiently reconstructed and retrained to adapt to the evolution of the feature space. This function is incorporated into our face recognition system. In the experiments, the proposed incremental learning model is evaluated on self-compiled video clips of 24 persons. The experimental results show that incremental learning of a feature space is very effective in enhancing the generalization performance of a neural classifier in a realistic face recognition task.

17.
A deterministic neural network concept for a “universal approximator” is proposed. The network has two hidden layers; only the synapses of the output layer are required to be plastic, and only those depend on the function to be approximated. It is shown that a DEterministic Function Approximation Network (DEFAnet) can approximate an arbitrary continuous function from the finite-dimensional unit interval into finite-dimensional real space with arbitrary accuracy; arbitrary Boolean functions may be implemented exactly in a simple subset of DEFAnets. In a supervised learning scheme, convergence to the desired function is guaranteed; backpropagation of errors is not required. The concept is also open to reinforcement learning. In addition, when the topology of the network is determined according to the DEFAnet concept, it is possible to calculate all plastic synaptic weights in closed form, reducing the training effort considerably or replacing it altogether. Efficient algorithms for the calculation of synaptic weights are given.

18.
Information storage matrices (ISMs) have recently been introduced as artificial neural networks. They define a new parallel processing architecture in which the neural connection weights are efficiently trained through a global learning strategy. This eliminates the need for the slowly converging iterative learning schemes found in most present-day neural network paradigms. Consequently, ISM neural networks are attractive for real-time and online applications, several of which are discussed here. First, it is demonstrated how Boolean logic is implemented and how basic digital computer operations are realized through an inherently analog ISM organization. The ISM neural network is then trained to perform a pattern recognition task, extracting a specified feature from a binary input data string. In addition, ISM networks are used in data processing and process control applications: in the first instance, spectral components of the covariance matrix are obtained without having to solve the characteristic equation; in the second, process performance is stabilized on-line with respect to the selected initial state. As a result of such diverse application potentials, the ISM neural network is emerging as a useful parallel computing processor.

19.
In this paper, we present a new recurrent bidirectional model that encompasses correlational, competitive and topological model properties. The simultaneous use of many classes of network behaviors allows for the unsupervised learning/categorization of perceptual patterns (through input compression) and the concurrent encoding of proximities in a multidimensional space. All of these operations are achieved within a common learning operation, using a single set of defining properties. It is shown that the model can learn categories by developing prototype representations strictly from exposure to specific exemplars. Moreover, because the model is recurrent, it can reconstruct perfect outputs from incomplete and noisy patterns. Empirical exploration of the model’s properties and performance shows that its ability for adequate clustering stems from: (1) properly distributing connection weights, and (2) producing a weight space with a low dispersion level (or higher density). In addition, since the model uses a sparse representation (k-winners), the size of the topological neighborhood can be fixed and no longer requires a decrease through time, as was the case with classic self-organizing feature maps. Since the model's learning and transmission parameters are independent of the learning trials, the model can develop stable fixed points in a constrained topological architecture, while being flexible enough to learn novel patterns.

20.
In a physical neural system, where storage and processing are intimately intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons, resulting in local learning rules. A systematic framework for studying the space of local learning rules is obtained by first specifying the nature of the local variables, and then the functional form that ties them together into each learning rule. Such a framework enables also the systematic discovery of new learning rules and exploration of relationships between learning rules and group symmetries. We study polynomial local learning rules stratified by their degree and analyze their behavior and capabilities in both linear and non-linear units and networks. Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, it cannot learn complex input–output functions, even when targets are available for the top layer. Learning complex input–output functions requires local deep learning where target information is communicated to the deep layers through a backward learning channel. The nature of the communicated information about the targets and the structure of the learning channel partition the space of learning algorithms. For any learning algorithm, the capacity of the learning channel can be defined as the number of bits provided about the error gradient per weight, divided by the number of required operations per weight. We estimate the capacity associated with several learning algorithms and show that backpropagation outperforms them by simultaneously maximizing the information rate and minimizing the computational cost. This result is also shown to be true for recurrent networks, by unfolding them in time. The theory clarifies the concept of Hebbian learning, establishes the power and limitations of local learning rules, introduces the learning channel which enables a formal analysis of the optimality of backpropagation, and explains the sparsity of the space of learning rules discovered so far.
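As a concrete instance of the polynomial local learning rules discussed above, the sketch below runs Oja's rule, a low-degree polynomial in the pre-synaptic activity, post-synaptic activity and local weight, and shows it extracting the first principal component of a synthetic input stream. The rule choice and all parameters are our illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(6)

# Oja's rule, dw = eta * (y*x - y^2 * w), uses only locally available
# variables (pre-synaptic x, post-synaptic y, the weight w itself), yet
# converges to the first principal component of the input distribution.
d, eta = 5, 0.005
C = np.diag([4.0, 1.0, 1.0, 1.0, 1.0])       # input covariance; PC1 is axis 0
w = rng.normal(size=d)
w /= np.linalg.norm(w)                        # start on the unit sphere

for step in range(20000):
    x = rng.multivariate_normal(np.zeros(d), C)
    y = w @ x                                 # linear unit
    w += eta * (y * x - y * y * w)            # purely local update

print(np.abs(w).round(2))   # approaches the unit vector along axis 0
```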
