Similar Literature
20 similar records retrieved
1.
Recent behavioral evidence implicates reward prediction errors (RPEs) as a key factor in the acquisition of episodic memory. Yet, important neural predictions related to the role of RPEs in episodic memory acquisition remain to be tested. Humans (both sexes) performed a novel variable-choice task where we experimentally manipulated RPEs and found support for key neural predictions with fMRI. Our results show that, in line with previous behavioral observations, episodic memory accuracy increases with the magnitude of signed (i.e., better/worse-than-expected) RPEs (SRPEs). Neurally, we observe that SRPEs are encoded in the ventral striatum (VS). Crucially, we demonstrate through mediation analysis that activation in the VS mediates the experimental manipulation of SRPEs on episodic memory accuracy. In particular, SRPE-based responses in the VS (during learning) predict the strength of subsequent episodic memory (during recollection). Furthermore, functional connectivity between task-relevant processing areas (i.e., face-selective areas) and the hippocampus and ventral striatum increased as a function of RPE value (during learning), suggesting a central role of these areas in episodic memory formation. Our results consolidate reinforcement learning theory and striatal RPEs as key factors subtending the formation of episodic memory. SIGNIFICANCE STATEMENT: Recent behavioral research has shown that reward prediction errors (RPEs), a key concept of reinforcement learning theory, are crucial to the formation of episodic memories. In this study, we reveal the neural underpinnings of this process. Using fMRI, we show that signed RPEs (SRPEs) are encoded in the ventral striatum (VS) and, crucially, that SRPE-related VS activity is responsible for the subsequent recollection accuracy of one-shot learned episodic memory associations.
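For readers less familiar with the manipulated quantity, a minimal sketch of a signed RPE inside a Rescorla-Wagner-style value update is given below; the task structure, learning rate, and reward distribution are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.3   # learning rate (assumed)
V = 0.5       # expected reward for a cue, initialized at chance

for trial in range(20):
    r = rng.choice([0.0, 1.0])   # obtained reward on this trial
    srpe = r - V                 # signed RPE: better (+) or worse (-) than expected
    V += alpha * srpe            # value moves toward the obtained reward
    # the paper's behavioral result: memory accuracy increases with the SRPE
    print(f"trial {trial:2d}  reward {r:.0f}  SRPE {srpe:+.2f}")
```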

2.
Neuropsychopharmacology, 2019, 85(11): 936-945
Background: Disruptions in the decision-making processes that guide action selection are a core feature of many psychiatric disorders, including addiction. Decision making is influenced by the goal-directed and habitual systems, which can be computationally characterized using model-based and model-free reinforcement learning algorithms, respectively. Recent evidence suggests an imbalance in the influence of these reinforcement learning systems on behavior in individuals with substance dependence, but it is unknown whether these disruptions are a manifestation of chronic drug use and/or a preexisting risk factor for addiction. Methods: We trained adult male rats on a multistage decision-making task to quantify model-free and model-based processes before and after self-administration of methamphetamine or saline. Results: Individual differences in model-free, but not model-based, learning prior to any drug use predicted subsequent methamphetamine self-administration; rats with lower model-free behavior took more methamphetamine than rats with higher model-free behavior. This relationship was selective to model-free updating following a rewarded, but not unrewarded, choice. Both model-free and model-based learning were reduced in rats following methamphetamine self-administration, which was due to a decrement in the ability of rats to use unrewarded outcomes appropriately. Moreover, the magnitude of drug-induced disruptions in model-free learning was not correlated with disruptions in model-based behavior, indicating that drug self-administration independently altered both reinforcement learning strategies. Conclusions: These findings provide direct evidence that model-free and model-based learning mechanisms are involved in select aspects of addiction vulnerability and pathology, and they provide a unique behavioral platform for conducting systems-level analyses of decision making in preclinical models of mental illness.
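To make the model-free/model-based distinction concrete, here is a minimal sketch contrasting the two update rules in a two-stage setting; the task layout, parameter values, and function names are assumptions for illustration, not the authors' rat task or analysis code.

```python
import numpy as np

# Two first-stage actions leading probabilistically to two second-stage states
# (a two-step-style layout; assumed setup).
alpha = 0.2
Q_mf = np.zeros(2)               # model-free values of the first-stage actions
Q_stage2 = np.array([0.7, 0.3])  # learned values of the second-stage states
T = np.array([[0.7, 0.3],        # learned transition model P(state2 | action)
              [0.3, 0.7]])

def model_free_update(a, r):
    """Reflexive: update only the chosen action, from reward alone."""
    Q_mf[a] += alpha * (r - Q_mf[a])

def model_based_values():
    """Prospective: re-plan through the transition model on every trial."""
    return T @ Q_stage2

model_free_update(a=0, r=1.0)
print("model-free:", Q_mf, " model-based:", model_based_values())
```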

3.
To survive under changing circumstances, we have to make appropriate decisions about our behavior. For this purpose, the brain should recognize reward information from objects under given circumstances. Recent experimental and theoretical studies have suggested that primates, including human beings, have at least 2 brain processes that calculate the reward value of objects. One is the process coding a specific reward value of a stimulus or response, depending on direct experience (e.g., classical conditioning and TD learning). The other enables us to predict reward based on an internal model of the given circumstances, without direct experience (e.g., categorization and inference). To clarify the neuronal correlates of these multiple processes of reward prediction, we conducted 4 experiments: (1) single-unit recording from the caudate and lateral prefrontal cortex of a monkey while it performed a memory-guided saccade task with an asymmetric reward schedule; (2) human fMRI during random-dot discrimination with an asymmetric reward condition; (3) single-unit recording from monkey dopamine neurons in the random-dot discrimination task with an asymmetric reward schedule; and (4) simultaneous single-unit recording from the striatum and lateral prefrontal cortex of monkeys performing a reward inference task. The results suggest that the nigrostriatal network and the prefrontal network have different functional roles in reward prediction (value generation): the former applies a model-free method (temporal-difference learning), while the latter uses a model-based method (category-based learning).

4.
Our decisions are based on parallel and competing systems of goal-directed and habitual learning, systems which can be impaired in pathological behaviours. Here we focus on the influence of motivation and compare reward and loss outcomes in subjects with obsessive-compulsive disorder (OCD) on model-based goal-directed and model-free habitual behaviours using the two-step task. We further investigate the relationship with acquisition learning using a one-step probabilistic learning task. Forty-eight OCD subjects and 96 healthy volunteers were tested on the reward version of the two-step task, and 30 OCD subjects and 53 healthy volunteers on the loss version. Thirty-six OCD subjects and 72 healthy volunteers were also tested on a one-step reversal task. OCD subjects, compared with healthy volunteers, were less goal-directed (model-based) and more habitual (model-free) in response to reward outcomes, with a shift towards greater model-based and fewer habitual choices in response to loss outcomes. OCD subjects also showed enhanced acquisition learning for loss outcomes on the one-step task, which correlated with goal-directed learning in the two-step task. OCD subjects had greater stay behaviours, or perseveration, in the one-step task irrespective of outcome. Compulsion severity was correlated with habitual learning in the reward condition. Obsession severity was correlated with greater switching after loss outcomes. In healthy volunteers, we further show that greater reward magnitudes are associated with a shift towards greater goal-directed learning, further emphasizing the role of outcome salience. Our results highlight an important influence of motivation on learning processes in OCD and suggest that distinct clinical strategies based on valence may be warranted.

5.
The brain's most difficult computation in decision-making learning is searching for essential information related to rewards among vast multimodal inputs and then integrating it into beneficial behaviors. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions by utilizing an internal representation such as reward prediction and reward prediction error. Previous studies have suggested that a suitable brain structure for such integration is the neural circuitry associated with multiple cortico-striatal loops. However, it remains to be explored computationally how the information in and around these multiple closed loops can be shared and transferred. Here, we propose a "heterarchical reinforcement learning" model, where reward prediction made by more limbic and cognitive loops is propagated to motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions of brain activity during stimulus-action-reward association learning: the caudate nucleus and cognitive cortical areas should correlate with reward prediction error, while the putamen and motor-related areas should correlate with stimulus-action-dependent reward prediction. Furthermore, a heterogeneous activity pattern within the striatum is predicted depending on learning difficulty, i.e., the anterior medial caudate nucleus will correlate more with reward prediction error when learning becomes difficult, while the posterior putamen will correlate more with stimulus-action-dependent reward prediction in easy learning. Our fMRI results revealed that different cortico-striatal loops are operating, as suggested by the proposed model.

6.
Animals can categorize the environment into “states,” defined by unique sets of available action-outcome contingencies in different contexts. Doing so helps them choose appropriate actions and make accurate outcome predictions when in each given state. State maps have been hypothesized to be held in the orbitofrontal cortex (OFC), an area implicated in decision-making and encoding information about outcome predictions. Here we recorded neural activity in OFC in 6 male rats to test state representations. Rats were trained on an odor-guided choice task consisting of five trial blocks containing distinct sets of action-outcome contingencies, constituting states, with unsignaled transitions between them. OFC neural ensembles were analyzed using decoding algorithms. Results indicate that the vast majority of OFC neurons contributed to representations of the current state at any point in time, independent of odor cues and reward delivery, even at the level of individual neurons. Across state transitions, these representations gradually integrated evidence for the new state; the rate at which this integration happened in the prechoice part of the trial was related to how quickly the rats' choices adapted to the new state. Finally, OFC representations of outcome predictions, often thought to be the primary function of OFC, were dependent on the accuracy of OFC state representations. SIGNIFICANCE STATEMENT: A prominent hypothesis proposes that orbitofrontal cortex (OFC) tracks current location in a “cognitive map” of state space. Here we tested this idea in detail by analyzing neural activity recorded in OFC of rats performing a task consisting of a series of states, each defined by a set of available action-outcome contingencies. Results show that most OFC neurons contribute to state representations and that these representations are related to the rats' decision-making and OFC reward predictions. These findings suggest new interpretations of emotional dysregulation in pathologies, such as addiction, which have long been known to be related to OFC dysfunction.
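The ensemble decoding the abstract refers to can be illustrated with a cross-validated classifier on pseudo-data; the scikit-learn calls below are real, but every data and analysis choice is an assumption for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_neurons, n_states = 500, 60, 5
X = rng.poisson(3.0, size=(n_trials, n_neurons)).astype(float)  # spike counts per trial
y = rng.integers(0, n_states, size=n_trials)                    # trial-block ("state") label

# Cross-validated decoding of the current state from ensemble activity;
# with random pseudo-data, accuracy should hover near chance.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"decoding accuracy: {scores.mean():.2f} (chance = {1 / n_states:.2f})")
```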

8.
Orbitofrontal cortex (OFC) is critical for reversal learning. Reversal deficits are typically demonstrated in complex settings that combine Pavlovian and instrumental learning. Yet recent work has implicated the OFC specifically in behaviors guided by cues and the features of the specific outcomes they predict. To test whether the OFC is important for reversing such Pavlovian associations in the absence of confounding instrumental requirements, we trained rats on a simple Pavlovian task in which two auditory cues were presented, one paired with a food pellet reward and the other presented without reward. After learning, we reversed the cue–outcome associations. For half the rats, OFC was inactivated prior to each reversal session. Inactivation of OFC impaired the ability of the rats to reverse conditioned responding. This deficit reflected the inability of inactivated rats to develop normal responding for the previously unrewarded cue; inactivation of OFC had no impact on the ability of the rats to inhibit responding to the previously rewarded cue. These data show that OFC is critical to reversal of Pavlovian responding, and that the role of OFC in this behavior cannot be explained as a simple deficit in response inhibition. Furthermore, the contrast between the normal inhibition of responding, reported here, and impaired inhibition of responding during Pavlovian over-expectation, reported previously, suggests the novel hypothesis that OFC may be particularly critical for learning (or behavior) when it requires the subject to generate predictions about outcomes by bringing together or integrating disparate pieces of associative information.

9.
Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains how to compute the potential function that is used to shape the reward given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential function manually, this function can be learned online in parallel with the actual reinforcement learning process. Two cases are considered. The first solution, based on multi-grid discretisation, is designed for model-free reinforcement learning. In the second case, an approach for the prototypical model-based R-max algorithm is proposed; it learns the potential function using the free-space assumption about transitions in the environment. Two novel algorithms are presented and evaluated.
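The shaping term in question is F(s, s') = γΦ(s') − Φ(s), added to the environment reward. Below is a minimal tabular Q-learning sketch with a hand-coded potential; note the paper's contribution is learning Φ online, which this sketch deliberately does not do, and all parameter values are assumptions.

```python
import numpy as np

gamma, alpha = 0.95, 0.1
n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))

# Potential over states: here a toy distance-to-goal heuristic (an assumption);
# the paper instead learns this potential online alongside the agent.
phi = np.array([s / (n_states - 1) for s in range(n_states)], dtype=float)

def shaped_update(s, a, r, s_next):
    # Potential-based shaping: F(s, s') = gamma * phi(s') - phi(s),
    # which provably preserves the optimal policy (Ng et al., 1999).
    F = gamma * phi[s_next] - phi[s]
    target = (r + F) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

shaped_update(s=0, a=1, r=0.0, s_next=1)
print(Q[0])
```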

10.
It has been proposed that the striatum plays a crucial role in learning to select appropriate actions, optimizing rewards according to the principles of 'Actor-Critic' models of trial-and-error learning. The ventral striatum (VS), as Critic, would employ a temporal difference (TD) learning algorithm to predict rewards and drive dopaminergic neurons. This study examined this model's adequacy for VS responses to multiple rewards in rats. The respective arms of a plus-maze provided rewards of varying magnitudes; multiple rewards were provided at 1-s intervals while the rat stood still. Neurons discharged phasically prior to each reward, during both initial approach and immobile waiting, demonstrating that this signal is predictive and not simply motor-related. In different neurons, responses could be greater for early, middle or late droplets in the sequence. Strikingly, this activity often reappeared after the final reward, as if in anticipation of yet another. In contrast, previous TD learning models show decremental reward-prediction profiles during reward consumption due to a temporal-order signal introduced to reproduce accurate timing in dopaminergic reward-prediction error signals. To resolve this inconsistency in a biologically plausible manner, we adapted the TD learning model such that input information is nonhomogeneously distributed among different neurons. By suppressing reward temporal-order signals and varying richness of spatial and visual input information, the model reproduced the experimental data. This validates the feasibility of a TD-learning architecture where different groups of neurons participate in solving the task based on varied input information.
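For reference, a minimal tabular Actor-Critic step of the kind the abstract tests is sketched below; the state/action layout and learning rates are assumptions, and the paper's adapted model with nonhomogeneous inputs is not reproduced here.

```python
import numpy as np

alpha_v, alpha_p, gamma = 0.1, 0.05, 0.98
n_states, n_actions = 5, 4
V = np.zeros(n_states)                   # Critic: state values (ventral striatum role)
prefs = np.zeros((n_states, n_actions))  # Actor: action preferences (dorsal striatum role)

def actor_critic_step(s, a, r, s_next):
    delta = r + gamma * V[s_next] - V[s]  # TD error, the dopamine-like teaching signal
    V[s] += alpha_v * delta               # Critic learns to predict reward
    prefs[s, a] += alpha_p * delta        # Actor reinforces actions that beat the prediction
    return delta

print(actor_critic_step(s=0, a=2, r=1.0, s_next=1))
```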

11.
Roland E. Suri, Neural Networks, 2002, 15(4-6): 523-533
This article focuses on recent modeling studies of dopamine neuron activity and their influence on behavior. Activity of midbrain dopamine neurons is phasically increased by stimuli that increase the animal's reward expectation and is decreased below baseline levels when the reward fails to occur. These characteristics resemble the reward prediction error signal of the temporal difference (TD) model, which is a model of reinforcement learning. Computational modeling studies show that such a dopamine-like reward prediction error can serve as a powerful teaching signal for learning with delayed reinforcement, in particular for learning of motor sequences. Several lines of evidence suggest that dopamine is also involved in 'cognitive' processes that are not addressed by standard TD models. I propose the hypothesis that dopamine neuron activity is crucial for planning processes, also referred to as 'goal-directed behavior', which select actions by evaluating predictions about their motivational outcomes.
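Learning with delayed reinforcement in TD models is conventionally handled with eligibility traces; the sketch below shows a generic TD(λ) value update under assumed parameters, not Suri's specific model.

```python
import numpy as np

alpha, gamma, lam = 0.1, 0.98, 0.9
n_states = 8
V = np.zeros(n_states)
e = np.zeros(n_states)  # eligibility traces: recently visited states stay "eligible"

def td_lambda_step(s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]  # dopamine-like prediction error
    e[s] += 1.0                           # mark the current state
    V[:] += alpha * delta * e             # credit flows back to earlier states
    e[:] *= gamma * lam                   # traces decay, bridging the reward delay
    return delta

# A reward delivered only at the end of a sequence still updates early states via traces.
for s in range(n_states - 1):
    td_lambda_step(s, r=1.0 if s == n_states - 2 else 0.0, s_next=s + 1)
print(V)
```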

12.
Nicotinic receptors in the brain: correlating physiology with function
Nicotinic ACh receptors (nAChRs) have been implicated in a variety of brain functions, including neuronal development, learning and memory formation, and reward. Although there are substantial data indicating that nAChR subunits are found in many brain regions, the precise cellular roles of these subunits in neuronal functions have remained elusive. Until recently, nAChRs were thought primarily to serve a modulatory role in the brain by regulating neurotransmitter release from nerve terminals. However, new evidence has revealed that nAChRs also function in a postsynaptic role by mediating fast ACh-mediated synaptic transmission in the hippocampus and in the sensory cortex, and are found at somatodendritic as well as nerve terminal sites in the reward system. It is possible that presynaptic and postsynaptic nAChRs mediate changes in the efficacy of synaptic transmission in these brain regions. These changes could underlie the proposed roles of nAChRs in the cognitive functions of the hippocampus and cerebral cortex, in neuronal development in the sensory cortex, and in reward.

13.
The striatum plays critical roles in visually-guided decision-making and receives dense axonal projections from midbrain dopamine neurons. However, the roles of striatal dopamine in visual decision-making are poorly understood. We trained male and female mice to perform a visual decision task with asymmetric reward payoff, and we recorded the activity of dopamine axons innervating striatum. Dopamine axons in the dorsomedial striatum (DMS) responded to contralateral visual stimuli and contralateral rewarded actions. Neural responses to contralateral stimuli could not be explained by orienting behavior such as eye movements. Moreover, these contralateral stimulus responses persisted in sessions where the animals were instructed to not move to obtain reward, further indicating that these signals are stimulus-related. Lastly, we show that DMS dopamine signals were qualitatively different from dopamine signals in the ventral striatum (VS), which responded to both ipsilateral and contralateral stimuli, conforming to canonical prediction error signaling under sensory uncertainty. Thus, during visual decisions, DMS dopamine encodes visual stimuli and rewarded actions in a lateralized fashion, and could facilitate associations between specific visual stimuli and actions. SIGNIFICANCE STATEMENT: While the striatum is central to goal-directed behavior, the precise roles of its rich dopaminergic innervation in perceptual decision-making are poorly understood. We found that in a visual decision task, dopamine axons in the dorsomedial striatum (DMS) signaled stimuli presented contralaterally to the recorded hemisphere, as well as the onset of rewarded actions. Stimulus-evoked signals persisted in a no-movement task variant. We distinguish the patterns of these signals from those in the ventral striatum (VS). Our results contribute to the characterization of region-specific dopaminergic signaling in the striatum and highlight a role in stimulus-action association learning.

14.
Computational models of reward processing suggest that foregone or fictive outcomes serve as important information sources for learning and augment those generated by experienced rewards (e.g., reward prediction errors). An outstanding question is how these learning signals interact with top-down cognitive influences, such as cognitive reappraisal strategies. Using a sequential investment task and functional magnetic resonance imaging, we show that the reappraisal strategy selectively attenuates the influence of fictive, but not reward prediction error, signals on investment behavior; this behavioral effect is accompanied by changes in neural activity and connectivity in the anterior insular cortex, a brain region thought to integrate subjective feelings with high-order cognition. Furthermore, individuals differ in the extent to which their behaviors are driven by fictive errors versus reward prediction errors, and the reappraisal strategy interacts with such individual differences; this finding is also accompanied by distinct underlying neural mechanisms. These findings suggest that the variable interaction of cognitive strategies with two important classes of computational learning signals (fictive errors, reward prediction errors) represents one contributing substrate for the variable capacity of individuals to control their behavior based on foregone rewards. These findings also expose important possibilities for understanding the lack of control in addiction based on possibly foregone rewarding outcomes. Hum Brain Mapp 35:3738–3749, 2014.
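A common formalization of the fictive error is the gain the best foregone allocation would have produced minus the gain actually obtained, contrasted with the experienced RPE; the toy investment step below uses that formalization with assumed numbers, not the authors' exact task equations.

```python
# One toy step of a sequential-investment task (assumed formalization):
# the agent bets a fraction of its portfolio, then the market moves.
bet_fraction = 0.4
market_return = 0.10   # realized fractional price change
expected_gain = 0.02   # the agent's prior expectation for this step

actual_gain = bet_fraction * market_return
best_possible = max(0.0, 1.0 * market_return)  # gain had the agent bet fully and optimally

rpe = actual_gain - expected_gain              # experienced-outcome learning signal
fictive_error = best_possible - actual_gain    # "what I could have earned minus what I earned"
print(f"RPE {rpe:+.3f}  fictive error {fictive_error:+.3f}")
```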

15.
Reinforcement learning theory distinguishes “model-free” learning, which fosters reflexive repetition of previously rewarded actions, from “model-based” learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice. In a cohort of participants aged 9–25, we examined whether the abilities to infer sequential regularities in the environment (“statistical learning”), maintain information in an active state (“working memory”) and integrate distant concepts to solve problems (“fluid reasoning”) predicted age-related improvements in model-based choice. We found that age-related improvements in statistical learning performance did not mediate the relationship between age and model-based choice. Ceiling performance on our working memory assay prevented examination of its contribution to model-based learning. However, age-related improvements in fluid reasoning statistically mediated the developmental increase in the recruitment of a model-based strategy. These findings suggest that gradual development of fluid reasoning may be a critical component process underlying the emergence of model-based learning.
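The mediation logic can be illustrated with a simple regression comparison (total vs. direct effect) on pseudo-data; the statsmodels calls are real, but the simulated effect sizes are assumptions and the authors' actual mediation procedure may differ.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
age = rng.uniform(9, 25, n)
reasoning = 0.5 * age + rng.normal(0, 2, n)                     # mediator (pseudo-data)
mb_choice = 0.3 * reasoning + 0.0 * age + rng.normal(0, 1, n)   # fully mediated by design

def ols_params(y, *xs):
    X = sm.add_constant(np.column_stack(xs))
    return sm.OLS(y, X).fit().params

total = ols_params(mb_choice, age)[1]              # c:  age -> model-based choice
direct = ols_params(mb_choice, age, reasoning)[1]  # c': age effect controlling the mediator
print(f"total {total:.3f}  direct {direct:.3f}  indirect {total - direct:.3f}")
```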

16.
The primate orbitofrontal cortex (OFC) is involved in reward processing, learning, and decision making. Research in monkeys has shown that this region is densely connected with higher sensory, limbic, and subcortical regions. Moreover, a parcellation of the monkey OFC into two subdivisions has been suggested based on its intrinsic anatomical connections. However, in humans, little is known about any functional subdivisions of the OFC except for a rather coarse medial/lateral distinction. Here, we used resting-state fMRI in combination with unsupervised clustering techniques to investigate whether OFC subdivisions can be revealed based on their functional connectivity profiles with other brain regions. Examination of different cluster solutions provided support for a parcellation into two parts as observed in monkeys, but it also highlighted a much finer hierarchical clustering of the orbital surface. Specifically, we identified (1) a medial, (2) a posterior-central, (3) a central, and (4-6) three lateral clusters spanning the anterior-posterior gradient. Consistent with animal tracing studies, these OFC clusters were connected to other cortical regions such as prefrontal, temporal, and parietal cortices, but also to subcortical areas in the striatum and the midbrain. These connectivity patterns have important implications for identifying specific functional roles of OFC subdivisions in reward processing, learning, and decision making. Moreover, this parcellation schema can provide guidance for reporting results in future studies.
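A minimal version of clustering voxelwise connectivity profiles might look like the sketch below; the scikit-learn API is real, while the pseudo-data and the particular cluster counts examined are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_voxels, n_targets = 300, 100
# Each OFC voxel's functional-connectivity profile with other brain regions
conn = rng.normal(size=(n_voxels, n_targets))

# Examine several cluster solutions, as in the study (the k values here are assumptions)
for k in (2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(conn)
    print(f"k={k}  cluster sizes: {np.bincount(labels)}")
```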

17.
Endophenotype studies indicate that high impulsivity is an important risk factor for addiction. In addiction, behavioral control is shifted toward the habitual end. Habitual control can be described by retrospective updating of reward expectations in 'model-free' temporal-difference algorithms. Goal-directed control relies on the prospective consideration of actions and their outcomes, which can be captured by forward-planning 'model-based' algorithms. So far, no studies have examined behavioral and neural signatures of model-free and model-based control in healthy high-impulsive individuals. Fifty healthy participants were drawn from the upper and lower ends of 452 individuals who completed the Barratt Impulsiveness Scale. All participants performed a sequential decision-making task during functional magnetic resonance imaging (fMRI) and underwent structural MRI. Behavioral and fMRI data were analyzed by means of computational algorithms reflecting model-free and model-based control. The two groups did not differ regarding the balance of model-free and model-based control, but high-impulsive individuals showed a subtle but significant accentuation of model-free control alone. Right lateral prefrontal model-based signatures were reduced in high-impulsive individuals. Effects of smoking, drinking, general cognition, or gray matter density did not account for the findings. Irrespective of impulsivity, gray matter density in the left dorsolateral prefrontal cortex was positively associated with model-based control. The present study supports the idea that high levels of impulsivity are accompanied by behavioral and neural signatures in favor of model-free behavioral control. Behavioral results in healthy high-impulsive individuals were qualitatively different from findings in patients performing the same task. The predictive relevance of these results remains an important target for future longitudinal studies.
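Behavior on such sequential tasks is commonly fit with a weighted mixture of model-based and model-free values (the hybrid form of Daw et al., 2011); the sketch below shows that weighting with assumed values, not the study's fitted model.

```python
import numpy as np

def hybrid_action_values(q_mb, q_mf, w):
    """Weighted mixture used to fit two-step behavior: w=1 is purely model-based,
    w=0 purely model-free (hybrid form after Daw et al., 2011; values assumed)."""
    return w * q_mb + (1.0 - w) * q_mf

def softmax_choice_probs(q, beta=3.0):
    """Convert action values to choice probabilities (beta is an assumed temperature)."""
    x = beta * (q - q.max())
    p = np.exp(x)
    return p / p.sum()

q_mb, q_mf = np.array([0.6, 0.4]), np.array([0.3, 0.7])
print(softmax_choice_probs(hybrid_action_values(q_mb, q_mf, w=0.4)))
```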

18.
The ability to change an established stimulus-behavior association based on feedback is critical for adaptive social behaviors. This ability has been examined in reversal learning tasks, where participants first learn a stimulus-response association (e.g., select a particular object to get a reward) and then need to alter their response when reinforcement contingencies change. Although substantial evidence demonstrates that the OFC is a critical region for reversal learning, previous studies have not distinguished reversal learning for emotional associations from that for neutral associations. The current study examined whether the OFC plays similar roles in emotional versus neutral reversal learning. The OFC showed greater activity during reversals of stimulus-outcome associations for negative outcomes than for neutral outcomes. Similar OFC activity was also observed during reversals involving positive outcomes. Furthermore, OFC activity was more strongly inversely correlated with amygdala activity during negative reversals than during neutral reversals. Overall, our results indicate that the OFC is more activated by emotional than neutral reversal learning and that OFC's interactions with the amygdala are greater for negative than neutral reversal learning.

19.
What is reinforced by phasic dopamine signals?
The basal ganglia have been associated with processes of reinforcement learning. A strong line of supporting evidence comes from recordings of dopamine (DA) neurones in behaving monkeys. Unpredicted, biologically salient events, including rewards, cause a stereotypic short-latency (70-100 ms), short-duration (100-200 ms) burst of DA activity: the phasic response. This response is widely considered to represent reward prediction errors used as teaching signals in appetitive learning to promote actions that will maximise future reward acquisition. For DA signalling to perform this function, sensory processing afferent to DA neurones should discriminate unpredicted reward-related events. However, the comparative response latencies of DA neurones and orienting gaze-shifts indicate that phasic DA responses are triggered by pre-attentive sensory processing. Consequently, in circumstances where biologically salient events are both spatially and temporally unpredictable, it is unlikely that their identity will be known at the time of DA signalling. The limited quality of afferent sensory processing and the precise timing of phasic DA signals suggest that they may play a less direct role in 'Law of Effect' appetitive learning. Rather, the 'time-stamp' nature of the phasic response, in conjunction with the other signals likely to be present in the basal ganglia at the time of phasic DA input, suggests that it may reinforce the discovery of unpredicted sensory events for which the organism is responsible. Furthermore, DA-promoted repetition of preceding actions/movements should enable the system to converge on those aspects of context and behavioural output that lead to the discovery of novel actions.

20.
Understanding the factors that drive organization and function of the brain is an enduring question in neuroscience. Using functional magnetic resonance imaging (fMRI), structure and function have been mapped in primary sensory cortices based on knowledge of the organizational principles that likely drive a given region (e.g., aspects of visual form in primary visual cortex and sound frequency in primary auditory cortex) and knowledge of underlying cytoarchitecture. The organizing principles of higher-order brain areas that encode more complex signals, such as the orbitofrontal cortex (OFC), are less well understood. One fundamental component that underlies the many functions of the OFC is the ability to compute the reward or value of a given object. There is evidence of variability in the spatial location of responses to specific categories of objects (or the value of said objects) within the OFC, and several reference frames have been proposed to explain this variability, including topographic spatial gradients that correspond to axes of primary versus secondary rewards and positive versus negative reinforcers. One potentially useful structural morphometric reference frame in the OFC is the “H-sulcus,” a pattern formed by the medial orbital, lateral orbital and transverse orbital sulci. In 48 human subjects, we use a structural morphometric tracing procedure to localize functional activation along the H-sulcus for face and food stimuli. We report the novel finding that food-selective responses are consistently found within the caudal portion of the medial orbital sulcus, but no consistency within the H-sulcus for responses to face stimuli. These results suggest that sulcogyral anatomy of the H-sulcus may be an important morphological metric that contributes to the organizing principles of the OFC response to certain stimulus categories, including food.
