1. Background and Significance
Learning from feedback in the real world is limited by constant fluctuations in the reward outcomes associated with choosing certain options or actions. Some of these fluctuations are caused by fundamental changes in the reward value of those options or actions, which call for drastic adjustments to the current learning strategy, as in epiphany learning or one-shot learning [Chen & Krajbich, 2017; Lee et al., 2015]. Other fluctuations reflect inherent stochasticity in an otherwise stable environment and should be tolerated and ignored in order to maintain stable choice preferences. In other words, learning in dynamic environments is bounded by a tradeoff between being adaptable (i.e., responding quickly to changes in the environment) and being precise (i.e., updating slowly after each feedback to be more accurate), which we refer to as the adaptability-precision tradeoff [Farashahi et al., 2017; Khorsand & Soltani, 2017]. Therefore, distinguishing meaningful changes in the environment from natural fluctuations can greatly enhance adaptive learning, indicating that adaptive learning depends on interactions between multiple brain areas. To date, most computational models of learning under uncertainty are very high-level and/or descriptive [Behrens et al., 2007; Costa et al., 2015; Iigaya, 2016; Jang et al., 2015; Nassar et al., 2010; Payzan-LeNestour & Bossaerts, 2011] and therefore do not provide specific testable predictions. On the other hand, neural mechanisms of uncertainty monitoring for adaptive learning have been investigated predominantly in humans, and in a few cases in monkeys, both of which are limited in terms of circuit-level manipulations. However, interactions between brain areas unfold on short timescales and can be specific to certain cell types. These properties have severely limited the ability of functional MRI [Logothetis, 2003] or MEG [Dale et al., 2000; Mostert et al., 2015] to reveal the microcircuit mechanisms within brain regions and the fine-grained interactions between brain regions. To overcome these limitations and reveal neural mechanisms underlying adaptive learning under uncertainty, we propose a combination of detailed computational modeling, imaging of stable neuronal ensembles, and precise system-level manipulation of interactions between multiple brain areas in rodents.
The latter is possible in part due to powerful circuit-dissection techniques in rodents that allow manipulation of genetically tractable cell types and thus of specific projections between brain regions. Combined with decoding of neuronal activity in cortex and guided by mechanistic computational modeling, this approach enables us to investigate both microcircuit and system-level mechanisms of adaptive learning under uncertainty. We have recently proposed a mechanistic model for adaptive learning under uncertainty [Farashahi et al., 2017]. This model, which we refer to as the reward-dependent metaplasticity (RDMP) model, provides a synaptic mechanism for how learning can be self-adjusted to reward statistics in the environment. The model predicts that as more time is spent in a given environment with a certain reward schedule, the organism should become less sensitive to feedback that does not support what has been learned. This and other predictions of the model were confirmed using a large set of behavioral data from monkeys performing a probabilistic reversal learning task [Farashahi et al., 2017]. Although the proposed metaplasticity mechanism makes the model more robust against random fluctuations, it also slows the model's response to actual changes in the environment. This limitation can be partially mitigated by allowing synapses to become unstable in response to changes in the environment [Iigaya, 2016]. Interestingly, in our model, the changes in the activity of neurons that encode reward values can be used by another system to compute volatility in the environment. This signal can subsequently be used to increase the speed of learning when volatility is high, that is, when there is a higher chance of real changes in the environment. We hypothesize that such interactions between value-encoding and uncertainty-monitoring systems can enhance the adaptability required in dynamic environments.
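To make the adaptability-precision tradeoff and the proposed role of a volatility signal concrete, the following is a minimal illustrative sketch, not the RDMP model of Farashahi et al. [2017]. It assumes a simple delta-rule value learner; the function names, the parameters (`alpha_min`, `alpha_max`, `tau`), and the use of a running average of signed prediction errors as a stand-in for a volatility signal are all hypothetical choices made for illustration only.

```python
def delta_rule(rewards, alpha, v0=0.5):
    """Fixed-rate learner: v <- v + alpha * (r - v).

    A small alpha is precise but slow to adapt; a large alpha adapts
    quickly but is strongly perturbed by every single outcome.
    """
    v, trace = v0, []
    for r in rewards:
        v += alpha * (r - v)
        trace.append(v)
    return trace


def adaptive_rule(rewards, alpha_min=0.05, alpha_max=0.5, tau=0.2, v0=0.5):
    """Learner whose rate is scaled by a crude, assumed volatility proxy.

    `surprise` is a running average of signed prediction errors. It stays
    near zero when outcomes merely fluctuate around the learned value and
    grows in magnitude when errors are consistently one-sided, as happens
    after a real change in reward value.
    """
    v, surprise, trace = v0, 0.0, []
    for r in rewards:
        delta = r - v                         # reward prediction error
        surprise += tau * (delta - surprise)  # slow average of signed errors
        # Interpolate between the slow and the fast learning rate.
        alpha = alpha_min + (alpha_max - alpha_min) * min(1.0, abs(surprise))
        v += alpha * delta
        trace.append(v)
    return trace


if __name__ == "__main__":
    # A deterministic toy schedule: 50 rewarded trials, then a reversal.
    rewards = [1] * 50 + [0] * 10
    learners = [
        ("slow fixed", delta_rule(rewards, 0.05)),
        ("fast fixed", delta_rule(rewards, 0.5)),
        ("adaptive", adaptive_rule(rewards)),
    ]
    for name, trace in learners:
        print(f"{name:>10}: before reversal {trace[49]:.3f}, "
              f"10 trials after {trace[-1]:.3f}")
```

In this toy setting, a single unexpected outcome moves the adaptive learner far less than the fast fixed-rate learner, whereas a sustained run of unexpected outcomes raises its learning rate so that it overtakes the slow fixed-rate learner. This is only one of many ways such an interaction between value and volatility signals could be formalized.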
In addition to this modeling study, we have recently shown that the basolateral amygdala (BLA) and the orbitofrontal cortex (OFC) have complementary roles in adaptive value learning under uncertainty in rodents [Stolyarova & Izquierdo, 2017]. In this experiment, rats chose between different visual stimuli and learned the variance in the delays to food reward associated with each of them. We found that OFC is necessary to accurately learn such stimulus-outcome associations (in terms of 1 21

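Agency
National Institutes of Health (NIH)
Institute
National Institute on Drug Abuse (NIDA)
Type
Research Project (R01)
Project #
1R01DA047870-01
Application #
9691634
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Pariyadath, Vani
Project Start
2018-09-15
Project End
2023-07-31
Budget Start
2018-09-15
Budget End
2019-07-31
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost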
Name
Dartmouth College
Department
Psychology
Type
Schools of Arts and Sciences
DUNS #
041027822
City
Hanover
State
NH
Country
United States
Zip Code