Learning in deep neural networks and brains with similarity-weighted interleaved learning
Authors:Rajat Saxena  Justin L. Shobe  Bruce L. McNaughton
Affiliations:(a) Department of Neurobiology and Behavior, University of California, Irvine, CA 92697; (b) Canadian Centre for Behavioural Neuroscience, The University of Lethbridge, Lethbridge, Alberta T1K 3M4, Canada
Abstract:Understanding how the brain learns throughout a lifetime remains a long-standing challenge. In artificial neural networks (ANNs), incorporating novel information too rapidly results in catastrophic interference, i.e., abrupt loss of previously acquired knowledge. Complementary Learning Systems Theory (CLST) suggests that new memories can be gradually integrated into the neocortex by interleaving them with existing knowledge. This approach, however, has been assumed to require interleaving all existing knowledge every time something new is learned, which is implausible because it is time-consuming and requires a large amount of data. We show that deep, nonlinear ANNs can learn new information by interleaving only a subset of old items that share substantial representational similarity with the new information. By using such similarity-weighted interleaved learning (SWIL), ANNs can learn new information rapidly, with accuracy comparable to full interleaved training and minimal interference, while presenting far fewer old items per epoch (fast and data-efficient). SWIL is shown to work with various standard classification datasets (Fashion-MNIST, CIFAR10, and CIFAR100), deep neural network architectures, and sequential learning frameworks. We show that data efficiency and speedup in learning new items increase roughly in proportion to the number of nonoverlapping classes stored in the network, implying an enormous possible speedup in human brains, which encode a very large number of separate categories. Finally, we propose a theoretical model of how SWIL might be implemented in the brain.

Artificial neural networks (ANNs) tend to lose previously acquired knowledge abruptly when new information is incorporated too quickly (“catastrophic interference”) (1, 2). Successful lifelong learners (e.g., humans) do not suffer from this problem, potentially by using mechanisms suggested in the Complementary Learning Systems Theory (CLST) (3) (see also ref. 4). CLST states that the brain relies on complementary learning systems: the hippocampus (HC) for rapid acquisition of new memories and the neocortex (NC) for the gradual incorporation of new data into context-independent structured knowledge. During “offline” periods, such as sleep and quiet waking rest, the HC triggers replay of recent experiences in the NC, while the NC spontaneously retrieves and interleaves representations of existing classes (5–7). This interleaved replay allows gradual, gradient-descent-like adjustment of NC synaptic weights to create context-independent category representations, thereby gracefully integrating new memories and overcoming catastrophic interference. Numerous studies have since successfully used interleaved replay to achieve lifelong learning in neural networks (8, 9).

In practice, however, CLST raises two significant issues. First, how can the brain perform comprehensive interleaving when it does not have access to all of the old data? One potential solution is “pseudorehearsal” (10), in which random inputs elicit generative replay of internal representations without requiring explicit access to previously learned examples. Attractor-like dynamics may allow the brain to accomplish pseudorehearsal, but it is unclear what should be pseudorehearsed. This leads to the second problem: there is not enough time to interleave all previously learned information after each new learning event. “Similarity-Weighted Interleaved Learning” (SWIL) was proposed as a solution to this second problem, suggesting that it may be sufficient to interleave only those old items with substantial representational similarity to the new items (11). Empirical behavioral studies showed that highly consistent new items could be rapidly integrated into NC structured knowledge with little or no interference (12, 13), indicating that the speed of integrating new information depends on its consistency with prior knowledge (14). Inspired by this behavioral result, and by a reexamination of how catastrophic interference is distributed among previously acquired classes (described below), McClelland et al. (11) demonstrated that SWIL allowed new information to be learned with 2.5× fewer item presentations per epoch on a simple dataset with two superordinate categories, while achieving the same performance as training the network on the entire data. However, the authors did not find a similar effect on more complex datasets, raising concerns about the algorithm's scalability.

The current study overcomes these limitations by modifying the SWIL algorithm to work with Convolutional Neural Networks (CNNs) on standard classification datasets (Fashion-MNIST, CIFAR10, and CIFAR100). We exploit the hierarchical structure of existing knowledge to selectively interleave only those old items that have higher representational similarity to the new items. With this strategy, we reach performance levels comparable to those achieved by training on the entire dataset, thereby substantially reducing both the amount of data required (data-efficient) and the learning time (speedup).
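To make the selection step above concrete, the following Python/NumPy sketch (our own illustration, not code released with the paper; swil_sample and all variable names are hypothetical) samples old items for interleaving with probability proportional to the cosine similarity between each old class's mean hidden-layer representation and the mean representation of the new class:

import numpy as np

def swil_sample(old_feats, old_labels, new_feats, n_interleave, rng=None):
    """Pick indices of old items to interleave with the new class, weighted by
    the cosine similarity of each old class's mean hidden representation to
    the new class's mean representation."""
    if rng is None:
        rng = np.random.default_rng(0)
    new_mean = new_feats.mean(axis=0)
    new_mean /= np.linalg.norm(new_mean)

    classes = np.unique(old_labels)
    sims = []
    for c in classes:
        m = old_feats[old_labels == c].mean(axis=0)
        sims.append((m @ new_mean) / np.linalg.norm(m))   # cosine similarity per old class
    class_w = np.clip(sims, 1e-6, None)                   # keep weights positive
    class_w /= class_w.sum()                              # per-class sampling weights

    # Each old item inherits its class weight; representationally similar classes dominate the mix.
    item_w = class_w[np.searchsorted(classes, old_labels)]
    item_w /= item_w.sum()
    return rng.choice(len(old_labels), size=n_interleave, replace=False, p=item_w)

# Toy usage with random features standing in for penultimate-layer activations.
rng = np.random.default_rng(0)
old_feats = rng.standard_normal((1000, 64))
old_labels = rng.integers(0, 9, size=1000)
new_feats = rng.standard_normal((100, 64)) + 0.5
interleave_idx = swil_sample(old_feats, old_labels, new_feats, n_interleave=200, rng=rng)

In a full training loop, each new-class minibatch would be mixed with old items drawn in this way, so that only the most similar old classes are rehearsed in each epoch.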
We then show that SWIL can also be used in a sequential learning framework. Additionally, we show that learning a new class can be extremely data-efficient, requiring far fewer old items to be presented, when the new class shares similarity with only a small subset of previously learned classes, which is likely the case in human learning. Finally, we present a theoretical model of how SWIL might be implemented in the brain using previously stored attractors whose excitability is biased in proportion to their overlap with the new items.
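A minimal sketch of that proposal, under our own simplifying assumptions (binary attractor patterns; all names illustrative), is given below: each stored attractor receives an excitability bias proportional to its overlap with the new item, so similar old memories are preferentially reactivated, and hence interleaved, during offline replay.

import numpy as np

rng = np.random.default_rng(1)
n_units, n_attractors = 200, 10
attractors = np.sign(rng.standard_normal((n_attractors, n_units)))  # stored category patterns
new_item = np.sign(rng.standard_normal(n_units))                    # newly encoded pattern

overlap = attractors @ new_item / n_units            # normalized pattern overlap
excitability = np.clip(overlap, 1e-6, None)          # bias only toward similar attractors
p_reactivate = excitability / excitability.sum()     # replay probability per stored memory

replayed = rng.choice(n_attractors, size=50, p=p_reactivate)
print(np.bincount(replayed, minlength=n_attractors))  # overlapping memories are replayed most often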
Keywords:complementary learning systems; learning; memory; neural networks; memory consolidation