Artificial Intelligence May Shed Light on How the Brain Processes Language, According to New Research

Artificial intelligence (AI) models of language have become very good at certain tasks over the last several years. They are particularly good at predicting the next word in a string of text; this technology helps search engines and texting apps anticipate the next word you are going to type.

Predictive language models of the most recent generation also appear to learn something about the underlying meaning of language. These models can not only predict the next word, but also perform tasks that seem to require genuine comprehension, such as question answering, document summarization, and story completion.

Such models were designed to optimize performance on the specific task of text prediction, not to emulate how the human brain performs this task or comprehends language.

However, a new study by MIT neuroscientists reveals that the fundamental function of these models is similar to that of human language-processing regions.

Computer models that perform well on other kinds of language tasks do not show this similarity to the human brain, suggesting that the human brain may use next-word prediction to drive language processing.

“The better the model is at predicting the next word, the more closely it fits the human brain,” says Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines (CBMM), and an author of the new study. “It’s amazing that the models fit so well, and it very indirectly suggests that maybe what the human language system is doing is predicting what’s going to happen next.”

Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of CBMM and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL); and Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute, are the senior authors of the study, which appears this week in the Proceedings of the National Academy of Sciences. Martin Schrimpf, an MIT graduate student who works in CBMM, is the first author of the paper.


Making predictions

The new high-performing next-word prediction models belong to a class of models called deep neural networks. These networks contain computational “nodes” that form connections of varying strength, and layers that pass information to one another in prescribed ways.
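As a rough illustration (not the authors’ code), a tiny feedforward network of this kind can be sketched in a few lines of NumPy; the layer sizes, random weights, and tanh activation below are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    """One fully connected layer: a weight matrix whose entries are
    the connection strengths between nodes."""
    return rng.normal(scale=0.1, size=(n_in, n_out))

# Three layers that pass information forward in a fixed order.
W1, W2, W3 = layer(8, 16), layer(16, 16), layer(16, 4)

def forward(x):
    h1 = np.tanh(x @ W1)   # node activations in the first layer
    h2 = np.tanh(h1 @ W2)  # second layer
    return h2 @ W3         # output scores (e.g. over next-word candidates)

x = rng.normal(size=(1, 8))       # a toy input vector
print(forward(x).shape)           # (1, 4)
```

In a trained model, the connection strengths in `W1`–`W3` would be learned from data rather than drawn at random.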

Over the last decade, scientists have used deep neural networks to build vision models that can recognize objects as well as the monkey brain does. Research at MIT has also shown that the fundamental function of visual object-recognition models matches the organization of the monkey visual cortex, even though those computer models were not specifically designed to mimic the brain.

In their latest study, the MIT researchers used a similar approach to compare language-processing areas in the human brain with language-processing models. They examined 43 different language models, including several that are specifically designed to predict the next word in a sequence. One of these is GPT-3 (Generative Pre-trained Transformer 3), which can generate text similar to what a human would produce when given a prompt.

Other models were designed to perform different language tasks, such as filling in a blank in a sentence. As each model was presented with a string of words, the researchers measured the activity of the nodes that make up the network.

They then compared these patterns to brain activity in people performing three language tasks: listening to stories, reading sentences one at a time, and reading sentences in which one word is revealed at a time. The human datasets included functional magnetic resonance imaging (fMRI) data and intracranial electrocorticographic measurements taken in people undergoing brain surgery for epilepsy.
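The comparison described above can be sketched, very loosely, as fitting a linear map from a model’s internal activations to measured brain responses and scoring the fit on held-out data. The array shapes, synthetic data, and ridge penalty below are illustrative assumptions, not the study’s actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: node activations of a model for 100 word strings, and a
# brain response (e.g. an fMRI signal in a language region) per string.
model_acts = rng.normal(size=(100, 50))
brain_resp = model_acts @ rng.normal(size=50) + rng.normal(scale=0.5, size=100)

train, test = slice(0, 80), slice(80, 100)

# Ridge regression from activations to brain responses (closed form).
lam = 1.0
X, y = model_acts[train], brain_resp[train]
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Score: correlation between predicted and observed held-out responses.
pred = model_acts[test] @ w
r = np.corrcoef(pred, brain_resp[test])[0, 1]
print(f"held-out correlation: {r:.2f}")
```

A model whose activations carry information resembling the brain’s responses yields a high held-out correlation; one that does not yields a correlation near zero.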

The best-performing next-word prediction models showed activity patterns that closely resembled those seen in the human brain, the researchers found. Activity in those same models was also strongly correlated with human behavioral measures, such as how quickly people could read the text.

“We found that the models that predict the neural responses well also tend to best predict human behavior responses, in the form of reading times. And then both of these are explained by the model performance on next-word prediction. This triangle really connects everything together,” Schrimpf says.

Game changer

One of the key computational features of predictive models like GPT-3 is a forward one-way predictive transformer, which makes predictions about what will happen next based on previous sequences. A distinctive feature of this transformer is its ability to make predictions based on a very long stretch of prior context (hundreds of words), rather than just the last few words.
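A minimal sketch of the “one-way” idea, assuming standard scaled dot-product attention: a causal mask lets each position attend only to earlier positions, so a prediction can draw on the full prior context but never on future words. The sequence length, embedding size, and random inputs here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 6, 8                        # sequence length, embedding size
Q = rng.normal(size=(T, d))        # queries, keys, values for one head
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

scores = Q @ K.T / np.sqrt(d)

# Causal ("forward one-way") mask: position t may only see positions <= t.
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[mask] = -np.inf

# Row-wise softmax; masked entries get zero weight.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

out = weights @ V                   # each row mixes only past context
print(np.triu(weights, k=1).max())  # 0.0: no attention to future tokens
```

Because the context window spans the entire masked sequence, the prediction at the final position can depend on everything that came before it, which is what distinguishes these models from those that only look at the last few words.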

According to Tenenbaum, scientists have not yet discovered any brain circuits or learning mechanisms that correspond to this type of processing. However, the new findings are consistent with earlier proposals that prediction is one of the key functions of language processing, he says.

“One of the challenges of language processing is the real-time aspect of it,” he says. “Language comes in, and you have to keep up with it and be able to make sense of it in real-time.”

The researchers now plan to build variants of these language-processing models to see how small changes in their architecture affect their performance and their ability to fit human brain data.

“For me, this result has been a game-changer,” Fedorenko says. “It’s totally transforming my research program because I would not have predicted that in my lifetime we would get to these computationally explicit models that capture enough about the brain so that we can actually leverage them in understanding how the brain works.”

Tenenbaum’s lab has previously built computer models that can handle other kinds of tasks, such as constructing perceptual representations of the physical world, and the researchers hope to combine these with the high-performing language models.

“If we’re able to understand what these language models do and how they can connect to models which do things that are more like perceiving and thinking, then that can give us more integrative models of how things work in the brain,” Tenenbaum says.

“This could take us toward better artificial intelligence models, as well as giving us better models of how more of the brain works and how general intelligence emerges than we’ve had in the past.”

The research was funded by a Takeda Fellowship; the MIT Shoemaker Fellowship; the Semiconductor Research Corporation; the MIT Media Lab Consortia; the MIT Singleton Fellowship; the MIT Presidential Graduate Fellowship; the Friends of the McGovern Institute Fellowship; the MIT Center for Brains, Minds, and Machines, through the National Science Foundation; the National Institutes of Health; MIT’s Department of Brain and Cognitive Sciences; and the McGovern Institute.

Other authors of the paper are Idan Blank Ph.D. ’16 and graduate students Greta Tuckute, Carina Kauf, and Eghbal Hosseini.
