One of the strengths of artificial intelligence is that the artificial neural networks that make machine learning possible are loosely modeled on the neurons of our own brains: branching networks capable of learning from trial and error. However, a new study has found that older AI models perform worse on cognitive tests than their newer counterparts, a pattern the researchers compare to the way human brains can begin to fail with age.

While this issue might not be a serious problem when it comes to generating text or images for most users, it could be a deal-breaker in more serious applications such as medical diagnosis, which is why the study, titled Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis, was published in the British Medical Journal.

The study applied a widely used benchmark called the Montreal Cognitive Assessment (MoCA) to a number of AI models, including OpenAI’s ChatGPT 4 and GPT-4o; Anthropic’s Claude 3.5 Sonnet; and Alphabet’s (Google) Gemini versions 1 and 1.5, to “evaluate the cognitive abilities of the leading large language models and identify their susceptibility to cognitive impairment,” according to the study.

MoCA is typically used to uncover early signs of Alzheimer’s or dementia in human patients, using simple cognitive tasks that evaluate an individual’s attention, executive function, language, memory, and visuospatial capabilities. For humans, the test takes about 10 minutes to complete and is scored out of a maximum of 30 points. A score of 26 or more indicates normal cognitive function; the average score among unimpaired individuals is 27.4, individuals with mild cognitive impairment average 22.1, and Alzheimer’s patients average 16.2.

Being the newest model tested, OpenAI’s GPT-4o scored a (barely) passing grade of 26, but the other models showed varying degrees of decline, and the researchers found that the older a model was, the lower it scored. ChatGPT 4 and Claude 3.5 Sonnet came in just under the wire with scores of 25; Gemini 1.5 scored 22 points, indicating mild impairment, while its older 1.0 counterpart scored only 16, suggesting a more severe decline in its capabilities.
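For readers who want to line the reported scores up against the scoring bands described above, here is a minimal sketch in Python. The function and the nearest-average comparison are illustrative assumptions: the article gives the 26-point cutoff and the group averages, but no diagnostic thresholds below the cutoff.

```python
# Minimal sketch of the MoCA scoring bands described above.
# The >=26 "normal" cutoff is stated in the article; the group averages
# (27.4, 22.1, 16.2) are reported means, not diagnostic thresholds,
# so matching a score to its nearest average is illustrative only.

MAX_SCORE = 30
NORMAL_CUTOFF = 26

GROUP_AVERAGES = {
    "unimpaired individuals": 27.4,
    "mild cognitive impairment": 22.1,
    "Alzheimer's patients": 16.2,
}

def interpret_moca(score: int) -> str:
    """Give a rough interpretation of a MoCA score between 0 and 30."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError(f"MoCA scores range from 0 to {MAX_SCORE}")
    if score >= NORMAL_CUTOFF:
        return "normal cognitive function"
    # Below the cutoff: report the group whose published average is nearest.
    nearest = min(GROUP_AVERAGES, key=lambda group: abs(GROUP_AVERAGES[group] - score))
    return f"below the normal cutoff; nearest group average: {nearest}"

# The scores the article reports for each model:
for model, score in [("GPT-4o", 26), ("ChatGPT 4", 25),
                     ("Claude 3.5 Sonnet", 25), ("Gemini 1.5", 22),
                     ("Gemini 1.0", 16)]:
    print(f"{model}: {score}/{MAX_SCORE} -> {interpret_moca(score)}")
```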

“With the exception of ChatGPT 4o, almost all large language models subjected to the MoCA test showed signs of mild cognitive impairment,” the study concluded. “Moreover, as in humans, age is a key determinant of cognitive decline: ‘older’ chatbots, like older patients, tend to perform worse on the MoCA test.

“These findings challenge the assumption that artificial intelligence will soon replace human doctors, as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients’ confidence.”

The study authors point out that their findings should be taken with a grain of salt: although neural networks have a structure loosely based on biological layouts, there are necessary differences in how they work, and AI models such as these were not designed to mimic certain human functions that we take for granted. For instance, “all large language models showed impaired visuospatial reasoning skills,” illustrated by their difficulty with certain tasks on the test, such as correctly drawing a clock face, or tracing a line between sequential points on the trail making B task (TMBT), a simple connect-the-dots test.

But the authors do warn that these shortcomings, especially “in tasks requiring visual abstraction and executive function,” highlight significant shortfalls in AI’s ability to operate reliably and consistently in a clinical setting, where a patient’s well-being could be on the line.

“The inability of large language models to show empathy and accurately interpret complex visual scenes further underscores their limitations in replacing human physicians,” the paper concludes.

“Not only are neurologists unlikely to be replaced by large language models any time soon, but our findings suggest that they may soon find themselves treating new, virtual patients—artificial intelligence models presenting with cognitive impairment.”


3 Comments

  1. Hang on, the title of this article suggests that the score of a given AI model worsens over time, implying the AI version of neurological degeneration…but reading the article, all it seems to be saying is that newer and newer versions of the model get a progressively higher score.

    If I am correct in the above, then this was perilously close to being clickbait 🤔

    1. The article was intended to communicate that there is a degradation of models over time, and that this mimics cognitive decline. I don’t think it qualifies as clickbait.

      1. I would agree with the premise that there is a degradation of models over time, if that was the testing that had actually been done…but I don’t see this news article mentioning anywhere repeated testing of the same model over an extended period of time. All they are saying is that previous versions are worse, which is not the same at all. An analogy might be that subsequent models of a particular car are designed to have a higher and higher top speed. That is not the same as saying the top speed of a given car model declines as it ages.
