LLMs Don’t Build Cognitive Debt. They Enable Cognitive Amplification (Terms and Conditions Apply)

Recently a study out of MIT titled Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task has received a lot of attention. The conclusion of the study is that using LLMs for essay writing (as a proxy for all tasks, I suppose) lowers your cognitive function.

But what if the same data supports a quite different conclusion? That LLMs are a tool that, when used correctly, leads to superior output and actually increases your cognitive engagement with the subject. Exactly the opposite of the conclusion reached by the authors.

LLM novices

The participants in the LLM group were all novices who had not used ChatGPT before, as shown in figure 29 of the paper. The authors even note it themselves: "Interestingly, P17, a first‑time ChatGPT user, reported experiencing 'analysis‑paralysis' during the interaction." This could explain the brain patterns observed: the LLM group is not only tasked with writing an essay but also with learning a new tool, which can explain the split attention and lower brain activity related to the essay writing.

During the study the LLM group also displays learning the tool over the course of the sessions: "LLM group’s connectivity declined by Session 3, consistent with a neural efficiency adaptation, repeated practice leading to streamlined networks and less global synchrony."

This pattern is exactly what you’d expect from novices learning a new tool – high initial cognitive load that decreases as they become more efficient.

Lack of recall

One of the major arguments for lower cognitive engagement in the LLM group is the lack of perfect recall of the text later. However, the LLM group is also the group with the most advanced language (near-perfect language structure). This group had more n-grams, more named entities (NERs), and overall more complex text. That could equally explain the weaker recall: remembering more advanced text while also learning a new tool is likely a harder task than recalling simpler text you wrote in a familiar setting.
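To make the "more n-grams" claim concrete, here is a minimal sketch (my own illustration, not the paper's methodology) of how lexical variety can be proxied by counting distinct word n-grams. The tokenisation is naive and the two example essays are placeholders; counting named entities would additionally need an NER library such as spaCy.

```python
from collections import Counter

def distinct_ngrams(text: str, n: int = 2) -> int:
    """Count distinct word n-grams as a rough proxy for lexical variety."""
    tokens = text.lower().split()  # naive whitespace tokenisation
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return len(Counter(ngrams))

# Hypothetical stand-ins for an LLM-assisted essay and a brain-only essay.
llm_essay = "The proliferation of generative models reshapes epistemic labour and authorship norms."
brain_essay = "AI tools change how we write and how we think about writing."

for n in (2, 3):
    print(f"{n}-grams:", distinct_ngrams(llm_essay, n), "vs", distinct_ngrams(brain_essay, n))
```

A text with more distinct n-grams and entities is, by this rough measure, more varied and harder to reproduce verbatim, which is all my recall argument relies on.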

LLM-to-brain, session 4

Another finding the authors use to argue for cognitive debt is the lack of increased neural activity when the LLM group has to write "brain-only" essays in session 4. However, here the study is confounded. In session 4 the subjects are asked to choose a topic they have already worked on in a previous session. What if the participants simply chose the wrong strategy and tried to recall the "near perfect" language they had written before (which the similar use of n-grams suggests they did)? The current study cannot tell the difference. In other words, this pattern could arise from:

  1. Cognitive dependency on LLM-generated patterns (authors claim)
  2. Strategic reuse of effective language from previous essays on the same topics (my interpretation)

Brain-to-LLM, session 4

Lastly, the Brain-to-LLM group in session 4 showed increased neural activity and produced better essays:

“Brain-to-LLM group entered Session 4 after three AI-free essays. The addition of AI assistance produced a network‑wide spike in alpha‑, beta‑, theta‑, and delta‑band directed connectivity.”

“Across all frequency bands, Session 4 (Brain-to-LLM group) showed higher directed connectivity than LLM Group’s sessions 1, 2, 3. This suggests that rewriting an essay using AI tools (after prior AI-free writing) engaged more extensive brain network interactions.”

“Brain-to-LLM participants could leverage tools more strategically, resulting in stronger performance and more cohesive neural signatures.”

All of the above points to LLMs acting as a cognitive amplifier when used as a strategic support tool.

Conclusion

Science is not all numbers and data; it is also about interpretation and where you focus your attention. To me it seems that the authors had a very specific agenda here. Instead of presenting the findings as the murky mess they are, admitting to strong confounding effects and the lack of clear-cut "LLMs are good/bad" arguments, they took a stance not supported by the data.

My interpretation is equally valid, and is also supported by their findings. Despite the flaws of the study, and the differences in interpretation, it is important work. And we need more work like this – albeit with a better study design – to highlight exactly what our future reliance on AI tools will do to our own agency and decision making. I had just hoped the authors would have stayed closer to the data and represented it as "we really can’t tell, it all depends". Because it does, and it will for a long time. Just as there are good and bad things about social media, there will be good and bad effects of LLMs and AI. It is up to us to understand this and make choices to use the technology responsibly.