
EEG Strategy Classification

Can brain activity reveal how someone solved a math problem — from memory or by computation?

2022–2024 · Research Assistant, Utrecht University · Proof of concept
neuroscience · EEG · ML · Python · scikit-learn

The Problem

When someone multiplies 6 × 4, their brain either remembers the answer or actually computes it. These two strategies look different on an EEG — different timing, different stage patterns. But standard analysis averages hundreds of trials together, which washes out exactly the per-trial variation you'd need to tell them apart.

Nobody had tried using HMP stage patterns as classification features before. So that's what I did.

The Approach

42 participants, 72-channel EEG at 2048 Hz. I ran PCA down to 10 components, split the data participant by participant, then trained a 4-event Hidden Semi-Markov Model only on retrieval trials. The test set was a balanced holdout: held-out retrieval plus procedural trials the model had never seen. If the model really learned what "remembering" looks like, its fit score (Sum Log-Likelihood) should land higher on retrieval than on computation.

The slightly weird part: I repurposed hsmm_mvpy, a package built for cognitive stage recovery, as a classification tool. It wasn't designed for this. That was the whole bet.

[Diagram: 72-ch EEG → PCA: 10 comp → HMP: 4 events → SLL score → Retrieval? / Procedural?]
Single-trial classification pipeline: raw EEG → dimensionality reduction → stage recovery → strategy prediction
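The preprocessing and split described above can be sketched roughly as follows. This is a minimal illustration with synthetic data and made-up dimensions for the time axis; the HSMM fit itself (via hsmm_mvpy) is omitted, and only the PCA reduction and the participant-wise holdout are shown.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in: 6 participants x 20 trials x 72 channels x 100 samples
n_participants, n_trials, n_channels, n_samples = 6, 20, 72, 100
eeg = rng.standard_normal((n_participants, n_trials, n_channels, n_samples))
participant_ids = np.repeat(np.arange(n_participants), n_trials)

# Reduce the 72 channels to 10 spatial components, fitting PCA
# across all time points pooled together
X = eeg.reshape(-1, n_channels)          # (participants*trials*samples, channels)
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
trials = X_reduced.reshape(n_participants * n_trials, n_samples, 10)

# Participant-by-participant split: hold out whole participants so the
# test set contains no brains the model was trained on
test_participants = {4, 5}
test_mask = np.isin(participant_ids, list(test_participants))
train_trials, test_trials = trials[~test_mask], trials[test_mask]
print(train_trials.shape, test_trials.shape)
```

In the real pipeline, the 4-event HSMM would then be trained on the retrieval trials within `train_trials` only.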

The Outcome

AUC = 0.578

Above chance but not useful yet. The retrieval model does assign systematically higher fit scores to retrieval trials — so the signal exists, it's just thin. Single-trial EEG is noisy, SLL wasn't optimized as a classification score, and there's probably genuine overlap between strategies. An honest 0.578 told me more than a cherry-picked accuracy number would have.

[ROC plot: False Positive Rate vs. True Positive Rate, diagonal at 0.50]
ROC curve — the classifier performs slightly above chance (0.50 diagonal)
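The evaluation step is simple once the fit scores exist: the per-trial SLL is used directly as the decision score, so no threshold is needed to compute an AUC. A minimal sketch with synthetic scores (the small mean shift between groups is invented here to mimic a thin signal, not taken from the real data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Hypothetical per-trial fit scores from a retrieval-only model:
# retrieval trials should score slightly higher on average
sll_retrieval = rng.normal(loc=0.3, scale=1.0, size=200)
sll_procedural = rng.normal(loc=0.0, scale=1.0, size=200)

scores = np.concatenate([sll_retrieval, sll_procedural])
labels = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = retrieval

# The SLL itself is the ranking score, so AUC falls out directly
auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.3f}")
```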

Packaging It for Review

The analysis itself was only half the job. Research work quickly becomes hard to inspect once the real story is split across notebooks, saved intermediates, and an older draft that no longer exactly matches the final public version.

I turned the repo into something a reviewer can parse in one pass: a quick summary up front, a plain-English method walkthrough, a reproducibility note, and a short explanation of what changed between the draft report and the final public pipeline.

That cleanup mattered here because the honest version of the result is not “breakthrough classifier.” It is “proof of concept, modest signal, clear limits.” Making that explicit made the project stronger, not weaker.

What I'd Do Differently

The SLL-as-feature approach was a blunt instrument. A logistic regression over per-trial stage probability vectors would give the classifier more to work with. And I pooled all 42 participants together — participant-level cross-validation would be more honest about whether this generalizes or just learns individual brains.
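Both ideas above can be sketched together. This is a hypothetical follow-up, not the pipeline that was run: the stage-probability features are synthetic, and the key point is `GroupKFold` with participant IDs as groups, so no participant appears in both train and test folds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(2)

# Hypothetical features: per-trial stage-probability vectors (4 stages)
n_participants, trials_per_participant = 10, 40
n = n_participants * trials_per_participant
groups = np.repeat(np.arange(n_participants), trials_per_participant)
y = rng.integers(0, 2, size=n)                # 0 = procedural, 1 = retrieval
X = rng.random((n, 4)) + 0.2 * y[:, None]     # weak, invented class signal

# Leave whole participants out in each fold, so the score reflects
# generalization to new brains rather than memorized individuals
cv = GroupKFold(n_splits=5)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv,
                         groups=groups, scoring="roc_auc")
print(scores.mean())
```

A logistic regression over these vectors gets four numbers per trial instead of one summary likelihood, which gives the classifier more structure to exploit.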