Scaling EEG foundation models on VSC: an ICLR 2026 benchmark and a NeurIPS 2025 EEG Challenge win
By Liuyin Yang

Electroencephalography (EEG) is a popular and non-invasive way to measure brain activity, but decoding meaningful information from EEG remains difficult. Signals are noisy, vary widely across people, and change substantially across tasks and recording setups. These challenges limit the real-world reliability of EEG-based brain–computer interfaces and clinical applications. Our Tier-1 project (gpr_compute_2025_038) at KU Leuven tackles this by training a large EEG foundation model—a model that first learns general-purpose representations from large, diverse EEG data, and can then be adapted to new tasks and new subjects with minimal additional training. The goal is to learn reusable neural representations that generalize across tasks and subjects, rather than building a new model for every dataset.
Tier-1 pretraining enabled ST-EEGFormer
Using Tier-1 resources, we trained our Spatiotemporal EEGFormer (ST-EEGFormer), a simple yet effective transformer-based baseline, pretrained via masked autoencoding (masked reconstruction) on over 8 million EEG segments (Figure 1b). This large-scale pretraining step is computationally intensive: it requires long-running distributed training, careful monitoring, and repeated experiments to ensure stable convergence. Concretely, we relied on the VSC Tier-1 Hortense system, using GPU nodes equipped with 4× NVIDIA A100-SXM4 80GB GPUs each (the gpu_rome_a100_80 partition), which made large-batch multi-GPU training feasible.
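To make the masked-autoencoding idea concrete, here is a minimal, self-contained PyTorch sketch of masked reconstruction on EEG segments. All module names, dimensions, and the patching scheme are illustrative assumptions, not the actual ST-EEGFormer code; positional embeddings and many other details are omitted for brevity.

```python
# Minimal masked-autoencoding sketch for EEG (illustrative only; sizes and
# names are assumptions, not the actual ST-EEGFormer implementation).
import torch
import torch.nn as nn

class TinyMaskedEEGAutoencoder(nn.Module):
    def __init__(self, patch_len=50, d_model=128, mask_ratio=0.5):
        super().__init__()
        self.patch_len = patch_len
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_len, d_model)        # temporal patches -> tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.decode = nn.Linear(d_model, patch_len)       # reconstruct raw samples

    def forward(self, x):
        # x: (batch, n_channels, n_samples); split the time axis into patches
        b = x.shape[0]
        patches = x.unfold(-1, self.patch_len, self.patch_len)  # (b, ch, n_patch, len)
        tokens = self.embed(patches).flatten(1, 2)               # (b, ch*n_patch, d_model)

        # Randomly mask a fraction of tokens; replace them with a learned mask token
        mask = torch.rand(b, tokens.shape[1], device=x.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)

        # Reconstruction loss is computed only on the masked patches
        recon = self.decode(self.encoder(tokens))                # (b, ch*n_patch, len)
        target = patches.flatten(1, 2)
        return ((recon - target) ** 2)[mask].mean()

model = TinyMaskedEEGAutoencoder()
segment = torch.randn(8, 64, 500)   # e.g. 64 channels, 500 samples per segment
loss = model(segment)
loss.backward()
```

The key design point of this family of methods is that no labels are needed: the model learns general-purpose representations simply by filling in hidden portions of the signal, which is what makes pretraining on millions of unlabeled segments possible.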
ICLR 2026: large-scale benchmark evidence built on Tier-1 + Tier-2
The Tier-1 pretraining served as the backbone of our paper, “Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks,” accepted at ICLR 2026. In this work, we conducted a broad benchmark of EEG foundation models across diverse datasets and decoding tasks, using six complementary evaluation protocols (from data-rich population decoding to transfer and low-data settings), with rigorous statistical testing (Figure 1a). Key findings:
- When enough labeled data is available and models are fully fine-tuned, foundation models can deliver strong performance in population-level settings. In low-data or hard-transfer settings, they do not consistently or significantly outperform compact neural networks, and sometimes even classical non-neural decoders remain highly competitive.
- Compared with other state-of-the-art EEG foundation models, ST-EEGFormer achieved superior decoding performance, and the large variant attained the top mean rank (Figure 2a).
- Overall, performance differences depend not only on pretraining but also on downstream architectural and training choices, highlighting the need for transparent, statistically rigorous evaluation (a sketch of a rank-based comparison follows below).
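To illustrate the kind of rank-based comparison behind Figure 2a, here is a small hypothetical sketch. The score table, model names, and the choice of a Friedman test are assumptions for illustration, not the paper's actual data or full statistical procedure.

```python
# Hedged sketch: mean-rank comparison across evaluation settings, in the
# spirit of a rank-based benchmark analysis (the paper's tests may differ).
import numpy as np
from scipy import stats

# Hypothetical accuracy table: rows = evaluation settings, cols = models
scores = np.array([
    [0.81, 0.78, 0.74],
    [0.66, 0.69, 0.61],
    [0.90, 0.88, 0.85],
    [0.72, 0.70, 0.73],
])
models = ["foundation model", "compact CNN", "classical decoder"]

# Rank models within each setting (rank 1 = best accuracy)
ranks = scores.shape[1] + 1 - stats.rankdata(scores, axis=1)
for name, r in zip(models, ranks.mean(axis=0)):
    print(f"{name}: mean rank {r:.2f}")

# Friedman test: do the models' ranks differ significantly across settings?
stat, p = stats.friedmanchisquare(*scores.T)
print(f"Friedman chi2 = {stat:.2f}, p = {p:.3f}")
```

Mean ranks summarize performance across heterogeneous settings without assuming scores are comparable in absolute terms, which is why they are a common backbone for multi-dataset benchmarks.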
Compute-wise, we used a two-tier workflow: Tier-1 for large-scale pretraining, and Tier-2 KU Leuven resources for the extensive downstream benchmarking runs (many datasets × protocols × models).
Community-scale validation: the NeurIPS 2025 EEG Foundation Challenge
We further validated the practical value of our pretrained model in the NeurIPS 2025 EEG Foundation Challenge, designed to push EEG decoding toward cross-task transfer learning and subject-invariant representations. The competition was built on the HBN-EEG dataset, featuring high-density EEG from over 3,000 participants across six cognitive tasks. As the KU Leuven EEG Decoding Team, we tackled both competition tracks: for Challenge 1 (predicting response time in Contrast Change Detection), the Computational Neuroscience Lab team (Liuyin, Ang, Bob, Qiang) developed and validated the pretrained EEG foundation model; for Challenge 2 (psychopathology/externalizing-factor prediction from EEG), the ExpORL team, led by Corentin, Jonas, and colleagues, developed a lightweight transformer decoder.
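For illustration, here is a minimal sketch of how a pretrained backbone can be reused for a regression target such as response time. The `DummyBackbone` stand-in, the patch size, and the mean-pooling choice are assumptions for the example, not the team's actual Challenge 1 pipeline.

```python
# Illustrative sketch only: reusing a pretrained encoder for response-time
# regression. DummyBackbone stands in for the real pretrained backbone.
import torch
import torch.nn as nn

class DummyBackbone(nn.Module):
    """Maps raw EEG (batch, channels, samples) to token embeddings."""
    def __init__(self, patch_len=50, d_model=128):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)
    def forward(self, x):
        patches = x.unfold(-1, self.patch_len, self.patch_len).flatten(1, 2)
        return self.proj(patches)                 # (batch, tokens, d_model)

class ResponseTimeRegressor(nn.Module):
    def __init__(self, backbone, d_model=128):
        super().__init__()
        self.backbone = backbone                  # frozen or fine-tuned
        self.head = nn.Linear(d_model, 1)         # single scalar output
    def forward(self, x):
        pooled = self.backbone(x).mean(dim=1)     # mean-pool token embeddings
        return self.head(pooled).squeeze(-1)

model = ResponseTimeRegressor(DummyBackbone())
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x, y = torch.randn(16, 64, 500), torch.rand(16)   # fake EEG + response times
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```

Because the heavy lifting happens during pretraining, only a small task-specific head plus (optionally) light fine-tuning of the backbone is needed to adapt to a new target.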
Key outcome
As the KU Leuven EEG Decoding Team, we achieved 1st place in Challenge 1 on the final leaderboard with a score of 0.88668, the only team to score below 0.90 among more than 1,000 teams and 8,000 submitted solutions. In parallel, we participated in Challenge 2 as a KU Leuven team effort, with the ExpORL subgroup developing a lightweight transformer decoder. Across both challenges combined (Challenge 1 + Challenge 2), our team ranked 3rd overall in the combined-score tracking used during the competition period, while our Challenge 1 result stands as the official #1 final ranking. Note that the organizers ultimately awarded Challenge 1 and Challenge 2 separately in the final results and prize structure (Figure 2c).
These results earned the winning teams an invitation to present at NeurIPS 2025 in San Diego, including a 5-minute spotlight talk at the Foundation Models for the Brain and Body NeurIPS 2025 Workshop and a 15-minute presentation during the NeurIPS competition-track session, both delivered by Liuyin Yang, supervised by Professor Marc Van Hulle (Figure 2b).
Why VSC compute mattered
VSC Tier-1 compute made it practical to pretrain and iterate on a large EEG foundation model at scale—running multi-GPU training on A100 nodes, validating stability, and turning months of experimentation into a reusable model that later powered our NeurIPS 2025 Challenge-1 winning solution. Without Tier-1 resources, neither the scale of pretraining nor the pace of iteration would have been feasible.

Figure 2. (a) Benchmark results: average ranks of different models across evaluation protocols; our proposed ST-EEGFormer large yields the best performance. (b) Spotlight talk at the Foundation Models for the Brain and Body NeurIPS 2025 Workshop. (c) Challenge certificate.
🔍 Your Research Matters — Let’s Share It!
Have you used VSC’s computing power in your research? Did our infrastructure support your simulations, data analysis, or workflow?
We’d love to hear about it!
Take part in our #ShareYourSuccess campaign and show how VSC helped move your research forward. Whether it’s a publication, a project highlight, or a visual from your work, your story can inspire others.
🖥️ Be featured on our website and social media. Show the impact of your work. Help grow our research community.
📬 Submit your story: https://www.vscentrum.be/sys