
Meta's TRIBE v2 Can Predict Your Brain Activity. Neuroscience May Never Be the Same


From Four Subjects to 720


The original TRIBE model, which won first place at the Algonauts 2025 brain modeling competition last year, was trained on low-resolution fMRI data from just four individuals. It combined pretrained representations from Meta's own foundation models, including Llama 3.2 for text, Wav2Vec2-BERT for audio, and V-JEPA 2 for video, and used a transformer to handle the time-evolving nature of brain responses to naturalistic stimuli.
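
The coverage stays at this high level, but the general shape of such a model is easy to sketch. The snippet below is a minimal, hypothetical illustration of the idea rather than Meta's released code: frozen text, audio, and video embeddings are projected into a shared space, fused per timestep, and passed through a small transformer that predicts activity for every parcel or voxel. All dimensions, layer counts, and the simple additive fusion are placeholders.

```python
import torch
import torch.nn as nn

class MultimodalEncodingModel(nn.Module):
    """Illustrative TRIBE-style encoder: pretrained text, audio, and video
    features are projected to a shared width, fused per timestep, and run
    through a temporal transformer that predicts fMRI activity.
    All sizes are placeholders, not the published configuration."""

    def __init__(self, d_text=4096, d_audio=1024, d_video=1408,
                 d_model=512, n_voxels=70_000, n_layers=4):
        super().__init__()
        # Project each frozen foundation-model embedding to a shared width
        self.proj_text = nn.Linear(d_text, d_model)
        self.proj_audio = nn.Linear(d_audio, d_model)
        self.proj_video = nn.Linear(d_video, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(encoder_layer, n_layers)
        # One linear readout across all voxels (shared in this sketch)
        self.readout = nn.Linear(d_model, n_voxels)

    def forward(self, text_feats, audio_feats, video_feats):
        # Each input: (batch, time, feature_dim), already aligned to fMRI samples
        fused = (self.proj_text(text_feats)
                 + self.proj_audio(audio_feats)
                 + self.proj_video(video_feats))
        context = self.temporal(fused)   # mix information across time
        return self.readout(context)     # (batch, time, n_voxels)
```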


TRIBE v2 keeps that multimodal architecture but scales it dramatically. The training dataset now spans more than 1,000 hours of fMRI recordings across 720 subjects. Brain resolution jumped from roughly 1,000 parcels to approximately 70,000 voxels. The model processes diverse stimulus types, from complex Hollywood films to spoken-word podcasts to static images and written passages.


This expansion matters because neuroscience models have historically been limited by small sample sizes and narrow experimental conditions. A model trained on four people watching movies can tell you something about movie-watching brains. A model trained on hundreds of people across multiple input types starts to look like a general-purpose tool for studying cognition.


Outperforming the Standard Playbook


The benchmark results show consistent advantages over established methods. When predicting neural responses to spoken language, TRIBE v2 achieved a prediction score of 0.20, compared to 0.12 for conventional fMRI analysis models and 0.07 for simple linear encoding approaches. The gap widened for video processing: TRIBE v2 scored 0.28, versus 0.15 for competing methods and 0.11 for linear models.


These numbers may look modest in isolation, but in fMRI research, where biological noise and measurement limitations create a hard ceiling on prediction accuracy, a near-doubling of performance is significant. In auditory and language cortices specifically, TRIBE v2's predictions approached the noise ceiling, meaning the model was capturing nearly as much signal as the physics of fMRI scanning would allow.
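
The article doesn't define the metric, but in fMRI encoding work a prediction score of this kind is typically the voxel-wise Pearson correlation between predicted and measured BOLD time series, and the noise ceiling is commonly estimated from how well repeated presentations of the same stimulus agree with each other. A rough sketch of both, under that assumption:

```python
import numpy as np

def encoding_score(predicted, measured):
    """Voxel-wise Pearson correlation between predicted and measured
    BOLD time series; both arrays have shape (time, voxels)."""
    p = (predicted - predicted.mean(0)) / predicted.std(0)
    m = (measured - measured.mean(0)) / measured.std(0)
    return (p * m).mean(0)          # one correlation per voxel

def split_half_noise_ceiling(run_a, run_b):
    """Rough noise-ceiling estimate: how well two repeated presentations
    of the same stimulus correlate with each other, per voxel. A model
    'approaches the ceiling' when its score nears this repeat reliability."""
    return encoding_score(run_a, run_b)
```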


The multimodal design proved critical. When the team tested unimodal versions of the model, each trained on only one input type, they found that single-modality models could reliably predict activity in their corresponding cortical networks: visual models predicted visual cortex well, and auditory models predicted auditory cortex well. But in higher-order associative regions, where the brain integrates information across senses, the multimodal model consistently outperformed every unimodal variant.


Zero-Shot Prediction Across New Subjects and Tasks


Perhaps the most consequential capability is zero-shot generalization. TRIBE v2 can predict high-resolution brain activity for individuals it has never been trained on, in languages it hasn't seen, and for experimental tasks outside its training distribution. Meta reports this represents a two- to three-fold improvement over previous methods for both movie-watching and audiobook-listening paradigms.


This matters because the traditional workflow in neuroscience is expensive, slow, and constrained by ethics review. Testing a hypothesis about how the brain processes sarcasm in speech requires recruiting subjects, getting institutional approval, booking scanner time, running the experiment, and analyzing the data. Each step takes weeks or months. A model that reliably predicts brain responses could let researchers run hundreds of preliminary tests computationally, identifying the most promising hypotheses before committing to human trials.


The researchers tested this in silico experimentation capability against established findings from decades of cognitive neuroscience. They ran the model through seminal visual and neurolinguistic paradigms and found that TRIBE v2 recovered a variety of results that had taken years of empirical research to establish. The model independently reproduced known patterns without being explicitly trained on those experimental protocols.


Mapping Multisensory Integration


Beyond prediction accuracy, TRIBE v2 offered new insights into how the brain combines information from different senses. By extracting interpretable latent features from the model, the research team mapped the fine-grained topography of multisensory integration across the cortical surface.


This analysis confirmed that brain regions responsible for integrating vision, sound, and language are distributed in patterns that align with existing neuroscientific understanding but added new resolution to the picture. The model's latent representations revealed gradients of modal preference across association cortex, showing precisely where and how the brain transitions from processing individual sensory streams to constructing unified perceptual experiences.
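
Meta hasn't detailed how those gradients were derived, but one simple way to build such a map, sketched below with hypothetical inputs, is to compare the per-voxel encoding scores of the unimodal variants described earlier and ask which modality wins at each voxel and by how much.

```python
import numpy as np

def modality_preference(score_video, score_audio, score_text):
    """Hypothetical per-voxel modality-preference map: stack the encoding
    scores of three unimodal models, then report which modality predicts
    each voxel best and by what margin it dominates."""
    scores = np.stack([score_video, score_audio, score_text])  # (3, voxels)
    best = scores.argmax(axis=0)                 # 0=video, 1=audio, 2=text
    ranked = np.sort(scores, axis=0)
    margin = ranked[-1] - ranked[-2]             # dominance of the winner
    return best, margin

# Voxels with a large margin sit in unisensory cortex; voxels where the
# margin shrinks toward zero mark the associative regions where the
# multimodal model pulls ahead of every unimodal variant.
```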


For neuroscientists, this kind of mapping at scale has been difficult to achieve with traditional methods. Most existing studies examine one or two modalities in isolation. A model that simultaneously processes all three and maps their interaction onto brain anatomy provides a tool for questions that were previously hard to ask.


The Open Science Play


Meta is releasing TRIBE v2's research paper, model weights, and code under a Creative Commons BY-NC license, meaning non-commercial use is freely permitted. The company has also launched a demo website where researchers can explore the model's predictions interactively.


The open release positions this as a research infrastructure contribution rather than a product announcement. But it also aligns with Meta's broader investments in brain-computer interface technology through its Reality Labs division. Understanding how the brain processes multimodal input is directly relevant to building AR and VR systems that adapt to user perception in real time.


The dataset itself, aggregating more than 1,000 hours of fMRI from the Courtois NeuroMod project and other sources, represents one of the largest unified neuroimaging collections assembled for AI training. Making a model trained on that data openly available gives academic labs access to computational resources that would be prohibitively expensive to build independently.


What TRIBE v2 Gets Right, and Where It Falls Short


TRIBE v2 has clear limitations. fMRI operates on a timescale of roughly 1.5 seconds per sample, which means millisecond-level neural dynamics remain invisible. The model predicts aggregate BOLD signals, not individual neuron firing patterns. And any in silico finding would still need experimental validation before it could be considered established science.
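
To make that timescale concrete: stimulus features extracted at tens of samples per second have to be collapsed into roughly 1.5-second bins before they can be compared against BOLD measurements. The toy resampling step below (not the paper's actual alignment procedure) shows why anything that happens within a single bin is simply averaged away.

```python
import numpy as np

def resample_to_tr(features, feature_hz=50.0, tr_seconds=1.5):
    """Average fast stimulus features into fMRI-sized time bins.
    Anything that varies within a ~1.5 s window is blurred together,
    which is why millisecond-scale dynamics are invisible to the model."""
    samples_per_tr = int(round(feature_hz * tr_seconds))  # e.g. 75 samples
    n_trs = len(features) // samples_per_tr
    trimmed = features[: n_trs * samples_per_tr]
    return trimmed.reshape(n_trs, samples_per_tr, -1).mean(axis=1)
```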


But the trajectory is notable. The jump from TRIBE v1 (four subjects, low resolution, a single stimulus type) to TRIBE v2 (720 subjects, 70,000 voxels, diverse stimulus types) took less than a year. If the scaling patterns from other foundation model domains hold, the next iteration could incorporate even more subjects, finer temporal resolution from complementary imaging methods like magnetoencephalography, and additional modalities such as touch or proprioception.


The larger question is whether AI models of this kind will fundamentally change how neuroscience research is conducted. The paper's authors argue that "interpretation and scientific insights are most legitimate if they derive from the model that makes the best prediction." If TRIBE v2 or its successors can consistently outpredict all alternatives, the field may increasingly organize around computational models as its primary tools, with human experiments serving to validate rather than generate hypotheses.


In an era in which AI continues to move from analyzing the world to simulating it, a digital twin of the human brain is among the more consequential additions to the toolkit.


 
 
 
