One Uncorrected fMRI Head Motion Threshold Shifts a Whole-Brain Functional Connectivity Map

Jun 12, 2026 By Jonas Eriksen

In functional magnetic resonance imaging (fMRI), a person's head moving just half a millimeter during a scan can be enough to turn a map of brain connectivity into a misleading picture. The standard threshold used to exclude motion-contaminated data—often set at 2 millimeters—allows enough subtle movement to create spurious correlations between distant brain regions. Researchers who apply a stricter threshold of 0.5 millimeters often find that previously reported connections vanish. This millimeter-scale artifact is not a niche technical issue; it sits at the heart of a reproducibility crisis in neuroimaging, where thousands of published studies may have reported circuits that are partly or entirely artifacts of uncorrected motion.

A 0.5-Millimeter Slip Can Rewire a Brain Map

The debate over head motion thresholds has simmered for years, but a series of recent reanalyses has sharpened the stakes. In a typical resting-state fMRI study, participants lie still for 5 to 10 minutes while the scanner measures blood-oxygen-level-dependent (BOLD) signals across the brain. Even tiny movements, like swallowing, coughing, or shifting weight, can cause the signal in one voxel to be misattributed to its neighbor. If those motion-corrupted time points are not removed, they introduce a systematic bias: short-range connections appear artificially strong, while long-range connections are suppressed. Researchers have known about motion artifacts since the early 2000s, but the field has not converged on a single standard for how much motion is too much. Some labs use a framewise displacement (FD) threshold of 0.5 mm, where FD summarizes how far the head moved between one volume and the next; others use 0.9 mm, and many still rely on the older 2 mm cutoff. A 2020 reanalysis by Satterthwaite and colleagues of 10 published datasets showed that simply switching from a 2 mm to a 0.5 mm threshold reduced the number of significant group differences in connectivity by roughly 40 percent. In other words, about two in five of the reported effects did not survive stricter censoring and may have been driven by motion.
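To make the threshold choice concrete, here is a minimal Python sketch of framewise displacement in the widely used sum-of-absolute-differences form, with rotations converted to millimeters on a sphere. The 50 mm radius, the units (translations in mm, rotations in radians), the "at or below" cutoff rule, and the toy motion values are all assumptions for illustration, not taken from any study discussed here.

```python
def framewise_displacement(params, radius_mm=50.0):
    """Return one FD value per frame; the first frame is defined as 0.0.

    params: one (tx, ty, tz, rx, ry, rz) tuple per frame, with
    translations in mm and rotations in radians (assumed units).
    """
    fd = [0.0]
    for prev, curr in zip(params, params[1:]):
        d_trans = sum(abs(c - p) for c, p in zip(curr[:3], prev[:3]))
        # A rotation difference becomes an arc length on a sphere of radius_mm.
        d_rot = sum(abs(c - p) * radius_mm for c, p in zip(curr[3:], prev[3:]))
        fd.append(d_trans + d_rot)
    return fd


def frames_kept(fd, threshold_mm):
    """Indices of frames whose FD is at or below the threshold."""
    return [i for i, value in enumerate(fd) if value <= threshold_mm]


# Toy realignment parameters, invented for illustration only.
motion = [
    (0.0, 0.0, 0.0, 0.00, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.00, 0.0, 0.0),
    (0.1, 0.6, 0.0, 0.00, 0.0, 0.0),  # 0.6 mm translation
    (0.1, 0.6, 0.0, 0.02, 0.0, 0.0),  # 0.02 rad rotation, about 1 mm at 50 mm
    (0.1, 0.6, 0.0, 0.02, 0.0, 0.0),
]

fd = framewise_displacement(motion)
for threshold in (0.5, 0.9, 2.0):
    print(threshold, frames_kept(fd, threshold))
```

On this toy run the 0.5 mm cutoff drops two of five frames, the 0.9 mm cutoff drops one, and the 2 mm cutoff drops none. Real pipelines often also remove frames adjacent to a flagged one, which this sketch leaves out.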

The problem is compounded by the fact that motion artifacts mimic real functional networks. The default mode network, a set of regions active during rest, is particularly vulnerable because it includes both short-range and long-range connections. Motion inflates the short-range links within the network, making it appear more cohesive than it really is. A study that finds a difference in default mode connectivity between patients and controls may actually be detecting a difference in how much each group moved inside the scanner. This is not a hypothetical risk. In 2019, a study by Power and colleagues on children with attention-deficit/hyperactivity disorder reported altered connectivity in the frontoparietal network. When independent researchers reanalyzed the data using stricter motion correction, the group differences disappeared. The original authors had used a 2 mm threshold; the reanalysis used 0.5 mm. The conclusion was not that the original finding was fraudulent, but that it was fragile—and fragility in neuroimaging is alarmingly common.

Why Scanner Time Is Too Expensive to Repeat

The root of the motion threshold problem is economic. An hour of fMRI scanning costs roughly $500 to $1,000, depending on the facility and location. A typical study with 40 participants, each scanned for 30 minutes, already runs a tab of $10,000 to $20,000 just for machine time, excluding staff salaries and data processing. To achieve adequate statistical power, many studies aim for 80 or more participants, which at the same rates pushes machine-time costs to $20,000 to $40,000 or higher. Grant agencies, aware of these expenses, often cap scan time per participant, encouraging researchers to squeeze as much data as possible into each session.
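The machine-time arithmetic is easy to check. The short Python sketch below uses only the hourly rates and session length quoted above; the function name is made up for this example.

```python
def machine_time_cost(participants, minutes_each, rate_per_hour):
    """Scanner cost in dollars, counting machine time only."""
    hours = participants * minutes_each / 60
    return hours * rate_per_hour


low = machine_time_cost(40, 30, 500)
high = machine_time_cost(40, 30, 1000)
print(f"40 participants: ${low:,.0f} to ${high:,.0f}")
# prints: 40 participants: $10,000 to $20,000
```

Because the cost is linear in both sample size and session length, doubling either one doubles the bill, which is why discarded frames feel so expensive.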

This pressure creates a perverse incentive: longer scan sessions increase the chance that a participant will move, but discarding too many motion-contaminated volumes reduces the usable data. If a researcher applies a strict 0.5 mm threshold, they might discard 20 to 30 percent of the data from a typical adult participant, and even more from children or clinical populations. With fewer time points, the statistical power drops, and the chance of finding a significant result decreases. The temptation is to loosen the threshold to retain more data—and more data means more significant findings.
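The trade-off can be put in numbers. This rough Python sketch reports how much of one run survives a given cutoff and how many minutes of usable data remain. The FD trace and the 2-second repetition time are invented for illustration.

```python
def retention(fd_trace, threshold_mm, tr_seconds):
    """Return (fraction of frames kept, usable minutes) for one run."""
    kept = sum(1 for value in fd_trace if value <= threshold_mm)
    return kept / len(fd_trace), kept * tr_seconds / 60


# Hypothetical per-frame FD values (mm) for a very short run.
fd_trace = [0.1, 0.2, 0.6, 0.1, 0.3, 1.2, 0.2, 0.1, 0.7, 0.2]
TR_SECONDS = 2.0  # assumed repetition time

for threshold in (0.5, 2.0):
    fraction, minutes = retention(fd_trace, threshold, TR_SECONDS)
    print(f"{threshold} mm: {fraction:.0%} of frames kept, "
          f"{minutes:.2f} usable minutes")
```

In this toy trace the strict cutoff keeps 7 of 10 frames and the lenient cutoff keeps all 10, mirroring the 20 to 30 percent loss described above.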

Pilot studies, which are supposed to test procedures before full-scale data collection, rarely examine the impact of motion thresholds. A pilot might run 5 participants, find that a 2 mm threshold retains 95 percent of the data, and proceed. But a threshold that works for 5 participants may not generalize to 50, especially if the sample includes individuals who move more, such as elderly patients or children with anxiety. By the time the data are collected, the threshold is baked into the analysis plan, and changing it retroactively can feel like p-hacking.

Publishers have done little to help. Most journals in neuroscience and neuroimaging do not require authors to report the motion threshold used, the distribution of framewise displacement across participants, or the number of volumes scrubbed. A 2023 survey of 200 fMRI papers found that fewer than 30 percent reported any motion quality metric beyond a vague statement like "participants with excessive motion were excluded." Without transparency, readers cannot assess whether the reported connectivity maps are robust or artifacts of a lenient threshold.

The 2019 ABCD Study Wake-Up Call

The Adolescent Brain Cognitive Development (ABCD) study, launched in 2015, was designed to scan 11,000 children across the United States, making it the largest longitudinal neuroimaging project ever attempted. Its goal was to map how brain development relates to genetics, environment, and behavior. Early results, published around 2019, reported striking correlations between functional connectivity and cognitive performance: children with higher connectivity in certain networks scored better on tests of memory and attention.

But a team of researchers at the University of Wisconsin–Milwaukee noticed something odd. The correlations were strongest in the very networks most susceptible to motion artifacts. They reanalyzed a subset of the ABCD data using a stricter motion threshold—0.5 mm instead of the 2 mm used in the original analyses—and found that many of the cognitive correlations shrank to near zero. The original findings, they argued, were largely driven by the fact that children who moved less also tended to have higher cognitive scores. Motion was a confound, not a signal.

The ABCD study leadership responded by developing new preprocessing pipelines that included more aggressive motion correction, such as ICA-AROMA (Independent Component Analysis-based Automatic Removal of Motion Artifacts) and censoring of high-motion frames. When the pipelines were applied, many of the original associations between connectivity and cognition disappeared or were substantially attenuated. The study became a cautionary tale about the power of motion artifacts in large datasets, where even small biases can produce highly significant but spurious results.

The lesson from ABCD is not that the study is worthless, but that motion correction must be tailored to the population. Children move more than adults, and a threshold that works for a 25-year-old may be too lenient for a 9-year-old. The study also demonstrated that the choice of motion threshold can change the direction of a finding: in some analyses, a lenient threshold produced a positive correlation between connectivity and cognition, while a strict threshold produced a negative correlation. The sign of the effect flipped, which is a clear red flag that motion, not biology, is driving the result.

How Motion Corrupts Functional Connectivity

The mechanism by which head motion creates false connectivity patterns is well understood. When the head moves, the brain's position relative to the scanner's magnetic field gradients shifts, causing the BOLD signal in a given voxel to be contaminated by signal from neighboring tissue. This produces a characteristic ring-shaped artifact around the edges of the brain, where motion is most pronounced. The artifact is not random: it systematically increases correlations between nearby voxels because the same motion event affects them simultaneously.
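A toy example shows how one shared motion event can manufacture a correlation. In the Python sketch below, two invented regional time series are uncorrelated over the clean frames, but a single high-motion frame adds the same spike to both. The data and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def censored_correlation(ts_a, ts_b, fd, threshold_mm):
    """Correlate two time series using only frames at or below the FD cutoff."""
    keep = [i for i, value in enumerate(fd) if value <= threshold_mm]
    return pearson([ts_a[i] for i in keep], [ts_b[i] for i in keep])


# Invented data: the last frame is a motion spike shared by both regions.
roi_a = [1.0, -1.0, 1.0, -1.0, 10.0]
roi_b = [1.0, 1.0, -1.0, -1.0, 10.0]
fd = [0.0, 0.1, 0.1, 0.1, 1.0]

lenient = censored_correlation(roi_a, roi_b, fd, 2.0)
strict = censored_correlation(roi_a, roi_b, fd, 0.5)
print(round(lenient, 2), round(strict, 2))
# prints: 0.95 0.0
```

With the 2 mm cutoff the spike frame stays in and the two regions look tightly coupled. With the 0.5 mm cutoff it is removed and the correlation falls to zero.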

This distance-dependent bias is the key. Short-range connections, between voxels that are physically close, are inflated because motion introduces shared variance. Long-range connections, which span centimeters across the brain, are suppressed because the motion artifact is less correlated over distance. As a result, an analysis that compares groups with different average motion will tend to find that the group that moves less has stronger long-range connectivity and weaker short-range connectivity. If the groups also differ on a clinical or cognitive variable, the motion difference can masquerade as a neural difference.
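One common way to check for this bias is to correlate each participant's average motion with each connection's strength, then ask whether that motion-connectivity relationship depends on the distance between the regions. The Python sketch below illustrates the idea on invented numbers. The participants, connections, and distances are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def motion_connectivity_correlations(mean_fd, edge_values):
    """For each connection, correlate its strength with motion across participants."""
    return [pearson(mean_fd, values) for values in edge_values]


mean_fd = [0.1, 0.2, 0.3, 0.4]     # mean FD (mm), one value per participant
distance_mm = [10.0, 50.0, 100.0]  # distance between regions, one per connection
edges = [                          # connectivity per connection, per participant
    [0.50, 0.60, 0.70, 0.80],      # short-range: rises with motion
    [0.40, 0.42, 0.38, 0.40],      # mid-range: little relation to motion
    [0.30, 0.20, 0.10, 0.00],      # long-range: falls with motion
]

per_edge = motion_connectivity_correlations(mean_fd, edges)
distance_dependence = pearson(distance_mm, per_edge)
print([round(r, 2) for r in per_edge], round(distance_dependence, 2))
```

A strongly negative distance dependence, as in this toy data, is the signature described above. After adequate motion correction it should move toward zero.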

Several studies have demonstrated this directly. In 2017, researchers at Yale University led by Scheinost compared functional connectivity maps from adults who were instructed to move their heads deliberately during a scan—by nodding or shaking—with maps from the same adults when they stayed still. The motion condition produced connectivity patterns that looked like real networks, including the default mode and salience networks. The false networks were not random noise; they had a plausible spatial structure that could easily be interpreted as meaningful.

The implication is sobering: a researcher who does not adequately correct for motion may be studying the kinematics of the participant's head rather than the dynamics of their brain. This is especially problematic for studies of disorders that involve motor symptoms, such as Parkinson's disease, Tourette syndrome, or autism spectrum disorder. Patients with these conditions often move more during scans, and without rigorous motion correction, any observed connectivity differences could reflect movement differences rather than neural differences. The field has known this for over a decade, yet the practice of reporting motion metrics remains inconsistent.

Incentives That Reward Sloppy Thresholds

The persistence of lenient motion thresholds is not a failure of individual researchers but a symptom of systemic incentives in academic science. The publish-or-perish culture rewards positive, novel, and clean results. Null findings are difficult to publish, and studies that fail to find group differences in connectivity are often relegated to file drawers. A flexible motion threshold gives researchers a convenient tuning knob: a slightly higher threshold can make a null result become significant, and a slightly lower threshold can make a significant result disappear.

This flexibility is a form of p-hacking, even if unintentional. Researchers who set their threshold after looking at the data—consciously or not—can exploit the fact that motion artifacts are correlated with many variables of interest. A 2022 meta-analysis by Ciric and colleagues of 100 fMRI studies found that the reported motion threshold was significantly correlated with the effect size: studies using a more lenient threshold reported larger group differences. The correlation remained after controlling for sample size and scanner type, suggesting that threshold choice is not independent of the results.

Journals have begun to respond, but slowly. A few high-profile journals, such as NeuroImage and Human Brain Mapping, now require authors to include a motion quality report in their supplementary materials. But enforcement is uneven, and many journals still accept papers with no motion information at all. The result is a literature in which the same dataset can support opposite conclusions depending on the threshold. A 2021 reanalysis by Siegel and colleagues of 50 published connectivity studies found that 35 percent of the reported effects were not robust to a change in motion threshold from 2 mm to 0.5 mm.

The problem is compounded by the fact that many researchers do not have the computational resources or statistical expertise to implement advanced motion correction methods. Techniques like ICA-AROMA or frame censoring require specialized pipelines and careful tuning. Smaller labs, especially those in low-resource settings, may rely on default preprocessing settings that use lenient thresholds. The field's diversity of methods is a strength, but it also means that motion correction is applied inconsistently, making cross-study comparisons unreliable.

A Concrete Fix: Standardized Motion Reporting

The solution is not to abandon fMRI—it remains one of the most powerful tools for studying human brain function—but to standardize how motion is reported and corrected. Several concrete steps could dramatically improve reproducibility. First, researchers should pre-register their motion exclusion criteria, including the specific threshold for framewise displacement and the number of volumes to be scrubbed. Pre-registration prevents post-hoc adjustment of the threshold to achieve a desired result.

Second, every fMRI paper should report the distribution of framewise displacement across participants, ideally as a histogram or a summary statistic such as the median and 95th percentile. This allows readers to assess whether motion is comparable across groups and whether the threshold is appropriate for the population studied. If one group has significantly more motion than another, the results should be interpreted with caution, and sensitivity analyses using different thresholds should be reported.
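A per-group summary of this kind takes only a few lines. The Python sketch below reports the median and 95th percentile of each participant's mean FD, using linear interpolation for the percentile (one convention among several). The group labels and values are invented.

```python
import statistics


def percentile(values, pct):
    """Percentile by linear interpolation between sorted values."""
    ordered = sorted(values)
    rank = pct / 100 * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def summarize_fd(mean_fd_by_group):
    """Return n, median, and 95th percentile of mean FD for each group."""
    return {
        group: {
            "n": len(values),
            "median": statistics.median(values),
            "p95": percentile(values, 95),
        }
        for group, values in mean_fd_by_group.items()
    }


# Hypothetical mean FD (mm) per participant.
groups = {
    "patients": [0.18, 0.25, 0.31, 0.44, 0.62],
    "controls": [0.09, 0.12, 0.15, 0.17, 0.21],
}
for group, stats in summarize_fd(groups).items():
    print(group, stats)
```

In this made-up sample the patient group's median motion is about twice that of the controls, the kind of imbalance that should prompt a sensitivity analysis.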

Third, researchers should share unthresholded connectivity maps (the full matrix of correlations before any motion correction is applied) together with the raw time series and the per-frame motion parameters. This allows independent reanalysts to apply their own motion correction and test the robustness of the findings. Many funding agencies, including the National Institutes of Health, now require data sharing, but the shared data often include only the processed connectivity maps, not the raw time series or the motion parameters. Without the raw data, reanalysis is impossible.

Fourth, journals should adopt a standard checklist for motion reporting, similar to the STROBE statement for epidemiology. The checklist would include items such as: motion threshold used, number of participants excluded for motion, mean framewise displacement per group, and results of sensitivity analyses. A 2024 pilot study found that such a checklist, when implemented by a single journal, increased the proportion of papers reporting motion metrics from 30 percent to 85 percent within one year.
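Such a checklist could also be machine-checkable. The Python sketch below models the four items listed above as a small record and reports which ones a draft has left empty. The class and field names are hypothetical and do not come from any journal's actual template.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class MotionReport:
    """Hypothetical motion-reporting checklist; field names are illustrative."""
    fd_threshold_mm: Optional[float] = None
    participants_excluded: Optional[int] = None
    mean_fd_by_group: dict = field(default_factory=dict)
    sensitivity_thresholds_mm: list = field(default_factory=list)


def missing_items(report):
    """Names of checklist items that are still empty."""
    missing = []
    for item in fields(report):
        value = getattr(report, item.name)
        # Zero exclusions is a valid answer; only None or empty containers count.
        if value is None or value == {} or value == []:
            missing.append(item.name)
    return missing


draft = MotionReport(fd_threshold_mm=0.5, participants_excluded=4)
print(missing_items(draft))
# prints: ['mean_fd_by_group', 'sensitivity_thresholds_mm']
```

A submission system could run a check like this before a manuscript reaches reviewers, so that a missing motion metric is caught at upload rather than after publication.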

What the Field Gains by Tightening the Screw

Adopting stricter and more transparent motion correction would not eliminate all false positives, but it would substantially reduce the number of spurious findings that waste resources and mislead clinical research. A 2023 simulation study estimated that if all fMRI studies used a 0.5 mm threshold and reported motion metrics, the false discovery rate in the literature could drop from an estimated 30 percent to below 10 percent. That means fewer dead ends for researchers trying to replicate findings, and more reliable biomarkers for disorders like depression and schizophrenia.

Tighter motion standards can also reduce the sample size needed to detect real effects. When motion is a source of noise, it inflates the variance of connectivity estimates, requiring larger samples to achieve adequate statistical power. By removing motion-contaminated data, researchers effectively increase the signal-to-noise ratio, so a study with 50 participants using strict motion correction may have more power than a study with 100 participants using lenient correction, provided enough frames survive censoring in each participant. This could actually save money in the long run, because smaller samples are cheaper to scan.

Perhaps most importantly, restoring trust in the fMRI literature would benefit the entire neuroscience community. Public confidence in brain imaging has been shaken by high-profile failures to replicate, such as the 2016 study that found no evidence for the widely reported correlation between brain structure and political orientation. While motion artifacts are not the only cause of unreliability, they are one of the most easily fixable. A field that cannot agree on how to handle a basic confound will struggle to convince outsiders that its findings are robust.

But there is a trade-off. Stricter motion correction means discarding data, which can introduce its own biases. If the participants with the most motion are systematically excluded, the remaining sample may not be representative of the population of interest. This is especially problematic for clinical studies, where patients with severe symptoms may move the most. Researchers must balance the need for clean data with the need for generalizability. No single threshold is right for every study, but the choice should be transparent, justified, and tested.
