A 2026 ADNI study reported 85.23% accuracy for ANA-GNN, a graph neural network that combined structural MRI regional features with clinical variables to classify cognitively normal controls, mild cognitive impairment, and Alzheimer’s disease.1 That is a genuine benchmark gain, but removing the clinical features dropped accuracy to 68.35%, so the model should be read as multimodal decision-support research, not as a standalone MRI test.
Research Highlights
- 85.23% ADNI accuracy: ANA-GNN reached 85.23% accuracy and 85.44% F1 across 707 participants in subject-level 5-fold cross-validation.1
- MCI remained the hard class: mild cognitive impairment had 83.17% F1 across 378 participants, below the 87.57% F1 reported for the 109-person Alzheimer’s disease group.1
- Clinical variables carried major signal: removing clinical features dropped accuracy from 85.23% to 68.35%, making the result multimodal rather than MRI-only.1
- Graph learning beat familiar baselines: BrainGNN reached 84.15% accuracy, Graph Transformer reached 83.88%, and Gradient Boosting reached 81.65%.1
- External validation is the missing gate: the model was tested inside ADNI, but not yet in independent cohorts such as AIBL or OASIS, or on community-clinic and scanner-diverse hospital data.1,3
ANA-GNN means adaptive neighborhood aggregation graph neural network. In plain terms, the model treats the brain as a network of regions, lets disease-relevant regions gather information from a wider learned neighborhood, and then fuses that graph representation with non-imaging clinical data.
The central question is not whether artificial intelligence can find Alzheimer’s disease patterns in ADNI. Many models can do that under research conditions. The harder question is whether a model adds clinically transportable signal beyond cognition scores, age, APOE genotype, and other variables that already separate groups inside ADNI.
707 ADNI Participants Were Split Across CN, MCI, and AD
The Sheng et al. study used structural MRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a major research dataset for dementia biomarkers. The analyzed cohort included 707 participants: 220 cognitively normal controls, 378 people with mild cognitive impairment (MCI), and 109 people with Alzheimer’s disease.1
Mild cognitive impairment is an intermediate clinical state: cognition is measurably worse than expected, but daily function is not impaired enough for dementia. That makes MCI the diagnostic class that matters most for “early diagnosis,” and also the class most likely to blur with normal aging or early Alzheimer’s disease in a cross-sectional model.
The study converted each T1-weighted structural MRI scan into mean gray-matter values across 166 Automated Anatomical Labeling 3 (AAL3) regions. Gray matter contains many neuronal cell bodies and local processing circuits; lower regional gray-matter volume can reflect atrophy, but it is not disease-specific by itself.
- Cognitively normal group: 220 people, mean age 73.0 years, mean Mini-Mental State Examination (MMSE) score 29.1.
- MCI group: 378 people, mean age 71.3 years, mean MMSE score 28.1.
- Alzheimer’s disease group: 109 people, mean age 74.3 years, mean MMSE score 23.1.
Evidence-strength note: this was an internal machine-learning validation study in a benchmark research cohort. It can show that ANA-GNN separated ADNI diagnostic labels better than several comparator models under the same cross-validation scheme. It cannot show that the model is ready for community screening, routine memory-clinic diagnosis, or scanner-diverse hospital deployment.
ANA-GNN Used Adaptive Brain-Region Neighborhoods
Graph neural networks are machine-learning models built for data that naturally form networks rather than flat tables. In this case, nodes represented brain regions, and edges represented similarity-based relationships among regional MRI-derived features.
Standard graph models often use fixed neighborhoods: each region collects information from the same number or pattern of nearby nodes. ANA-GNN changed that rule. The model assigned learned importance scores to nodes, then let higher-importance regions use broader neighborhoods while less informative regions aggregated more locally.1
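The adaptive-neighborhood idea can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the linear mapping from importance score to neighborhood size, the `k_min`/`k_max` bounds, and the mean aggregation are all assumptions chosen for readability, and the feature values are made up.

```python
def adaptive_neighborhoods(similarity, importance, k_min=2, k_max=6):
    """Return, for each node, the indices of its most similar other nodes.

    Higher-importance nodes get a neighborhood size closer to k_max,
    lower-importance nodes stay closer to k_min. The linear mapping is
    an illustrative assumption, not the published rule.
    """
    n = len(similarity)
    lo, hi = min(importance), max(importance)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all scores are equal
    neighborhoods = []
    for i in range(n):
        scaled = (importance[i] - lo) / span            # rescale to 0..1
        k = k_min + round(scaled * (k_max - k_min))     # size for node i
        k = min(k, n - 1)                               # cannot exceed other nodes
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        neighborhoods.append(others[:k])
    return neighborhoods


def aggregate(features, neighborhoods):
    """Mean-aggregate each node's own feature with its neighbors' features."""
    out = []
    for i, nbrs in enumerate(neighborhoods):
        values = [features[i]] + [features[j] for j in nbrs]
        out.append(sum(values) / len(values))
    return out


# Toy graph: 5 regions with one scalar feature each (made-up values).
features = [0.9, 0.8, 0.4, 0.3, 0.1]
# Similarity as negative absolute difference: closer values are more similar.
similarity = [[-abs(a - b) for b in features] for a in features]
importance = [1.0, 0.2, 0.2, 0.2, 0.0]  # region 0 treated as most relevant

hoods = adaptive_neighborhoods(similarity, importance, k_min=1, k_max=3)
smoothed = aggregate(features, hoods)
print(hoods)
```

In this toy setup the high-importance region 0 aggregates from three neighbors while the others aggregate from one, which is the qualitative behavior the paragraph describes.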
The biological idea is plausible because Alzheimer’s disease does not damage the brain uniformly. Changes in the hippocampus, amygdala, and posterior cingulate are more informative for memory and dementia classification than changes in many other regions. Importance-weighted pooling then used the learned scores to give more influence to regions judged more disease-relevant.
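Importance-weighted pooling can be sketched in a few lines. The version below normalizes scores with a softmax and takes a weighted sum of node embeddings; the softmax normalization, the function name, and the toy numbers are assumptions, since the source does not spell out the exact pooling formula.

```python
import math


def importance_weighted_pool(node_embeddings, scores):
    """Pool node embeddings into one graph vector, weighting by softmax(scores)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_embeddings[0])
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, node_embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Three hypothetical regions with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [2.0, 0.0, 0.0]  # region 0 has the highest learned importance

pooled, weights = importance_weighted_pool(embeddings, scores)
print(weights, pooled)
```

The pooled vector is pulled toward the embedding of the highest-scoring region, in contrast to plain mean pooling, which would weight all regions equally.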
Gated multimodal fusion was the other key piece. The model did not rely only on MRI. It also incorporated age, sex, education, APOE genotype, cognitive scores, and related clinical variables, then learned how much graph-derived imaging signal vs. clinical signal to use for each sample.1
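A common way to implement per-sample gating is a sigmoid gate computed from both embeddings, followed by a convex blend. The sketch below assumes that form, with a scalar gate and same-length embeddings; `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical names, and the paper's actual gate may differ (for example, a vector gate with one value per dimension).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_graph, h_clin, gate_weights, gate_bias):
    """Blend an imaging embedding and a clinical embedding with a scalar gate.

    gate  = sigmoid(w . [h_graph ; h_clin] + b)
    fused = gate * h_graph + (1 - gate) * h_clin
    """
    concat = h_graph + h_clin  # list concatenation: [h_graph ; h_clin]
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [gate * g + (1.0 - gate) * c for g, c in zip(h_graph, h_clin)]
    return fused, gate


# Toy embeddings; in a trained model gate_weights and gate_bias are learned.
h_graph = [1.0, 0.0]
h_clin = [0.0, 1.0]

fused_even, gate_even = gated_fusion(h_graph, h_clin, [0.0] * 4, 0.0)
fused_clin, gate_clin = gated_fusion(h_graph, h_clin, [0.0] * 4, -3.0)
print(gate_even, fused_even)
print(gate_clin, fused_clin)
```

With a zero gate input the two modalities are mixed evenly; a strongly negative gate input pushes the fused vector toward the clinical embedding. A gate that leans clinical for most samples would be one mechanism behind a large no-clinical-features ablation drop.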
That design is sensible for Alzheimer’s disease because clinical diagnosis is already multimodal. The risk is interpretation: if readers call the result an “MRI AI diagnosis” finding, they miss the fact that the model’s own ablation says clinical variables were a major driver of performance.
85.44% F1 Beat BrainGNN, Graph Transformer, and Gradient Boosting
ANA-GNN outperformed every comparator reported in the paper. Overall accuracy was 85.23% with precision 85.74%, recall 85.22%, and F1-score 85.44%. BrainGNN, the closest deep-learning comparator, reached 84.15% accuracy and 84.22% F1. Graph Transformer reached 83.88% accuracy and 83.95% F1. The strongest traditional model, Gradient Boosting, reached 81.65% accuracy.1
F1-score combines precision and recall into one number, which is useful when classes are imbalanced. The ADNI split was imbalanced: the MCI class had 378 people, while the Alzheimer’s disease class had 109. Sheng et al. used class-weighted loss so errors in smaller classes counted more during training.
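To make the class-weighting idea concrete, the sketch below computes inverse-frequency weights from the reported group sizes. The formula `total / (num_classes * count)` is one widely used convention and is an assumption here; the source says only that a class-weighted loss was used, not which weights.

```python
# Group sizes reported for the analyzed ADNI cohort.
counts = {"CN": 220, "MCI": 378, "AD": 109}

total = sum(counts.values())
num_classes = len(counts)

# Inverse-frequency ("balanced") weights: rarer classes get larger weights.
weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label, w in weights.items():
    print(f"{label}: n={counts[label]}, weight={w:.3f}")
```

Under this convention an error on an Alzheimer’s disease case counts roughly three and a half times as much as an error on an MCI case, which offsets the 378-versus-109 imbalance during training.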
The per-class results guard against an overly optimistic reading of the headline number:
- Cognitively normal: 88.29% F1 across 220 participants.
- Mild cognitive impairment: 83.17% F1 across 378 participants.
- Alzheimer’s disease: 87.57% F1 across 109 participants.
MCI was still the weakest class by F1. That does not invalidate the model, but it narrows the claim. The reported system is best framed as a research-cohort classifier that improved benchmark performance, not as a tool that solved prodromal Alzheimer’s diagnosis.
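The per-class figures can be checked against the headline F1 with simple arithmetic. The sketch below computes both the support-weighted and the unweighted (macro) average of the three reported per-class F1 values. Which averaging the paper used is an assumption on our part; the weighted average is the one that reproduces the reported 85.44%.

```python
# (per-class F1 in percent, number of participants), as reported.
per_class = {"CN": (88.29, 220), "MCI": (83.17, 378), "AD": (87.57, 109)}

total = sum(n for _, n in per_class.values())

# Support-weighted average: each class counts in proportion to its size.
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total

# Macro average: each class counts equally, regardless of size.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

print(round(weighted_f1, 2))  # 85.44
print(round(macro_f1, 2))     # 86.34
```

Because MCI is both the largest and the weakest class, it pulls the weighted average below the macro average. That is another reason to read the per-class numbers rather than the single summary score.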
The 68.35% No-Clinical-Features Result Is the Main Calibration Point
Ablation studies remove parts of a model to see how much each component contributes. In Sheng et al., the full model achieved 85.23% accuracy, but removing clinical features dropped accuracy to 68.35%. That was a much larger loss than removing gated fusion, attention, importance pooling, or residual connections.1
Interpretation: clinical and cognitive variables were not a decorative add-on. They carried a large share of the class-separation signal. For an Alzheimer’s model, that is both expected and clinically awkward: MMSE and related clinical variables are already close to the diagnostic label.
The other ablations still supported the graph architecture:
- Without gated fusion: accuracy fell to 83.15%.
- Without the attention mechanism: accuracy fell to 81.92%.
- Without importance-weighted pooling: accuracy fell to 80.65%.
- Without residual connections: accuracy fell to 79.40%.
Those drops suggest that adaptive graph learning added signal above a minimal architecture, but the largest practical warning comes from the clinical-feature dependency.
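The ablation numbers are easier to compare as drops from the full model. This short sketch uses only the accuracies reported above and ranks the components by how many percentage points were lost when each was removed.

```python
full_accuracy = 85.23

# Accuracy (percent) after removing each component, as reported.
ablations = {
    "clinical features": 68.35,
    "gated fusion": 83.15,
    "attention mechanism": 81.92,
    "importance-weighted pooling": 80.65,
    "residual connections": 79.40,
}

# Percentage-point drop relative to the full model.
drops = {name: round(full_accuracy - acc, 2) for name, acc in ablations.items()}
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)

for name, drop in ranked:
    print(f"without {name}: -{drop:.2f} points")
```

Removing clinical features costs 16.88 points, close to three times the next-largest drop of 5.83 points from removing residual connections. Every architectural ablation costs less than 6 points.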
For readers, this changes the deployment question. ANA-GNN may be useful as a decision-support layer after cognitive testing and MRI are already available. It is not evidence that a standalone MRI pipeline can replace clinical assessment.
Why ADNI Accuracy Often Overstates Real-World Readiness
ADNI is valuable because it standardizes imaging, clinical phenotyping, and biomarker collection. That same strength can make diagnostic models look cleaner than they will look in routine practice. Memory clinics and community screening programs include scanner differences, referral bias, mixed dementias, medication effects, psychiatric comorbidity, vascular disease, and incomplete clinical records.
Wen et al. reviewed convolutional neural network studies for Alzheimer’s disease classification and emphasized reproducible evaluation problems, including data leakage and inconsistent validation practices.3 Sheng et al. used subject-level 5-fold cross-validation to reduce leakage risk, which is a necessary step. It is still internal validation, not proof of external transportability.
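Subject-level splitting means fold membership is decided per person, not per scan, so no individual contributes data to both training and test sets. The sketch below shows the core idea with a hypothetical sample list; the subject IDs, the function name, and the round-robin assignment are illustrative assumptions, and this simple version does not stratify by diagnosis.

```python
import random
from collections import defaultdict


def subject_level_folds(subject_ids, n_folds=5, seed=0):
    """Assign each sample a fold so all samples from one subject share a fold."""
    unique = sorted(set(subject_ids))
    random.Random(seed).shuffle(unique)  # reproducible shuffle of subjects
    fold_of_subject = {sid: i % n_folds for i, sid in enumerate(unique)}
    return [fold_of_subject[sid] for sid in subject_ids]


# Hypothetical data: some subjects contribute more than one scan.
samples = ["s01", "s01", "s02", "s03", "s03", "s03", "s04", "s05", "s06", "s07"]
folds = subject_level_folds(samples)

# Leakage check: every subject must appear in exactly one fold.
folds_by_subject = defaultdict(set)
for sid, fold in zip(samples, folds):
    folds_by_subject[sid].add(fold)
leak_free = all(len(f) == 1 for f in folds_by_subject.values())
print(folds, leak_free)
```

A split done per scan instead would let repeat scans of the same person land on both sides of the train-test boundary, which is the kind of leakage Wen et al. warn about.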
Earlier MRI deep-learning work by Basaia et al. showed that single-scan deep networks could classify Alzheimer’s disease and MCI from structural MRI, while graph-learning work such as BrainGNN and multimodal graph models pushed the field toward interpretable network representations and multimodal fusion.2,4,5 ANA-GNN fits that progression: stronger graph architecture, plausible brain-region importance, and better internal benchmark performance.
Clinical-readiness gate: the next useful test is not another small ADNI split. The model needs frozen-code external validation in cohorts such as AIBL or OASIS, scanner-diverse hospital data, and preferably longitudinal prediction of conversion from MCI to dementia.
Questions About ANA-GNN and Alzheimer’s AI Diagnosis
Does ANA-GNN diagnose Alzheimer’s disease from MRI alone?
No. The Sheng et al. model used structural MRI features plus clinical variables including demographics, APOE genotype, and cognitive scores. The no-clinical-features ablation fell to 68.35% accuracy, so the published headline performance should not be described as MRI-only diagnosis.
What does “adaptive neighborhood” mean in this model?
Each brain region was represented as a graph node. Instead of forcing every region to collect information from the same fixed neighborhood, ANA-GNN learned importance scores and let more disease-relevant regions aggregate information from broader neighborhoods.
Why is MCI classification so important?
MCI is the stage where early Alzheimer’s disease may be present but dementia is not yet established. Better MCI separation could help triage patients for biomarker testing, monitoring, or treatment trials, but only if the model generalizes outside ADNI.
What would make this clinically useful?
External validation is the first requirement. A clinically useful version would need stable performance across scanners, clinics, age ranges, comorbidities, and disease mimics, plus a clear role in a workflow: triage, risk stratification, conversion prediction, or treatment-selection support.
References
1. Sheng J, Zhong H, Zhang Q, Zhang R, Gong Z, Lin J, Chen Z. Adaptive neighborhood aggregation graph neural network for early diagnosis of Alzheimer’s disease. Scientific Reports. 2026. https://doi.org/10.1038/s41598-026-50351-2
2. Basaia S, Agosta F, Wagner L, Canu E, Magnani G, Santangelo R, et al. Automated classification of Alzheimer’s disease and mild cognitive impairment using a single MRI and deep neural networks. NeuroImage: Clinical. 2019;21:101645. https://doi.org/10.1016/j.nicl.2018.101645
3. Wen J, Thibeau-Sutre E, Diaz-Melo M, Samper-Gonzalez J, Routier A, Bottani S, et al. Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Medical Image Analysis. 2020;63:101694. https://doi.org/10.1016/j.media.2020.101694
4. Li X, Zhou Y, Dvornek N, Zhang M, Zhuang J, Ventola P, et al. BrainGNN: interpretable brain graph neural network for fMRI analysis. Medical Image Analysis. 2021;74:102233. https://doi.org/10.1016/j.media.2021.102233
5. Zhang Y, He X, Chan YH, Teng Q, Rajapakse JC. Multi-modal graph neural network for early diagnosis of Alzheimer’s disease from sMRI and PET scans. Computers in Biology and Medicine. 2023;164:107328. https://doi.org/10.1016/j.compbiomed.2023.107328
