A 2026 TADPOLE modeling study reported that a sequential neural process with normalizing flows (SNP-NF) predicted future Alzheimer diagnostic stage with mAUC 0.965 ± 0.006, ahead of the authors’ earlier sequential-neural-process model at 0.937 ± 0.014 [1]. The result is a strong benchmark signal for uncertainty-aware disease-progression AI, but it is still retrospective modeling evidence rather than proof of clinical deployment readiness.
Research Highlights
- 0.965 mAUC benchmark: the SNP-NF model reached mAUC 0.965 ± 0.006, recall 0.929 ± 0.006, and precision 0.929 ± 0.007 for 3-way future diagnostic-stage prediction [1].
- 1,737-person TADPOLE base: researchers modeled 12,741 visits across 22 time points using cognitive tests, MRI-derived brain volumes, demographic features, and longitudinal diagnosis labels [1].
- Normalizing flows added signal: the prior sequential neural process reached mAUC 0.937, while adding normalizing flows increased the reported benchmark to 0.965 [1].
- Architecture mattered more than any single module: normalizing flow alone reached only mAUC 0.783, while the full neural-process, transformer, and flow model reached 0.965 [1].
- Deployment remains unproven: the 80%/20% retrospective split and 4-visit input window mean the model is not yet a single-visit screening tool or a treatment-decision system [1].
mAUC is the multiclass area under the receiver operating characteristic (ROC) curve: a ranking metric that averages how well a model separates the diagnostic classes across all pairwise comparisons. Normalizing flows are machine-learning transformations that turn a simple probability distribution into a more flexible one, allowing a model to represent irregular, non-Gaussian uncertainty instead of forcing patient trajectories into a simpler shape.
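The pairwise averaging behind mAUC can be sketched in a few lines. This is a minimal illustration that assumes the Hand and Till style formulation (a symmetric AUC per class pair, then an average over pairs); the paper's own evaluation code is not reproduced here, and the labels and probabilities are made up. The sketch also assumes every class has at least one true case.

```python
from itertools import combinations


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (pos, neg) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def multiclass_auc(y_true, y_prob, classes):
    """Average the symmetric pairwise AUC over every unordered class pair."""
    pairs = list(combinations(range(len(classes)), 2))
    total = 0.0
    for i, j in pairs:
        in_i = [p for t, p in zip(y_true, y_prob) if t == classes[i]]
        in_j = [p for t, p in zip(y_true, y_prob) if t == classes[j]]
        # A(i|j): class-i probability should be higher for true class-i cases.
        a_ij = pairwise_auc([p[i] for p in in_i], [p[i] for p in in_j])
        # A(j|i): the same question asked with class-j probabilities.
        a_ji = pairwise_auc([p[j] for p in in_j], [p[j] for p in in_i])
        total += 0.5 * (a_ij + a_ji)
    return total / len(pairs)


# Toy labels and predicted probabilities (invented for illustration only).
CLASSES = ["CN", "MCI", "AD"]
y_true = ["CN", "CN", "MCI", "MCI", "AD", "AD"]
y_prob = [
    [0.80, 0.15, 0.05],
    [0.45, 0.40, 0.15],
    [0.50, 0.35, 0.15],
    [0.20, 0.60, 0.20],
    [0.10, 0.30, 0.60],
    [0.05, 0.50, 0.45],
]
print(round(multiclass_auc(y_true, y_prob, CLASSES), 3))
```

Because only the ordering of scores enters the calculation, the metric says nothing about whether a predicted 0.8 corresponds to an 80% observed rate.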
The useful clinical read is narrow. SNP-NF may be a better way to model heterogeneous Alzheimer trajectories in research datasets, but higher retrospective discrimination does not automatically make it a memory-clinic triage test.
TADPOLE Gave the Model Longitudinal Alzheimer Trajectories, Not One-Time Screens
Al-anbari et al. used the TADPOLE dataset, which aggregates Alzheimer’s Disease Neuroimaging Initiative (ADNI) data from ADNI 1, ADNI GO, and ADNI 2 [1]. The analytic resource contained 1,737 participants (957 males and 780 females), with 12,741 visits collected across 22 time points from 2003 to 2017.
The model predicted the next diagnostic label from a rolling history of prior visits. Diagnostic classes were cognitively normal, mild cognitive impairment (MCI), and probable Alzheimer’s disease. That setup is different from asking whether a single blood test or clinic visit can diagnose Alzheimer’s pathology today.
- Input window: 4 visits per participant sequence.
- Training split: 80% training data and 20% test data.
- Feature set: 13 longitudinal variables, including CDR-SB, ADAS11, ADAS13, MMSE, RAVLT, FAQ, ventricles, hippocampus, whole brain, entorhinal cortex, and midtemporal lobe.
- Excluded participants: anyone with fewer than 4 visits, which removes the exact low-history case many clinics care about.
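The rolling-window setup in the list above can be sketched as follows. This is a hypothetical reconstruction, not the authors' preprocessing: it assumes visits are already in chronological order, that the 4 visits form the input history, and that the visit immediately after the window supplies the target label. Under that reading, a participant with exactly 4 visits passes the exclusion rule but yields no training sample. The `visit` helper, field names, and values are invented.

```python
def visit(dx, mmse, cdr_sb):
    """Hypothetical visit record with a diagnosis and two example features."""
    return {"dx": dx, "mmse": mmse, "cdr_sb": cdr_sb}


def build_windows(visits_by_participant, window=4):
    """Turn per-participant visit lists into (history, next-label) samples."""
    samples = []
    for pid, visits in visits_by_participant.items():
        if len(visits) < window:
            continue  # short histories are dropped, as in the reported setup
        # Slide one visit at a time; the visit after the window is the target.
        for start in range(len(visits) - window):
            history = visits[start:start + window]
            target = visits[start + window]["dx"]
            samples.append({"pid": pid, "history": history, "target": target})
    return samples


# Invented cohort: p01 has 6 visits, p02 has only 3 and is excluded.
cohort = {
    "p01": [visit("CN", 29, 0.0), visit("CN", 29, 0.5), visit("MCI", 27, 1.0),
            visit("MCI", 26, 1.5), visit("MCI", 25, 2.0), visit("AD", 22, 4.5)],
    "p02": [visit("MCI", 27, 1.0), visit("MCI", 26, 1.5), visit("AD", 23, 4.0)],
}
samples = build_windows(cohort)
for s in samples:
    print(s["pid"], [v["dx"] for v in s["history"]], "->", s["target"])
```

The sketch makes the exclusion concrete: a patient like `p02`, with a short record, never reaches the model at all.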
CDR-SB is the Clinical Dementia Rating Sum of Boxes, a clinician-rated measure of dementia severity. ADAS is the Alzheimer’s Disease Assessment Scale, a cognitive test battery used in Alzheimer trials, while MMSE is the Mini-Mental State Examination, a short cognitive screen. Combining these measures with MRI-derived brain volumes gives the model both clinical trajectory and neurodegeneration context.
SNP-NF Beat Recurrent and Geometric Baselines on mAUC
The benchmark table compared the proposed model against recurrent, manifold, geometric, and probabilistic architectures. LSTM-M reached mAUC 0.758, PLSTM-Z reached 0.842, MinimalRNN reached 0.871, DeepRNN reached 0.878, and deep geometric learning reached 0.881 [1].
The prior sequential neural process, which already combined neural processes with sequence modeling, reached mAUC 0.937. SNP-NF raised that to 0.965, with recall and precision both reported around 0.929 [1].

Adjacent work explains why that comparison is plausible. Mehdipour Ghazi et al. previously used recurrent neural networks designed to tolerate incomplete Alzheimer time-series data [2]. Nguyen et al. later tested deep recurrent neural networks for Alzheimer progression prediction [3]. Jeong et al. added deep geometric learning with monotonicity constraints, trying to build disease-order assumptions into the model rather than treating visits as ordinary tabular rows [4].
Al-Anbari et al.’s 2025 predecessor moved the question into neural-process territory: a model can learn a distribution over possible patient trajectories rather than only outputting a single deterministic class [5]. The 2026 paper then added normalizing flows to make that latent distribution more flexible.
Normalizing Flows Improved the Latent-Space Assumption
Neural processes combine neural networks with probabilistic prediction. In plain terms, the model sees a small context set from a patient trajectory, learns a distribution over possible functions, and predicts future observations with uncertainty attached.
The limitation is that many neural-process models use relatively simple Gaussian latent variables. A Gaussian assumption can be too rigid for Alzheimer progression, where 2 patients with similar baseline cognitive scores can move through different combinations of hippocampal atrophy, memory decline, biomarker change, and diagnostic conversion.
Normalizing flows address that rigidity by applying invertible transformations to the latent distribution. Instead of assuming one smooth bell-shaped uncertainty cloud, the model can represent more complex, multimodal uncertainty. In this paper, that design was supposed to help the decoder separate cognitively normal, MCI, and Alzheimer’s disease trajectories when the input history was noisy and nonlinear [1].
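The mechanism can be illustrated with a toy one-dimensional flow. This sketch is not the paper's architecture: it assumes a simple planar-style step, x = z + a·tanh(b·z + c), uses hand-picked rather than learned parameters, and works on a scalar instead of a latent vector. It shows the two ingredients any flow needs: an invertible map and a tracked log-derivative so the density can still be evaluated.

```python
import math
import random


def planar_step(z, a, b, c):
    """One 1-D step x = z + a*tanh(b*z + c); strictly increasing when a*b > -1."""
    h = math.tanh(b * z + c)
    x = z + a * h
    log_det = math.log(abs(1.0 + a * b * (1.0 - h * h)))  # log|dx/dz|
    return x, log_det


def flow_forward(z, params):
    """Apply the steps in order and accumulate the total log|dx/dz|."""
    total_log_det = 0.0
    for a, b, c in params:
        z, log_det = planar_step(z, a, b, c)
        total_log_det += log_det
    return z, total_log_det


def log_standard_normal(z):
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def flow_log_density(z0, params):
    """Change of variables: log q(x) = log N(z0) - log|dx/dz0|."""
    x, log_det = flow_forward(z0, params)
    return x, log_standard_normal(z0) - log_det


# Hypothetical parameters (a, b, c), chosen by hand so that a*b > 0.
PARAMS = [(2.0, 1.5, 0.0), (1.0, 2.0, -1.0), (0.5, 1.0, 1.0)]

random.seed(0)
for _ in range(3):
    z0 = random.gauss(0.0, 1.0)
    x, log_q = flow_log_density(z0, PARAMS)
    print(f"z0={z0:+.3f} -> x={x:+.3f}, log q(x)={log_q:.3f}")
```

Stacking more steps makes the reshaped density more flexible, at the cost of more parameters and more computation per sample.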
The ablation table supports the architecture-level interpretation:
- Neural process without transformer: mAUC 0.891, recall 0.719, precision 0.782.
- TNP latent only: mAUC 0.906, recall 0.903, precision 0.909.
- Normalizing flow only: mAUC 0.783, recall 0.639, precision 0.644.
- NP plus NF without transformer: mAUC 0.913, recall 0.870, precision 0.872.
- Full SNP-NF: mAUC 0.965, recall 0.929, precision 0.929.
The flow component was not magic by itself. It helped when it was embedded inside a sequence-aware neural-process architecture that could use temporal context.
Speed and Memory Looked Practical for Research Workstations
Al-anbari et al. also reported computational-cost checks. The baseline architecture used 128-dimensional embeddings, 2 encoder layers, and 5 flow steps, with about 2.88 million trainable parameters and an 11.6 MB checkpoint [1].
The paper’s flow-step ablation showed the expected trade-off. More flow steps increased model expressiveness but also raised latency. A 4-step configuration used 2.5 million parameters, 6.99 ± 0.02 ms GPU latency, and 0.43 GB GPU memory. An 8-step configuration used 3.3 million parameters, 9.00 ± 0.02 ms latency, and 0.60 GB memory [1].
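The size of that trade-off can be read off the two reported configurations with simple arithmetic. The sketch below assumes cost grows linearly between 4 and 8 steps, which the two data points alone cannot confirm; the dictionary keys are illustrative names, and the numbers are the reported means without their error bars.

```python
# Reported flow-step ablation figures quoted in the text above.
REPORTED = {
    4: {"params_millions": 2.5, "latency_ms": 6.99, "gpu_memory_gb": 0.43},
    8: {"params_millions": 3.3, "latency_ms": 9.00, "gpu_memory_gb": 0.60},
}


def per_step_cost(reported, low, high):
    """Average increase per added flow step, assuming linear scaling."""
    steps = high - low
    return {
        key: (reported[high][key] - reported[low][key]) / steps
        for key in reported[low]
    }


def relative_increase(reported, low, high):
    """Fractional increase of each metric when moving from low to high."""
    return {
        key: reported[high][key] / reported[low][key] - 1.0
        for key in reported[low]
    }


for key, value in per_step_cost(REPORTED, 4, 8).items():
    print(f"{key}: +{value:.4f} per added step")
for key, value in relative_increase(REPORTED, 4, 8).items():
    print(f"{key}: +{value:.1%} from 4 to 8 steps")
```

Under the linearity assumption, each added flow step costs roughly 0.2 million parameters, 0.5 ms of GPU latency, and 0.04 GB of GPU memory.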
Engineering implication: the reported model is not obviously too heavy for research deployment. The harder question is not whether a GPU can run it; it is whether the predictions stay calibrated across hospitals, scanners, missing-data patterns, diagnostic coding habits, and patient mixes.
Calibration Limits Matter Beyond Accuracy
Prediction models for Alzheimer’s disease often look strongest inside curated research datasets. Moradi et al. showed MRI-based MCI-to-Alzheimer conversion prediction could be useful, but that work also depended on feature selection and cohort structure [6]. Li et al. trained a hippocampal MRI deep-learning model for early Alzheimer dementia prediction, again showing that strong imaging signal does not remove the need for validation in the intended population [7].
Multi-cohort work has pushed the same concern forward. Pan et al. used federated learning across real-world data to predict progression from MCI to Alzheimer’s disease, which is closer to the portability problem faced by any clinical AI system [8].
The Al-anbari model excluded people with fewer than 4 visits and evaluated an 80%/20% retrospective split. That is a legitimate benchmark design, but it does not test whether a clinician can use the model for a new patient with 1 visit, a missing MRI variable, a different scanner protocol, or a diagnostic label distribution unlike ADNI.
Supported inference: normalizing-flow-enhanced neural processes may improve Alzheimer progression prediction in TADPOLE-style longitudinal data, especially compared with recurrent and geometric baselines.
Unsupported inference: the model can diagnose Alzheimer’s disease, replace amyloid/tau testing, select disease-modifying therapy, or generalize to primary care without prospective external validation.
Best use now: treat SNP-NF as a strong research benchmark for longitudinal modeling, not as a replacement for biomarker confirmation or clinical prognosis in sparse real-world records.
Questions About Alzheimer Progression AI Models
Does mAUC 0.965 mean the model is 96.5% accurate?
No. mAUC is a ranking/discrimination metric averaged across class comparisons. It is not the same as simple accuracy, and it does not tell the full calibration story.
For clinical use, a model also needs sensitivity, specificity, predictive values, calibration, missing-data behavior, and decision-curve evidence in the population where it will be used.
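The gap between discrimination and calibration can be shown with a small example. This sketch simplifies the 3-class problem to a binary one and uses invented labels and probabilities; it applies a monotone distortion (cubing each probability) that keeps the ranking intact, so AUC is unchanged while the Brier score, which is sensitive to calibration, gets worse.

```python
def binary_auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared gap between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Invented outcomes (1 = progressed) and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
# Cubing preserves the ordering but pushes every probability toward zero.
distorted = [p ** 3 for p in probs]

print("AUC original:  ", round(binary_auc(y_true, probs), 3))
print("AUC distorted: ", round(binary_auc(y_true, distorted), 3))
print("Brier original: ", round(brier_score(y_true, probs), 3))
print("Brier distorted:", round(brier_score(y_true, distorted), 3))
```

Two models with identical AUC can therefore give very different risk estimates to an individual patient, which is why calibration needs to be reported separately.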
Why use a 4-visit window?
Alzheimer progression is longitudinal. A 4-visit window lets the model see direction over time: cognitive decline, MRI volume change, and shifting diagnostic labels.
The trade-off is obvious: requiring 4 visits makes the model less applicable to first-contact screening or short-record patients.
What did normalizing flows add?
They made the latent uncertainty distribution more flexible. In the ablation study, the full SNP-NF model (mAUC 0.965) outperformed the normalizing-flow-only variant (0.783), and in the benchmark comparison it beat the earlier sequential neural process without flows (0.937).
What would make this clinically stronger?
Prospective validation across multiple memory clinics, explicit calibration reporting, missing-data stress tests, scanner/site subgroup checks, and comparison with realistic clinical baselines such as neuropsychological assessment plus amyloid/tau biomarkers.
References
1. Normalizing flow based neural processes for Alzheimer’s disease progression prediction. Al-anbari E, Karshenas H, Shoushtarian B. PLOS One. 2026;21(4):e0345958. doi:10.1371/journal.pone.0345958
2. Training recurrent neural networks robust to incomplete data: Application to Alzheimer’s disease progression modeling. Mehdipour Ghazi M et al. Medical Image Analysis. 2019;53:39-46. doi:10.1016/j.media.2019.01.004
3. Predicting Alzheimer’s disease progression using deep recurrent neural networks. Nguyen M et al. NeuroImage. 2020;222:117203. doi:10.1016/j.neuroimage.2020.117203
4. Deep Geometric Learning With Monotonicity Constraints for Alzheimer’s Disease Progression. Jeong S et al. IEEE Transactions on Neural Networks and Learning Systems. 2025;36:7090-7102. doi:10.1109/tnnls.2024.3394598
5. A Novel Approach to the Prediction of Alzheimer’s Disease Progression by Leveraging Neural Processes and a Transformer Encoder Model. Al-Anbari E, Karshenas H, Shoushtarian B. IEEE Access. 2025;13:44607-44619. doi:10.1109/access.2025.3548173
6. Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects. Moradi E et al. NeuroImage. 2015;104:398-412. doi:10.1016/j.neuroimage.2014.10.002
7. A deep learning model for early prediction of Alzheimer’s disease dementia based on hippocampal magnetic resonance imaging data. Li H et al. Alzheimer’s & Dementia. 2019;15:1059-1070. doi:10.1016/j.jalz.2019.02.007
8. Federated learning with multi-cohort real-world data for predicting the progression from mild cognitive impairment to Alzheimer’s disease. Pan J et al. Alzheimer’s & Dementia. 2025;21:e70128. doi:10.1002/alz.70128