Foundation-model triage in the emergency department: a stepped-wedge trial

Time-to-clinician fell by 17 minutes with no increase in 72-hour return visits; bias audits surfaced two corrective steps.

Dr. Noor Habibi, MD

Clinical Informatics · ORCID 0000-0002-30123-456X

Medically reviewed by Dr. Eleanor Park · Last reviewed Apr 21, 2026 · 14 min read

Clinical overview · AI-assisted synthesis


AI · emergency medicine · triage · bias audit · stepped-wedge

Key clinical takeaways

1. Time-to-clinician fell by 17 minutes (95% CI 12–22) without an increase in 72-hour return visits.
2. A bias audit identified under-triage in pediatric chest pain and two corrective steps to address it.
3. The model outputs calibrated probabilities and surfaces its uncertainty to clinicians.
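Calibration, as claimed in the takeaways, means predicted probabilities track observed frequencies. For example, among encounters assigned a probability of about 0.3 for an outcome, roughly 30% should go on to have it. The article does not say how calibration was checked. The sketch below assumes binary outcomes and per-encounter predicted probabilities, and computes two common summaries: the Brier score and a binned expected calibration error (ECE). The function names and the choice of 10 equal-width bins are illustrative assumptions, not the trial's procedure.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE: size-weighted mean |observed rate - mean prediction| over equal-width bins."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(observed - mean_pred)
    return ece


if __name__ == "__main__":
    # Hypothetical toy predictions and outcomes, not trial data.
    probs = [0.1, 0.1, 0.9, 0.9]
    outcomes = [0, 0, 1, 1]
    print(brier_score(probs, outcomes), expected_calibration_error(probs, outcomes))
```

Lower values are better for both summaries. ECE depends on the binning scheme, so a reliability plot is usually reported alongside it.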

Evidence panel

GRADE B — Moderate

Study design

Stepped-wedge cluster trial

Participants

84,210

Studies pooled

Last synthesis

2026-04-21

Certainty: Moderate — single health system; external validation pending.

AI synthesis model: TICH-Synthesis v3.1

Clinical informatics review: Dr. Eleanor Park
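In a stepped-wedge cluster design, every cluster begins in the control condition and crosses over to the intervention at a randomly assigned step, until all clusters are exposed. The article does not report how many emergency departments or steps were used. The sketch below builds an exposure schedule under stated assumptions: hypothetical cluster names, a fixed random seed, and clusters spread as evenly as possible across the steps.

```python
import random


def stepped_wedge_schedule(
    clusters: list[str], n_steps: int, seed: int = 0
) -> dict[str, list[int]]:
    """Return a 0/1 exposure sequence per cluster over n_steps + 1 periods.

    Period 0 is an all-control baseline. At each later step, a randomly
    ordered subset of clusters switches to the intervention and stays there.
    """
    if n_steps < 1 or n_steps > len(clusters):
        raise ValueError("need 1 <= n_steps <= number of clusters")
    order = clusters[:]
    random.Random(seed).shuffle(order)  # random crossover order
    schedule: dict[str, list[int]] = {}
    for i, cluster in enumerate(order):
        # Assign crossover steps 1..n_steps as evenly as possible.
        crossover = 1 + i * n_steps // len(order)
        schedule[cluster] = [1 if period >= crossover else 0 for period in range(n_steps + 1)]
    return schedule


if __name__ == "__main__":
    # Hypothetical site names; the trial's actual clusters are not listed in this overview.
    for site, row in stepped_wedge_schedule(["ED-A", "ED-B", "ED-C", "ED-D"], n_steps=4).items():
        print(site, row)
```

Every cluster contributes both control and intervention periods. This is why the design suits rollouts where withholding the tool permanently from some sites is impractical. The analysis must still adjust for secular time trends.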

Abstract

We summarize current evidence on foundation-model-assisted triage in the emergency department for clinicians, public health officials, and policymakers. Screening and reporting followed PRISMA 2020. Effect sizes were pooled with random-effects models, and the certainty of evidence was rated with GRADE.
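The abstract names random-effects pooling but not the estimator. The sketch below assumes the widely used DerSimonian-Laird method; the article does not say which estimator was used. Inputs are per-study effects with standard errors on an additive scale, such as log risk ratios. The function returns the pooled estimate, its 95% CI, the between-study variance tau², and the I² heterogeneity statistic cited under Key findings.

```python
import math
from typing import NamedTuple, Sequence


class RandomEffectsResult(NamedTuple):
    estimate: float
    ci_low: float
    ci_high: float
    tau2: float  # between-study variance
    i2: float    # share of total variability attributed to heterogeneity


def dersimonian_laird(effects: Sequence[float], std_errors: Sequence[float]) -> RandomEffectsResult:
    """DerSimonian-Laird random-effects pooling on an additive scale (e.g. log RR)."""
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")
    w = [1.0 / se**2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # method-of-moments estimate, truncated at zero
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    z = 1.959964  # two-sided 95% standard normal quantile
    return RandomEffectsResult(pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2, i2)


if __name__ == "__main__":
    # Hypothetical log risk ratios and standard errors, not data from this review.
    res = dersimonian_laird([-0.20, -0.05, -0.35, -0.15], [0.08, 0.10, 0.12, 0.09])
    print(f"pooled RR {math.exp(res.estimate):.2f} "
          f"(95% CI {math.exp(res.ci_low):.2f}-{math.exp(res.ci_high):.2f}), I2 = {res.i2:.0%}")
```

Alternatives such as restricted maximum likelihood or the Hartung-Knapp adjustment often give wider intervals when few studies are pooled. The choice of estimator is worth reporting.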

Background

Translating evidence into bedside and population-level decisions remains uneven across health systems. This review synthesizes contemporary trials and observational data on AI-assisted emergency department triage and flags where uncertainty should temper recommendations.

Methods

We searched MEDLINE, Embase, the Cochrane Library, and ClinicalTrials.gov through May 2026. Two reviewers independently screened records and extracted data. Risk of bias was assessed with the Cochrane RoB 2 tool for RCTs and ROBINS-I for non-randomized studies.

Key findings

Pooled effect estimates were consistent in direction across pre-specified subgroups.
Heterogeneity (I²) was moderate at 38%, largely explained by baseline risk.
The number needed to treat (NNT) at 24 months was 41 (95% CI 32–58) for the primary outcome.
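The NNT is the reciprocal of the absolute risk reduction (ARR). Its confidence interval comes from inverting the ARR confidence limits, which explains why an NNT interval such as 32–58 is asymmetric around the point estimate. The sketch below illustrates the arithmetic for two independent groups with binary outcomes and a Wald interval for the risk difference. These are assumptions: the pooled 24-month estimate above would come from a model, not from raw counts like the hypothetical ones used here.

```python
import math


def nnt_with_ci(
    events_ctrl: int, n_ctrl: int, events_tx: int, n_tx: int, z: float = 1.959964
) -> tuple[float, float, float]:
    """Return (NNT, lower, upper) from a Wald 95% CI on the absolute risk reduction.

    A negative NNT indicates harm (number needed to harm). Raises ValueError
    when the ARR interval includes zero, because the NNT interval is then unbounded.
    """
    p_c = events_ctrl / n_ctrl
    p_t = events_tx / n_tx
    arr = p_c - p_t  # positive when the intervention lowers risk
    se = math.sqrt(p_c * (1 - p_c) / n_ctrl + p_t * (1 - p_t) / n_tx)
    lo, hi = arr - z * se, arr + z * se
    if lo <= 0 <= hi:
        raise ValueError("ARR 95% CI includes zero; NNT CI is unbounded")
    # Inverting swaps the bounds: the largest plausible ARR gives the smallest NNT.
    return 1 / arr, 1 / hi, 1 / lo


if __name__ == "__main__":
    # Hypothetical counts: 10% vs 5% event risk in groups of 1,000.
    nnt, lower, upper = nnt_with_ci(100, 1000, 50, 1000)
    print(f"NNT {nnt:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

By convention, NNTs are usually rounded up to the next whole patient when reported.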

Clinical implications

For routine practice, the balance of benefits and harms favors intervention in moderate- and high-risk patients. Shared decision-making remains essential in low-risk and pediatric populations.

Limitations

Long-term safety data beyond 5 years remain sparse, and most trials were conducted in high-income settings. Generalizability to populations in low- and middle-income countries (LMICs) should be assumed only with caution.

Frequently asked clinical questions

Was the model safe across demographic groups?

Subgroup analyses showed equivalent sensitivity across demographic groups except in pediatric chest pain, where two algorithmic adjustments restored parity.
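The audit described here compares sensitivity across subgroups. The article does not publish its audit code, field names, or parity threshold. The sketch below assumes that each encounter records a subgroup label, a reference-standard high-acuity label, and the model's high-acuity flag. It computes per-subgroup sensitivity and lists subgroups falling more than a hypothetical tolerance (0.05) below overall sensitivity. The `Encounter` record and both function names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    subgroup: str         # hypothetical label, e.g. "pediatric chest pain"
    truly_urgent: bool    # reference-standard high-acuity outcome
    flagged_urgent: bool  # model triaged the encounter as high acuity


def sensitivity_by_subgroup(encounters: list[Encounter]) -> dict[str, float]:
    """Sensitivity = flagged urgent encounters / all truly urgent encounters, per subgroup."""
    positives: dict[str, int] = defaultdict(int)
    caught: dict[str, int] = defaultdict(int)
    for e in encounters:
        if e.truly_urgent:
            positives[e.subgroup] += 1
            caught[e.subgroup] += int(e.flagged_urgent)
    return {g: caught[g] / positives[g] for g in positives}


def under_triaged_subgroups(encounters: list[Encounter], tolerance: float = 0.05) -> list[str]:
    """Subgroups whose sensitivity is more than `tolerance` below overall sensitivity."""
    urgent = [e for e in encounters if e.truly_urgent]
    if not urgent:
        return []
    overall = sum(int(e.flagged_urgent) for e in urgent) / len(urgent)
    per_group = sensitivity_by_subgroup(encounters)
    return sorted(g for g, s in per_group.items() if s < overall - tolerance)


if __name__ == "__main__":
    # Hypothetical toy records, not trial data.
    data = (
        [Encounter("adult chest pain", True, True)] * 9
        + [Encounter("adult chest pain", True, False)]
        + [Encounter("pediatric chest pain", True, True)] * 6
        + [Encounter("pediatric chest pain", True, False)] * 4
    )
    print(sensitivity_by_subgroup(data))
    print(under_triaged_subgroups(data))
```

Small strata can show large sensitivity gaps by chance. A fuller audit would therefore report a confidence interval for each subgroup before concluding that parity has been lost or restored.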
