In December 2020, a letter to the editor in the New England Journal of Medicine by Sjoding and colleagues described a finding that should have been obvious — but apparently was not. Standard pulse oximeters, the ubiquitous clip-on devices that read blood oxygen saturation from a fingertip, systematically overestimate oxygen levels in patients with darker skin pigmentation. The clinical consequence was that Black patients with dangerous levels of hypoxemia were receiving normal pulse oximetry readings, delaying treatment. The mechanism was not new; it had been documented in literature dating to the 1990s. What Sjoding et al. demonstrated, using 10,789 paired measurements from ICU patients, was that the problem was substantial and clinically consequential at population scale — and that the downstream effects on AI clinical decision-support systems were likely to compound the harm.
What the Study Found
Sjoding et al. analyzed paired arterial blood gas (ABG) measurements and pulse oximetry readings from 10,789 patients at two academic medical centers. An arterial blood gas provides a direct measurement of oxygen saturation (SaO2); pulse oximetry estimates it indirectly via light transmission (SpO2).
The study defined “occult hypoxemia” as a true SaO2 below 88% (measured by ABG) in a patient with a seemingly reassuring SpO2 between 92% and 96% on pulse oximetry. The key findings:
- Black patients had nearly three times the frequency of occult hypoxemia compared to white patients (11.7% vs. 3.6%)
- Asian patients also had elevated occult hypoxemia rates (6.7%)
- Hispanic patients showed intermediate rates (5.5%)
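The study's definition translates directly into a computation over paired readings. A minimal sketch, using illustrative numbers rather than the study's dataset:

```python
# Sketch: occult hypoxemia rate per the study's definition.
# Data below is illustrative, not from Sjoding et al.

def occult_hypoxemia_rate(pairs):
    """pairs: list of (sao2, spo2) tuples.

    Occult hypoxemia: true SaO2 < 88% while the pulse oximeter
    reads a reassuring 92-96%. The rate is computed among readings
    in the 92-96% SpO2 window, matching the study's denominator.
    """
    in_window = [(sao2, spo2) for sao2, spo2 in pairs if 92 <= spo2 <= 96]
    if not in_window:
        return 0.0
    occult = sum(1 for sao2, _ in in_window if sao2 < 88)
    return occult / len(in_window)

# Illustrative example: 3 of 5 in-window readings hide true hypoxemia;
# the last pair (SpO2 99) falls outside the 92-96 window.
pairs = [(86, 94), (95, 95), (87, 93), (85, 92), (93, 96), (90, 99)]
print(occult_hypoxemia_rate(pairs))  # 0.6
```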
The mechanism is straightforward. Pulse oximeters shine red (660 nm) and near-infrared (940 nm) light through tissue and measure the differential absorption by oxygenated and deoxygenated hemoglobin. Melanin, the pigment that determines skin tone, absorbs strongly at the red wavelength but far less in the near-infrared, skewing the measured ratio so that oxygenated hemoglobin appears more abundant than it is. This is a hardware and calibration problem: pulse oximeters were historically validated primarily on lighter-skinned populations, and the calibration tables embedded in device firmware reflect this bias.
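The computation behind this can be sketched with the "ratio of ratios" that devices derive from the two wavelengths. The linear calibration SpO2 ≈ 110 − 25R below is a common textbook approximation, not any vendor's firmware; the signal values and the 5% melanin shift are purely illustrative:

```python
# Sketch of the "ratio of ratios" behind pulse oximetry.
# SpO2 ≈ 110 - 25R is a textbook linearization; real devices use
# lookup tables fitted in volunteer desaturation studies.
# All signal values and the melanin shift are illustrative.

def ratio_of_ratios(ac_red, dc_red, ac_ir, dc_ir):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir), from the pulsatile
    (AC) and baseline (DC) components at each wavelength."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def spo2_estimate(r):
    """Textbook linear calibration (illustrative coefficients)."""
    return 110.0 - 25.0 * r

r_true = ratio_of_ratios(0.5, 1.0, 0.625, 1.0)  # R = 0.8
r_shifted = r_true * 0.95  # extra red-light absorption lowers measured R

print(spo2_estimate(r_true))     # 90.0
print(spo2_estimate(r_shifted))  # higher, falsely reassuring estimate
```

The direction of the error is the point: anything that suppresses the red-wavelength signal lowers R, and a lower R maps to a higher, falsely reassuring SpO2.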
The Cascade Into AI Clinical Decision-Support
The implications for AI systems trained on clinical data from hospital EHRs are direct and serious. Multiple ICU clinical decision-support tools — including sepsis prediction models, ventilator weaning algorithms, and deterioration scoring systems — use SpO2 as an input feature. If SpO2 values are systematically higher for Black patients with equivalent true oxygen levels, then:
- Sepsis risk scores that incorporate SpO2 will underestimate risk in Black patients, potentially delaying sepsis alerts
- Ventilator weaning protocols that use SpO2 thresholds will be less likely to flag failure in Black patients with actual borderline oxygenation
- Any outcome prediction model trained on historical data — where Black patients were systematically undertreated due to biased SpO2 readings — will encode this bias into its predicted outcome distributions
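The first of these failure modes can be made concrete with a toy simulation. The alert threshold, noise model, and 2-point device bias are assumptions for illustration, not measured values:

```python
# Toy simulation: an SpO2-based alert fires below 90%. If the device
# reads ~2 points high for one group at the same true SaO2, that
# group's alert sensitivity for true hypoxemia (SaO2 < 88%) drops.
# All thresholds and noise parameters are illustrative.
import random

random.seed(0)
ALERT_THRESHOLD = 90   # alert when SpO2 falls below this
HYPOXEMIA_CUTOFF = 88  # true hypoxemia: SaO2 below this

def alert_sensitivity(device_bias, n=10_000):
    """Fraction of truly hypoxemic patients whose SpO2 reading still
    triggers the alert, for a device reading `device_bias` points high."""
    caught = total = 0
    for _ in range(n):
        sao2 = random.uniform(80, 100)                  # true saturation
        spo2 = sao2 + device_bias + random.gauss(0, 1)  # noisy reading
        if sao2 < HYPOXEMIA_CUTOFF:
            total += 1
            caught += spo2 < ALERT_THRESHOLD
    return caught / total

print(alert_sensitivity(device_bias=0))  # unbiased device
print(alert_sensitivity(device_bias=2))  # biased device: lower sensitivity
```

The model never sees race as a feature; the disparity enters entirely through the measurement channel, which is why it survives feature audits that only inspect model inputs.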
A subsequent analysis by Wong et al. in JAMA Internal Medicine (2021) externally validated a widely implemented proprietary sepsis prediction model trained on EHR data and found substantially lower sensitivity than reported, including for Black patients — a finding consistent with the SpO2 bias hypothesis.
What Hospitals Are Doing — and What the Evidence Says
The clinical response has been gradual. Several academic medical centers updated their clinical protocols to recommend confirmation by ABG when clinical concern for hypoxemia exists in patients with darker skin tones, despite reassuring pulse oximetry. The FDA issued a safety communication in 2021 noting the limitations of pulse oximeters in patients with darker skin pigmentation.
Device manufacturers have begun developing next-generation pulse oximeters that use additional wavelengths to reduce melanin interference, but as of 2025, no widely deployed clinical-grade device has fully resolved the disparity. Point-of-care validation studies of updated devices showed reduced, but not eliminated, bias in darker-skinned populations.
Systemic Implications for Clinical AI Development
The pulse oximeter example illustrates a general principle that applies across clinical AI: when training data is generated by flawed measurement instruments or in care systems with existing disparities, those flaws are incorporated into the model. Post-hoc fairness corrections — mathematical adjustments to model outputs — can partially address performance disparities but cannot fully compensate for biased training data.
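Why post-hoc correction falls short can be seen in a toy example: subtracting a group-level mean bias repairs the average but not individual readings when the bias varies from patient to patient. All numbers below are illustrative assumptions:

```python
# Sketch: a post-hoc correction (subtract the group's mean bias) fixes
# the average error but not individual readings when the bias varies
# person to person (e.g., with pigmentation). Numbers are illustrative.
import random

random.seed(1)

# Each patient's device bias is drawn once; the correction only
# knows the group mean, not any individual's true bias.
biases = [random.uniform(0, 4) for _ in range(1000)]
group_mean = sum(biases) / len(biases)

residuals = [b - group_mean for b in biases]  # error after correction
mean_residual = sum(residuals) / len(residuals)
rmse = (sum(r * r for r in residuals) / len(residuals)) ** 0.5

print(round(mean_residual, 6))  # average error is removed
print(round(rmse, 2))           # individual error remains (the bias spread)
```

The same logic applies to model outputs: a group-level recalibration can equalize average scores while leaving individual patients' risk estimates as wrong as before.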
The structural solution requires diversifying device validation populations at the FDA regulatory stage, mandating demographic disaggregation of clinical AI validation studies, and developing standards for what constitutes adequate representation in training datasets for clinical AI tools.
Key Takeaway
The pulse oximetry bias documented by Sjoding et al. is not only a hardware problem — it is a clinical AI problem. Any AI system trained on EHR data that uses SpO2 values as a feature may have encoded racial bias through a mechanism that has nothing to do with the AI itself. Resolving this requires hardware correction, regulatory mandates for diverse device validation, and demographic disaggregation as a standard reporting requirement for clinical AI validation studies.
Sources
1. Sjoding MW, Dickson RP, Iwashyna TJ, et al. Racial Bias in Pulse Oximetry Measurement. N Engl J Med. 2020;383(25):2477–2478. doi:10.1056/NEJMc2029240
2. Wong A, Otles E, Donnelly JP, et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Intern Med. 2021;181(8):1065–1070.
3. FDA. Pulse Oximeter Accuracy and Limitations: FDA Safety Communication. February 19, 2021.
Medical Disclaimer: This article is for informational purposes only and does not constitute medical advice. Always consult a qualified healthcare professional for medical decisions.