Frontier datasets in artificial intelligence for medicine have a geography problem, and the field is starting to admit it openly. The models attracting the most attention in noninvasive blood-biomarker estimation, from hemoglobin to bilirubin to glucose-adjacent signals, are increasingly being trained and stress-tested on cohorts that look almost nothing like the populations used to validate the previous generation of optical medical devices. The center of gravity for this work is shifting toward East Africa, and Uganda in particular has become a reference point for what a more representative training set actually looks like. The reason is not sentiment. It is statistics, and it is the hard lesson the field already learned from pulse oximetry.
"Among patients who had an arterial oxygen saturation of 88 to 92 percent on the basis of pulse oximetry, Black patients had nearly three times the frequency of occult hypoxemia that was not detected by pulse oximetry as White patients.", Dr. Michael Sjoding, University of Michigan, New England Journal of Medicine (2020)
What prepublication data from Uganda AI blood-biomarker trials actually offers
The phrase "Uganda AI blood biomarker trials prepublication" has started circulating among academic labs and pharmaceutical R&D groups for a specific reason: the cohorts capture variation that United States and Western European datasets structurally cannot. A model that estimates a blood biomarker from a camera signal or a smartphone image is learning a mapping from optical features (skin reflectance, conjunctival color, nailbed perfusion) to a physiological quantity. If the training distribution is narrow on skin tone, age, nutritional status, and comorbidity, the learned mapping will be narrow too. Uganda cohorts widen all four axes at once, which is exactly what makes prepublication access valuable to groups trying to build something that holds up outside one hospital system.
Consider what a typical Ugandan recruitment site contributes that a typical North American academic medical center does not:
- Skin tones concentrated at the high end of the Fitzpatrick scale (types V and VI), where optical devices have historically performed worst.
- A high prevalence of nutritional anemia and iron deficiency, which shifts the biomarker distribution rather than clustering it near the reference range.
- Endemic comorbidities (malaria, sickle cell trait and disease, HIV) that alter blood optics and perfusion in ways absent from most Western training sets.
- A young median population age alongside maternal and pediatric cohorts, broadening the demographic envelope.
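One way to make the four axes concrete is a composition audit that reports how a cohort is distributed along each of them. The sketch below is illustrative only: the records, field names (`fitzpatrick`, `age`, `hb_g_dl`, `comorbidities`), and band cutoffs are assumptions made for the example, not the schema or thresholds of any actual trial.

```python
from collections import Counter

# Hypothetical participant records; every field name and value is illustrative.
RECORDS = [
    {"fitzpatrick": 6, "age": 4, "hb_g_dl": 6.8, "comorbidities": ["malaria"]},
    {"fitzpatrick": 5, "age": 27, "hb_g_dl": 10.2, "comorbidities": []},
    {"fitzpatrick": 6, "age": 31, "hb_g_dl": 12.9, "comorbidities": ["hiv"]},
    {"fitzpatrick": 5, "age": 9, "hb_g_dl": 8.4, "comorbidities": ["sickle_cell", "malaria"]},
    {"fitzpatrick": 6, "age": 45, "hb_g_dl": 13.6, "comorbidities": []},
]


def hb_band(hb_g_dl):
    # Placeholder cutoffs for illustration; real anemia thresholds
    # vary by age, sex, and pregnancy status.
    if hb_g_dl < 8.0:
        return "<8"
    if hb_g_dl < 11.0:
        return "8-11"
    return ">=11"


def age_band(age_years):
    return "child" if age_years < 18 else "adult"


def shares(labels):
    """Map each label to its fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def composition(records):
    """Summarize a cohort along skin tone, age, hemoglobin, and comorbidity."""
    n = len(records)
    comorbidity_counts = Counter(c for r in records for c in r["comorbidities"])
    return {
        "fitzpatrick": shares(r["fitzpatrick"] for r in records),
        "age": shares(age_band(r["age"]) for r in records),
        "hemoglobin": shares(hb_band(r["hb_g_dl"]) for r in records),
        # Comorbidities can co-occur, so these shares need not sum to 1.
        "comorbidity": {name: count / n for name, count in comorbidity_counts.items()},
    }


summary = composition(RECORDS)
for axis, distribution in summary.items():
    print(axis, distribution)
```

Running the same audit on two candidate training sets gives a quick, side-by-side view of which axes one of them leaves thin or empty.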
The pulse oximetry cautionary tale, in numbers
The reason this matters is not theoretical. It already happened, at scale, with one of the most trusted instruments in medicine. The table below contrasts the conditions that produced biased optical readings with the conditions a more representative trial design tries to correct.
| Factor | Legacy optical device development | Uganda-anchored AI biomarker trials |
|---|---|---|
| Skin tone range | Calibrated mostly on light-to-medium skin | Deep skin tones represented as a primary group, not an afterthought |
| Anemia / nutritional status | Mostly normal-range reference subjects | Wide range including moderate and severe anemia |
| Comorbidity profile | Largely excluded or underrepresented | Malaria, sickle cell, HIV present in cohort |
| Validation geography | High-income settings | Sub-Saharan field and clinical settings |
| Documented failure mode | Occult hypoxemia missed ~3x more in Black patients | Bias surfaced during training, not after deployment |
According to Dr. Michael Sjoding and colleagues at the University of Michigan (2020), pulse oximeters overestimated true arterial oxygen saturation in Black patients far more often than in White patients, producing dangerous "occult hypoxemia": low arterial oxygen concealed behind normal-looking readings. The finding was not an isolated artifact. According to Dr. Ashraf Fawzy and colleagues at Johns Hopkins University (2022), the same measurement bias was associated with delayed or missed recognition of eligibility for COVID-19 therapy among Black and Hispanic patients. Earlier signals existed too: concerns about pulse oximeter accuracy in darkly pigmented skin date back to laboratory work in the 1990s, but the devices reached global ubiquity before the bias was treated as a design defect rather than a footnote.
The lesson the AI biomarker field has taken from this is blunt. A model validated only where bias is invisible will export that bias everywhere it is deployed.
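The "~3x" comparison in the table is a ratio of subgroup rates, and the arithmetic is easy to reproduce on paired readings. The sketch below assumes occult hypoxemia means an arterial saturation (SaO2) below 88 percent despite a pulse oximetry reading (SpO2) of 92 to 96 percent; those thresholds are parameters here and should be checked against the source study before reuse. The paired readings are invented for illustration and are not study data.

```python
def occult_hypoxemia_rate(pairs, spo2_range=(92, 96), sao2_cutoff=88):
    """Share of in-window SpO2 readings whose paired SaO2 falls below the cutoff.

    `pairs` is a list of (spo2, sao2) tuples in percent. Returns None when no
    reading falls inside the SpO2 window.
    """
    low, high = spo2_range
    in_window = [(spo2, sao2) for spo2, sao2 in pairs if low <= spo2 <= high]
    if not in_window:
        return None
    missed = sum(1 for _, sao2 in in_window if sao2 < sao2_cutoff)
    return missed / len(in_window)


# Invented paired readings (SpO2, SaO2); not taken from any study.
group_a = [(94, 86), (93, 87), (95, 85), (94, 93), (92, 91), (96, 95),
           (93, 92), (95, 94), (94, 92), (96, 94), (98, 97)]
group_b = [(94, 87), (93, 92), (95, 94), (94, 93), (92, 91), (96, 95),
           (93, 93), (95, 93), (94, 92), (96, 96), (90, 86)]

rate_a = occult_hypoxemia_rate(group_a)
rate_b = occult_hypoxemia_rate(group_b)
ratio = rate_a / rate_b

print(f"group A rate: {rate_a:.2f}")
print(f"group B rate: {rate_b:.2f}")
print(f"rate ratio:   {ratio:.1f}")
```

The last pair in each list falls outside the SpO2 window and is excluded, which is the point of the definition: the failure being counted is one hidden behind a reassuring reading.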
Clinical applications driving the work
Noninvasive anemia and hemoglobin estimation
Anemia screening is the clearest near-term application, and it is where Ugandan research infrastructure is most mature. The Makerere University AI Health Lab, launched in May 2024, has anchored work on smartphone-based hemoglobin estimation from images of the conjunctiva and nailbeds. Clinical validation work on noninvasive, AI-augmented anemia screening has reported the ability to estimate hemoglobin from patient-sourced photos, with continued validation through 2024 and beyond. For a tool intended to function in resource-limited settings, the population it is built on and the population it serves finally coincide, which is the opposite of the legacy device pathway.
Maternal and pediatric monitoring
Anemia in pregnancy and severe childhood anemia (often malaria-driven) are leading causes of morbidity in the region. Models trained on these cohorts inherit a biomarker distribution that includes the clinically dangerous tail, not just the reference range. A model that has only ever seen near-normal values learns to predict near-normal values. A model that has seen severe anemia at meaningful frequency can resolve it.
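The pull toward near-normal predictions can be shown with a toy simulation. The sketch below is a simplification under stated assumptions: a single noisy "optical feature" stands in for a real image pipeline, the model is an ordinary least-squares line, and all hemoglobin ranges and noise levels are invented. It is not a representation of any published model.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(rng, hb_low, hb_high, n, noise_sd=1.0):
    """Draw true hemoglobin uniformly and a noisy feature that tracks it."""
    hb = [rng.uniform(hb_low, hb_high) for _ in range(n)]
    feature = [h + rng.gauss(0.0, noise_sd) for h in hb]
    return feature, hb


def mae(model, xs, ys):
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Narrow cohort: near-normal hemoglobin only. Wide cohort: includes severe anemia.
narrow_x, narrow_y = simulate(rng, 12.0, 15.0, 2000)
wide_x, wide_y = simulate(rng, 5.0, 15.0, 2000)

# Evaluation set drawn only from the severe tail.
tail_x, tail_y = simulate(rng, 5.0, 7.0, 500)

narrow_model = fit_line(narrow_x, narrow_y)
wide_model = fit_line(wide_x, wide_y)

narrow_mae = mae(narrow_model, tail_x, tail_y)
wide_mae = mae(wide_model, tail_x, tail_y)

print(f"slope, narrow-trained: {narrow_model[0]:.2f}")
print(f"slope, wide-trained:   {wide_model[0]:.2f}")
print(f"MAE on severe tail, narrow-trained: {narrow_mae:.2f} g/dL")
print(f"MAE on severe tail, wide-trained:   {wide_mae:.2f} g/dL")
```

With the same feature noise, the narrow cohort yields a flatter fitted slope, so severe cases are predicted closer to the training mean. The wide cohort keeps the slope nearer to one and the tail error smaller.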
Pharmaceutical and decentralized trial endpoints
For drug developers, noninvasive biomarker capture reduces the friction of repeated venous sampling and widens the candidate pool for decentralized trials. A model that generalizes across skin tone and comorbidity is a prerequisite for any sponsor that wants endpoints to behave consistently across multinational sites.
Current research and evidence
The structural case for diverse cohorts is now well documented in the methodology literature. A systematic review of deployed medical AI models found extreme geographic concentration, with roughly 95 percent of patient cohorts drawn from high-income or upper-middle-income countries, and racial representation dominated by White and Asian patients. That concentration is the modern equivalent of the pulse oximetry calibration gap.
The STANDING Together initiative, led by researchers including Dr. Xiaoxuan Liu and Professor Alastair Denniston at the University of Birmingham, published consensus recommendations in 2023 calling for explicit documentation of dataset composition and deliberate inclusion of underrepresented groups in health AI. The core argument is that diversity cannot be retrofitted; it has to be a design input. Work on generalization in clinical AI reinforces the point: models frequently post strong in-distribution accuracy and then lose that advantage in out-of-distribution deployment, which is precisely the scenario a US-only training set creates when a tool ships to Kampala, Lagos, or Mumbai.
Two practical conclusions follow from this body of evidence. First, external validation alone is a weak guarantee; several authors now argue for recurring local validation because healthcare data shifts over time and place. Second, the cheapest place to catch optical bias is during training, by including the populations where bias appears. Uganda cohorts make that possible within a single, coherent dataset rather than through post-hoc correction.
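Recurring local validation can be as simple as recomputing error per site and period and flagging batches that drift past a tolerance. The sketch below is one hedged way to do that; the site names, periods, baseline error, and 25 percent tolerance are all hypothetical choices for illustration, not recommendations from the cited literature.

```python
from statistics import mean


def flag_drift(batches, baseline_mae, tolerance=0.25):
    """Return (site, period, mae) for batches whose MAE exceeds the baseline
    by more than `tolerance`, expressed as a fraction of the baseline."""
    threshold = baseline_mae * (1 + tolerance)
    flagged = []
    for batch in batches:
        batch_mae = mean(abs(pred - ref) for pred, ref in batch["pairs"])
        if batch_mae > threshold:
            flagged.append((batch["site"], batch["period"], round(batch_mae, 2)))
    return flagged


# Hypothetical validation batches: (predicted, reference) hemoglobin in g/dL.
batches = [
    {"site": "site_a", "period": "2024-Q1", "pairs": [(10.5, 10.0), (12.0, 12.5)]},
    {"site": "site_a", "period": "2024-Q2", "pairs": [(9.0, 10.0), (13.0, 12.0)]},
    {"site": "site_b", "period": "2024-Q1", "pairs": [(11.0, 11.5), (8.0, 8.5)]},
]

flagged = flag_drift(batches, baseline_mae=0.6)
print(flagged)
```

A flagged batch is a prompt to investigate, not proof of model failure; a change in camera hardware, lighting, or case mix can all move the error.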
The future of Uganda-anchored AI biomarker research
The trajectory points toward cohorts being treated as a competitive scientific asset rather than a compliance checkbox. Prepublication access is part of that shift: instead of waiting for a single capstone paper, partner labs increasingly want to interrogate the underlying distribution, test their own architectures against it, and report subgroup performance transparently. Expect three developments over the next few years. Multi-country African consortia will pool harmonized optical and biomarker data, raising statistical power on rare comorbidity combinations. Reporting standards will increasingly require skin-tone-stratified performance as a default table, not a supplementary figure. And regulators and funders will weight geographic and demographic breadth of validation when assessing whether a noninvasive biomarker claim is credible.
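A skin-tone-stratified performance table is cheap to produce once predictions and reference values are stored alongside a skin-tone label. The sketch below is a minimal version with invented rows; the column names (`fitzpatrick`, `predicted_hb`, `reference_hb`) and the grouping into three Fitzpatrick bands are assumptions for the example, not a reporting standard.

```python
from collections import defaultdict
from statistics import mean


def stratified_report(rows, group_key="fitzpatrick"):
    """Per-group sample size, mean signed error (bias), and mean absolute error."""
    errors_by_group = defaultdict(list)
    for row in rows:
        errors_by_group[row[group_key]].append(row["predicted_hb"] - row["reference_hb"])
    return {
        group: {
            "n": len(errors),
            "bias": mean(errors),
            "mae": mean(abs(e) for e in errors),
        }
        for group, errors in sorted(errors_by_group.items())
    }


# Invented evaluation rows; hemoglobin in g/dL.
rows = [
    {"fitzpatrick": "I-II", "predicted_hb": 12.5, "reference_hb": 12.0},
    {"fitzpatrick": "I-II", "predicted_hb": 13.0, "reference_hb": 13.5},
    {"fitzpatrick": "III-IV", "predicted_hb": 11.0, "reference_hb": 10.5},
    {"fitzpatrick": "III-IV", "predicted_hb": 9.5, "reference_hb": 9.0},
    {"fitzpatrick": "V-VI", "predicted_hb": 10.0, "reference_hb": 8.0},
    {"fitzpatrick": "V-VI", "predicted_hb": 9.0, "reference_hb": 8.0},
    {"fitzpatrick": "V-VI", "predicted_hb": 12.0, "reference_hb": 11.5},
]

report = stratified_report(rows)

print("| Fitzpatrick group | n | Bias (g/dL) | MAE (g/dL) |")
print("|---|---|---|---|")
for group, stats in report.items():
    print(f"| {group} | {stats['n']} | {stats['bias']:+.2f} | {stats['mae']:.2f} |")
```

Reporting signed bias next to absolute error matters here: a pooled MAE can look acceptable while one subgroup is systematically overestimated, which is the pattern pulse oximetry exhibited.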
Prepublication data from the ongoing Uganda trials is available to qualified researchers and partners. Academic labs, pharmaceutical R&D groups, and global-health funders can request access by contacting [email protected].
Frequently asked questions
Why do Uganda cohorts make AI blood-biomarker models more generalizable?
They widen the training distribution across four axes simultaneously: skin tone, age, nutritional status, and comorbidity. Optical biomarker models learn a mapping from light signals to physiology, so a narrow training population produces a narrow, brittle model. Uganda cohorts include deep skin tones, a high prevalence of anemia, and endemic comorbidities, which forces the model to learn a mapping that holds across conditions a US-only dataset rarely contains.
How does pulse oximetry bias relate to AI biomarker research?
Pulse oximetry is the cautionary precedent. According to Dr. Michael Sjoding and colleagues at the University of Michigan (2020), oximeters missed dangerously low oxygen levels far more often in Black patients than in White patients, a gap widely attributed to devices calibrated mostly on lighter skin. AI biomarker developers treat this as proof that optical bias must be addressed in training data, not patched after deployment.
What is prepublication data and who can access it?
Prepublication data is dataset access granted to research partners before the primary results are formally published, allowing labs to test their own models and report subgroup performance early. Access to the ongoing Uganda trial data is available to academic labs, pharmaceutical R&D teams, and global-health funders through [email protected].
Does a diverse training set guarantee a fair model?
No. Diversity is necessary but not sufficient. The methodology literature, including the STANDING Together recommendations (2023), stresses transparent documentation of dataset composition, skin-tone-stratified performance reporting, and recurring local validation. Diverse data lowers the risk of hidden bias; it does not eliminate the need to measure and report subgroup performance.