Why Clinical AI Still Needs Human Review: Lessons from Doximity PeerCheck

Fluency is not safety. An AI answer can sound confident, read smoothly, cite plausible-looking references, and still be subtly wrong — incomplete, overconfident, outdated, or misaligned with current guidance. Doximity's PeerCheck initiative is a visible acknowledgement of this reality from one of the largest clinical AI platforms in the world.

Doximity's own support material warns that AI outputs can contain inaccuracies and that clinicians should verify them. This is not a weakness in the product; it is an honest assessment of the current state of medical AI. A Stanford-Harvard study reported that AI recommendations could cause clinical harm in up to 22% of the real patient cases it examined. The question is not whether clinical AI makes errors, but what trust architecture surrounds those errors.

What PeerCheck Adds

PeerCheck recruits domain experts to review AI-generated clinical answers for accuracy, evidence strength, and potential bias. More than 10,000 physicians have participated. Reviews are attributed, so clinicians can see which physician reviewed a given answer. The initiative is co-chaired by Eric Topol and Regina Benjamin, which lends it significant credibility.

This adds a human verification layer that few, if any, other major clinical AI platforms offer at this scale. A physician reviewer can catch clinical nuances that automated systems miss: practice-pattern considerations, evidence-hierarchy issues, emerging research not yet reflected in training data, and the subtle difference between a textbook answer and a practically useful one.

Why Human Review Cannot Be the Only Safety Layer

Physician review is valuable but necessarily selective: not every answer can be reviewed by every relevant specialist. The reviewed content forms a verified core, but most real-time clinical queries will still be answered by the AI alone, without individual physician review.

This means clinical AI also needs:

- retrieval quality: drawing from authoritative sources, not general internet content;
- source alignment: matching the answer to the clinician's jurisdiction and practice context;
- version control: ensuring the evidence reflects current guidelines, not superseded recommendations;
- feedback mechanisms: allowing clinicians to report errors and drive quality improvement;
- monitoring: tracking output quality over time, not just at the point of review;
- fail-safe behaviour: narrowing the answer, signalling uncertainty, or declining to answer when the available evidence is insufficient.

The iatroX Model: Algorithmic Fidelity Plus Professional Judgement

iatroX is designed around the principle that clinical AI should support professional judgement, not replace it. Its trust architecture combines curated retrieval from UK clinical sources, source prioritisation, citation-aware synthesis, conflict detection, review logic, and abstention or escalation where the available evidence is insufficient, conflicting, or poorly matched to the question.

The fail-safe principle is important: in clinical AI, a fail-safe is not an admission of weakness. It is a safety feature. Where the system cannot retrieve sufficiently relevant, current, or internally consistent source material, the safer behaviour is to narrow the answer, show uncertainty, surface the source trail, or decline to provide a definitive clinical conclusion. Inventing certainty from insufficient evidence is the more dangerous alternative.
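The fail-safe principle above can be expressed as a simple decision rule over retrieved evidence. The Python below is a minimal, hypothetical sketch, not iatroX's implementation: the `Passage` record, the `choose_action` function, the relevance threshold of 0.7 and the three-year currency window are all illustrative assumptions. It declines when nothing sufficiently relevant is retrieved, narrows and flags uncertainty when relevant material is outdated or conflicting, and always returns the source trail it relied on.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Action(Enum):
    ANSWER = "answer with citations"
    NARROW = "narrow the answer and show uncertainty"
    DECLINE = "decline to give a definitive conclusion"


@dataclass
class Passage:
    # Hypothetical retrieval record; field names are assumptions for illustration.
    source: str
    relevance: float       # assumed retrieval score in [0, 1]
    last_reviewed: date    # when the source guidance was last reviewed
    recommendation: str    # normalised recommendation label, e.g. "start" or "avoid"


def choose_action(passages: list[Passage], today: date,
                  min_relevance: float = 0.7,
                  max_age_days: int = 3 * 365) -> tuple[Action, list[str]]:
    """Hypothetical fail-safe policy: pick a response mode and the source trail."""
    relevant = [p for p in passages if p.relevance >= min_relevance]
    if not relevant:
        # Nothing sufficiently relevant was retrieved: do not invent certainty.
        return Action.DECLINE, []
    trail = [p.source for p in relevant]  # always surface what was relied on
    current = [p for p in relevant
               if (today - p.last_reviewed).days <= max_age_days]
    recommendations = {p.recommendation for p in current}
    if len(current) < len(relevant) or len(recommendations) != 1:
        # Some relevant material is out of date, or current sources disagree.
        return Action.NARROW, trail
    return Action.ANSWER, trail


if __name__ == "__main__":
    passages = [
        Passage("Guideline A", 0.90, date(2024, 6, 1), "start"),
        Passage("Guideline B", 0.85, date(2024, 3, 1), "avoid"),
    ]
    action, trail = choose_action(passages, today=date(2025, 1, 1))
    print(action.value, trail)  # conflicting guidance, so the answer is narrowed
```

The design choice worth noting is that the cautious outcomes are the defaults: the sketch only returns `Action.ANSWER` when every relevant source is current and all of them agree.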

UK Governance Context

The MHRA recognises that software, including AI, may be regulated as a medical device in the UK depending on its intended use. NHS England's guidance on ambient scribing products emphasises implementation considerations including governance, safety, and user review of outputs. The regulatory expectation is clear: clinical AI tools operating in the UK should have safety infrastructure proportionate to their intended use.

iatroX is UKCA-marked and MHRA-registered, and its clinical AI standards are publicly described. Its intended use is professional clinical information support, with the clinician retaining responsibility for patient-specific decisions.

The Lesson

Doximity is right that clinical AI needs visible human oversight. PeerCheck makes physician review a product feature, not just a back-end process. The lesson for the UK market — and for every clinical AI tool globally — is that trust requires multiple layers: human review where possible, source fidelity always, fail-safe behaviour when evidence is insufficient, and feedback loops that turn real-world use into continuous improvement.

Ask iatroX is designed for clinical professionals who want fast answers without losing the ability to inspect the underlying sources.