NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Newman-Toker DE, Peterson SM, Badihian S, et al. Diagnostic Errors in the Emergency Department: A Systematic Review [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2022 Dec. (Comparative Effectiveness Review, No. 258.)

Results

Description of Included Evidence

We retrieved 19,127 unique citations from our searches (Appendix B). After screening all abstracts and 1,455 full-text studies, we included 279 studies. Appendix C provides a list of the articles excluded during the full-text screening. Details of the included articles, including our assessment of their risk of bias, are provided in the Evidence Tables in Appendix D. Our strength of evidence tables, which present details regarding our confidence in the estimates of the rate of diagnostic errors, can be found in Appendix D, Tables D-6 through D-8. The results of our grey literature search are presented in Appendix Table B-1.

Of the included studies, 41 listed the distribution of diseases associated with diagnostic error or misdiagnosis-related harms, 160 reported the rate of diagnostic error, and 185 examined the causes of diagnostic error. Five studies were randomized controlled trials54-58 and the remainder were observational studies. Studies were conducted in the United States (n=137), Western Europe (n=85), Canada (n=28), the United Kingdom (n=14), Australia (n=12), or in multiple countries (n=3). Sixty-four studies were published before 2011 and 215 were published in 2011 or later. The majority of studies identified were disease-specific. Thus, relatively few studies were available to answer Key Question (KQ) 1, and answers to KQ2 and KQ3 included aggregation of literature on a per-disease basis.

Key Question 1. Distribution of Diseases

Key Points

  • The top 20 diseases associated with diagnostic errors in the emergency department (ED) (independent of harm severity), in approximate rank order, were fracture, stroke, myocardial infarction, appendicitis, venous thromboembolism, spinal cord compression and injury, aortic aneurysm and dissection, meningitis and encephalitis, sepsis, traumatic brain injury and traumatic intracranial hemorrhage, arterial thromboembolism, lung cancer, ectopic pregnancy and ovarian torsion, pneumonia, testicular torsion, gastrointestinal perforation and rupture, spinal and intracranial abscess, open and non-healing wounds, cardiac arrhythmia, and intestinal obstruction (with or without hernia). It is likely that this list of misdiagnosed diseases is strongly skewed by reporting bias towards diseases that, when missed, lead to more serious harms and false negatives (as opposed to false positives).
  • Missed fractures are the ED diagnostic errors most commonly reported in malpractice claims and incident reports, but the level of harm associated with most missed fractures is lower than that for missed major medical and neurologic events. Detection bias for radiographic misdiagnosis (which is more easily confirmed than other types of diagnostic error) likely contributes to their high frequency in claims and incident reports, but, even if overrepresented, they are still likely quite common given the high incidence of fractures.
  • The top 15 diseases associated with the greatest number of serious misdiagnosis-related harms in the ED, in rank order, were (1) stroke, (2) myocardial infarction, (3) aortic aneurysm and dissection, (4) spinal cord compression and injury, (5) venous thromboembolism, (6/7 – tie) meningitis and encephalitis, (6/7 – tie) sepsis, (8) lung cancer, (9) traumatic brain injury and traumatic intracranial hemorrhage, (10) arterial thromboembolism, (11) spinal and intracranial abscess, (12) cardiac arrhythmia, (13) pneumonia, (14) gastrointestinal perforation and rupture, and (15) intestinal obstruction. The top 3 conditions account for an estimated 28 percent of all serious harms from ED diagnostic error, while the top 8 account for 52 percent and the top 15 for 68 percent.
  • The so-called “Big Three” disease categories (vascular events, infections, and cancers) account for an estimated 72 percent of all ED diagnostic errors resulting in serious misdiagnosis-related harms. However, major vascular events (42%) and infections (23%) substantially outnumber cancers (8%) in the ED clinical setting. Misdiagnosed trauma (11%), particularly craniospinal trauma linked to neurological injury, is also common.
  • The top 5 organ systems with diseases linked to serious diagnostic error are neurologic (including stroke) (34%), cardiovascular (23%), pulmonary (8%), gastrointestinal (7%), and hematologic (including venous thromboembolism) (7%). Taken together, these account for an estimated 79 percent of all serious misdiagnosis-related harms in the ED.
  • Among children in the ED, the distribution of misdiagnosed diseases is likely substantially different, with missed infections and fractures dominating over missed vascular events. Unusual conditions such as Kawasaki disease may be frequent among misdiagnosed diseases, although robust data from multiple sources are lacking.

Summary of Findings

There were 41 studies pertinent to Key Question (KQ) 1. Among these were 40 studies that reported on the most common diagnostic errors among patients presenting to the ED.16, 17, 31, 59-96 There were 18 “numerator only” studies (including five based on malpractice claims,17, 71, 84, 95, 96 two based on incident reports,16, 31 and two based on mixed data sources80, 90) and 23 “numerator and denominator” studies (20 cohort studies and 3 cross-sectional studies). Heterogeneity in disease categorization across studies hampered cross-study comparisons and meta-analysis.

When considering all diagnostic errors, including those that produce minimal patient harm, relatively little is known about the overall distribution of symptoms or diseases involved. This is because studies of diagnostic error distribution that address all diseases (i.e., not disease-, symptom-, or discipline-specific) generally rely on a triggering adverse event to identify cases (e.g., repeat visit or hospitalization, incident report, malpractice claim). Thus, more is known about disease distribution for diagnostic errors that result in adverse events, and little is known about the subset that result in minimal or minor harms. With that caveat noted, the top 20 individual diseases associated with diagnostic errors (independent of harm severity), in approximate rank order, were found to be fracture, stroke, myocardial infarction, appendicitis, venous thromboembolism, spinal cord compression and injury, aortic aneurysm and dissection, meningitis and encephalitis, sepsis, traumatic brain injury and traumatic intracranial hemorrhage, arterial thromboembolism, lung cancer, ectopic pregnancy and ovarian torsion, pneumonia, testicular torsion, gastrointestinal perforation and rupture, spinal and intracranial abscess, open and non-healing wounds, cardiac arrhythmia, and intestinal obstruction (with or without hernia). It is likely that this list of misdiagnosed diseases, derived from two large “numerator-only” studies (one malpractice-based and one incident report-based), is strongly skewed by reporting bias towards diseases that, when missed, lead to serious harms. It is also likely that the list is skewed towards false negatives (as opposed to false positives). Finally, it may also be partly skewed towards errors likely to be confirmed in hindsight by radiographic review, including both fractures and lung cancer, and skewed away from common conditions that may be frequently misdiagnosed (e.g., migraine97) but go unaccounted for.
The top 15 individual diseases associated with serious misdiagnosis-related harms, in rank order, were found to be (1) stroke, (2) myocardial infarction, (3) aortic aneurysm and dissection, (4) spinal cord compression and injury, (5) venous thromboembolism, (6/7 – tie) meningitis and encephalitis, (6/7 – tie) sepsis, (8) lung cancer, (9) traumatic brain injury and traumatic intracranial hemorrhage, (10) arterial thromboembolism, (11) spinal and intracranial abscess, (12) cardiac arrhythmia, (13) pneumonia, (14) gastrointestinal perforation and rupture, and (15) intestinal obstruction. The top 3 conditions account for an estimated 28 percent (95% confidence interval (CI) 26 to 30) of all serious harms from diagnostic error in the ED, while the top 8 account for 52 percent (95% CI 49 to 55) and the top 15 for 68 percent (95% CI 66 to 71). This list is much more likely to be representative, because malpractice claims and incident reports, although clearly biased towards more severe adverse events, are likely fairly representative of the cases with poor health outcomes (i.e., serious harms). Fractures (#1 any severity) and appendicitis (#4 any severity) do not make the serious-harm list, since most associated harms are low- or medium-severity, not high-severity. It is possible that lung cancer is still overrepresented because the “proof” of error found in chest radiographs may be a key factor that helps ascertain the presence of an error or, alternatively, increases the odds of legal action.

Data were sparse with respect to the most commonly misdiagnosed clinical presentations (symptoms, signs, or syndromes). One small study based on malpractice claims (n=62) found that abdominal pain, trauma, and neurological symptoms topped the list.95 The largest study of malpractice claims17 found that neurological diseases were the most common, which implies that neurological symptoms are probably highly prevalent among malpractice claims. The relative frequency of misdiagnosed symptoms obviously varies by disease (e.g., stroke does not present with abdominal pain, and mesenteric ischemia does not present with headache). Multiple studies from the review demonstrated a strong link between specific symptoms and specific misdiagnosed diseases. Among these were dizziness/vertigo (strongly associated with missed ischemic stroke); headache (associated with missed ischemic stroke, subarachnoid hemorrhage, raised intracranial pressure, cerebral venous sinus thrombosis, and meningitis)21, 64, 81; abdominal pain (associated with missed myocardial infarction, aortic pathology, cancer, appendicitis, intestinal disorders, and ovarian pathology, among others)60, 68, 70, 73, 76, 98; and back pain (associated with missed spinal abscess and other causes of spinal cord compression, as well as other myelopathic disorders).81, 99

Regarding prospectively defined subgroups, data were insufficient to determine whether disease distributions meaningfully differed between the United States and countries outside the United States. Unsurprisingly, the spectrum of diseases seen in specialty EDs differed dramatically from that seen in general EDs. For example, key missed conditions in “eye” or “eye and ear” EDs leading to patient harm included uveitis and retinal detachment. There were fewer studies of admitted patients and only a subset identified the distribution of missed diseases; differences in disease groupings and result reporting made direct comparisons challenging.

The absolute frequency of malpractice claims of different diseases varies over the age spectrum (Figure 2). Among pediatric populations, dangerous diseases are, overall, less common than in adults, and, accordingly, serious misdiagnosis-related harms are also less common.17 In particular, vascular events are less common than in adults, and missed fractures and infections (including missed appendicitis) predominate. Nevertheless, missed strokes and childhood cancers remain important causes in children, particularly older children. Missed testicular torsion may also be an important cause of misdiagnosis-related harms in the pediatric ED population. Child abuse is an important condition that is likely underrepresented in diagnostic error malpractice claims (see Representativeness of Malpractice Claims Data for Disease Distribution, below).

Figure 2 shows the age distribution of the big three conditions (vascular, infection, and cancer) and other conditions among malpractice claims associated with misdiagnosis-related harms. Malpractice claims for vascular conditions tend to peak for those aged 41 to 50 years. Malpractice claims for infections are highest for those aged 0 to 10 years and those aged 41 to 50 years. Malpractice claims for cancer misdiagnosis are highest for those aged 51 to 60 years. Malpractice claims for those with other conditions are highest for those aged 41 to 50 years.

Figure 2. Age distribution of “Big Three” and other (non-Big Three) diseases among diagnostic error malpractice claims (any severity).

Key Question 1a. What diseases or syndromes are associated with the greatest total number and the highest risk of diagnostic errors or misdiagnosis-related harms?

For disease-agnostic data on diagnostic errors, major data sources are malpractice claims (numerator-only), incident reports (numerator-only), and stimulated chart reviews based on systematically identified unexpected or adverse outcome events (numerator-denominator). Systematic follow-up of patients, conducted routinely or as part of a prospective study, generally provides the strongest data (because contemporaneous data can be gathered, independent of the clinical record) but is rarely available. Malpractice claims are routinely captured by risk insurers, labeled as diagnostic error-related, and then thoroughly analyzed, making them a rich source of information on the distribution of diseases (KQ1) and causes (KQ3) of diagnostic error. Incident reporting systems sometimes permit labeling of incidents as diagnostic error-related100; at the institutional level, their value is principally in identifying unexpected errors or latent risks, but regional or national incident reporting systems can also be used to identify the distribution of diseases (KQ1) and causes (KQ3) of diagnostic error.

There were nine studies16, 17, 31, 59, 71, 80, 90, 95, 96 that addressed KQ1a directly for all diagnostic errors, reporting on a total of 5,817 diagnostic errors. The two largest studies, one a large, United States-based review of a national malpractice claims database (Newman-Toker, 201917) and the other a large, United Kingdom-based review of a national incident reporting system (Hussain, 201916) together represented 78 percent of the total number of diagnostic error cases (n=4,561 of 5,817). These two studies organized their categories in similar enough fashion to present results together (Table 2). In particular, Newman-Toker et al. used the Agency for Healthcare Research and Quality’s standardized coding schema from the Clinical Classifications Software101 to aggregate diagnosis codes into clinically sensible and comparably granular categories (something not done by most diagnostic error studies). Data provided by the original authors permitted reaggregation of data using the standardized coding schema for more than just the “Big Three” categories emphasized in the original report (i.e., the “non-Big Three” category was categorized here using the original method to identify comparably granular categories across all diagnostic error groups). They also provided further detail broken down by severity of harm to patients, enabling comparison of diseases causing higher- versus lower-severity harm to patients when misdiagnosed. Table 3 reports the diseases in rank order that cause serious misdiagnosis-related harms, as determined from the largest, most detailed study of ED malpractice claims (Newman-Toker, 2019).17

Missed fractures appear to be the ED diagnostic errors most commonly reported in malpractice claims and incident reports,16, 31, 71, 80, 90 but the level of harm associated with most missed fractures is generally lower than that for missed major medical and neurologic events.17 As a result, though they top the list of diagnostic errors identified (Table 2), they are not among the top 15 clinical conditions associated with serious misdiagnosis-related harms to patients (Table 3).

Table 2. Frequency of diagnostic errors from the two largest studies of all emergency department diagnostic errors.

The list of top diagnostic errors independent of harm severity (i.e., any severity), as seen in malpractice claims and incident reports, is shown in Table 2 above. Fractures, stroke, myocardial infarction, appendicitis, and venous thromboembolism top the list. A reporting bias towards more severe outcomes almost certainly impacts this list (see “Representativeness of Malpractice Claims Data for Disease Distribution,” below). It is unknown whether ascertainment and reporting biases linked to radiographic misdiagnosis (which is more easily confirmed and contested than other types of diagnostic error) lead to fractures being further overrepresented in malpractice claims or incident reports, but their high annual incidence (2 million ED cases per year in the United States, as of 2020, according to the National Electronic Injury Surveillance System [NEISS]102) makes it likely that, even if overrepresented, they are still quite common. Additional information on possible overrepresentation of fractures is provided in the section below entitled, “Representativeness of Malpractice Claims Data for Disease Distribution.”

What remains unknown when considering “all” diagnostic errors is the frequency of other misdiagnoses that are likely to be as or more common than fractures, yet less transparent. There is some evidence that certain non-radiographic errors causing lower-severity harms are probably grossly underrepresented in both absolute terms and relative to fractures. For example, there are now nearly 5 million ED visits for dizziness and vertigo in the United States annually. Based on known disease distributions, it is probable that more than 1 million of these patients have benign inner ear causes (mostly benign paroxysmal positional vertigo [BPPV] and vestibular neuritis). Diagnostic error rates for these benign conditions have been estimated to be over 80 percent.103, 104 [A recently completed randomized clinical trial (AVERT NCT02483429), too new to be included in our search, found an 87 percent error/non-diagnosis rate for these disorders for the ED clinical team (with just 9 of 68 cases correctly diagnosed using all available data including neuroimaging and clinical consultation results) versus 26 percent error/non-diagnosis rate for a specialist (with 50 of 68 correctly diagnosed using only a brief history and eye movement recordings from the ED visit).]105 Thus, the total number of misdiagnosed benign inner ear cases is likely to exceed 800,000 per year, with more than 500,000 clearly preventable. By contrast, fractures are probably missed less than 5 percent of the time (see KQ2), so, given an estimated 2 million ED fractures per year in the United States from NEISS, the absolute number of missed fractures is likely to be fewer than 100,000 per year (and probably only about 20,000 per year, given the most likely estimate of the false negative rate for fractures is about 1% [see KQ2]). 
Although missed inner ear cases in the ED outnumber missed fractures by over an order of magnitude, inner ear diseases such as BPPV do not appear in any malpractice claim- or incident report-based “top 10” lists (while fractures routinely occupy the top-ranked spot in such lists, as they do in Table 2). There is no way to know how many other similar diagnostic error problems exist in the ED yet are not currently being tracked or reported, but it is probable that other commonly misdiagnosed diseases are missing (e.g., migraine97).
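The comparison between missed inner ear disease and missed fractures is simple incidence-times-error-rate arithmetic. The following is a minimal sketch that reproduces it using only the round figures quoted in the text; the function name `missed_cases` is illustrative, and the inputs are approximate bounds rather than measured values.

```python
def missed_cases(annual_cases: float, error_rate: float) -> float:
    """Estimated missed diagnoses per year = annual cases x error rate."""
    return annual_cases * error_rate


# Benign inner ear disease: "more than 1 million" ED cases per year,
# with an estimated error rate of "over 80 percent" (both are lower bounds).
inner_ear_missed = missed_cases(1_000_000, 0.80)

# Fractures: about 2 million ED cases per year (NEISS); the miss rate is
# below 5 percent, with about 1 percent given as the most likely value.
fracture_missed_upper = missed_cases(2_000_000, 0.05)
fracture_missed_likely = missed_cases(2_000_000, 0.01)

ratio = inner_ear_missed / fracture_missed_likely

print(f"Inner ear, missed per year (lower bound): {inner_ear_missed:,.0f}")
print(f"Fractures, missed per year (upper bound): {fracture_missed_upper:,.0f}")
print(f"Fractures, missed per year (most likely): {fracture_missed_likely:,.0f}")
print(f"Inner ear versus most likely fracture estimate: {ratio:.0f}x")
```

Under these assumptions, the inner ear estimate exceeds the most likely fracture estimate by well over a factor of 10, which is the "order of magnitude" gap described in the text.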

The evidence from both sources shown in Table 2 is strengthened by the fact that the percentages for the two most common dangerous diseases (stroke and myocardial infarction) are nearly the same in the two data sources (U.S. malpractice claims17 and U.K. incident reports16). Although absolute frequencies differed (especially for fractures and appendicitis), the relative disease frequency rank order was fairly similar for the top 8 listed conditions. Appendicitis was the third most commonly identified condition after stroke and myocardial infarction in the U.S. claims-based study (Newman-Toker, 201917), but harms were low- to medium-severity in 94 percent of cases, such that it was not part of the top 15 diseases associated with serious harms (Table 4). The serious harm frequency for appendicitis of 0.5 percent in the U.S. claims-based study (Newman-Toker, 201917) was quite similar to the 0.7 percent in the U.K. incident report-based study (Hussain, 201916). This would seem to suggest that the U.K. incident reporting system preferentially captures higher-severity harm events, similar to malpractice claims. However, the serious harm proportion in the U.K. incident report-based study (Hussain, 201916) was much lower than in the U.S. claims-based study (Newman-Toker, 201917) (15% versus 59%), suggesting otherwise. Also, the proportion of fractures (which are generally of lower severity) was much higher in the U.K.-based study. It is unclear whether these differences are methodological or real.

Table 3. Most common individual conditions causing serious misdiagnosis-related harms reported in ED malpractice claims.

Table 4. Proportion of misdiagnosis-related harms attributable to “Big Three” diseases reported in ED malpractice claims, broken down by high-severity versus low-/medium-severity harms.

The largest and most comprehensive study evaluating severity of patient harms from diagnostic error is Newman-Toker, 2019.17 This was a U.S.-based malpractice claims study (not restricted by age or disease, and representing nearly 30% of all U.S. national claims during the study period from 2006-2015) which found misdiagnosed stroke as the leading cause of serious misdiagnosis-related harms (i.e., severe disability or death), followed by myocardial infarction, aortic aneurysm/dissection, spinal cord compression/injury, venous thromboembolism, meningitis/encephalitis, sepsis, lung cancer, traumatic brain injury/traumatic intracranial hemorrhage, arterial thromboembolism, spinal/intracranial abscess, cardiac arrhythmia, pneumonia, gastrointestinal perforation/rupture, and intestinal obstruction +/− hernia (Table 3).17 As is apparent from Tables 3 and 4 (and the differences between these and Table 2), the list of conditions responsible for different severity harms is only partially overlapping. The top-ranked 15 conditions causing serious harm account for 68.1 percent of the high-severity harms but only 22.6 percent of the lower-severity harms. Fractures and appendicitis, which are commonly mentioned (particularly in studies of diagnostic errors in children), do not make the top 15 associated with high-severity harms despite being major contributors to lower-severity harms.

The so-called “Big Three” disease categories (vascular events, infections, and cancers) account for an estimated 72 percent of all ED diagnostic errors resulting in serious misdiagnosis-related harms. However, major vascular events (42%) and infections (23%) substantially outnumber cancers (8%) in the ED clinical setting. Misdiagnosed trauma (11%), particularly craniospinal trauma linked to neurological injury, is also common. The top 5 organ systems with diseases linked to serious diagnostic error are neurologic (including stroke) (34%), cardiovascular (23%), pulmonary (8%), gastrointestinal (7%), and hematologic (including venous thromboembolism) (7%). Taken together, these account for an estimated 79 percent of all serious misdiagnosis-related harms in the ED (Table 4).
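As a quick consistency check, the category shares quoted above can be summed directly. This is a minimal sketch using only the rounded percentages given in the text; the dictionary names are illustrative. Note that the three rounded "Big Three" components sum to 73, one point above the stated 72 percent, which is presumably a rounding effect in the published figures (an assumption, since the unrounded values are not given here).

```python
# Percent of serious misdiagnosis-related harms, as quoted in the text.
big_three = {"vascular events": 42, "infections": 23, "cancers": 8}
organ_systems = {
    "neurologic": 34,
    "cardiovascular": 23,
    "pulmonary": 8,
    "gastrointestinal": 7,
    "hematologic": 7,
}

big_three_sum = sum(big_three.values())
organ_sum = sum(organ_systems.values())

print(f"Big Three, sum of rounded shares: {big_three_sum}% (text reports 72%)")
print(f"Top 5 organ systems, sum of rounded shares: {organ_sum}% (text reports 79%)")
```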

Total malpractice claim payouts may be important for prioritization at the institutional level. The range across the top 10 most costly diseases (including claims of any severity, 2006-2015) was from $17 million for lung cancer (#10 in payouts) to $60 million for stroke (#1 in payouts) (Table 3). Four other neurological conditions led to disproportionately high payouts when missed, causing their payout-based rank to rise above their high-severity harm frequency-based rank—spinal cord compression/injury (from #4 to #2, $44M), meningitis/encephalitis (from #6 to #4, $34M), spinal and intracranial abscess (from #11 to #5, $29M), and traumatic brain injury/traumatic intracranial hemorrhage (from #9 to #6, $27M). Taken together, these five neurological conditions accounted for 30 percent of high-harm ED cases (ranks #1, 4, 6, 9, 11) and 35 percent of total payouts (ranks #1, 2, 4, 5, 6). Most (8 of 10) of the medical conditions making up the remainder of the top conditions associated with serious harm had lower payout ranks than their frequency ranks for high-severity harms. This suggests neurological injuries led to worse patient outcomes (and, in particular, a higher proportion of severely disabling outcomes, as expected).

We reviewed other published malpractice claims reports and grey literature reports from major medical liability insurance carriers or similar risk management entities (Appendix Table B-1). Only one provided data on claims specific to the ED setting and used a roughly comparable disease categorization process (Troxel, 2014).96 These results, published in a quarterly report by The Doctor’s Company, found that 58 percent of their ED claims (n=242/414) were diagnosis-related. They did not stratify their findings based on harm severity and listed only the top six conditions (fracture 13%, stroke 13%, myocardial infarction 5%, meningitis 5%, appendicitis 2%, spinal abscess 2%); nevertheless, the list appears to be quite similar to results obtained from the 9.3-fold larger CRICO Comparative Benchmarking System database analysis (n=2,273).

The U.K. incident report study (Hussain, 2019) reported that, among the 877 cases (38%) with sufficient data to assess outcome severity, the distribution was no harm (20%), mild harm (52%), moderate harm (14%), severe harm (4%), and death (10%).16 Among 128 cases with severe harm or death, frequent diagnoses included abdominal aortic aneurysm (n=18, 14%), intracranial bleed (a subtype of stroke) (n=15, 12%), and pulmonary embolus (n=8, 6%). This top three of high-severity misdiagnosis-related harms presented by Hussain et al. matches three of the top five conditions from the high-severity U.S. malpractice claims. Myocardial infarction did not make the Hussain top-harms list, despite being substantially ahead of aortic aneurysm and pulmonary embolus on the overall (regardless of harm severity) frequency list from Hussain. This could indicate that serious harm from myocardial infarction is overrepresented among U.S. malpractice claims (see below on “Representativeness of Malpractice Claims Data for Disease Distribution”).

A smaller incident report study (Okafor 2016) found similar overall disease distributions to those from malpractice claims shown in Table 4. The proportion of claims attributable to the “Big Three” (both high-severity and low-/medium-severity) is 58 percent (Table 4). When cases reported in Okafor (n=214) are tabulated and classified into the “Big Three” disease categories, together these three groups accounted for 55 percent of incident reports (33% vascular, 21% infection, 1% cancer, 45% other).31 These overall similarities across studies, study teams, and methods further bolster the validity of findings presented in Tables 2 through 4.

Differences in Disease Distribution by Prespecified Subgroups

We attempted to assess whether disease distributions differed by prespecified subgroups for KQ1. These included comparison between U.S.-based and non-U.S. based studies; patient age group (children younger than 18 years of age versus adults aged 18 years or older; and, within the adult population, adults younger than 65 years of age versus adults aged 65 years or older); ED type (general versus specialty ED [e.g., psychiatric, eye and ear]); and ED disposition (ED discharges versus admissions versus transfers).

Differences by Country of Study Origin

We identified 6 studies conducted in the United States17, 31, 59, 90, 95, 96 and 3 studies conducted outside of the United States.16, 71, 80 Unfortunately, it was not possible to draw any strong conclusions based on country of study origin because very few studies addressed disease distribution in directly comparable ways. As described above and shown in Table 2, there were strong similarities in the disease distributions between U.S. malpractice claims and U.K. incident reports, at least for the most commonly identified errors and harms. It was noteworthy that the one non-U.S. study of closed claims, based out of the Netherlands, found 78 percent of cases to be associated with missed fractures or related musculoskeletal injuries71; this was substantially higher than what was found in U.S.-based studies, where fractures or other traumatic injuries represented just 10 to 20 percent of cases.17, 95, 96 It is unknown whether this apparent difference relates to differences in study methodology or to international differences in the mechanisms for malpractice claims to be filed.

Differences by Patient Age Group

We identified one study conducted among pediatric populations,90 none among adult populations, four among multiple age groups or populations not restricted by age,17, 59, 80, 96 and four among populations where the patient age was unclear or not reported.16, 31, 71, 95 While meta-analytic comparisons were hampered by study differences in design and reporting, there were clear differences in disease distribution by age group. These were most clearly illustrated in the largest U.S.-based malpractice claims study, as shown above in Figure 2 and below in Table 5.

Table 5. Variation in diagnostic error malpractice claims (any severity) by patient age decile.

There are fewer ED diagnostic error-related malpractice claims among children (<18 years old, 13%) than among adults (18 years or older, 87%). This is mainly because there are fewer pediatric ED patients (about 30 million, 20-25%107) than adult ED patients (about 100 million, 75-80%). However, this incompletely accounts for the difference. Table 5 shows that there are proportionately fewer claims per age decile for those ages 0-20 (8%) versus for those ages 21 and older (11%). When considering adults ages 21-60 (for whom age-related mortality has not appreciably reduced the general population),108 the difference is even larger (17%, ratio about 2:1 for claims in adults versus children). A similar difference in the epidemiology of malpractice claims per population has been reported previously (for all claims, not restricted to diagnostic errors or ED care) using the National Practitioner Data Bank, which showed 5.6 claims per 100,000 population for children versus 10 claims per 100,000 for adults (ratio about 2:1 for claims in adults versus children),109 so is unlikely to represent a bias in CRICO data. Although all malpractice claims are less frequent in pediatric populations, the plurality (48%) of claims are still diagnosis related (as in adults), and 58 percent occur in the ED setting.110 Although this absolute frequency difference between children and adults could be accounted for by a lower likelihood of a lawsuit being brought when the patient is a child, this seems highly improbable; if anything, one would suspect just the opposite, since legal actions are disproportionately sought when the severity of adverse outcomes is greater111 (as would be the case for a child who might otherwise have a “full life to live” were it not for a devastating medical misdiagnosis). 
The greater likelihood of a lawsuit being brought when the claimant is a child is supported by data from the National Practitioner Data Bank showing higher payouts in pediatric than adult cases, with the highest payouts occurring among the youngest children and the lowest payouts among the oldest adults.109 Some specific data on the relative frequency of claims, such as those related to lung cancer misdiagnosis in the ED, appear to confirm the general suspicion of a higher likelihood that cases will be brought when patients are younger (see Representativeness of Malpractice Claims Data for Disease Distribution, below).

This leaves two possible explanations—either (a) diagnostic errors are less frequent among children (e.g., because they have less medical comorbidity, so are less “complex”) or (b) harms are less frequent among children (e.g., because they are less often impacted by life-threatening diseases or are more medically resilient when such diseases are present). The rate of diagnostic errors in pediatric acute care settings (5.0%)112 is close to that estimated for the aggregate ED setting (5.7%, see KQ2), suggesting explanation “a” is less likely. Explanation “b” makes sense and corresponds best to the data shown in Figure 2, which show that diagnostic error claim frequency roughly mirrors the relative prevalence of dangerous disease groups in children versus adults (higher prevalence of infections and lower prevalence of vascular events and cancer). Thus, to summarize, there appear to be fewer total (absolute) misdiagnosis-related harms among children, most likely because they are fewer in number (total population), visit the ED less frequently, and less often have a dangerous underlying cause; there is less evidence to support the contention that the rate of diagnostic errors is lower or that harms occur less frequently (or are less severe) when a misdiagnosis occurs and an underlying dangerous cause is present.

Overall, among children, vascular events are less prevalent than in adults, while missed fractures and infections (including missed appendicitis) tend to predominate.17, 90 As shown in Table 5, which presents results by age decile, the largest malpractice claims-based study (Newman-Toker, 2019) found that infections accounted for 52 percent of diagnostic error claims in the 0-10 age group and 31 percent in the 11-20 age group. The authors grouped fractures with “other” diseases, but a review of source data from the authors found that, among children under age 18, fractures accounted for just 9 percent (n=25 of 269) of diagnostic error malpractice cases of any harm severity and 7 percent (n=4 of 54) of those resulting in high-severity harms. In a smaller study of pediatric diagnostic error malpractice claims, 24 percent (n=12 of 50) were fractures.84 Studies in pediatric populations using other methods showed some degree of concordance (i.e., a relative preponderance of infections), but were not directly comparable because of differences in design and disease categories. In particular, one cohort study of patients admitted from the ED at Children’s Hospital (Boston, Massachusetts) examined 10 predefined conditions (notably not including fracture) and found that the most common of the 10 diseases (regardless of error) were appendicitis (53%), pancreatitis (14%), and sepsis (10%) (n=2,151).82 However, the most frequent diagnostic errors (total n=67) occurred with Kawasaki disease (25% of diagnostic errors [n=17 of 67]; 9% of cases [n=194 of 2,151]), followed by pancreatitis (24% of diagnostic errors; 14% of cases) and septic arthritis (18% of diagnostic errors; 8% of cases). The next most frequent sources of diagnostic error were appendicitis (10%), sepsis (9%), stroke (including cerebral venous sinus thrombosis) (6%), ovarian torsion (4.5%), and hemolytic uremic syndrome (3%). 
The diseases with the highest ratio of diagnostic error proportion to overall prevalence were hemolytic uremic syndrome, stroke, and Kawasaki disease. Because this study was conducted at a quaternary care referral center and the age range of included patients was not reported, the extent to which the results are representative of all ED diagnostic errors among children is unclear.
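The ratio of error share to case share described above can be sketched for the conditions whose two percentages are both quoted in this section. This is an illustrative calculation only: hemolytic uremic syndrome, stroke, and ovarian torsion are omitted because their share of all cases is not stated here, and the function and variable names are our own.

```python
# Percentages quoted in the text for the Boston pediatric cohort.
# Only conditions with BOTH figures stated in this section are included.
shares = {
    # condition: (percent of diagnostic errors, percent of all cases)
    "Kawasaki disease": (25.0, 9.0),
    "pancreatitis": (24.0, 14.0),
    "septic arthritis": (18.0, 8.0),
    "appendicitis": (10.0, 53.0),
    "sepsis": (9.0, 10.0),
}


def error_to_case_ratio(error_share, case_share):
    """A ratio above 1 means the condition is overrepresented among errors."""
    return error_share / case_share


ratios = {name: error_to_case_ratio(e, c) for name, (e, c) in shares.items()}

# List conditions from most to least overrepresented among errors.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {ratio:.2f}")
```

Among these five conditions, Kawasaki disease has the highest ratio and appendicitis the lowest, which is consistent with appendicitis being common but comparatively rarely missed in this cohort.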

Differences by ED Type

We identified two studies conducted in specialty EDs: one in an eye and ear ED79 and one in an orthopedics ED for minor injuries.83 The remainder were conducted in general EDs or did not report the ED type. Given the limited number of studies in specialty EDs, no meta-analysis could be performed. However, as expected, the distribution of diagnostic errors differed dramatically from that seen in general EDs. Missed conditions at the “eye and ear” ED leading to patient harm included uveitis, retinal detachment, and corneal abrasion, while all the diagnostic errors at the orthopedic ED were reported to be musculoskeletal.

Differences by ED Disposition

Although data are limited, the distribution of diseases frequently misdiagnosed in admitted patients may be distinct from that among discharged patients. For disease-specific studies, patients admitted were usually “overcalls” (e.g., false positive diagnosis of a dangerous disease such as migraine mistaken for stroke) while patients discharged were “undercalls” (e.g., false negative diagnosis of a dangerous disease such as stroke, misattributed to inner ear disease). However, this is what would necessarily be expected from a disease-specific study by design, so does not speak directly to any possible differences in disease distribution.

We identified only two disease-agnostic studies that addressed diagnostic error among patients admitted via the ED, both European.62, 75 The first, from Spain, found 42 errors among 669 admissions (6.3%) with the most frequent misdiagnoses being infections (pneumonia, bronchitis, and tuberculosis) and vascular events (pulmonary embolism and heart failure).75 The second, from Switzerland (Peng, 2015), looked at a specific subset of patients presenting to the ED with non-specific symptoms and modest illness severity (Emergency Severity Index scores of 2 or 3).62 They found 309 ED diagnostic errors among 573 admissions (54%), only 53 of which were corrected during the hospitalization, with the others discovered through follow-up. This high rate of error may have been due to differences in error definition (based on 30-day follow-up rather than end of hospitalization) or, more likely, a function of the narrowly defined “non-specific symptoms” cohort included in the Swiss study. Among the 309 errors, 211 were coded as “missed” diagnoses in the ED, while others were listed as secondary diagnoses in the ED but were later determined to be primarily responsible for the initial clinical presentation. The most frequent correct final diagnoses (n missed/n total) were urinary tract infection (26/49), electrolyte disorders (19/40), pneumonia (12/37), functional impairment (30/34), renal failure (20/33), malignant neoplasm (14/32), heart failure (14/26), intoxications (16/24), dementia (13/23), depression/anxiety (17/20), orthostasis (10/19), and dehydration (8/17).
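As a reading aid for the "n missed/n total" pairs from the Swiss cohort, the following sketch converts each pair into a missed fraction and ranks the diagnoses. The counts are copied from the text; the dictionary layout and names are our own, and the ranking is purely descriptive.

```python
# (missed in the ED, total with that final diagnosis), as listed in the text.
final_diagnoses = {
    "urinary tract infection": (26, 49),
    "electrolyte disorders": (19, 40),
    "pneumonia": (12, 37),
    "functional impairment": (30, 34),
    "renal failure": (20, 33),
    "malignant neoplasm": (14, 32),
    "heart failure": (14, 26),
    "intoxications": (16, 24),
    "dementia": (13, 23),
    "depression/anxiety": (17, 20),
    "orthostasis": (10, 19),
    "dehydration": (8, 17),
}

# Fraction of each final diagnosis that was missed in the ED.
missed_fraction = {
    name: missed / total for name, (missed, total) in final_diagnoses.items()
}

# Rank from most often missed to least often missed.
ranked = sorted(missed_fraction.items(), key=lambda kv: kv[1], reverse=True)

for name, fraction in ranked:
    print(f"{name:26s} {fraction:.0%}")
```

Ranked this way, functional impairment and depression/anxiety were missed most often relative to how frequently they were the final diagnosis, while pneumonia was missed least often among the listed conditions.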

Representativeness of Malpractice Claims Data for Disease Distribution

It is known that malpractice claims data represent a biased sample of cases, so it is then reasonable to consider whether bias(es) might influence the distribution of diseases represented in this report. In particular, claims are known to be biased towards higher-severity harms17; this is self-evident from Tables 3 and 4, since high-severity harms are relatively rare, yet among the malpractice cases there are more high-severity harm cases than low- and medium-severity harm cases combined. This is further reinforced by the much higher fraction of high-severity harms in the malpractice claims than in the large incident report study described above (58%17 versus 15%16). It is uncertain what additional biases may be at work, but results from the systematic review do suggest that some specific biases in the malpractice claims data may be present.

It has previously been suggested that diseases with tangible clinical artifacts from the encounter (e.g., radiographs showing missed incidental findings, such as a lung nodule on chest X-ray) make it easier to bring a legal action, leading to overrepresentation of cancer cases in claims, which does appear to be the case in primary care settings.17 It is possible that this may partially account for the relatively high number of lung cancer cases among ED claims, particularly given the high frequency of obtaining chest imaging in the ED (relative to other types of imaging likely to disclose cancers of the breast, prostate, colon, or other malignancies).

The extent to which the same bias might lead to overrepresentation of fractures among ED claims is unknown. As mentioned above, there are about 2 million ED cases of fractures per year in the United States, as of 2020, according to the NEISS.102 With a maximum plausible error rate of 5 percent and a more probable estimate of about 1 percent (see KQ2, Fractures), there are likely no more than 100,000 missed ED fractures per year and probably closer to 20,000 per year in the United States. By contrast, there are an estimated 800,000 strokes and likely 400,000 transient ischemic attacks (TIA) each year in the United States; with a meta-analytically summarized error rate of 17% (see KQ2), this suggests there are roughly 200,000 missed cerebrovascular events annually. In Table 2, fractures outnumber strokes 1.3-fold in U.S.-based malpractice claims and 4.2-fold in U.K.-based incident reports. It is hard to imagine how this discrepancy can be explained other than to suggest missed fractures are overrepresented relative to missed strokes in these data sets. One possible cause, alluded to in the prior paragraph, is the presence of verifiable evidence of the diagnostic error through re-examination of radiographs.
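The back-of-envelope comparison in this paragraph can be reproduced as follows. All inputs are the rounded figures quoted above; the variable names are ours, and the final ratios are simple derived quantities rather than values reported by any study.

```python
# Annual U.S. figures and error rates as quoted in the paragraph above.
ed_fractures_per_year = 2_000_000
fracture_miss_rate_max = 0.05        # maximum plausible error rate
fracture_miss_rate_probable = 0.01   # more probable estimate
strokes_per_year = 800_000
tias_per_year = 400_000
cerebrovascular_miss_rate = 0.17     # meta-analytic summary rate (see KQ2)

missed_fractures_max = ed_fractures_per_year * fracture_miss_rate_max
missed_fractures_probable = ed_fractures_per_year * fracture_miss_rate_probable
missed_cerebrovascular = (strokes_per_year + tias_per_year) * cerebrovascular_miss_rate

print(f"Missed fractures (upper bound): {missed_fractures_max:,.0f}")
print(f"Missed fractures (probable):    {missed_fractures_probable:,.0f}")
print(f"Missed strokes and TIAs:        {missed_cerebrovascular:,.0f}")

# Derived ratio: how many missed cerebrovascular events per missed fracture.
ratio_vs_upper_bound = missed_cerebrovascular / missed_fractures_max
ratio_vs_probable = missed_cerebrovascular / missed_fractures_probable
print(f"Ratio vs. upper bound: {ratio_vs_upper_bound:.1f}")
print(f"Ratio vs. probable:    {ratio_vs_probable:.1f}")
```

Under these inputs, missed cerebrovascular events outnumber missed fractures by roughly 2-fold to 10-fold, which is the opposite direction from the fracture-to-stroke ratios seen in the claims and incident report data.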

Given that only 1.5 percent of myocardial infarctions are missed, it is possible that missed heart attacks may also be overrepresented in malpractice claims relative to their population prevalence. In terms of population annual incidence, heart attacks and strokes are very similar in the United States,113 and the rate of missed stroke (17%) is roughly an order of magnitude higher than that for heart attacks, yet there are only 1.5-fold more strokes in claims than there are heart attacks. We speculate here that the reason could be that “standard of care” expectations are now so high for heart attacks that any missed case probably crosses the legal threshold for care to be considered “sub-standard.” Alternatively, missed strokes could be underrepresented for the opposite reason, namely that successful legal claims may be infrequent when overall misdiagnosis rates are high (e.g., stroke manifesting with clinical dizziness or vertigo, where error rates are estimated to be roughly 40%,21, 103, 114-116 yet claims cases are fewer than expected117). In such cases, if the “standard of care” is effectively to miss (rather than detect) a stroke, a course of legal action may be pursued less often or only infrequently lead to a paid claim. If malpractice data are used to track diagnostic error rates or disease distributions, it will be important to conduct further research into the types, direction, magnitude, and frequency of such biases.
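The size of the mismatch described in this paragraph can be made explicit with a short calculation. It assumes, as a simplification of "very similar" incidence, that strokes and heart attacks occur equally often; that assumption, the variable names, and the resulting "shortfall" figure are ours and are illustrative only.

```python
# Miss rates and claims ratio as quoted in the paragraph above.
mi_miss_rate = 0.015
stroke_miss_rate = 0.17
observed_claims_ratio = 1.5  # strokes per heart attack in claims

# ASSUMPTION: equal annual incidence, a simplification of "very similar".
incidence_ratio_stroke_to_mi = 1.0

# Expected ratio of missed strokes to missed heart attacks in the population.
expected_missed_ratio = incidence_ratio_stroke_to_mi * stroke_miss_rate / mi_miss_rate

# How far the claims ratio falls short of the expected ratio.
shortfall_factor = expected_missed_ratio / observed_claims_ratio

print(f"Expected missed stroke : missed MI ratio = {expected_missed_ratio:.1f}")
print(f"Observed ratio in claims                 = {observed_claims_ratio:.1f}")
print(f"Shortfall factor                         = {shortfall_factor:.1f}")
```

The calculation cannot distinguish between the two explanations offered in the text (overrepresented heart attacks versus underrepresented strokes); it only quantifies the gap that either explanation would need to account for.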

Age-related biases are also a possibility, at least for some diseases. Figure 2 and Table 5 show that the peak age of incidence of missed cancer in malpractice claims is 51-60 years of age, and most of this reflects lung cancer (46% of 122 cases) with the next most common being brain/spinal tumors (19%), hematologic malignancies (8%), and colorectal cancer (7%). However, the peak incidence of cancer cases is 65-74 years, with 71 percent of cases occurring over age 65.118 If the principal mechanism by which lung cancer is missed in the ED is via missed incidental lung nodules on chest X-ray,106, 119 then there is no specific reason why this should occur with greater frequency in younger patients than older ones—if anything, they should have less lung pathology that interferes with radiographic interpretation. This suggests a likely age bias to file a legal claim when the patient is younger, rather than older. It is unknown whether this sort of bias may explain some of the skewed distribution in Figure 2 and Table 5 towards more claims among younger and middle-aged patients, who have a lower incidence of dangerous diseases relative to their older counterparts; the alternative explanation is that misdiagnosis is more frequent because younger patients are not thought likely to have dangerous diseases (e.g., stroke).64 Child abuse (non-accidental trauma) is a special case in which misdiagnoses are unlikely to result in malpractice claims, even if the underlying problem does result in serious harm to the child, since the abuser (often a parent) is unlikely to draw attention to the underlying cause via a legal claim. Also see KQ1, Differences by Patient Age Group, for additional consideration of potential biases related to pediatric claims.

Other biases could be at work that are not readily apparent from the available literature. For example, disadvantaged or vulnerable populations (e.g., those who are differently abled, racial or ethnic minorities, lower health literacy, lower socioeconomic status, prisoners, immigrants) might be more likely to be misdiagnosed and less likely to file a legal claim. However, we could find no specific evidence to suggest that this would likely impact the distribution of diseases for KQ1. In particular, it is important to note that there was close alignment between the list of diseases from malpractice claims and those reported in diagnostic safety incidents (Table 2), which argues fairly powerfully against a major disease maldistribution based on claims data.

Key Question 1b. Do results vary based on the severity of any resulting misdiagnosis-related harms (e.g., death or permanent disability, as opposed to less serious harms)?

Twelve out of 41 studies reported misdiagnosis-related harms.16, 17, 31, 61, 74, 75, 80, 83, 86, 87, 89, 92 Many of these studies (6 out of 12) did not report harms related to specific disease categories but rather across all diseases in the cohort.16, 31, 61, 75, 80, 92 As described above, the clearest data on this point come from a single, large, U.S.-based malpractice claims study.17 It is clear from the data presented in Tables 2 and 3 that the distribution of diseases responsible for serious misdiagnosis-related harm differs from those responsible for any misdiagnosis-related harm. Serious harms are caused disproportionately by missed vascular events and severe infections, while less severe harms are caused disproportionately by “non-Big Three” diseases (Table 4), including fractures and some infections with fewer high-severity adverse outcomes when missed (e.g., appendicitis).

The same malpractice claims study also provides evidence that, among those with serious misdiagnosis-related harms, the distribution of underlying diseases in those suffering death differs somewhat from the distribution of underlying diseases in those suffering permanent disability. Specifically, the top three causes of death are myocardial infarction, aortic aneurysm or dissection, and venous thromboembolism. By contrast, the top three causes of permanent, serious disability are stroke, spinal cord compression/injury, and meningitis/encephalitis. This pattern is expected, with serious adverse outcomes from major cardiovascular disease principally being death and those from major neurologic disease principally being permanent disability.

Key Question 1c. What are the most common clinical presenting symptoms or signs associated with diagnostic errors or misdiagnosis-related harms in the ED?

In malpractice claims, the top clinical presentations associated with diagnostic error may be neurological symptoms, abdominal pain, and trauma, but data are sparse.72, 95 A high frequency of neurological symptoms is made more likely by the fact that diseases affecting the central nervous system are the most common diseases associated with serious misdiagnosis-related harms (34% of all ED serious harms, representing the #1 organ system involved [Table 4]). In addition, based on studies of specific diseases, it appears likely that the most common symptoms associated with misdiagnosis vary substantially by disease21, 63, 64, 77, 94 and also by age group.78, 94

Key Question 1d. Do the most common clinical presenting symptoms or signs associated with diagnostic error or misdiagnosis-related harms vary by disease or syndrome?

Based on studies of specific diseases, it appears likely that the most common symptoms associated with misdiagnosis vary substantially by disease21, 63, 64, 77, 94 and also by age group.78, 94 As clarified in KQ3, “atypical” symptoms for a given disease consistently increase risk for diagnostic error. Table 6 highlights the most common “atypical” presenting symptoms and related misdiagnosed diseases identified in this analysis, by symptom. Table 7 highlights the most common “atypical” symptoms, by disease. We found limited data on the relationship between presenting symptoms and harms, other than to note that those with “atypical” symptoms often have milder forms of disease, leading to the “misdiagnosis is protective” paradox (KQ2).

Table 6. Most common “atypical” presenting symptoms and related misdiagnosed diseases.

Table 7. Most common dangerous conditions presenting with “atypical” symptoms.

Key Question 2. Rates of Diagnostic Errors

Key Points

  • We estimate a weighted average overall diagnostic error rate of 5.7 percent (95% CI 4.4 to 7.1) per ED visit. The overall representativeness of this estimate for ED care is uncertain, but the figure is not outside the range expected based on disease-specific error rates.
  • Variation in diagnostic error rates by disease was striking, with the lowest per-disease diagnostic error rate seen for myocardial infarction (false negative rate 1.5%) and the highest seen for spinal abscess (false negative rate 56%). Most of the top harm-producing dangerous diseases are initially missed at rates of 10 to 28 percent, and there is roughly an inverse relationship between annual disease incidence and diagnostic error rates.
  • An estimated overall misdiagnosis-related harm rate of 2.0 percent (95% CI 1.0 to 3.6) per ED visit comes from one rigorous, prospective study. Retrospective trigger-based studies included many more ED visits and often reported much lower rates, but this was almost certainly due to systematic under-ascertainment from retrospective methods.
  • An estimated overall misdiagnosis-related death rate of 0.2 percent (plausible range [PR] 0.1 to 0.4) per ED visit comes from the same prospective study. This value is corroborated by estimates derived from another high-quality prospective study of admitted ED patients, which found an absolute mortality increase of 4.8 percent (2.4-fold relative increase) and, when combined with data on preventable deaths measured among ED discharges, yields a similar blended total mortality rate estimate (0.19% to 0.29%).
  • We estimated an overall serious misdiagnosis-related harms rate of 0.3 percent (PR 0.1 to 0.7) by averaging the results of two arithmetic calculations (one based on the proportion of adverse events that are serious and the other based on the mortality rate per ED visit combined with the ratio of disability to death among those with serious harms). This estimate reflects the combination of permanent, high-severity morbidity plus mortality.
  • Data on disease-specific health outcomes associated with diagnostic error were limited, and many were incorrectly reported as null effects (or even “protective” effects) without proper severity matching (or adjustment) from the time of initial clinical presentation. Nevertheless, our meta-analysis found an increase in mortality associated with diagnostic error for aortic dissection (21%, 95% CI 6 to 37) and individual studies reported increases for stroke, venous thromboembolism, and arterial thromboembolism (mesenteric ischemia).
  • If generalizable to all ED visits in the United States (130 million), best available evidence suggests there are over 7 million ED diagnostic errors, over 2.5 million diagnostic adverse events with preventable harms, and over 350,000 serious misdiagnosis-related harms, including more than 100,000 serious, permanent disabilities and 250,000 deaths. This is equivalent to a diagnostic error every 18 patients, a diagnostic adverse event every 50 patients, a serious harm (serious disability or death) about every 350 patients, and a misdiagnosis-related death about every 500 patients. Put in terms of an average ED with 25,000 visits annually and average diagnostic performance, each year this would be over 1,400 diagnostic errors, 500 diagnostic adverse events, and 70 serious harms, including 50 deaths.
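The extrapolations in the final key point follow from multiplying the per-visit rates above by a visit volume. The sketch below uses the rounded rates as printed in the key points, so some outputs differ slightly from the figures quoted (for example, 75 rather than 70 serious harms for a 25,000-visit ED, and one serious harm per 333 rather than about 350 patients); we assume this reflects rounding in the reported rates. Function and variable names are ours.

```python
# Per-visit rates quoted in the Key Points (rounded as printed there).
RATES_PER_VISIT = {
    "diagnostic errors": 0.057,
    "diagnostic adverse events": 0.020,
    "serious harms": 0.003,
    "deaths": 0.002,
}


def expected_events(annual_visits):
    """Expected annual event counts for a given ED visit volume."""
    return {name: annual_visits * rate for name, rate in RATES_PER_VISIT.items()}


def patients_per_event(rate):
    """Average number of patients seen per one event at the given rate."""
    return 1 / rate


national = expected_events(130_000_000)  # all U.S. ED visits
single_ed = expected_events(25_000)      # an average-sized ED

for name, rate in RATES_PER_VISIT.items():
    print(
        f"{name:26s} national {national[name]:>12,.0f}   "
        f"per ED {single_ed[name]:>6,.0f}   "
        f"one per {patients_per_event(rate):,.0f} patients"
    )

# Serious harms include deaths, so the remainder approximates permanent disability.
national_disabilities = national["serious harms"] - national["deaths"]
print(f"Implied national serious disabilities: {national_disabilities:,.0f}")
```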

Summary of Findings

Relatively less is known about the overall diagnostic error rate than the misdiagnosis-related harms rate. This is because studies of diagnostic error frequency that seek to address all diseases (i.e., are not disease-, symptom-, or discipline-specific) generally rely on a triggering adverse event to identify cases (e.g., repeat visit or hospitalization, incident report, malpractice claim). Thus, more is known about frequency for diagnostic errors that result in adverse events, and far less is known about the frequency of those that result in minimal or minor harms.

Nevertheless, we estimate a weighted average overall diagnostic error rate of 5.7 percent (95% CI 4.4 to 7.1) per ED visit by combining the error rate among ED discharges (4.1%) from a case-control study at a large university hospital in Spain with the error rate among ED admissions (12.3%) from a rigorous, prospective study at a university hospital in Switzerland. The overall representativeness of this estimate for U.S. ED care is uncertain, but the figure is not outside the range expected based on disease-specific error rates found in KQ2b, which range from 1 to 2 percent (fractures, myocardial infarction) to 56 percent or more (spinal abscess). Additionally, the 4.1 percent estimate for the ED diagnostic error rate is correctly situated within the spectrum of error and harm rates—diagnostic errors among admitted patients with “non-specific” symptoms [i.e., where there is a high degree of diagnostic uncertainty] (54%) >> diagnostic errors among all admitted patients (12%) >> diagnostic errors among treat-and-release discharges (4%) > diagnostic errors resulting in adverse events (2%) >> diagnostic errors resulting in serious harms, including death or permanent disability (0.3%). Finally, the overall error rate of 5.7% is comparable to that found in rigorous U.S.-based studies of other frontline care settings (e.g., 6.3% overall diagnostic error rate in U.S.-based primary care clinics).11 Thus, in light of all the relevant evidence, we believe it is appropriate to report and rely on this result.

Methodological approaches used in most of the identified studies tend to bias towards underestimation of diagnostic errors and misdiagnosis-related harms. These include (1) lack of systematic follow-up on discharged patients who do not return (including out-of-hospital deaths); (2) failure to account for hospital or health system crossovers (i.e., return to a different hospital or health system); (3) narrow definitions of diagnostic error that (i) limit to specific diagnostic process failures discernable from chart review, (ii) categorize as treatment-related the mismanagement of patients on the basis of an incorrect diagnosis, or (iii) do not include failures in communicating diagnoses to patients; and (4) failure to adjust for initial case severity, a key confounder, when assessing adverse outcomes due to diagnostic delay.

The last issue of initial case severity adjustment is crucially important to assessing adverse health outcomes from diagnostic error and calls into question the results of any study that fails to do so.1 Some studies in the review failed to adequately address case mix severity, potentially leading to erroneous inferences that delays in diagnosis do not have a deleterious impact on patient outcomes (or even benefit patients – the “misdiagnosis is protective” paradox).1 This problem occurs because illness severity is often a confounder (i.e., is causally linked to both the risk of misdiagnosis and the risk of a bad health outcome). Patients with higher initial case severity are less likely to have favorable clinical outcomes and also generally less likely to be misdiagnosed (because patients with more advanced or more serious disease tend to have more obvious clinical features that are easier to diagnose). Patients with lower initial case severity are more likely to have favorable clinical outcomes and also generally more likely to be misdiagnosed (because patients with earlier or milder disease tend to have less obvious clinical features that are more challenging to diagnose). An observational study that directly compares a population of all correctly diagnosed and all incorrectly diagnosed patients will generally find that initial case severity is higher in the correctly diagnosed population, skewing health outcomes for these patients in an unfavorable direction. This effect will tend to nullify the unadjusted, measured impact of diagnostic error or even reverse it (“misdiagnosis is protective” paradox).1 When cases of similar severity at initial presentation are compared, the impact of misdiagnosis can be properly determined. 
When early presentations with lower initial severity are missed at first contact, early treatment opportunities are squandered, so outcomes for these untreated patients become closer to those who initially presented later in the illness course with higher severity. In such cases, early intervention could potentially have yielded better outcomes, but this fact will often be obscured if a study compares outcomes unadjusted for initial case severity.
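The confounding mechanism described above can be illustrated with a small stratified example. The counts below are entirely hypothetical, chosen only so that misdiagnosis is more common in mild presentations and mortality is higher in severe ones; they do not come from any study in this review.

```python
# HYPOTHETICAL counts chosen only to illustrate confounding by severity.
# (initial severity, diagnosis) -> (patients, deaths)
cohort = {
    ("mild", "correct"): (200, 4),
    ("mild", "missed"): (100, 4),
    ("severe", "correct"): (800, 160),
    ("severe", "missed"): (20, 6),
}


def mortality(groups):
    """Pooled mortality across a list of (patients, deaths) pairs."""
    patients = sum(n for n, _ in groups)
    deaths = sum(d for _, d in groups)
    return deaths / patients


def crude_mortality(diagnosis):
    """Mortality ignoring initial severity (the unadjusted comparison)."""
    return mortality([v for (_, dx), v in cohort.items() if dx == diagnosis])


def stratum_mortality(severity, diagnosis):
    """Mortality within one severity stratum (the severity-matched comparison)."""
    return mortality([cohort[(severity, diagnosis)]])


print(f"Crude:  correct {crude_mortality('correct'):.1%}  "
      f"missed {crude_mortality('missed'):.1%}")
for severity in ("mild", "severe"):
    print(f"{severity.capitalize():7s} correct {stratum_mortality(severity, 'correct'):.1%}  "
          f"missed {stratum_mortality(severity, 'missed'):.1%}")
```

In this toy cohort the unadjusted comparison makes missed diagnoses look safer than correct ones, even though mortality is higher for missed diagnoses within each severity stratum. That reversal is the “misdiagnosis is protective” paradox.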

There were insufficient data to assess overall error and harm rates by prospectively defined subgroups. For disease-specific studies, there were no clear differences between studies conducted in the United States versus those not conducted in the United States. The one disease-specific study which included both U.S.-based and European EDs and compared diagnostic performance directly across continents found slightly longer diagnostic delays for aortic dissection patients in North America, where 12 of 14 sites were U.S.-based.68 There were no clear differences based on the epoch in which studies were reported (2000 to 2010 versus 2011 to 2021), although comparisons were limited to just a few diseases based on data availability. The one study which explicitly assessed temporal trends for cardiovascular misdiagnosis in U.S.-based EDs (2006-2014, using Medicare data) found no significant trends for myocardial infarction or aortic dissection and a rising trend (increased false negative diagnostic errors) over time for ruptured aortic aneurysm, subarachnoid hemorrhage, and ischemic stroke.120 Insufficient data were available to assess the impact of ED clinician training on overall measured rates of diagnostic error or misdiagnosis-related harms. The impact of training background and clinical experience varied by study and disease, as reported in the sections analyzing KQ3.

Key Question 2a. On a per-visit or symptom-specific basis, what is the rate of diagnostic errors, misdiagnosis-related harms, and serious misdiagnosis-related harms?

Twenty-nine studies reported on per-visit or clinical presentation-specific rates of diagnostic error or harms.54, 56, 58, 72, 74, 75, 121-143 There was significant methodological heterogeneity across studies in defining diagnostic errors, any harms, or serious harms, which made synthesis challenging. Most of the rates reported are underestimates, since few studies reported a systematic regional inquiry into returns to other hospitals or health systems, and hospital crossovers after ED misdiagnosis can occur in more than one third of cases.144, 145

Per-Visit Overall ED Diagnostic Error Rates

We use the term “overall” ED diagnostic error rates to refer to rates measured across presenting symptoms and clinical problems (as opposed to those that are symptom-, disease-, or discipline-specific). Although many studies reported on “diagnostic error” rates, they were mostly misdiagnosis-related harm rates, since they used an adverse event trigger to focus their search for errors. Only two studies addressed diagnostic error (as opposed to adverse events) systematically – one among ED patients who were discharged and the other among ED patients who were admitted to the hospital. These two studies are described below. In aggregate, the weighted average estimated ED diagnostic error rate is 5.7 percent (Moderate strength of evidence [SOE]).

We found just one study that systematically measured overall per-visit diagnostic error rates among patients discharged from the ED.137 This study (Nuñez, 2006) was based in a large, university hospital in Spain and began by using an adverse event trigger (72-hour unscheduled returns for the same chief complaint) to identify cases and assess diagnostic errors (which external, masked reviewers defined as a discrepancy between initial and final diagnoses).137 Study investigators then purposively sampled from the remaining visits (patients who did not return) to create a comparable population on factors likely to impact diagnostic error rates. Exclusion criteria were “age <14 years, obstetric/ gynecological emergencies, erroneous referral, voluntary withdrawal, and incomplete or unavailable data in the medical records at the hospital or health center.” Of 32,523 eligible patients during a four-month period in 2004, there were 250 unscheduled 72-hour returns; among these study investigators found a diagnostic error rate of 20 percent (Nuñez, 2006, Table 2, including footnote). The control group “consisted of 250 patients who did not return; these comprised the next consecutive patient after each case in an attempt to balance cases and controls with respect to the influence of the attendance team, patient census, day of the week, work shift, and other external factors.” Among the control group, the study investigators found a diagnostic error rate of 4 percent. Thus, diagnostic errors were 5-fold enriched among patients with 72-hour returns, but because the unscheduled return rate was very low at 0.8 percent of all visits (n=250/32,523 visits), the estimated total diagnostic error rate for the discharged ED population was very close to 4 percent. 
Authors did not report the admission fraction; however, given the small number of unscheduled returns (n=250), an admission fraction anywhere between 1 and 50 percent would produce a weighted average diagnostic error rate of 4.1 to 4.2 percent (with 4.1% being the value for a typical admission fraction of 10-15%). This likely represents a “floor” (minimum) rate estimate because diagnostic errors were based solely on chart review and not systematic patient follow-up. Methodologically, the control group schema was strong with respect to the risk of diagnostic errors in those with unscheduled returns versus those without, but the absolute error rate is of uncertain representativeness even for this individual ED. For example, if every diagnostic error was attributable to a single clinician who was intermittently on call, then the matched population would track that individual’s diagnostic error rate, rather than the average diagnostic error rate for the entire ED.
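The sensitivity of the discharge error rate to the unknown admission fraction can be checked with a short calculation. This sketch assumes that the 32,523 eligible visits include admitted patients and that all 250 unscheduled returns arose among discharged patients; that reading of the study, and all names below, are our assumptions.

```python
ELIGIBLE_VISITS = 32_523      # eligible ED visits in the study period
UNSCHEDULED_RETURNS = 250     # 72-hour unscheduled returns
ERROR_RATE_RETURNS = 0.20     # diagnostic error rate among returns
ERROR_RATE_CONTROLS = 0.04    # diagnostic error rate among non-returning controls


def discharge_error_rate(admission_fraction):
    """Weighted error rate among discharged patients.

    ASSUMPTION: eligible visits include admissions, and every unscheduled
    return came from the discharged group.
    """
    discharged = ELIGIBLE_VISITS * (1 - admission_fraction)
    return_share = UNSCHEDULED_RETURNS / discharged
    return (return_share * ERROR_RATE_RETURNS
            + (1 - return_share) * ERROR_RATE_CONTROLS)


for fraction in (0.01, 0.10, 0.15, 0.50):
    print(f"admission fraction {fraction:4.0%}: "
          f"error rate {discharge_error_rate(fraction):.2%}")
```

Because returns make up well under 2 percent of discharges across this whole range, the weighted rate stays between about 4.1 and 4.2 percent regardless of the admission fraction chosen.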

One high-quality prospective study was identified that examined overall diagnostic error rates and misdiagnosis-related mortality among patients admitted from the ED.7 The study was a prospective observational study of 755 consecutive ED patients at a university-affiliated tertiary care facility in Switzerland. The investigators used the primary hospital discharge diagnosis as the reference standard for the final correct diagnosis. They used a rigorous and moderately reliable (kappa 0.54) process of classifying diagnostic differences that only counted clinically meaningful discrepancies for the main analysis of the primary outcomes (hospital length of stay and mortality). They found diagnostic differences in 42 percent of cases (n=319 of 755) and considered these meaningful discrepancies in 12 percent of cases (n=93 of 755). Although the authors demurred from labelling these as errors (focusing on “error” as a process failure), these events meet the National Academy of Medicine (NAM) definition of a diagnostic error used in this report, regardless of whether an explicit, preventable failure occurred during the diagnostic process. Diagnostic errors were associated with longer hospital stay (mean 10.3 versus 6.9 days; Cohen’s d 0.47; 95% confidence interval 0.26 to 0.70; P = 0.002) and increased patient mortality (8.6% [n=8] versus 3.8% [n=25]; OR 2.40; 95% confidence interval 1.05 to 5.5; P = 0.038). Note that no post-hospital follow-up was performed, so the authors concluded that their estimates were likely minimum estimates (i.e., some additional diagnostic errors were presumably not captured by the inpatient team and therefore unaccounted for in the study results). 
The authors defend this approach well, but it is apparently more common than one might expect for the inpatient team to convert a correct ED diagnosis into an incorrect one, as found in one study that focused on the subset of patients with non-specific symptoms at higher risk for diagnostic error.62 Whether this is a “floor,” “ceiling,” or intermediate estimate therefore remains unknown.

Per-Visit Overall ED Misdiagnosis-Related Harm Rates

There were seven studies that assessed overall per-visit misdiagnosis-related harm rates, referred to in most of the studies as diagnosis-related adverse events, or similar terminology (Table 8).7, 24, 72, 131, 137, 141, 143 Only three of these studies were high-quality, prospective studies (Nuñez, 2006137; Calder, 2010131; Hautz, 20197) and just one included both those discharged and admitted from the ED, in addition to systematic patient follow-up (Calder, 2010).131 The prospective studies found adverse events and deaths at rates one to two orders of magnitude higher than those found in the various retrospective cohorts identified by revisit triggers. The retrospective studies are likely to represent substantial underestimates, given under-ascertainment as a consequence of design.

Table 8. Overall per-visit ED misdiagnosis-related harm rates.

As noted above, we identified only one high-quality study that assessed overall diagnostic adverse event rates for both admitted and discharged patients with a prospective design using systematic follow-up (Calder, 2010).131 They enrolled adult patients (≥18 years of age) from high-acuity areas of the ED (Emergency Severity Index triage level 1-3) during random shifts at two university-affiliated hospitals in Canada in 2004. They excluded patients deemed incapable of informed consent (cognitive impairment or major psychiatric illness; critically ill or in distress) or unable to complete 2-week phone follow-up (non-English/French speaker, no telephone, or expected to be unavailable). Of 518 enrollees, an impressive 97 percent (n=503; 369 treat-and-release ED visits and 134 hospital admissions) had a follow-up assessment, with 2 patients withdrawing and 13 lost to follow-up (at equal rates among those discharged versus admitted). They looked for prespecified “flagged outcomes” including deaths, hospital complications, returns, healthcare visits, and new, worsening, or persistent symptoms. They found 22 percent of both discharged and admitted groups had flagged outcome events, which were then assessed via chart review. They found 43 of 107 flagged outcomes represented preventable adverse events and classified 10 of these as diagnostic in nature. Thus, the authors found 2.0 percent (n=10 of 503, 95% CI 1.0 to 3.6) of ED patients enrolled suffered preventable diagnostic adverse events. However, treatment errors pursuant to inaccurate diagnoses were considered management adverse events, rather than being counted as diagnostic adverse events; furthermore, events had to be deemed causally related and preventable with a certainty of at least 5 on a 6-point Likert scale by at least 2 of 3 reviewers. Also, this study was conducted at academic hospitals, and teaching hospitals are known to have lower diagnostic error rates for some conditions (see KQ3). 
Therefore, this represents a “floor” estimate. One of the 10 patients died of a delayed diagnosis of aortic dissection (rate 0.20%, 95% CI 0.005 to 1.1). Although the severity of the morbidity was not fully quantified, one additional patient was noted to have suffered “permanent disability” from a missed myocardial infarction (rate 0.20%, 95% CI 0.005 to 1.1).
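
The confidence intervals quoted above for 10 of 503 and 1 of 503 events can be approximated with an exact binomial (Clopper-Pearson) interval. This is a minimal sketch under an assumption: the source does not state which interval method was used, and Clopper-Pearson is chosen here only because it is a common choice for small event counts. The helper names `binom_cdf` and `exact_ci` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_increasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x events in n trials (assumed method)."""
    # Lower bound: smallest p with P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: p with P(X <= x | p) = alpha / 2, i.e. P(X > x | p) = 1 - alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

for events in (10, 1):
    low, high = exact_ci(events, 503)
    print(f"{events}/503: {100 * events / 503:.2f}% "
          f"(95% CI {100 * low:.3f}% to {100 * high:.2f}%)")
```

With a single event in 503 patients the interval spans more than two orders of magnitude, which illustrates why the per-visit death rate from this study alone is imprecise.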

There were four retrospective studies that reported overall per-visit harm rates. All but one (Heitmann, which used the longest revisit window) found much lower rates than the prospective study (Table 8).24, 72, 141, 143 These studies all used triggered chart reviews at single institutions, with the trigger being an ED revisit or short-term hospitalization (<72 hours to <30 days), and none used regional health information exchange or insurance claims-based follow-up to ascertain health events. This means that diagnostic errors were not generally counted towards the totals if they (a) were not discovered until after the time window; (b) did not prompt further care within the time window; (c) were discovered at an outpatient clinic visit, rather than via an ED revisit or hospitalization; (d) prompted care at another hospital or health system; or (e) resulted in an out-of-hospital death. Furthermore, all studies used chart review procedures that required reviewers to gauge whether care was “appropriate” or diagnostic errors “preventable,” further reducing the estimates. Such chart reviews are limited by the data recorded, which tend to be systematically incomplete and biased away from relevant details in cases where diagnostic errors have occurred.104, 145, 146 This group of studies systematically underestimates harms, and likely does so by a wide margin, given much higher rates in studies not limited by under-ascertainment.

The four trigger-based studies examining returns after ED discharge were each conducted at single institutions (total of 5 EDs, with ED annual visit volumes ranging from 15,000 to 100,000) (Table 8).24, 72, 141, 143 Trigger event time windows varied from 72 hours to 30 days, reducing direct comparability across studies. The all-cause return rates were 2.0 to 2.9 percent at 72 hours, 4.4 to 6.8 percent at 7 days, and 11 percent at 30 days, suggesting a fairly comparable rate of overall ED returns across studies. However, these returns were at lower absolute rates than those reported using U.S. state-level data (7.5 percent at 72 hours and 22.4 percent at 30 days),147 perhaps suggesting an academic/teaching hospital bias in the reported studies.148 The proportion of ED returns attributed to diagnostic error varied from 0.6 to 14.2 percent, with a weighted mean of 1.0 percent (n=94 of 9,277). The overall rate of diagnostic adverse events (returns attributed to diagnostic error) per original ED visit varied over 100-fold across studies (i.e., across hospitals) from 0.01 percent at a large tertiary care ED in the United States to 1.6 percent at a small regional ED in Denmark, with a weighted mean of 0.022 percent (n=94 of 436,861). The extent to which these reflected real differences between institutions, as opposed to methodological differences in time windows, inclusions, or outcome definitions, was unclear. Regardless, the rate of diagnostic adverse events in the one high-quality, prospective study (2.0%) is 92-fold higher than the weighted mean from the four retrospective studies (0.022%).

Misdiagnosis-related deaths per ED visit were reported in three of four retrospective studies,24, 72, 141 ranging from 0 to 0.007 percent, with a weighted mean of 0.0009 percent (n=4 of 436,173). On an institutional basis in these three studies (representing 4 EDs), each ED would see between 1 and 5 misdiagnosis-related deaths annually (based on their reported ED volumes). Since these studies conducted no systematic searches for out-of-hospital or out-of-hospital-network deaths and the single largest study (Aaronson, 2018, representing 94% of the patients synthesized) used 72-hour returns, rather than 7-day returns, this is, again, likely a substantial underestimate. The rate of misdiagnosis-related deaths in the one high-quality, prospective study (0.2 percent, n=1 of 503) is 217-fold higher than the weighted mean from the three retrospective studies (0.0009 percent). Although the rate of 0.2 percent is based on just a single death (so is imprecise, with a wide 95% CI 0.005 to 1.1), the value is the best estimate from this study and matches data from other sources. However, the confidence interval from the Calder study alone is implausibly wide. Based on data from other sources, we have assigned a +/− 2-fold plausible range to the 0.2 percent estimate (0.1% to 0.4%). This range bound comports well with other available data relevant to estimates of serious misdiagnosis-related mortality (details below in “Plausibility of Mortality Estimates from Higher Quality Studies”).
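
The weighted means and fold-differences in the two paragraphs above can be reproduced from the pooled counts. The sketch below assumes that "weighted mean" means pooled events divided by pooled visits (equivalent to a visit-weighted mean of per-study rates); the helper name `pct` is hypothetical.

```python
def pct(events, denominator):
    """Events per 100 units of the denominator."""
    return 100 * events / denominator

# Pooled retrospective, trigger-based studies (counts quoted in the text).
retro_adverse_per_return = pct(94, 9_277)    # diagnostic adverse events per ED return
retro_adverse_per_visit = pct(94, 436_861)   # diagnostic adverse events per ED visit
retro_death_per_visit = pct(4, 436_173)      # misdiagnosis-related deaths per ED visit

# Prospective study with systematic follow-up (Calder, 2010).
prosp_adverse_per_visit = pct(10, 503)
prosp_death_per_visit = pct(1, 503)

adverse_fold = prosp_adverse_per_visit / retro_adverse_per_visit
death_fold = prosp_death_per_visit / retro_death_per_visit

print(f"Retrospective adverse events per visit: {retro_adverse_per_visit:.3f}%")
print(f"Retrospective deaths per visit: {retro_death_per_visit:.4f}%")
print(f"Prospective vs retrospective, adverse events: {adverse_fold:.0f}-fold")
print(f"Prospective vs retrospective, deaths: {death_fold:.0f}-fold")
```

The fold-differences use the unrounded prospective rates (10 of 503 and 1 of 503), which is what yields 92 and 217 rather than the values obtained from the rounded percentages.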

Plausibility of Mortality Estimates From Higher Quality Studies

U.S. data based on deaths post ED discharge from Medicare (where ascertainment of death is nearly complete) suggest that, at least for patients over age 65, the 7-day death rate among non-hospice patients treated and released with non-lethal ED diagnoses is 0.12 percent (n=12,375 of 10,093,678),148 equating to about 1 death per 833 ED treat-and-release visits. This value is 134-fold higher than what was found in the retrospective, trigger-based studies with incomplete ascertainment of deaths and just 1.6-fold off from the 0.2 percent measured in the one high-quality, prospective study that identified the one death among 503 patients (Calder, 2010). It is also a value that fits within the plausible range we have defined (0.1% to 0.4%).

We can compare this death rate to that found in the other high-quality prospective study, which examined only admitted patients (Hautz, 2019). Using a strong design, the increased mortality associated with diagnostic error was 4.8 percent (8.6% of cases [8 of 93 incorrectly diagnosed] minus 3.8% of controls [25 of 662 correctly diagnosed]). In the United States, ED admitted patients constitute 12.4 percent of ED visits (n=16.2 million of 130.0 million in 2018).13 Thus, if misdiagnosis-related deaths only occurred among admitted ED patients (not those discharged), the overall misdiagnosis-related mortality rate would be 0.07 percent. If the death rate among those discharged were the same as in Nuñez, 2006, the overall blended (weighted average) rate for all ED visits would be 0.19-0.29 percent. These values also fit within the plausible range we have defined (0.1% to 0.4%).
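
One way to arrive at the 0.07 percent admitted-only figure is to multiply the fraction of admitted patients with a meaningful diagnostic discrepancy, the excess mortality among them, and the admitted share of ED visits. The source does not spell out this arithmetic, so the chain below is an assumption offered as a plausibility check; variable names are hypothetical.

```python
# Hautz, 2019 counts as quoted in the text.
misdiagnosed = 93
correctly_diagnosed = 662
deaths_misdiagnosed = 8
deaths_correct = 25

# Fraction of admitted patients with a clinically meaningful discrepancy.
error_fraction = misdiagnosed / (misdiagnosed + correctly_diagnosed)

# Excess mortality associated with diagnostic error (absolute difference).
excess_mortality = (deaths_misdiagnosed / misdiagnosed
                    - deaths_correct / correctly_diagnosed)

# Admitted share of U.S. ED visits in 2018 (16.2 million of 130.0 million).
admitted_share = 16.2 / 130.0

# Misdiagnosis-related deaths per ED visit if such deaths occurred only
# among admitted patients (assumed multiplication chain).
admitted_only_rate = error_fraction * excess_mortality * admitted_share

print(f"Excess mortality: {excess_mortality:.1%}")
print(f"Admitted-only misdiagnosis-related mortality: {admitted_only_rate:.2%}")
```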

We can further assess the plausibility of a 0.10-0.40 percent death rate based on the proportion of total post-ED deaths it represents. The overall 30-day death rate after an ED visit is 3.0 percent for patients of any age group (from a population-based Danish study)149; this is likely a reasonable proxy for U.S.-based ED deaths, since the U.S.-based 30-day mortality rate is 4.6 percent among Medicare beneficiaries,150 and mortality is naturally expected to be higher among this older cohort that represents approximately 1 in 5 ED visits.151 If the misdiagnosis-related death rate is 0.10 to 0.40 percent and the overall death rate is 3.0 percent, then the proportion of deaths attributable to diagnostic error (misdiagnosis-related deaths) would be 3.3 to 13.3 percent. This range is quite plausible, given that a systematic review of misdiagnosis-related deaths estimated the combined Goldman Class I/II diagnostic error rate for an average, modern, U.S.-based hospital that autopsied 100% of its deaths would be 8.4% (95% CI 5.2-13.1). Death among hospitalized patients is often due to severe, untreatable diseases that were correctly diagnosed in the ED (in obviously sick individuals), while this is not likely to be the case for those who die unexpectedly after ED treat-and-release discharge. Thus, even though the likelihood of death is much higher among hospitalized patients than discharged patients, the proportion of deaths that are attributable to ED misdiagnosis among those who die after ED treat-and-release is expected to be higher than the proportion of deaths attributable to ED misdiagnosis among those who die during a post-ED hospitalization (see Role of Hospitalization and Discharge Fraction, below). The point estimate of 0.2 percent mortality corresponds to roughly 6.7 percent of deaths being attributed to diagnostic error, so, if anything, the 0.2 percent estimate may be slightly low.

Misdiagnosis-Related Permanent Disability Estimates

The rate of non-lethal yet serious misdiagnosis-related harms (i.e., permanent disability, rather than mortality) was not systematically reported in these particular studies. The Calder, 2010 study did not expressly quantify morbidity, but one patient (0.2%, 95% CI 0.005 to 1.1) “suffered permanent disability as a result of a missed inferior wall myocardial infarction.”131 The largest ED malpractice study in our review found that disabling outcomes (National Association of Insurance Commissioners scale score of 6-8, equivalent in severity to the loss of one arm or one eye [level 6], paraplegia or blindness [level 7] or quadriplegia or severe brain damage [level 8]) account for 41 percent (n=545/1,323) of high-severity harm outcomes; similarly, the largest incident report study found that disabling outcomes accounted for 29 percent (n=37/128) of high-severity harm outcomes. Thus, the rate of serious harms (death plus permanent disability) is expected to be approximately 1.4- to 1.7-fold higher than the mortality rate. There are known differences in the relative proportions of disabling morbidity versus mortality by disease (e.g., aortic aneurysm and dissection 89% mortality and 11% permanent disability versus stroke 29% mortality and 71% permanent disability [Table 3]17). These findings indicate that it is insufficient to monitor death alone to assess poor overall health outcomes from diagnostic error or to prioritize diagnostic error problems for intervention. Among the top 15 diseases identified in KQ1, serious misdiagnosis-related harms are known to disproportionately represent disabling morbidity (rather than mortality) for several neurological conditions including spinal and intracranial abscess (82% disability versus 22% mortality), stroke (71% disability versus 29% mortality), and meningitis and encephalitis (48% disability versus 52% mortality).17 The same is likely true for other neurological conditions in the top 15 (e.g., spinal cord compression/injury and traumatic brain injury). 
Given that the organ system most often involved in diagnostic errors leading to serious harms is the nervous system (34%, Table 4), mortality alone will be a particularly poor health outcome proxy and will tend to substantially understate these individual diseases and total, serious misdiagnosis-related harms.
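
The 1.4- to 1.7-fold multiplier mentioned above follows from the disabling share of high-severity outcomes. The sketch below assumes that high-severity outcomes consist only of deaths and disabling outcomes, so that total serious harms equal deaths divided by one minus the disabling share; the function name is hypothetical.

```python
def serious_harm_multiplier(disabling, high_severity):
    """Ratio of (deaths + disabling outcomes) to deaths alone.

    Assumes every high-severity outcome is either a death or a
    disabling outcome.
    """
    disabling_share = disabling / high_severity
    return 1 / (1 - disabling_share)

# Counts quoted in the text.
incident_reports = serious_harm_multiplier(37, 128)      # 29 percent disabling
malpractice_claims = serious_harm_multiplier(545, 1_323)  # 41 percent disabling

print(f"Incident report data: {incident_reports:.1f}-fold")
print(f"Malpractice claims data: {malpractice_claims:.1f}-fold")
```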

Role of Hospitalization and Discharge Fraction

Only one study assessing per-visit diagnostic harm rates reported on both treat-and-release (discharged from the ED) and hospitalized (admitted from the ED) fractions with respect to subsequent ED returns. The study was conducted at a 15,000 visit per year regional hospital in Denmark. Heitmann et al., 2016 found that 1.6 percent of ED discharges and 0.3 percent of patients admitted to a hospital ward via the ED returned within 30 days due to a diagnostic error, and almost all of these (in both subgroups) returned within 7 days.143 This likely indicates that hospital admission serves as a clinical safety net for patients who are initially misdiagnosed, and comports with U.S. Medicare data showing that EDs with very high discharge fractions (proportion of patients sent home on any given day) are more susceptible to diagnostic errors associated with short-term, unexpected patient deaths.148 This also comports with the findings from Hautz et al., 2019 in which 12.3 percent of patients admitted via the ED were found to have clinically important diagnostic discrepancies during their hospital stays.7

However, an unrelated Swiss study (Peng, 2015) of ED patients with non-specific symptoms who were admitted to a tertiary care hospital found 9 percent of ED diagnoses were corrected during the inpatient stay while, remarkably, 4 percent of ED diagnoses were converted from correct to incorrect diagnoses by the inpatient team. Diagnoses were assessed based on 30-day follow-up review of clinical records.62 While the overall impact of hospitalization was still to increase diagnostic accuracy over and above the initial admitting ED diagnosis, the high rate of conversion to an incorrect diagnosis could potentially cast doubt on whether inpatient diagnoses are always a good proxy reference standard for a final correct diagnosis. However, this particular population of patients was selected for a set of symptoms that predispose to diagnostic error, so it is probably not representative of the overall impact of inpatient care on diagnostic accuracy.

Differences in Estimation Based on Study Design

Prospective methods are likely to identify substantially more frequent diagnostic errors, diagnostic adverse events, and serious misdiagnosis-related harms than is possible using trigger-based retrospective chart review methods. Methodological reasons for this are detailed in the sections above. The strongest empiric evidence supporting these methodological contentions comes from a study group headed by the same lead author that published two non-overlapping studies using different methods (Calder, 2010; Calder, 2015) (see also Table 8).24, 131 Both studies were conducted at the same two university-based hospital EDs in Ottawa, Canada. The more recent study used a triggered chart review process based on 7-day ED returns and other indirect methods of case capture to find a 0.11 percent diagnostic adverse event rate and a 0.0074 percent misdiagnosis-attributable death rate among 13,495 ED visits. The earlier study used systematic ascertainment in a small, random sample of ED patients (n=503) to determine a 2.0 percent diagnostic adverse event rate (18-fold higher) and a 0.20 percent misdiagnosis-attributable death rate (27-fold higher). It is also important that, in both Calder studies, management errors pursuant to incorrect diagnoses were counted as management errors, rather than diagnostic ones, suggesting that even these latter figures are likely “floor” estimates. Similar evidence has been published previously with respect to missed fractures in trauma patients—when Enderson and colleagues changed the study design from retrospective to prospective, the incidence of missed traumatic fractures increased from 2 to 9 percent.152

Additional evidence comes from studies of patients who do not return for care, who are also at risk of diagnostic error, but go uncounted in most trigger-based studies. As described above, one trigger-based study (Nuñez, 2006) reported on a matched control population of patients who did not return to the ED.137 If the 250 sampled patients who did not return are representative of the broader ED population at that hospital, then 96 percent of all diagnostic errors occur in patients who do not return to the ED, and are therefore missed by trigger-based studies.
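
The 96 percent figure can be reproduced by comparing expected error counts in the two groups. This is a sketch under the same assumption stated in the paragraph above, namely that the 4 percent rate in the 250 sampled controls applies to every non-returning visit; variable names are hypothetical.

```python
# Figures from Nunez, 2006 as quoted earlier in this section.
eligible_visits = 32_523
returns_72h = 250
error_rate_returns = 0.20
error_rate_no_return = 0.04

non_returning_visits = eligible_visits - returns_72h

# Expected numbers of diagnostic errors in each group.
errors_in_returns = returns_72h * error_rate_returns
errors_in_non_returns = non_returning_visits * error_rate_no_return

# Share of all errors that a return-triggered review would never see.
share_missed_by_trigger = errors_in_non_returns / (
    errors_in_returns + errors_in_non_returns
)

print(f"Errors among returns: {errors_in_returns:.0f}")
print(f"Errors among non-returns: {errors_in_non_returns:.0f}")
print(f"Share missed by a 72-hour return trigger: {share_missed_by_trigger:.0%}")
```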

Finally, studies with insurance-based death ascertainment are likely to have much greater event capture than those based on revisits to the same hospital, because hospital crossovers are enriched among diagnostic error cases (37% of cases rather than 25%).145 One study with such a design found misdiagnosis-related death rates—0.12 percent (n=12,375 of 10,093,678)148—to be much closer to those seen in the high-quality, prospective study (0.2%). Taken in aggregate, these findings suggest that real-world per-visit diagnostic error and misdiagnosis-related harm rates are likely substantially higher than currently reported in much of the medical literature.

Per-Symptom ED Diagnostic Error and Harm Rates

Appendix Table B-2 shows included studies reporting on symptom-specific rates of diagnostic error. Six studies reported on rates of diagnostic error among polytrauma patients.121, 125, 127, 128, 130 Wilner et al. focused on pediatric patients and reported an 8 percent rate of delayed diagnosis of injury as well as a 0.3 percent rate of clinically significant delayed diagnoses.130 The remaining five studies focused on adult populations, had varying definitions of diagnostic delay, and reported delayed diagnosis rates ranging from 0.2 to 40.3 percent.

Kornblith et al. reported that 16.9 percent of patients ‘found down’ had a late-identified injury/medical diagnosis.123 Sun et al. reported a 4 percent rate of diagnostic delay among patients presenting with syncope/near-syncope.135

Royl et al. reported a 44 percent rate of diagnostic error for patients seen in the ED with dizziness and for whom a neurology consult was sought. The rate of harm ranged from 5 to 6 percent for patients who had a primary diagnosis changed from a benign to a serious condition, and for patients who had one serious primary diagnosis replaced with another serious condition.129 Moeller et al. reported a 17 percent discordance between the emergency clinicians’ diagnosis and the final diagnosis and a 19 percent discordance between ED trainees’ diagnosis and the final diagnosis among patients who received a neurology consult for any neurological complaint.134

Two studies reported on misdiagnosis rates among patients presenting with headache.122, 140 Miller et al. included adult and pediatric patients and reported a 1.7 percent rate of missed intracranial diagnoses.140 Dubosh et al. focused on adult populations and reported a 0.5 percent rate of serious misdiagnosis-related harms. Dubosh et al. also reported a 0.2 percent rate of serious misdiagnosis-related harms for adults presenting with atraumatic back pain.122

Four studies reported on misdiagnosis rates among patients presenting with abdominal pain.58, 126, 138, 139 Gallagher et al. and Osterwalder et al. focused on adult populations. Gallagher et al. reported a 14.1 percent rate of misdiagnosis among abdominal pain patients receiving morphine and 14.6 percent among patients not receiving morphine.58 Osterwalder et al. reported a 5.6 percent rate of misdiagnosis and a 1.7 percent rate of patients requiring surgery.139 Saaristo et al. reported on adult and pediatric populations, and found the misdiagnosis rate to be 3.3 percent; 0.7 percent of the patients with abdominal pain required hospitalization, and 0.06 percent needed immediate surgery.138 Crosby et al. focused on pediatric patients and reported a misdiagnosis rate of 1 percent among surgeons, and of 0.3 percent among emergency medicine clinicians. Crosby et al. also reported a 1.6 percent and 0 percent rate of misdiagnosis for testicular pain among surgeons and emergency clinicians, respectively, and equal rates of misdiagnosis for minor head trauma at 0.3 percent across the provider types.126 Freedman et al. reported a 0.28 percent rate of misdiagnosis among pediatric patients with constipation.142

Two studies reported on misdiagnosis rates of adults presenting with dyspnea.54, 136 Ray et al. focused on older adults (65 years and older) and reported a misdiagnosis rate of 20 percent.136 Pirozzi reported on all adults and found the rate of misdiagnosis to be 5 percent when using point-of-care ultrasound, and 50 percent when not using point-of-care ultrasound (their definition of a misdiagnosis was a discordance between the initial and final ED diagnosis).54

One study reported on rates of misdiagnosis among adults presenting with ‘low-risk’ chest pain; they found a 0.5 percent rate of missed or delayed acute coronary syndrome among control patients, and 0% among intervention patients randomized for patient and clinician to receive print-out information on their acute coronary syndrome risk assessment.56

One study reported on rates of infection misdiagnosis among older adults (age 65 years and older); they found an 18.4 percent false discovery rate in the ED.124

Two studies reported on rates of misdiagnosis among patients receiving radiological imaging.132, 133 Chung et al. reported a 2 percent misdiagnosis rate for torso imaging read by radiology residents during off-hours; 0.3 percent of the cases resulted in a change in management or call back to the ED, and no cases resulted in serious harm.132 Filippi et al. reported a 7.2 percent misdiagnosis rate for neurological magnetic resonance imaging (MRI) read by radiology residents during off-hours; 4.2 percent of the cases resulted in harm.133

Key Question 2b. On a per-disease/syndrome basis, what is the rate of diagnostic errors, misdiagnosis-related harms, and serious misdiagnosis-related harms?

When interpreting rates shown in the sections that follow, the meanings for these rates (technically, proportions, but more commonly referred to as “rates”) are as follows, using the exemplars of myocardial infarction, pneumonia, and appendicitis (data from Table 9):

  • False negative rate (1-sensitivity) of 1.5 percent means that patients who DO have myocardial infarction are missed (not promptly diagnosed) 1.5 percent of the time, which is nominally independent of the prevalence of myocardial infarction;
  • False omission rate (1-negative predictive value) of 0.2 percent means those said NOT to have myocardial infarction actually DO have myocardial infarction 0.2 percent of the time, which is dependent on the overall prevalence of myocardial infarction (i.e., for a given sensitivity, the false omission rate will be lower with lower disease prevalence);
  • False positive rate (1-specificity) of 24 percent means that patients who do NOT have pneumonia are misdiagnosed (called pneumonia) 24 percent of the time, which is nominally independent of the prevalence of pneumonia;
  • False discovery rate (1-positive predictive value) of 7 percent means those said TO have appendicitis actually do NOT have appendicitis 7 percent of the time, which is dependent on the overall prevalence of appendicitis (i.e., for a given specificity, the false discovery rate will be lower with higher disease prevalence).

The first two rates are related to false negatives, while the second two rates are related to false positives. The first and third (which are based on sensitivity and specificity, respectively) can be thought of as reflecting diagnostic accuracy “in principle.” The second and fourth (which are based on negative and positive predictive values, respectively) can be thought of as reflecting diagnostic accuracy “in practice.” False negative and false positive rates are more readily compared and aggregated across studies, because they are nominally153 prevalence independent. However, since prevalence of high-acuity illnesses such as myocardial infarction is likely to be relatively comparable across various EDs, the false omission and discovery rates are also likely to be reasonably compared and aggregated across studies with similar designs (inclusion criteria, diagnostic reference standards, outcome definitions, and outcome event ascertainment). More meaningful heterogeneity is expected for false omission and false discovery rates across settings with marked differences in disease prevalence (e.g., stroke in a pediatric versus adult ED).
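
The four rates defined above, and their differing dependence on prevalence, can be illustrated with a small calculation. This is a minimal sketch: the function name, the 95 percent specificity, and the two prevalence values are hypothetical choices for illustration; only the 1.5 percent false negative rate echoes a figure from the text.

```python
def error_rates(sensitivity, specificity, prevalence):
    """Return the four misclassification proportions for a diagnostic process."""
    tp = prevalence * sensitivity                # disease present, diagnosed
    fn = prevalence * (1 - sensitivity)          # disease present, missed
    tn = (1 - prevalence) * specificity          # disease absent, correctly cleared
    fp = (1 - prevalence) * (1 - specificity)    # disease absent, wrongly diagnosed
    return {
        "false_negative_rate": fn / (fn + tp),   # 1 - sensitivity
        "false_omission_rate": fn / (fn + tn),   # 1 - negative predictive value
        "false_positive_rate": fp / (fp + tn),   # 1 - specificity
        "false_discovery_rate": fp / (fp + tp),  # 1 - positive predictive value
    }

# Same diagnostic performance applied at two hypothetical prevalences.
high_prev = error_rates(sensitivity=0.985, specificity=0.95, prevalence=0.10)
low_prev = error_rates(sensitivity=0.985, specificity=0.95, prevalence=0.02)

for name in high_prev:
    print(f"{name}: {high_prev[name]:.4f} at 10% prevalence, "
          f"{low_prev[name]:.4f} at 2% prevalence")
```

The false negative and false positive rates are identical at both prevalences, while the false omission rate falls and the false discovery rate rises as prevalence drops.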

A commonly used method for identifying rates of diagnostic adverse events was the Symptom-disease Pair Analysis of Diagnostic Error (SPADE) approach.145 SPADE is a clinically valid, methodologically sound, statistically robust,154 and operationally viable155 method of identifying misdiagnosis-related harms from electronic health record or billing/administrative data, without the requirement of manual chart review (although chart review can inform root cause analysis if so desired). Most often the diagnostic adverse event examined is a subsequent short-term hospitalization for a dangerous disease, although mortality and other outcomes can also be assessed; sometimes an observed minus expected rate is calculated to account for the epidemiologic base rate of the disease in question. Because it relies on an adverse event, SPADE estimates more closely reflect the misdiagnosis-related harm rate and will generally identify substantially lower rates than the true diagnostic error rate (since only a subset of missed cases result in a short-term adverse event). SPADE can use either a look-back (case-control) or look-forward (cohort) architecture. The SPADE look-back approach (diseases to symptoms) works backwards from dangerous diseases (hospitalizations) to identify statistically anomalous (above baseline) patterns of antecedent symptomatic visits (ED treat-and-release with an incorrect, “benign” diagnosis). The look-back approach identifies specific symptoms or other clinical features (e.g., demographics) that increase risk for misdiagnosis, given the patient has the target disease; it also allows calculation of the false negative rate (and sensitivity) among those with the target disease. The SPADE look-forward approach (symptoms to diseases) works forwards from symptomatic ED visits with benign treat-and-release diagnoses to identify statistically anomalous (above baseline) patterns of subsequent hospitalizations for dangerous diseases. 
The look-forward approach, for a given symptom, identifies specific diseases that are misdiagnosed at excess rates, accounting for real-world prevalence; it permits calculation of the false omission rate (and negative predictive value) among those said not to have the target disease.
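
The observed-minus-expected idea mentioned above can be sketched for a look-forward design. This is a simplified illustration, not the published SPADE procedure: the function name, the counts, and the baseline rate are all hypothetical, and a real analysis would also need a defined time window, a comparison group, and a test of statistical significance.

```python
def excess_event_rate(observed_events, index_visits, expected_rate):
    """Observed short-term event rate minus an expected (baseline) rate.

    observed_events: hospitalizations for the target disease within the
        follow-up window after a benign treat-and-release diagnosis
    index_visits: number of symptomatic treat-and-release ED visits
    expected_rate: baseline rate of the disease over the same window
    """
    observed_rate = observed_events / index_visits
    return observed_rate - expected_rate

# Hypothetical example: 10,000 treat-and-release visits for a symptom,
# 30 subsequent hospitalizations for the target disease, and a baseline
# rate of 5 per 10,000 over the same window.
excess = excess_event_rate(observed_events=30, index_visits=10_000,
                           expected_rate=0.0005)

print(f"Excess (potentially misdiagnosis-related) event rate: {excess:.2%}")
```

The excess rate approximates a harm-based false omission rate for the symptom-disease pair; it counts only missed cases that led to a captured adverse event.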

Overall, we identified 128 studies which addressed ED diagnostic error rates for 12 of the diseases prespecified in our study protocol. The number of studies was not distributed evenly by disease, with by far the most for stroke. There were many more studies of false negatives than false positives. The majority of false negative-related studies examined the initial ED false negative rate (1-sensitivity) among all patients hospitalized with a dangerous disease; almost all either conducted a detailed chart review to identify misdiagnoses or used a look-back SPADE approach for recent prior treat-and-release visits in large administrative databases. A few looked at the false omission rate (i.e., labelled as disease absent when it was present, calculated as 1-negative predictive value) among all patients discharged with a particular symptom, generally via look-forward SPADE approach, relying on a subsequent hospitalization or similar trigger. Almost all of the false positive-related studies looked at the false discovery rate (i.e., labelled as disease present when it was absent, calculated as 1-positive predictive value) in admitted patients, rather than the false positive rate (1-specificity) which would require data on all patients without the target disease (including those who were discharged from the hospital).

Variation in diagnostic error rates by disease was striking, with the lowest per-disease diagnostic error rates being for myocardial infarction (pooled false negative rate of 1.5%), and most of the remaining key dangerous diseases initially missed at rates of 10 to 36 percent (Table 9). There appears to be a roughly inverse relationship between annual disease incidence and diagnostic error rates, although myocardial infarction is clearly a low outlier in this regard (Figure 3). The highest per-disease diagnostic error/harm rates were almost certainly for spinal abscess (56% false negative rate, n=66 of 119), but these rates were derived from a single, high-quality study which was ultimately excluded from the final analysis because ED cases could not be separated from those missed in ambulatory care clinics, and the relative proportion seen in the ED remained unknown (despite successful outreach to study authors). The result is mentioned here because the findings were roughly comparable to those found in an older, fully ED-based study that found a 75 percent false negative rate (n=47 of 63).23 That study, which included cases from 1992 to 2002, was excluded from the systematic review because more than half of the cases were presumed to fall prior to the study period (2000 to 2021) and no subgroup analysis was provided describing the more recent cases included in the study. It is also relevant to the validity of this estimated rate that spinal abscess is a rare disease, with fewer than 20,000 cases per year in the United States; it would be difficult for such a rare condition to make the top 15 list of serious misdiagnosis-related harms in ED malpractice claims if errors were not frequent or subsequent serious misdiagnosis-related harms not the norm.

Effects of diagnostic error on health outcomes, as reported, were mixed, including some studies that identified null effects or even paradoxically “protective” effects of misdiagnosis156 after failing to adequately case mix adjust based on initial severity of illness. Nevertheless, increases in misdiagnosis-related mortality were synthesized for aortic dissection (21% relative increase) and reported in individual studies for stroke (ischemic stroke and subarachnoid hemorrhage), venous thromboembolism, and arterial thromboembolism (mesenteric ischemia).

Table 9. Summary of per-disease diagnostic error rates.

Figure 3 shows nine of the top fifteen conditions causing serious misdiagnosis-related harms in the emergency department from KQ1. These conditions in order of annual U.S. incidence are: pneumonia, sepsis, myocardial infarction, stroke, venous thromboembolism, arterial thromboembolism, aortic aneurysm and dissection, meningitis and encephalitis, and spinal abscess. The false negative rate varies considerably by disease. The lowest false negative rate is for myocardial infarction (less than 10%) and the highest false negative rate is for spinal abscess (over 50%).

Figure 3

Relationship between annual U.S. incident cases of disease and estimated ED miss rate. ED = emergency department; KQ = Key Question. Shown are 9 of the top 15 diseases associated with serious misdiagnosis-related harms in the ED from KQ1. The other six (more...)

Stroke

We identified 50 studies (28 of these U.S.-based) that reported on the rate of diagnostic errors and/or misdiagnosis-related harms among over 1.9 million patients with cerebrovascular events.55, 64, 66, 69, 85, 87, 88, 120, 122, 144, 159–198 Studies varied significantly in the methodological approach, definitions to assess diagnostic errors, target populations, and inclusion/exclusion criteria. Most of the studies had a low risk of bias. However, 22 studies had an unclear or high risk of bias in terms of patient selection,177, 179, 182, 194 the reference standard,165, 167–169, 177–179, 181, 182, 186, 188, 195, 198 or the patient flow.160, 161, 163, 176, 182, 189, 195, 196

Stroke: False Negatives

Twenty-three studies reported on the false negative rate (1-sensitivity) for stroke.64, 66, 87, 88, 120, 122, 159, 160, 169, 171, 173–175, 177–181, 183, 186, 194–196 Fourteen of these were sufficiently comparable to conduct a meta-analysis (Figure 4).66, 85, 87, 88, 159, 160, 171, 178, 179, 181, 183, 186, 194, 196 After contacting the authors, two of these were largely overlapping (Morgenstern, 2004 and Kerber, 2006 [dizziness subgroup]), so we excluded Kerber, 2006 from this meta-analysis. The pooled false negative rate was 15 percent (95% CI 9 to 23; I-squared 99%), with no clinically meaningful or statistically significant heterogeneity based on whether the study included only ischemic stroke, focused on subarachnoid hemorrhage, or had a mixed population that included ischemic strokes and intracranial hemorrhages (high SOE for false negative rate). The highest estimate (false negative rate 40%, 95% CI 38 to 42) was from a large U.S.-based study of patients (n=2303) admitted from the ED with non-stroke diagnoses who were discharged from the hospital with strokes of mixed subtypes, including transient ischemic attack (Chompoopong, 2017).85 Authors acknowledged the limitation that some cases may have involved strokes occurring during hospitalization (i.e., not present at the time of admission). The lowest estimate (false negative rate 2%, 95% CI 1 to 3) was from a large Swiss study of patients (n=2200) of only ischemic strokes derived only from patients admitted to the stroke unit or intensive care units (Richoz, 2015).186 Focusing on strokes admitted to stroke or intensive care tends to inflate diagnostic accuracy and reduce estimates of diagnostic error.
Authors acknowledged the limitation that their estimate may have been low because some strokes may never have been detected; their methods note that MRI was not routinely performed—“Systematic diffusion-weighted MRI is not performed in all patients with new neurologic disease in our ED.” In a severity-adjusted analysis, they found worse outcomes and greater mortality.
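To make the pooled rate and the I-squared statistic concrete, the following is a minimal sketch of random-effects pooling of proportions. This section does not state which transformation or estimator the review used, so the choice here (logit transform with a DerSimonian-Laird between-study variance) is an assumption, and the study counts are hypothetical rather than taken from Figure 4.

```python
import math


def expit(x):
    """Inverse logit: map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_dl(events, totals, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale)."""
    y, v = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:          # continuity correction for empty cells
            e, n = e + 0.5, n + 1.0
        y.append(math.log(e / (n - e)))        # logit of the study proportion
        v.append(1.0 / e + 1.0 / (n - e))      # its approximate variance
    w = [1.0 / vi for vi in v]                 # fixed-effect weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]     # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": expit(y_re),
        "ci": (expit(y_re - z * se), expit(y_re + z * se)),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical missed-stroke counts and confirmed-stroke totals for four studies.
result = pool_logit_dl(events=[30, 5, 120, 44], totals=[200, 250, 300, 400])
print(f"pooled FNR {result['pooled']:.1%}, "
      f"95% CI {result['ci'][0]:.1%} to {result['ci'][1]:.1%}, "
      f"I-squared {result['i2']:.0f}%")
```

When study-level rates differ as widely as in this hypothetical input (2 to 40 percent), I-squared approaches 100 percent and the confidence interval around the pooled rate widens accordingly, which is the pattern reported for the stroke meta-analysis.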

We further analyzed false negatives excluding any studies with strong case selection filters likely to bias estimates away from the true overall cerebrovascular false negative rate (Figure 5). For this analysis, we excluded 2 studies selecting on case features that confer higher illness severity, which tends to bias towards lower error rates (Richoz, 2015 [stroke unit/intensive care unit admissions]186; Pihlasviita, 2018 [stroke code activations for possible thrombolysis]183). We also excluded 5 studies selecting on case features known to increase false negative risk—3 studies selecting only for younger stroke patients ages 16-50 (Kuruvilla, 2011171; Mohamed, 2013178; Bhattacharya, 2013160) and 2 studies selecting on case features linked to posterior circulation stroke (Kerber, 2006 [dizziness]168; Calic, 2016 [cerebellar stroke location]87). The resulting false negative rate point estimate (17%, 95% CI 9 to 27) was slightly higher than the overall point estimate prior to removing these potentially biased studies (15%, 95% CI 9 to 23), but each point estimate fell well within the other’s 95 percent confidence interval. The 17 percent estimate shown in Figure 5 (low selection bias) is more likely to be representative of the real-world ED rate. Because most studies did not capture missed strokes among ED treat-and-release patients or account for missed TIAs (which have higher error rates), this estimate is likely conservative.

Most of the studies did not compare diagnostic accuracy for TIA to that for acute ischemic stroke, but Whiteley, 2011 provided data that permitted such a calculation. Their results suggest that TIAs are more often missed than ischemic strokes (false negative rate 37.8% [n=14/37] for TIA versus approximately 20.8% [n=41/197, assuming equal proportion of hemorrhages among missed cases as overall], p=0.025). However, Morgenstern found that TIA did not predict greater odds of a false negative (odds ratio [OR] 1.02, 95% CI 0.71 to 1.46).
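The TIA versus ischemic stroke comparison can be checked from the counts quoted above (14 of 37 versus 41 of 197). The source does not say which test produced p=0.025; the sketch below assumes an uncorrected Pearson chi-square test with one degree of freedom, and the `two_proportion_chi2` helper is a name introduced here for illustration.

```python
import math


def two_proportion_chi2(miss_a, n_a, miss_b, n_b):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    a, b = miss_a, n_a - miss_a      # group A: missed, not missed
    c, d = miss_b, n_b - miss_b      # group B: missed, not missed
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


tia_rate = 14 / 37        # missed TIAs
stroke_rate = 41 / 197    # missed ischemic strokes (approximate, per the text)
chi2, p_value = two_proportion_chi2(14, 37, 41, 197)
print(f"TIA {tia_rate:.1%} vs stroke {stroke_rate:.1%}, "
      f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Under that assumption the computed p-value is close to the reported 0.025; a continuity-corrected or exact test would give a somewhat larger value.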

Figure 4 displays a forest plot of the false negative rate of diagnosing stroke in the emergency department by stroke subtype (ischemic stroke, subarachnoid hemorrhage, and mixed subtypes). Nine studies reported on the false negative rate for ischemic stroke. The pooled false negative rate was 14% (95% CI, 8% to 22%). One study reported a false negative rate of diagnosing subarachnoid hemorrhage of 12% (95% CI, 9% to 17%). Four studies reported on the false negative rate for mixed subtypes of stroke. The pooled false negative rate was 18% (95% CI, 4% to 39%). Overall, the pooled false negative rate of diagnosing any type of stroke was 15% (95% CI, 9% to 23%).

Figure 4

False negative rate for stroke in the emergency department by stroke subtype. CI = confidence interval; ES = effect summary (false negative rate); FN = number of false negatives; SAH = subarachnoid hemorrhage; TP = number of true positives; U.S. = United (more...)

Figure 5 displays a forest plot of the false negative rate of diagnosing stroke among studies with low selection bias. The pooled false negative rate of diagnosing ischemic stroke among the four studies with low selection bias was 15% (95% CI, 11% to 20%). One study reported a false negative rate of diagnosing subarachnoid hemorrhage of 12% (95% CI, 9% to 17%). The pooled false negative rate of diagnosing mixed subtypes of stroke among the three studies with low selection bias was 23% (95% CI, 4% to 50%). Overall, the pooled false negative rate of diagnosing any type of stroke among the studies with low selection bias was 17% (95% CI, 9% to 27%).

Figure 5

False negative rate for stroke among studies with low selection bias, by stroke subtype. CI = confidence interval; ES = effect summary (false negative rate); FN = number of false negatives; SAH = subarachnoid hemorrhage; TP = number of true positives; (more...)

Stroke False Negatives: Younger Patients

Three studies included younger adult populations of stroke cases (16 to 50 years) and reported on missed cerebrovascular accidents. The pooled false negative rate of cerebrovascular accidents was 14 percent (95% CI 10 to 19, I-squared 0%).160, 171, 178 All three studies found higher rates of misdiagnosis among younger patients (either <35 or <40 years of age) within their already “young stroke” cohorts. Another study investigated delayed diagnosis of stroke specifically among children <18 years of age and reported that 65 percent of cases were diagnosed ≥6 hours after hospital arrival and 23 percent were diagnosed after 24 hours.177 So, although the measured rate of 14 percent for patients aged 16 to 50 is nominally lower than the overall false negative rate for stroke of 17 percent obtained from other studies, this may be artifactual and related to methods or other inter-study differences.

A study using SPADE methods (which assess misdiagnosis-related harms, rather than diagnostic error rates, per se, since detection is based on diagnostic adverse/unexpected events) found that patients 18 to 44 years of age were 6.7-fold more likely to suffer a missed opportunity antecedent to a stroke hospitalization than their older counterparts ages 75 and above (3.98% vs. 0.59% with P < 0.001 for differences across age groups).64 The same study reported (in its supplemental “Appendix 2”) limited details on those under age 18, but, compared with those 18 and over, the odds of a misdiagnosis appeared to be greater. Specifically, the observed to expected ratio for antecedent ED treat-and-release visits for headache prior to a stroke hospitalization was 1.9 for adults and 11.0 for children.

Included studies did not permit a meta-analytic assessment of the overall rate of stroke misdiagnosis in pediatric populations, but available studies do seem to suggest that younger age is a strong risk factor for diagnostic error and associated adverse events, with the youngest patients (who have the lowest overall risk of stroke) having the highest risk of being missed.

Stroke False Negative: Special Stroke Subtypes (Subarachnoid Hemorrhage)

Three studies reported on false negatives in patients with subarachnoid hemorrhage. A prospective cohort study (n=401) from Western Europe reported 26 percent missed subarachnoid hemorrhage diagnosis, although the cohort included cases misdiagnosed outside of the ED, and the false negative rate in the ED was lower (12%) than the aggregate rate.181 When adjusted for initial case severity (i.e., restricting to patients with mild initial clinical presentations [Hunt and Hess grade 1 or 2], who comprised 59% of all cases), misdiagnosed patients had a 3.89-fold increased odds (95% confidence interval 1.9 to 8.0) of a poor clinical outcome. Two studies used look-back SPADE-style methods to assess diagnostic adverse events (i.e., the subset of false negative cases requiring hospitalization after an initial misdiagnosis). One retrospective cohort study reported 3.5 percent missed cases (observed minus expected ED visits within the last 45 days for patients ultimately hospitalized with subarachnoid hemorrhage) using Medicare data.120 Another study reported a 5.4 percent miss rate for subarachnoid hemorrhage based on retrospective data from ED visits in the 14 days prior to hospital admission but, importantly, demonstrated wide variability across institutions (false negative rates ranged from 0-100% across 147 EDs); they found a paradoxical “misdiagnosis is protective” association between missed diagnosis and better health outcomes using crude (unadjusted) 30-day mortality, but were not able to adjust for initial case severity due to the lack of clinical details, leaving unanswered the question of whether earlier diagnosis may have actually improved outcomes in this cohort.195

Stroke False Negative: Special Stroke Subtypes (Dissection, Cerebral Venous Thrombosis)

One study reported that 3.1 percent of patients with cervicocephalic dissection were treated-and-released from the ED in the prior 14 days with related symptoms.175 Two studies included cases with cerebral venous thrombosis: one reported 3.6 percent misdiagnosis. They found longer length of hospital stay among misdiagnosed cases, but no unfavorable outcome (again, unadjusted for initial case severity). They also did a chart review on a smaller group of patients with cerebral venous thrombosis and found 6 percent missed diagnosis.174 The other study reported 20.8 percent diagnostic error rate among cerebral venous thrombosis cases, using the “Safer Dx” Instrument.173 Without adjusting for initial case severity, they found worse health outcomes among cases without a diagnostic error compared to those with a diagnostic error (28.6 versus 0%, P = 0.05),173 again reflecting the apparent “misdiagnosis is protective paradox.” One study of stroke in polytrauma patients found that 11 of 192 patients (5.7%) had acute ischemic strokes, all of which were initially missed (and no neurologic consultations were obtained initially, despite neurologic findings being noted in four cases); the underlying cause for acute ischemic stroke was discovered by neurovascular imaging to be craniocervical dissection in six cases (two carotid artery, four vertebral artery); median time to diagnosis was 2 days (range 0 to 5).198

Stroke False Negatives: Symptom-Specific Populations (Dizziness and Headaches)

Stroke patients presenting with dizziness or headache symptoms are prone to be missed. Dizziness increases the odds of misdiagnosis 14-fold over motor symptoms, and those with dizziness and vertigo are missed initially in an estimated 40 percent of cases.21 A large, population-based stroke surveillance program in Texas used ED chart review by neurologists (including hospitalization and imaging results) to validate stroke diagnoses, demonstrating that 46 of 1629 patients (2.8%) with a presenting complaint of dizziness had strokes, of which 16 (35%) were misdiagnosed in the ED.168 The same study found that only 5 of 15 cases with isolated dizziness admitted as stroke or TIA from the ED were validated as stroke (i.e., false discovery rate of 67%). A second study from the same Texas cohort followed ED patients with dizziness who initially received a non-stroke diagnosis for a median period of 347 days, reporting a stroke incidence rate of 13.2 per 1000 person years (1.32%); this study found that most of that risk occurred in the first 48 hours after ED treat-and-release.169 A separate U.S.-based study found that stroke hospitalizations were enriched in the 30 days following an ED dizziness discharge, with a false omission rate of 0.3 percent for these diagnostic adverse events; the 180-day cumulative incidence of a major vascular event or death was 0.93 percent.170
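The three percentages quoted from the Texas dizziness cohort each use a different denominator, which is easy to lose track of in prose. The short sketch below recomputes them from the counts given in the text; the variable names are introduced here for illustration only.

```python
# Counts as quoted in the paragraph above (Texas surveillance cohort).
dizziness_presentations = 1629   # ED visits with dizziness as presenting complaint
validated_strokes = 46           # of those, validated as stroke
missed_in_ed = 16                # validated strokes misdiagnosed in the ED
admitted_as_stroke = 15          # isolated dizziness admitted as stroke or TIA
validated_among_admitted = 5     # of those admissions, validated as stroke

# Denominator: all dizziness presentations.
stroke_prevalence = validated_strokes / dizziness_presentations
# Denominator: all true strokes (1 - sensitivity).
false_negative_rate = missed_in_ed / validated_strokes
# Denominator: all cases labelled stroke or TIA (1 - positive predictive value).
false_discovery_rate = (
    (admitted_as_stroke - validated_among_admitted) / admitted_as_stroke
)

print(f"stroke prevalence among dizziness visits: {stroke_prevalence:.1%}")
print(f"false negative rate: {false_negative_rate:.0%}")
print(f"false discovery rate: {false_discovery_rate:.0%}")
```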

Isolated ED headaches appear to be a risk factor for misdiagnosis of both ischemic stroke and intracranial hemorrhage (both intracerebral and subarachnoid).64 A U.S.-based study found that among cases with an ED visit because of headache, 0.3 percent had ischemic stroke and 0.4 percent had any cerebrovascular disease within 1 year.176 Similarly, a single-site study in the United States with regional follow-up showed that 0.6 percent of patients discharged with a benign headache diagnosis from ED had subsequent cerebrovascular disease hospitalization within 1 year.144 Another U.S.-based study using state-level data found that stroke cases were enriched in the 30 days following an ED headache visit, with a false omission rate of 0.2 percent (n=4,253 of 2,101,081 headache discharges) (high SOE for false omission rate).81

Stroke False Negatives: Impact on Care and Outcomes

Multiple studies demonstrated cases who missed acute stroke interventions because of an initial missed or delayed diagnosis, including younger patients.88, 159, 171, 183, 194 In one large study of 2,027 confirmed acute ischemic strokes, 1.1 percent of misdiagnosed cases did not receive tissue plasminogen activator, despite being eligible; as expected, the number of cases eligible for tissue plasminogen activator was smaller in the misdiagnosed group than the correctly diagnosed group.88 However, another study reported that in 22 percent of misdiagnosed ischemic stroke cases, the error resulted in missed or delayed tissue plasminogen activator administration.159 This rate was slightly higher in community hospitals compared to academic centers, although the difference was not statistically significant. With regards to possible harms, misdiagnosed cases were readmitted within the next 60 days almost twice as often as correctly diagnosed cases.159

Using Medicare data and a SPADE-style look-back approach (reflecting diagnostic adverse events), the false negative rate antecedent to stroke hospitalization (defined as observed minus expected prior ED visits) was reported to be 4.1 percent (95% CI 4 to 4.2) within the last 45 days and 3.7 percent (95% CI 3.7 to 3.8) within the last 30 days.120 These cases reflect potential missed opportunities to have prevented major stroke after minor stroke or TIA.
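The "observed minus expected" definition used by look-back studies of this kind can be sketched as follows. How the expected count is derived is not described in this section, so the sketch simply takes it as an input (for example, from a baseline period), and all numbers are hypothetical rather than the Medicare figures.

```python
def excess_visit_rate(index_hospitalizations, observed_prior_visits,
                      expected_prior_visits):
    """Look-back estimate of probable missed cases.

    index_hospitalizations: patients hospitalized with the target disease
    observed_prior_visits:  those with an ED treat-and-release visit in the
                            look-back window (e.g., prior 45 days)
    expected_prior_visits:  visits expected in that window from baseline ED
                            use alone (assumed to be supplied by the caller)
    """
    excess = observed_prior_visits - expected_prior_visits
    return excess / index_hospitalizations


# Hypothetical inputs: 100,000 stroke hospitalizations, 9,100 antecedent ED
# visits observed in the window, 5,000 expected from baseline utilization.
rate = excess_visit_rate(100_000, 9_100, 5_000)
print(f"estimated missed-opportunity rate: {rate:.1%}")
```

Because only the excess over baseline is counted, unrelated ED visits that happen to precede a hospitalization are not attributed to misdiagnosis.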

Four studies, one from Australia and three from Western Europe, reported on missed ischemic and hemorrhagic strokes among 5,130 patients and analyzed stroke functional outcomes or mortality. The pooled rate of missed ischemic and hemorrhagic strokes was 7 percent (95% CI 3 to 14, I-squared 98%).181, 183, 186, 194 Using chart review (and an analysis unadjusted for initial severity), an Australian study found no association between missed diagnosis and worse outcome, but a subgroup of misdiagnosed cases who were admitted under non-neuro service showed worse functional outcomes (Modified Rankin Scale score ≥3, 80% versus 41%, P < 0.0001) and greater in-hospital mortality (15% versus 4%, P = 0.002) when compared to those admitted under the neurology service that was robust to adjustment for initial stroke severity.194 Using chart review (and an analysis unadjusted for initial severity), a study from Finland found that 0.8% of misdiagnosed cases could have possible or likely worsened outcome, but no deaths were attributed to misdiagnosis.183 A Swiss registry-based study with prospective data collection (Richoz, 2015) found in a multivariate, adjusted analysis (which included initial stroke severity) that favorable outcomes were 4.8-fold less likely and mortality 4.3-fold more likely among those with misdiagnosed acute ischemic strokes.186 As noted in the subarachnoid hemorrhage section above, when adjusted for initial case severity (limiting to Hunt and Hess Grade 1 or 2, n=236), misdiagnosed subarachnoid hemorrhage patients with an initially mild clinical presentation had a 3.89-fold increased odds (95% CI 1.9 to 8.0) of a poor clinical outcome.181

Stroke: False Positives

Nineteen studies reported on the false discovery rate (1-positive predictive value) for strokes.69, 161, 164–168, 179, 180, 182, 184, 185, 187, 190–193, 196, 197 After contacting the authors, two of these were largely overlapping (Morgenstern, 2004 and Kerber, 2006 [dizziness subgroup]), so we excluded Kerber, 2006 from this meta-analysis. We have analyzed these by stroke type (Figure 6). The pooled false discovery rate was 21 percent (95% CI 14 to 29), but there was clinically meaningful and statistically significant heterogeneity based on stroke subtype. The most obvious difference was that TIAs were falsely positive at a much higher rate (49%, 95% CI 33 to 64) than ischemic strokes (10%, 95% CI 6 to 16) or brain hemorrhages (10%, 95% CI 7 to 12) (high SOE for false discovery rate). These differences are not surprising, since it is much more challenging to correctly diagnose TIA than completed stroke. The small differences between ischemic stroke, subarachnoid hemorrhage, and intracranial hemorrhage (Holland, 2015 in Figure 6) may be explained by frequent use of computed tomography (CT) scans (often obtained in the ED for neurological symptoms), which are substantially more sensitive for detection of hemorrhages than ischemic strokes.199 Patients treated with thrombolysis using tissue plasminogen activator were false positives less frequently (erroneous treatment 3.9%, n=13/331).183

A single U.S.-based study reported on stroke false discovery rate in the pediatric population by investigating stroke alerts (activated when patient presents with symptoms or signs suggestive of stroke or TIA, prompting rapid neurology stroke evaluation). They found that in 74.2 percent of pediatric stroke alerts, the correct diagnosis was not stroke or TIA. Given the design of this study, the high rates are unsurprising, since stroke alert calls are similar to requesting a neurology consultation for suspected stroke, rather than assigning a diagnosis, per se.172 Nevertheless, this high rate could also potentially reflect (a) a lower threshold for ordering a stroke consultation among children with neurological symptoms, (b) generally low stroke prevalence among children, or (c) increased probability of a false positive misdiagnosis.

As with false negatives, false positives appear to be disproportionately common among patients presenting to the ED with dizziness and vertigo—one included study from Western Europe found 31 percent of benign ear causes were initially misdiagnosed as stroke.129

Figure 6 displays a forest plot of the false discovery rate of diagnosing stroke in the emergency department by stroke subtype (intracranial hemorrhage, subarachnoid hemorrhage, stroke, transient ischemic attack, and mixed subtypes). One study reported on the false discovery rate of intracranial hemorrhage (pooled false discovery rate, 7%; 95% CI, 4% to 10%). One study reported a false discovery rate of subarachnoid hemorrhage of 12% (95% CI, 9% to 16%). The pooled false discovery rate of stroke from 9 studies was 10% (95% CI, 6% to 16%). The pooled false discovery rate for transient ischemic attack from 5 studies was 49% (95% CI, 33% to 64%). The pooled false discovery rate of diagnosing mixed subtypes of stroke from 4 studies was 21% (95% CI, 13% to 31%). Overall, the pooled false discovery rate for diagnosing any type of stroke was 21% (95% CI, 14% to 29%).

Figure 6

False discovery rate (referred/admitted) for stroke in the emergency department by stroke subtype. CI = confidence interval; ES = effect summary (false discovery rate); FP = number of false positives; ICH = intracranial hemorrhage; SAH = subarachnoid (more...)

Stroke Misdiagnosis: Imaging-Focused Studies

We identified five studies that focused heavily on imaging aspects of stroke, including studies of imaging timeliness, radiology accuracy, and the relationship between use of CT and the likelihood of misdiagnosis. A U.S.-based study reported that only 11.5 percent of patients with suspected stroke received a head CT scan within 25 minutes and the remainder (88.5%) received delayed imaging workup.188 Two studies focused on the accuracy of radiology reads of vascular imaging. One reported missed intracranial aneurysm diagnosis in initial radiology resident reads in 13 percent of cases with subarachnoid hemorrhage caused by intracranial aneurysms.55 The other reported that 20 percent of large vessel occlusions were missed on initial radiology read among cases with ischemic stroke caused by large vessel occlusion.162 They found that radiologists not subspecializing in neuroradiology were more likely to miss large vessel occlusions compared to neuroradiologists (OR 5.6; 95% CI 1.1 to 29.9; P = 0.04).

Two other studies focused on the link between head CT scan use and the likelihood of a missed stroke diagnosis. Both found ED treat-and-release visits resulting in non-cerebrovascular diagnoses were more likely to be followed by a stroke hospitalization after negative CT scans than among patients without CT scans. A SPADE-style regional study in Canada looked into subsequent strokes among a group of patients who had been discharged from ED with a peripheral vertigo diagnosis and had undergone head CT in that visit. They found that the frequency of stroke occurrence within 30, 90, and 365 days was 0.29 percent, 0.41 percent, and 0.60 percent, respectively. These rates were all higher versus a propensity-score matched control group who had not undergone head CT during their ED visit (0.15%, 0.20%, and 0.36%, respectively) (OR 2.27 for likelihood of 30-day stroke hospitalization [95% CI, 1.12–4.62]).163 A U.S.-based study assessed the risk of future stroke among older patients (60 to 89 years of age) discharged from the ED who had neurological symptoms but were not given a diagnosis of stroke or TIA. They divided these patients into four groups based on presence of symptoms suggestive of stroke or TIA and whether head CT was performed in the ED. The groups were symptom absent/CT absent, symptom present/CT absent, symptom absent/CT present, and symptom present/CT present. The 1-year risk of stroke occurrence was highest in the symptom present/CT present group (2.54%), compared with symptom absent/CT present (1.09%), symptom present/CT absent (0.69%), and symptom absent/CT absent (0.54%) groups. Additionally, the symptom present/CT present group also had a higher risk of stroke occurrence within the shorter 30- and 90-day periods, when compared to other groups.189 These studies suggest that ED clinicians may be accurately risk-stratifying patients at higher risk for stroke, but may then be falsely reassured that a negative head CT scan has “ruled out” ischemic stroke.

Stroke: Summary

There is a large body of evidence on diagnostic accuracy for stroke in the ED. Results are heterogeneous, but generally in predictable ways. False negatives are much more common than with other similarly prevalent diseases (see Myocardial Infarction, below). The overall measured false negative diagnostic error rate is 17 percent, with errors being most frequent for TIAs, next for acute ischemic strokes, and last for intracranial hemorrhages. Error rates are strongly influenced by presenting clinical symptoms, with “typical” unilateral motor and sensory symptoms or signs being protective against error and “atypical” or otherwise non-specific symptoms (e.g., dizziness or headache) substantially increasing risk of false negatives. As a result, patients with posterior circulation strokes are much more likely to be missed, as are those with lower stroke scale severity scores. Similarly, the degree of diagnostic difficulty (and resulting error rate) is increased when patients are themselves “atypical” (especially those under age 40 or without vascular risk factors) or there are distracting case features (e.g., polytrauma198). CT scans appear to provide false reassurance that ischemic strokes have been “ruled out,” increasing the risks of false negative diagnostic errors. Functional outcomes and mortality are worse among those misdiagnosed when patients of similar stroke severity are considered, but this effect is typically masked or even reversed (“misdiagnosis is protective paradox”) when cases are left unadjusted for initial stroke severity. Poor outcomes are 4- to 5-fold more frequent when lower severity strokes are initially missed. The overall measured false positive diagnostic error rate is 21 percent, with these errors also most frequent for TIAs, next for acute ischemic strokes, and last for intracranial hemorrhages. False positives are also probably more common among those with atypical or non-specific symptoms (e.g., dizziness) and among younger patients.

Myocardial Infarction

We identified 15 studies that reported on the rate of diagnostic errors among 869,711 patients presenting with myocardial infarction to the ED.25, 63, 65, 77, 120, 200–209 The risk of bias of included studies was generally low.

Myocardial Infarction: False Negatives

Six studies25, 63, 77, 120, 202, 206 assessed diagnostic false negative rates for myocardial infarction in routine ED practice, five of which used variations of the SPADE look-back approach, based on recent ED treat-and-release visits antecedent to a hospitalization for confirmed myocardial infarction.145 All were based on either regional or insurance-based capture of both hospitalizations and antecedent ED visits. A meta-analysis was conducted to synthesize those five similarly designed studies25, 63, 77, 120, 202 and estimated a false negative rate of 1.5 percent (95% CI 1.0 to 2.2; I-squared 99.7%; Figure 7; high SOE for false negative rate). These studies indicate that very few patients who are ultimately hospitalized with myocardial infarction are discharged from the ED in the 7 to 30 days prior. They do not address patients whose myocardial infarctions may have been mild or silent, never requiring hospitalization. Thus, these studies more closely reflect misdiagnosis-related harm rates than diagnostic error rates, per se.

We can assess the relationship between the measured harm rates and false negative diagnostic error rates from a large (n=10,689), U.S.-based prospective, randomized trial with 99 percent follow-up of patients, which did not meet our entry criteria because it was conducted in 1993.22 That study (Pope, 2000), which used serial measurement of creatine kinase myocardial band (CK-MB) as the biomarker, found the missed myocardial infarction rate was 2.1 percent (95% CI 1.1 to 3.1). Even in the oldest study included in our meta-analysis (Schull, 2006, patients 2002-2003), troponin tests were available around-the-clock at 55 percent of 153 EDs responding to a study survey (survey response rate 89.5%, n=153 of 171).25 Given the advances in diagnostic testing for myocardial infarction between the time of the Pope et al. study and the studies included in our meta-analysis, it would be expected that missed myocardial infarction rates in the ED would have fallen (i.e., would be below the 2.1% rate identified in the 1993 randomized trial). Accordingly, a measured misdiagnosis-related harm rate of 1.5 percent in our meta-analysis is probably quite close to the false negative diagnostic error rate for myocardial infarction, at least in absolute terms. If the true myocardial infarction false negative diagnostic error rate is 2 percent, then even though the error rate is about 33 percent higher than the measured 1.5 percent in relative terms, the absolute difference is just 0.5 percentage points. Thus, diagnostic error and harm rates for myocardial infarction appear to be low enough that the gap between the two values is at the level of rounding error.

Figure 7 displays a forest plot of the false negative rate for myocardial infarction in the emergency department. The pooled false negative rate from five studies was 1.5% (95% CI, 1% to 2.2%).

Figure 7

False negative rate (initially discharged) for myocardial infarction in the emergency department. CI = confidence interval; ES = effect summary (false negative rate); FN = number of false negatives; TP = number of true positives; U.S. = United States (more...)

The sixth and final study (Graff, 2006) also began with a cohort of patients hospitalized for myocardial infarction, but addressed patients admitted, rather than discharged, from the ED. They found that 25.6 percent of myocardial infarction patients were admitted with a non-acute coronary syndrome diagnosis.206 These were mostly other cardiac diagnoses (17.6%) with respiratory diagnoses next (4.7%) and a small proportion of all other diagnoses (3.3%). Patients admitted with non-specific chest pain or coronary artery disease diagnoses were classified with more specific acute coronary syndrome admitting diagnoses (acute myocardial infarction and unstable angina) in the three-fourths of patients who were “correctly” diagnosed on admission (i.e., ED admitting diagnoses were underspecified). The authors found that the non-acute coronary syndrome admitting diagnoses (“diagnostic delay”) were associated with substantially lower quality care (substantially fewer evidence-based therapies applied, including 17% as opposed to 39% undergoing cardiac catheterization) than their counterparts with myocardial infarction who had no delay in diagnosis. Taken together with the meta-analytic results shown in Figure 7, this suggests that EDs are only rarely “missing” heart attacks outright, but diagnostic delays among admitted patients are perhaps substantially more frequent.

One SPADE look-forward study reported that among 325,088 patients who were discharged from the ED with a diagnosis of chest pain or dyspnea, 508 (0.2%) returned to the hospital and were diagnosed with a myocardial infarction (high SOE for false omission rate).77 In one additional prospective study of 1114 patients admitted to three academic ED chest pain units, 991 were discharged after a negative chest pain work up and 0.4 percent developed acute coronary syndrome within 45 days.210 Finally, a Canadian population-based study looked at 498,291 patients aged 40 years old or older who presented to an ED with chest pain and were discharged after assessment. Overall, 0.7 percent of patients were hospitalized within 30 days for myocardial infarction or unstable angina and 0.2 percent died. This study also demonstrated that higher ED volume was associated with significantly lower adjusted OR for mortality or acute coronary syndrome at 30 days.211

Myocardial Infarction: False Positives

Three studies assessed the diagnostic false discovery rate for myocardial infarction in routine ED practice.201, 207, 209 All three focused on false positive ST-elevation myocardial infarction, using patients referred for immediate cardiac catheterization who were determined not to be having ST-elevation myocardial infarction (STEMI). A meta-analysis produced a false discovery rate of 14 percent (95% CI 7 to 22; I-squared 95%; Figure 8; low SOE for false discovery rate) based on those three studies. No evidence on heterogeneity due to country, recruiting period, or clinician training could be detected due to the small number of included studies.

Figure 8 displays a forest plot of the false discovery rate for acute ST-elevation myocardial infarction in the emergency department. The pooled false discovery rate from three studies was 14% (95% CI, 7% to 23%).

Figure 8

False discovery rate (cardiac catheterization) for acute STEMI in the emergency department. CI = confidence interval; ES = effect summary (false discovery rate); FP = number of false positives; STEMI = ST-elevation myocardial infarction; TP = number of (more...)

Myocardial Infarction: Other Studies

Three studies assessed the diagnostic accuracy of specific symptoms for myocardial infarction, such as chest pain and atypical symptoms.65, 200, 203 One study assessed the diagnostic accuracy of 80-lead electrocardiogram (ECG).205 Two studies assessed diagnostic delay including the door-to-balloon time and door-to-reperfusion time.204, 208

Myocardial Infarction: Summary

In general, the evidence on diagnostic accuracy for myocardial infarction is limited but fairly homogeneous. Large studies consistently show that just 1 to 2 percent of patients hospitalized for myocardial infarction were recently treated and released from the ED. Diagnostic accuracy for myocardial infarction patients admitted but not initially characterized as having an acute coronary syndrome is lower, with delays in diagnosis in up to one fourth of cases that potentially contribute to lower-quality care based on evidence-based guidelines. The false discovery rate for STEMI is 14 percent among patients referred for immediate cardiac catheterization, but the number of studies was small and their results heterogeneous.

Aortic Aneurysm and Dissection

Twelve studies with an unclear or low risk of bias reported on the rate of misdiagnosis among at least 37,638 patients with aortic aneurysm or dissection.68, 73, 89, 120, 212–219 Two studies likely had overlapping study populations as they both included patients with ruptured abdominal aortic aneurysm from the same region of Sweden during similar time periods216, 219; we included the most recent study in the analysis.219

Aortic Aneurysm and Dissection: False Negatives

We pooled eight studies that reported on missed or delayed diagnoses among patients with ruptured abdominal aortic aneurysm89, 213, 219 or acute aortic dissection68, 73, 89, 214, 218, 219 (n=1,799). The estimated false negative rate was 36 percent (95% CI 21 to 52; I-squared 98%; Figure 9; moderate SOE for false negative rate). Studies differed in their definitions of missed or delayed diagnoses. Five studies compared patients who were correctly diagnosed in the ED or at initial presentation with those who were misdiagnosed.89, 213, 214, 218, 219 Two of these studies provided strict criteria for a correct diagnosis. Ohle et al. classified patients as having a missed diagnosis if they were not diagnosed within the ED, if they received treatment for an alternate diagnosis in the ED, or if they re-presented at an ED within 14 days of the initial visit.218 Smidfelt et al. considered patients as correctly diagnosed if aortic aneurysm was mentioned in the medical chart by the ED, if the patient was referred by the ED for an acute CT scan for aortic aneurysm, or if the patient received a laparotomy for suspected ruptured abdominal aortic aneurysm.219 Two other studies used time to diagnosis to distinguish patients who had a short versus long diagnostic time.68, 73

Figure 9 displays a forest plot of the false negative rate of diagnosing aortic aneurysm or dissection in the emergency department. The pooled false negative rate for ruptured abdominal aortic aneurysm from three studies was 36% (95% CI, 27% to 46%). The pooled false negative rate for acute aortic dissection among five studies was 35% (95% CI, 14% to 61%). Overall, the pooled false negative rate of aortic aneurysm or dissection was 36% (95% CI, 21% to 52%).

Figure 9

False negative rate (diagnostic delay) for aortic aneurysm or dissection in the emergency department. AAD = acute aortic dissection; CI = confidence interval; ES = effect summary (false negative rate); FN = number of false negatives; RAAA = ruptured abdominal (more...)

We were unable to include three studies in the meta-analysis due to differences in study design and differences in defining missed diagnoses. One was a retrospective cohort that reported a false negative rate of 0 percent among those who received a focused cardiac ultrasound by an emergency physician and 43 percent among those who did not.212 Another study reported a misdiagnosis rate of 24 percent among patients transferred to a referral center, including 15 percent of patients with a misclassified type of acute aortic syndrome (aneurysm called dissection, dissection called aneurysm, or error in type of dissection).215 Using Medicare data and SPADE-style methods, the false negative harm rate for diagnosis (defined as observed minus expected prior ED visits in advance of a related hospitalization) was reported as 3.4 percent (95% CI 2.9 to 4.0) for ruptured abdominal aortic aneurysm and 4.5 percent (95% CI 3.9 to 5.1) for aortic dissection.120

Three studies reported on the association between misdiagnosis and 30-day mortality.213, 216, 217 Pooling these three studies in a meta-analysis suggests a greater risk of 30-day mortality among those who were misdiagnosed than among those who were correctly diagnosed with aortic aneurysm and dissection (risk ratio [RR] 1.21; 95% CI 1.06 to 1.37; I-squared 0%; Figure 10). In addition to these unadjusted results, one study (Smidfelt, 2021) reported an even greater increased risk for mortality among those who were misdiagnosed when adjusted for age, sex, serum creatinine, and first-recorded systolic blood pressure of 90 mmHg or less (adjusted OR 1.83; 95% CI 1.13 to 2.96). The last of these is a proxy for initial case severity (those with low initial blood pressure were misdiagnosed in 28% vs. 44%, P = 0.001), and, as expected, when adjusted for initial severity (which often confounds the relationship between diagnostic error and misdiagnosis-related harms), the impact of diagnostic delay on mortality increases.

Figure 10 displays a forest plot of the association between 30-day mortality and initial delay in the diagnosis of aortic aneurysm and dissection in the emergency department. Three studies reported on the association between misdiagnosis and 30-day mortality. The pooled result suggests a greater risk of 30-day mortality among those who were misdiagnosed than among those who were correctly diagnosed with aortic aneurysm and dissection (risk ratio [RR], 1.21; 95% CI, 1.06 to 1.37; I-squared, 0%).

Figure 10

Association between initial emergency department delay in diagnosis of aortic aneurysm or dissection and 30-day mortality. CI = confidence interval; RR = risk ratio

Aortic Aneurysm and Dissection: False Positives

One study reported a misdiagnosis rate of 24 percent among transfers, including 9 percent of patients being misclassified as having an aneurysm or a dissection when they did not (low SOE for false discovery rate).215 A recent study of 1,762 emergency transfers for acute aortic syndrome was identified during the final report review (after the period of the systematic search).220 The study found 188 patients misdiagnosed (134 of these referred by ED physicians), including 84 who had a suspected rupture or dissection that they did not have (5% false discovery rate, n=84/1,762); all misdiagnoses were attributed to misinterpretation of imaging studies. Taking the two studies together, the estimated false discovery rate was 5 percent (n=93/1,862).
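
The combined figure above can be reproduced with a simple count-based pooling, sketched here in Python. This is only an illustration of the arithmetic, not a random-effects meta-analysis. The 9-of-100 split for the first study is an assumption inferred from the combined total (93 of 1,862) and is not stated directly in the text; the dictionary labels are illustrative.

```python
# Counts quoted in the text. The (9, 100) entry is inferred from the
# combined total of 93/1,862 and is an assumption, not a reported figure.
studies = {
    "transfer study (ref. 215, inferred counts)": (9, 100),
    "emergency transfer study (ref. 220)": (84, 1762),
}

# Simple pooling: add false positives and denominators across studies.
false_positives = sum(fp for fp, _ in studies.values())
total_patients = sum(n for _, n in studies.values())
pooled_fdr = false_positives / total_patients

print(f"pooled false discovery rate: "
      f"{false_positives}/{total_patients} = {pooled_fdr:.1%}")
```

Because the second study contributes almost all of the patients, the pooled value is dominated by its 84 of 1,762 result.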

Venous Thromboembolism

Five studies with a low risk of bias reported on the rate of misdiagnosing venous thromboembolism (N=13,459 patients).54, 221–224 All of the studies, except one,224 were conducted outside of the United States.

Venous Thromboembolism: False Negatives

Three studies included patients (n=2,757) with a final diagnosis of pulmonary embolism and reported on the number of patients with a delayed diagnosis, which was defined as a diagnosis more than 7 days after the onset of symptoms221, 223 or a diagnosis between 24 hours and 30 days after an ED presentation.222 Pooling these three studies in a meta-analysis yielded a false negative rate of 20 percent (95% CI 17 to 24; Figure 11; moderate SOE for false negative rate).221–223 Heterogeneity was not significant (I-squared 43%). Limiting the meta-analysis to only studies that were conducted after 2010 yielded a pooled false negative rate of 22 percent (95% CI 18 to 25),222, 223 indicating no change over time.

We did not include one study in this analysis because of the heterogeneity in study design. This study recruited patients with undifferentiated dyspnea and randomized them to receive immediate or delayed point of care ultrasound.54 The sensitivity and specificity of detecting acute pulmonary embolism in the ED were 89 percent and 100 percent, respectively, with immediate point-of-care ultrasound and 83 percent and 100 percent, respectively, with delayed ultrasound.

A second study was not included in this analysis because of heterogeneity in study design. This study was a retrospective interrupted time series evaluating age-adjusted D-dimer in patients over the age of 50 suspected of having pulmonary embolism (D-dimer ordered, chest-related complaints, and no ultrasound order). The primary outcome was use of advanced diagnostic imaging, and the secondary outcome was diagnosis of pulmonary embolism within 30 days; age-adjusted D-dimer demonstrated a sensitivity of 95.2 percent and a specificity of 68.6 percent.224

Two studies reported the mortality associated with a delayed diagnosis of pulmonary embolism.221, 222 One study reported no difference in all-cause mortality at 3 months between those with a delayed (>7 days from symptom onset) versus timely diagnosis (unadjusted OR 0.9; 95% CI 0.4 to 2.0).221 This study showing no difference failed to adjust for baseline initial case severity, and patients diagnosed in timely fashion were clearly sicker at baseline (e.g., oxygen saturation <60 mmHg at presentation, 57 versus 42%, P = 0.03). The other study reported a significantly higher inpatient mortality rate among those with a delayed diagnosis (between 24 hours to 30 days after ED presentation) compared to those with an early diagnosis (unadjusted OR, 45.3; 95% CI 13.2 to 153.4).222

Figure 11 displays a forest plot of the false negative rate of diagnosing pulmonary embolism in the emergency department. Three studies included patients (N = 2,757) with a final diagnosis of pulmonary embolism and reported on the number of patients with a delayed diagnosis, which was defined as a diagnosis more than 7 days after the onset of symptoms or a diagnosis between 24 hours and 30 days after an ED presentation. Pooling these three studies in a meta-analysis yielded a false negative rate of 20% (95% CI, 17% to 24%). Heterogeneity was not significant (I-squared, 43%).

Figure 11

False negative rate (diagnostic delay) for pulmonary embolism in the emergency department. CI = confidence interval; ES = effect summary (false negative rate); FN = number of false negatives; TP = number of true positives; W. = Western

Meningitis and Encephalitis

Meningitis and Encephalitis: False Negatives

We identified one study that reported the rate of diagnostic error among 521 children, aged 30 days to 5 years, who were diagnosed with meningitis or septicemia.93 The study conducted a SPADE-style look-back analysis to examine whether children hospitalized with meningitis or septicemia in Ontario, Canada had treat-and-release ED visit(s) prior to their admission. The study reported that 114 (21.9%) of the 521 children had prior treat-and-release ED visits with a median return time of 24.5 hours (low SOE for false negative rate). Although the authors reported no significant difference in health outcomes between children who had repeated ED visits and those who were admitted on the first ED visit, they failed to adjust for initial case severity, which likely confounds the finding.

Sepsis

We identified four studies that reported on the rate of diagnostic errors among 3,479 patients presenting to the ED and later diagnosed with sepsis.93, 156, 157, 225 All the studies were retrospective cohort studies (three of the four using SPADE-style look-back analyses in large electronic data sets) to identify missed diagnoses at ED or discrepancy in diagnosis between ED and inpatient. Only one study (Morr, 2017) was performed among adults (over 18 years of age). This study focused on review of consecutive hospital admissions from the ED to an internal medicine service; the case records were systematically assessed for evidence of infection, sepsis, and severe sepsis, and the authors reported on lack of recognition of sepsis or severe sepsis.157

Sepsis: False Negatives

The pooled false negative rate among sepsis patients was 18 percent (95% CI 8 to 32; I-squared 99%; Figure 12; moderate SOE for false negative rate). Subgroup analysis by age showed a significant difference in rate of misdiagnoses among patients under 18 years of age (10%; 95% CI 3 to 21) versus those over 18 years (59%; 95% CI 45 to 72), with rates significantly higher among adults than children. However, the lone adult study (Morr, 2017) used very different methods than the studies in children, focusing on incorrect severity assessment among ED patients admitted with infections, rather than missed opportunities to diagnose infection among ED treat-and-release visits that were followed by sepsis hospitalizations. The Morr, 2017 paper refers to “ED discharge letters” but in the methods section they note that “All medical patients receive a detailed discharge letter upon transfer from the ED to the wards.”; this seems to clarify that the patients are all admitted via the ED, rather than admitted after having been treated and released previously from the ED. Thus, it is likely that the apparent difference in false negatives by age group is methods-related, rather than age-related. It is unsurprising that the rate of ED treat-and-release followed by sepsis hospitalization would be lower than the rate of correctly diagnosed infection requiring admission in which severity (i.e., sepsis) was under-recognized in the ED. Thus, the more generalizable false negative rate is likely 10 percent, rather than 18 percent. Two studies assessed impact of missed diagnosis on health outcomes (30-day mortality), and no significant difference was observed.93, 156 Both studies failed to adequately adjust for initial case severity in performing their analyses of adverse health outcomes.

Figure 12 displays a forest plot of the false negative rate of sepsis in the emergency department. The pooled false negative rate among 3 pediatric studies that evaluate patients who were treated and then released from the emergency department was 10% (95% CI 3% to 21%). One study reported on the false negative rate of sepsis using ED admissions data. The false negative rate was 59% (95% CI 45% to 72%).

Figure 12

False negative rate for sepsis in the emergency department. CI = confidence interval; ED = emergency department; ES = effect summary (false negative rate); FN = number of false negatives; TP = number of true positives; U.S. = United States; W. = Western (more...)

Arterial Thromboembolism

Arterial Thromboembolism: False Negatives

We identified two studies that reported on the rate of false negatives for acute mesenteric ischemia.226, 227 One study assessed delayed diagnosis of acute mesenteric ischemia among 72 cases presenting to the ED.226 Time to surgical consult was ≥24 hours in 15.3 percent of patients (low SOE for false negative rate). Delay in consultation was associated with increased odds of death, although the result was not statistically significant (severity-adjusted OR 3; 90% CI 0.69 to 13; P = 0.11). Time to operation was ≥6 hours in 37.9 percent of cases (n=22 of 58 undergoing operations). Delay in operation was associated with a statistically significant increased odds of death (severity-adjusted OR 3.7; 90% CI 1.1 to 12; P = 0.04). After excluding cases for whom care was withdrawn (i.e., eliminating very high-severity cases that fared very poorly, thereby focusing on milder cases) and again adjusting for illness severity, mortality was substantially increased for both delay in consultation (9.4-fold increased, P = 0.03) and delay in operation (4.9-fold increased, P = 0.04). This again shows that illness severity adjustment is essential for determining the full negative health impact of diagnostic delay, which is understated when illness severity is not considered. The second study focused on radiographic misdiagnosis among 95 patients with 97 acute mesenteric ischemia events.227 Acute mesenteric ischemia was incorrectly diagnosed by the on-call radiologist in 14 of these 97 cases (14%).

Spinal and Intracranial Abscess

One study included as part of the review (Dubosh, 2020) addressed missed spinal abscess among 1,381,614 ED discharges for back pain, enabling assessment of the false omission rate.81 Two others addressed missed cases (false negative rate) in all-comers with spinal abscess but were excluded during the full-text review stage; the nature of these exclusions (described below) is such that they are unlikely to invalidate the study findings, so results are presented here.

Spinal and Intracranial Abscess: False Negatives

One study identified as part of the review examined the frequency of missed spinal and intracranial abscess among ED patients treated and released with “benign” back pain diagnoses.81 In a large retrospective cohort study (look forward method) from six U.S. states, Dubosh et al. found that the most common missed neurologic condition among treat-and-release visits for back pain was intraspinal abscess (46% of missed neurologic conditions among those hospitalized within 30 days were for intraspinal abscess). The absolute rate of 30-day returns for a subsequent hospitalization (including in-hospital mortality) with spinal abscess was 0.1 percent (n=1,320/1,381,614) of “benign” back pain treat-and-release visits from the ED (high SOE for false omission rate). This false omission rate corresponds to one missed spinal abscess for every 1,047 “benign” back pain ED discharges.
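
As a minimal sketch of the arithmetic behind the false omission rate quoted above, the following Python snippet recomputes the rate and the "one in N" figure from the two counts given in the text. The variable names are illustrative only.

```python
# Counts quoted in the text for "benign" back pain ED discharges.
missed_abscesses = 1_320        # 30-day return hospitalizations with spinal abscess
benign_discharges = 1_381_614   # treat-and-release visits for back pain

# False omission rate: missed cases among patients labeled as negative.
false_omission_rate = missed_abscesses / benign_discharges

# Reciprocal: number of discharges per missed case.
discharges_per_miss = benign_discharges / missed_abscesses

print(f"false omission rate: {false_omission_rate:.2%}")
print(f"about one miss per {round(discharges_per_miss):,} discharges")
```

The reciprocal form is often easier to communicate than a percentage when the underlying rate is this small.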

One detailed study of missed spinal abscess cases drawn from a large national clinical data repository through the Veterans Administration was captured but excluded from the review at the full text stage solely because it admixed ambulatory clinic care and ED cases; the authors were contacted, but they were unable to provide a breakdown of the number of cases that were ED based (personal communication). If results from that study are applicable to ED missed spinal abscess, the misdiagnosis rate for spinal abscesses is estimated to be 56 percent (n=66/119).228 Pre-defined missed “red flags” in misdiagnosed cases (n=66) were unexplained fever (n=57), focal neurologic deficits with progressive or disabling symptoms (n=54), active infection (n=54), immunosuppression (n=36), intravenous drug use (n=20), prolonged use of corticosteroids (n=16), unexplained weight loss (n=13), back pain duration greater than 6 weeks (n=13), and a history of cancer (n=9). Among misdiagnosed cases (n=66), the mean number of pre-defined missed “red flag” signs was 4.9, which was higher than the mean of 4.3 in those correctly diagnosed (P = 0.03). Diagnostic process failures resulted from: 1) the provider-patient encounter (n=60 with missed red flags [information not gathered during history and physical examination] or inappropriate action [ordering tests] after identifying red flags); 2) the subspecialty consultation process (n=51 in which the provider did not believe referral was required or an appropriate expert was not consulted); 3) patient-related delays (n=17 in which the patient did not show up for a follow-up visit); 4) provider-related delays (n=11 in which the provider took too much time to follow-up test results); and 5) radiographic misdiagnosis (n=5 in which the MRI report was not read accurately and was believed to be non-serious). 
The level of misdiagnosis-related harms identified was of high severity, with the potentially preventable results of diagnostic delay being death (n=8), severe harm (n=32), moderate harm (n=25), mild harm (n=1), and no harm or unknown (n=0).

Pneumonia

We identified two studies that reported the rate of diagnostic error among 293 patients who were diagnosed with pneumonia.136, 229 Neither study addressed all ED patients with pneumonia. One study reported on community-acquired pneumonia among patients 65 years or older with acute respiratory failure136; the other study reported on round pneumonia among patients under 19 years of age.229

Pneumonia: False Negatives

The first study was a prospective observational study at a University hospital in Paris, France (Ray, 2006).136 This was a well-designed study that looked at ED diagnostic accuracy rigorously, but in the specific population of elderly patients with acute respiratory failure (n=514), a subset of whom had community-acquired pneumonia (n=181). All patients were admitted for an extensive hospital-based diagnostic evaluation. In this narrowly defined patient population, the authors described ED physician diagnostic accuracy for pneumonia as follows (value [95% CI]): sensitivity 0.86 [0.80–0.90], specificity 0.76 [0.71–0.80], positive predictive value 0.66 [0.59–0.71], negative predictive value 0.91 [0.87–0.93], total diagnostic accuracy 0.79 [0.75–0.82]. These values correspond to a false negative rate of 14 percent, a false positive rate of 24 percent, a false discovery rate of 34 percent, and a false omission rate of 9 percent (low SOE for all measures of diagnostic accuracy).
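
The four error rates in the preceding paragraph are the complements of the reported accuracy measures. A short Python sketch of that conversion follows, using the point estimates quoted from Ray, 2006; the dictionary keys are illustrative names, not terms taken from the study.

```python
# Point estimates quoted in the text (Ray, 2006).
reported = {
    "sensitivity": 0.86,
    "specificity": 0.76,
    "positive_predictive_value": 0.66,
    "negative_predictive_value": 0.91,
}

# Each error rate is one minus the matching accuracy measure.
error_rates = {
    "false negative rate": 1 - reported["sensitivity"],
    "false positive rate": 1 - reported["specificity"],
    "false discovery rate": 1 - reported["positive_predictive_value"],
    "false omission rate": 1 - reported["negative_predictive_value"],
}

for name, value in error_rates.items():
    print(f"{name}: {value:.0%}")
```

The false negative and false positive rates are conditioned on true disease status, whereas the false discovery and false omission rates are conditioned on the ED diagnosis, so the latter two also depend on how common pneumonia was in this cohort.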

Pneumonia: False Positives

As noted above, the Ray, 2006 study found a false positive rate of 24 percent and false discovery rate of 34 percent. The second study was a retrospective review of radiology cases of round pneumonia conducted at a large tertiary care Children’s hospital in Cincinnati (Kim, 2007).229 Although not mentioned explicitly in the report, it was assumed that the majority of cases would have initially presented via the ED. The authors, on review of the cases, found “three patients (2.6%, three of the initially identified 112) who were originally suspected to have round pneumonia and were later shown to have other diagnoses.” No further details were provided (including whether the errors occurred in ED patients), but it appears these were errors in radiographic interpretation. This would correspond to a false discovery rate of 2.6 percent, but it is highly improbable that this corresponds well to overall ED diagnostic accuracy.

Appendicitis

We identified eight studies that reported the rate of diagnostic error of appendicitis among 7,351 patients.225, 230–236 Two studies assessed diagnostic error as part of a prospective assessment of different diagnostic imaging in the ED.230, 231 Three studies conducted a retrospective analysis.225, 232, 233 Two studies examined diagnostic outcome changes before and during the coronavirus (SARS-CoV-2) disease 2019 (COVID-19) outbreak.234, 235 One study examined missed diagnostic opportunities at the ED using a look-back method.236 All approaches may have high risk of bias due to inclusion criteria and sampling.

Appendicitis: False Negatives

Of those with a final diagnosis of appendicitis, misdiagnosis rates ranged from 0.2 to 4.8 percent in pediatric studies (moderate SOE for false negative rate).225, 233, 236–238 The false negative rate was 2.9 percent among patients under 18 years of age.225 The false negative rate in an unrelated study among patients 17 years of age and older was 30.8 percent, but this study used a very different method and focused only on missed cases using point-of-care ultrasound.233 That said, an older study (from prior to the study period) which compared younger and older presentations of appendicitis appears to corroborate the notion that diagnostic delays in older adults are more common than among children, with contributions from both the “patient interval” (from symptoms to presentation) and “clinician interval” (from presentation to diagnosis) delay components; the result appears to be a higher rate of complications and greater mortality.239 This difference is not necessarily unexpected, given that appendicitis in older patients is both less common and more atypical (“wrong” age group for the illness).

Appendicitis: False Positives

The pooled false discovery rate for appendicitis diagnoses in the ED was 7 percent (95% CI 4 to 9; I-squared, 0%; Figure 13)230–233 (moderate SOE for false discovery rate). The studies included a combination of prospective and retrospective cohorts. However, case selection due to inclusion criteria for certain studies limited their generalizability. False positive diagnoses of appendicitis may result in harm by subjecting patients to unnecessary surgical procedures.

Figure 13 displays a forest plot of the false discovery rate of appendicitis in the emergency department. The pooled false discovery rate from three studies was 7% (95% CI, 4% to 10%).

Figure 13

False discovery rate for appendicitis in the emergency department. CI = confidence interval; ES = effect summary (false discovery rate); FP = number of false positives; TP = number of true positives; U.S. = United States; W. = Western

During the COVID-19 outbreak, Somers et al. showed that the false discovery rate decreased from 26.1 percent to 2.5 percent.234 This observation might have been due to changes in patient care-seeking behavior during the pandemic: since volumes were lower, perhaps only those who experienced more severe symptoms or more advanced illness ended up seeking care in the ED, making the mix of diagnoses more “obvious.” However, Willms et al. showed no difference in the false discovery rate before and during the COVID-19 outbreak.235

Fractures

We identified 17 retrospective or prospective studies (4 in the United States240–243) that reported on the rate of diagnostic errors and/or misdiagnosis-related harms among 138,551 patients with fractures.74, 83, 121, 240–253 Studies varied significantly in the methodological approach, definitions to assess diagnostic errors, target populations, and inclusion/exclusion criteria. Most of the studies had a low risk of bias. The retrospective design in most of the studies makes it difficult to know the true rate of diagnostic error. When Enderson and colleagues changed the study design from retrospective to prospective, the incidence of missed traumatic fractures increased from 2 to 9 percent.152 Studies tended to emphasize delay in diagnosis as an endpoint, while clinically meaningful delays affecting outcomes were far fewer in number.

Fractures: False Negatives

Fourteen studies reported on the false negative rate for fractures in the ED.83, 84, 121, 240–244, 246–248, 250–253

In an ED that evaluates adults and children, a total of 350 false-negative errors occurred in 28,904 fractures (1.2%). The sites most often missed in children were elbow (29%) and wrist (21%); in adults, it was the foot (17%), as well as the pelvis and hip (37%) in elderly patients.248 Spine fractures (shown in KQ1 to be more harmful) accounted for just 6.2 percent of missed cases. In a second study of 5,879 patients who presented to an ED, 40 patients had a false negative fracture (0.7%). The missed fractures were in the ankle or foot (28%, n=11), lower arm (22%, n=9), hand and fingers (22%, n=9), hip (10%, n=4) and miscellaneous (18%, n=7).246

However, miss rates varied widely across studies (from 0.02 to 40%), depending on study design, definitions, or included populations. In one study that compared initial radiology resident reads to those of attending radiologists in a tertiary care ED, just 19 out of 81,201 images (0.02%) were classified as missed fractures.241 In patients with minor trauma, 7 of 4,025 patients (0.2%) had a missed fracture when evaluated in an outpatient clinic.251 In patients with fractures at a specialty orthopedics ED who had imaging read only by an orthopedic surgeon versus a radiologist, the incidence of false-negative fractures was 293 out of 13,561 (2.2%).83 In another study, 51 of 304 limb or pelvis X-rays had discrepancies between ED clinical notes and the final radiology report (17%), although only 15 (5% of the total) were deemed clinically significant.250 In a study of ED ankle X-rays, 61 out of 2,947 (2%) were considered major discrepancies that changed management.253 Among polytrauma patients, rates of delayed diagnosis of injury ranged from 2 to 40 percent, with the most common of these being fractures (moderate SOE for false negative rate).
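
To make the spread of these estimates easier to see, the following Python sketch recomputes each percentage from the counts quoted in the paragraph above. The short labels are illustrative shorthand for the study settings, and no pooling is attempted because the definitions and populations differ.

```python
# (events, denominator) pairs quoted in the text; labels are shorthand.
counts = {
    "resident vs. attending reads (ref. 241)": (19, 81_201),
    "minor trauma, outpatient clinic (ref. 251)": (7, 4_025),
    "orthopedics ED, surgeon-only reads (ref. 83)": (293, 13_561),
    "limb/pelvis X-ray discrepancies (ref. 250)": (51, 304),
    "clinically significant discrepancies (ref. 250)": (15, 304),
    "ankle X-ray major discrepancies (ref. 253)": (61, 2_947),
}

# Miss rate as a percentage for each setting.
miss_rates = {label: 100 * events / n for label, (events, n) in counts.items()}

for label, rate in sorted(miss_rates.items(), key=lambda item: item[1]):
    print(f"{rate:6.2f}%  {label}")
```

Sorting by rate shows a difference of several orders of magnitude between the lowest and highest estimates, which is consistent with the point that study design and definitions drive much of the variation.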

Fracture False Negatives: Polytrauma

In patients with polytrauma, rates of missed secondary fractures are generally higher than in the general ED population, despite multiple trauma surveys searching for injuries. In polytrauma patients presenting to one trauma center, 12 percent (n=172 of 1,416) suffered delayed diagnoses of injury; the majority of these were extremity fractures, given that these patients received CT scans of the head, chest, and pelvis as the primary focus of their initial trauma survey for injuries. The incidence of false-negative extremity fractures (in order of the proportion delayed) was hand (54%, n=39 of 72), foot (38%, n=23 of 61), tibia (21%, n=11 of 53), fibula (18%, n=4 of 22), ankle (15%, n=7 of 47), humerus (15%, n=13 of 88), radius (10%, n=11 of 109), patella (8%, n=2 of 26), ulna (8%, n=8 of 96), clavicle (6%, n=12 of 196), scapula (4%, n=6 of 127), femur (2%, n=3 of 134), and cruris (2%, n=2 of 86).121 The importance of ongoing reassessment in polytrauma patients was emphasized. In severe trauma cases requiring CT of the whole body, 39 of 375 patients (10%) had a missed injury, of which 85 percent could be detected on a second read.
This study suggested that a second read in the setting of quality assurance would be helpful to minimize missed fractures.252 In another study in a non-United States trauma center, 64 missed injuries (the majority of which were fractures) were detected in 58 patients out of 1,187 patients seen (4.9% of patients).245 There was a delay in diagnosis of fracture in a pediatric trauma center in 44 of 1,056 patients (4%) who presented with trauma.240 There were eight fractures out of 76 pediatric trauma patients (11%) that were missed: two were of the spine, two were of the head and face, two were in the upper limb, and two were in the lower limb.247 In a large pediatric trauma center, 62 of 2,316 (2%) patients had a missed fracture, the majority of which were upper and lower extremity injuries.249 In another pediatric trauma center, 18 of 196 (9%) were classified as delayed diagnosis of fracture, one of which required surgical treatment.243 In one study from Spain, 49 of 122 (40%) had delayed diagnosis of injury, and the most frequently missed injury was fracture (43%).74

Fracture False Negatives: Abuse

Among pediatric patients with a delay in diagnosis of abuse, 54 of 258 patients (21 percent) were falsely classified as having a non-abuse fracture.244

Fractures: False Positives

Four studies reported on the rate of false positive diagnoses of fractures.83, 246, 249, 253 Twenty-one of 61 misdiagnosed fractures in adult ED patients were false positives (34% of fracture diagnostic errors).246 Among 13,561 ED patients with minor trauma whose X-rays were not reviewed by an attending radiologist, 337 misdiagnosed fractures were identified (2.5%); of these, 44 (13%) had false-positive fractures.83 Sixty-five of 125 incorrect fracture diagnoses in pediatric skeletal radiographs were false positives (52%).249 Ten of 81 major discrepancies in ED ankle radiographs were false positives (12%).253 We were unable to draw strong conclusions about the rate of false positive fracture diagnoses because of concerns with study limitations and methodological heterogeneity. Nevertheless, a sizable minority of diagnostic errors related to fractures are likely to be false positive (12-52%) rather than false negative diagnoses.
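The 12-52 percent range quoted above can be reproduced directly from the four study counts. The following is a minimal Python sketch; the dictionary keys are illustrative labels chosen here (with the report's citation numbers in parentheses), and only the counts come from the text.

```python
# False-positive count and total misdiagnosed fractures per study, as quoted in the text.
# Keys are illustrative labels; the number in parentheses is the report's citation number.
studies = {
    "adult ED fractures (246)": (21, 61),
    "minor trauma X-rays (83)": (44, 337),
    "pediatric skeletal radiographs (249)": (65, 125),
    "ED ankle radiographs (253)": (10, 81),
}

# Share of fracture diagnostic errors that were false positives in each study.
shares = {name: fp / total for name, (fp, total) in studies.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

low, high = min(shares.values()), max(shares.values())
print(f"range across studies: {low:.0%} to {high:.0%}")
```

The range is simply the minimum and maximum of the per-study proportions; it is not a pooled estimate, which is consistent with the report's caution about methodological heterogeneity.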

Fractures: Other Studies

One study used a machine-learning algorithm to improve clinician detection of fractures from a sensitivity of 80.8 to 91.5 percent and a specificity of 87.5 to 93.9 percent. The authors suggested this technique could allow expert knowledge to be delivered remotely to generalists.242

Testicular Torsion

We identified two studies (one in the United States and one in Canada) that reported on the rate of diagnostic errors and/or misdiagnosis-related harms among 262 patients with testicular torsion.254, 255 One study was a retrospective review evaluating Doppler ultrasound as a means of detecting testicular torsion.254 The other study was a retrospective chart review of ED patients who underwent detorsion and orchiopexy or orchiectomy (2005-2015).255

Testicular Torsion: False Negatives

Both studies reported false-negative errors.254, 255 In one study evaluating Doppler ultrasound, three out of 46 patients with a false negative had absent or diminished flow, 18 had an absence of arterial waveform, 29 had heterogeneous echotexture, and 15 had an absence of Doppler flow.254 All of the tests had a positive predictive value of 91 percent or higher; none of the test findings had negative predictive values greater than 40 percent. In the other study, the initial miss rate overall was 6 percent (n=12 of 208) and 13 percent among patients with a delayed presentation (n=12 of 94). Among the 12 initially misdiagnosed, 11 were missed in the ED, which corresponds to an ED false negative rate of 5 percent. Delayed presentations were more likely to report isolated abdominal pain, have developmental disorders, or report a history of genital trauma.255 Chan et al., 2019 focused on testing delays and radiographic errors, while Bayne et al., 2017 enabled an estimate of the ED false negative rate (n=11 of 208 total cases, all in the “delayed presentation” subgroup [n=94]). Among patients with testicular torsion, 5.3 percent (95% CI, 2.7% to 9.3%) are initially misdiagnosed in the ED (low SOE for false negative rate).
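The interval quoted for the ED false negative rate (11 of 208) can be approximated with an exact binomial (Clopper-Pearson) calculation. The report does not state which interval method was used, so the choice of Clopper-Pearson is an assumption made here, and the function names are illustrative. A minimal standard-library sketch:

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, target: float, iterations: int = 60) -> float:
    """Solve f(p) = target on [0, 1] for a function f that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided binomial interval (ASSUMED method; not stated in the report)."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


lower, upper = clopper_pearson(11, 208)
print(f"{11 / 208:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

If the assumed method matches the one used in the report, the printed interval should fall close to the reported 2.7 to 9.3 percent; other methods (for example, a Wilson interval) would give slightly different limits.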

Testicular Torsion: False Positives

One study had three false-positive patients among 46 patients: one with absent or diminished flow, one with heterogeneous echotexture, and one with absence of Doppler flow.254 This corresponds to a 7 percent false discovery rate. We are unable to draw a conclusion about the rate of false positive diagnoses of testicular torsion because of our concerns with study limitations and the imprecise results from a single study.

Other Conditions

We did not find any studies meeting our inclusion criteria that reported on the ED diagnostic error rate for endocarditis, necrotizing enterocolitis, sudden cardiac death, arrhythmias, congenital heart disease, ectopic pregnancy, or pre-eclampsia/eclampsia.

Key Question 2c. Approximately how many patients does this equate to nationally in the United States?

Each year in the United States there are 130 million ED visits.13 Given the best estimates outlined in the sections above, it is likely that there are over 7 million ED diagnostic errors, over 2.5 million diagnostic adverse events involving preventable harms, and over 350,000 serious misdiagnosis-related harms, including more than 100,000 serious, permanent disabilities and over 250,000 deaths (Table 10). The studies of general (not disease-specific) diagnostic errors on which these estimates are based were not explicit about the breakdown of false negative versus false positive errors, but used methods related to diagnostic discrepancy, so should have included both types of error (including both “undercalls” and “overcalls” of dangerous diseases). Since there was no explicit search described for the adverse effects of false positive diagnoses (e.g., complications from invasive diagnostic tests or adverse health outcomes from treatment for incidental, yet unimportant, findings), it is presumed that the misdiagnosis-related harms reflect only those related to false negatives for those whose dangerous underlying diseases were missed.

Although these estimates may seem high, they are on par with what has been estimated for harms from inpatient diagnostic error (250,000 harms out of 36 million hospitalizations), based on systematic review data.3 Furthermore, if we use the high-quality, prospective study (Hautz, 2019) of ED admissions (which did not look at discharged patients) to estimate errors and harms, we get numbers that corroborate these figures. There are 16.2 million hospital admissions each year in the United States via the ED.13 If we combine that with a 12.3 percent error rate and 4.8 percent misdiagnosis-related death rate,7 we get 2 million diagnostic errors and 97,000 deaths among patients hospitalized via the ED. Using the ratio of disability to death shown in Table 10, that corresponds to about 136,000 serious harms. These are included among the total of more than 350,000 estimated in Table 10, since the diagnostic adverse event rate and mortality include both discharged and admitted patients. It seems plausible that roughly one third of the serious harms from ED diagnostic error would occur among admitted patients (who are lower in number [12.4% of ED visits end in admission13] but higher in risk), with the rest among those treated and released (who are higher in number [87.6% of ED visits end in discharge13] but lower in risk).
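The corroborating calculation for admitted patients can be written out step by step. The sketch below uses only the rounded inputs quoted in the text and makes two labeled assumptions: that the 4.8 percent death rate applies to patients with a diagnostic error (not to all admissions), and that the "over 350,000" and "over 250,000" totals can be treated as point values when forming the harm-to-death ratio.

```python
ed_admissions = 16_200_000      # annual U.S. hospital admissions via the ED (from the text)
error_rate = 0.123              # diagnostic error rate among ED admissions (Hautz, 2019)
death_rate_given_error = 0.048  # ASSUMPTION: applied to patients with an error, not all admissions

errors = ed_admissions * error_rate
deaths = errors * death_rate_given_error

# ASSUMPTION: the "over 350,000" serious harms and "over 250,000" deaths quoted
# from Table 10 are treated as point values to form the ratio.
serious_harms_per_death = 350_000 / 250_000
serious_harms = deaths * serious_harms_per_death

print(f"diagnostic errors: {errors:,.0f}")
print(f"deaths:            {deaths:,.0f}")
print(f"serious harms:     {serious_harms:,.0f}")
print(f"share of the 350,000 total: {serious_harms / 350_000:.0%}")
```

With these rounded inputs the products land slightly below the figures in the text (about 2 million errors, 97,000 deaths, and 136,000 serious harms), which presumably were computed from less heavily rounded values.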

Table 10. U.S. national estimates for ED diagnostic adverse events, serious morbidity, and death.

These estimates are equivalent to a diagnostic error every 18 patients, a diagnostic adverse event every 50 patients, a serious harm (serious disability or death) about every 350 patients, and a misdiagnosis-related death about every 500 patients. Put in terms of an average ED with 25,000 visits annually and average diagnostic performance, each year this would be over 1,400 diagnostic errors, 500 diagnostic adverse events, and 75 serious harms, including 50 deaths (Table 11). This translates to 10 patients harmed and more than 1 death or disability each week.
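The per-ED figures follow from scaling the "one in N" frequencies to 25,000 annual visits. A minimal sketch, using only the rounded frequencies quoted in the paragraph above (the variable names are illustrative):

```python
annual_visits = 25_000  # size of the "average" ED used in the text

# "One event every N patients", as quoted in the text.
one_in_n = {
    "diagnostic errors": 18,
    "diagnostic adverse events": 50,
    "serious harms": 350,
    "deaths": 500,
}

per_year = {event: annual_visits / n for event, n in one_in_n.items()}
per_week = {event: count / 52 for event, count in per_year.items()}

for event in one_in_n:
    print(f"{event}: {per_year[event]:,.0f} per year, {per_week[event]:.1f} per week")
```

Because the one-in-N frequencies are themselves rounded, this reproduces Table 11 only approximately: the adverse event and death counts match (500 and 50 per year), while the error and serious harm counts come out slightly below the "over 1,400" and 75 given in the text.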

Table 11. Estimated “typical” ED frequency of diagnostic errors and misdiagnosis-related harms.

Key Question 2d. Are there clear commonalities or differences across clinical conditions in the frequency or risk of ED diagnostic errors or misdiagnosis-related harms?

The most striking commonality across all conditions is that mild, non-specific, or atypical symptoms substantially increase the frequency or risk of diagnostic errors and harms; this is elaborated further in the KQ3 section on Illness Characteristics. There is also evidence across diseases that the temporal profile of adverse events after missed major vascular events and infections is one of initially high risk followed by exponential decline over time (elaborated below as it relates to temporal risk windows and optimizing measurement).

The clearest difference across conditions is that, among dangerous diseases, myocardial infarction appears to stand alone as a “shining star” example for which ED miss rates have been reduced to a near-zero level. Even there, however, delays in admitted patients may still represent an area for improvement, and the false discovery rate is 14 percent. Fractures and appendicitis, both less likely to cause serious misdiagnosis-related harms than the other conditions assessed, are also missed at fairly low rates. By contrast, rates of misdiagnosis for neurologic symptoms and neurologic diseases appear to be higher than for most general medical symptoms and diseases. Unsurprisingly, death is the most common serious harm from missed general medical diseases while disability is the most common serious harm from missed neurologic diseases.

ED Treat-and-Release Discharges Versus Hospital Admissions

There is direct evidence that diagnostic errors are more frequent among patients discharged than admitted. Heitmann et al., 2016 found that 1.6 percent of ED discharges and 0.3 percent of patients admitted to a hospital ward via the ED returned within 30 days due to a diagnostic error, and almost all of these (in both subgroups) returned within 7 days.143 This likely indicates that hospital admission serves as at least a partial clinical safety net when there is diagnostic uncertainty or error, and comports with U.S. Medicare data showing that EDs with very high discharge fractions (proportion of patients sent home on any given day) are more susceptible to diagnostic errors associated with short-term, unexpected patient deaths.148

For both stroke and myocardial infarction there was evidence that patients admitted with the wrong ED diagnoses were more frequent than patients misdiagnosed and discharged. Chompoopong, 2017 began with a cohort of patients hospitalized for stroke and found that 40 percent were admitted initially from the ED with non-stroke diagnoses.85 Graff, 2006 began with a cohort of patients hospitalized for myocardial infarction and found that 25.6 percent were admitted initially from the ED with non-acute coronary syndrome diagnoses.206 These rates are much higher than the overall false negative diagnosis rates among patients who are discharged (17% for stroke, 1.5% for myocardial infarction). This may suggest that ED clinicians are (appropriately) focused more on correct disposition than correct diagnosis, per se.

There also appears to be evidence that false negatives for dangerous diseases, particularly among those discharged from the ED, are generally less common than false positives (Table 9). Some false negatives are associated with significant adverse outcomes (including death), but we presume that false positives (i.e., those who undergo diagnostic testing for the dangerous disease in question via hospital admission but are found instead to have some more benign underlying cause) are generally less dangerous for patients. This would seem to suggest that ED clinicians are weighting their diagnostic decision-making tradeoffs appropriately based on asymmetry of outcomes (i.e., dangerous diseases are worse to “undercall” than to “overcall”).

Temporal Profile of Diagnostic Adverse Events/Harms

It has been shown previously that the short-term risk of adverse events following a false negative (missed) dangerous disease in the ED follows a characteristic temporal profile. The initial risk is at its peak, then exponentially declines towards a linear base rate over days to months, depending on the specific disease. We identified studies in our review showing this pattern for stroke,64, 81, 120, 144, 170 myocardial infarction,63, 77, 120 aortic aneurysm/dissection,120 multiple vascular events combined,120 sepsis,78, 93, 94, 256 meningitis,81, 93 and spinal abscess.81 Unsurprisingly, the temporal profile of returns after a missed case seems to mirror the underlying disease biology and natural history, as shown in Figure 14 for stroke.

Figure 14 shows (a) the cumulative incidence of stroke hospitalizations post ambulatory treat-and-release as benign dizziness and (b) the cumulative incidence curve for natural history of major stroke following transient ischemic attack or minor stroke. Data are from the population-based Oxford Vascular Study and adapted with permission. In (a), the cumulative incidence rate of stroke increases, whereas the cumulative incidence of myocardial infarction remains constant. In (b) the risk of stroke and transient ischemic attacks increases until it levels off at 90 days.

Figure 14. Cumulative incidence of stroke hospitalizations post ambulatory (ED or other) treat-and-release as “benign dizziness” (a) and cumulative incidence curve for natural history of major stroke following TIA or minor stroke (b). ED = emergency department.

This appears to be true, more generally, of diagnostic adverse events in the ED. Specifically, Heitmann et al. also showed that most returns linked back to chart review-detected diagnostic errors occur in the first week (Figure 15). This comports with data from the large administrative and electronic health record data studies alluded to above that use symptom-disease pairs (“SPADE” methods),145 which have found that short-term rehospitalizations occurring at rates statistically above baseline for missed dangerous vascular events and infections occur predominantly in the first month and disproportionately in the first week after ED discharge. This indicates that 72-hour or 7-day revisits are expected to be an enriched source for detecting diagnostic error, but that absolute error rates will be substantially underestimated using very short revisit windows for analysis (e.g., 72-hour returns, which are commonly utilized).

Figure 15 shows the nature of short-term revisits. The graph demonstrates that return visits related to diagnostic errors tend to occur predominantly in the first week after ED discharge.

Figure 15. Nature of short-term ED revisits. ED = emergency department; SPADE = Symptom-disease Pair Analysis of Diagnostic Error. Note: The graph clearly demonstrates that return visits related to diagnostic errors (dots) tend to occur predominantly in the first week after ED discharge.

It should be noted that this temporal profile has also been demonstrated for all ED revisits (without regard to underlying cause for the revisit). Using data from the Agency for Healthcare Research and Quality’s Healthcare Cost and Utilization Project (HCUP) family of databases,259 Rising et al. found that 31 percent of ED visits were followed by a revisit within 1 year (3-day revisit rate 7.5%; 30-day revisit rate 22.4%).147 The modeled cumulative hazard for revisits showed exponential growth over roughly the first 14 days, followed by a linear rise thereafter (best approximated by a double-exponential model, with excellent fit, R² = 0.9997). The authors concluded that the optimal balance between capturing “excess” acute revisits and “expected” revisits would be achieved by using a 9-day return window for quality measurement, rather than the more typically used 72-hour window. However, for diagnostic error detection, relevant windows likely vary in disease-specific fashion.

Key Question 3. Causes of Diagnostic Errors

Key Points

  • Diagnostic error causes were often multifactorial, but cognitive errors dominated across data sources. In malpractice claims, nearly 90 percent of cases involved failures of clinical decision-making or judgment, regardless of the underlying disease present. In incident reports, key process failures were errors in diagnostic assessment, test ordering, and test interpretation, which were usually attributed to inadequate clinical knowledge, skills, or reasoning, particularly in “atypical” clinical cases.
  • Disease-specific studies addressed a mix of predictors, the most common of which were patient demographics (especially age, sex, and race) and illness characteristics (especially symptom type, illness severity, and mode of arrival). Fewer studies addressed clinician characteristics, facility characteristics, or dynamic, context-specific systems factors. There was substantial heterogeneity in the effects of these predictors across diseases and studies, with variability in results partially explained by methodological differences.
  • The effect of age was heterogeneous and disease-specific (e.g., younger age increases risk of missed stroke while older age increases risk of missed appendicitis) and sometimes large in magnitude. Female sex and non-white race were often associated with important (20-30%) increases in misdiagnosis risk; although these disparities were inconsistently demonstrated across studies, being a woman or a racial or ethnic minority was generally not found to be “protective” against misdiagnosis (i.e., was neutral at best).
  • Atypical or non-specific symptoms were the strongest and most consistent predictors of increased risk for a missed diagnosis across diseases studied. For undiagnosed serious medical illnesses, less severe presentations and less urgent modes of arrival increased misdiagnosis risk; for multi-trauma patients, the reverse was true—more, rather than less, severe presentations increased misdiagnosis risk.
  • Other notable predictors of misdiagnosis included care provided by less experienced clinicians, at non-teaching hospitals, with high ED discharge fraction, and during off hours. The diagnostic performance gap with academic (teaching) EDs having lower false negative rates than community (non-teaching) EDs was a fairly consistent finding, but it is unknown whether lower academic false-negative rates were achieved through greater overall diagnostic accuracy or by favoring overutilization, leading to arbitrarily greater admission fractions and resulting in higher false-positive rates.
  • One overarching commonality across causes was that degree of difficulty in assessing a clinical presentation for a specific disease was a critical factor—“obviousness” predicted correct diagnosis and “subtlety” predicted incorrect diagnosis. “Subtle” situations include diseases in the “wrong” age groups; non-specific, milder, or atypical symptoms; and finding second, third, or fourth problems in patients who are very ill (e.g., polytrauma).

Summary of Findings

Key Question 3a. What are the most frequent causes identified?

Causes of diagnostic error can be framed as “predisposing factors” (e.g., atypical illness presentation, off hours), “root causes” (e.g., clinical judgment failure, communication failure), or diagnostic “process steps” (e.g., failure during clinical information gathering, test ordering, or test interpretation). In most studies, only one of these frameworks was adopted. The majority of disease-specific studies focused on predisposing factors (often referred to as “predictors” or “risk factors”). By contrast, the majority of cross-cutting (not disease-specific) studies focused on root causes, or, less often, diagnostic process steps. Sometimes root causes were framed explicitly using a “cognitive” versus “systems” versus “mixed” factors framework, but other times studies applied their own or pre-existing taxonomies to describe the underlying root causes. We identified no studies that attempted to drill down further into the cognitive psychology of cognitive error (e.g., types of decision-making heuristics or associated cognitive biases at play). Even when studies focused on diagnostic process steps such as those found in the National Academy of Medicine (NAM) report Improving Diagnosis in Health Care (see Figure 16), relatively few focused on either (a) the patient-facing aspects of delays in engaging the healthcare system at the outset or (b) effective communication of the diagnosis to the patient.

The most robust data on the relative frequencies of overall root causes came from the large malpractice claims study from the United States (Newman-Toker, 2019) and the incident report study from the United Kingdom (Hussain, 2019) that formed the basis of the analysis of the most frequent diseases associated with diagnostic error (KQ1, Table 2).

Newman-Toker et al. broke down the causes into one of 11 major categories (Figure 17). There was an average of 2.4 cause categories identified per case, and these were dominated by clinical judgment factors (present in 89% of cases), regardless of the underlying disease involved (vascular events 93%, infections 89%, cancers 75%, other diseases 87%). This study used data from a large malpractice risk insurer that routinely conducts a standardized case evaluation process. According to the published study, “relevant factors in each case are abstracted based on a complete review of the medical and legal case file including case summaries, medical record data, depositions, and legal proceedings. Cases are reviewed and coded by experienced clinical taxonomy specialists (typically registered nurses with at least 10 years of quality or risk management experience), who abstract data using a multi-tiered coding taxonomy.” It is unknown whether this process might systematically underrepresent certain causal features (e.g., certain fixed or dynamic systems factors), but findings were consistent with prior literature on diagnostic error causes found in non-claims sources from both the ED and other frontline care settings.31, 49, 260 It was also face valid that the distribution of causes was similar for vascular events and infections, but slightly different for cancer (Figure 17). In a further analysis by Newman-Toker et al., among 55 more granular (i.e., more “split” rather than more “lumped”) causes, 7 of the top 10 were clinical judgment factors (Table 12).

Figure 16 is the National Academy of Medicine figure showing the steps in the diagnostic process where failures can occur. The steps are: (1) patient experiences a health problem, (2) patient engages with health care system, (3) the diagnostic process, (4) communication of the diagnosis, (5) treatment, and (6) outcome. The diagnostic process is represented as a circle and includes information integration and interpretation, working diagnosis, and information gathering.

Figure 16. Diagnostic process steps where failures can occur that contribute to diagnostic errors.

Figure 17 shows the root causes of diagnostic errors in the emergency department for cancer, vascular diseases, and infections. For all conditions, clinical judgment factors were the most common.

Figure 17. Root causes of emergency department diagnostic errors overall and by disease category. Data derive from a large U.S.-based malpractice claims study (Newman-Toker, 2019); the mean number of cause categories identified per case was 2.4.

Hussain et al. provided fewer details on root causes, but the message was similar: “Both the wrong and delayed diagnoses had largely common themes for contributory incidents, including: insufficient assessment (32%); inappropriate response to diagnostic imaging/investigations (25%); and failure to order diagnostic imaging/investigations (8%)… In all diagnostic error reports, the most common contributory factors (identified in 1577 reports, 69%) related to staff or human factors: ‘inadequate skill or knowledge’; ‘mistake’, ‘missed task or job to do’ (e.g., checking diagnostic test results); and ‘failure to follow protocol’.” Overlapping causes were not described, but these clinician-focused cognitive causes accounted for 70-90 percent of all cases in which contributory factors were available.

A smaller incident report study by Okafor et al. also found that most diagnostic errors (n=214) were associated with multiple causes (2.9 causes per case [n=615/214]), but cognitive factors still predominated.31 They described 317 cognitive factors (52%), 192 system-related factors (31%), and 106 illness or patient factors they referred to as “non-remedial” (17%). Cognitive factors were faulty information verification (21%, n=130), faulty information processing (16%, n=97), faulty data gathering (10%, n=61), and faulty knowledge (5%, n=29). System-related factors were inefficient process (13%, n=77), high workload (11%, n=66), handoff/communication problem (5%, n=28), and insufficient resources/poor equipment (3%, n=21). Illness factors were atypical presentation (5%, n=33), complicated medical history (3%, n=19), and rare presentation (1%, n=7). Patient factors were “limited historian” (5%, n=33), language barrier (2%, n=10), and psychiatric issues or non-adherence (1%, n=4). The top 5 causes (faulty information verification, faulty information processing, inefficient process, high workload, and faulty data gathering) accounted for 70 percent of all causes identified.
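The counts reported for Okafor et al. are internally consistent, which can be checked by tallying them. A minimal sketch using only the numbers quoted above; the grouping into four dictionaries mirrors the paragraph's own categories:

```python
# Cause counts from Okafor et al., as quoted in the text.
causes = {
    "cognitive": {
        "faulty information verification": 130,
        "faulty information processing": 97,
        "faulty data gathering": 61,
        "faulty knowledge": 29,
    },
    "system": {
        "inefficient process": 77,
        "high workload": 66,
        "handoff/communication problem": 28,
        "insufficient resources/poor equipment": 21,
    },
    "illness": {
        "atypical presentation": 33,
        "complicated medical history": 19,
        "rare presentation": 7,
    },
    "patient": {
        "limited historian": 33,
        "language barrier": 10,
        "psychiatric issues or non-adherence": 4,
    },
}
cases = 214

group_totals = {group: sum(items.values()) for group, items in causes.items()}
total = sum(group_totals.values())
causes_per_case = total / cases

# Flatten to individual causes and take the five most frequent.
flat = {name: n for items in causes.values() for name, n in items.items()}
top5 = sorted(flat.items(), key=lambda item: item[1], reverse=True)[:5]
top5_share = sum(n for _, n in top5) / total

print(group_totals, total)
print(f"causes per case: {causes_per_case:.1f}")
print(f"top 5 share: {top5_share:.0%}")
```

The illness and patient groups together give the 106 "non-remedial" factors, and the five largest individual causes are the same five named in the text.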

Table 12. Top contributing factors to emergency department diagnostic error in malpractice claims.

Representativeness of Malpractice Claims Data for Root Causes

It is known that malpractice claims data represent a biased sample of cases, so it is then reasonable to consider whether bias(es) might influence the root causes of diagnostic error identified. As described above, it was clear from ED incident report studies (e.g., Hussain 2019,16 Okafor 201631) that the spectrum of root causes identified is quite similar to that found in ED malpractice claims studies—mostly cognitive errors related to bedside diagnostic decision-making (especially clinical examination, test ordering, or integration of test results into diagnostic reasoning). What is not known is whether both malpractice claims and voluntary incident reports might be biased towards cases with cognitive errors by physicians. This question cannot be easily addressed by retrospective studies relying on chart review, since most potential root causes must be inferred (i.e., they are not actually captured or recorded). Nor can it be addressed by diagnostically oriented, experimental vignette-based studies (which only assess for cognitive errors). To address this question rigorously, one would need a cohort study or clinical trial that prospectively captured all potential root causes and then assessed diagnostic errors and root causes. We found no such studies, so this remains an unanswered scientific question.

Key Question 3b. Do causes identified differ based on severity of harms?

The only information we were able to identify on this issue comes from Newman-Toker, 2019. Clinical judgment factors were present in roughly 89 percent of cases whether the resulting harm was high-severity (serious) or lesser-severity.

Key Question 3c. Do different causes have differential impact on patient outcomes (i.e., harms)?

We were not able to identify any studies that addressed this question.

Key Question 3d. Overall and for each clinical condition, are the following characteristics associated with errors/harms?

The three main sources for variation in a diagnostic “test” (in this case a clinical diagnosis rendered by the ED care process) are the patient, the testing process, and the observer (i.e., diagnostician). Variation contributes to bias and random error. Diagnosticians are not only the observers who make diagnoses but also are part of the testing process (e.g., by obtaining clinical history or performing a physical examination). As part of our study method for this report, we prospectively defined characteristics and factors that have been shown to impact diagnostic errors in prior studies (Table 13) and used these to abstract data from included studies. Individual clinicians were rarely the subject of research on diagnostic error, so variation at the level of clinicians reflects “average” characteristics among a pool of clinicians within a given study.

One high-quality, prospective study looked across conditions at predictors of diagnostic “discrepancy” (which met the definition of diagnostic error used in this report) among consecutive patients admitted to the hospital via the ED. Hautz, 2019 found that the only factor that predicted diagnostic error was the diagnosing ED physician’s assessment that the patient presented atypically for the diagnosis assigned (OR 3.04; 95% CI 1.33 to 6.96; P = 0.009).7 They found no evidence that patient characteristics (age, gender), other illness characteristics (triage category, specific chief complaint, diagnostic category), clinician characteristics (gender, experience), dynamic systems factors (ED overcrowding, noise), or diagnostic process factors (perceived diagnostic difficulty, confidence in the diagnosis) predicted diagnostic error.
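The reported odds ratio, confidence interval, and P value for atypical presentation can be checked against one another. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale; the report does not state how the interval was computed, so this is an assumption, and the variable names are illustrative.

```python
from math import erfc, log, sqrt

odds_ratio, ci_low, ci_high = 3.04, 1.33, 6.96  # as quoted from Hautz, 2019
z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# ASSUMPTION: Wald interval, symmetric around log(OR).
se_log_or = (log(ci_high) - log(ci_low)) / (2 * z_95)
z = log(odds_ratio) / se_log_or
p_two_sided = erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z)) for z >= 0

# On the log scale the point estimate should sit midway between the limits,
# i.e., at the geometric mean of the two limits.
geometric_midpoint = sqrt(ci_low * ci_high)

print(f"SE of log(OR): {se_log_or:.3f}")
print(f"z = {z:.2f}, two-sided P = {p_two_sided:.4f}")
print(f"geometric midpoint of CI: {geometric_midpoint:.2f}")
```

Under this assumption the back-calculated P value is close to the reported 0.009 and the geometric midpoint of the interval is close to the reported odds ratio, so the three quoted numbers are mutually consistent to within rounding.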

Table 13. Prospectively defined potential predictors or risk factors for diagnostic error.

Patient Characteristics

We identified 108 studies that reported the effect of one or more patient characteristics on diagnostic error. We report the impact of patient characteristics separately by condition. Across conditions, the impact of age, sex, race, and ethnicity was reported far more often than the impact of language, socioeconomic status/income, health literacy, or health insurance.

The most common patient factors studied in relation to the misdiagnosis of stroke were age, sex, and race. Older age was associated with a lower risk of misdiagnosis,64, 88, 171, 183 and patients with missed stroke were younger than the correctly diagnosed cases.160, 164, 174, 175, 186, 187, 192, 195 However, several studies found no age difference in the time to evaluation for stroke.161, 185, 188, 261 Women were more likely to be misdiagnosed64, 164, 175, 183, 192, 262 and to have a longer time to evaluation.185, 188, 261 Black64, 174, 188, 262 and Hispanic64, 262 patients were also at increased risk of misdiagnosis. Some studies reported no difference by race or ethnicity.160, 179

Twenty studies reported on patients’ characteristics and missed or delayed diagnosis of myocardial infarction. Studies reported mixed results on the effect of age on myocardial infarction misdiagnosis. Age was significantly associated with decreased risk,25, 63, 201 increased risk,77, 204, 206, 263, 264 or no effect on myocardial infarction misdiagnosis.120, 202, 205, 209, 265, 266 Three studies reported higher risk of misdiagnosis of myocardial infarction among female patients.120, 264, 267 One study found that, even among patients who presented with cardiac chest pain and cardiac troponin > 99th percentile, women were less likely to be diagnosed with myocardial infarction, to undergo cardiac catheterization, or to be using evidence-based medications within 90 days of discharge.267 The rest of the studies reported no effect by sex on misdiagnosis of myocardial infarction.25, 63, 202, 204, 205, 208, 209, 265, 268-270 There were mixed results on the effect of race on misdiagnosis of myocardial infarction. Some studies reported an increased risk among African American patients,63, 77, 202 while others reported no significant effect of race on myocardial infarction misdiagnosis.120, 206, 208, 209, 266, 271 Several studies found no effect by ethnicity63, 77, 120, 202 or socioeconomic status.25, 63, 77, 202 Due to concern for delayed STEMI treatment among women and older patients, one study aimed to assess the performance of a physician-blinded prehospital activation system for STEMI in comparison with standard systems with physician involvement. In the standard system, female sex and age > 75 were independent predictors of treatment delay in hospitals with and without a prehospital notification system. By contrast, with implementation of a physician-blinded prehospital notification system, there was no difference in treatment delay by age and there was a smaller gap in treatment delay among women.264

Eight studies reported on the effect of age, sex, race, and drug abuse on the accuracy of diagnosing aortic aneurysm and dissection. Studies showed conflicting results about the effect of age on diagnostic delay or missed diagnosis of aortic aneurysm and/or dissection. Some studies reported significantly decreased or increased risk of diagnostic delay among older patients,73, 120, 214 while others reported no significant effect of age on delayed or missed diagnoses.68, 216–218 Two studies reported increased risk of diagnostic delay among female patients,120, 219 while others reported no significant difference between male and female patients in missed or delayed diagnosis of aortic aneurysm and/or dissection.68, 73, 216–218 Other studies reported no effect of race or drug abuse on delayed or missed diagnosis of aortic aneurysm and/or dissection.68, 120, 217 None of the studies on risk of delays in aortic dissection diagnosis found a statistically significant difference between those with a history of Marfan’s syndrome and those without,68, 73, 217 although the presence of a known history was, if anything, protective (median time from presentation to diagnosis 2.2 hours for those with a known history versus 4.5 hours for those without, P = 0.066).68

We identified one study that reported on the effect of patient characteristics on diagnostic errors among 521 patients under 5 years of age presenting to the ED and later diagnosed with sepsis or meningitis.93 Compared to 30-90 day-old children, older age children (age 91 days-2 years and >2 to 5 years) experienced higher odds of missed diagnosis of sepsis or meningitis in the ED (OR 2.56; 95% CI 1.49 to 4.41 and OR 2.26; 95% CI 1.17 to 4.35, respectively).93

Two studies assessed clinical and patient characteristics associated with delayed diagnosis of appendicitis.236, 272 There was no effect of age (among children) or sex on delayed diagnosis of appendicitis in either study. In one study, race, ethnicity, and insurance status could not be studied in relation to diagnostic delay because these characteristics differed significantly between the case and control groups; the other study did not assess them.236, 272 Michelson et al., 2021 found 63 percent of children had a delayed diagnosis of appendicitis, of which 76.8 percent were deemed possible or probable missed opportunities to improve diagnosis.272 In comparison with children who received a timely diagnosis, patients with delayed diagnosis of appendicitis had longer hospital length of stay, higher rates of perforation, and a higher likelihood of undergoing two or more abdominal surgeries (OR 8.0; 95% CI, 2.0 to 70.4).272 Lastunen et al., 2021 found that among patients with uncomplicated appendicitis on initial CT, age greater than 60 years was independently associated with progression to complicated appendicitis at the time of operation.273 Although we did not find any included studies that directly addressed older age (i.e., adult presentations) as a risk factor for misdiagnosis in appendicitis, cross-study differences suggested that it might be a risk factor. In one study, the false negative rate among patients 18 years of age and under was 2.9 percent.225 In an unrelated study, the false negative rate among patients 17 years of age and older was 30.8 percent.233 Although this latter study used a very different method and focused only on missed cases using point-of-care ultrasound,233 an older study (from prior to the study period), which directly compared younger and older patients with appendicitis, appears to corroborate older age as a risk factor.
In that older study, diagnostic delays among older adults were more common than among children, with contributions from both “patient interval” (from symptoms to presentation) and “clinician interval” (from presentation to diagnosis) delay components—the result appears to be a higher rate of complications and greater mortality.239

Nine studies reported on the distribution of age and sex among patients with diagnostic errors related to fractures. However, the effect of age or sex on misdiagnosis of fractures was not quantified in any of the studies.

Three studies reported about patient characteristics and delayed diagnosis of testicular torsion.254, 255, 274 These studies focused on patient-related delays prior to seeking care (known as the “patient interval” in studies of cancer) as it related to delay in definitive therapies and clinical outcomes. Chan et al., 2019 found that patients with testicular torsion who underwent orchiectomy had significantly longer prehospital pain duration compared with those who underwent testicular salvage (18.75 versus 3.56 hours; P = 0.003).254 For patients who underwent orchiectomy, in-hospital time intervals were not significantly different than those who underwent testicular salvage. Bayne et al., 2017 found that patients in the delayed presentation group were 4 times more likely to have a developmental, cognitive, or social disorder than patients in the acute presentation group (10.6 versus 2.6%; P = 0.02). Half of the patients in the delayed group reported having autism spectrum disorder. Patients reporting a history of recent genital trauma were twice as likely to present in the delayed vs acute setting (14.9 versus 7%; P = 0.07). Misdiagnosed patients were younger and weighed less than those correctly diagnosed in the acute setting (9.9 years versus 12.9 years; P = 0.006; 42.6 kilograms versus 59.2 kilograms; P = 0.01). All boys who were misdiagnosed eventually underwent orchiectomy compared with 24.6 percent of those correctly diagnosed in the acute period (P < 0.0001).255 One study reported on potentially avoidable testis loss in the setting of delayed diagnosis and treatment of cryptorchidism.274 Despite international guidelines which recommend surgical exploration and orchidopexy prior to 18 months of age, the authors found 60% of the patients were above this age when they presented with preventable cases of testicular torsion. 
Further, there was significant delay (over 6 hours) from symptom onset to presentation to the ED in 72 percent of patients, which was associated with higher rates of orchidectomy (56% versus 23% in those who arrived within 6 hours; P = 0.04). There was not a significant effect of age on the duration of delay in ED presentation, and the effect of sex was not quantified.274

One study looked at rates of concordance and discordance between ED diagnosis and discharge diagnosis across all International Classification of Diseases (ICD)-10 codes and found no difference by age or sex.275 Another study focused on delays (including diagnostic delays) due to difficulty obtaining intravenous access (DIVA).276 Patients with DIVA (3.1% of the population) were more likely to be female and Black, and were more often triaged to a higher acuity track. Throughout the ED, DIVA was associated with delays in median time to completion of lab testing, intravenous fluid and contrast administration, and pain medication administration, and with delayed admission and discharge orders.

Illness Characteristics

We identified 120 studies that reported on the effect of one or more illness-related factors on diagnostic error in the ED. Most of the studies had a low risk of bias and/or low concerns for applicability. However, 12 studies had concerns with patient selection58, 160, 177, 179, 194, 200, 244, 266, 271, 272, 277, 278 and 19 studies had concerns with the reference standard.176–179, 186, 188, 195, 208, 213, 244, 267, 272, 275, 279–284 We report the impact of illness characteristics separately by condition. The illness characteristics most studied were symptom type and illness severity, followed by mode of arrival and diagnostic tests ordered. Atypical presentations were the strongest and most consistent predictors of increased risk for a missed diagnosis. Comorbidities were often studied as outcome predictors (e.g., mortality), but generally not in relation to diagnostic error, apart from polytrauma, where comorbidities tended to increase the risk of missing a second disease.

Thirty-one studies reported on stroke, including five prospective cohorts, 18 retrospective cohorts, eight registries, two case-control studies, and one cross-sectional study that reported on the illness-related causes of diagnostic error among patients presenting to the ED. A meta-analysis of two studies (522 patients) indicated an increased risk of misdiagnosis in posterior circulation stroke (RR 2.51; 95% CI, 1.46 to 4.33; I-squared 59%; Figure 18).159, 171 Three other studies also confirmed the increased risk of misdiagnosis in posterior circulation stroke but were not included in the meta-analysis because of study design and lack of sufficient information.87, 186, 194 Atypical presentation,66, 87, 88 non-specific symptoms,184 dizziness,69, 159, 194 altered mental status,88, 194 loss of consciousness,88, 194 syncope,194 headache,69 involuntary movement,69 and a negative Face-Arm-Speech-Time test were associated with increased risk of misdiagnosis.161, 194 Compared to the correctly diagnosed stroke patients, misdiagnosed cases had a tendency to present without focal neurological deficits69, 88, 183, 186, 194 and with lower clinical severity as judged by the following: (a) ED triage resuscitation/emergency category,194, 195 (b) the National Institutes of Health Stroke Scale (NIHSS) score for ischemic stroke/TIA,164, 179, 183, 192, 197, 262, 285 (c) the ABCD2 score for TIA,69, 184, 185 and (d) the Hunt and Hess and Fisher scale scores for subarachnoid hemorrhage.181 By contrast, the presence of unilateral weakness or numbness was protective against misdiagnosis.88 A retrospective review of an acute stroke registry found a bimodal NIHSS distribution among missed stroke cases, indicating that very severe cases may also be at risk of misdiagnosis (e.g., due to presentation in stupor/coma from basilar artery occlusion).186 Arrival by emergency medical services/ambulance also decreased delayed/missed diagnosis of stroke.160, 185, 188, 261, 262 MRI was performed equally often among misdiagnosed and correctly diagnosed cases,184, 194 but with longer delays among the misdiagnosed.194 Unsurprisingly, stroke-specific sequences such as diffusion-weighted MRI and neurovascular imaging were used less frequently among misdiagnosed cases.184 Whether or not ED patients with neurologic complaints had symptoms highly suggestive of TIA/stroke (e.g., aphasia or weakness), the use of head CT portended increased risk of subsequent stroke.189 Similarly, having a head CT scan at the index visit for headache was associated with an increased risk of subsequent cerebrovascular event91, 176; however, one study showed that a reduction in the use of head CT scan in ED visits for headache had no effect on the rate of death or subsequent cerebrovascular events.140 Two studies assessed delayed diagnosis of ischemic stroke/TIA among pediatric patients (n=181).177, 286 They found no effect for the type of first contact with the medical sector, pediatric NIHSS, level of consciousness, symptoms, or the location of the brain lesion. One study looked systematically for ischemic stroke among polytrauma cases brought to a trauma center and found 11 acute ischemic strokes among 192 patients (5.7%). None were detected initially and none had neurologic consultation obtained at initial trauma triage (100% missed); the median time to diagnosis was 2 days (range 0-5).198 These studies all point to greater miss rates in case presentations with a higher degree of diagnostic difficulty (transient, milder, non-specific, or atypical symptoms).21

Figure 18 is a forest plot of the risk ratio of misdiagnosis among people who have posterior circulation stroke. The pooled risk ratio was 2.51 (95% CI, 1.46 to 4.33).

Figure 18

Risk ratio of misdiagnosis among people who have posterior circulation stroke. CI = confidence interval; RR = risk ratio

We identified 11 studies on myocardial infarction, including two registries, two prospective cohorts and seven retrospective cohorts that reported on the illness-related causes of diagnostic error among patients presenting to the ED. Meta-analysis was not possible because of differences in the definitions of the risk factors and diagnostic error. The rate of misdiagnosis was lower among patients with chest pain65, 209, 266 and more severe triage levels.25, 287 However, one study found no difference in the median door-to-balloon time between patients with and without angina pectoris265 and another study found no difference in the percentage of triage delay between triage levels.266

We identified 12 studies on aortic aneurysm/aortic dissection, including two registries, one randomized controlled trial, and nine retrospective cohorts that reported on the illness-related causes of diagnostic error among patients presenting to the ED. Meta-analysis was not possible because of missing data and differences in the definitions of the potential risk factors among these studies. Misdiagnosed cases were more likely to present with atypical symptoms,68, 73 dyspnea,73, 217 systolic blood pressure above 105 mmHg,68, 73, 216 or clinical features resembling myocardial infarction, including angina pectoris,217 positive troponin,73, 288 and acute coronary syndrome-like findings on ECG.73 Cases who underwent CT scan had less diagnostic delay than those who did not.68, 73

We identified five studies with a low or unclear risk of bias on pulmonary embolism, including one prospective cohort and four retrospective cohorts that reported on the illness-related causes of diagnostic error among patients presenting to the ED with pulmonary embolism. We included three studies (655 patients), reporting on hemoptysis, cough, and pleuritic chest pain,278, 289, 290 and four studies (881 patients) reporting on dyspnea222, 278, 289, 290 in a meta-analysis (Figure 19). The risk of misdiagnosis increased with the presence of hemoptysis (RR 2.08; 95% CI 1.06 to 4.07; I-squared 0%) and cough (RR 1.75; 95% CI 0.98 to 3.13; I-squared 66.4%), decreased slightly with pleuritic chest pain (RR 0.86; 95% CI 0.54 to 1.37; I-squared 45.6%), and was not related to dyspnea at the index visit (RR 1.00; 95% CI 0.82 to 1.22; I-squared 64.2%). One study reporting an increased risk of delayed diagnosis in the absence of dyspnea was not included in the meta-analysis because of the unclear number of misdiagnosed patients.223 In addition, two studies reported on syncope in the clinical presentation of PE.278, 289 Kline et al. reported syncope at a higher rate among those diagnosed within 48 hours after leaving the ED (delayed) than among patients diagnosed while in the ED at the initial presentation.278 However, Torres et al. found syncope at a similar rate among ED-diagnosed patients and those sent home with a wrong diagnosis, but less frequently among patients diagnosed with PE during hospitalization.289 Compared to the correctly diagnosed patients with pulmonary embolism, misdiagnosed cases were less likely to have D-dimer tested in the initial work-up222, 289 and more likely to present with pulmonary infiltrates on chest X-ray.289, 290
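As a reading aid for the pooled estimates above, the short sketch below checks which of the quoted 95% confidence intervals exclude the null value of RR = 1. It is not part of the review's analysis: the function name `excludes_null` and the dictionary layout are illustrative assumptions, and only the figures quoted in the paragraph are used.

```python
# Pooled risk ratios and 95% CIs as quoted in the text (Figure 19).
# Layout: symptom -> (risk ratio, lower bound, upper bound).
pooled = {
    "hemoptysis": (2.08, 1.06, 4.07),
    "cough": (1.75, 0.98, 3.13),
    "pleuritic chest pain": (0.86, 0.54, 1.37),
    "dyspnea": (1.00, 0.82, 1.22),
}


def excludes_null(lower, upper, null=1.0):
    """Return True when the interval [lower, upper] does not contain the null value."""
    return lower > null or upper < null


# Hypothetical helper output: which intervals exclude RR = 1.
significant = {name: excludes_null(lo, hi) for name, (_, lo, hi) in pooled.items()}

for name, (rr, lo, hi) in pooled.items():
    flag = "excludes 1" if significant[name] else "includes 1"
    print(f"{name}: RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f}) -> {flag}")
```

By this check, only the hemoptysis interval excludes 1; the intervals for cough, pleuritic chest pain, and dyspnea all include it.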

Figure 19 displays a forest plot of the risk ratio of misdiagnosis of pulmonary embolism. Three studies (655 patients) reported on hemoptysis, cough, and pleuritic chest pain and four studies (881 patients) reported on dyspnea. The risk of misdiagnosis increased with the presence of hemoptysis (RR 2.08; 95% CI 1.06 to 4.07; I-squared 0%) and cough (RR 1.75; 95% CI 0.98 to 3.13; I-squared 66.4%), decreased slightly with pleuritic chest pain (RR 0.86; 95% CI 0.54 to 1.37; I-squared 45.6%), and was not related to dyspnea at the index visit (RR 1.00; 95% CI 0.82 to 1.22; I-squared 64.2%).

Figure 19

Risk ratio of pulmonary embolism misdiagnosis in patients presenting with cough, dyspnea, hemoptysis, or pleuritic chest pain. CI = confidence interval; RR = risk ratio

We identified five or fewer studies on other conditions in the ED, where meta-analysis was not possible because of different definitions or statistical measures, or the limited number of reports on each risk factor.

Across these studies, higher triage severity was associated with more misdiagnosed injuries in pediatric or adult trauma patients.74, 127, 130, 243, 291 However, greater clinical and triage severity was associated with less sepsis misdiagnosis among children93 and adults.157

Typical symptoms such as isolated scrotal pain for testicular torsion255 and right lower quadrant abdominal tenderness for appendicitis272, 292 decreased misdiagnosis. However, atypical presentation, as in isolated abdominal pain for testicular torsion255 and lack of abdominal pain or abdominal pain accompanied by constipation for appendicitis292 increased misdiagnosis. Imaging on first presentation by a single sonography or CT scan decreased diagnostic delay of testicular torsion.293, 294 The studies on preoperative imaging for appendicitis were inconclusive. In two reports, CT scan or sonography was performed less frequently among adults with missed appendicitis at index visit than those with a same-day diagnosis.272, 292 One study showed no change in the rate of negative appendectomy (i.e., false positive diagnosis of appendicitis) but a delay to surgery with preoperative imaging,295 while another study indicated that preoperative imaging reduced the false discovery rate from 10 percent to 3 percent.296

Clinician Characteristics

We identified 30 studies, including one randomized controlled trial, three prospective cohorts and 17 retrospective cohorts, two case-control studies, five cross-sectional studies, one registry, and one case series that reported on clinician characteristics associated with diagnostic error among patients presenting to the ED. Most of the studies had a low risk of bias and/or low concerns for applicability. However, five studies had concerns with patient selection132, 194, 200, 266, 297 and five studies had concerns with the reference standard.186, 202, 281, 298, 299 The sources of heterogeneity between studies were differences in the definitions of the clinician factors, study design, and patient selection. Provider type and clinical experience (including training level) were the most frequently reported factors. Most studies were limited by a retrospective design that made it difficult to evaluate some potential risk factors such as clinician fatigue.

Numerous studies addressed accuracy of diagnosing patients based on provider type. For strokes, Arch found that neurological consultation was strongly associated with fewer diagnostic errors (35% [n=20/55] of missed cases were seen by a neurologist, while 95% [n=213/225] of correctly diagnosed cases were seen by a neurologist, P < 0.001).159 These numbers correspond to a false negative rate of 9 percent (n=20/233) for neurologists which is roughly half of the estimated overall error rate in the ED shown in KQ2 (which includes multiple studies that gave “credit” to correct ED diagnoses employing neurological consultation); however, authors did not report results on a per-symptom or case-mix adjusted basis, so these results could potentially be confounded by indication (i.e., neurologists might have been consulted disproportionately in obvious cases and they might not have fared so well in subtle cases). Richoz found that “ED physicians were a little less than twice as likely as neurologists or neurologists in training to miss the right diagnosis.”186 Importantly, this was in cases where the ED initially missed the diagnosis (among 43 initially missed strokes by ED physicians, 33 underwent neurological consultation without suspicion for stroke by the ED, and 14 of these were correctly diagnosed as stroke by the neurologist); however, the neurologist also caused the misdiagnosis in four cases suspected to represent strokes by the ED clinician.186 In two multivariable models, Morgenstern found point estimates that neurology consultation (obtained in just 8.6% of stroke cases) cut diagnostic error by 34 to 51 percent, but results were imprecise and confidence intervals overlapped with no difference.179 Venkat found that neurology service admission (versus non-neurology service) was associated with lower rates of stroke misdiagnosis (non-neurology admissions were 11% of correctly diagnosed vs. 
35% of misdiagnosed cases, p < 0.001).194 Liberman found that, in cerebral venous sinus thrombosis, fewer misdiagnosed patients had neurology consultations, but the result was not statistically significant (81.8% among misdiagnosed vs. 95.2% among correctly diagnosed, P = 0.19).173 Yi found that access to telestroke video consultations did not reduce false positive transfers for mechanical thrombectomy for large vessel occlusions causing stroke.285 In summary, studies that assessed neurologist accuracy found that neurologists generally missed fewer strokes than ED clinicians, but neurologists also missed strokes (even sometimes when ED clinicians correctly suspected them). There was also clear evidence of opportunities for improvement by ED clinicians in stroke diagnosis. Vaghani found that a large number of patients who presented with red flags and multiple stroke risk factors did not undergo appropriate ED diagnostic evaluation, and that process failures related to the patient–provider encounter (history and physical examination) were the most frequent cause of diagnostic errors.300 However, one study of ED patients with suspected acute stroke found that formal use of bedside diagnostic stroke scales improved ED clinician sensitivity for detecting stroke over ED clinical impression alone (76% clinical impression vs. 83% using Recognition of Stroke In the Emergency Room [ROSIER], P = 0.005; use of the Face Arm Speech Test [FAST] scale by ED clinicians was not statistically different from use of ROSIER by ED clinicians: 81% FAST vs. 83% ROSIER, P = 0.39).196 Results were said to be similar whether the scales were performed by a physician or nurse. This suggests that relatively simple interventions might be helpful.
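The false negative rate attributed to neurologists in the Arch study earlier in this paragraph follows directly from the reported counts. The sketch below reproduces only that arithmetic; the variable names are illustrative assumptions, and the counts are those quoted in the text.

```python
# Counts quoted in the text for the Arch study (citation 159).
missed_seen_by_neurologist = 20     # missed strokes that had neurological consultation
correct_seen_by_neurologist = 213   # correctly diagnosed strokes with consultation

# All stroke patients seen by a neurologist, missed or not.
neurologist_total = missed_seen_by_neurologist + correct_seen_by_neurologist

# False negative rate among patients seen by a neurologist.
false_negative_rate = missed_seen_by_neurologist / neurologist_total

print(f"{missed_seen_by_neurologist}/{neurologist_total} = {false_negative_rate:.1%}")
```

The denominator is the 233 patients seen by a neurologist, and the result rounds to the 9 percent quoted above. As the text cautions, this figure is not adjusted for case mix.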

For potential surgical conditions, surgeons were less likely to miss ruptured aortic aneurysm than internists.219 However, compared to emergency physicians, surgeons were more likely to misdiagnose common surgical complaints in the pediatric ED including head trauma, testicular pain, and abdominal pain.126 Data suggest that early recognition of diseases needing emergent treatment, such as testicular torsion, is sometimes delayed and/or missed; in these cases absence of early surgical consultation was deemed to be the main cause.274, 301

For radiographic diagnoses, specialists in radiology generally provided more accurate diagnoses, and subspecialists were the most accurate when interpreting images in their own subspecialty. ED clinicians (non-radiologists) had significantly higher error rates compared to radiologists and radiology residents when interpreting ED imaging.299 In acute stroke patients, neuroradiologists missed fewer large vessel occlusions on CT angiography than non-neuroradiologists.162 However, subspecialty radiologists who interpreted ED imaging outside their area of expertise had diagnostic error rates similar to radiology residents.299 Radiologists were less likely to miss acute mesenteric ischemia on CT imaging of patients with acute abdominal pain if clinicians had suspected the diagnosis prior to CT referral.227 In patients with fractures who had imaging read only by an orthopedic surgeon (without attending radiology backup during the visit), the incidence of false-negative fractures was 2.2 percent,83 which is slightly higher than that generally reported for radiologists (see KQ2 fractures).

Multiple studies addressed accuracy of diagnosing patients at the bedside based on clinical experience, including training level. Less clinical experience of ED clinicians showed a trend towards increased stroke misdiagnosis (≤6 years of experience OR 1.20; 95% CI 0.80 to 1.75)69 but was not identified as a predictor of missed myocardial infarction.202 A stroke study “found no significant difference in diagnostic accuracy between neurologists and trained neurology residents.”183 Radiology residents were more prone to diagnostic error than attendings in the diagnosis of stroke,162 subtle pelvic fractures130 and interpreting CT scan,132 CT angiography,55 and MRI.133 Nevertheless, one large study comparing “off hours” initial radiology resident reads to those of attending radiologists (n=81,201) found diagnostic errors in just 0.2 percent.241 Earlier year of residency in radiology was associated with greater risk of MRI misinterpretation.133 Although radiology residents had overall suboptimal sensitivity (87%) for detecting intracranial aneurysms on head CT angiography in subarachnoid hemorrhage patients, there was no clear benefit to overall diagnostic accuracy based on year of residency training.55

We identified one study assessing training background, which found that a greater proportion of emergency medicine board certification among a hospital’s ED clinicians was associated with fewer missed diagnoses of myocardial infarction (median 0.3% [interquartile range 0 to 1.15] for hospitals in the top quartile versus median 2.0% [interquartile range 0 to 33.33] for hospitals in the bottom quartile of emergency medicine board certification).202

We did not find studies that addressed a clinician’s history of disciplinary action as a predictor. We also did not find studies that addressed clinician fatigue as a predictor.

Fixed Systems Factors

Fixed systems factors were those that would generally not change on a given day or at a given visit for a specific patient. We identified 21 studies that reported on fixed facility or health systems factors that were associated with diagnostic errors/harms.25, 63, 64, 78, 93, 156, 160, 178, 188, 189, 195, 202, 208, 244, 253, 281, 302–305 We identified six studies that reported on both fixed and dynamic factors.25, 63, 64, 93, 188, 195 Nine studies took place in the United States,63, 64, 78, 156, 160, 178, 188, 189, 202 seven studies took place in Canada,25, 93, 195, 211, 244, 303, 304 three studies took place in the United Kingdom or Western Europe,253, 281, 302 and two studies were based in Australia.208, 305

Twelve studies reported on the association between a facility’s teaching status and rates of misdiagnosis.25, 63, 64, 78, 93, 178, 188, 189, 193, 195, 202, 302 Schull et al., 2006, Cifra et al., 2020, and Rosenman et al., 2020 found no association between teaching status and myocardial infarction, sepsis, and stroke misdiagnosis, respectively.25, 78, 189 All other studies found significantly lower odds of misdiagnosis at academic centers.

Five studies reported on variation by U.S. geographic region in rates of misdiagnosis.63, 64, 78, 202, 303 Moy reported significantly higher odds of myocardial infarction misdiagnosis in the Midwest relative to the Northeast.63 Wilson reported significantly lower rates of myocardial infarction misdiagnosis in the Mid-Atlantic, West, South, Central, and Mountain regions of the country.202 Newman-Toker reported non-significantly lower rates of stroke misdiagnosis in the Northeast relative to the Midwest, South, and West.64 Cifra reported significantly higher odds of sepsis misdiagnosis in California, Florida, and Massachusetts relative to New York.78 Cheong reported on geographic variation in Canada and found the West had significantly higher odds of appendicitis misdiagnosis relative to the Maritime region.303

Five studies reported on facility ownership/business models.63, 64, 78, 160, 202 Moy, Newman-Toker, and Cifra found no association between facility ownership and rates of myocardial infarction, stroke, and sepsis misdiagnoses, respectively.63, 64, 78 Wilson found that public hospitals had higher odds of myocardial infarction misdiagnosis relative to private hospitals.202 Bhattacharya found non-significantly higher rates of correctly diagnosed strokes among young adults presenting to primary stroke centers (PSC) relative to non-PSCs.160

Four studies reported on the association between population density and rates of misdiagnosis.63, 64, 202, 208 Moy and Wilson found a significant association between lower population density and higher rates of myocardial infarction misdiagnosis.63, 202 Williams also reported an increased rate of myocardial infarction misdiagnosis in rural regions of Australia, but did not report on significance.208 Newman-Toker found that small metropolitan regions had lower odds of stroke misdiagnosis relative to large metropolitan areas; the effect size was small but statistically significant.64

Four studies reported on the association between average ED volume/annual number of visits and diagnostic accuracy.64, 78, 93, 211 Ko, in a study following approximately 500,000 Canadian adults with treat-and-release ED visits for chest pain, found that EDs with higher annual volumes of chest pain complaints had significantly lower rates of myocardial infarction/unstable angina hospitalizations and death in the 30 days following those treat-and-release ED encounters.211 This trend continued until the ED volume reached 1400 annual visits—once volumes exceeded 1400 annual chest pain visits, there was no longer a significant reduction in the rates of acute coronary syndrome hospitalizations or death relative to the lower-volume EDs. Newman-Toker found that lower-volume EDs had significantly higher odds of stroke misdiagnosis, and that moderate-volume EDs had non-significantly higher odds of stroke misdiagnosis relative to high-volume EDs.64 Cifra found that rates of pediatric sepsis misdiagnosis were significantly higher in lower-volume EDs.78 Vaillancourt did not find a significant association between ED volume and the rate of pediatric sepsis misdiagnosis.93

Two studies reported on access to electronic health records.78, 304 Cifra found that hospitals’ accuracy decreased non-significantly when diagnosing pediatric sepsis if the hospital had fully implemented electronic health records.78 Gouin found that emergency physicians’ diagnostic accuracy increased non-significantly with use of digital versus conventional radiography viewing using a Picture Archiving and Communications System (PACS).304

Two studies reported on access to testing.63, 202 Both studies found that access to cardiac catheterization facilities reduced risk of myocardial infarction misdiagnosis, but the findings in the Wilson study were not statistically significant.202 The Wilson study also found a significant benefit to diagnostic accuracy from being at what was classified as a “high-tech” hospital.202

Two studies reported on average ED discharge fraction.63, 64 Newman-Toker found that higher discharge fractions were associated with increased risk of stroke misdiagnosis.64 Likewise, Moy found that higher discharge fractions were associated with significantly higher odds of myocardial infarction misdiagnosis.63 Both studies were compatible with findings from a large Medicare-based study (outside the systematic review) which found that unexpected deaths (associated with apparent diagnostic errors) within 7 days of an ED treat-and-release visit were increased at EDs with higher discharge fractions. In that study, hospitals in the lowest quintile of admission fraction from the ED had the highest rates of early death—3.4 times higher (0.27% versus 0.08%) than hospitals in the highest quintile of admission fraction—despite serving healthier populations, as measured by overall 7-day mortality among all comers to the ED.148
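The "3.4 times higher" comparison in the paragraph above can be checked from the two early-death rates quoted there. This is illustrative arithmetic only; the variable names are assumptions, and the rates are the percentages given in the text.

```python
# Early-death rates (percent) quoted in the text for the Medicare-based study.
rate_lowest_admission_quintile = 0.27    # hospitals admitting the smallest fraction
rate_highest_admission_quintile = 0.08   # hospitals admitting the largest fraction

# Relative comparison: how many times higher the rate is in the lowest quintile.
rate_ratio = rate_lowest_admission_quintile / rate_highest_admission_quintile

# Absolute comparison, in percentage points.
absolute_difference = rate_lowest_admission_quintile - rate_highest_admission_quintile

print(f"ratio: {rate_ratio:.1f}x, difference: {absolute_difference:.2f} percentage points")
```

The ratio rounds to the 3.4 quoted above, while the absolute gap is 0.19 percentage points. Both framings describe the same two rates.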

Two studies reported on average inpatient occupancy rates influencing misdiagnosis rates.63, 64 Newman-Toker found that occupancy rates did not affect rates of stroke misdiagnosis.64 Moy found significantly lower rates of myocardial infarction misdiagnosis among hospitals with higher (classified as “medium” or “high” relative to “low”) occupancy rates.63 The implications of this finding are uncertain.

One study reported on access to consultants.93 Vaillancourt found that diagnostic accuracy among children with meningitis or sepsis was higher at hospitals with access to pediatric consultations.93

We did not identify any studies that evaluated the association between delivery or payment models and rates of misdiagnosis.

Dynamic Systems Factors

Dynamic, context-specific systems factors were those that might change on a given day or at a given visit for a specific patient. We identified 17 studies that reported on dynamic, context-specific systems factors.25, 63, 64, 88, 92, 93, 122, 127, 162, 183, 188, 195, 208, 253, 265, 285, 286 Seven studies took place in the United States,63, 64, 88, 122, 188, 265, 285 five studies took place in the United Kingdom or Western Europe,92, 127, 162, 183, 253 three studies took place in Canada,25, 93, 195 and two studies took place in Australia.208, 286

Sixteen studies reported on the rates of misdiagnosis during off-hours.25, 63, 64, 88, 92, 93, 127, 162, 183, 188, 195, 208, 253, 265, 285, 286 Newman-Toker reported significantly increased odds of stroke misdiagnosis during off-hours.64 Muhm also reported more misdiagnoses among polytrauma patients during off-hours but did not report on statistical significance.127 Rose reported that suspected stroke patients had more rapid access to immediate CT scans during off-hours.188 Parikh, Schull, Daverio, and York reported mixed results.25, 253, 265, 286 Moy, Vermeulen, Fasen, Williams, Pihlasviita, Madsen, Yi, Mirete, and Vaillancourt reported no effect.63, 88, 92, 93, 162, 183, 195, 208, 285

We identified two studies that reported on the relationship between same-day ED crowding and rates of misdiagnosis and found mixed results.63, 64 Unexpectedly, Moy63 found that high levels of same-day ED crowding were associated with lower risk of misdiagnosis (OR 0.78, P = 0.009), but this appears to have been a univariable analysis that might have been confounded by other factors (e.g., high same-day ED admission fraction, which was not measured, but in a similar analysis for stroke64 was strongly protective against error). Although Newman-Toker64 did not find an association between same-day ED crowding and odds of stroke misdiagnosis, there was an increased risk of stroke misdiagnosis among incomplete ED visits (e.g., patient left against medical advice), suggesting that incomplete diagnostic assessments may be more important than crowding, per se (though it is expected that overcrowding is likely to increase incomplete ED visits). We identified one study that reported on the relationship between same-day ED discharge fraction and rates of misdiagnosis which found a strong association, with higher discharge fraction on the day of the visit (top quintile versus bottom quintile) increasing the odds of a misdiagnosis 6.3-fold (P < 0.001).64 We found no studies that assessed the association between handoffs, same-day ED staffing, or same-day ED illness severity and misdiagnosis rates, but the Okafor 2016 incident report study did note inadequate or failed handoffs (5% of all causes) and high workload (11% of all causes) as contributing factors.31

Key Question 3e. Are there significant commonalities or differences among causes of ED diagnostic errors or associated harms across clinical conditions?

The clearest and most consistent causal connections across conditions are that (1) most ED diagnostic errors happen at the bedside and disproportionately involve cognition and clinical judgement as root causes; (2) illness characteristics are a strong and consistent predictor of diagnostic error—“obviousness” predicts correct diagnosis and “subtlety” predicts incorrect diagnosis; and (3) the final common pathway for false negatives in patients with dangerous underlying diseases is failure to order tests or consultations, resulting in inappropriate discharge from the ED. It is the second of these that merits additional consideration here, because some heterogeneity in results identified in the systematic review can be explained via the interaction between illness characteristics and other characteristics (e.g., patient demographics).

Atypical or non-specific symptoms were among the strongest and most consistent predictors of increased risk for a missed diagnosis across diseases. On the one hand, this is almost a truism: clinicians do not miss diagnoses when they are obvious; they miss them when they are subtle. On the other hand, it is a deeply complex problem, because “subtlety” comes in multiple forms: (a) low prevalence, pre-test probability, or base rate (e.g., hemiplegia is caused by stroke more than half the time; dizziness is caused by stroke just 3 to 5 percent of the time); (b) degree of difficulty (e.g., it may be intrinsically harder to perform the bedside HINTS eye movement exam to differentiate stroke from inner ear disease306–308 than to order a troponin level to identify myocardial infarction); (c) training, background knowledge, familiarity, or expertise (e.g., training in emergency medicine focuses more on critical care neurology than on what has been called “acute diagnostic neurology” (“the medical discipline concerned with the initial assessment, diagnosis, management, and referral of patients presenting with new neurologic symptoms that are not obviously due to serious, life-threatening neurologic diseases … but might be”309); thus, the varied presentations of stroke may be more challenging to sort out than heart attacks).
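
The role of base rate in the first form of subtlety can be illustrated with Bayes' theorem. The Python sketch below is not drawn from the review. The sensitivity and specificity values are hypothetical placeholders for some bedside finding, and the pre-test probabilities (0.50 for hemiplegia, 0.04 for dizziness) are round numbers chosen to sit within the ranges quoted above. It shows only that an identical finding yields very different post-test probabilities of stroke when the starting prevalence differs.

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float, finding_present: bool) -> float:
    """Probability of disease after observing a binary finding (Bayes' theorem)."""
    for name, value in (("pretest", pretest),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    if finding_present:
        # Diseased patients with the finding vs. non-diseased patients with it.
        with_disease = sensitivity * pretest
        without_disease = (1.0 - specificity) * (1.0 - pretest)
    else:
        # Diseased patients lacking the finding vs. non-diseased lacking it.
        with_disease = (1.0 - sensitivity) * pretest
        without_disease = specificity * (1.0 - pretest)

    total = with_disease + without_disease
    if total == 0.0:
        raise ValueError("finding is impossible under these inputs")
    return with_disease / total


# Hypothetical test characteristics (assumptions, not values from the review).
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# Illustrative pre-test probabilities of stroke by presenting symptom.
PRETEST = {"hemiplegia": 0.50, "dizziness": 0.04}

for symptom, prior in PRETEST.items():
    positive = post_test_probability(prior, SENSITIVITY, SPECIFICITY, True)
    negative = post_test_probability(prior, SENSITIVITY, SPECIFICITY, False)
    print(f"{symptom}: finding present {positive:.3f}, finding absent {negative:.3f}")
```

Under these assumed inputs, the post-test probability after a positive finding is much lower for the low-base-rate symptom than for the high-base-rate symptom, which is one way to read why low-prevalence presentations are harder to diagnose correctly.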

An interesting twist on the issue of atypical case presentations is how it interacts with other predictors, leading to seemingly contradictory findings that are, in fact, internally consistent. For example, the effect of age is heterogeneous and disease-specific (e.g., younger age increases risk of missed stroke while older age increases risk of missed appendicitis). However, it is likely that these findings are largely explained by atypicality because the disease is occurring in the “wrong” patient population. Stroke is a disease of the elderly, so younger patients with stroke are atypical (and therefore more likely to be misdiagnosed). Likewise, appendicitis is a disease of the young, so older patients with appendicitis are atypical (and therefore more likely to be misdiagnosed). The same applies to illness severity, again with seemingly contradictory findings. For undiagnosed serious medical illnesses, less severe presentations and less urgent modes of arrival increase misdiagnosis risk; for multi-trauma patients, the reverse is true—more, rather than less, severe presentations increase misdiagnosis risk. Again, context is crucial—in the case of undiagnosed serious medical illnesses, higher severity is a “signal” that makes diagnosis easier, but in the case of polytrauma, higher severity is “noise” that makes diagnosis (of the subtle hand fractures, for example) harder or less pressing.

Achieving equity in diagnosis by addressing racial and other diagnostic health disparities is of recognized importance to achieving diagnostic excellence.310 Not all studies found an increased risk of diagnostic error with female gender or non-white race, but no studies that normalized for baseline risk of having the target disease found these demographic factors to be protective. In general, most studies that found an association with gender, race, or ethnicity found a 20 to 30 percent increased risk of diagnostic error for women and minorities. The remaining studies showed null effects. Heterogeneity in data presentation made it challenging to perform meta-analysis to estimate an average health disparity-related effect, and the role of implicit or explicit bias was not directly measured. Much of the apparent heterogeneity in results for demographic predictors may stem from confusion about the inferences to be drawn from different study designs. The look-back method speaks to the relative risk or odds of a misdiagnosis conferred on a patient based solely on their gender or race, while the look-forward method estimates the absolute risk of a misdiagnosis based on a mix of disease and misdiagnosis prevalence. Because ED clinicians are likely to calibrate their decision-making to baseline disease prevalence, such calibration may contribute to some proportion of the demographic disparities seen in diagnosis, if actual disease prevalence is lower among women or minorities (see Discussion for additional details). Disparities in diagnosis should be a focus of future research, and special care should be taken to ensure that rigorous epidemiologic and statistical methods are used to address this concern, since incorrect methods can lead to erroneous inferences.
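
One way to see why the two designs can point in different directions is a toy calculation. The Python sketch below uses entirely hypothetical counts for two patient groups (labeled A and B) and simplifies each design to a single proportion: missed cases divided by patients with the disease (a disease-based denominator, as in look-back analyses) versus missed cases divided by all symptomatic ED visits (a visit-based denominator, as in look-forward analyses). These simplified definitions and all numbers are assumptions made for illustration, not methods or results taken from the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """Hypothetical cohort of ED visits for one presenting symptom."""
    name: str
    symptomatic_visits: int
    disease_prevalence: float       # fraction of visits with the target disease
    miss_rate_given_disease: float  # fraction of diseased patients misdiagnosed

    @property
    def diseased(self) -> float:
        return self.symptomatic_visits * self.disease_prevalence

    @property
    def missed(self) -> float:
        return self.diseased * self.miss_rate_given_disease

    def miss_rate_per_diseased(self) -> float:
        """Disease-based denominator (look-back style)."""
        return self.missed / self.diseased

    def miss_rate_per_visit(self) -> float:
        """Visit-based denominator (look-forward style)."""
        return self.missed / self.symptomatic_visits


# All values below are invented for illustration only.
group_a = Group("A", symptomatic_visits=10_000,
                disease_prevalence=0.04, miss_rate_given_disease=0.30)
group_b = Group("B", symptomatic_visits=10_000,
                disease_prevalence=0.02, miss_rate_given_disease=0.40)

disease_based_rr = group_b.miss_rate_per_diseased() / group_a.miss_rate_per_diseased()
visit_based_rr = group_b.miss_rate_per_visit() / group_a.miss_rate_per_visit()

print(f"Relative risk, B vs A, per diseased patient: {disease_based_rr:.2f}")
print(f"Relative risk, B vs A, per symptomatic visit: {visit_based_rr:.2f}")
```

With these made-up inputs, group B appears to be at higher risk when the denominator is patients with the disease and at lower risk when the denominator is all symptomatic visits, solely because the disease is assumed to be less prevalent in group B.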

It was noteworthy that testicular torsion was one of the few conditions for which studies focused heavily on risk factors that increased the “patient interval” (time prior to engaging the healthcare system). In particular, studies assessed whether the patient was cognitively impaired or developmentally delayed. While such delays do occur for other conditions (e.g., delays in seeking stroke care are linked to memory impairment, health literacy, and race),311–313 it may be that the symptom of testicular pain is particularly challenging for young boys to share with their parents, leading 6 percent of patients (n=12 of 208) to hide their symptoms for more than 24 hours.255

Fewer studies addressed clinician characteristics, facility characteristics, and dynamic, context-specific systems factors. Results were heterogeneous, but notable predictors of misdiagnosis in some studies included care provided by less experienced clinicians, at non-teaching hospitals, with high ED discharge fraction, and during off-hours.
