NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Newman-Toker DE, Peterson SM, Badihian S, et al. Diagnostic Errors in the Emergency Department: A Systematic Review [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2022 Dec. (Comparative Effectiveness Review, No. 258.)
Findings in Relation to the Decisional Dilemmas
The key decisional dilemma for this evidence review is “What are the most common and significant medical diagnostic failures in the emergency department (ED), and why do they happen?” This report summarizes current best evidence as it relates to the nature, frequency, and causes of diagnostic error in the ED. It provides the first comprehensive look at current best evidence related to ED diagnostic error and fills key gaps in prior understanding. The report’s findings offer new insights into which clinical problems should be targeted for solutions, how the impact of those solutions might be measured, and what types of interventions are most likely to succeed.
Key Question (KQ) 1. What clinical conditions are associated with the greatest number and highest risk of ED diagnostic errors and associated harms?
Although limited by biases in the data towards diseases causing more severe harms when missed, the top 20 individual diseases associated with diagnostic errors (independent of harm severity), in approximate rank order, were found to be fracture, stroke, myocardial infarction, appendicitis, venous thromboembolism, spinal cord compression and injury, aortic aneurysm and dissection, meningitis and encephalitis, sepsis, traumatic brain injury and traumatic intracranial hemorrhage, arterial thromboembolism, lung cancer, ectopic pregnancy and ovarian torsion, pneumonia, testicular torsion, gastrointestinal perforation and rupture, spinal and intracranial abscess, open and non-healing wounds, cardiac arrhythmia, and intestinal obstruction (with or without hernia). It is likely that this list of misdiagnosed diseases, which derives from two large “numerator-only” studies (one malpractice-based and one incident report-based), is strongly skewed by reporting bias towards diseases that, when missed, lead to serious harms. It is also likely that the list is skewed towards false negatives relative to false positives; many “benign” diseases that do not cause immediate threat to life or limb are likely missed in far higher total numbers than most of these disorders. Finally, it may also be partly skewed towards errors likely to be confirmed in hindsight by radiographic review, including both fractures and lung cancer.
Best available evidence indicates that the top 15 individual diseases associated with the greatest number of serious misdiagnosis-related harms in the ED, in rank order, were (1) stroke, (2) myocardial infarction, (3) aortic aneurysm and dissection, (4) spinal cord compression and injury, (5) venous thromboembolism, (6/7 – tie) meningitis and encephalitis, (6/7 – tie) sepsis, (8) lung cancer, (9) traumatic brain injury and traumatic intracranial hemorrhage, (10) arterial thromboembolism, (11) spinal and intracranial abscess, (12) cardiac arrhythmia, (13) pneumonia, (14) gastrointestinal perforation and rupture, and (15) intestinal obstruction. These data derive from a large, nationally representative study of malpractice claims in the United States17 and are bolstered by corroborating data from a similarly large, nationally representative incident reporting system in the United Kingdom16; together these two studies represent 78 percent of the diagnostic error cases analyzed for KQ1 (n=4,561 of 5,817). These results are further bolstered by results from a recent malpractice study that was not included in the report, because it was identified after completion of our grey literature search. This was a report on ED diagnostic errors from The Doctor’s Company which also identified stroke as the top category, stating, “The top categories for final diagnosis among the settled claims differed slightly. The highest classification remained cerebrovascular disease, but at a larger percentage (18 percent).”314 Unsurprisingly, spinal abscess, myocardial infarction, aortic aneurysm and dissection, arterial thromboembolism, and sepsis all also appeared among the top missed conditions. It is possible that missed myocardial infarctions and lung cancers may be overrepresented in malpractice claims, so their ranks could be overstated. However, it is also likely that this is a relatively unbiased list of diseases leading to serious misdiagnosis-related harms. 
The source data (which are organized by the final, correct diagnosis) likely reflect almost entirely false negatives, but this is probably still an accurate reflection of serious misdiagnosis-related harms. Put differently, death or permanent disability is probably a rare outcome among patients with non-life- or limb-threatening diseases mistaken for dangerous ones (e.g., migraine mistaken for stroke and leading to excess imaging and hospital admission). Nevertheless, complications (including death) can certainly occur, especially when invasive procedures are involved, such as when surgery is performed because of a false positive appendicitis diagnosis.315 The precise frequency of such adverse outcomes is unknown, but it is likely that such cases would appear in medicolegal claims with equal or greater odds relative to false negative diagnoses of dangerous illnesses. A legal claim must present evidence that the patient’s outcome would have differed but for the diagnostic error, and causation is more easily proven for a patient whose misdiagnosis is a false positive (i.e., “healthy” without the disease) and who suffers a complication from treatment for an incorrect diagnosis than for a patient whose misdiagnosis is a false negative (i.e., “sick” with the disease) and who suffers from the disease itself.
Taken together, these 15 diseases account for an estimated 68 percent of all serious harms from diagnostic error in the ED. The so-called “Big Three” disease categories (vascular events, infections, and cancers), in their totality, account for an estimated 72 percent of all ED diagnostic errors resulting in serious misdiagnosis-related harms. However, major vascular events (42%) and infections (23%) substantially outnumber cancers (8%) in the ED clinical setting. Pediatric populations have fewer high-severity harms than adults and, unlike adults, more infections than vascular events; less is known about the ranks of specific disease distributions.17
When considering ED diagnostic errors of mixed severity, missed fractures are the most frequent conditions reported in malpractice claims and incident reports.16, 31, 71, 80, 90 However, the level of harm associated with most missed fractures is generally lower than that for missed major medical and neurologic events,17 so they are not among the more common causes of serious misdiagnosis-related harms to patients. Perhaps more importantly, they may be overrepresented in claims as well as incident reports due to ascertainment and reporting biases, perhaps related to the relative ease with which radiographic misdiagnosis can be documented (i.e., using the tangible artifact of the radiograph), even well after the fact (see KQ1a above for details). Epidemiologic data suggest that other diagnostic errors (e.g., for conditions producing lower-severity harms and unaccompanied by radiographs) are likely far more frequent than fractures yet go unaccounted for in malpractice claims or incident reports. For example, missed diagnoses of inner ear diseases are likely an order of magnitude more frequent than fractures, yet do not appear on “top ten” lists of the most commonly missed conditions (see KQ1a). Missed appendicitis is also commonly noted in such reports, but data on frequency are conflicting.
The most commonly misdiagnosed clinical presentations may be abdominal pain, trauma, and neurological symptoms (e.g., dizziness, headache, back pain). However, data are sparse.
Gaps filled: Prior to this report, there was a clear evidence gap regarding the most frequent diseases missed in the ED, and data from different sources appeared conflicting. Best available evidence regarding the most frequent causes of serious misdiagnosis-related harms has now been synthesized, and clearly points to missed vascular events and infections as the principal causes, with stroke the undisputed leader in total serious harms, particularly permanent disability. Just 15 diseases likely account for more than two-thirds of all serious harms; this means that eliminating preventable patient harms from ED diagnostic error is more tractable than previously imagined.
Gaps identified: A number of gaps were identified in preparing this report. These are described below in the section on Strengths and Limitations (Evidence subsection).
KQ2. Overall and for the clinical conditions of interest, how frequent are ED diagnostic errors and associated harms?
Although based on just a few higher-quality studies less likely to be impacted by systematic under-ascertainment bias, best available evidence indicates that an estimated 5.7 percent (95% confidence interval [CI] 4.4 to 7.1) of all ED visits will have at least one diagnostic error. The overall (not disease-specific), per ED visit, potentially preventable diagnostic adverse event rates were estimated as follows: any harm severity 2.0 percent (95% CI 1.0 to 3.6), serious misdiagnosis-related harms (i.e., permanent, high-severity disability or death) 0.3 percent (plausible range [PR] 0.1 to 0.7), and misdiagnosis-related deaths 0.2 percent (PR 0.1 to 0.4). For each misdiagnosis-related death, it is estimated that there are roughly 0.41 (PR 0.27 to 0.60) ED patients suffering non-lethal, permanent, serious disability. If generalizable to all U.S. ED visits (130 million), that translates to over 7 million ED diagnostic errors, over 2.5 million diagnostic adverse events with preventable harms, and over 350,000 serious misdiagnosis-related harms, including more than 100,000 serious, permanent disabilities and 250,000 deaths. This is equivalent to a diagnostic error every 18 patients, a diagnostic adverse event every 50 patients, a serious harm (serious disability or death) about every 350 patients, and a misdiagnosis-related death about every 500 patients. Put in terms of an average ED with 25,000 visits annually and average diagnostic performance, each year this would be over 1,400 diagnostic errors, 500 diagnostic adverse events, and 75 serious harms, including 50 deaths. These estimates corroborate the National Academy of Medicine (NAM) position that improving diagnosis is a “moral, professional, and public health imperative.”5
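The national and per-ED projections above follow directly from the per-visit rates. As a minimal sketch reproducing the arithmetic (the per-visit rates, the 130 million annual U.S. ED visit figure, and the 0.41 disability-to-death ratio are taken from the text; the rounding conventions are ours):

```python
# A minimal sketch reproducing the report's burden arithmetic from the
# per-visit rates quoted above. Rates and the 130 million annual U.S. ED
# visit figure come from the text; rounding conventions are ours.

US_ED_VISITS = 130_000_000   # annual U.S. ED visits (from the text)
AVG_ED_VISITS = 25_000       # the text's "average ED" with 25,000 visits/year

rates = {
    "diagnostic error": 0.057,            # 5.7% of ED visits
    "diagnostic adverse event": 0.020,    # 2.0% (preventable harm, any severity)
    "serious harm": 0.003,                # 0.3% (death or permanent disability)
    "misdiagnosis-related death": 0.002,  # 0.2%
}

for label, rate in rates.items():
    national = US_ED_VISITS * rate     # annual national total
    per_ed = AVG_ED_VISITS * rate      # annual total for an average ED
    one_in = round(1 / rate)           # "one event every N patients"
    print(f"{label}: ~{national:,.0f}/year nationally, "
          f"~{per_ed:,.0f}/year per average ED, about 1 in {one_in} visits")

# The text's ratio of 0.41 non-lethal permanent disabilities per death
# (plausible range 0.27 to 0.60) yields the ~100,000 disability estimate.
deaths = US_ED_VISITS * rates["misdiagnosis-related death"]
disabilities = deaths * 0.41
print(f"~{deaths:,.0f} deaths and ~{disabilities:,.0f} permanent disabilities")
```

This reproduces the quoted figures after rounding (e.g., the computed 1 in 333 for serious harms is reported above as "about every 350 patients").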
The overall preventable diagnostic adverse event rate of 2.0 percent and misdiagnosis-related death rate of 0.2 percent both come from the only high-quality, prospective study to look at diagnostic adverse events using systematic phone and chart review follow-up on 503 ED patients, including both those discharged and those admitted. The death rate from such a small study is necessarily imprecise, but it is supported by corroborating, indirect evidence from other sources, including the other high-quality prospective study of mortality (see report text of KQ2 for elaboration). Retrospective trigger-based studies included many more ED visits (sometimes hundreds of thousands) and often revealed substantially lower rates, but this was almost certainly due to systematic under-ascertainment, as described in the report text for KQ2. Estimates of diagnostic adverse events varied more than 100-fold across studies (i.e., across hospitals), from 0.01 percent at a large, U.S.-based tertiary care ED to 1.6 percent at a small regional ED in Denmark. It is unknown how much of this high degree of variation is real versus study design related.
Variation in diagnostic error rates by disease was striking, with the lowest per-disease diagnostic error rate being for myocardial infarction (false negative rate 1.5%), well below the estimated average diagnostic error rate across all diseases (5.7%). Most of the top harm-producing dangerous diseases are initially missed in the ED at rates of 10 to 36 percent, but spinal abscess is likely the principal high outlier, with 56 percent missed initially. There is roughly an inverse relationship between annual disease incidence and diagnostic error rates, although myocardial infarction is clearly a low outlier. Among the diseases producing frequent death or disability, myocardial infarction stands alone as an exemplar for which ED miss rates have been reduced to a near-zero level, and its rank in malpractice studies may be overstated.
Gaps filled: Prior to this report, there was a clear evidence gap regarding the frequency of diagnostic errors and misdiagnosis-related harms, and data from different sources were highly variable and difficult to compare. Importantly, specific studies identified during the review strongly point to a high degree of systematic under-ascertainment of both errors and harms in the most common types of retrospective studies. Best available evidence regarding the frequency of diagnostic errors and harms both per ED visit and per disease case has now been synthesized. Evidence clearly points to a large public health burden of ED diagnostic errors and rates of diagnostic error for most dangerous diseases that offer a fair amount of “room for improvement.” We also present the first meta-analytically supported data on increased mortality from diagnostic error. Finally, demonstrating what appears to be large inter-ED variability in diagnostic error rates suggests many errors are likely remediable, rather than “the price of doing business.”
Gaps identified: A number of gaps were identified in preparing this report. These are described below in the section on Strengths and Limitations (Evidence subsection).
KQ3. Overall and for the clinical conditions of interest, what are the major causal factors associated with ED diagnostic errors and associated harms?
Best available evidence indicates that cognitive errors dominate. Although errors were often multifactorial, nearly 90 percent of cases involved failures of clinical decision-making or judgment, regardless of the underlying disease present. Key process failures were errors in diagnostic assessment, test ordering, and test interpretation. Most often these were attributed to inadequate clinical knowledge, skills, or reasoning, particularly in “atypical” cases.
Atypical presentations, non-specific symptoms, and diseases that seem “out of place” (e.g., stroke in a younger patient or appendicitis in an older patient) were among the strongest and most consistent predictors of increased risk for a missed diagnosis across diseases. In other words, clinicians do not miss diagnoses when they are obvious; they miss them when they are subtle. Therefore, solution-making to eliminate preventable harms from diagnostic error must be focused entirely on subtler disease presentations, not obvious ones. For example, it is thoroughly insufficient to attempt to tackle missed stroke in the ED by strengthening existing stroke treatment pathways and reducing door-to-needle times for administration of thrombolytic therapies. Instead, it is essential to create mechanisms that rapidly identify patients with subtle stroke symptoms that are prone to be missed (e.g., dizziness and headaches), in order to bring such patients into stroke treatment pathways so they too may benefit from prompt therapy (e.g., dual antiplatelet therapy for early secondary prevention, which, if applied in the first 24 hours, lowers risk of major stroke after minor stroke or transient ischemic attack by 34% over the next 21 days316).
Taken together, this suggests that interventions to reduce harm from ED diagnostic error must directly tackle problems in bedside diagnostic skills and clinical reasoning for atypical presentations of the 15 diseases producing the most harm. If substantial headway is to be made, we must develop system-wide solutions to address these cognitive problems.2 Options fall into three basic mechanisms that all target increasing the availability of diagnostic expertise:
- (1) enhance the expertise of ED clinicians through deliberate practice training and feedback;
- (2) support ED clinicians’ decision-making through teamwork, including access to experts;
- (3) minimize cognitive load by deploying technologies that digitally encapsulate expertise.
Achieving equity in diagnosis by addressing diagnostic health disparities is of acknowledged importance to achieving diagnostic excellence.310 Studies that normalized for baseline risk of having the target disease often found an association with gender, race, or ethnicity, with a roughly 20 to 30 percent increased risk of diagnostic error for women and minorities.
Gaps filled: Prior to this report, there was a clear evidence gap regarding the overall causes of diagnostic errors and misdiagnosis-related harms in the ED, including both root causes and contextual risk factors. Clear results here point to a high frequency of cognitive errors in cases with subtle or atypical clinical presentations. This identifies a clear target for systems-based interventions that target cognitive error—increase the availability of diagnostic expertise at the point of care for dangerous diseases with a known high rate of misdiagnosis-related harms.
Gaps identified: A number of gaps were identified in preparing this report. These are described below in the section on Strengths and Limitations (Evidence subsection).
Strengths and Limitations
Evidence
Overall, the evidence available supported answers to all three Key Questions, including a majority of the sub-questions. On KQ1 (diseases), the literature was relatively strong for diseases causing more severe harms but fairly weak on the disease distribution for lower-severity errors. On KQ2 (frequency), the literature was strong on false negatives but relatively weak on false positives. Estimates for overall error and harm rates were drawn principally from three smaller studies (combined n=1,758), none U.S.-based, but these were the only studies that did not restrict patients by disease and still conducted systematic patient follow-up to minimize under-ascertainment of diagnostic errors. There is reason to believe that both the overall and disease-specific results generalize to U.S.-based EDs (see Applicability Section). On a disease-specific basis, literature about error frequency was strongest for stroke, myocardial infarction, and aortic aneurysm and dissection; weaker for venous thromboembolism, meningitis and encephalitis, sepsis, arterial thromboembolism, spinal abscess, pneumonia, appendicitis, fractures, and testicular torsion; and absent for endocarditis, necrotizing enterocolitis, sudden cardiac death, arrhythmias, congenital heart disease, ectopic pregnancy, and pre-eclampsia/eclampsia. On KQ3 (causes), the literature was strongest for patient and illness characteristics and relatively weaker on clinician characteristics, fixed systems factors, and dynamic systems factors. Overall, there is a relative paucity of literature on diagnostic errors among pediatric ED populations. More studies are warranted, including research on how the distribution of diseases (KQ1), rates of diagnostic error (KQ2), and causes/risk factors (KQ3) differ from those in adult patients. Specific gaps identified for each question with potential remedies are described below for each KQ.
The list of diseases under consideration for the overall search and, specifically, KQ2, was prespecified on the basis of prior literature and informed by a Technical Expert Panel (TEP) and Key Informant interviews. This approach was chosen because, in the timeframe for conducting the work, it was not possible to complete the KQs “in series” (i.e., to do KQ1 first and then start the search anew for KQ2 and KQ3). Thus, this was the only methodologically feasible approach. This represents a limitation (particularly as relates to the list of diseases assessed in KQ2). We have assessed the impact of this limitation through the final results derived from KQ1, and the impact appears to have been modest. The prespecified list appears to have been fairly complete vis-à-vis the most common causes of misdiagnosis-related harms—for example, in the largest incident report study of ED diagnostic errors (n=2,288), which was not used to determine the prespecified list, all top 12 conditions found in that study (Hussain et al., Table 1)16 appeared in our prespecified list. No other conditions identified in that study had higher individual frequency, and, collectively, all of those other conditions combined accounted for just 30 percent of the total incidents reported (n=679/2,288) (i.e., our list embraced more than 70% of the total incident reports related to diagnostic error and all of the top conditions). Our prespecified list included searches for error rates for 14 of the top 20 diseases identified in malpractice claims and 10 of the top 15 associated with the largest number of serious misdiagnosis-related harms. While some conditions (particularly those affecting children) may have been underrepresented (e.g., missed child abuse/non-accidental trauma), we found no evidence to suggest that using a prespecified list based on prior literature, Technical Expert Panel input, and Key Informant interviews appreciably affected our results.
However, because of the constrained focus on the most common conditions, we do not have data on misdiagnosis of less common conditions that may nevertheless be of importance to ED clinicians (non-accidental trauma, necrotizing fasciitis, compartment syndrome, brain tumors, obstructive hydrocephalus, ovarian torsion, post-partum hemorrhage, etc.); this is a limitation. We also do not know whether exclusion of smaller studies (n<50) by design influenced results.
Most studies did not directly address issues surrounding measurement of diagnostic error (e.g., validity, reliability, determination of causes, preventability, or attribution of harms). In clinical practice, many disease reference standards are insufficiently understood, developed, and implemented, so diagnosticians often disagree on final patient diagnoses. To the extent that manual chart reviews were used to identify errors, original studies are likely to suffer from problems of poor chart documentation,318, 319 low inter-rater reliability,320, 321 and hindsight bias.322 The problem of author bias in choice of definition or method of measurement (e.g., specialists [or diagnostic error “advocates”] determining ED misdiagnosis and favoring more lax definitions of error/harm, or the reverse, with ED clinicians favoring more stringent definitions) is difficult to ascertain. Our use of the NAM definition of diagnostic error mitigates some of these concerns, since there is less subjectivity inherent in a diagnostic label change (e.g., discharged with “musculoskeletal chest pain” returns with “aortic dissection” within 24 hours) than in the determination of preventability, which is known to be highly subjective.320 Also, many included studies used stringent measurement protocols or objective statistical methods (e.g., Symptom-disease Pair Analysis of Diagnostic Error [SPADE]145). Nevertheless, poorly standardized or low-reliability measurements are important limitations.
Gaps in Evidence for KQ1—Diseases Associated With Diagnostic Error/Harm
- The literature on diagnostic error is dispersed and challenging to aggregate. A concerted effort should be made to standardize reporting language in studies that address diagnostic error and harms (e.g., by creating an extension to the Standards for Reporting of Diagnostic Accuracy Studies [STARD] reporting guidelines323) and improve meta-data tagging of relevant studies by the National Library of Medicine.
- Differences in disease classification, categorization, and granularity (i.e., lumping versus splitting) powerfully influence frequency rankings. This hampers synthesis across studies, so standardized reporting categories and definitions should be adopted. This could be accomplished using the Agency for Healthcare Research and Quality (AHRQ) standardized coding schema from the Clinical Classifications Software,101 as was used in the study that defined the top 15 above.
- Data on the conditions most often misdiagnosed in the ED, independent of outcome severity, remain uncertain—fractures are common but probably overrepresented relative to other lower harm-severity illnesses, while other common conditions are probably more frequently misdiagnosed based on epidemiologic data (e.g., benign inner ear disorders), yet go underreported. This would ideally be addressed through nationally representative systems for annually tracking ED diagnostic error, using existing mechanisms such as AHRQ’s Healthcare Cost and Utilization Project (HCUP) family of databases259 or the Centers for Disease Control and Prevention’s National Center for Health Statistics.324 In such a process, special attention should be paid to differences in conditions between children and adults, since less is known about pediatric diagnostic error distribution. Some diseases relevant to children were not identified in our preliminary search or through our Technical Expert Panel and Key Informant interview processes, so were not explicitly assessed in our protocol (e.g., ovarian torsion,82 child abuse,325–328 brain tumors); these may be important to future inquiries.
- The special case of child abuse (which was not incorporated into our study design but was identified during the review/comment period for the report) highlights an important gap around recognition of diagnostic errors for diseases that may be intentionally concealed, rather than surfaced, as problems. The Centers for Disease Control and Prevention have estimated that nearly 1 in 7 children suffer abuse and neglect, resulting in 1,750 deaths in the United States in 2020.329 One older study of 173 abused children under age 3 with head injuries found 54 (31%) were not recognized by physicians (across settings) as non-accidental injuries; among these, 15 (28%) were reinjured after the misdiagnosis.330 A more recent, multi-center, ED-based study in the Netherlands found that EDs complying with screening guidelines for child abuse were 4-fold more likely to detect cases (0.3% versus 0.1%, P < 0.001), suggesting that many missed cases are likely detectable.331 Because abusive parents are highly unlikely to file a malpractice claim for an ED missed diagnosis of abuse, malpractice data will grossly underrepresent this condition. The same is likely to be true for other forms of abuse (e.g., missed spousal abuse, elder abuse), certain socially unacceptable conditions (e.g., missed cases of illicit drug use or dependence), or factitious disorders (e.g., missed Munchausen syndrome). Furthermore, individuals may be more likely to seek care at different EDs,332 limiting the utility of single institutions to detect missed cases (e.g., via chart review). For these populations and disorders, special efforts must be made to identify misdiagnoses using alternative data sources and methods.
- Data on the symptoms or clinical presentations most often misdiagnosed are sparse. This is a problem because solution-making for diagnostic error requires a focus on clinical presenting symptoms, rather than diseases (because patients attend the ED with new or troubling symptoms, and the diagnostic process must then focus efforts on identifying the underlying causes). This should be rectified by leveraging existing coding architectures, such as that provided by the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10), “Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99).”333 This could be accomplished via modified billing requirements (e.g., the Centers for Medicare & Medicaid Services could require all encounters billed from the ED, regardless of final disposition, to be tagged permanently with a standardized symptom-based code).
Gaps in Evidence for KQ2—Frequency of Diagnostic Error/Harm
- Diagnostic accuracy and error rate terminology should be standardized for reporting purposes (e.g., by creating an extension to the STARD reporting guidelines323). More research should be done to assess systematic differences between prevalence-independent measures (false negative rate, false positive rate) and prevalence-dependent measures (false omission rate, false discovery rate), particularly since most of the literature is mixed-and-matched in this regard (i.e., focused on either false negative rates [sensitivity] or false discovery rates [positive predictive value], but not on false positive rates [specificity], false omission rates [negative predictive value], or total accuracy). The impact of study design on different diagnostic accuracy parameters should be assessed.
- Methodological approaches used in most of the identified studies tend to bias towards underestimation of diagnostic errors and misdiagnosis-related harms by one to two orders of magnitude. The literature is heavily weighted towards retrospective administrative studies that use variable definitions and differing time windows for outcome assessment and that fall short on ascertainment because of incomplete outcome event data. New measurement approaches are needed, including those that capitalize on regional or insurance-based assessment of adverse events such as hospitalizations and deaths. It may be necessary to rigorously develop statistical inflation factor estimates that facilitate adjustment of retrospective study results to match prospectively obtained rates. Time windows should be standardized and based on appropriate empiric evidence.143, 147
- Data on disease-specific health outcomes associated with diagnostic error were limited, and many were incorrectly reported as null effects (or even “protective” effects) without proper severity matching (or adjustment) from the time of initial clinical presentation. A guide to proper analysis (including initial case severity adjustment) to assess the adverse health outcomes of diagnostic error should be developed and disseminated by AHRQ.
- More research should be done to assess preventability of harms from diagnostic errors, since there is moderate inter-rater variability in clinician ratings of preventability.320
- More research should be done on the magnitude and severity of false positive diagnostic errors in the ED, since most of the studies identified focused on false negatives.
- More research should be done to understand the biases present in both malpractice claims and incident report data. For example, diagnostic error rates for myocardial infarction are just 1.5 percent, yet there are nearly as many claims and incidents as there are for stroke, which affects a similar number of patients but is misdiagnosed 10-fold more often. Likewise, sepsis affects more patients overall than stroke and is probably missed at slightly lower rates (meaning there is expected to be a similar number of misdiagnosed cases), yet there are many fewer claims and incident reports for sepsis. It is unknown how much of these differences relates to true outcome differences across diseases versus the disease-specific probability that a malpractice claim or incident report is filed.
- More research should be done to measure diagnostic error rates among admitted patients, since we identified few studies of this type, but there were more errors than expected (e.g., a 12% error rate correlated with a 2.4-fold increase in mortality7 and frequent missed myocardial infarction among patients admitted with other diagnoses). It is possible that these errors account for one third of all ED-related serious harms.
- More research should be done to assess the relationship between admission fraction and diagnostic error rates, including total diagnostic accuracy (particularly with respect to academic versus non-academic status). It appears that, at least in some studies, academic teaching hospitals have lower diagnostic error rates (among those discharged) but a higher admission fraction than non-teaching hospitals. Because individual studies rarely address both false negatives and false positives together, it is unknown whether overall diagnostic performance or accuracy (i.e., area under the receiver operating characteristic curve) at teaching hospitals is actually better or whether they are simply making different disposition decisions by trading off false negatives (fewer discharged missed cases of dangerous diseases) in favor of false positives (more patients with unnecessary hospitalizations).
- More research should be done to assess the utilization and cost implications of diagnostic error, including both those treated and released from the ED and those admitted to the hospital. Relatively few studies addressed this issue in rigorous ways.
Gaps in Evidence for KQ3—Causes of Diagnostic Error/Harm
- Analysis and reporting of risk factors for (or causes of) diagnostic error in the current literature is highly variable. Much of the heterogeneity in results for demographic predictors (e.g., gender or race) may stem from confusion about the inferences to be drawn from different study designs that either look back from hospitalized patients with a given disease or look forward among patients with a given symptom (see KQ3). Reporting should be standardized (e.g., by creating an extension to the STARD reporting guidelines323) so that health equity in diagnosis can be accurately measured. The root causes of measured diagnostic disparities should be examined, including the role of implicit or explicit bias towards women, minorities, or other vulnerable populations. Research should be done to assess the contribution of (nominally correct) prevalence-based decision-making on the part of ED clinicians to diagnostic health disparities. Other patient characteristics reflecting marginalized status334 (e.g., members of religious minorities; lesbian, gay, bisexual, transgender, and queer [LGBTQ+] persons; persons with disabilities; persons who live in rural areas; and persons otherwise adversely affected by persistent poverty [including homelessness] or inequality) or the presence of marginalizing co-morbidities (e.g., mental health or substance use disorders335, 336 or obesity) that may increase the risk of diagnostic error are understudied and deserve further equity-related research. To summarize, measuring health equity in diagnosis should be a key focus of future research, and special care should be taken to ensure that rigorous epidemiologic and statistical methods are used to address this concern, since incorrect methods can lead to erroneous inferences.
- We found relatively few studies that assessed the impact of clinician characteristics, fixed system characteristics, or dynamic system characteristics. Potentially important but understudied predictors include those related to ED clinicians (e.g., training background, years of clinical experience, history of disciplinary action, fatigue), fixed systems factors (e.g., access to consultants, access to tests, delivery system/payment models), and dynamic systems factors (e.g., ED staffing, ED workload, crowding, handoffs, discharge fraction). These are important areas for future study, since they may be used to identify high-risk individuals, sites, or practices that could be targets for remedial action. For example, the path to closing the measured diagnostic performance gap between community and academic EDs (with lower false negatives at teaching centers) is unclear; to guide solution-making, it would be very helpful to know whether lower false negative rates at academic centers reflect greater total diagnostic accuracy (lower false negatives and lower false positives) or merely a lower threshold for further diagnostic testing and admission (lower false negatives and higher false positives).337
- We found no studies included in the review that considered how teamwork directly impacted the risk of diagnostic error for better or worse (e.g., involvement of patients, trainees, advanced practice providers (APPs), ED nurses, allied health professionals, or specialists; typical ED team composition; or team cohesion and dynamics). Recent studies suggest that ED diagnostic accuracy can be improved through the direct engagement of specialist consultants as part of the diagnostic team caring for ED patients.338 It would also be valuable to know whether engaging ED nurses in support of ED clinician diagnosis by promoting adherence to guidelines, protocols, or pathways would improve diagnostic accuracy or outcomes for patients (as demonstrated previously in other areas of patient safety).339 Likewise it would be valuable to know whether findings from vignette-based trials showing that medical students make more accurate diagnoses when working in teams than when working alone340 also apply to routine, real-world ED care delivery, as implied by one recent study that focused on systematic physician cross-checking in the ED.341
- We found few studies that addressed whether patients themselves affected ED diagnostic errors for the worse (e.g., via delayed recognition of the problem as part of the “patient interval” in diagnosis [see KQ3 results regarding testicular torsion]) or for the better (e.g., via proposed patient-facing strategies to prevent diagnostic error or mitigate resulting harms5, 342). Further study is needed to assess the impact on diagnosis-related health outcomes of delayed (or rapid) disease recognition by patients themselves; the role of directly engaging patients as part of the diagnostic team343; and more effective methods for shared diagnostic decision making as part of “patient-centered diagnosis.”344
- We found limited evidence on the distribution of causes based on harm severity and no evidence of whether certain error causes were more likely to result in patient harm. It would be helpful for future studies to report the relationship between causes and harms to determine whether specific causal factors are more important targets in reducing harms.
Review
Neither the study team nor the TEP prospectively identified five conditions that ultimately appeared among the 15 most harmful conditions identified as part of KQ1. As a result, spinal cord compression and injury (#4), lung cancer (#8), traumatic brain injury and traumatic intracranial hemorrhage (#9), gastrointestinal perforation and rupture (#14), and intestinal obstruction (#15) were not included in the original, prespecified disease-specific searches related to KQ2 and KQ3. Thus, disease-specific error rates (KQ2) and causes (KQ3) are not available for these conditions. It is likely that the causes of missed lung cancer differ somewhat from the studied vascular events and infections: cognitive errors are likely to have been errors in interpreting radiographs (missed lung nodules), and systems errors are likely to have been errors in communication or handoffs for follow-up of incidental findings.
Applicability
The majority of patient populations studied are likely applicable to a typical U.S.-based adult ED population. However, the relative paucity of pediatric studies suggests that caution should be exercised when extrapolating results to children/pediatric EDs. Studies were disproportionately conducted in academic hospital settings. There is some evidence that such hospitals have lower diagnostic error rates but higher admission fractions; non-teaching hospitals may have lower admission fractions and higher diagnostic error rates.148 This means that non-teaching hospitals may experience higher error rates than those listed above in KQ2; however, there is no specific reason to believe results from KQ1 or KQ3 do not apply. As noted in the section on Strengths and Limitations (Evidence subsection), outcome measures were neither homogeneous nor consistently reported across studies. This was principally an issue for KQ2 (rates) and, to a lesser extent, KQ3 (causes). Nevertheless, we believe we were able to combine studies appropriately and summarize both rates and causes where evidence supported meta-analysis.
Despite sourcing key portions of the data for KQ2 (rates) from a small number of studies conducted in countries outside the United States, we believe the results apply to U.S.-based EDs. Point estimates for overall error and harm rates were drawn from three studies based outside the United States (Canada, Spain, and Switzerland, with a combined n=1,758), but these were the only higher-quality studies found that conducted systematic patient follow-up to minimize under-ascertainment of diagnostic errors. The overall estimated ED diagnostic error rate of 5.7 percent was far lower than the measured false negative rates for the top serious harm-producing diseases other than myocardial infarction (range 10-56%, Table 9), and 9 of the 12 disease-specific rates included U.S.-based studies (not pulmonary embolus, meningitis, or pneumonia). The measured overall harm and death rates were derived from a single, well-designed, prospective Canadian study. Although that study excluded “less urgent” and “non urgent” cases (which may artificially inflate the estimated mortality rate), the study was also conducted at an academic institution, diagnostic errors resulting in mistreatment were classified as treatment errors, and the methods used for determining a preventable diagnostic adverse event (minimum certainty of 5 on a 6-point Likert scale by at least 2 of 3 emergency medicine reviewers) were very stringent (all of which may artificially reduce the estimated mortality rate). Because the measured mortality rate and range triangulate well with estimates from the two European studies, a nationally representative U.S.-based source (Medicare data on short-term deaths post ED treat-and-release with a “benign” diagnosis148), and benchmarking from autopsy data in relation to ED error (see KQ2 Plausibility of Mortality Estimates From Higher Quality Studies), we believe they are likely representative.
The misdiagnosis-related death and total serious harms rate can be compared to the estimated rates for inpatient care. A prior systematic review by Gunderson et al.3 found a total diagnostic adverse event rate of 0.7 percent in hospital inpatients. One of the studies cited in that review (Zwaan, 2010) found 29 percent of these hospital-based diagnostic errors resulted in death (and another 26% were associated with persistent disability at hospital discharge). Combining these results suggests the inpatient misdiagnosis-related mortality rate is roughly 0.2 percent (and the inpatient misdiagnosis-related serious harm rate is roughly 0.4 percent). These hospital-based estimates comport well with the ED serious harm rates estimated in this evidence report. The diagnostic adverse event rate in the hospital (from the prior review) is lower than in the ED (in this review), while the serious harm rate in the hospital is higher than in the ED. This makes sense since hospital care is permitted more time and greater diagnostic resources (i.e., it is expected that errors would be less frequent), but harm severity is higher because patients are sicker.
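The inpatient combination above is simple multiplication of the cited rates; a minimal arithmetic check (our illustration, using only the figures quoted from Gunderson et al. and Zwaan, 2010) makes the derivation explicit:

```python
# Inputs as cited in the text (illustrative check only).
dx_adverse_event_rate = 0.007   # 0.7% total diagnostic adverse event rate in inpatients
fraction_death = 0.29           # share of those errors resulting in death
fraction_disability = 0.26      # share with persistent disability at discharge

# Multiply the error rate by the fraction of errors producing each outcome.
mortality_rate = dx_adverse_event_rate * fraction_death
serious_harm_rate = dx_adverse_event_rate * (fraction_death + fraction_disability)

print(f"inpatient misdiagnosis-related mortality    ≈ {mortality_rate:.1%}")    # ≈ 0.2%
print(f"inpatient misdiagnosis-related serious harm ≈ {serious_harm_rate:.1%}")  # ≈ 0.4%
```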
Because there are known differences between ED training and certification in the United States and other countries that might influence applicability, we reached out to study authors to determine the training background of ED clinicians from these three studies. The study from Spain (Nuñez, 2006) used to estimate the diagnostic error rate among treat-and-release discharges differed the most from U.S. ED practice—because there is no emergency medicine training pathway, ED clinicians were trained as a mix of internists, surgeons, and family physicians. The study from Switzerland (Hautz, 2019) used to estimate the diagnostic error rate among admitted patients and to triangulate mortality was closer to U.S. ED practice. There were 33 different attending ED physicians, all with a primary degree in internal/hospital medicine, and 26 of 33 (78.8%) had further formal specialization in emergency medicine. Mean professional experience was 11 years since graduation (range 6-25 years) and mean experience in emergency medicine was 6 years (range 1-25). The study from Canada (Calder, 2010) used to estimate diagnostic adverse event and mortality rates was very similar to U.S. ED practice. All attendings (estimated by study authors as n≈55) had training or certification in emergency medicine. The majority (estimated by that study’s authors at ~80%) underwent a 5-year emergency medicine training program (which is longer than the 3 to 4 years of emergency medicine training typical in the United States), while the minority (estimated by that study’s authors at ~20%) underwent a 1-year emergency medicine certification program following 2 years of family medicine training. Thus, from the two studies used to estimate harms, about 92 percent (n≈81/88) had specific training and certification in emergency medicine, and 50 percent (n≈44/88) had more training in emergency medicine than would be typical in a U.S.-based emergency medicine residency.
While the referral architecture by which patients attend EDs likely differs across countries (including some included as part of our review), we found no evidence that studies conducted in comparable, disease-specific populations outside the United States had substantively different results than those conducted in U.S.-based EDs. Comparison across studies within each disease did not demonstrate any systematic differences in diagnostic error rates between U.S.-based and non-U.S.-based EDs. The one disease-specific study which included both U.S.-based and European EDs and compared diagnostic performance directly across continents found slightly longer diagnostic delays for aortic dissection patients in North America when compared to Europe; from the list of investigators included in the registry, 12 of 14 North American sites were U.S.-based institutions and the other two were in Canada, while the European sites were from seven countries, including Spain and Switzerland.68 Thus, there is reason to believe that the error and harm rate estimates are either representative of U.S. ED performance or perhaps low.
Given that this systematic review spans studies from more than two decades, there are naturally applicability concerns regarding recency of estimates. We found no clear differences based on the epoch in which studies were reported (2000 to 2010 versus 2011 to 2021), although comparisons were limited to just a few diseases based on data availability. The one study which explicitly assessed temporal trends for cardiovascular misdiagnosis in U.S.-based EDs (2006-2014, using Medicare data) found no significant trends for myocardial infarction or aortic dissection and a rising trend (increased false negative diagnostic errors) over time for ruptured aortic aneurysm, subarachnoid hemorrhage, and ischemic stroke.120 Thus, we believe that the disease-specific error rates, despite in some cases being more than a decade old, are either representative of current U.S. ED performance or, for some diseases, perhaps low.
Implications for Clinical Practice, Education, Research, or Health Policy
Although not all diagnostic errors or associated harms are preventable, we believe that the current report outlines a clear path forward towards eliminating those misdiagnosis-related harms in the ED that are preventable—(1) it identifies the diseases with the greatest burden of misdiagnosis-related harms, permitting prioritization; (2) it clarifies which clinical presentations have the greatest opportunity for improvement, focusing improvement efforts and delineating diagnostic performance benchmarks to assess progress; and (3) it pinpoints the common root causes and contexts, defining the nature and scope of appropriate solutions, and explaining why modular solutions are more likely to work than general ones. Limitations of the report included reliance on a few high-quality studies for the list of diseases (KQ1a) and overall error/harm rates (KQ2a) as well as inconsistent methodology across studies (including issues related to data sources, measurement methods, and causal relationships).
Several policy recommendations flow directly from the report’s findings and the documented limitations in the evidence base: (1) standardizing measurement and research results reporting to maximize comparability of measures of diagnostic error and misdiagnosis-related harms5, 345, 346; (2) creating a National Diagnostic Performance Dashboard to track performance (analogous to the Dartmouth Atlas Project for utilization of healthcare services347); and (3) using multiple policy levers (e.g., research funding, public accountability, payment reforms)5 to facilitate the rapid development and deployment of solutions to address this critically important patient safety concern. The first flows from the lack of standardized measurement of diagnostic error and harms identified by the systematic review. The second derives from the lack of adequate national benchmarking and lack of comparability of measurement across EDs identified in this systematic review. The third derives directly from the overall public health scale and scope of the problem identified by the review. These interventions will require the application of new resources, and the magnitude of such resources should be commensurate with the large public health burden.
Considerations for Clinical Practice and Policy
Challenges Facing ED Diagnostic Safety and Quality
Discussing diagnostic errors can feel overwhelming for clinicians, educators, researchers, and policymakers alike. Clinically there is already a long list of things required for patient safety and quality, so addressing diagnostic errors feels like “one more thing.” ED physicians do not routinely receive performance feedback, so may be mis-calibrated as to their diagnostic accuracy,116 raising internal doubts about the magnitude of this as a safety problem. Skepticism related to the role of hindsight bias in retrospective studies further fuels such doubts.322 Diagnostic competence is also deeply personal for physicians and tied to their sense of identity as a clinician,348 likely more so than medication errors from bad handwriting or patient falls in the hospital. Especially for older ED physicians in the United States, the historical struggle for recognition of Emergency Medicine as its own discipline has fostered a degree of “hyper-independence”309 that may feel threatened by discussions of diagnostic error which link back to insufficient diagnostic expertise as a potential cause. For educators, there is already too much to teach and too little time to teach it. It seems hard to know even where to begin with diagnostic errors, since they happen for all symptoms and all diseases, and our present modes of education appear to be insufficient to the task.349 For researchers, this is a complex, multi-faceted problem that does not lend itself well to reductionist methods or precise outcome measurement. For policymakers, this is a deeply technical area where scientific consensus is often lacking, solutions appear to be few in number342, 350 (and too narrowly constructed), and the best course of action may seem to be inaction. It is also self-evident that fixing diagnostic errors will be difficult—had it been easy, it would have been done long ago. 
There would have been no need for a 400-plus-page report in 2015 from the NAM, entitled Improving Diagnosis in Health Care,5 describing this multi-faceted, “wicked problem”351 (in the technical sense352), nor need for the current report. Lastly, any attempts to fix the problem carry an associated risk of unintended consequences. For example, EDs have often been criticized for the overuse of diagnostic tests—an emphasis on diagnostic error has the potential to increase testing among low-risk patients, increasing costs, adding radiation exposure or other diagnostic test-related risks, and leading to more incidental findings that themselves adversely impact patient wellbeing.353 Some of our findings suggest that, at least for myocardial infarction, the balance may already have shifted in the direction of test overuse, excess workups, and diagnostic overcalls (see KQ2). Furthermore, ED overuse of increasingly sensitive diagnostic tests now risks overdiagnosis354, 355 of mild forms of illness where, despite a correct diagnosis, harms (physical, psychological, or financial) may ultimately outweigh treatment benefits (e.g., sub-segmental pulmonary embolism354).
Concerns Over ED Diagnostic Test Overuse Due to a Focus on Diagnostic Safety
In considering implications for clinical practice and policy, it is important to examine the apparent tension between test underuse and test overuse as it relates to diagnostic errors. A common concern is that a focus on false negatives will drive diagnostic test overuse and more false positives (as well as adverse impacts of greater testing such as risks, incidental findings, and costs). For example, concern over missed stroke in ED dizziness appears to be driving increased use of neuroimaging.306, 356 Head computed tomography (CT) is the primary neuroimaging modality used to search for stroke in ED dizziness,357 and there is strong evidence that CT overuse in ED dizziness presentations is increasing radiation exposure and healthcare costs without improving diagnosis of stroke or other neurologic diseases.163, 357–359 Conversely, the argument is often made that a focus on cost containment and care efficiency will drive test underuse and more false negatives. For example, there are legitimate concerns that downward financial pressure on use of magnetic resonance imaging (MRI) in back pain presentations360 may increase the risk of missed spinal abscess, which requires spine imaging for diagnosis. But this “tradeoff” scenario assumes that (a) current practice optimally applies existing diagnostic methods, (b) innovations in diagnosis do not occur, and, therefore (c) the only way to influence diagnosis is to alter the threshold for ordering existing tests (e.g., by lowering the threshold and testing patients at very low risk for the target disease). 
This premise then leads to the (often) erroneous conclusion that diagnosis is a “zero sum game” and the only choice is to “pick your poison” between more false negatives (favor specificity, sacrifice sensitivity) and more false positives (favor sensitivity, sacrifice specificity).337 However, this is generally a false dichotomy, since current practice often fails to apply basic diagnostic methods (e.g., proper history-taking and neurologic examination in patients with back pain at risk for spinal abscess228) and innovations that actually improve diagnosis (e.g., via better education or training, new clinical pathways, novel diagnostic tests, enhanced teamwork in diagnosis, greater access to specialists, or improved feedback and calibration) will almost always increase both sensitivity and specificity at any given decision threshold. The result is then fewer false negatives and fewer false positives, sometimes even at a lower total cost.337, 361
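The claim that genuine diagnostic innovation improves both sensitivity and specificity, rather than trading one for the other, can be sketched with a toy binormal score model. This is our illustration, not a model from the report: scores for patients without the disease are assumed N(0, 1), scores for patients with the disease are N(d, 1), and the decision threshold sits midway between the two means, so greater separation d (a better test) raises both operating characteristics at once.

```python
from statistics import NormalDist

def sens_spec(separation: float) -> tuple[float, float]:
    """Sensitivity and specificity for a midpoint threshold in a binormal score model."""
    z = NormalDist()
    threshold = separation / 2
    sensitivity = 1 - z.cdf(threshold - separation)  # diseased patients scoring above threshold
    specificity = z.cdf(threshold)                   # non-diseased patients scoring below it
    return sensitivity, specificity

current = sens_spec(2.0)   # a moderately discriminating diagnostic process
improved = sens_spec(3.0)  # a genuinely better one (e.g., better exam skills or pathways)

# A better test yields fewer false negatives AND fewer false positives —
# no "pick your poison" tradeoff is required.
print(f"current:  sensitivity={current[0]:.2f}, specificity={current[1]:.2f}")
print(f"improved: sensitivity={improved[0]:.2f}, specificity={improved[1]:.2f}")
```

Shifting the threshold alone (the "zero sum" scenario criticized above) only slides along one curve; increasing the separation parameter models the innovations the text describes, which move the whole curve.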
Implications for Solutions To Reduce Diagnostic Error and Associated Harms
In pursuing new solutions to tackle ED diagnostic errors, the first question that any chief quality officer, risk management professional, or policymaker should ask is whether there are cross-cutting (non-problem-specific) solutions that could be implemented immediately in the ED (e.g., a diagnostic “time out” for clinicians to reflect on their own diagnostic process362 or tools that help patients summarize their symptoms363). Although this would seem to be the quickest way to solve the problem of ED diagnostic error, there is some evidence to suggest that general solutions like this are unlikely to work. Our KQ3 findings indicate that cognitive errors in diagnostic reasoning predominate as causes. Monteiro et al. have nicely summarized the extensive body of evidence that diagnostic expertise is deeply problem-specific in their 2020 review article aptly subtitled “The enduring myth of generalisable skills.”317 Our KQ2 findings also support this position, given that ED clinicians are clearly quite accurate in diagnosing myocardial infarction, but far less accurate with other dangerous diseases. Our KQ3 findings further bolster this position, given that clinical symptoms which are “atypical” are the most consistent risk factors for misdiagnosis, within a given disease. Put differently, being expert at diagnosing heart attack in patients with chest pain does not confer the same expertise in diagnosing stroke in patients with dizziness; the converse is also true. As a result, all solutions will likely need to be tailored on a symptom- and disease-specific basis (i.e., modular).
Target diseases should be prioritized based on (a) the overall share of misdiagnosis-related harms (particularly high-severity harms), (b) higher absolute error or harm rates (i.e., with ample opportunity for improvement), (c) variability in diagnostic performance (including known health disparities or variation by organization, site, or provider), and (d) availability or cost-effectiveness of promising solutions. This approach to prioritization reflects an emphasis on public health needs while balancing societal costs and benefits. There is a value judgment to be made when comparing more frequent but less severe harms (as with missed fractures) to those that are less frequent but more severe (as with missed stroke). From a purely utilitarian standpoint, the aggregate societal disutility in these two categories of diagnostic error may be similar, but our personal experience with patients who have suffered diagnostic errors is that they care more about permanent, severe harms than temporary, milder ones, even if the latter are more frequent. Therefore, we believe that prioritization on the basis of high-severity harms (KQ1, Table 3) makes both the most public health sense and the most patient-centered sense. Nevertheless, solutions targeting very high-frequency errors may also be warranted.
Just 15 diseases likely account for more than two-thirds of the serious misdiagnosis-related harms in the ED, so these should certainly become the initial priority focus. Only one of these is missed at rates near zero—myocardial infarction stands alone with a miss rate of 1.5 percent. While there may still be room for improvement among admitted patients, trying to further reduce missed heart attacks in the ED may prove challenging.77 Instead, we should leverage the prior successes in deploying chest pain clinical pathways for diagnosis of acute coronary syndromes to serve as a model for how to improve diagnosis for other symptoms and diseases. That process took decades of focused research on heart attack diagnosis,364 followed by concerted quality improvement efforts to improve diagnosis through care process redesign,365 including partnering with specialists from a relevant discipline (cardiology) to achieve optimal outcomes.366 Lessons learned should now be extended to other diseases in the top 15.
A strong next candidate for targeted diagnostic safety and quality initiatives, based on results of this systematic review and priority-setting approaches described above, would be to construct clinical pathways for dizziness to identify strokes. Improving diagnosis of strokes in dizziness is a top priority for ED clinicians,356 and a clinical practice guideline for acute dizziness diagnosis is currently under development by the Society for Academic Emergency Medicine.367 Dizziness now leads to nearly 5 million ED visits per year, at a cost of over $10 billion.115 Dizziness and vertigo are “atypical” stroke symptoms relative to the more familiar (and obvious) unilateral weakness or inability to speak. Strokes presenting with dizziness are missed 40 percent of the time,21 leading to an estimated 45,000 to 75,000 missed strokes.115 Brief bedside physical exam techniques that look closely at eye movements (known as “HINTS”368) have been shown to have greater accuracy (sensitivity) than MRI in this specific context.307, 368–370 Current evidence shows that many ED physicians are unfamiliar, uncomfortable, or inexpert in using these bedside techniques.306, 307 This creates an opportunity for diagnostic quality improvement. Our results show that more diagnostic expertise is needed. This could be accomplished by enhancing ED physician expertise via scalable education techniques such as virtual patients,349 supporting ED clinicians through access to dizziness experts via telehealth,338 or leveraging devices (including mobile phones) married to algorithms that digitally encapsulate expert interpretive knowledge about these findings.371, 372 One could envision that similar quality initiatives might target other symptom-disease pairs such as abdominal pain (aortic aneurysm/dissection, mesenteric ischemia), altered mental status (sepsis, meningitis/encephalitis), or back pain (spinal abscess).
Implications for Operational Quality Measurement and Benchmarking
A recent issue brief from AHRQ outlines the full palette of options for operational measurement of diagnostic errors.373 Below, based on ED measurement needs derived from our systematic review and meta-analysis, we offer specific suggestions from among the list of possibilities mentioned in that brief. We divide these into methods that are disease-specific and those that are disease-agnostic. We also note measures that are “numerator only” (i.e., they are all “events” and the precise population from which these events are drawn is ill-defined) and briefly summarize the use of each data type considering findings from this report. These are followed by some general recommendations on measurement based on our findings. Because no single measurement method can address all types of diagnostic errors, ED diagnostic errors should be tracked using a portfolio of metrics that include the following:
- Disease-Specific Data Sources/Metrics for Diagnostic Error. Disease-specific measurement facilitates targeted quality improvement efforts and assessment of their impact. These measures should be used to address symptoms, diseases, or symptom-disease pairs that are either common or frequently misdiagnosed.
- SPADE (Symptom-disease Pair Analysis of Diagnostic Error)145 - We identified multiple studies using SPADE or related methods for missed stroke,64, 120, 155 myocardial infarction,63, 77, 120 aortic aneurysm/dissection,120 sepsis,78, 93, 94, 256 and meningitis,93 but it can be applied to any acute disease which confers excess short-term risk of an adverse clinical outcome when left untreated after an initial treat-and-release visit. The look-back method (from diseases to symptoms) can be used to discover clinical presentations (often “atypical” ones) at high risk of misdiagnosis, as well as other risk factors for misdiagnosis, such as age, gender, or race. The look-forward method (from symptoms to diseases) can be used to measure absolute rates of misdiagnosis-related harms and monitor performance in response to diagnostic improvement initiatives. SPADE is a clinically valid, methodologically sound, statistically robust,154 and operationally viable155 method of identifying misdiagnosis-related harms from electronic health record or billing/administrative data—importantly, without the requirement of manual chart review (although chart review can inform root cause analysis if so desired). However, SPADE relies on detecting adverse events. From the studies we identified, these are relatively infrequent (typically less than 1 percent of treat-and-release cases), so stable measurement generally requires thousands of encounters. That means that at a medium to large-sized ED, relatively common symptoms (e.g., abdominal pain, chest pain, dizziness, headache, back pain) can be mined using SPADE for misdiagnosis-related harms linked to more common dangerous diseases such as stroke, myocardial infarction, sepsis, or pneumonia using a rolling 6- to 12-month window. Smaller hospitals or rarer diseases generally require longer assessment time windows. Also, related symptoms77, 256 or diseases120 can be aggregated to increase the sample size.
If insufficient data are available for stable measures, SPADE can be used as an electronic trigger mechanism to identify cases for manual chart review.
- Change from ED admitting diagnosis to final hospital discharge diagnosis - We found many studies that measured false positives among patients admitted to the hospital for a specific target disease. For example, this included studies of the rate at which admissions to a medical unit with suspected myocardial infarction turned out to be incorrect or the rate at which the cardiac catheterization lab consulting service was activated unnecessarily. These studies tended to focus on overutilization of clinical services or hospital admission. This method is likely to be more helpful in assessing overall diagnostic accuracy for a given disease if paired with a search for false negatives, at least among admitted patients (e.g., admission for “fall” that turns out to be a missed myocardial infarction). This involves a search for when the ED admitting diagnosis differs from the final hospital discharge diagnosis for a given dangerous disease, as done retrospectively for myocardial infarction using Medicare data206 and in robust prospective fashion by Hautz et al., 2019 across medical conditions.7 Even more robust would be to combine this with 30-day disease-specific hospitalizations after ED treat-and-release for a more complete capture of false negatives, although we did not identify any disease-specific studies that combined these sorts of data.
- Unannounced standardized patients374 (“secret shoppers”) - Although no studies of this type were identified in our review, standardized “fake” patients can be used to assess diagnostic quality for specific symptoms or diseases in clinical practice.375 This approach decreases the variance in measurement, allowing very direct comparisons, down to the individual ED clinician level. However, the effort and expense required make this an option that should be reserved for very high-stakes scenarios (e.g., pay-for-performance benchmarking for diagnosis of a specific clinical presentation).
- Disease-Agnostic Data Sources/Metrics for Diagnostic Error. Disease-agnostic measurement facilitates inquiry into overall error and harm trends but is less actionable. These measures should be used to track the impact of broad interventions likely to affect the overall diagnostic error rate (e.g., change in staffing model or access to specialists) and, when possible, as a general benchmarking tool to compare across institutions.
- Malpractice claims (numerator only) - At most institutions, ED claims are readily captured and thoroughly analyzed. Based on this review, claims should be presumed both to be biased towards dangerous diseases and to substantially underrepresent total errors, while still being mostly representative of diagnostic errors resulting in serious harms (barring perhaps overrepresentation of heart attacks and radiographically determined misdiagnoses). Tracking changes in the frequency or severity of claims in response to diagnostic improvement interventions may work for more common conditions in claims (e.g., stroke) or using long-term averages over time for less common ones, but the latter may be impacted by other secular trends.
- Incident reports (numerator only) - These are most useful if there is a structured mechanism for identifying the incident as a diagnostic error and concerted efforts are made to encourage reporting by clinicians.100, 339, 376 This includes physicians (who rarely report but are best positioned to report on diagnostic issues),376 as well as other team members such as nurses (who routinely report but do not routinely view diagnostic errors as within the scope of their reporting duties100, 339, 376). Their value is principally in identifying unexpected errors or latent risks. Incident reports can be combined with similar data (e.g., patient complaints, autopsy, morbidity and mortality rounds cases377, 378). Incident reports can be enhanced and made more informative via the use of common formats that permit aggregation of data at the local, regional, or national levels.379 The AHRQ Common Formats for Event Reporting (CFER) now include a special common format for Diagnostic Safety event reporting (CFER-DS) that has recently been developed for use by patient safety organizations (PSOs).379, 380 The CFER-DS (and all of the other AHRQ Common Formats) are available in the public domain to encourage their widespread adoption. An entity does not need to be listed as a PSO or working with one to use the Common Formats. However, it should also be noted that the Federal privilege and confidentiality protections only apply to information developed as patient safety work product by providers and federally listed PSOs working under the Patient Safety and Quality Improvement Act of 2005.
- Electronic triggers for ED treat-and-release visits (unplanned revisits or outcomes) - Electronic triggers represent an important mechanism for identifying potential diagnostic adverse events that then trigger manual chart review. After case review and confirmation as diagnostic errors, the rate can be tracked over time for any changes. Typical triggers include short-term revisits, hospitalizations, or adverse patient outcomes (e.g., non-hospice death), if available. Based on our review, 72 hours provides an enriched sample but will substantially underestimate totals. The range of reasonable time frames is estimated at 7 to 30 days, but it appears 14 days is sufficient to capture the majority of adverse events following diagnostic errors. Ideal ascertainment time windows are likely disease-specific in relation to natural history.
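The trigger logic described above can be sketched in a few lines. This is a minimal illustration only, with hypothetical record fields and a 14-day window, not a production EHR query:

```python
# A minimal sketch (field names hypothetical) of a disease-agnostic e-trigger:
# flag treat-and-release ED visits followed by an unplanned admission within
# a fixed ascertainment window, queuing them for manual chart review.
def flag_for_review(visits, window_days=14):
    """visits: dicts with 'patient_id', 'day' (integer day number), and
    'disposition' ('discharged' or 'admitted'). Returns the index
    treat-and-release visits that should trigger chart review."""
    by_patient = {}
    for v in sorted(visits, key=lambda v: v["day"]):
        by_patient.setdefault(v["patient_id"], []).append(v)
    flagged = []
    for encounters in by_patient.values():
        for i, index_visit in enumerate(encounters):
            if index_visit["disposition"] != "discharged":
                continue
            for later in encounters[i + 1:]:
                if later["day"] - index_visit["day"] > window_days:
                    break  # beyond the ascertainment window
                if later["disposition"] == "admitted":
                    flagged.append(index_visit)
                    break
    return flagged
```

Flagged visits are candidates only; as the text notes, each must still be reviewed and confirmed as a diagnostic error before the rate is tracked over time.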
- Electronic triggers for ED admissions (unplanned escalation in care or change in treating service) - Typical triggers include intensive care unit transfers after routine (ward) admission, transfer of admitting service (e.g., from medicine to neurology), or adverse patient outcomes (e.g., in-hospital death). Because our review found that diagnostic errors are enriched among admitted patients and associated with both hospital mortality and increased length of stay, it would not be unreasonable to screen any chart with a change in diagnosis from ED admitting diagnosis to hospital discharge for diagnostic errors, as in Hautz et al., 2019.7
- Routine or sampled follow-up outreach to patients (e.g., Leveraging Patient’s Experience to improve Diagnosis [LEAPED]8) - Although methods for determining diagnostic error using routine or sampled patient follow-up contact are still being optimized, feasibility has recently been established.8, 381 Our review identified very few studies assessing failures in communicating diagnoses to patients (which are also defined as diagnostic errors by the NAM). Direct patient outreach post-visit is likely the only method by which diagnostic errors due to communication failures with patients can be ascertained. If follow-up is obtained very early (e.g., less than 72 hours), communication failures will predominate. If follow-up is obtained later (e.g., 30 days), it is likely that both communication failures and diagnostic accuracy can be captured. Such later phone calls are likely to serve as an important source of information regarding diagnostic errors with less severe consequences, including temporary harms, which our review found were poorly captured by existing methods.
- General Recommendations for Measuring Diagnostic Error. Below are general insights about measurement derived from our systematic review of the literature.
- False negatives and false positives - It would be optimal to measure all four aspects of diagnostic accuracy (true positives, true negatives, false positives, false negatives). This permits calculation of all accuracy statistics: sensitivity (and its complement, the false negative rate), specificity (and the false positive rate), negative predictive value (and the false omission rate), positive predictive value (and the false discovery rate), and total diagnostic accuracy (and total diagnostic error). However, doing so requires combining multiple types of data, and we found no studies that did this. This can be done more easily for a single disease than for all diseases simultaneously.
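As a minimal sketch, the full set of accuracy statistics can be computed from a single disease's 2x2 counts; the counts below are invented purely for illustration:

```python
# Hypothetical worked example: computing the accuracy statistics named in the
# text from counts of true/false positives and negatives for one disease.
def accuracy_stats(tp, fp, fn, tn):
    """Return accuracy statistics and their complements for one 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    total_accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "total_accuracy": total_accuracy,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
        "false_discovery_rate": 1 - ppv,
        "false_omission_rate": 1 - npv,
        "total_error": 1 - total_accuracy,
    }

# Invented counts for a single disease (not from any study in this review):
stats = accuracy_stats(tp=90, fp=40, fn=10, tn=860)
```

The practical obstacle the text identifies is not the arithmetic but assembling all four cell counts, which requires combining data sources (e.g., pairing false-positive admission reviews with false-negative revisit capture).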
- Balancing measures – Diagnostic process improvements (e.g., through use of new or different test batteries, structured clinical pathways, or teamwork in diagnosis) that increase total diagnostic accuracy will generally lead to reductions in both false negatives and false positives.337 However, one potential ED clinician response to concerns over (false negative) diagnostic errors is to simply “do more of the same” by changing their personal threshold for ordering diagnostic tests; this tends to produce diagnostic test overuse and excessive hospital admissions, rather than more accurate diagnosis.337 All diagnostic error-related measures should be accompanied by balancing measures that address rates of diagnostic test utilization and hospitalization.
- Outcome ascertainment - Optimal outcome ascertainment involves prospective data collection, as seen in the few very high-quality studies of diagnostic error on which our overall error and harm estimates are most heavily based. Ideally, this would be built into the process of routine care—systematically recording presenting symptoms, admitting diagnoses, discharge diagnoses, plus follow-up events and outcomes. With modern electronic health records, EDs can generally secure all but the last of these data points. Systematic ascertainment of outcomes is a crucial addition. All errors requiring chart review should be analyzed using diagnosis-specific root cause analysis procedures (e.g., specialized diagnostic error fishbone diagram382).
- Pitfalls in measurement - Based on the review, we identified several pitfalls in measurement that some studies failed to address, leading to heterogeneity, apparently conflicting results, and, in some cases, false conclusions.
- Finding discrepant diagnoses versus errant processes:6 The literature mixes studies that define and examine diagnostic errors very differently. A key aspect is whether studies require only an incorrect diagnosis label (or even a communication failure despite the correct label, as in the NAM’s definition), mandate some identifiable diagnostic process failure, require preventability, or only consider it an error when outcomes were judged to have been impacted. As expected, the highest frequency of errors will be measured when a label failure is all that is required and the lowest when resulting harms must have been judged by clinicians to have been preventable. However, as demonstrated nicely by Hautz et al., 2019, even a label failure (regardless of process) is associated with 2.4-fold increased mortality and 3.4-day increased hospital length of stay.7 Thus, even without ascertaining diagnostic process failures, label failures alone portend worse outcomes. This suggests that identifying label failures (which is easier and has greater inter-rater reliability than identifying process failures7) is preferable as a starting point for measuring errors from a quality improvement standpoint. It also suggests that studies or results should not be directly compared when different definitions are used. We recommend using the NAM definitions, which reflect all diagnosis label failures as errors, without regard to process.5
- Counting errors versus harms: When assessing post-treat-and-release ED returns or hospitalizations, it is incorrect to label these “diagnostic errors” because they are actually misdiagnosis-related harms. The severity of harms may be judged minor (e.g., temporary inconvenience and loss of confidence in the healthcare system383), but they still represent harms. Furthermore, many patients who suffer diagnostic errors “get lucky” temporarily (i.e., suffer no short-term consequences of the diagnostic error), but these patients are nevertheless at risk of delayed harms from lack of secondary prevention. For example, mislabeling a transient ischemic attack as “benign positional vertigo” may prevent the patient from getting secondary stroke prophylaxis. Untreated, 10 to 20 percent of such patients will suffer a major stroke in the subsequent 90 days,113, 258, 384 but even if the patient is one of the fortunate ones who do not have a major stroke in that time frame, the diagnostic error may nevertheless prevent the patient from being recognized as needing long-term stroke prophylaxis or risk factor modification.
- Tracking hospital crossovers: Not all patients return to the same hospital (or even health system) when they develop new or worsening symptoms after having been treated and released from an initial ED. This means there is systematic under-ascertainment of diagnostic adverse events (e.g., subsequent hospitalizations) when out-of-network crossovers are not considered. One estimate using a regional health information exchange found that 25 percent of patients who visit the ED more than once will cross over to another hospital or health system.385 When patients are misdiagnosed, they may be more likely to return to a different ED than if they were correctly diagnosed.145 One study included in our review found a 37 percent crossover rate.195 The importance of this for measurement is that hospitals should recognize that their true misdiagnosis-related harm rate could be more than 1.5-fold higher than measured using intra-hospital data. The implication for national benchmarking, payment incentives, and other high-stakes accountability initiatives is that data sources that capture out-of-network follow-ups are critical to ensure comparability of case ascertainment. Potential data sources include (a) insurance-based billing data such as Medicare, (b) linkable state-level data such as AHRQ’s State Emergency Department Databases (SEDD)386 and State Inpatient Databases (SID),387 or (c) regional health information exchanges such as Maryland’s Chesapeake Regional Information System for our Patients (CRISP).388
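The implied correction is simple arithmetic, shown here as a sketch under the simplifying assumption that crossover events are missed entirely by intra-hospital data:

```python
# Worked arithmetic from the text: if a fraction of post-visit events occur at
# a *different* hospital, intra-hospital data under-ascertain harms.
def crossover_correction(measured_rate, crossover_fraction):
    """True harm rate implied by an intra-hospital measured rate, assuming
    crossover events are entirely invisible (a simplifying assumption)."""
    return measured_rate / (1 - crossover_fraction)

# With the 37 percent crossover rate cited in the text, the true rate would be
# roughly 1.6-fold the measured one, consistent with "more than 1.5-fold":
inflation = 1 / (1 - 0.37)
```

In practice some crossover events are still captured (e.g., via payer data), so this is an upper-bound style adjustment rather than a precise correction factor.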
- Including morbidity in addition to mortality: From this review, we estimate that 29 to 41 percent of the serious misdiagnosis-related harms from ED diagnostic error are permanently disabling, rather than lethal. This means that mortality statistics alone will understate the total by 1.4- to 1.7-fold, and diseases that confer a high rate of morbidity relative to mortality will be underrepresented in summaries of mortality. In particular, this includes neurologic diseases with a high proportion of serious harms that are morbid but not mortal—spinal abscess (82%), stroke (71%), and meningitis (48%). Untreated neurologic disorders are expected to produce more permanent disability than death, and this likely also applies to two other top-15 neurologic conditions associated with serious misdiagnosis-related harms for which we were unable to ascertain the breakdown of morbidity versus mortality (spinal cord compression and injury; traumatic brain injury and traumatic intracranial hemorrhage). Given that the organ system most often involved in diagnostic errors leading to serious harms is the nervous system (34%, Table 4), mortality alone will be a particularly poor health outcome proxy and will tend to substantially understate both individual diseases and total harms.
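The 1.4- to 1.7-fold figure follows directly from the 29 to 41 percent morbidity share; a short worked calculation:

```python
# Worked arithmetic: if a fraction d of serious harms are disabling rather
# than lethal, deaths make up (1 - d) of the total, so counting deaths alone
# understates total serious harms by a factor of 1 / (1 - d).
def mortality_understatement(disabled_fraction):
    """Factor by which deaths alone understate total serious harms."""
    return 1 / (1 - disabled_fraction)

low = mortality_understatement(0.29)   # lower bound of the review's estimate
high = mortality_understatement(0.41)  # upper bound of the review's estimate
```

For diseases such as spinal abscess (82% of serious harms morbid rather than mortal), the same formula implies that mortality alone would understate serious harms by more than 5-fold, which is why mortality is an especially poor proxy for neurologic conditions.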
- Controlling for initial severity (“misdiagnosis is protective paradox”): Illness severity is often a confounder of the relationship between diagnostic error and health outcomes for patients (i.e., is causally linked to both the risk of misdiagnosis and the risk of a bad health outcome). An observational study that directly compares a population of all correctly diagnosed and all incorrectly diagnosed patients will generally find that initial case severity is higher among the correctly diagnosed population, skewing health outcomes for these patients in an unfavorable direction. This effect will tend to nullify the unadjusted, measured impact of diagnostic error or even reverse it (“misdiagnosis is protective” paradox).1 We found a number of studies which failed to control for initial case severity and, as a result, drew erroneous inferences about lack of impact of diagnostic error on patient outcomes. No measures of this type should be considered valid unless appropriate statistical controls (e.g., matching or adjustment) are used to account for initial case severity or its proxies.
- Addressing preventability of harms: There is moderate inter-rater variability in clinician ratings of preventability.320 This issue is more complex for diagnostic errors than treatment errors because there is dual uncertainty—first, whether the diagnostic errors themselves can be prevented and, second, whether treatments for the diagnosed underlying diseases would prevent any associated untoward outcomes. Although combined diagnosis-treatment studies are recommended by the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) group as ideal, this two-step link from diagnosis to health outcome is rarely assessed when diagnostic interventions are put to the test.389, 390 The strongest evidence of preventability of harms will come from prospective (preferably randomized) studies that measure the health outcomes of interventions to improve diagnosis and that demonstrate both greater diagnostic accuracy and a link between that greater accuracy and improved patient outcomes. Absent this level of rigor, measurements of inter-institutional variability (adjusted for likely confounders) may be a good proxy for preventability, with lower-performing institutions striving to match outcomes from higher-performing institutions.
- Differences in causal inferences based on different denominators: In the same way that inferences about error rates may differ dramatically depending on the denominator used (e.g., false negative rate [denominator all with disease] versus false omission rate [denominator all at-risk patients]), the same is true for causal inferences. Much of the apparent heterogeneity in our KQ3 results for demographic predictors likely stems from confusion about the inferences to be drawn from different study designs. For example, one study showed that being a woman or a minority is a risk factor for misdiagnosis of heart attack when looking back from heart attack admissions to antecedent treat-and-release ED visits, but not when looking forward from chest pain discharges to subsequent heart attack hospitalizations.77 These results seem conflicting, but they are not. The reason for this difference is as follows. The look-back method normalizes overall risk for heart attacks by starting with heart attack hospitalizations as the denominator; this, in turn, allows investigators to assess the impact of gender or race on the likelihood of misdiagnosis, given equal baseline risk of the underlying disease. However, the look-forward method uses chest pain discharges as the denominator; here the distribution of heart attacks is uneven by gender and race, with the largest number of heart attacks being among white men. Since the impact of disease prevalence is greater than the impact of misdiagnosis risk, the result is that white men are more likely to return having had their heart attack initially missed. Thus, the look-back method speaks to the relative risk or odds of a misdiagnosis conferred on a patient based solely on their gender or race, while the look-forward method estimates the absolute risk of a misdiagnosis based on a mix of disease and misdiagnosis prevalence.
Because ED clinicians are likely to calibrate their decision-making to baseline disease prevalence, this may contribute to some proportion of the demographic disparities seen in diagnosis. Since that proportion is unknown, additional research should be done to assess the impact of prevalence-based reasoning on demographic disparities in diagnosis.
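A toy numeric sketch (all counts hypothetical, chosen only to illustrate the denominator effect described above) shows how a higher relative miss risk in one group can coexist with a higher absolute missed-case rate in another:

```python
# Toy example of the denominator effect: the same underlying data yield
# opposite-looking signals depending on which denominator is used.
def rates(n_disease, miss_rate_given_disease, n_discharges):
    missed = n_disease * miss_rate_given_disease
    return {
        "look_back": miss_rate_given_disease,   # denominator: all with disease
        "look_forward": missed / n_discharges,  # denominator: all discharges
    }

# Hypothetical groups with equal discharge volume but unequal prevalence:
high_prev = rates(n_disease=1000, miss_rate_given_disease=0.02, n_discharges=10_000)
low_prev = rates(n_disease=200, miss_rate_given_disease=0.05, n_discharges=10_000)

# Look-back: the low-prevalence group faces 2.5x the relative miss risk...
relative_risk = low_prev["look_back"] / high_prev["look_back"]
# ...yet look-forward: the high-prevalence group has twice the absolute rate.
absolute_ratio = high_prev["look_forward"] / low_prev["look_forward"]
```

Here the lower-prevalence group is 2.5 times as likely to be missed given disease (the look-back signal), while the higher-prevalence group accounts for twice as many missed cases per discharge (the look-forward signal), with no contradiction between the two.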
- Approaches to Measurement at the Institutional Level. No single measurement method or individual measure will suffice. A “portfolio” approach is needed. A one-size-fits-all approach is unlikely to be equally appropriate for all institutions. Offered below are a few different ways that an institution might choose to approach measuring diagnostic errors.
- Tailored-risk portfolio: An institution with limited measurement resources and a need to convince institutional leadership of the return on investment for measuring diagnostic errors might take a tailored-risk approach. This could begin with numerator-only measures (e.g., malpractice claims or incident reports) to identify specific symptoms or diseases which have been a known source of institutional risk. These could then spark a disease-specific approach to measurement such as SPADE for those clinical presentations, with balancing measures related to false positives and resource utilization (e.g., test frequency, hospital admission rates, false discovery rate). After addressing one or more diseases and showing improvement, the entire process could be repeated to identify current risks and then address new conditions.
- Top-harms portfolio: An institution with intermediate measurement resources and institutional recognition of the importance of diagnostic error might develop a local dashboard for the top conditions generally causing the greatest misdiagnosis-related harms from “undercalls” (i.e., false negatives with dangerous diseases)—top five most harmful vascular events (stroke, myocardial infarction, aortic aneurysm and dissection, venous thromboembolism, arterial thromboembolism) plus top five most harmful infections (meningitis/encephalitis, sepsis, spinal/intracranial abscess, pneumonia, necrotizing fasciitis). They could (i) use SPADE look-back metrics to identify high-risk clinical presentations, (ii) design interventions to address the most pressing of these, and (iii) measure impact of these interventions using SPADE look-forward metrics. They could use balancing measures related to false positives and resource utilization (e.g., test frequency, hospital admissions, false discovery rate) for each of these high-harm diseases to address “overcalls” (i.e., false positive diagnoses of dangerous diseases or inappropriate resource use in pursuit of those diseases).
- Comprehensive portfolio: An institution with more substantial measurement resources and leadership support to pursue institutional diagnostic excellence might combine the tailored-risk and top-harms portfolio approaches (described above) with systematic sampling of patient feedback (e.g., using LEAPED8) and systematic use of disease-agnostic e-triggers to identify (i) 7-day hospital admissions after ED treat-and-release visit to identify additional high-risk diseases or clinical presentations and (ii) unplanned escalation in care or change in treating service for admitted patients. Taken together, this comprehensive approach would address almost all potential opportunities to improve diagnostic performance in pursuit of diagnostic excellence.
- High-Stakes Measurement for Accountability, Payments, and National Benchmarking. Based on the results of this review, high-stakes, cross-institutional comparisons require greater standardization and efficiency than can be achieved using most of the available data sources and methods listed above.
- Data source (likely Medicare or HCUP databases): While integrated health plans (e.g., Kaiser Permanente, Intermountain Healthcare, and Geisinger Health System) and the Veterans Administration have electronic medical record data sources that are fairly complete and comparable within their respective systems, there are no such data sets for all hospitals nationally. Currently, the most promising data for high-stakes measurement in the United States are from Medicare beneficiaries, since Medicare billing data are gathered in fairly consistent fashion, from a relatively unbiased sample of older patients, at almost all U.S. hospital EDs. They are unconstrained by health system crossovers or geographic boundaries, and they incorporate death data. They do not, however, represent children, so cannot be used to assess pediatric diagnostic error. They also represent only a subset of ED cases (roughly 24 percent in 2018391), which means that hospital-level analyses at smaller hospitals will need to sacrifice temporal resolution to achieve adequate sample sizes. It is possible that, with greater state-level engagement in maintaining linkable ED visit (SEDD) and hospitalization (SID) patient databases, these two obstacles can be overcome; the preferred data source would then likely become the AHRQ family of HCUP databases (though integration with the national death index for out-of-hospital mortality would be an important addition to increase capture of important outcomes). Both data sources would benefit by the addition of ongoing health-related quality of life (HRQoL) metrics, but implementation of this could prove cumbersome. An alternative would be for the Centers for Disease Control and Prevention to adapt the National Hospital Ambulatory Medical Care Survey (NHAMCS) to include short-term patient follow-up from their nationally representative sample of ED visits.
- Measurement method (likely SPADE): We found no measurement method other than SPADE that offers a statistically robust approach to measuring diagnostic error without the reliability challenges and high costs of triggered manual chart review or routine patient follow-up assessment (e.g., phone calls at 30 days). This appears to be the most promising method currently available for achieving valid, high-stakes measurement that can easily incorporate case mix severity adjustments or propensity score case matching.163 Missed cancer (including lung cancer) may require alternative monitoring methods, since the temporal risk profile of adverse events after a lung cancer misdiagnosis is very different from that after a missed vascular event or infection, making it less readily amenable to current SPADE methods.
- Disease metrics (Top 10+ for Serious Misdiagnosis-Related Harms): A reasonable place to start for national ED quality measurement would be to create metrics for the top five most harmful vascular events (stroke, myocardial infarction, aortic aneurysm and dissection, venous thromboembolism, arterial thromboembolism) and top five most harmful infections (meningitis/encephalitis, sepsis, spinal/intracranial abscess, pneumonia, necrotizing fasciitis). Diseases most appropriate to pediatric misdiagnosis, such as appendicitis and testicular torsion, could be added if the data source were changed to one that was not age restricted (i.e., if it were not Medicare data). Standardized ICD code sets for each disease could be derived from the Elixhauser system used by AHRQ in its Clinical Classifications Software,101 with appropriate modification to match the diseases in question (as done recently by Newman-Toker, et al.17). Ideally these measures would be endorsed by Emergency Medicine specialty societies (e.g., American College of Emergency Physicians, Society for Academic Emergency Medicine) and national quality and safety organizations (e.g., National Quality Forum, The Joint Commission).
- National Diagnostic Performance Dashboard: AHRQ, other government bodies (e.g., Centers for Disease Control and Prevention’s National Center for Health Statistics324), or non-governmental organizations could monitor the overall epidemiology and variability of diagnostic performance (specifically, diagnostic outcomes, which can be adjusted for case mix severity) across the nation (analogous to the Dartmouth Atlas Project for utilization of healthcare services347). For the 10+ diseases noted above, disease-specific metrics could be combined into a National Diagnostic Performance Dashboard. For simplicity, this might initially use only a look-back approach and ignore specific symptoms (as done recently by Waxman, et al120). Later, for greater precision and monitoring of diagnostic quality and safety performance, ICD symptom code sets could be added for the most common ED symptoms. This would allow realization of the full potential of SPADE analysis, using both look-back (identifying high-risk presentations and disparities in diagnosis) and look-forward (measuring absolute harm rates and monitoring impact of solutions) approaches, which have been shown to vary substantially by hospital (e.g., for acute myocardial infarction, where misdiagnosis-related adverse event rates varied 3.3-fold from 0.6% to 1.9% across individual EDs, P < 0.00177) and permit observed minus expected analysis to detect statistically valid excess adverse events above the base rate.256 The purpose of a national monitoring mechanism would be multiple: (i) providing a benchmarking tool for individual institutional ED performance; (ii) monitoring national diagnostic quality and safety (e.g., temporal trends and health disparities) to help guide policy decisions; (iii) assessing the impact of major policy interventions (e.g., payment reforms that incentivize better diagnostic performance).
Research Recommendations
Specific research recommendations related to KQ1, KQ2, and KQ3 may be found in the sections above entitled “Gaps in Evidence,” but a high-level summary is provided here. For KQ1, the diseases most often misdiagnosed but causing lesser or longer-term harms are poorly understood. Research is needed to better understand the most common less-harmful conditions misdiagnosed other than fractures (e.g., inner ear diseases, migraine headaches). More research is also needed to better characterize the diseases associated with diagnostic error in pediatric ED settings and specialty EDs, where there are many fewer studies. For KQ2, large U.S.-based studies using rigorous, prospective ascertainment are needed to validate that estimated error rates reflect current U.S. ED diagnostic performance, but these should be deliberately designed to assess the extent to which less rigorous but easier methods of measurement can serve as valid proxies. Special attention should be paid to further assessing the relative frequency (and absolute total rates) of harm (and estimated preventable harm) among discharged versus admitted patients, and among false negatives versus false positives in each of these groups. False positive diagnostic errors should be a key research focus, given the relative paucity of studies addressing this issue. For KQ3, more research needs to be done to clarify the extent to which structural factors (particularly those that could be induced to change by payment mechanisms) are strong predictors of diagnostic error and harms. For example, these might include ED discharge fraction, staffing patterns (e.g., volumes per clinician, routine availability of consultants), and access to specialized imaging or diagnostic laboratory tests. Additional work should be done to better elucidate the relationship between clinician mental models of disease prevalence and implicit bias towards specific demographic groups (e.g., women and minorities). 
It is unknown how much of the health disparity seen with diagnostic errors can be attributed to true prevalence effects (with appropriate clinical risk assessment), to perceived but false prevalence estimates, or to fundamental bias; some of these closely related effects may only be readily differentiated using experimental methods, as in many cognitive psychology experiments. Reporting of research on diagnostic errors should be standardized, probably via an extension of the existing STARD reporting guidelines focused on diagnostic error-related studies.
Conclusions
This report summarizes current best evidence regarding the nature, frequency, and causes of ED diagnostic errors. Our review findings are tempered by limitations in the underlying evidence base, including issues related to data sources, measurement methods, and causal relationships. Nevertheless, its contents are relevant to patients, ED clinicians, quality officers, risk management professionals (and professional liability insurers), educators, and policymakers, among others. The results and conclusions presented herein should be viewed through the lens of opportunity for quality improvement and increased diagnostic safety for ED patients, rather than as an indictment of current ED care or ED clinicians. It is acknowledged that the ED is a particularly challenging setting in which to practice medicine, and many factors contribute to diagnostic errors that occur there.
Despite this, we estimate that 1 of every 18 ED patients is misdiagnosed, 1 of every 50 suffers a diagnostic adverse event, and about 1 of every 350 is seriously harmed as a consequence of diagnostic error. Put in terms of an average ED with 25,000 visits annually and average diagnostic performance, each year this would be over 1,400 diagnostic errors, 500 diagnostic adverse events, and 75 serious harms, including 50 deaths. This translates to roughly 10 patients harmed and more than 1 death or disability each week at an average-sized ED. New insights for the field generated by this report include the following:
- Just 15 diseases likely account for more than two-thirds of serious misdiagnosis-related harms in the ED, making the problem of diagnostic error more tractable than previously imagined. Among these 15 diseases, myocardial infarction is the only one with miss rates near zero (1.5%), well below the estimated average diagnostic error rate across all diseases (5.7%). The field should seek to replicate these successes for other high-harm diseases (which currently have estimated miss rates of 10-56%), modeling new interventions after the successful multi-pronged approaches to ED diagnosis of chest pain and acute coronary syndromes. Target diseases should be prioritized based on (a) the overall share of high-severity harms, (b) higher absolute error or harm rates (i.e., with opportunity for improvement), (c) variability in diagnostic performance (including known health disparities or variation by organization, site, or provider), and (d) availability or cost-effectiveness of promising solutions. Missed stroke in patients presenting with dizziness, which ranks high on all four criteria (stroke is the #1 cause of harm; the rate of missed stroke in dizziness is 40%; variability is documented based on hospital characteristics; and solutions have been demonstrated in clinical trials), is likely the top target.
- We estimate that each year in the United States there may be more than 7 million diagnostic errors and 350,000 patients who are permanently disabled or die due to diagnostic error. Methods of measuring diagnostic errors in the ED are highly variable, but, even when similar methods are used, diagnostic error rates vary up to 100-fold across individual hospitals. More than any other finding, this variability indicates that opportunities for diagnostic quality improvement exist. Diagnostic error measurement and reporting should be standardized for both internal and external benchmarking purposes, including public accountability. This report proposes approaches to standardizing measurements and measurement pitfalls to avoid. When quantifying serious misdiagnosis-related harms, it is imperative to measure both mortality and morbidity to fully represent adverse health outcomes for patients. In doing so, great care should be taken to avoid the known trap of the “misdiagnosis is protective” paradox1 by using clinically appropriate, statistically valid adjustments for initial case severity. Solutions should be designed to address both false negatives and false positives, and all measurement and reporting of diagnostic error should be accompanied by balancing measures that monitor diagnostic test utilization and hospital admission rates.
- Root causes of ED diagnostic errors are disproportionately cognitive in nature and mainly happen at the bedside. Those resulting in serious misdiagnosis-related harms involve failures of clinical assessment, reasoning, or decision-making in roughly 90 percent of cases. The strongest, most consistent predictors of ED diagnostic error are case factors that increase the cognitive challenge of identifying the underlying disorder, with “non-specific,” “atypical,” or “milder” symptoms being the most frequent. This suggests that system-wide, scalable solutions need to be developed to tackle cognitive problems, and that these solution sets must be targeted to address not the most common clinical presentations of key diseases of interest but the most commonly misdiagnosed clinical presentations of key diseases of interest. This is a tractable approach because epidemiologic studies using the SPADE look-back method have shown that only a handful of symptoms account for the majority of missed clinical presentations for any one disease64, 77, 94—in other words, these are what might be called “typical” atypical cases or recurring diagnostic pitfalls.392 To support reliable delivery of enhanced diagnostic expertise at the bedside, solution sets should capitalize on training, teamwork, and technology. Interventions and tools should be tailored to specific symptoms/diseases, then organized as modules. For example, stroke diagnosis could be bolstered to explicitly identify posterior circulation strokes among patients with dizziness and vertigo (one of the key symptoms conferring the greatest risk of misdiagnosis). 
This might include (a) scalable training tools using virtual patient cases349; (b) enhanced teamwork via clinical pathways that incorporate rules to determine need for specialty consultation338 or engage nurses and other allied health professionals more effectively in the diagnostic process339, 343, 393; and (c) computer-based decision support using point-of-care technology.371, 372 All of these solutions should be subjected to rigorous outcomes research to assess any benefits to improved diagnosis or unintended consequences (e.g., test overuse).
Future research should emphasize areas in which data are lacking, such as the burden of diagnostic errors and harms related to diseases with less immediate and severe consequences, pediatric ED diagnostic errors and harms, and the causal contributions of systems factors potentially amenable to policy intervention. Importantly, large, prospective studies are needed to validate current diagnostic error rate estimates, as well as to help develop valid proxy measures that are more readily and routinely acquired for operational measurement. A key focus of research should be to define symptoms and diseases for which diagnostic errors and associated harms can realistically be mitigated and to measure the real-world impact of interventions and strategies in reducing these errors and harms. Policy changes to consider based on findings from this review include: (1) standardizing measurement and research results reporting to maximize comparability of measures of diagnostic error and misdiagnosis-related harms5, 345, 346; (2) creating a National Diagnostic Performance Dashboard to track performance (analogous to the Dartmouth Atlas Project for utilization of healthcare services347); and (3) using multiple policy levers (e.g., research funding, public accountability, payment reforms)5 to facilitate the rapid development and deployment of solutions to address this critically important patient safety concern. Resources applied should be commensurate with the large public health burden.
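The per-ED burden figures quoted earlier (1 in 18 misdiagnosed, 1 in 50 suffering a diagnostic adverse event, about 1 in 350 seriously harmed, applied to an average ED with 25,000 annual visits) can be checked with a simple back-of-envelope calculation. This is an illustrative sketch only; the rounded one-in-N framings yield slightly smaller totals than the report's rounded figures, which derive from the underlying unrounded rates (e.g., the 5.7% average error rate).

```python
# Back-of-envelope calculation of the per-ED diagnostic error burden implied
# by the review's rate estimates. Rates and visit volume are taken from the
# text; the arithmetic here is purely illustrative.
annual_visits = 25_000

error_rate = 1 / 18          # ~5.6% of ED patients misdiagnosed
adverse_event_rate = 1 / 50  # 2% suffer a diagnostic adverse event
serious_harm_rate = 1 / 350  # ~0.3% seriously harmed

errors_per_year = annual_visits * error_rate                  # ~1,389
adverse_events_per_year = annual_visits * adverse_event_rate  # 500
serious_harms_per_year = annual_visits * serious_harm_rate    # ~71

# Weekly framing used in the text.
adverse_events_per_week = adverse_events_per_year / 52  # ~10 patients harmed
serious_harms_per_week = serious_harms_per_year / 52    # >1 death or disability

print(round(errors_per_year),
      round(adverse_events_per_year),
      round(serious_harms_per_year))
```

Note that the rounded rates give roughly 1,390 errors and 71 serious harms per year, consistent in magnitude with the "over 1,400" and "75" figures quoted in the text.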