The Key Question (KQ) is divided into four subquestions:
- KQ1a. What is the comparative diagnostic accuracy of approaches that can be used in the primary care practice setting or by specialists to diagnose attention deficit hyperactivity disorder (ADHD) among individuals younger than 7 years of age?
- KQ1b. What is the comparative diagnostic accuracy of electroencephalogram (EEG), imaging, or approaches assessing executive function that can be used in the primary care practice setting or by specialists to diagnose ADHD among individuals aged 7 through 17?
- KQ1c. For both populations, how does the comparative diagnostic accuracy of these approaches vary by clinical setting, including primary care or specialty clinic, or patient subgroup, including age, sex, or other risk factors associated with ADHD?
- KQ1d. What are the adverse effects associated with being labeled correctly or incorrectly as having ADHD?
The gold standard or reference standard against which diagnostic tools were compared was diagnosis by a mental health specialist, such as a psychologist, psychiatrist or other care provider. In many cases, clinicians used published scales or semi-structured diagnostic interviews to ensure a well-validated and reliable process of confirming the diagnosis of ADHD according to the Diagnostic and Statistical Manual of Mental Disorders (DSM), as outlined in more detail in the evidence table. Many identified studies included a broader age range rather than differentiating clearly between younger (KQ1a) or older (KQ1b) than seven years of age. Hence, we added a section describing the results for parental ratings, teacher ratings, clinician tools, and biomarkers before addressing the Key Questions. The section summarizes results by test and most studies evaluated a combined sample of children and adolescents. The KQ1a section describes all diagnostic approaches for children younger than seven years of age regardless of the applied test. The KQ1b section describes EEG, imaging, and executive function tests for children seven and up.
4.1. KQ1, ADHD Diagnosis Key Points
Key points pertaining to the diagnosis of ADHD are as follows.
- Multiple approaches showed promising diagnostic performance (e.g., using parental rating scales), but estimates of performance varied considerably across studies, and the strength of evidence (SoE) was generally low.
- Diagnostic test performance likely depends on whether youth with ADHD are being differentiated from typically developing children or from clinically referred children who had some kind of mental health or behavioral issue.
- Rating scales for parent, teacher, or self-assessment as a diagnostic tool for ADHD have high internal consistency but poor to moderate reliability between raters, indicating that obtaining ratings from multiple informants (the youth, both parents, and teachers) may be valuable to inform clinical judgement.
- Studies evaluating neuropsychological tests of executive functioning (e.g., Continuous Performance Test) used study-specific combinations of individual cognitive measures, making it difficult to compare performance across studies.
- Diagnostic performance of biomarkers, EEG, and magnetic resonance imaging (MRI) scans shows great variability across studies, and their ability to aid clinical diagnosis for ADHD remains unclear. Studies have rarely assessed test-retest reliability, no findings have been replicated prospectively using the same measure in independent samples, and real-world effectiveness studies of diagnostic performance have not been conducted.
- Very few studies have assessed performance of diagnostic tools for ADHD in children under the age of 7 years and more research is needed.
- The identified diagnostic studies did not assess the adverse effects of being labeled correctly or incorrectly as having a diagnosis of ADHD.
4.2. KQ1, ADHD Diagnosis Summary of Findings
We identified 231 studies addressing the performance of tests aiming to diagnose ADHD.18, 21, 24, 27, 28, 111, 112, 115, 117, 119–121, 124, 134, 135, 140–143, 152, 153, 157, 159, 162, 167–170, 172, 177, 179, 181–192, 197, 198, 210, 211, 213, 214, 218, 223, 230, 231, 233, 234, 237, 241, 242, 244–246, 251, 253, 260, 263, 267, 276, 277, 282–285, 287, 293, 297–301, 303, 307, 309, 311, 312, 314–316, 319, 322, 323, 327, 331, 336, 338–340, 342, 344, 346, 347, 351, 352, 355, 356, 359, 362, 365, 366, 369, 370, 379, 382, 385, 388–391, 393–395, 397, 400–405, 407, 408, 412, 413, 415–417, 420–424, 427, 429, 434, 436–438, 445–450, 462–465, 467–470, 473, 475, 477, 479, 482, 486, 487, 491, 493–496, 498–502, 506, 514–516, 518, 519, 524, 527, 528, 536, 537, 541–543, 546–549, 553, 558, 559, 563, 564, 566, 570, 571, 576, 580–584, 587, 591, 592, 599, 600, 603, 605, 607, 614, 615, 625, 627, 630–633, 635, 638, 639, 641, 642, 644, 647 The methodological rigor and the reporting varied substantially in the identified studies. The potential for risk of bias in the studies is documented in Figure 5. The critical appraisal for the individual studies is in Appendix D.
Selection bias was likely present in two thirds of studies. Often samples were restricted and did not necessarily represent the full range of children with ADHD. For example, studies explicitly reported using a convenience sampling strategy. Index test issues were present in ten percent of studies. Although the review was restricted to studies reporting a clinical diagnosis of ADHD for participants, reference standard issues were also present in a small number of studies, in particular due to lack of details on procedures and/or diagnosticians.111, 142, 233, 342, 405, 412, 450, 516, 553, 642 Flow and timing was rated as high risk of bias in several studies.111, 121, 143, 162, 172, 312, 319, 351, 379, 501 Typically this was due to an unclear participant flow (e.g., it was unclear whether the diagnosis was known before the results of the index test were known).
We also assessed possible applicability issues that could influence the generalizability of the reported data. Figure 6 shows the summary of rated applicability. The applicability for the individual studies is in Appendix D.
In several studies, samples did not represent the general population of children with ADHD, usually because children with co-morbidities were excluded. In addition, several studies took place in specialty care settings with diagnostic and treatment options that go beyond the standard course of action for children with ADHD.
4.3. Summary ADHD Diagnosis by Tests for All Age Groups
We broadly differentiated between parental ratings, teacher ratings, tools for clinicians, teen self-reports, neuropsychological tests, imaging, EEG, biomarkers, activity markers, and other (e.g., electrocardiogram [EKG] indicators). Studies evaluated a large number of different tools within the broader categories. In addition, where studies used the same diagnostic tool (e.g., a rating scale), authors used different components of the tool (e.g., specific subscales) or combined components in a variety of ways (e.g., different neuropsychological parameters). We identified 68 studies that used machine learning algorithms to determine the best diagnostic approach.28, 115, 120, 121, 143, 152, 157, 172, 179, 181, 182, 185–188, 191, 211, 214, 223, 233, 234, 245, 253, 282, 283, 299, 303, 322, 323, 340, 355, 356, 369, 370, 388, 394, 400, 402, 403, 407, 408, 412, 420, 429, 434, 438, 449, 450, 467, 468, 473, 494, 495, 518, 541, 543, 571, 581, 582, 591, 592, 599, 603, 630–633, 641 Studies have been published since 201228 and came from 21 different countries, but primarily the United States28, 152, 223, 233, 234, 282, 299, 323, 400, 403, 412, 467, 495, 518, 1188 and China.185, 187, 188, 191, 394, 407, 408, 571, 581, 630, 632, 641 A third of identified studies used EEG markers as the data source115, 120, 143, 157, 172, 179, 187, 188, 322, 340, 370, 394, 412, 438, 449, 468, 473, 494, 592, 883 with another third of the studies using MRI.191, 282, 495, 518, 571, 581, 630, 633, 1188 The remaining studies used neuropsychological test components, rating scale scores, activity estimates, or other sources.
Some studies were able to achieve 100 percent sensitivity with the help of machine learning (corresponding specificity 100%).143, 152 Other studies maximized specificity, and some achieved 100 percent specificity in machine learning supported diagnostic models (corresponding sensitivities 100, 97, 75, 98, and 100% respectively).121, 143, 152, 370, 450 Across machine-learning supported studies, accuracy ranged from 61 percent282 to 100 percent.143, 152, 468
Given that most studies included younger (typically 5- and 6-year-olds) and older children, the following section describes diagnostic tools relevant to all age groups. Some studies evaluated more than one test (e.g., a parental rating and a teacher rating).
4.3.1. Parental Ratings
We identified 59 studies using parental ratings to diagnose ADHD.18, 117, 134, 168, 169, 190, 218, 223, 230, 233, 234, 241, 242, 244, 251, 263, 285, 287, 297, 300, 301, 311, 314, 331, 336, 339, 342, 344, 359, 362, 390, 391, 423, 424, 427, 447, 448, 463, 464, 482, 487, 491, 498, 502, 514–516, 519, 527, 528, 547, 553, 558, 559, 584, 587, 605, 638, 642 The earliest study meeting inclusion criteria was published in 1985.514 Evaluations of parental rating tools came from five different English-speaking countries, but most studies were from the United States.134, 169, 190, 230, 233, 234, 241, 242, 244, 251, 263, 285, 297, 299, 311, 331, 336, 339, 342, 344, 359, 390, 391, 423, 424, 427, 448, 463, 464, 482, 487, 491, 498, 502, 514–516, 519, 527, 528, 547, 553, 558, 559, 584, 605, 638, 642 The populations studied were predominately males, and participants ranged between the ages of two and 18. Four studies exclusively included children younger than seven years old.331, 516, 519, 559 For studies that distinguished between ADHD presentations, most of the participants were diagnosed with the combined or inattentive presentations. In one study focusing on preschool age children who presented with disruptive behavior disorders, 57 percent of participants were diagnosed with the hyperactive/impulsive presentation.331 While ADHD participants with co-occurring disorders were not excluded from most studies, only a few purposely included children with specific co-occurring disorders such as disruptive behavior disorders331 or autism.234, 447 However, about half of identified studies came from clinical samples rather than general samples of neurotypically developing children; i.e., they identified children undergoing a diagnostic workup for a potential diagnosis of ADHD, conduct disorders, autism, or depression.
In half of the identified studies, White participants made up more than 70 percent of the sample. One study evaluated diagnostic accuracy in a sample in which over 50 percent of participants were Black/African American,462, 536 and one study was identified in which 85 percent of participants were Hispanic or Latino.553 Studies reported predominantly on the estimated sensitivity and specificity. Some studies also reported on the area under the curve (AUC) as a summary test performance, but other key outcomes were less frequent. Figure 7 plots the sensitivity and specificity for the parental rating scales evaluated in the studies.
The studies reporting sensitivity and specificity (the measures are not independent from each other, and high sensitivity can come at a cost of low specificity and vice versa) show the wide variation in diagnostic accuracy estimates. The figure also shows that studies evaluated a large range of different parental rating scales, with few studies reporting on the same tool.
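The trade-off between sensitivity and specificity described above can be illustrated with a minimal sketch. The scores, diagnoses, and cutoffs below are hypothetical and are not taken from any included study; the function name `sens_spec` is likewise an assumption made for illustration.

```python
def sens_spec(scores, has_adhd, cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff count as test-positive."""
    tp = sum(1 for s, d in zip(scores, has_adhd) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, has_adhd) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, has_adhd) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, has_adhd) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical rating-scale scores and reference-standard diagnoses (illustration only)
scores = [72, 68, 66, 61, 58, 70, 63, 59, 55, 50]
has_adhd = [True, True, True, True, True, False, False, False, False, False]

# Raising the cutoff lowers sensitivity and tends to raise specificity
for cutoff in (60, 65, 70):
    sens, spec = sens_spec(scores, has_adhd, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this made-up sample, moving the cutoff from 60 to 65 lowers sensitivity and raises specificity, which is one reason estimates obtained with different cutoffs are hard to compare across studies.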
The most frequently evaluated diagnostic tool was the CBCL (Child Behavior Checklist), either alone or in combination with other scales, using different cutoffs, and evaluating different subscales (the attention deficit/hyperactivity problems subscale most frequently). Reported sensitivity for the CBCL ranged from 71 percent in a study differentiating ADHD and oppositional defiant disorder331 to 84 percent in two studies, one using an outpatient pediatric medical clinic, the other one a sample of children with traumatic brain injury.190, 605 Reported specificity for this parental scale ranged from 33 percent587 to 93 percent190 in the pediatric medical clinic sample. The reported AUC ranged from 0.55344 to 0.93190 with three independent studies reporting estimates of 0.83 or 0.84 for the CBCL.251, 331, 498 The evidence table in the appendix shows the results for all diagnostic and psychometric outcomes of interest for all identified studies.
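As a reading aid for the AUC values reported above, the AUC can be interpreted as the probability that a randomly chosen child with ADHD scores higher on the test than a randomly chosen child without ADHD (ties counted as one half). The following is a minimal sketch of that pairwise definition; the scores are hypothetical, and the function name `auc` is an assumption made for illustration rather than a method used by any included study.

```python
def auc(pos_scores, neg_scores):
    """AUC as the share of (ADHD, non-ADHD) pairs in which the ADHD score is higher.

    Ties contribute one half, matching the Mann-Whitney formulation.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical rating-scale scores (illustration only)
adhd_scores = [72, 68, 66, 61, 58]
non_adhd_scores = [70, 63, 59, 55, 50]

print(f"AUC = {auc(adhd_scores, non_adhd_scores):.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.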
Table 3 shows the findings for the outcomes of interest together with the number of studies and study identifiers for parental rating scales. For the main results, we report findings from population samples that differentiated ADHD from neurotypically developing children separately from results obtained in clinical samples, given that the study population was identified as one of the sources of heterogeneity in reported results as documented in KQ1c. Results are shown across studies and tools for the main analyses. Where at least two different author groups reported on the same rating scale, we provide results for a specific scale.
Parental ratings reported mainly on the sensitivity and specificity. A few studies reported perfect diagnostic performance for parental ratings for either sensitivity or specificity, but not both together. Little information was provided in these diagnostic studies regarding the reliability of the measures given the large range of different measures evaluated by study authors. We downgraded the strength of evidence for study limitation (lack of detailed reporting), imprecision (large variation in reported diagnostic performance) and for inconsistency (when consistency could not be assessed because no study was identified, or only one study was identified reporting on the test and outcome of interest and results have not been replicated by another author group, or only limited data points were available). None of the included studies provided information on the effect of misdiagnosis. None of the identified studies reported the costs associated with obtaining parental ratings.
4.3.2. Teacher Ratings
We identified 23 studies using teacher ratings to diagnose ADHD.18, 119, 183, 218, 242, 299, 301, 314, 342, 359, 362, 391, 463, 479, 482, 491, 519, 527, 528, 558, 559, 587, 642 The earliest study meeting eligibility criteria was published in 1998,479 and studies came from four different English-speaking countries, primarily the United States.242, 299, 342, 359, 391, 463, 479, 482, 491, 519, 527, 528, 558, 559, 642 The populations studied were predominately males between the ages of three and 18. Two studies exclusively included children younger than seven years old519, 559 and two exclusively included children eight years or older.119, 359 For studies that distinguished between ADHD presentations, most of the participants were diagnosed with the combined or inattentive presentations. Almost all of the studies mentioned race and ethnicity demographics, with 14 studies where White participants made up greater than 70 percent of the sample, and one study in which over 85 percent of the participants were Black/African American.
ADHD participants with co-occurring disorders were not excluded from most of the studies. Studies were divided into clinical samples and those recruited from a less selective population. None of the studies included children who all had a dual diagnosis, such as ADHD and conduct disorder.
Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported outcomes. Figure 8 plots the reported sensitivity and specificity for teacher rating scales.
The figure shows the large range in reported sensitivity and specificity. It also shows that studies have evaluated many different teacher rating tools.
The Teacher Report Form, alone or in combination with Conners teacher rating scales, and using the total or the subscale of attention problems, was evaluated in more than one study.242, 301, 342, 587 Reported sensitivity ranged from 72 percent301 to 79 percent.587 Reported specificity estimates ranged from 64 percent587 to 76 percent.242 Two of the studies reported on AUC and found 0.65342 for the attention problem subscale and 0.77301 in combination with the Conners 3 teacher short form. No two studies reported on rater agreement, internal consistency, or test-retest reliability for the same teacher rating scale.
Table 4 shows the findings for the outcomes of interest together with the number of studies and study identifiers.
Across all teacher rating studies, reported sensitivity in individual studies was up to 97 percent in a clinical sample, but the corresponding specificity was only 26 percent.314 We downgraded the strength of evidence for imprecision (large variation in reported diagnostic performance) and for inconsistency (when consistency could not be assessed because only one study was identified reporting on the test and outcome of interest and results had not been replicated by another author group). Identified diagnostic accuracy studies did not report on several of the other key outcomes.
4.3.3. Teen/Child Self-Reports
We identified six studies using teen/child self-reports to diagnose ADHD.142, 168, 231, 297, 491, 506 The earliest study was published in 2002506 and data came from two countries, the United States231, 297, 491 and Canada.142, 168, 506 Self-reports were primarily completed by adolescents; however, one study provided a research assistant to help read the questions for participants under 11 years old.297 Only one study documented the ADHD presentation: 10 percent inattentive presentation, 4 percent hyperactive/impulsive presentation, and 25 percent combined presentation.491 Two studies mentioned race and ethnicity demographics. In one study, White participants made up 61 percent of the sample297 and one study reported 89 percent of the participants were Black/African American.491
Studies reported a limited number of outcomes, with sensitivity, specificity, and AUC being the most frequently reported outcomes. No two identified studies reported on the same self-report measure. Reported diagnostic success varied widely. Table 5 shows the findings for the outcomes of interest together with the number of studies and study identifiers.
The reported diagnostic performance of teen self-reports was limited. We downgraded for the domain inconsistency (inability to judge the consistency across studies because only one study was identified reporting on the test and outcome of interest). In several cases, our searches identified no studies and the strength of evidence is insufficient for the outcome.
4.3.4. Combined Ratings
We identified 13 studies that assessed the diagnostic performance of ratings combined across informants.18, 189, 277, 297, 303, 405, 467, 479, 527, 548, 559, 570, 600 The studies compared the information from multiple raters to the reference standard. Studies combined information sources in different ways, often selecting individual variables with the help of machine learning. Only one of these studies compared the performance when combining data from multiple informants to that of single informants: it found negligible improvement when adding youth self-report to the parent report alone using an adaptive testing questionnaire (AUC youth only 0.71; parent only 0.85; combined 0.86) in a treatment-seeking population.297
The studies reported only on selected accuracy measures. One study combined parent and teacher ratings on the Conners scales by requiring youth to meet diagnostic cutoffs (T-score ≥65) in one setting and substantial symptoms in the other setting (T-score ≥60). It reported a diagnostic sensitivity of 84 percent and specificity of 36 percent for the combined rating when distinguishing ADHD from other clinically referred youth.18 One study reported findings from a discriminant function analysis of mother, father, and teacher ratings on the Conners scale when distinguishing ADHD youth who were considered either intellectually gifted or not from typically developing, intellectually gifted youth. It found that the discriminant function using all three informants distinguished the typically developing youth from the two ADHD groups but did not distinguish the two ADHD groups from one another.277 A study in four- to seven-year-old children used machine learning to combine parent and teacher ratings on the BRIEF in distinguishing youth with ADHD from typically developing controls. It reported an average diagnostic accuracy of 0.93, with teacher ratings being the most informative in the machine learning algorithm, though it did not formally compare accuracy for combined informants with accuracy for either informant alone. The study also found that the addition of neuropsychological test measures and cortical thickness measures to the machine learning algorithm did not meaningfully improve diagnostic performance over use of the BRIEF alone.467 The best AUC was reported by a machine learning supported study combining parent and teacher ratings (AUC 0.98).405
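The two-setting rule described above (diagnostic cutoff in one setting, substantial symptoms in the other) can be sketched as a simple decision function. This is one possible reading of the rule, assuming that either the parent or the teacher rating may supply the higher score; the function name `meets_combined_rule` and its parameter names are hypothetical, and the study's exact operationalization may differ.

```python
def meets_combined_rule(parent_t, teacher_t, diagnostic=65, substantial=60):
    """Return True if one informant's T-score reaches the diagnostic cutoff
    and the other informant's T-score reaches the 'substantial symptoms' cutoff.

    Assumption: the rule is symmetric, so either informant may be the one
    meeting the higher (diagnostic) cutoff.
    """
    parent_high = parent_t >= diagnostic and teacher_t >= substantial
    teacher_high = teacher_t >= diagnostic and parent_t >= substantial
    return parent_high or teacher_high


# Hypothetical T-scores (illustration only)
print(meets_combined_rule(66, 61))  # diagnostic at home, substantial at school
print(meets_combined_rule(64, 64))  # neither setting reaches the diagnostic cutoff
print(meets_combined_rule(70, 55))  # symptoms confined to one setting
```

Requiring symptoms in both settings mirrors the cross-situational requirement of the DSM criteria, at the cost of classifying youth with elevated scores in only one setting as test-negative.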
The studies did not report reliability measures for ratings combined across informants; studies reported only psychometric performance in individual informant groups. For example, one of the studies reported that ratings on the BRIEF by parents and teachers yielded intraclass correlation coefficients (ICCs) from 0.31 to 0.59 across subscales.570 Another study reported the range of Cronbach’s alpha estimates across teacher and parent ratings for individual scales, all indicating substantial internal consistency (the lowest Cronbach’s alpha was 0.72; all other values were above 0.90).467
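For readers less familiar with the internal consistency statistic cited above, Cronbach's alpha for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following is a minimal sketch using the standard formula; the item scores are hypothetical, and the function name `cronbach_alpha` is an assumption made for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of per-item score lists, one list per item, each holding
    the scores of the same respondents in the same order.
    """
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]        # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores of four respondents on two items (illustration only)
item_scores = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]
print(f"alpha = {cronbach_alpha(item_scores):.2f}")
```

Values closer to 1 indicate that the items of a scale vary together; alpha says nothing about agreement between different raters, which is what the ICCs above address.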
4.3.5. Clinician Tools
We identified 24 studies evaluating additional tools that could be used by clinicians or the healthcare system (beyond neuropsychological tests; parent, teacher, or self-report ratings; biomarkers; or imaging) to aid the diagnosis of ADHD.27, 121, 167, 181, 298, 299, 311, 338, 355, 362, 385, 388, 389, 400, 403, 407, 416, 417, 434, 437, 499, 542, 566, 627 The earliest identified study was published in 2009.627 Evaluations were published in three different countries, including eight from the United States.27, 299, 311, 389, 400, 403, 542, 566 The populations studied were predominately males and included youth between the ages of three and 18. Most studies did not distinguish between ADHD presentations, but three studies were restricted to the combined ADHD type.121, 416, 627 Where studies mentioned race and ethnicity demographics of the sample composition, the percentage of White children ranged from 52 to 100 percent, the percentage of Black or African American children ranged from two to 44 percent, Hispanic/Latino children three to 20 percent, and Asian children one to three percent.
Studies used different tools, including diagnostic interview guides and observation tools. Several studies measured child activity levels as an objective test, for example through an actometer or commercially available activity tracker121, 181, 298, 355, 400, 403, 416, 437, 627 and two evaluated direct observation as a diagnostic tool.167, 362 Three studies used insurance claim-based algorithms or medical health record indicators.434, 542, 566 The remaining studies addressed unique interventions and questions; for example, one study focused on the clinical utility of International Classification of Diseases [ICD]-11 diagnostic guidelines,499 and another evaluated a clinician diagnosis combined with an assessment aid that involved integrating EEG and theta/beta ratio data.27
Studies are difficult to compare since they assess different tools and approaches. Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported outcomes. Table 6 shows the findings for the key outcomes of interest together with the number of studies and study identifiers. Where all identified studies evaluated the same tool, the first column of the table indicates the tool; otherwise, estimates are reported across all tools.
We downgraded the strength of evidence for imprecision (very large variation in reported diagnostic performance) and for inconsistency (when consistency could not be assessed because only one study was identified reporting on the test and outcome of interest and results had not been replicated by another author group). The tools were difficult to compare and answered study-specific questions.
4.3.6. Biomarkers
We identified seven studies using proposed biomarkers obtained from biospecimens to diagnose ADHD.309, 501, 563, 583, 603, 635, 644 EEG and imaging approaches are reported in section 4.3.7 and the evidence table (Appendix C, Table C.1.) shows additional, less common approaches to diagnosing ADHD, such as eye movement tracking. Five identified studies used blood measures, including membrane potential ratio563 and erythropoietin/erythropoietin receptor,309 and three of these studies analyzed miRNA obtained from blood samples.603, 635, 644 The other studies evaluated urine indicators.501, 583 The earliest identified study was published in 2007.501 Evaluations were published in five different countries, including one from the United States.563
The populations studied were predominately males between the ages of six and 17. Most studies required participants to not be taking stimulant medication. For studies that distinguished between ADHD presentations, most of the participants were diagnosed with the combined presentation.563, 635, 644 Only two studies mentioned race and ethnicity demographics, one where all of the participants were Han Chinese603 and the other where the majority of participants were Black/African American.563 None of the studies used a clinical sample or children with a consistent co-morbidity.
Table 7 shows the findings for the outcomes of interest together with the number of studies and study identifiers. Given the clinical diversity of the biomarkers (e.g., differences in invasiveness and technological requirements of tests), we include results across all biospecimen evaluations, blood markers, miRNA specifically, and urine indicators where more than one study was identified that reported on the outcome.
Biomarker studies reported mainly on sensitivity and specificity. Selected studies achieved very high sensitivity.309 Little information was provided in the studies regarding the reliability of the markers or combinations of markers. None of the included studies provided information on the effect of misdiagnosis. None of the identified studies reported the costs associated with analyzing biomarkers.
4.3.7. EEG
We identified 45 studies using EEG markers to diagnose ADHD.27, 111, 115, 120, 124, 143, 157, 172, 179, 182, 186–189, 192, 197, 245, 312, 322, 340, 351, 356, 365, 366, 370, 394, 395, 397, 404, 408, 412, 413, 415, 420, 438, 449, 465, 468, 473, 487, 494, 546, 548, 592, 641 The earliest identified study was published in 2003.546 EEG evaluations were published in 17 different countries, primarily Iran and China, with four studies published in the United States.27, 412, 487, 548 The populations studied were predominately males between the ages of six and 17, with only three studies including children as young as four years old.157, 340 One study included only female participants,197 and seven studies included only males.111, 179, 412, 413, 449, 468, 473 In several studies, participants were required to demonstrate an IQ of 80 or higher and almost half of the studies required that participants not take stimulant medication or stop medication several days before testing. For studies that distinguished between ADHD presentations, most focused on the combined and inattentive presentations. Race and ethnicity demographics were not mentioned in most studies.
While ADHD participants with co-occurring disorders were not excluded from most studies, only a few studies purposely included specific co-occurring disorders to evaluate the diagnostic test performance in children with co-occurring conduct disorder or other behavioral disorders.143 The large majority of studies had unselected samples, i.e., comparing children with ADHD to neurotypically developing children.
Studies used EEG signals obtained during a resting state with eyes closed, eyes open, while performing neuropsychological tests, and/or recording event-related potentials. Studies varied in the reported detail (e.g., number of electrodes, channels, frequency and duration of the recording); the documented information is shown in the evidence table in the appendix. Two thirds of studies used machine learning algorithms to select parameters for classification. Several studies explicitly reported combining EEG data with specific demographic variables or rating scale results.27, 124, 143, 189, 192, 312, 351
Table 8 shows findings for the outcomes of interest together with the number of studies and study identifiers.
EEG studies predominantly reported accuracy estimates. Sensitivity in individual studies ranged widely from 46 percent197 to perfect sensitivity (corresponding specificities 71%);143, 413 the range was reduced in studies restricting to older children. Studies in clinical samples reported a reduced range of sensitivity and specificity compared to studies differentiating children with ADHD from neurotypically developing children, but the identified samples were either small or they augmented EEG predictions with demographic variables. Some studies combined EEG data with demographics; the achieved sensitivity was reported as 100 percent (corresponding specificity 100%) in one study.143 We downgraded the strength of evidence for imprecision (large variation in performance across studies). In addition, we downgraded for study limitations as diagnostic approaches were often not well described. For some outcome measures, no study was identified that assessed it and determining the effects associated with the test was not possible.
4.3.8. Imaging
We identified 19 studies using neuroimaging.28, 191, 282, 319, 400, 464, 467, 495, 518, 524, 549, 571, 580, 581, 591, 630, 631, 633 Studies were predominantly published in the U.S. and China. A publicly available dataset (ADHD-200) produced numerous analyses.191, 282, 495, 581 The populations studied were predominately males between the ages of six and 17, with one study including only male participants.630 In several studies, participants were required to demonstrate an IQ of 80 or higher to be included in the sample.495, 549, 571, 630, 631 A quarter of the studies required participants not be taking stimulant medication or to stop medication several days before testing.571, 630, 633 A third of the studies included only right-handed participants.400, 495, 571, 630 In studies that distinguished between ADHD presentations, most focused on the combined and inattentive presentations. A minority specified including individuals with the hyperactive/impulsive presentation.191, 282, 549, 633 Almost none of the studies reported race and ethnicity demographics.
While ADHD participants with co-occurring disorders were not excluded from most of the studies, no studies specifically assessed test performance in children with specific co-occurring disorders. One study differentiated children with ADHD from those with dyslexia.524 One evaluated the diagnostic performance of an algorithm differentiating ADHD from autism.282 All studies used unselected, general samples, rather than clinical samples referred for further diagnostic workup (where a large proportion of children will either be diagnosed with ADHD, conduct disorders, autism, or depression).
All but two imaging studies used MRI to diagnose ADHD. However, studies utilized MRI in different ways. Some studies used functional MRI, some structural MRI, some used combinations of structural and functional MRI, with or without magnetic resonance spectroscopy. Two studies used near-infrared spectroscopy but the applications and diagnostic models differed.211, 631 Most of the imaging studies used a large number of indicators and utilized machine learning algorithms to detect markers to optimize the classifications. The reporting of the variable selection process varied, and it was often not clearly reported which exact indicators were included in the model used to determine diagnostic accuracy. Some of the identified studies combined imaging parameters with demographic or other clinical data for the prediction model.191, 211, 282, 400, 467, 495, 631, 633
Reported diagnostic accuracy estimates varied widely. Table 9 shows the findings for the outcomes of interest, together with the number of studies and study identifiers. The table summarizes findings across all imaging studies, findings for MRI studies specifically, and findings for imaging studies that combined imaging parameters with other variables (e.g., demographics) for predictions.
Studies reported primarily on sensitivity, specificity, and accuracy. Across all neuroimaging studies, reported sensitivity varied widely. We downgraded the strength of evidence for imprecision (large variation in performance reported across studies). In addition, we downgraded for study limitations, as the individual diagnostic models were often not well described and the number and type of predictor variables feeding into the model were unclear. For some outcomes, no study was identified, and it was not possible to determine the effects associated with the diagnostic modality. Some studies combined neuroimaging data and demographics, though the relevance is unclear, since the only demographic characteristic that is likely associated with a diagnosis of ADHD is sex, with a higher prevalence in males.
4.3.9. Neuropsychological Tests
We identified 74 studies using neuropsychological tests, assessing executive function and/or encompassing a variety of cognitive assessments, including continuous performance tests, to diagnose ADHD.18, 21, 24, 112, 119, 135, 140, 141, 152, 153, 159, 162, 170, 177, 184, 185, 190, 198, 213, 237, 246, 253, 263, 267, 276, 284, 293, 298, 307, 315, 316, 323, 327, 346, 347, 351, 352, 379, 382, 393, 401, 402, 417, 421, 422, 436, 445, 446, 450, 462, 467, 469, 470, 475, 477, 482, 486, 493, 496, 500, 515, 537, 541, 543, 564, 576, 607, 614, 615, 625, 627, 632, 639, 647 Rating scales of executive function are described in the parent and teacher rating section in the beginning of the chapter.
The earliest study evaluating neuropsychological tests as diagnostic tools was published in 1999,496 and evaluations came from 18 different countries, primarily the United States. The populations studied were predominantly males between the ages of six and 18. Three studies included three- and four-year-old children.162, 315, 467 In several studies, participants were required to demonstrate an IQ of 70 or higher,24, 346, 352, 365, 467, 469, 500 with some studies requiring an IQ of 80 or higher21, 152, 253, 647 or 85 or higher.379, 446, 486 Two thirds of the studies required participants not to take stimulant medication or to stop medication several days before testing. For studies that distinguished between ADHD presentations, most of the participants were diagnosed with the combined or inattentive presentations. About a third of the studies mentioned race and ethnicity demographics, with seven studies where White participants made up half or more of the sample,21, 162, 170, 263, 462, 607 one study where all of the participants were Asian,393 one study where over 50 percent were Black/African American,462 and one study where 83 percent of the participants were Hispanic or Latino.467
ADHD participants with co-occurring disorders were not excluded from most of the studies. Some studies used clinical samples with participants who were referred for diagnostic work-up where all children presented with attention issues or other symptoms indicative of ADHD or a different clinical diagnosis.24, 153, 162, 263, 315 One study specifically looked at distinguishing between children with ADHD, developmental dyslexia, and those who had both disorders.446 The remaining studies used samples of neurotypically developing children as controls rather than clinical samples.
Studies described a wide range of test batteries, but over 50 studies used continuous performance testing (CPT) to diagnose children and adolescents. CPTs provide multiple behavioral outputs relevant to ADHD, including omission errors (reflecting inattention), commission errors (reflecting impulsivity), and reaction time standard deviation (reflecting moment-to-moment response variability). Studies varied in their use of traditional visual CPTs, such as the TOVA (Test of Variables of Attention), or more novel, multifaceted CPT approaches. These latter “hybrid” CPT paradigms included CPTs that combined auditory and visual attentional processing demands in the same task, those that monitored physical movements during task administration, and virtual reality CPTs built upon environments designed to emulate real-world distractibility in a classroom setting. The included studies used idiosyncratic combinations of individual cognitive measures to achieve the best performance. However, multiple studies reported on attention and impulsivity measures included in the continuous performance tests.
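As an illustration of the three CPT outputs named above, the following minimal sketch derives an omission rate, a commission rate, and the reaction time standard deviation from a trial log. The trial format, the function name `cpt_summary`, and the toy values are assumptions made for illustration only; they do not reproduce the scoring rules of the TOVA or of any test used in the included studies.

```python
from statistics import stdev

def cpt_summary(trials):
    """Summarize a CPT trial log (hypothetical format).

    Each trial is a tuple (is_target, responded, reaction_time_ms),
    where reaction_time_ms is None when no response was made.
    """
    targets = [t for t in trials if t[0]]
    nontargets = [t for t in trials if not t[0]]

    # Omission error: a target appeared but the participant did not respond.
    omissions = sum(1 for _, responded, _ in targets if not responded)
    # Commission error: the participant responded to a non-target.
    commissions = sum(1 for _, responded, _ in nontargets if responded)
    # Response variability: SD of reaction times on correct target responses.
    hit_rts = [rt for _, responded, rt in targets if responded and rt is not None]

    return {
        "omission_rate": omissions / len(targets) if targets else None,
        "commission_rate": commissions / len(nontargets) if nontargets else None,
        "rt_sd_ms": stdev(hit_rts) if len(hit_rts) >= 2 else None,
    }

# Toy, made-up trial log (not data from any study).
trials = [
    (True, True, 400.0),
    (True, True, 500.0),
    (True, False, None),   # omission error
    (True, True, 600.0),
    (False, False, None),
    (False, True, 350.0),  # commission error
    (False, False, None),
    (False, False, None),
]
summary = cpt_summary(trials)
print(summary)
```

Individual tests define and weight these outputs differently, which is one reason the composite measures reported across studies are difficult to compare.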
Studies reported a variety of statistical parameters to determine the accuracy of the diagnostic approach. Sensitivity, specificity, and accuracy were the most frequently reported diagnostic measures. Table 10 shows the findings for the outcomes of interest together with the number of studies and study identifiers for all key outcomes. Where we found more than one study reporting on the same test or test component, the table also summarizes the performance for those, specifically.
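For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and accuracy follow from the four cells of a 2x2 table that compares an index test against the reference standard. The function name `diagnostic_metrics` and the counts are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute basic diagnostic accuracy measures from 2x2 counts.

    tp: reference-standard ADHD cases the index test flagged
    fn: reference-standard ADHD cases the index test missed
    fp: non-ADHD participants the index test flagged
    tn: non-ADHD participants the index test cleared
    """
    sensitivity = tp / (tp + fn)           # proportion of true cases detected
    specificity = tn / (tn + fp)           # proportion of non-cases cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }

# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=40, fn=10, fp=5, tn=45)
print(metrics)
```

Note that accuracy depends on the ratio of cases to non-cases in the sample, so it is not directly comparable across studies with different sample compositions.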
Studies evaluating neuropsychological tests reported predominantly on sensitivity and specificity. Although selected studies reported perfect diagnostic performance for neuropsychological tests,152 those studies reported the diagnostic performance for composite measures (unique and study-specific combinations of individual cognitive measures), making it difficult to compare test performance across studies. The wide range in performance was narrower in studies restricting to children seven and above. Reliability measures were rarely reported in the identified studies. No study addressed the effects of misdiagnosis. Costs were reported in only one study. We downgraded the strength of evidence for imprecision (large variation in performance reported across studies). For some outcome measures, no study was identified, and it was not possible to determine the effects associated with the test.
4.4. KQ1a. What is the comparative diagnostic accuracy of approaches that can be used in the primary care practice setting or by specialists to diagnose ADHD among individuals younger than 7 years of age?
We identified only 12 studies that reported exclusively on children younger than seven years of age.162, 167, 189, 316, 331, 412, 416, 437, 467, 516, 519, 559 The earliest identified study was published in 2002,559 and data came from the United States, Portugal, Spain, The Netherlands, Germany, Taiwan, and New Zealand. The percentage of female participants ranged from none to 41 percent, where reported, and the proportion of Caucasian children ranged from 54 to 90 percent. We identified three studies that explicitly reported on diagnostic performance data collected in primary care.162, 445, 605 Several studies used clinic populations of children referred for diagnostic purposes, and children often presented with multiple co-occurring disorders.
Studies evaluated parent ratings, teacher ratings, combined ratings, activity, EEG, imaging, and neuropsychological tests. Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported outcomes. Sensitivity achieved in this age group reached up to 97 percent in a study evaluating the use of activity ratings,416 while a study evaluating a continuous performance test showed the lowest sensitivity (42%).189 Reported specificity was 91 percent in a study using parental ratings to diagnose ADHD,331 but EEG data achieved a specificity of only 38 percent.189 Few of these diagnostic studies reported reliability measures. The results across studies for the key outcomes are shown in the summary of findings table at the end of the chapter; all other measures (where reported) are shown in the evidence table in the appendix. We did not identify any study reporting on the adverse effects following a misdiagnosis (not being diagnosed or being incorrectly diagnosed) in this age group. In addition, none of the diagnostic studies mentioned costs of tests in this subsample.
The summary of findings table at the end of this chapter shows the diagnostic performance in this young age group in more detail. The table summarizes the limited available evidence across identified studies, together with the strength of evidence. Strength of evidence was either low due to the limited evidence, or insufficient due to the lack of studies in this age group reporting on the outcomes of interest.
4.5. KQ1b. What is the comparative diagnostic accuracy of EEG, imaging, or approaches assessing executive function that can be used in the primary care practice setting or by specialists to diagnose ADHD among individuals aged 7 through 17?
We identified 61 studies that reported exclusively on children aged seven and older. The earliest identified study was published in 1989. Data came from 23 different countries, most frequently the United States and China. Six studies were restricted to boys, but one study included 75 percent girls.446 The proportion of White children ranged from 44 percent464 to 100 percent.112 The proportion of Hispanic or Latino children ranged from one percent607 to 20 percent.400 The proportion of Black or African American children ranged from five percent359 to 34 percent.607 The proportion of Asian children ranged from one percent570 to 100 percent.641 The proportion of multiracial youth (where reported) ranged from eight percent400 to 20 percent.464
Studies evaluated parent ratings, teacher ratings, combined ratings, teen/child self-report, continuous performance, executive functioning, activity, EEG, MRI, and neuropsychological tests. Studies reported a variety of outcomes, with sensitivity and specificity being the most frequently reported outcomes. Few of these diagnostic studies reported reliability measures. We did not identify any study reporting on the adverse effects following a misdiagnosis (not being diagnosed or being incorrectly diagnosed) in this age group. In addition, none of the diagnostic studies mentioned costs of tests in this subsample. The results across studies for the key outcomes and interventions are shown in the summary of findings table at the end of the chapter; all other measures (where reported) and results for other interventions evaluated in this age group are shown in Appendix C, Table C.1.
4.5.1. Diagnostic Accuracy of EEG in Youth Aged 7 Through 17
We identified 16 studies that used EEG to diagnose youth.111, 120, 172, 245, 312, 351, 370, 394, 397, 408, 438, 449, 465, 494, 546, 641 The first study meeting eligibility criteria was published in 2003.111, 120, 172, 245, 312, 351, 370, 394, 397, 408, 438, 449, 465, 494, 546, 641 Study locations included 11 different countries, with several studies being conducted in China351, 394, 408, 641 and Iran.245, 438, 494 The proportion of included girls ranged from none111, 449 to 56 percent.394 Race and ethnicity was rarely reported; one study included 100% Asian youth.351 The ADHD presentation was often not reported, but where it was, two studies reported that two thirds of children had the combined presentation312, 465 and one study was restricted to inattentive ADHD.351 Studies usually did not exclude children with comorbidities, but only one study specifically assessed the effect of ODD (oppositional defiant disorder) comorbidity on diagnostic accuracy.370
Reported sensitivity, specificity, accuracy, and AUC values ranged widely across studies, as documented in the summary of findings table. Studies varied in how much detail they provided on the parameters that contributed to the diagnostic performance, which, in combination with the wide range of reported diagnostic performance, resulted in a low strength of evidence rating for these outcomes of interest.
Studies did not report on rater agreement between EEG readers, internal consistency of measurements, or test-retest reliability. Identified studies also did not describe the impact of misdiagnosis and they did not mention costs. Hence, the evidence was determined to be insufficient for these outcomes of interest.
4.5.2. Diagnostic Accuracy of Imaging in Youth Aged 7 Through 17
We identified eight studies that used imaging for diagnosis in this age group; all evaluated the use of MRI.191, 282, 400, 464, 495, 518, 571, 581 The first studies meeting eligibility criteria published data in 2018.191, 571 Study locations were the United States and China. The proportion of included girls ranged from 14 percent571 to 45 percent.282 Race and ethnicity was rarely reported, but in the two U.S. studies that provided a participant breakdown, the proportion of White children was 44 and 55 percent, Hispanic 19 and 20 percent, Black six and 14 percent, and Asian two and six percent.400, 464 Several studies stated that youth with all ADHD presentations were included. Studies typically did not exclude youth with other comorbidities, but only one study assessed the effect of autism on the diagnostic accuracy.518
The reported sensitivity, specificity, accuracy, and AUC values varied widely across studies. Given the wide range of reported diagnostic accuracy measures in this age group, strength of evidence was judged to be low regarding successfully diagnosing ADHD with imaging data. Rater agreement for human imaging readers, internal consistency, test-retest reliability, impact of misdiagnosis, and costs were not described. The strength of evidence was insufficient for evidence statements for these outcomes of interest.
4.5.3. Diagnostic Accuracy of Executive Function in Youth Aged 7 Through 17
While a number of studies evaluated neuropsychological tests in this age group, not all emphasized utilizing executive function characteristics for the diagnosis of ADHD. We identified 14 studies with an emphasis on executive function assessment.119, 153, 159, 213, 284, 351, 352, 379, 446, 465, 541, 607, 614, 625 The earliest study was published in 1989.159 Evaluations were conducted in six countries, with the United States being the most frequent country.159, 213, 607, 625 The reported proportion of girls ranged from none352, 614 to 74 percent446 across studies. Race and ethnicity was rarely reported, but several identified studies included only or predominantly White youth.112, 213, 607, 625 Several studies were restricted to or predominantly included youth with combined ADHD presentation.119, 253, 352, 625 Studies typically did not exclude youth with comorbidities, but none of the studies assessed the effect of a specific comorbidity on the diagnostic performance of the executive function test.
Sensitivity, specificity, accuracy, and AUC values ranged widely within and across the identified studies as documented in the summary of findings table. None of the identified studies assessed the performance of the same diagnostic test, and most of the studies described unique combinations of test components that were used to diagnose ADHD. All identified studies are documented in detail in the appendix. We determined the strength of evidence to be low for diagnostic outcomes of interest.
Studies did not report on rater agreement or internal consistency of the test components, but one study reported on temporal stability. The study reported correlations between tests on two occasions of 0.81 (p<0.05) for the total test score in a Tower of London-Drexel task (assessing total move and rule violation scores), 0.79 (p<0.05) for total time violations, and 0.42 (p<0.005) for total rule violations.213 Studies did not report on the impact associated with a misdiagnosis or costs of the tests. Given the lack of studies or our inability to judge consistency reported in results across studies, we determined the strength of evidence to be insufficient.
4.6. KQ1c. For both populations, how does the comparative diagnostic accuracy of these approaches vary by clinical setting, including primary care or specialty clinic, or patient subgroup, including age, sex, or other risk factors associated with ADHD?
We did not identify studies comparing the accuracy in different settings in direct, head-to-head comparisons. Hence, we had to address this KQ in indirect analyses across studies. Our analyses were further limited by studies providing insufficient details on the accuracy of performance (e.g., not reporting clearly on the false positives and false negatives) and could not be based on a meta-analytic model. Instead, we used the summary performance measures as reported by the study authors to explore potential effect modifiers. The most commonly reported diagnostic performance measures were sensitivity and specificity, and most analyses were only possible for these outcomes.
Figure 9 plots reported sensitivity by setting.
The figure plots the sensitivity in different settings that are included in the dataset. It also shows the range within and across settings. Comparing the reported sensitivities, a simple regression analysis indicated that setting is associated with reported sensitivity (p=0.03). However, the result should be interpreted with caution, as it does not take study size or quality into account, and it was not established within a meta-analytic model. The corresponding reported specificities are shown in Figure 10.
Reported specificity values ranged considerably, within as well as across settings. Comparing the reported specificities, a simple regression analysis did not indicate that setting is systematically associated with reported specificity (p=0.70). However, the result should be interpreted with caution, as it does not take study size or quality into account, and it was not established within a meta-analytic model. The equivalent analysis for reported accuracy (p=0.006) indicated that the reported estimate is statistically significantly associated with setting. The analysis for AUC was not significant (p=0.28).
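To make the caveat about study size concrete, the sketch below shows what an unweighted comparison of reported estimates across settings can look like. It assumes that setting enters the regression as a categorical predictor, in which case the overall test is equivalent to a one-way F-test; the report does not specify the exact model, so this is an assumption. The setting labels and values are made up, and the function name `one_way_f` is hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """Unweighted one-way F statistic for study-level estimates.

    groups maps a setting label to the list of estimates reported by
    the studies conducted in that setting. Every study counts equally,
    regardless of its sample size or quality.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(values)

    # Variation of the setting means around the grand mean.
    ss_between = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    # Variation of the individual studies around their own setting mean.
    ss_within = sum(
        (v - mean(vals)) ** 2 for vals in groups.values() for v in vals
    )

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical reported sensitivities (percent) grouped by setting.
reported = {
    "setting_a": [60, 70],
    "setting_b": [80, 90],
    "setting_c": [70, 80],
}
f_stat, df_between, df_within = one_way_f(reported)
print(f_stat, df_between, df_within)
```

A p-value would then be read from the F distribution with these degrees of freedom. Because a study of 30 participants carries the same weight here as a study of 3,000, the result is exploratory, which is the reason for the caution expressed above.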
We also evaluated whether the studies in clinical samples (i.e., referred for a clinical diagnosis, oppositional defiant disorder, or autism) and those with primarily neurotypically developing children reported different diagnostic performance values. Figure 11 plots the sensitivity results for the two participant populations.
Across studies, analyses detected a statistically significant difference in reported sensitivity results depending on whether a study reported on a clinical sample or children were compared to neurotypically developing children (p=0.04). On average, the sensitivity was lower in clinical samples compared to studies differentiating youth with ADHD from neurotypically developing youth (mean 75, SD 18 vs mean 81, SD 15). However, the analysis should be interpreted with caution, as it does not use a meta-analytic model and relies on sensitivity values as reported by the original authors.
Figure 12 plots the specificity stratified by population.
The analysis indicated that the reported specificity was associated with the population that was used to establish diagnostic accuracy (p<0.001). On average, clinical samples reported lower specificities than studies in neurotypical samples (mean 68, SD 24 vs mean 83, SD 14). The result suggests that the clinical population appears to be a source of heterogeneity seen in the studies. However, the result should be interpreted with caution as the data were not analyzed in a meta-analytical model and used the diagnostic performance data as reported by the original authors.
Figure 13 plots the AUC values reported in included studies stratified by clinical versus neurotypical samples.
The analyses also detected a statistically significant difference in the reported accuracy based on the population included in the evaluation sample (p<0.001). On average, the reported accuracy was lower in clinical samples than in studies that differentiated youth with ADHD from neurotypically developing youth (mean 0.76, SD 0.13 versus mean 0.88, SD 0.09). However, the analysis should be interpreted with caution as it is not based on a meta-analytic model, and the number of included datapoints is smaller than for sensitivity and specificity. There were insufficient data available for analyses of other outcomes.
We further aimed to investigate whether the age of the participants is associated with the achieved diagnostic performance. Most studies included a range of ages, but studies differed in whether they included young children. Figure 14 plots sensitivity by minimum age in the sample.
Across studies, we did not detect a statistically significant linear association between the minimum age in the sample and reported sensitivity (p=0.54). However, the number of studies that included younger children was low, which limited the statistical power to detect differences. In addition, this is an indirect comparison across studies that does not take study size into account and hence should be interpreted with caution. The equivalent figure for the specificity is shown in Figure 15.
Across studies, we did not detect a statistically significant linear association between the minimum age in the sample and reported specificity (p=0.37). However, this is an indirect analysis across studies that is not based on a meta-analytic model and should therefore be interpreted with caution. We also categorized studies as including younger versus older children. Using a dichotomous indicator differentiating between young (under 7) and older children (7 and over) also did not indicate a systematic effect for sensitivity (p=0.98), specificity (p=0.35), accuracy (p=0.09), or AUC (p=0.28).
We also analyzed the gender distribution in the identified studies, as the reported accuracy of a diagnosis may be associated with the gender of the participants. Figure 16 plots the percent female participants, the sensitivity, and specificity.
Across samples, the proportion of girls was not associated with reported sensitivity (p=0.63) or specificity (p=0.80). The analysis for reported accuracy also did not detect an effect (p=0.34), nor did an analysis of the reported AUCs (p=0.90), and there were insufficient data for further analyses. However, the number of female participants was small across studies, which lowers the statistical power to detect an effect.
There were insufficient numbers of studies to evaluate any other risk factors or participant variables on the diagnostic outcomes of interest.
4.7. KQ1d. What are the adverse effects associated with being labeled correctly or incorrectly as having ADHD?
Identified studies did not address consequences for patients who correctly or incorrectly received a diagnosis of ADHD, or adverse effects associated with being labeled correctly or incorrectly as having ADHD. One study highlighted that a missed diagnosis has implications for accessing funding in the Australian healthcare system (e.g., National Disability Insurance Scheme) but provided no further empirical data.447 None of the included studies reported on stigma associated with being diagnosed or labeled with ADHD.
4.8. Summary of Findings. KQ1a–d
Table 11 provides a very broad overview of the identified research. Results of the individual studies are shown in the evidence table in Appendix C, Table C.1.
As documented in the summary of findings table, tests to diagnose ADHD were very diverse, and studies reported a large range of diagnostic and psychometric performance. Strength of evidence assessments for this group were low or insufficient for all outcomes. We downgraded results for study limitations (lack of details on the selected tests, on the machine learning algorithm used to select variables, and on the exact variables included in the final model contributing to the effect estimate), imprecision (large variation in reported diagnostic performance across studies), and/or lack of replication in more than one study assessing the same test (i.e., consistency could not be assessed). Few studies were available to diagnose ADHD in young children. More studies were available for the older children; however, studies did not report on all outcomes of interest. We downgraded the strength of evidence for study limitations where the evidence base consisted primarily of studies that provided insufficient detail on the diagnostic strategy (e.g., which cutoffs were used and which variables exactly entered the models). We downgraded for imprecision where studies reported a large range of possible diagnostic performance. The strength of evidence for other outcomes was downgraded for the domain inconsistency because consistency could not be assessed, as no replication of the documented effect has been identified.
Effect modifier analyses were hindered by the lack of reported detail needed to assess effects in meta-regressions. Indirect analyses using simple regression indicated that the diagnostic setting may influence diagnostic accuracy estimates. Further analyses indicated that study population characteristics (e.g., whether the comparison was to neurotypically developing children or was made in clinical samples) may also affect estimates. Given that both aspects may be associated with key outcomes for this review and are related (e.g., clinical samples are seen in specialty care), we stratified the test-specific result presentation by neurotypical or clinical sample.
We did not identify studies reporting on the impact of correctly or incorrectly labeling youth as having ADHD or the impact of an incorrect diagnosis, and the strength of evidence is insufficient to make any evidence statements.
Publication Details
Publisher
Agency for Healthcare Research and Quality (US), Rockville (MD)
NLM Citation
Peterson BS, Trampush J, Maglione M, et al. ADHD Diagnosis and Treatment in Children and Adolescents [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2024 Mar. (Comparative Effectiveness Review, No. 267.) 4, Results: Diagnosis of ADHD.