NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Ross SD, Allen IE, Harrison KJ, et al. Systematic Review of the Literature Regarding the Diagnosis of Sleep Apnea. Rockville (MD): Agency for Health Care Policy and Research (US); 1999 Feb. (Evidence Reports/Technology Assessments, No. 1.)
This publication is provided for historical reference only and the information may be out of date.
In general, MetaWorks investigators used systematic review methods derived from the evolving science of review research (Cook, Mulrow, and Haynes, 1997; Mulrow and Oxman, 1997; Mulrow, Cook, and Davidoff, 1997; Sacks, Berrier, Reitman, et al., 1987). These methods were generally applied according to standard operating procedures at MetaWorks and are given in Flow Diagram 1.
Specific objectives were to establish the evidence base relevant to answering the following key questions in the diagnosis of SA: 1) what diagnostic and screening tests are presently available? 2) what is the strength of the evidence in support of each? 3) what is the predictive value of these tests in different populations? (which requires estimating the prevalence of SA in different populations) 4) what are the implications of certain PSG results in terms of serious clinical events occurring as comorbidities in association with a diagnosis of SA?
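Question 3 ties predictive value to prevalence. The link can be illustrated with Bayes' theorem in a minimal sketch; the sensitivity, specificity, and prevalence figures below are hypothetical and are not results from this review.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem.

    All three inputs are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test (90% sensitive, 80% specific) in two populations.
for prevalence in (0.05, 0.50):
    ppv, npv = predictive_values(0.90, 0.80, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

The same test yields a much lower positive predictive value in the low-prevalence population, which is why prevalence estimates are needed to answer the question.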
A general "causal pathway" depicting the sequence of test categories clinicians consider when evaluating patients with suspected SA is provided in Flow Diagram 2. The intrusiveness and difficulty of the possible tests generally increase as one moves through the diagnostic test pathway. The objective in this systematic review was to assess the evidence in support of test choices at each level, to see if there is sufficient evidence to use each test, or to stop testing and move straight to a PSG, or to stop and move directly to a definite positive or negative SA diagnosis before reaching the standard PSG.
MetaWorks investigators did not intend to review technical considerations of various tests and devices, which are beyond the scope of this project. Readers are referred to the American Sleep Disorders Association (ASDA) 1994 statement on portable devices for discussion of technical issues related to data acquisition, storage, retrieval, and analysis.
The review followed a prospective protocol that was developed a priori and shared with the nominating partners on the project (BC/BS of Massachusetts and the Sleep Disorders Centre of Metropolitan Toronto); a technical experts panel (TEP) (with representation from consumer groups and professional specialties: neurology, pulmonology, dentistry, otolaryngology, epidemiology, and nursing); and the Task Order Officers at the AHCPR. The protocol outlined the methods to be used for the literature search, study eligibility criteria, data elements for extraction, and methodological strategies to minimize bias and maximize precision during the process of data collection, extraction, and synthesis.
Literature Search
The published literature was searched from 1980 to present. The search cutoff date was November 1, 1997, and the retrieval cutoff date was January 30, 1998. The search started with a broad Medline search using the terms "sleep apnea syndrome" and "monitoring, physiologic," "sleep apnea syndrome" and "airway resistance," and "human." Also, MetaWorks investigators searched "sleep apnea syndromes," "sleep apnea syndrome" and "index." In addition, the 1997 Current Contents® CD-ROM was searched ("sleep apnea") to the same cutoff date. All citations and abstracts were printed and screened at MetaWorks for any mention of diagnostic tests in adults with SA (Level I screening). Diagnostic studies that reported prevalence and clinical comorbidities were also accepted. Abstracts were rejected at Level I screening for the following reasons: 1) treatment papers; 2) peripheral topics; 3) reviews; 4) case studies; 5) special populations of patients (e.g., patients with neuromuscular diseases or cerebral malformations, congenital or acquired structural abnormalities of the head or neck). All studies passing Level I screening were retrieved for second screening (Level II), which applied the following eligibility criteria:
Inclusion Criteria
- Any diagnostic test or intervention to establish or support a diagnosis of SA.
- Inclusion of adult patients with any form of SA (obstructive, central, mixed, or not specified).
- At least 10 patients as total sample size.
- Studies reported in the following Western European languages: English, German, French, Spanish, or Italian.
Exclusion Criteria
- Reviews and meta-analyses, letters, case reports.
- Studies in children.
- Studies where diagnostic test results for patients with other potentially confounding diseases cannot be separated from SA patients' results (outcomes not extractable).
- Studies in languages besides those listed above.
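The Level II criteria above can be read as a simple filter over study records. The sketch below is only illustrative: the record fields (`evaluates_diagnostic_test`, `adult_patients`, and so on) are hypothetical names chosen for this example, and the screening itself was performed manually by the investigators.

```python
ALLOWED_LANGUAGES = {"English", "German", "French", "Spanish", "Italian"}
EXCLUDED_TYPES = {"review", "meta-analysis", "letter", "case report"}


def is_eligible(study):
    """Apply the Level II inclusion and exclusion criteria to one record.

    `study` is a plain dict; every key used here is a hypothetical field name.
    """
    return bool(
        study["evaluates_diagnostic_test"]        # any test supporting an SA diagnosis
        and study["adult_patients"]               # adults only; studies in children excluded
        and study["sample_size"] >= 10            # at least 10 patients in total
        and study["language"] in ALLOWED_LANGUAGES
        and study["publication_type"] not in EXCLUDED_TYPES
        and study["sa_results_extractable"]       # SA results separable from other diseases
    )


candidate = {
    "evaluates_diagnostic_test": True,
    "adult_patients": True,
    "sample_size": 25,
    "language": "English",
    "publication_type": "original study",
    "sa_results_extractable": True,
}
print(is_eligible(candidate))
```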
The electronic searches noted above were supplemented by a thorough search of the reference lists of all eligible studies and relevant review articles. Relevant Internet sites posted by medical specialty societies and patient advocacy groups were consulted to identify any additional pertinent information about current recommendations or guidelines for assessment of disease status in patients suspected of having SA. These sites included Quietsleep; National Heart, Lung, and Blood Institute; American Sleep Apnea Association; American Sleep Disorders Association; Sleep Pages of the Brain Information Service on the Internet; The School of Sleep Medicine, Inc.; Neurology Forums at the Massachusetts General Hospital; Sleep Apnea Society of Alberta; Phantom Sleep Page; A.W.A.K.E. of New York, Illinois, and Pennsylvania; The Sleep Well at Stanford University; American Academy of Neurology (AAN); Sleep Disorders Centre of Metropolitan Toronto; and the National Center on Sleep Disorders Research. The list of eligible studies was also subsequently shared with the project TEP for review and comment.
Rating the Evidence
All eligible diagnostic studies were rated by senior investigators (2 MDs, 1 PhD) in an attempt to assess internal and external validity of each study as a diagnostic test study prior to data extraction. A customized rating instrument was used, derived from 1) the assessment guide provided by Irwig, Tosteson, Gatsonis et al., 1994 for assessing validity of studies of diagnostic tests in general; and 2) features important to SA studies in particular, as suggested by Flemons and Remmers, 1996. In general, studies which used random order assignment of tests and PSG, with full PSG results as the gold standard against which a second test was evaluated, and blinding of the readers of each test to the results of the other tests, received the highest scores. Several other features of diagnostic study design, execution, and reporting were also rated.
Possible scores ranged from 0 to 44, with higher scores suggesting higher quality of diagnostic test evidence. Papers scoring less than 16 points (i.e., falling in the lowest 20 percent of the distribution of actual scores) were dropped from further consideration for data extraction and analysis. The evidence scoring included the following features (with points assigned):
- Reference standard (PSG) included? (10 points)
- Study test readers blinded to clinical status? (5 points)
- Study test readers blinded to PSG results? (5 points)
- Design: randomized assignment of tests? (10 points)
Other items scored (1 point each):
- Patients both with and without disease?
- Inclusion criteria reported?
- Patient selection process described?
- Statement of where patients were recruited from?
- Wide spectrum of patients' SA severity?
- Patient characteristics described?
- Patients eligible but not enrolled, described?
- Test description OK?
- Test performed appropriately?
- Results of study test do not determine who gets PSG?
- Outcomes clearly defined?
- A priori estimate of sample size?
- Intention-to-treat analysis?
- Results reported in sufficient detail to replicate?
Refer to Appendix A for a display of the Evidence Scoring Form. Further details regarding the development and testing of this rating instrument will be published in a separate manuscript, in progress.
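The point scheme described above can be tallied as in the following sketch. The item names are hypothetical labels for the features listed in the text; the actual instrument is the Evidence Scoring Form in Appendix A, which is not reproduced here.

```python
# Heavily weighted items and their points, as stated in the text.
MAJOR_ITEMS = {
    "reference_standard_psg": 10,
    "readers_blinded_to_clinical_status": 5,
    "readers_blinded_to_psg_results": 5,
    "randomized_assignment_of_tests": 10,
}

# Remaining items, worth 1 point each (labels are illustrative).
MINOR_ITEMS = [
    "patients_with_and_without_disease",
    "inclusion_criteria_reported",
    "patient_selection_described",
    "recruitment_source_stated",
    "wide_spectrum_of_severity",
    "patient_characteristics_described",
    "eligible_not_enrolled_described",
    "test_description_ok",
    "test_performed_appropriately",
    "test_result_does_not_determine_psg",
    "outcomes_clearly_defined",
    "a_priori_sample_size",
    "intention_to_treat_analysis",
    "results_replicable",
]

MAX_SCORE = sum(MAJOR_ITEMS.values()) + len(MINOR_ITEMS)  # 30 + 14 = 44
CUTOFF = 16  # papers scoring less than 16 were dropped


def evidence_score(items_met):
    """Sum the points for the set of item names a study satisfies."""
    score = sum(pts for name, pts in MAJOR_ITEMS.items() if name in items_met)
    score += sum(1 for name in MINOR_ITEMS if name in items_met)
    return score


def is_retained(score):
    """A study is kept for extraction when its score is at least the cutoff."""
    return score >= CUTOFF
```

Under this scheme a study that used PSG as the reference standard and blinded readers to PSG results, but met no other item, would score 15 and fall just below the cutoff.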
Data Extraction and Database Development
Each study was extracted in duplicate by investigators using data extraction forms developed and tested for this project (see Appendixes B and C). For the diagnostic studies, one extractor used a blinded copy of each study report (masked as to source of financial support, authors, and journal). The data extraction forms, completed independently by the two investigators, were then compared, and differences were resolved by consensus, referring to the information in the original report as necessary. Any differences that could not be resolved by the two reviewers who extracted the data were resolved by a third reviewer.
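The comparison step in duplicate extraction amounts to listing the fields on which the two completed forms disagree, so that they can be resolved by consensus. A minimal sketch follows; the field names and values are hypothetical, and the project compared paper forms rather than running code.

```python
def find_discrepancies(form_a, form_b):
    """Return {field: (value_a, value_b)} for every field where two forms differ.

    A field present on only one form is reported with None for the other.
    """
    fields = sorted(set(form_a) | set(form_b))
    return {
        field: (form_a.get(field), form_b.get(field))
        for field in fields
        if form_a.get(field) != form_b.get(field)
    }


# Hypothetical extractions of the same study by two investigators.
extractor_1 = {"n_enrolled": 48, "study_design": "cohort", "psg_type": "full"}
extractor_2 = {"n_enrolled": 46, "study_design": "cohort", "psg_type": "full"}
print(find_discrepancies(extractor_1, extractor_2))
```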
Key data elements sought for extraction from each diagnostic study included the following:
Study Level Characteristics
- Publication date and first author.
- Study design.
- Total number of patients enrolled (and in each study arm).
- Geographic location.
- Language of report.
- Funding source (industry vs. not).
Patient Characteristics
- At entry: confirmed SA, suspected SA or sleep disorder, normal, or other (e.g., snorers).
- Gender.
- Age.
- Actual weight or percent ideal body weight (percent IBW), or body mass index (BMI).
- Severity of SA (and evidence and thresholds for same).
- History of MI, hypertension, heart failure, stroke, chronic obstructive pulmonary disease (COPD), smoking, diabetes, alcoholism, obesity.
- Asymptomatic versus presence of key symptoms: daytime sleepiness, involuntary falling asleep, nocturnal snoring, observed apneas.
Test Characteristics and Results
Only clearly reported aggregate results were extracted from studies. Results that were only given for individual patients and results that would require extrapolations from graphs or derivations from figures or tables were not captured.
- PSG type: full monitoring (indicates all monitoring channels used) vs. partial monitoring (list each component of the test), full night vs. partial night vs. daytime.
- PSG results: apnea index (AI) or hypopnea index (number of apneic or hypopneic episodes per hour of sleep), or apnea-hypopnea index (AHI). In most cases AHI refers to the total apneas plus hypopneas during total time asleep, divided by the number of hours asleep (the respiratory disturbance index [RDI] is the same as the AHI).
- Portable devices: test metric, thresholds for diagnosis, results, site (home vs. laboratory), and conditions (full night vs. partial night vs. daytime).
- Methods of all sleep test analyses (computer vs. manual, sleep time vs. time in bed, or test time, and definition of apnea and hypopnea episodes).
- Non-sleep tests: clinical, radiologic, laboratory, questionnaires, etc., with test metric and thresholds for diagnosis or next action, and results.
- All tests: sensitivity, specificity, positive predictive value, negative predictive value, and correlation coefficients of each test relative to PSG results.
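The PSG indices named in the list above are event rates per hour of sleep. The sketch below computes them from event counts and total sleep time, assuming the definition given in the text; the counts are hypothetical, and individual studies may have defined events or the denominator differently.

```python
def events_per_hour(apneas, hypopneas, total_sleep_minutes):
    """Return apnea index, hypopnea index, and AHI as events per hour of sleep."""
    if total_sleep_minutes <= 0:
        raise ValueError("total_sleep_minutes must be positive")
    hours_asleep = total_sleep_minutes / 60
    return {
        "AI": apneas / hours_asleep,
        "HI": hypopneas / hours_asleep,
        "AHI": (apneas + hypopneas) / hours_asleep,
    }


# Hypothetical night: 60 apneas and 30 hypopneas over 6 hours of sleep.
print(events_per_hour(60, 30, 360))
```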
Data were entered from the data extraction forms into Excel spreadsheets. Prior to downloading to SAS for analysis, the entire computerized dataset was 100 percent quality-checked against the consensus version of the data extraction forms. Prevalence and comorbidity information was extracted onto separate forms.
Statistical Methods
The main objective of the analysis was to evaluate the diagnostic accuracy of alternatives to full PSG for the diagnosis of SA as compared to a full PSG. For the analyses, PSG was used as the gold standard. In order to be considered for the statistical analysis, studies had to report outcomes in terms of the sensitivity and specificity (or a function of these outcomes; i.e., likelihood ratios) of the new test as compared to the results (AI, AHI, RDI) of a standard PSG. The PSG was either stated to be "full" or "standard" by the authors, or included at least the following parameters: oximetry, thoracoabdominal respiratory excursions, airflow, submental EMG, EEG, and EOG. The full PSG often also included ECG and occasionally included tibial EMG, body position, and snoring. If the sensitivity and specificity were not reported, sufficient information on the performance of the test regarding the true positive and true negative outcomes had to be reported in order to calculate sensitivity and specificity, or, in some cases, a correlation coefficient between the alternative test and the diagnosis of obstructive SA by full PSG.
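When a study reported only the counts of true and false positives and negatives against PSG, sensitivity and specificity (and the likelihood ratios derived from them) follow from the standard 2x2 table definitions. The sketch below shows that calculation; the counts are hypothetical.

```python
def accuracy_from_counts(tp, fp, fn, tn):
    """Compute accuracy measures from a 2x2 table with PSG as the reference.

    tp/fn: PSG-positive patients with a positive/negative alternative test.
    fp/tn: PSG-negative patients with a positive/negative alternative test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Likelihood ratios are unbounded when the denominator is zero.
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": lr_positive,
        "LR-": lr_negative,
    }


# Hypothetical study: 50 PSG-positive and 50 PSG-negative patients.
print(accuracy_from_counts(tp=45, fp=10, fn=5, tn=40))
```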
Initially, weighted averages using Mantel-Haenszel fixed effects models (Fleiss, 1973) combining the comparative summary statistics, were calculated and summarized for groups based on diagnostic test category (Irwig, Tosteson, Gatsonis, et al., 1994). Study and patient-level covariates were also summarized for each diagnostic category weighted by study size. Diagnostic evidence scores of the studies were examined and summarized by diagnostic category and overall.
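The report does not spell out which summary statistic was pooled. As one illustration of a Mantel-Haenszel fixed effects estimate, the sketch below pools the diagnostic odds ratio across 2x2 tables; applying it to the odds ratio is an assumption made for this example, and the tables are hypothetical.

```python
def mantel_haenszel_odds_ratio(tables):
    """Pooled odds ratio across studies using the Mantel-Haenszel estimator.

    `tables` is an iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Each study contributes tp*tn/n to the numerator and fp*fn/n to the
    denominator, where n is that study's total sample size.
    """
    numerator = 0.0
    denominator = 0.0
    for tp, fp, fn, tn in tables:
        n = tp + fp + fn + tn
        numerator += tp * tn / n
        denominator += fp * fn / n
    return numerator / denominator


# Two hypothetical studies of the same alternative test against PSG.
studies = [(45, 10, 5, 40), (20, 5, 5, 20)]
print(mantel_haenszel_odds_ratio(studies))
```

With a single study the estimator reduces to the ordinary odds ratio, tp*tn / (fp*fn); larger studies carry more weight in the pooled value.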
A summary ROC curve was calculated for each diagnostic group where sufficient data were available (Littenberg and Moses, 1993; Littenberg, Mushlin, and the Diagnostic Technology Assessment Consortium, 1992; Moses and Shapiro, 1993). The resulting curve describes how the test's performance in those with SA (sensitivity or true positive rate [TPR]) varies with its performance in those without SA (false positive rate [FPR] or 1 - specificity).
The summary plot represents each study as a single point weighted by study size, and the curve represents the overall summary of all studies. Where the studies give similar results, the curve and 95 percent confidence bound will be close to the points. Differences among the reported accuracies may be due to several factors. A stricter threshold or cutoff for declaring a test positive in some studies may lower sensitivity while raising specificity. There may be random variations in the performance of the test between study sites or between publications, resulting in heterogeneity. There may be differences in the clinical settings in which the test is employed, or wide variability in the patient characteristics of those tested. While all those differences may lead to heterogeneity among the eligible reports, which may be an argument against estimating one summary measure of common sensitivity and specificity using fixed or random effects models, these factors can be described using the summary ROCs, which both display and summarize the heterogeneity. The impact of covariates, which contribute to heterogeneity, is assessed in the sensitivity analysis.
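The summary ROC approach cited above (Littenberg and Moses, 1993) is commonly implemented by regressing D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR) across studies and back-transforming the fitted line into a curve. The sketch below uses an unweighted least-squares fit for simplicity; the report describes points weighted by study size, so this is an approximation of the general technique rather than the exact analysis run in SAS. It assumes every TPR and FPR lies strictly between 0 and 1 and that the fitted slope b is not equal to 1.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def fit_sroc(points):
    """Fit D = a + b*S by ordinary least squares.

    `points` is a list of (tpr, fpr) pairs, one per study.
    Returns the intercept a and slope b.
    """
    d = [logit(tpr) - logit(fpr) for tpr, fpr in points]
    s = [logit(tpr) + logit(fpr) for tpr, fpr in points]
    n = len(points)
    mean_s = sum(s) / n
    mean_d = sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s)
    sxy = sum((x - mean_s) * (y - mean_d) for x, y in zip(s, d))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_tpr(fpr, a, b):
    """Expected TPR on the summary curve at a given FPR.

    Solving D = a + b*S for logit(TPR) gives
    logit(TPR) = a/(1-b) + ((1+b)/(1-b)) * logit(FPR).
    """
    z = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-z))


# Hypothetical (TPR, FPR) pairs from four studies.
example_points = [(0.95, 0.40), (0.90, 0.25), (0.85, 0.15), (0.75, 0.08)]
a, b = fit_sroc(example_points)
curve = [(fpr, sroc_tpr(fpr, a, b)) for fpr in (0.05, 0.10, 0.20, 0.40)]
```

Because the curve is built from the regression line rather than from a single pooled point, it can accommodate studies that used different positivity thresholds.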
All calculations were performed using SAS® software Version 6.12.
Peer Review
A group of 22 peer reviewers was assembled to review the draft final report describing this project. The peer reviewers were drawn from consumer groups and professional organizations (American Sleep Disorders Association, American Sleep Apnea Association, American Academy of Neurology), the nominating partners noted above, and the AHCPR. The reviewers represented several medical specialties (anesthesiology, dentistry, neurology, nursing, otolaryngology, pulmonology) and statistical methodologists. All reviewers were asked to complete a list of questions about the format and content of the report (see Appendix D) and also to provide any text comments. All reviewer comments were shared with AHCPR. The peer reviewers' comments were reviewed; and wherever feasible and within the scope of this project, the peer reviewers' suggestions were incorporated into the final report. Comments were ultimately received from 19 of the 22 reviewers who were invited to comment.