6 Limitations and Strengths of the Evidence Base

This evidence base has many limitations. In general, the diversity of designs and study objectives was high, and the methodological rigor of the included diagnostic studies was low. As a result, contrary to our usual practice of using evidence scores only in sensitivity analyses, MetaWorks investigators decided to use the diagnostic evidence rating instrument as a filter for selecting consistent studies for data extraction, and rejected studies scoring in the bottom 20 percent of the distribution of scores. Even so, the studies that remained constitute Level III to IV evidence (Cook, Guyatt, Laupacis, et al., 1992), that is, evidence derived primarily from case series and observational studies. Very few diagnostic studies employed randomized assignment of tests, and very few performed blinded assessments of test results; both are key features of rigorous diagnostic studies.
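The score-based filter described above can be illustrated with a minimal sketch. The report does not state how the cutoff was computed or how tied scores were handled, so the rule below (drop the lowest-ranked 20 percent, keep any study tied with the cutoff score) is an assumption, and the study names and scores are hypothetical.

```python
def filter_by_quality(scores, fraction=0.20):
    """Keep studies scoring at or above the cutoff that separates the
    bottom `fraction` of the score distribution from the rest.

    `scores` maps a study identifier to its evidence score (higher = better).
    Studies tied with the cutoff score are kept (an assumed tie rule).
    """
    if not scores:
        return {}
    ordered = sorted(scores.values())
    # Number of lowest-ranked studies to reject, capped so one study remains.
    n_drop = min(int(len(ordered) * fraction), len(ordered) - 1)
    cutoff = ordered[n_drop]
    return {study: s for study, s in scores.items() if s >= cutoff}


# Hypothetical evidence scores for ten studies (not taken from the report).
example_scores = {
    "study_A": 12, "study_B": 7, "study_C": 15, "study_D": 9, "study_E": 18,
    "study_F": 5, "study_G": 14, "study_H": 11, "study_I": 16, "study_J": 10,
}

kept = filter_by_quality(example_scores)
rejected = sorted(set(example_scores) - set(kept))
print("rejected:", rejected)
```

With ten hypothetical studies, the two lowest-scoring ones are rejected and the remaining eight proceed to data extraction.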

Numerous other limitations apply to the whole dataset. With regard to the gold standard PSG, there was considerable variability in how PSGs were administered, i.e., in which measures were considered essential components of a standard PSG. As a consequence, several questions are raised: Is the "standard" PSG really a gold standard for the diagnosis of SA? Does the ability to measure sleep stage improve diagnostic accuracy? Is an entire night necessary? Proof is lacking, and the reasonably high sensitivity and specificity of partial channel PSGs and partial time PSGs only serve to reinforce this uncertainty. There was considerable inconsistency in how apnea and hypopnea were defined, let alone in which metric (AI or AHI) and which threshold (>5, 10, 15, 20, or 30 per hour) were used to diagnose SA. There was also inconsistency in how clinical signs and symptoms were combined with PSG results in diagnosing SA. Distinctions between types of SA were usually not made. Night-to-night reproducibility of the gold standard is still not well documented, and may also differ across diagnostic thresholds. Few studies included normal subjects in order to achieve a broad spectrum of test subjects. In fact, there is still debate over the frequency distribution of apnea during sleep in the general population. Is any amount of apnea ever normal? Lastly, authors often appear to be divided on the best screening approach: some seek tests to rule in the diagnosis, and others to rule it out. It was often unclear whether the test was intended for use in high-risk populations or in low-risk general populations.

Many of the issues noted above regarding PSG also apply to the portable device studies. In addition, device features such as equipment failure rates, night-to-night reproducibility, price, compliance, and safety are rarely reported. Differences in the method of analyzing data recordings (visual vs. automated) were often not tested or appreciated. Most importantly, few portable devices intended for unattended use at home have been validated under those conditions of use. Unlike the standard PSGs to which they are compared, portable studies are also typically not based upon sleep time. Therefore, the question remains whether AI or AHI must be based upon sleep time, and whether commonly used surrogates of sleep in these studies, such as body movement, are valid.
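To make the denominator question concrete, the following sketch computes AI and AHI for one hypothetical recording, first per hour of sleep and then per hour of total recording time, and checks each index against the thresholds mentioned earlier. The event counts and durations are invented for illustration, and the function names are not drawn from any study in this evidence base.

```python
def events_per_hour(n_events, minutes):
    """Return the number of events per hour over a period of `minutes`."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    return n_events * 60.0 / minutes


def exceeds(index, thresholds=(5, 10, 15, 20, 30)):
    """Map each diagnostic threshold to whether the index is above it."""
    return {t: index > t for t in thresholds}


# Hypothetical night: 46 apneas and 50 hypopneas, 480 minutes recorded,
# of which 360 minutes were scored as sleep.
apneas, hypopneas = 46, 50
sleep_minutes, recording_minutes = 360, 480

ai_sleep = events_per_hour(apneas, sleep_minutes)
ahi_sleep = events_per_hour(apneas + hypopneas, sleep_minutes)
ahi_recording = events_per_hour(apneas + hypopneas, recording_minutes)

print("AI per hour of sleep:", round(ai_sleep, 2))
print("AHI per hour of sleep:", ahi_sleep, exceeds(ahi_sleep))
print("AHI per hour of recording:", ahi_recording, exceeds(ahi_recording))
```

In this example the same night is classified as positive at a threshold of >15 per hour when the index is based on sleep time, and as negative when it is based on recording time, which is why the choice of denominator matters when portable devices are compared with PSG.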

In the studies that report high frequencies of comorbidities, a causal association has not been shown. Much has been presumed on the basis of physiologic observations that repeated hypoxemia contributes to systemic and pulmonary hypertension and to coronary and cerebrovascular insufficiency. However, studies reporting actual clinical consequences of certain AIs, with or without treatment, are not well represented in this literature base. Perhaps a review of treatment studies would yield more useful information in this regard. Also, the ongoing Sleep Heart Study (A. Pack, personal communication) may eventually clarify whether SA is an independent risk factor for cardiovascular events.

Estimates of the prevalence of SA in general populations are weak because they are typically based upon unvalidated tests rather than the gold standard PSG. The one standout in this set of studies is the ongoing Wisconsin Sleep Study Cohort (Young, Palta, Dempsey, et al., 1993).

The evidence base also has several strengths. It represents the best available evidence derived from relevant diagnostic literature in 5 languages. It is more comprehensive than previous reviews, and it is unlikely that any important diagnostic studies were missed. Restricting data extraction and data analysis to those studies with features most likely to yield useful diagnostic test information, specifically to those studies reporting sensitivity and specificity of diagnostic maneuvers, is consistent with study selection approaches previously employed by the ASDA Standards of Practice Committee in its 1994 statements (Ferber, Millman, Coppola, et al., 1994) and its 1997 statements (Chesson, Ferber, Fry, et al., 1997; Polysomnography Task Force, 1997), and by the Blue Cross/Blue Shield Technology Evaluation Committee in its 1996 assessment of portable sleep studies. Furthermore, this established database is now updatable. It serves as a valuable resource to practitioners and researchers, provided it is kept current.

Another strength is the statistical approach. This is the first time summary ROC curves have been constructed for SA studies with sufficient data. These curves indicate the degree of heterogeneity between the study sets, as well as the relationship between sensitivity and specificity within each study set.
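As an illustration of how a summary ROC curve can be built from per-study 2x2 tables, the sketch below uses the Moses-Littenberg linear regression approach. This section does not say which method was used to construct the curves, so the choice of method, the 0.5 continuity correction applied to every cell, and the three 2x2 tables are all assumptions made for the example.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_sroc(tables):
    """Fit D = a + b*S by ordinary least squares (Moses-Littenberg).

    `tables` is a list of (tp, fp, fn, tn) counts, one tuple per study.
    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio and
    S = logit(TPR) + logit(FPR) is a proxy for the test threshold.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        # Continuity correction (assumed here): add 0.5 to every cell.
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
        tpr = tp / (tp + fn)
        fpr = fp / (fp + tn)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(tables)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    if sxx == 0:
        raise ValueError("need at least two studies with different S values")
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sensitivity(a, b, fpr):
    """Sensitivity on the fitted summary curve at a false positive rate."""
    return expit(a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr))


# Hypothetical (tp, fp, fn, tn) counts for three studies.
example_tables = [(45, 5, 5, 45), (48, 15, 2, 35), (30, 2, 20, 48)]
a, b = fit_sroc(example_tables)
curve = [(fpr, sroc_sensitivity(a, b, fpr)) for fpr in (0.05, 0.1, 0.2, 0.4)]
for fpr, sens in curve:
    print(f"FPR {fpr:.2f} -> summary sensitivity {sens:.3f}")
```

In this formulation the intercept a summarizes overall discriminating ability, and a slope b far from zero suggests that accuracy varies with the threshold, one possible source of the heterogeneity between study sets noted above.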

A further strength is that the review was performed by investigators independent of the field of SA and is therefore free of conflicts of interest. At the same time, multiple stakeholders have had a voice in the project from its inception: consumers, government, insurers, sleep laboratory personnel, and clinicians have all had the opportunity to contribute ideas and perspectives.