
ECRI Health Technology Assessment Group. Diagnosis and Treatment of Swallowing Disorders (Dysphagia) in Acute-Care Stroke Patients. Rockville (MD): Agency for Health Care Policy and Research (US); 1999 Jul. (Evidence Reports/Technology Assessments, No. 8.)

  • This publication is provided for historical reference only and the information may be out of date.


Diagnosis and Treatment of Swallowing Disorders (Dysphagia) in Acute-Care Stroke Patients.


5 Future Research

In this section, we first discuss particular shortcomings of study design and research in the available literature; we then focus on the most important areas needing research and discuss the design of a trial that would answer the above questions.

Shortcomings of Available Research

Patient Selection

As mentioned above, most patients included in the studies thus far have been stroke patients; some researchers have examined exclusively stroke patients, while others have included a patient mix of many neurologic etiologies. Because stroke is a recuperative neurologic disorder, while many others that affect the elderly are degenerative, pooling the outcomes data obtained from these patients is problematic. We suggest that future researchers include patients with dysphagia of only one etiology in each study.

The lack of research on dysphagia in patients other than stroke victims ignores the considerable burden of illness resulting from these other diseases. More research on these latter kinds of patients is needed. It must be remembered that a treatment recommendation made for a patient with Parkinson's disease, for example, on the basis of how a stroke patient with similar symptoms responded to that treatment is not necessarily appropriate.

Patients should all be at the same stage of disease, or plans should be made before a study begins to stratify patients by disease stage. The ambiguity caused by failure to use a homogeneous patient group or to stratify is illustrated by studies reporting that they examined stroke patients a mean of 15 days post-stroke. This could mean that some patients were seen immediately after the stroke while others were seen 2 months post-stroke. Symptoms may differ at these different time periods, making results difficult to interpret. The same holds true for other patients with neurologic disease: in neurodegenerative diseases, swallowing difficulties may become more pronounced as the disease progresses. Therefore, the efficacy of diagnosis and treatment should be examined by severity of disease.

Control Groups

Issues of study design are especially important with stroke patients, because they often regain their swallowing ability spontaneously as they recover after the acute stroke episode. Thus, in case series following the treatment of a group of stroke patients after the event, there is a high potential for time confounds; this makes it extraordinarily difficult to interpret results of such studies because they do not conclusively show what caused the recovery of these patients. Similarly, in degenerative neurological disease, swallowing function could deteriorate spontaneously, thus masking the effectiveness of treatment. Only controlled studies can resolve these problems.

Also, only controlled studies can determine whether a particular treatment is best for any given patient population. Although we have argued in the Methods section of this report that comparing the efficacy of different treatments is not as interesting as comparing the efficacy of different diagnosis and treatment programs, we also recognize that refinements in treatment strategies will most likely come about as a result of controlled trials comparing treatment efficacy. In some situations (e.g., patients with advanced Alzheimer's disease), it may even be desirable to determine whether treatment is more effective than doing nothing at all.

In other situations, it would not be ethical to include a control group that receives no treatment if it has already been found that patient condition improves with treatment, and there is some limited evidence in the literature that this is the case. In the case of diet modification, there is enough evidence of treatment effect in some patients that employing an untreated control group would not be ethical. In cases where it is unethical to include an untreated control group, the control group would have to be either a group receiving a different level of treatment or a historical control. Historical controls have been used in some of the treatment studies evaluating the implementation of a dysphagia treatment program. There are, however, limitations to this type of control group, as there is no way to ensure patient or management equivalency, and patients are not chosen in the same way (consecutive versus record review). Even so, a historical control within the same institution is preferable to controls from a different institution. One study has been conducted comparing different levels of intervention (DePippo, Holas, and Reding, 1994); in such a study design with randomization, it can be ensured that patient characteristics are similar in each trial arm and therefore do not affect results. We therefore suggest that all future research evaluating treatment efficacy include at least two randomized treatment arms in each study so that even if absolute efficacy cannot be determined, relative efficacy can.

We also recognize that refinements in diagnosis will come about primarily as a result of trials that compare the efficacy of different diagnostic modalities. Designing such studies at the current time is problematic. Determining the performance of a diagnostic method (sensitivity and specificity) requires comparison to a gold standard diagnostic that identifies every positive and negative case of disease correctly. It has not been satisfactorily demonstrated that any of the available tools are, in fact, gold standards. Further, determining diagnostic test performance requires that false-positive rates be measured. This, in turn, means that some patients who are diagnosed as having a swallowing disorder should not receive treatment. Therefore, the only ethical kind of studies comparing diagnostic test performance would have to be conducted on a patient population known not to benefit from treatment. Because disease in such patients may be more severe, it is not clear that sensitivities and specificities from such studies would be generalizable to other patient populations.

Partly for these reasons, determining the percent agreement between a diagnostic test and the modified barium swallow (MBS) is not an appropriate measure by which to gauge the efficacy of the first test. It is, after all, possible for the results of two relatively poor tests to perfectly agree with each other, and high correlations between the results of two tests do not mean that the sensitivities and specificities of the two tests are similar. This latter point can be illustrated by a hypothetical experiment in which two tests are given to ten patients. Both tests yield one false positive and four true negatives, and therefore have a specificity of 80 percent. However, one of these tests yields four true positives and one false negative, while the other yields three true positives and two false negatives. The correlation between these tests (0.9) is relatively high, but the sensitivity of the former test is 80 percent and that of the latter 60 percent, a difference that could be clinically important in spite of the apparently "good" correlation between the two tests.
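The ten-patient illustration can be checked with a short calculation. The sketch below is illustrative only. The patient-level results are assumptions, constructed so that each test produces the counts given above and the two tests disagree on a single patient (the text does not say which patients are discordant), and the function names are our own.

```python
def sensitivity(truth, result):
    """Fraction of patients with disease (truth == 1) whom the test calls positive."""
    diseased = [r for t, r in zip(truth, result) if t == 1]
    return diseased.count(1) / len(diseased)


def specificity(truth, result):
    """Fraction of patients without disease (truth == 0) whom the test calls negative."""
    healthy = [r for t, r in zip(truth, result) if t == 0]
    return healthy.count(0) / len(healthy)


def agreement(result_a, result_b):
    """Proportion of patients on whom the two tests give the same result."""
    same = sum(1 for a, b in zip(result_a, result_b) if a == b)
    return same / len(result_a)


# Assumed data: five patients with disease, then five without (1 = positive).
truth  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
test_a = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # 4 TP, 1 FN, 1 FP, 4 TN
test_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]  # 3 TP, 2 FN, 1 FP, 4 TN

print(sensitivity(truth, test_a), sensitivity(truth, test_b))
print(specificity(truth, test_a), specificity(truth, test_b))
print(agreement(test_a, test_b))
```

Under these assumptions the two tests give the same result for nine of the ten patients, yet their sensitivities are 0.8 and 0.6.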

Outcomes

Meaningful outcomes to consider in any study are those outcomes that measure aspects of health that are important to the patient. However, it is also important to clearly report these outcomes. It has been reported that aspiration leads to increased risk of aspiration pneumonia, and incidence of pneumonia has been the most commonly reported long-term outcome in any of the studies reviewed in this report. However, it was uncommon to find aspiration pneumonia distinguished from general pneumonia when reported as an outcome in the dysphagia literature. It would be interesting to see studies explore the changing rate of pneumonia over time after an acute stroke. With this information, studies following patients for different lengths of followup can be compared using meta-analysis, decision analysis, or other mathematical calculations. Malnutrition, quality of life (QOL), and disease-specific mortality are other potentially important outcomes of interest that have been seldom reported.

The relationship of malnutrition to dysphagia has not been definitively determined, and this relationship needs to be explored. If it does turn out that malnutrition is a serious health concern resulting from dysphagia, then this endpoint should be reported as an outcomes measure in evaluation of dysphagia diagnostic and treatment technologies. Thus far, only two studies have been conducted, both on nursing home patients (Keller, 1993; Keller, 1995; Thomas, Verdery, Gardner et al., 1991); this research should be expanded to include patients in other care settings with different severity of disease.

QOL is an important, often neglected, measure that should be more seriously considered by researchers. No diagnostic or treatment program is worth doing if it results in a patient who is just as physically, socially, or psychologically impaired as before the program was undertaken. QOL is a subjective measurement that can be made in numerous different ways, and no one measurement method has been judged to be superior, thus making this endpoint difficult to address. However, it should not be ignored.

Many studies have been conducted with such a short followup period that mortality is not addressed. In those cases where mortality has been addressed, overall mortality has generally been the only measure. It would be interesting to know the causes of mortality in dysphagia patients; obviously, death from pneumonia is a serious consideration. Another question of interest may be whether a general weakening of the patient's system increases the risk of dying from other causes. If the link between dysphagia and malnutrition is substantiated, then this is an important concern as malnutrition in the elderly has been found to weaken the immune system (Chandra, 1989, 1990). These possible links need to be explored and reported.

Other short-term outcomes of interest have seldom been reported specific to particular treatments. In the field of noninvasive swallow therapy using different postural techniques and exercises, it would be interesting to see more on changes in nutritional intake (as measured by kilocalories) or weight changes. In the field of diet modification, short-term outcomes seldom reported include the ability to maintain the recommended diet safely, nutritional intake, and weight changes. Elimination of aspiration is a specific measure of interest, which should be evaluated over an extended duration of treatment; many studies have measured it during a barium swallow as the treatment is initially being tested, but few have followed this outcome for these patients after they have attempted to use the treatment independently.

Investigators should attempt to identify particular symptoms that could be used to determine an appropriate treatment plan. It is not clear whether one set of signs and symptoms would serve all patients, or whether different signs and symptoms should be used for patients with different primary diagnoses. It is also not clear that the same set of signs and symptoms should be acted upon in the same way for patients at different stages in their disease. For example, a given set of signs and symptoms in patients with early-stage Alzheimer's disease might lead to one course of action whereas the same set of signs and symptoms in a patient with advanced Alzheimer's disease may lead to another course of action, if any action at all is taken.

With such measures, studies could be conducted over a short period of time and still be able to report meaningful treatment response measures.

Followup

To compare the outcomes of any two groups of patients, they need to have been followed for the same length of time after the onset of disease. This is because the risks of morbidity and mortality resulting directly from that disease will change as time progresses and the total incidence of these endpoints will change. Similarly, the number of patients experiencing morbidity and mortality will accumulate over time. Therefore, the outcome of a patient followed for 4 weeks after stroke cannot be compared with a patient followed for 8 weeks.

Unfortunately, most studies on dysphagia management published to date have followed patients for a mean length of followup; in acute care, this is often simply the length of stay in the hospital, which is different for each patient. However, even in other care settings, followup time has not been standardized.

Any meaningful analysis comparing study results without such standardization is impossible. It clearly is very difficult for a researcher to follow patients once they have left the inpatient care setting, and obvious confounds arise when a care setting changes. Analysis would then need to take into account care setting, and perhaps compare the outcomes of patients released to the community versus those sent to continuing care (either in rehabilitation or a nursing home).

Specific Areas of Recommendation

The current literature contains several gaps on management of dysphagia. We suggest additional research is needed in the specific areas discussed below.

Clinical Signs and Symptoms

Individual signs and symptoms for prediction of pneumonia during noninstrumented exam have been found to be unreliable; all tested symptoms have been found to have either a very low sensitivity or a low specificity. However, researchers have not examined the co-occurrence of multiple symptoms to find an algorithm that may successfully predict pneumonia. Such research is suggested for the future in order to isolate those patients who would be best served by undergoing more extensive, instrumented diagnosis and treatment.

Some individual symptoms, in particular dysphonia, have been found predictive of aspiration. As with pneumonia, however, co-occurrences of several symptoms have not been fully explored, and such research may produce a method of noninstrumented diagnosis that is accurate enough to selectively choose the appropriate patients at risk to undergo further testing.

Diagnosis

There is currently no conclusive evidence about the superiority of instrumented diagnostic tools over noninstrumented ones. It has been assumed by clinicians that imaging technology such as videofluoroscopy or fiberoptic endoscopy must be superior to a noninvasive BSE because of additional information provided. However, published research has yet to prove this superiority. Researchers must document what information provided by instrumentation but not by BSE results in better outcomes for patients.

It would also be interesting to see if there are any symptoms on BSE that are reliably predictive of particular physiological dysfunction. As there is currently some limited information on the relationship of particular physiological abnormalities and appropriate treatment, such BSE predictive value would make it a better diagnostic tool in isolation.

Treatments

Most treatment trials on noninvasive swallow therapy have thus far examined results of the program overall, rather than broken out by specific techniques. For example, a study may report that patients underwent strengthening exercises or postural techniques, and then report outcomes for all these patients together. It would be interesting to see long-term results after individual techniques, such as the chin tuck or Mendelsohn Maneuver. (Such results have been reported for the palatal training appliance and tactile-thermal stimulation.) Such studies, as discussed above, would have to be constructed with a control or comparison group of some sort, either a historical control or comparisons among different intensities of treatment; comparison of different therapies is inappropriate unless patients with identical indications could appropriately be referred for completely different treatments.

Clinical studies examining outcomes with the use of feeding tubes have shown numerous problems. The most obvious problem has been that the overwhelming majority of these studies did not isolate patients with dysphagia from patients put on a tube for other causes. Thus, the results of those with dysphagia are mixed with those suffering from dementia or paralysis that makes self-feeding difficult. It is then impossible to determine what the effects of tube feeding are for dysphagia patients specifically. Only two studies have been identified that examined dysphagia patients specifically [Norton, Homer-Ward, Donnelly et al., 1996; Spiegel, Creed, Selber et al., unpub.(a)]; one reported mortality and the other weight change. More of this type of research is recommended.

As with the other treatment literature, the feeding tube literature suffers from inconsistent followup times. The literature would benefit if results were reported using survival curves (e.g., Kaplan-Meier curves) so that the percentage of mortality, morbidities, and complications were standardized on the basis of the number of patients actually followed for the specific time intervals. We also suggest that the rates of all possible feeding tube complications be reported, even if none occurred, because currently these complications are inconsistently reported and it is unclear when they are not reported whether none occurred or whether the researchers did not consider them important.
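To illustrate why survival curves help with unequal followup, the following is a minimal sketch of the Kaplan-Meier (product-limit) estimate. The followup times and outcomes are hypothetical and are not taken from any study reviewed in this report; the function name and data layout are our own.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  -- followup time for each patient (any consistent unit)
    events -- 1 if the endpoint occurred at that time, 0 if the patient was
              censored (left followup without the endpoint)
    Returns (time, estimated survival) pairs, one for each distinct time at
    which at least one endpoint occurred.
    """
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Endpoints observed exactly at time t.
        occurred = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if occurred:
            survival *= 1 - occurred / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical followup (weeks) and endpoint indicators for eight patients.
weeks  = [2, 3, 3, 5, 8, 8, 12, 12]
status = [1, 0, 1, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(weeks, status):
    print(t, round(s, 3))
```

Each step divides by the number of patients actually still followed at that time, so patients with short followup contribute only to the intervals during which they were observed.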

Clinical Trial Suggestion

This section contains our suggestions for a multicenter, randomized trial that would provide useful information for relative efficacy of different dysphagia management algorithms. We do not suggest that this trial would answer all questions that are of interest. In particular, it will not determine the sensitivities and specificities of the various diagnostic methodologies. We propose not to determine these measures of test performance because their determination would require determining the number of patients who receive false-positive diagnoses. This could only be done by denying treatment to certain patients who receive a positive diagnosis, a maneuver that would be unethical. Further, knowing these performance measures is not required for determining which diagnostic methodology provides greatest benefit to patients. It is more important to know which leads to the most favorable patient outcomes. In the lexicon of the evidence hierarchy we described in the Methods section, we will here describe a trial that provides Level 5 evidence. This trial compares patient outcomes after different diagnostic methodologies are used, to determine if any specific diagnostic results in a significantly better outcome than any other diagnostic. In our second supplemental analysis, we will describe additional information that could be collected to render this a trial that provides additional information about Level 6 evidence (cost-effectiveness), the highest level of evidence possible.

We also do not suggest that this trial is the perfect trial in all aspects. In particular, practitioners and patients would not be blinded to most (but not all; see below) of the diagnostic tests that patients received, nor would they be blinded to the type of treatment given. The reason for this is simple -- such blinding is not possible (for example, treatment is often provided during the initial MBS test). Thus, the trial we describe takes clinical reality into account.

Trial Design

The trial we suggest is a multicenter, multiarm trial, with patients randomly assigned to each arm. Randomization should be accomplished by accepted means (e.g., by using a table of random numbers), and not according to patient hospital number, the day of the week on which patients arrive at the hospital, or other means not accepted by methodologists.
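As an illustration of assignment that does not depend on hospital number or day of arrival, the sketch below uses simple (unrestricted) randomization. It relies on Python's pseudorandom number generator as a stand-in for a table of random numbers; the arm labels, the seed, and the function name are hypothetical.

```python
import random


def assign_arms(patient_ids, arms, seed):
    """Assign each patient to one arm by simple randomization."""
    # A fixed seed makes the allocation list reproducible for auditing.
    rng = random.Random(seed)
    return {pid: rng.choice(arms) for pid in patient_ids}


# Hypothetical arm labels for a four-group trial.
arms = ["BSE only", "instrumented test A", "instrumented test B", "instrumented test C"]
allocation = assign_arms(range(1, 21), arms, seed=1999)

for pid, arm in allocation.items():
    print(pid, arm)
```

Simple randomization does not guarantee equal group sizes, so a real trial might prefer a scheme that balances the arms; that choice is left to the trial's methodologists.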

This trial can consist of two to four groups, depending upon the number of questions one wishes to answer. For the purposes of this description, we will describe the trial as if it contained four groups, with the understanding that the number of groups can be reduced.

Patient Population

The patient population should be as homogeneous as possible. Thus, only patients in a certain age range should be enrolled. This is required inasmuch as very old patients may have had more co-morbidities (or have been otherwise more debilitated) than relatively younger ones. Additional patient inclusion criteria may also be desirable to ensure that the study is conducted on a homogeneous group. Regardless of the specific criteria, however, the characteristics of the enrolled patients should be recorded to determine, after the conclusion of the trial, whether the patients in any of the four groups were different from those in other groups.

Patients should be consecutively enrolled. We recommend that the trial be limited to patients with a single disease, preferably stroke. This ensures that the trial will study the population most likely to benefit from dysphagia programs. It also makes accruing patients to the trial easier than it would be if other patients were studied. This means that the trial can be concluded more rapidly. Further, we are unable to estimate how effective dysphagia diagnosis and treatment programs are in other patient populations. Enrolling patients from these other populations could, therefore, have an impact on our statistical power calculations (see below), making it possible that many more patients than we have specified might need to be enrolled. In practice, this could yield a worst-case scenario in which the trial found no effect because it was underpowered.

Diagnosis and treatment should begin at a uniform time, and as soon as is practical. Delaying diagnosis and treatment too long would mean they would be offered to patients increasingly less likely to benefit from a dysphagia program. Because the trial should be restricted to stroke victims immediately after the stroke event, only acute-care centers should participate.

Diagnostic Methods

All patients in each arm undergo the dysphagia diagnostic method to which they are randomized, regardless of whether they are exhibiting clinical symptoms of dysphagia. In the first arm (the control group), all patients receive diagnosis by a noninstrumented test and then treatment. Although it is commonly believed that noninstrumented tests are inferior to instrumented diagnostics, no definitive evidence of this exists and, in fact, there is some limited evidence that dysphagia programs using the BSE are effective in preventing aspiration pneumonia [Odderson, Keaton, and McKenna, 1995; Spiegel, Creed, Selber et al., unpub.(a)]. Therefore, there is no compelling evidence to suggest that inclusion of such a group would be unethical. If suspicions about such a group remain, it is important to understand that the trial we describe would be subject to the normal ethical stopping rules. Thus, accrual to this group could be immediately terminated should it be found during the trial that patients in this group fared significantly worse than did patients in another group. At the same time, we strongly recommend that this group be included, no matter how many groups the trial ultimately contains. Inclusion of this group answers a most fundamental question: Are instrumented exams more efficacious than structured formal BSEs?

Patients in the other three arms would also receive the noninstrumented test. The results of this exam, however, would not be used in patient management. In fact, physicians and other caregivers would be blinded to the results of this test (the reasons for these BSEs are described below). We recommend blinding of providers to the results of these BSEs to ensure that interpretation of the results of the subsequent tests is not biased by the results of the BSE.

These patients would then randomly receive diagnosis with a single instrumented test (to be chosen by the researchers). Numerous instrumented tests are currently used in dysphagia management, and the clinical literature is currently equivocal about the superiority of any one over any other. We do not specifically recommend the inclusion of any particular instrumented tests, because to include only the widely used tests might be construed as exclusionary of new, emerging (and potentially superior) technologies; however, to recommend the newer technologies would discount the refinement and extensive development of the established technologies. We therefore defer to the clinical establishment to choose which instrumented exams to include. Treatment would be determined based upon the results of these diagnostic tests.

Because all patients in this trial receive noninstrumented diagnosis, it is important that this exam be standardized across centers. The specific components of this exam could be determined by a panel of experts (which, if desired, could include patient representatives). We recommend, however, that this exam include (but not be limited to) assessment of patients' oral hygiene habits, dysphonia, the ability to voluntarily cough (rated on a four-point scale), and the 3-ounce water test. As discussed previously in this report, limited data suggest these tests are effective at predicting aspiration. For reasons that will become apparent below, a thorough and standardized patient history should be taken as part of this test. Again, the expert panel could determine specific elements of this history.

Treatments

As with the diagnostics discussed above, treatments in different centers should also be standardized to the greatest extent possible; standardization should be accomplished by the same expert panel that standardizes the diagnostic methods. This is not meant to imply that all patients receive the same treatment; rather, patients are assigned to treatments based on symptomatology. However, the methods of determining the appropriate treatment (i.e., the symptom-based treatment choices) should be the same across the different diagnostic methods. We recognize that this is likely to be problematic. However, it is important to accomplish this standardization to reduce the variability of the trial's results (thus increasing its statistical power) and to ensure that apparent effects of diagnostics are not confounded by treatment differences.

Because this standardization should ensure that centers or individuals providing therapies are equally well trained, it is important that none of the sites (or providers) are in a startup phase of speech-language therapy, when results might be poorer than those obtained after more experience is gained.

Outcome Measures

The primary outcome measure of the trial should be pneumonia, as it is the most serious morbidity that may result from dysphagia. We recommend this as the primary outcome for several purely pragmatic reasons. First, the cost of treating pneumonia is among the greatest of the costs of treating morbidities. Second, although death rates due to dysphagia are of extreme importance, one could obtain low death rates by curing pneumonia, not by preventing it. Thus, any effect of dysphagia diagnosis and treatment programs observed in such a trial would be contaminated by the effectiveness of pneumonia treatment. Finally, there is so little information on other morbidities resulting from dysphagia that one cannot, on the basis of current data, even be assured that these other morbidities pose a major health problem. In fact, one of the purposes of this trial is to provide information about the health problem caused by these other morbidities. Thus, secondary outcomes should include, but not be limited to, measures of: (a) the number of patients placed on feeding tubes, (b) weight change, (c) body mass index, (d) serum albumin, (e) dehydration (measured by the blood urea nitrogen:creatinine ratio), (f) morbidities resulting from feeding tubes, (g) dysphagia-related mortality, and (h) all-cause mortality. Data should be analyzed on an intent-to-treat basis.

The results of the BSEs can also be considered among the outcomes of this trial. These results will be used to provide two kinds of information. First, they will allow one to determine whether any signs and symptoms, and particularly any combination of signs and symptoms, predict morbidity (and thus whether the BSE can be used as a stand-alone diagnostic). Second, these results will allow one to determine, if warranted, which BSE results can be used to selectively refer appropriate patients for subsequent instrumented diagnostic tests. It may be possible to pool the BSE results from the three groups that receive subsequent diagnostics to enhance the statistical power of the analyses that will be required; such analyses will involve multivariate statistics, which are lower-power tests than univariate analyses. The fact that analysis of these data will be conducted using a form of multivariate statistics (and, more specifically, a form of multiple regression analysis) means that the results of any one trial, including the one we describe here, will not provide results completely generalizable to all settings. It is well-known that multiple regression equations conducted in one setting are less predictive when used in another (this reduction in predictive ability is termed shrinkage). It is even possible that some variables that appear to be predictive in this clinical trial will not be found predictive in later work. In other words, it is possible that the multivariate results obtained in a well-controlled clinical trial that employs a strict protocol may not be entirely generalizable to actual clinical practice, where protocols are often less strict. This does not imply that the results of these analyses are not worthwhile but it does mean that they will have to be checked, and possibly further refined based on how predictive these signs and symptoms are in actual clinical practice.

Followup

Patients should be followed for a uniform period of time, at least 1 year. This requires active monitoring of patients after they leave acute care. Following patients for an average (or mean) length of time is inappropriate. Means are distorted by outliers, and would be misleading if a few patients or any group(s) were followed for abnormally short or long periods. We recognize the difficulty in performing this type of followup, so it will most likely be prudent to offer patients remuneration to enhance compliance.

Even patients who receive negative diagnoses for swallowing disorders should be followed. This is required to ensure that results are analyzed on an intent-to-treat basis.

Statistical Power Issues

An a priori power analysis for this trial is required. The purpose of this analysis is to estimate the number of patients the trial should enroll. Our estimates of the number of patients required are shown in the following table:

Table 20. Number of Patients Required in Proposed Trial.


The lefthand column displays the rate of aspiration pneumonia that, for the purposes of our statistical power calculations, we assume will occur in patients whose treatment is based on only the results of the BSE (see our discussion of Question 1 in the Evidence Report and our first Supplemental Analysis for further information about this rate). In the next column from the left, we portray two hypothetical pneumonia rates that might occur in a dysphagia diagnosis and treatment program that employs an instrumented diagnostic. These hypothetical rates translate to 25 percent and 50 percent reductions in the rate of aspiration pneumonia, as shown in the third column from the left. The next column shows the number of subjects needed in each group of the trial for each of the two hypothetical reductions in pneumonia rates. The final three columns show the total number of patients the trial must contain for each of these two rate reductions for a two-, three-, and four-group trial, respectively. Thus, if a program results in a 25 percent decrease in the rate of aspiration pneumonia, 10,795 patients will be required for each group, meaning that the trial must contain 21,590 to 43,180 patients, depending on the number of groups. Similarly, for a program resulting in a 50 percent reduction in the pneumonia rate, 2,318 patients per group will be required, meaning that the trial must contain 4,636 to 9,272 patients.

For the preceding calculations, we sought a trial that would give 80 percent statistical power at an alpha level of 0.05, and assumed that χ² would be the test statistic (the reasons for this assumption are discussed below).
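The per-group figures quoted above can be approximated with a standard sample-size formula for comparing two proportions. The sketch below is illustrative only: the choice of formula (two-sided normal approximation, equal group sizes, no continuity correction), the function names, and rounding to the nearest patient are our assumptions, not necessarily the method used to build Table 20. The 2 percent base rate is the one used in the cost discussion in this section.

```python
from statistics import NormalDist


def patients_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two proportions.

    Assumed formula: two-sided normal approximation with equal group
    sizes and no continuity correction. Not necessarily the formula
    behind Table 20.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_treated) / 2            # pooled rate under the null
    null_sd = (2 * p_bar * (1 - p_bar)) ** 0.5
    alt_sd = (p_control * (1 - p_control) + p_treated * (1 - p_treated)) ** 0.5
    n = (z_alpha * null_sd + z_beta * alt_sd) ** 2 / (p_control - p_treated) ** 2
    return round(n)  # rounded to the nearest patient (an assumption)


def trial_totals(n_per_group, group_counts=(2, 3, 4)):
    """Total enrollment for trials with the given numbers of groups."""
    return {k: k * n_per_group for k in group_counts}


if __name__ == "__main__":
    base_rate = 0.02  # assumed BSE-only pneumonia rate
    for p_treated in (0.015, 0.010):  # 25% and 50% reductions
        n = patients_per_group(base_rate, p_treated)
        print(p_treated, n, trial_totals(n))
```

Under these assumptions the function returns per-group sizes that agree with the 10,795 and 2,318 patients quoted above. Other formulas, such as those with a continuity correction, would give somewhat larger numbers.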

For this area of research, this is a relatively large trial, although it is not large when compared with other trials reported in the medical literature. Its size is one reason for our recommendation that this be a multicenter trial. Using a number of centers will make it easier to enroll this number of patients and will decrease the length of time required to complete the trial.

In constructing this table, we have chosen a 25 percent decrease in pneumonia as the lowest effect size of interest on the assumption that a smaller decrease in the aspiration pneumonia rate would be cost-prohibitive. This in fact may not be the case, so we believe it prudent to describe the calculations we used to reach this conclusion. In this way, others can both evaluate our results and use our calculations to employ their own judgment about what is and what is not cost-prohibitive. These calculations are based upon this trial's costs for preventing one case of aspiration pneumonia.

In a trial containing two groups, each patient would receive a BSE at a cost of $141. (The sources of this cost information are described in our first Supplemental Analysis.) Half of the patients in the trial would receive an instrumented exam at a cost of $218, so the average cost of instrumented exams would be $109 per trial enrollee. We also assumed that about 38 percent of all patients would receive diagnosis-directed treatment (derived from the estimate of aspiration in this population by Daniels, Brailey, Priestly et al., 1998), at a cost of $242 per treated patient, which yields an average cost per enrollee of $92. Summing the costs ($141 + $109 + $92) yields an approximate cost of $342 per enrollee. If a dysphagia program employing an instrumented diagnostic decreases the rate of aspiration pneumonia by 25 percent (i.e., from 2 to 1.5 percent), then the program will prevent one case of pneumonia in every 200 enrollees in a two-group trial. This means that the total cost of preventing one case of pneumonia in such a trial is $68,400. Similarly, in a trial with three groups, the cost to prevent one case of pneumonia is $75,600, and in a trial with four groups, the cost is $79,400. Although these numbers are admittedly crude, they far exceed the highest estimate of the cost of treating pneumonia, $19,000, in the literature (Aviv, unpub.). Hence, our assumption that a 25 percent program-related decrease in pneumonia is the smallest effect of interest is probably a liberal one.
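The cost arithmetic above can be retraced in a few lines, so that readers can substitute their own cost estimates. This is a minimal sketch: the constant and function names are ours, and the share of enrollees receiving an instrumented exam in the three- and four-group trials (two-thirds and three-quarters) is an assumption inferred from the two-group case, in which one group receives only the BSE.

```python
BSE_COST = 141            # bedside exam, received by every enrollee
INSTRUMENTED_COST = 218   # instrumented exam, per patient examined
TREATMENT_COST = 242      # diagnosis-directed treatment, per treated patient
TREATED_FRACTION = 0.38   # share of enrollees assumed to receive treatment
ENROLLEES_PER_CASE = 200  # 2% -> 1.5% prevents one case per 200 enrollees


def cost_per_enrollee(n_groups):
    """Average cost per enrollee in a trial with n_groups arms.

    Assumption: exactly one arm is BSE-only, so (n_groups - 1) / n_groups
    of enrollees also receive an instrumented exam.
    """
    instrumented_share = (n_groups - 1) / n_groups
    return (BSE_COST
            + instrumented_share * INSTRUMENTED_COST
            + TREATED_FRACTION * TREATMENT_COST)


def cost_per_case_prevented(n_groups):
    """Trial cost incurred for each case of pneumonia prevented."""
    return cost_per_enrollee(n_groups) * ENROLLEES_PER_CASE


if __name__ == "__main__":
    for groups in (2, 3, 4):
        print(groups, round(cost_per_enrollee(groups), 2),
              round(cost_per_case_prevented(groups)))
```

The results land within about $100 to $150 of the figures in the text; the small differences come from the text rounding intermediate per-enrollee costs to whole dollars.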

Our estimates of the number of patients required for this trial are highly sensitive to the pneumonia rate predicted to occur in those patients whose treatment is determined solely by the BSE and to the anticipated magnitude of the effect of a dysphagia program. For example, if we assume that an instrumented diagnostic will decrease pneumonia rates to 1 percent (instead of 1.5 percent), a four-group trial will require 33,908 fewer patients. Similarly, if we assume a 4 percent base pneumonia rate and, as above, a 25 percent decrease as a result of employing instrumented diagnostics, then only 21,100 patients are needed for the four-group trial, which is 22,080 fewer than if the base rate is 2 percent. This sensitivity arises because the power function is relatively steep in the area in which we are working, and not as a result of any inaccuracies in our calculations. Because of this sensitivity, it is wise to consider a pilot study to better estimate the base pneumonia rates and the anticipated difference in pneumonia rates between patients who receive only the BSE and those who receive an instrumented exam.

Our calculations of the number of patients needed for this trial assume that the statistical analysis of the data will be conducted to maximize power. This reduces the number of patients required for the study; without analyses that maximize power, this number could become prohibitively large. The implication is that focused contrasts should be used. Taking the four-group trial as an example, a relatively low-power analysis could result if the analysis were conceived as a χ² analysis of a 2 × 4 design in which the four treatment groups are the column headings and the presence and absence of pneumonia are the row headings. On the other hand, it is possible to analyze these data using three orthogonal contrasts. These contrasts might, for example, ask whether the results of the BSE are different from those of all other tests, whether the results of the MBS exam are different from those of the two fiberoptic exams, and whether the results of the two fiberoptic exams are different from each other. By employing these contrasts, one reduces the degrees of freedom from three (for the omnibus χ² test) to one for each contrast, and there is a resulting increase in statistical power. The focused contrasts can also be used in the three-group trial; they are not relevant to the two-group trial, which is already analyzed on 1 degree of freedom.
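The gain in power from moving from 3 degrees of freedom to 1 can be illustrated numerically. The sketch below compares the power of a χ² test on 1 and on 3 degrees of freedom for the same noncentrality parameter, chosen so that the 1-degree-of-freedom test has about 80 percent power. It rests on a strong simplifying assumption that we introduce for illustration: the entire treatment effect falls along a single contrast, so both tests face the same noncentrality. The function names are ours, and the critical values are the standard 0.05-level χ² cutoffs.

```python
import math
from statistics import NormalDist

CRIT_1DF = 3.841459  # 0.95 quantile of chi-square with 1 df
CRIT_3DF = 7.814728  # 0.95 quantile of chi-square with 3 df


def central_sf(x, df):
    """Upper-tail probability of a central chi-square with odd df."""
    sf = math.erfc(math.sqrt(x / 2))  # exact result for df = 1
    k = 1
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        sf += math.exp((k / 2) * math.log(x / 2) - x / 2
                       - math.lgamma(k / 2 + 1))
        k += 2
    return sf


def chi2_power(noncentrality, df, critical_value, terms=100):
    """Power of a chi-square test, via the Poisson-mixture form of the
    noncentral chi-square distribution (odd df only in this sketch)."""
    half = noncentrality / 2
    weight = math.exp(-half)  # Poisson probability of j = 0
    total = 0.0
    for j in range(terms):
        total += weight * central_sf(critical_value, df + 2 * j)
        weight *= half / (j + 1)  # next Poisson probability
    return total


# Noncentrality giving about 80% power on 1 df at alpha = 0.05.
nd = NormalDist()
LAMBDA_80 = (nd.inv_cdf(0.975) + nd.inv_cdf(0.80)) ** 2

if __name__ == "__main__":
    print("1 df:", round(chi2_power(LAMBDA_80, 1, CRIT_1DF), 3))
    print("3 df:", round(chi2_power(LAMBDA_80, 3, CRIT_3DF), 3))
```

With the same underlying effect, the omnibus 3-degree-of-freedom test has noticeably lower power than the focused 1-degree-of-freedom test, which is why the omnibus analysis would need more patients to reach the same power.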

Some Final Remarks

Although the trial need not contain all four groups we have described, using fewer groups should be approached cautiously. As a rule, data obtained from a single trial that directly compares the diagnostic strategies of interest are stronger than those of several trials, each of which makes only some of the comparisons of interest. This remains true even if the aggregate of all trials makes all of the desired comparisons. However, conducting several smaller studies would be feasible if the same centers took part in each of them, using the same documented diagnostic and treatment criteria.

It is also possible to add groups to this trial, such as groups that incorporate combinations of diagnostics. Similarly, some diagnostic combinations may be substituted for the groups we have outlined here.

The results of this trial can be extended to answer other questions of interest, even though it (or any other feasible clinical trial) will likely contain too few patients to yield appropriate statistical power. This can be accomplished by using the data from this trial to construct a decision analysis. Such an analysis is further discussed in Appendix F.
