Who was left out?
This is a first draft of Chapter 2 of my book, "Statistical Evidence."
2.0 Introduction
Research studies often have a narrow focus, but sometimes it can be too narrow. When too many patients are left out, those who remain may not be representative of the types of patients you will encounter. When you are trying to figure out who was left out and what impact this has, ask the following questions:
2.1 Who was excluded at the start of the study?
2.2 Who refused to join the study?
2.3 Who dropped out during the study?
2.4 Who stopped or switched therapies?
2.0.1 Case study: Nicotine patches
The Journal of Pediatrics published a study of adolescent smokers in 1996 (Smith 1996). The researchers recruited 22 volunteers from five public high schools in the Rochester, MN area for participation in a smoking cessation program involving behavioral counseling, group therapy, and nicotine patches. Researchers measured the number of cigarettes smoked, side effects, and blood levels of nicotine.
The purpose of the research was to evaluate "the safety, tolerance, and efficacy of 22 mg/d nicotine patch therapy in smokers younger than 18 years who were trying to stop smoking." The authors also listed a secondary goal, "to compare blood cotinine levels, nicotine withdrawal scores, and adverse experiences with those of adults obtained in previous patch studies." Cotinine is a metabolite of nicotine and provides a useful objective measure of cigarette smoking. It also allowed the authors to examine whether nicotine toxicity was an issue.
This study did not include major segments of the teenage smoking population. The study included only white subjects because there were too few minority students in the Rochester area. Subjects had to get parental permission, excluding smokers who wished to keep their habit secret from their parents. Subjects were also volunteers, and thus could be considered more motivated to quit than the typical teenage smoker.
The study also had a serious dropout rate. Of the presumably thousands of teenage smokers in the Rochester, Minnesota area, only 71 volunteers responded to the initial call for subjects. Of the 71 volunteers, 55% met inclusion criteria. Of the remaining 39, 44% declined to attend the initial meeting. Of the remaining 22, 14% were non-compliant. Of the remaining 18, 39% failed to respond to the one year survey. Only 11 completed the entire study (50% of those who started the study; 28% of those meeting inclusion criteria; 15% of the initial volunteers).
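The attrition arithmetic above can be retraced with a short script. This is a minimal sketch: the counts (71, 39, 22, 18, 11) come from the paragraph above, while the stage labels and the helper name `attrition_table` are only illustrative. The percentages are recomputed from the counts, so a stage figure may differ somewhat from the rounded value quoted in the text.

```python
# Counts remaining after each stage, taken from the case study text.
# Stage labels are informal descriptions, not the authors' wording.
stages = [
    ("responded to the initial call", 71),
    ("met inclusion criteria", 39),
    ("attended the initial meeting", 22),
    ("complied with the protocol", 18),
    ("answered the one-year survey", 11),
]


def attrition_table(stages):
    """Return (label, count, share of previous stage, share of first stage)."""
    first = stages[0][1]
    rows = []
    previous = first
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


for label, count, of_previous, of_first in attrition_table(stages):
    print(f"{label:32s} {count:3d}  {of_previous:6.1%} of previous  {of_first:6.1%} of volunteers")
```

Laying the counts out this way makes it easy to see which denominator each completion percentage refers to: those who started the study, those who met inclusion criteria, or the initial volunteers.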
This study had a serious problem with who was left out. The large number of subjects who did not get into the study or who did not complete the study makes it hard to generalize the findings of this research.
2.1 Who was excluded at the start of the study?
Researchers, trying to minimize variation, will use exclusion criteria to create more homogeneous groups. While minimizing variability is good, too much homogeneity can backfire. It’s difficult to extrapolate results from a very tightly controlled and homogeneous clinical trial to the variation of patients seen in your practice. Ask yourself the question "How similar are my patients?"
Watch out for exclusion criteria that leave out large groups of patients. Sometimes this exclusion is subtle. For example, if you wanted to study adolescent drug use, you might consider a survey of high school students. This survey, though, would exclude anyone who dropped out of school. The dropouts have a far higher rate of drug usage than teenagers who stay in school. If you are interested in all adolescents, but your research design excludes dropouts, you will seriously underestimate drug use (Swaim 1997).
In a different situation, of course, this might not be a terrible problem. It depends on your perspective. A principal trying to understand patterns of drug use in her high school, for example, might actually prefer to exclude dropouts.
You might encounter other subtle exclusions based on the geographic location or the type of health care setting, which places restrictions on the type of patients seen. A study of Midwest hospitals will not have a representative number of Hispanic patients compared to the Southwest. Tertiary care centers only see patients that are extremely ill.
There are three very common and very serious exclusions in medical research, however, that deserve special attention: exclusion of elderly patients, exclusion of women, and exclusion of children.
2.1.1 Exclusion of elderly patients
If you are elderly, pat yourself on the back. Your demographic group drives the healthcare economy. You are, by far, the largest consumers of new medications and new therapies. Yet, far too often, these new medications and new therapies are tested on patients much younger (Bayer 2000).
There's a simple reason for this exclusion. When researchers design their experiments, they want a nice clean sample.
Researchers want patients who are ill with one and only one disease. But with older people, several things will break down at the same time (Schellevis 1993).
Researchers don't want patients who are taking a lot of other medications. But older people take so many different drugs that they often qualify for bulk discounts at Walgreens.
Finally, researchers want patients who are likely to stay alive for the duration of the research study. But older people are likely to die from conditions unrelated to the disease being studied.
Although the reasons for excluding elderly patients are understandable, they are still not justifiable. Research done on younger patients cannot be easily generalized to older patients.
2.1.2 Exclusion of women
Several decades ago, there was a large study of aspirin as a primary prevention against heart attacks (Physicians Health Study Research Group 1989). This study recruited over 20 thousand physicians and asked them to take either a small dose of aspirin every day or a placebo. The researchers had to follow these physicians for five to ten years because the physicians wouldn't cooperate and have heart attacks faster. At the completion of the study, the researchers announced that aspirin was highly successful at preventing heart attacks.
There was one major problem with the research sample, though. Every single one of the physicians studied was male. Not a single female was included in the sample. It's not as if heart disease were a problem only for men. Heart disease kills more women than any other condition.
There are some legitimate concerns when testing drugs that might harm a developing fetus, but you can handle this by carefully restricting enrollment to women who are not sexually active and/or who are using an effective form of birth control. In addition, some conditions, such as prostate cancer, cannot be studied in women.
There is some dispute over whether gender bias exists, with one study arguing that it still occurs (Ramasubbu 2001) and another arguing that it does not (Meinert 2001). When exclusion of women does occur, it raises troubling questions and hinders your ability to generalize the results of the research.
2.1.3 Exclusion of children
At the opposite extreme from the elderly are children. This group, sadly, is also left out too often (www.aap.org/advocacy/washing/offlabel.htm).
Children are not little adults. The liver in a child will process drugs quite differently from the liver of an adult. The nutritional demands of a growing child are quite different than those of a fully grown adult. And if you thought that your children became unpredictable as they went through puberty, try looking at them from a medical perspective!
No one wants to see our children used as guinea pigs, and there are special ethical reviews and safeguards that we must comply with when we study children.
Our failure, however, to study children in a carefully controlled setting will end up subjecting all children to a large and uncontrolled experiment with no prospect of learning which treatments are safe for children and which ones are harmful.
2.1.4 Exclusions: What to look for.
Not all exclusions are bad. Here are some issues to consider.
- Are the excluded patients likely to have a worse prognosis?
- Are any important demographic groups left out or seriously underrepresented?
- Are any of the exclusion criteria artificial and unrepresentative of the patients that you normally see?
2.2 Who refused to join the study?
Quite often, the only patients we are able to study are those who volunteer to help out. The use of volunteers, however, may exclude important segments of the patient population.
Volunteers may differ from the normal population in several important ways. Volunteers for a study involving cash payments may come more often from economically challenged environments. If a free health check-up is included, volunteers may come more often from people worried about their health status. Volunteers for lengthy studies are less likely to be employed.
Smokers who volunteer for a smoking cessation study are quite different than smokers in general (Hughes 1997). It should be obvious, but sometimes it is easy to forget this important distinction. Sometimes you are interested in generalizing to all smokers and sometimes you are interested in generalizing to all smokers who are interested in trying to quit.
2.2.1 Volunteers for painful procedures
Recruiting controls is especially troublesome in a study that involves a painful procedure. A Swedish study documents volunteer bias in a study of personality (Gustavsson 1997). In this study, the researchers wanted to analyze cerebrospinal fluid in order to "examine the associations between personality traits and biochemical variables."
Now, how do you get cerebrospinal fluid? The technical term is lumbar puncture, but it's also called a spinal tap. A spinal tap is rather painful, I'm told, and it carries a small risk of some serious side effects. What sort of person would volunteer to submit to a spinal tap?
In this study, the subjects they recruited had already completed a full personality profile in a previous research study. Of the 87 subjects, 48 declined to participate. There was one personality trait that was quite different between the "volunteers" and the "refusers". Can you guess what it is?
It turns out that the volunteers had scores roughly a half standard deviation higher on impulsiveness. They did not differ on other personality traits such as socialization and detachment. The large difference in the impulsiveness measurement would obviously cloud any attempt to correlate personality traits and biochemical measurements in spinal fluids among those who volunteered.
2.2.2 Professional volunteers
Many drug companies pay good money for healthy volunteers to test new drugs. If the study involves extensive observation and/or invasive procedures, the amount of money offered can add up. Some volunteers will return repeatedly for different studies. No one gets rich this way, and the amount of money offered cannot be so large as to be coercive. But serving as a research volunteer can still help pay a few bills and supplement your income.
Do these professional volunteers differ from you and me? You might suspect that these volunteers are poorer and less likely to have a full time job. There are some subtle differences, though, that are even more important.
Example: When genetic testing was done on a group of professional volunteers, there were almost no instances of a genetic variation that was associated with slow metabolism of certain drugs (Chen 1997). This slow metabolism would tend to be associated with a greater chance of side effects. This may not be too surprising. If you have a bad outcome with your first research study, you'll probably not come back for the next study. Unfortunately, this means that studies on professional volunteers could understate the likelihood and severity of side effects, as compared to the general population.
2.2.3 Refusals: What to look for.
Most studies use volunteers, so you can't just pooh-pooh a study for this reason alone. Here are some questions you should ask.
- Are any incentives for participating related to important prognostic factors?
- What are the disincentives for participating? Are any of these important?
- Were the researchers able to characterize various aspects of those who did not volunteer? How similar were the volunteers and non-volunteers?
2.3 Who dropped out during the study?
It is inevitable that some patients will drop out during the study. If the number is more than a few, this is a cause for concern. Dropouts often have a different prognosis than those who stay. Ignoring the dropouts will often paint a rosier picture of the outcome. Was there any effort (financial inducement, follow-up reminders) made to minimize dropouts? Were the authors able to characterize the demographics of the dropouts?
2.3.1 Is the dropout caused by the treatment itself or a poor prognosis?
When the reason for dropping out is unrelated to the study, then you can ignore the dropouts without any serious problem. You lose a little bit of power and precision, but are otherwise okay.
If, on the other hand, dropouts are related to prognosis, be careful. If someone drops out of a cancer study to take laetrile treatments down in Mexico, that's often because the therapy assigned as part of the research is not working well.
You might be tempted to think that dropping out because of a move out of town is unrelated to prognosis. Often it is, but keep in mind that you will see more mobility among poorer patients. These patients will often have to move for economic reasons. So if you leave these patients out, then you are excluding patients who are on the lower rungs of the socioeconomic ladder. These patients will often not do as well for a variety of reasons, and their loss will end up making a rosier and more optimistic sample than what you would encounter in the real world.
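A small worked example shows how dropout that is related to prognosis inflates the apparent success rate. The cohort size, the split between good and poor outcomes, and the dropout rates below are invented for illustration and do not come from any study cited here.

```python
def completer_success_rate(n_good, n_poor, dropout_good, dropout_poor):
    """Success rate among patients who stay, given outcome-specific dropout rates."""
    stay_good = n_good * (1 - dropout_good)
    stay_poor = n_poor * (1 - dropout_poor)
    return stay_good / (stay_good + stay_poor)


# Hypothetical cohort: 600 patients do well, 400 do poorly (true success rate 60%).
n_good, n_poor = 600, 400
true_rate = n_good / (n_good + n_poor)

# Dropout unrelated to prognosis: both groups lose 20% of their patients.
unrelated = completer_success_rate(n_good, n_poor, 0.20, 0.20)

# Dropout related to prognosis: patients doing poorly leave four times as often.
related = completer_success_rate(n_good, n_poor, 0.10, 0.40)

print(f"true success rate:              {true_rate:.1%}")
print(f"completers, unrelated dropout:  {unrelated:.1%}")
print(f"completers, related dropout:    {related:.1%}")
```

In this made-up cohort, unrelated dropout leaves the success rate among completers at 60%, while prognosis-related dropout pushes it to about 69%, even though nothing about the treatment has changed.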
2.3.2 At what level should the number of dropouts be a concern?
There is no simple answer to this question. Smaller is better, of course, but there are no firm guidelines. I've seen some suggestions that if the rate is 10% or less then dropouts are not a serious issue. There is no empirical justification for this value, but it seems reasonable enough to me. The larger the rate, the more chance for problems. A dropout rate of 50% or more is almost always a sign of serious problems.
2.3.3 Inferring outcomes for dropouts.
In some contexts, you can infer the status of dropouts as treatment failures. For example, if someone stops attending a smoking cessation program, you have fairly strong justification for treating such a patient as if they were smoking again. In a study of weight loss programs, dropouts could be assumed to have regained any weight that they may have lost. This is not a perfect assumption, but it should work well in practice.
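The rule of treating dropouts as failures can be sketched in a few lines. The counts below are hypothetical, and `quit_rates` is an illustrative helper, not a method taken from any of the cited studies.

```python
def quit_rates(enrolled, completed, quit_among_completers):
    """Compare a completers-only quit rate with one that counts dropouts as failures."""
    if not 0 < completed <= enrolled:
        raise ValueError("completed must be positive and no larger than enrolled")
    completers_only = quit_among_completers / completed
    dropouts_as_failures = quit_among_completers / enrolled
    return completers_only, dropouts_as_failures


# Hypothetical smoking cessation program: 100 enrolled, 60 completed, 30 quit.
completers_only, conservative = quit_rates(
    enrolled=100, completed=60, quit_among_completers=30
)
print(f"completers only:       {completers_only:.0%}")
print(f"dropouts as failures:  {conservative:.0%}")
```

With these invented numbers the quit rate falls from 50% among completers to 30% once every dropout is counted as still smoking. The gap between the two figures grows with the dropout rate.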
2.3.4 Nonresponse
An aspect of volunteering can occur in survey studies. People who volunteer to return a questionnaire are frequently quite different from those who refuse to fill out the survey. In particular, the non-responders tend to be more apathetic. Return rates for surveys vary by the type of survey, but if less than half of the subjects returned the survey, any results are of very limited value. Again, look for efforts to minimize non-response and/or efforts to characterize the demographics of non-responders.
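One way to see why a low return rate leaves survey results of limited value is to bound the true proportion by assuming, in turn, that every non-responder would have answered "no" and then that every one would have answered "yes". This is a worst-case sketch rather than a method proposed in the text; the function name and the example figures are assumptions for illustration.

```python
def nonresponse_bounds(response_rate, yes_among_responders):
    """Worst-case bounds on the population 'yes' proportion under nonresponse."""
    if not 0 <= response_rate <= 1 or not 0 <= yes_among_responders <= 1:
        raise ValueError("rates must lie between 0 and 1")
    observed_yes = response_rate * yes_among_responders
    lower = observed_yes                        # every non-responder says "no"
    upper = observed_yes + (1 - response_rate)  # every non-responder says "yes"
    return lower, upper


# Hypothetical survey in which 40% of responders answer "yes".
for response_rate in (0.9, 0.5, 0.3):
    lower, upper = nonresponse_bounds(response_rate, 0.40)
    print(f"response rate {response_rate:.0%}: "
          f"true proportion between {lower:.0%} and {upper:.0%}")
```

At a 50% response rate the bounds in this example already span 50 percentage points, which is one way to read the warning about surveys returned by fewer than half of the subjects.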
Example: Two researchers examined general practitioners who routinely failed to return mail surveys (Stocks 2000). A follow-up telephone call assessed demographic characteristics of this group. They were older, less likely to have postgraduate qualifications, and less likely to be involved with a teaching practice.
The use of email and the Internet to recruit and/or survey subjects is problematic, because not everyone owns or uses a computer. One study recruited cigarette smokers both by the Internet and by regular mail (Etter 2001). Those subjects recruited by the Internet differed in age, education, degree of smoking, and desire to quit. The authors of this report, however, argue that in spite of these demographic differences, the trends and associations found in the Internet recruited group matched those of the other group. For example, in both groups, light smokers were more likely than heavy smokers to adopt a "taking control" self-change strategy and less likely to adopt a "risk assessment" strategy.
Volunteer bias can be especially troublesome when you are examining issues that are considered by some people to be embarrassing or personal. Two American researchers examined the characteristics of people who were willing and unwilling to volunteer for studies about sexuality (Strassberg 1995). Volunteers had a more positive attitude towards sex, less guilt, and more sexual experiences.
2.3.5 Dropouts: What to look for.
It would be a rare research study that had absolutely no dropouts, so you don't want to be too fussy.
- First, you need to look for the proportion of patients who drop out.
- Second, look for a description of who dropped out. Is this group different from those who completed the study?
- Third, can you infer something about the dropouts and impute a reasonable value for their outcome?
2.4 Who stopped or switched therapies?
When you give a new drug to your patients, unless you watch them as they swallow the pill, you have no guarantee that they took the drug. This is also true for most research studies. The research subjects may not comply with the demands of the study. They may take only some of the medication, may stop taking the medication entirely, or may even switch to the competing medicine. Issues involving compliance are difficult to handle and there is no perfect way to analyze these patients.
Problems with compliance will usually end up diluting the impact of the new therapy. At the extreme, if 100% of your patients are non-compliant in both arms of the study, then you will surely see no difference between any two drugs. Although I discuss compliance from the perspective of a drug study, it is also an issue in non-drug studies. If a patient fails to show up for therapy sessions, or forgoes a required operation, that has the same issues and problems as noncompliance with a drug regimen.
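The dilution effect can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that noncompliant patients in the treatment arm fare exactly as well as control patients; the success rates are hypothetical.

```python
def observed_difference(control_rate, treated_rate, compliance):
    """Expected difference in success rates when only a fraction of patients comply.

    Assumes noncompliant patients in the treatment arm do exactly as well
    as control patients (an illustrative simplification).
    """
    treatment_arm = compliance * treated_rate + (1 - compliance) * control_rate
    return treatment_arm - control_rate


# Hypothetical drug: 50% success when taken, 30% success otherwise.
for compliance in (1.0, 0.75, 0.5, 0.0):
    diff = observed_difference(0.30, 0.50, compliance)
    print(f"compliance {compliance:.0%}: observed difference {diff:.1%}")
```

Under this simplification the observed difference shrinks in direct proportion to compliance and vanishes entirely when no one takes the drug.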
2.4.1 Intention to treat
The intuitive approach is to remove from your study any patients who fail to comply with the protocol. This approach has its merits, but is generally avoided. What most researchers use instead is an "intention to treat" (ITT) approach. With ITT, the patients are analyzed in the groups to which they were originally randomized regardless of how much or how little medication they have taken. In fact, if some of the patients have the opportunity to switch to the competing drug (or therapy) and do so, with ITT, you still analyze them as if they took the drug they were originally assigned to.
There are several reasons why many researchers use ITT. First, researchers will often go to a lot of trouble to ensure randomized assignment in the study. Researchers in surgery have been known to take a sterilized coin into the operating room to choose which surgery to perform (Hollis 1999). When you go to such great lengths to use randomization, you don't want to abandon it without a fight. And when patient choices about whether they comply with the protocol start to determine who gets analyzed in which group, you lose randomization and all the benefits that it confers.
Second, with ITT, you get a more realistic picture of the new drug or therapy. If a drug or therapy is difficult to comply with, then that difficulty ought to be considered as part of the whole package. If noncompliance for a difficult to tolerate drug dilutes the impact of that drug, then that's worth knowing. Keep the noncompliant patients in because you will likely encounter the same patients among those who you regularly treat.
Third, ITT can prevent some serious biases in the research. Consider a new surgical therapy which is being compared to a standard non-surgical therapy. Some patients randomized to the surgical therapy might die prior to receiving the therapy. This is the most extreme form of non-compliance. These patients should still be analyzed as part of the surgical therapy group. Otherwise the rapidly dying patients will be excluded from the treatment group, but not from the control group, leading to serious bias.
As a general rule, noncompliant patients will usually have worse outcomes than compliant patients. In fact, there is solid evidence that patients who fail to comply with a placebo have worse outcomes than patients who comply with a placebo (Coronary Drug Project Research Group 1980; Horwitz 1990). I was quite amazed when I first saw evidence of this, but it actually makes sense. Patients who comply poorly with a placebo probably have other poor self care habits.
2.4.2 What an analysis that excludes noncompliant patients will tell you.
Even though ITT is widely used, there still is a place for the analysis that excludes noncompliant patients. This analysis answers the question, what will happen if I prescribe this drug to a group of patients who all take the drug regularly? The ITT analysis answers a different question: what will happen if I prescribe this drug to a group of patients that includes both compliant and noncompliant patients? It may help to know the answers to both questions.
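The two analyses can be contrasted on a toy data set. The records below are invented for illustration, and the field names (`assigned`, `complied`, `improved`) are assumptions rather than anything taken from a cited study.

```python
from collections import namedtuple

Patient = namedtuple("Patient", ["assigned", "complied", "improved"])

# Hypothetical trial records: assigned arm, whether the patient complied,
# and whether the patient improved.
patients = (
    [Patient("drug", True, True)] * 6
    + [Patient("drug", True, False)] * 2
    + [Patient("drug", False, False)] * 2
    + [Patient("placebo", True, True)] * 4
    + [Patient("placebo", True, False)] * 4
    + [Patient("placebo", False, False)] * 2
)


def improvement_rate(records, arm, compliant_only=False):
    """Share of patients in an arm who improved.

    compliant_only=False gives the intention-to-treat rate (everyone as randomized);
    compliant_only=True gives the rate among compliant patients only.
    """
    group = [
        p for p in records
        if p.assigned == arm and (p.complied or not compliant_only)
    ]
    return sum(p.improved for p in group) / len(group)


for label, compliant_only in (("intention to treat", False), ("compliant only", True)):
    drug = improvement_rate(patients, "drug", compliant_only)
    placebo = improvement_rate(patients, "placebo", compliant_only)
    print(f"{label:20s} drug {drug:.0%}  placebo {placebo:.0%}  "
          f"difference {drug - placebo:.0%}")
```

In this invented data set the ITT comparison is 60% versus 40%, and the compliant-only comparison is 75% versus 50%. Both are legitimate summaries, but only the first compares the groups as they were randomized.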
Example: The MRFIT trial was a randomized comparison of a special intervention to usual care (Cutler 1991). The special intervention encouraged smoking cessation and dietary changes. A comparison of the groups as they were randomized represented a comparison of special encouragement to change against usual care. A comparison of the groups that actually changed represented a different comparison, because some of the people in the special intervention ignored the advice and some of the people in the usual care group changed their habits on their own. This second comparison was of nonrandomized groups, since the patients themselves determined which group they belonged to. Nevertheless, it was interesting, because it involved a comparison, not of the encouragement itself, but of the actual changes that were being encouraged.
2.4.3 Excluding noncompliant patients before the study starts.
Since noncompliant patients can dilute the impact of a new drug, one dubious approach that researchers take is to not let these noncompliant patients into the study at all (Senn 1997). A placebo drug is given to all patients during a single blind run-in period, and anyone who does not comply with the placebo is excluded from the study.
The intent of this exclusion seems good on the surface. Problems with compliance will tend to dilute the effectiveness of a new therapy. At the extreme of 0% compliance, there is no possible way to detect any difference in effectiveness. So excluding noncompliant patients before the study starts will avoid this dilution effect.
The problem is that the researchers have jumped from the frying pan of compliance problems into the fire of poor generalizability. Unlike the researchers, you do not have the option of only treating patients who are compliant. And you will not have any reasonable way to screen out those noncompliant patients for special handling. So excluding noncompliant patients causes the same problems as excluding children, women, or the elderly.
2.4.4 Intention to treat: What to look for.
When you are looking at compliance issues, consider the following questions:
- Was any attempt made to assess compliance?
- Was the compliance level similar to patients seen in your practice?
- Would additional analysis using the treatment actually received answer a different, but still important question?
2.5 Summary - Who was left out?
Exclusion of subjects can make the study biased or less generalizable.
Who was excluded at the start of the study? Excessively strict entry criteria in a research study can make it difficult to extrapolate to the types of patients that you normally see.
Who refused to join the study? Do the volunteers differ substantially from refusers in ways that might influence the outcome of the study?
Who dropped out during the study? Did these dropouts have a different prognosis?
Who stopped or switched therapies? If there are compliance issues, handle the non-compliant patients carefully.
This webpage was written by Steve Simon on (unknown date), edited by Steve Simon and Linda Foland, and was last modified on 2008-07-08.
