Understanding External Validity | Emergency Physicians Monthly

A study’s external validity is threatened if there may be systematic error in the way its results are applied to patients outside the precise study set. We should always be concerned about external validity when the patients studied differ noticeably, in characteristics essential to the goal of the study, from the patients to whom we ourselves might want to apply the results.

Dr. Jerome Hoffman’s final installment on how to properly read medical literature.

In a study of patients with acute MI, those with more than 6 PVCs per minute were shown to have increased mortality. The authors concluded that all people with PVCs of similar frequency should receive long-term anti-arrhythmic therapy. Because of the influential nature of the study, many physicians changed their practice, and countless patients were so treated. Several years later, a series of large studies showed that mortality is higher in patients with a history of MI who are treated with anti-arrhythmics than in those given placebo (even though the treated patients have many fewer PVCs on routine Holter monitoring).

This is true in part because anti-arrhythmics are not benign drugs and can cause sudden death; the benefit in controlling arrhythmias may be outweighed by the risks of treatment. But it is also true that while PVCs are a risk factor during acute ischemia, they are much less so at other times, even in patients with ischemic disease. Thus, it is inappropriate to extrapolate from a group having an acute MI to all other patients. Such extrapolation did in fact create havoc with regard to PVCs, and the same thing is done all the time when the study group is different from the universe of patients to whom the data are applied.

Numerous patients (though a small percent of all patients so treated) have suffered strokes or MIs, or even died, when they were given acute treatment for asymptomatic “hypertensive urgencies,” based on inappropriate extrapolation of data. Long-range morbidity and mortality are indeed increased by failure to control blood pressure over time (and the higher the blood pressure in question, the greater the effect). At the same time, short-range mortality is increased by failure to lower blood pressure acutely in patients with end-organ damage from hypertensive crisis. However, application of data from these two settings to asymptomatic hypertensives, even those with very high pressures, is not appropriate, since there is no evidence whatsoever that acute treatment is beneficial to them. Furthermore, chasing their blood pressure will lead to relative hypoperfusion of heart and brain in a few of them, with potentially disastrous consequences.

Another example is the “benign” nature of diuretic-induced hypokalemia. Large series involving patients with chronic hypertension have shown no increase in the incidence of ambulatory arrhythmias on Holter monitoring in patients treated with diuretics, despite evidence of hypokalemia. What these series fail to address is the risk to such patients not when they’re walking around asymptomatic, but when they develop acute ischemia. It is clear that at the time of acute ischemia, hypokalemia is a critical risk factor for life-threatening arrhythmias.

You also shouldn’t extrapolate “beyond the range of the data.” Thus, a drug that works in moderate hypertension cannot be assumed to work as well in severe, or mild, hypertension. Decreased ICP associated with lowering pCO2 to 25-30 mm Hg cannot be assumed to mean that lowering pCO2 further to 15 mm Hg would have added benefit.

One study involved autopsies on all patients who died in a given hospital. Chart reviews showed that 50% of those with pathologic evidence of MI were undiagnosed before death. The authors also found that such “missed MIs” were characterized by atypical histories and non-diagnostic EKGs. They concluded that half of all MI patients are missed by clinicians, and that in order to do better, we need to further evaluate all patients with atypical histories and non-diagnostic EKGs.

The fallacy, of course, is that in clinical practice we are not routinely evaluating dead patients. Perhaps the MIs weren’t missed, but occurred as a terminal event in fatally ill patients. Perhaps various aspects of the atypical presentation (for example, pain in the jaw rather than in the chest, or primary shortness of breath) correlate with a more severe clinical picture that led to death in these patients, and occur far less often in patients who don’t die. Perhaps clinicians miss only a very small percentage of MIs (there’s good evidence from large series that we miss somewhere between 1% and 5%), and the far higher number in this series of dead people partially reflects that mortality is heightened when the disease is unrecognized, which is why misses are over-represented in such a group. This calculates out quite nicely, in fact. If we miss 5% of 1000 MIs (50 patients), and half of those die (25 dead), while at the same time death occurs in about 25 of the 950 patients whose MIs were diagnosed (roughly 2.6%, a reasonable outcome in the US), we’d find that half of the MIs discovered at autopsy were not recognized ante-mortem!

The important point is that it would be entirely inappropriate to suggest either that we’re missing half the patients who present with an MI, or that we would improve our accuracy in diagnosing all patients with chest pain by evaluating the findings in this small and atypical group of patients who go on to die from MI.
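
For readers who like to check such arithmetic, the autopsy calculation can be sketched in a few lines of Python. This is only an illustration: the function name and its parameters are invented for the example, and the inputs are the hypothetical figures from the text (1000 MIs, a 5% clinical miss rate, death in half of the missed cases), with about 25 deaths assumed among the 950 diagnosed patients.

```python
from fractions import Fraction


def missed_share_at_autopsy(total_mis, miss_rate, death_rate_missed, death_rate_diagnosed):
    """Return the fraction of fatal MIs that were unrecognized before death.

    All inputs are illustrative assumptions, not measured data.
    """
    missed = total_mis * miss_rate                     # MIs never diagnosed in life
    diagnosed = total_mis - missed                     # MIs recognized clinically
    deaths_missed = missed * death_rate_missed         # deaths among missed MIs
    deaths_diagnosed = diagnosed * death_rate_diagnosed  # deaths among diagnosed MIs
    return deaths_missed / (deaths_missed + deaths_diagnosed)


# Hypothetical figures from the text: 50 missed MIs, 25 of whom die,
# plus about 25 deaths among the 950 diagnosed patients.
share = missed_share_at_autopsy(
    total_mis=1000,
    miss_rate=Fraction(5, 100),
    death_rate_missed=Fraction(1, 2),
    death_rate_diagnosed=Fraction(25, 950),
)
print(f"Clinical miss rate: 5%; share of autopsy MIs that were missed: {float(share):.0%}")
```

Under these assumptions, a 5% miss rate among living patients shows up as a 50% miss rate in the autopsy series. Changing the assumed death rates shifts the exact figure, but as long as mortality is much higher when the MI goes unrecognized, misses will be over-represented among the dead.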

Conclusions
While it is fairly easy to gather lots of “data,” it is much more difficult to perform meaningful research, particularly in the type of clinical settings where many factors are uncontrolled and many confounding variables may affect results. It is important for us, as readers, to understand basic concepts of validity and bias, basic statistical manipulations, and the manner in which results from the sample groups studied can be extrapolated to the universe of other patients that the rest of us see. Only by reading critically and being armed with these concepts will we be able to sift the wheat from the chaff, learning from the mass of fairly raw material with which we are so regularly presented.
