Wednesday, June 26, 2013

Standard Errors Can Be Deadly

Copyright, The New York Times Company

Whether you like or dislike the Affordable Care Act, it helps to understand the distinction between statistical significance and practical importance.

Taxpayers want to know whether the government programs they pay for actually make a difference, so measurement is a critical part of policy evaluation. That’s where econometric and statistical analysis comes in: gather data under various policy implementations and try to measure the differences.

Take the act’s major expansion of Medicaid benefits, going to able-bodied adults living at or below 133 percent of the poverty line (without regard to asset ownership), which will occur in most states at the beginning of next year. Will the expansion make adults healthier?

An important study, published in The New England Journal of Medicine in May, set out to help answer the question by examining an Oregon-specific Medicaid expansion in 2008. In their short summary of conclusions, the authors wrote, “Medicaid coverage generated no significant improvements in measured health outcomes.” Opponents of the Medicaid expansion quote that sentence, telling us that we might not want to expand a health program that doesn’t actually make people healthier. Proponents of the expansion acknowledge the conclusion, too, and try to help readers find encouraging results elsewhere in the study.

Among the proponents, Ezra Klein wrote, “The health care itself didn’t work as well as we hoped.” In the same issue of The New England Journal of Medicine where the study appeared, a separate editorial noted, “Medicaid coverage did not significantly improve blood-pressure control, cholesterol levels or glycated hemoglobin levels,” but insisted that the study demonstrated that Medicaid provided “considerable financial protection” and “dramatically improved access to care.” Josh Barro wrote that the study “did not find significant effects on the physical health measures that were tracked,” adding, “This is bad news for advocates of the Medicaid expansion.”

In the 2008 book “The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives,” Deirdre McCloskey and Stephen Ziliak warned that applications of statistics can and do go awry. They scold analysts who

look only for a probability of success in the crossing — the existence of a probability of success better than 0.99 or 0.95 or 0.90, and this within the restricted frame of sampling — ignoring in any spiritual or financial currency the value of the prize and the expected cost of pursuing it.

When a study measures a statistically significant difference, that only means the difference is unlikely to be a fluke of sampling variation in a study of that size, and that a similar difference would most likely appear in other samples. When a difference is not found to be statistically significant, it remains possible that other samples would find differences in the opposite direction.

Statistical significance has little to do with “significance” as understood by laymen, who think of practical importance when they read that term. Professors McCloskey and Ziliak urge those citing statistics to recognize the distinction and pay attention to practical importance.
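To see the distinction in numbers, here is a small Python sketch of my own; the proportions and sample sizes are invented for illustration, not drawn from the Oregon data. A trivial difference can register as statistically significant in an enormous sample, while a large, practically important difference can fail the significance test in a small one.

    import math

    def two_proportion_z(p1, n1, p2, n2):
        # Pooled two-sample z statistic for a difference in proportions.
        p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    # A 0.3-point gap in an enormous sample: statistically "significant," trivial in practice.
    z_big = two_proportion_z(0.163, 500_000, 0.160, 500_000)
    # A 6.3-point gap in a small sample: practically important, yet "not significant."
    z_small = two_proportion_z(0.163, 200, 0.100, 200)

    print(f"trivial gap, huge sample: z = {z_big:.2f} (|z| > 1.96, so 'significant')")
    print(f"large gap, small sample:  z = {z_small:.2f} (|z| < 1.96, so 'not significant')")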

The commentators on both sides of the Medicaid debate made exactly this error. At best, they use “significance” in the statistical (and, in a journalism setting, esoteric) sense without cautioning people what it means, leaving practical importance as a footnote or side comment. Perhaps the commentators themselves misunderstand the study and think it found only practically unimportant differences. (I thank Darius Lakdawalla, a founding partner of Precision Health Economics, for bringing to my attention this issue with the Medicaid debate; see also commentaries by Jim Manzi and the Incidental Economist blog).

Economists are supposed to distinguish statistical significance from practical significance and sometimes do (as in a study by Jesse Rothstein of the University of California, Berkeley), but at least two economists have joined the commentators this time.

Table 2 in the study of Oregon published in The New England Journal of Medicine shows that 16.3 percent of the control sample without access to Medicaid had elevated blood pressure. The effect of Medicaid coverage is (with 95 percent confidence, according to the study) somewhere between -7.2 percentage points and +4.5 percentage points. By the standard of statistical significance, the study is consistent with the view that a Medicaid expansion would reduce elevated blood pressure prevalence from 16.3 percent to, say, 10 percent. (An effect of -6.3 percentage points is within what statisticians call the confidence interval.)

I’m not sure what Ezra Klein was hoping for, but a practical person could well interpret that kind of reduction as health care that does “work.” (Note, however, that the study is also consistent with the view that Medicaid has no effect, because 0.0 is also in the confidence interval.)
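Readers who want to check that arithmetic can do so with a few lines of Python. The standard error below is my back-of-the-envelope reconstruction from the published interval, assuming the usual symmetric normal approximation, not a figure reported by the authors.

    # The interval reported for Medicaid's effect on elevated blood pressure,
    # in percentage points, as quoted above from Table 2.
    lower, upper = -7.2, 4.5

    point_estimate = (lower + upper) / 2        # about -1.4 points
    std_error = (upper - lower) / (2 * 1.96)    # about 3.0 points, if the interval is symmetric

    print(f"implied point estimate: {point_estimate:+.1f} points, standard error about {std_error:.1f}")
    for effect in (-6.3, 0.0):                  # a drop from 16.3% to 10%, and no effect at all
        print(f"effect of {effect:+.1f} points lies inside the 95% interval: {lower <= effect <= upper}")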

The same table shows that 5.1 percent of the control sample had glycated hemoglobin high enough to be considered an indicator that diabetes is severe and not well-controlled. The effect of Medicaid coverage on the prevalence of this condition is between -4.4 and +2.6 percentage points. Cutting prevalence from 5.1 percent to, say, 3 percent is practically important and consistent with the findings of the study.

To put it another way, the only way the study could have found a statistically significant result here would be for Medicaid to essentially eliminate an important symptom of diabetes in a two-year time frame. Medicaid coverage could be quite valuable without passing that standard (even the Supreme Court has looked at this issue and concluded that statistical significance is not the only reliable standard of causation).
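The same back-of-the-envelope arithmetic, again assuming a symmetric normal-approximation interval, shows how large an estimated effect on the diabetes marker would have had to be before it registered as "statistically significant."

    # Reported interval for the effect on poorly controlled diabetes (glycated
    # hemoglobin), in percentage points, against a 5.1% baseline prevalence.
    lower, upper = -4.4, 2.6
    baseline = 5.1

    std_error = (upper - lower) / (2 * 1.96)    # about 1.8 points
    detectable = 1.96 * std_error               # about 3.5 points in either direction

    print(f"smallest estimate that would register as 'significant': about {detectable:.1f} points")
    print(f"that is roughly {100 * detectable / baseline:.0f} percent of the 5.1% baseline")

On these numbers, only an estimated reduction of roughly two-thirds of the baseline prevalence, within two years, would have cleared the significance bar.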

The authors of the study appear to be aware of these issues, because they note toward its end, “Our power to detect changes in health was limited by the relatively small numbers of patients with these conditions.” If you are wondering why so much of the article, especially its front page, fails to qualify its use of “significance” as statistical, the authors tell me that the journal editors insist that authors use the word “significance” when they really mean “statistical significance,” in order to shorten the sentences. Perhaps that’s why the journal has a history of spawning misunderstandings like this; see Chapter 16 of “The Cult of Statistical Significance.” (I asked the journal’s editors to respond on this point and have not yet heard back.)

If the Oregon study prevents even one state from expanding its Medicaid program, Affordable Care Act proponents could assert that, as Professors McCloskey and Ziliak predicted, emphasis on statistical significance has proven to be deadly. Even if you think, as I do, that the law has fatal flaws, the Oregon study of Medicaid is not the place to find them.


