On my way home from picking up my kids from school, I heard a story on NPR that included a line from one of the editors of the New England Journal of Medicine, which I thought was worth repeating here.
They were discussing an article in this month’s NEJM about [Vioxx/rofecoxib][nejm]. The article is a correction to an earlier NEJM article that concluded that the cardiac risks of Vioxx were not really significant until around 18 months of continued use. With more data available, it appears that the 18-month threshold was just an artifact, not a real phenomenon: the data appear to show that the cardiac risks of Vioxx start very quickly.
[nejm]: http://content.nejm.org/cgi/reprint/NEJMc066260v1.pdf
As usual for something like this, the authors and the corporate sponsors of the work are all consulted in the publication of a correction or retraction. But in this case, the drug maker objected to the correction, because they wanted it to include an additional analysis, one which appears to show an 18-month threshold for cardiac risk.
The editor’s response, paraphrased (since I heard it on the radio and don’t have the exact quote):
“That’s not how you do science. You don’t get all of the data, and then
look for an analysis that produces the results you want. You agree on the
analysis you’re going to do *before* you start the study, and then you use
the analysis that you said you were going to use.”
Hey [Geiers][geier], you listening?
[geier]: http://goodmath.blogspot.com/2006/03/math-slop-autism-and-mercury.html
I would say that you are slightly misrepresenting the situation, at least as I understand it. My understanding is that the Bresalier paper stated in the methods section that they were including time as a covariate in their regression models, when, in reality, they used log(time) (or it was the other way around, can’t remember). On the one hand I agree that this was a major boo-boo, but it was only a boo-boo… especially since the p-value changed modestly from 0.03 to 0.07. This is one of those times where I think it is hard to decide how real those p-values are. I’d love to see what the bootstrapped p-values are.
That’s not how you “do science,” as most of us who do science know. That’s how you do a clinical trial for FDA regulatory purposes. Science, like mathematics, is finding patterns. To say that’s how you do science is like saying that you do mathematics by proving theorems. Most mathematicians I know figure things out by conjecturing and experimenting, not by setting down a proof method ahead of time or even first proving things.
martin is also correct, I believe. Whether you use log(time) or time is an analytical decision, and this is a biostatistician’s worst nightmare: straddling the imaginary dividing line of p = 0.05. This isn’t to defend Big Pharma, who I spend a good deal of time attacking on Effect Measure. But this isn’t the huge deal it is being made out to be. If this weren’t an FDA regulatory issue, it wouldn’t be an issue. But it is. There is really no difference in the p-values worth speaking of, IMO. In neither case do these values support the null.
martin, revere:
As I said, I heard this on the radio with two kids in the car. I don’t know the specifics of the error that led to the authors publishing a correction; I’m just repeating what I recall as a way of setting up what I thought was really interesting, which was the response of the editor from the NEJM.
Obviously, you should take it in the context of the kind of work that he was talking about (that being clinical trials of drugs). But while there are many differences in the specifics of how you do different kinds of science, the underlying point is, I think, the distinguishing characteristic of real science versus crackpottery:
Real science looks at data about reality to try to understand it. The real world comes first – it’s the thing we want to understand. We look at information about the world, at data we collect, and at things we observe, and we analyze those to come to a conclusion. We start with the observation of reality, and try to explain it. We don’t start with an explanation and try to shape reality to fit our explanation.
The way that I interpret his statement is in exactly that vein: if you’re doing a clinical trial, you do a lot of things to ensure that the results of analyzing your data aren’t biased towards a predetermined conclusion. Since you can almost always find a way of analyzing and presenting data to make it say what you want it to, you decide, *in advance*, how the data is going to be analyzed. You eliminate that way of cheating the results – by establishing up-front how you’re going to collect and analyze your data.
If I may use a software engineering analogy: you write the tests before you write the implementation.
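To push the analogy a little further, here’s a minimal sketch in Python (the function and test names are made up purely for illustration): the test pins down what “correct” means before any implementation exists, just as a pre-registered analysis plan pins down how the data will be judged before any data exist.

```python
# Step 1: write the test first. It commits, up front, to what a correct
# answer looks like, before there is any implementation to be tempted by.
def test_mean_of_known_values():
    assert mean([1, 2, 3, 4]) == 2.5

# Step 2: only then write the implementation, which has to satisfy the
# test as written -- not a test adjusted after the fact to match the output.
def mean(values):
    return sum(values) / len(values)
```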
You’re simplifying this horribly! I should acknowledge that I don’t work on clinical trials, so there may be differences because of regulation, but in general a statistician has to be able to change the analysis after looking at the data. I don’t know the details either, but from martin’s comments, the change from time to log(time) is reasonable, and could be done to improve the fit of the model. We all do this, because it is bad statistics to report a model that doesn’t fit the data.
Of course, one needs to be able to justify the model used in the analysis: that’s why model checking is so important. In this case, a plot of the residuals against time in the model with time untransformed should be curved. Actually, I’ve log-transformed covariates just because their distribution was so positively skewed that the largest values were dominating the relationship. Of course, one then checks the new model to see if it fits.
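To sketch what that kind of check looks like in practice, here’s a toy example in Python with simulated data (it has nothing to do with the actual trial data or the models used there): fit the same response against the covariate untransformed and log-transformed, and compare.

```python
# Toy sketch of the model check described above. Simulated data only --
# this is not the actual trial data or analysis.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
months = rng.uniform(1.0, 36.0, size=200)                    # follow-up time
y = 2.0 + 1.5 * np.log(months) + rng.normal(0.0, 0.5, 200)   # true effect is in log(time)

for label, covariate in [("time", months), ("log(time)", np.log(months))]:
    fit = sm.OLS(y, sm.add_constant(covariate)).fit()
    print(f"{label:>10}: R^2 = {fit.rsquared:.3f}")
    # In practice you'd also plot fit.resid against the covariate:
    # systematic curvature in the untransformed model is the signal
    # that the transformation is needed.
```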
As for the p-values, the change from 0.03 to 0.07 is nothing: people should stop obsessing about it. Either way there is weak evidence that the statistic does not equal 0.
Phew. That’s my therapy for today.
Bob
Bob:
Of course I’m simplifying horribly :-). This is a major topic in experimental work: to do justice to the topic, I’d need to write an entire textbook. No matter how carefully you phrase it in one or two sentences, you’re going to be vastly oversimplifying.
I still think that the basic idea of the original quote is great.
I’m no statistician, but I do teach statistics to biology students (there’s a Dutch saying which translates literally as “in the land of the blind, One-Eye is king”), and a lot of biology students come to me for advice on how to analyze their data. From this experience I can say that (1) students (and their supervisors!) should spend much more time thinking about experimental design, and (2) it’s usually impossible to completely specify the analysis before the data come in. One simply doesn’t know in advance whether, for example, variance-stabilizing transformations are going to be needed.
Well, let’s break it down:
“That’s not how you do science. You don’t get all of the data, and then look for an analysis that produces the results you want.”
True.
“You agree on the analysis you’re going to do *before* you start the study, and then you use the analysis that you said you were going to use.”
Not so much, except maybe in clinical trials. The second statement doesn’t follow from the first, although the structure of the argument suggests that it does.