[Originally posted at Natural Variation.]
I need to ask for the reader's indulgence, as this post is not about autism, except insofar as determining the merit of correlations has become a perseveration of mine. You see, it is trivial to come up with naive correlations of autism trends vs. practically anything about the modern world. The administrative prevalence of autism has risen almost continuously for as long as records have been kept. Concurrent upward trends of nearly anything, from vaccines to environmental pollution, from trans fats to electromagnetic radiation, and so on, are easy to come by.
In my latest post at LB/RB I suggested that instead of correlating trends in a naive manner, we could attempt to correlate the residuals of time regression models of each trend. A residual is a delta or difference between an observed value and a modeled value. (Here's a concise explanation).
When modeling real-world phenomena, regression models will never (or almost never) be perfect fits. For all sorts of reasons, even if only random fluctuation, there will be deviations from a modeled trend. If there's a causative relationship between two trends, the residuals of (or deviations from) corresponding close-fitting regression models should correlate with one another as well. By this I don't mean that the residuals should always be in the same direction; but they should be in the same direction more often than not, on average.
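For readers who prefer code to spreadsheets, here is a minimal sketch of the idea in Python with NumPy. It is only an illustration under stated assumptions: the helper name `poly_residuals` is made up for this example, the two series are synthetic (they share a random fluctuation on top of unrelated trends), and nothing here is the actual data or workbook behind this post.

```python
import numpy as np

def poly_residuals(t, y, degree=3):
    """Observed minus modeled values for a polynomial fit of y against t."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares polynomial fit; Polynomial.fit rescales t internally,
    # which keeps the fit well conditioned for inputs such as calendar years.
    model = np.polynomial.Polynomial.fit(t, y, degree)
    return y - model(t)

# Synthetic illustration only (not real climate or autism data).
rng = np.random.default_rng(0)
years = np.arange(1850, 2005)
shared = rng.normal(size=years.size)  # fluctuation common to both series

series_a = 0.01 * (years - 1850) ** 2 + shared
series_b = 0.5 * (years - 1850) + 2.0 * shared + rng.normal(scale=0.5, size=years.size)

res_a = poly_residuals(years, series_a)
res_b = poly_residuals(years, series_b)

# Correlation of the residuals, not of the raw upward trends.
r = np.corrcoef(res_a, res_b)[0, 1]
print(f"residual correlation: {r:.2f}")
```

Two unrelated series that merely both go up would show a strong naive correlation but residuals that hover around zero correlation; here the residuals correlate because the series were built to share their year-to-year fluctuations.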
The nice thing about this technique is that it is completely accessible to anyone with Excel installed. It can also be illustrated graphically, as the reader will see.
So it occurred to me to test this idea in a different field of science where there's controversy over correlation vs. causation. I thought global warming would be a great candidate. After all, the spoof about a decrease in the number of pirates correlating with many other arbitrary trends appears to originate in the global warming debate (see this).
To summarize what I found, there is a strong and statistically significant correlation between cumulative human CO2 emissions and northern hemisphere temperature anomalies. Because of the methodology used, I'm quite confident this cannot be explained by coincidence, data collection errors, solar output as a confound, or causation in the opposite direction.
Now, I fully recognize that I'm only superficially familiar with the debate over anthropogenic global warming. I am also not versed in climatology. Therefore, I cannot be entirely sure that this type of analysis hasn't been done before. Google and Google Scholar searches didn't seem to turn up anything, and given the importance of the topic, I thought it was not only prudent but necessary to put this evidence out there. As always, scrutiny and discussion are welcome.
Northern hemisphere temperature data from 1850 to 2004 was obtained from the Climatic Research Unit of the University of East Anglia, UK.
Global CO2 emission data was obtained from CDIAC. I did not use CO2 atmospheric concentration data because temperature increases can theoretically cause this concentration to increase. Human emissions are what we're interested in. More specifically, I calculated cumulative CO2 emissions for every year since 1850. Greenhouse temperature anomalies are presumably caused by the total amount of CO2 in the atmosphere, not by the emissions in any given year. Since CO2 stays in the atmosphere for 50 to 200 years (source), modeling the cumulative human contribution of CO2 should be adequate.
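The cumulative series is just a running total of the annual figures. A minimal sketch, assuming the annual emissions are already loaded into a list ordered by year; the numbers below are toy values chosen to make the arithmetic obvious, not CDIAC data.

```python
from itertools import accumulate

# Toy annual emissions, one value per consecutive year (NOT real CDIAC figures).
years = [1850, 1851, 1852, 1853]
annual_emissions = [1.0, 2.0, 3.0, 4.0]

# Running total: each entry is the sum of all emissions up to and including that year.
cumulative_emissions = list(accumulate(annual_emissions))

for year, total in zip(years, cumulative_emissions):
    print(year, total)
```

Note that a plain running total treats every emitted unit as staying in the atmosphere indefinitely, which is the simplification described above.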
Figure 1 is a graph of the general time trends of these two sets of data. It also shows the modeled trend lines we will use to calculate residuals. In this analysis we're using third-order polynomial models. They seem to give a considerably closer fit than second-order polynomial models.
I calculated the residuals and built a scatter graph matching cumulative CO2 (X axis) and temperature (Y axis) residuals for each year from 1850 to 2004. As expected, the slope of a linear regression of the scatter was positive (1.9 × 10^-5) and statistically significant (95% confidence interval 1.13 × 10^-5 to 2.66 × 10^-5).
[Note: Instructions on how to calculate the slope confidence interval of a linear regression with Excel can be found here.]
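Outside of Excel, the same kind of interval can be sketched with nothing but the Python standard library. This is an approximation under stated assumptions: the function name `slope_confidence_interval` is made up for this example, it uses the normal distribution for the critical value where a spreadsheet regression would normally use the t distribution (the two are close for roughly 150 data points, but not for the five toy points below), and the sample data is invented.

```python
from math import sqrt
from statistics import NormalDist, mean

def slope_confidence_interval(x, y, level=0.95):
    """Return (slope, lower, upper) for a simple least-squares line y = a + b*x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    # Normal approximation to the critical value (assumption; see text above).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * std_err, slope + z * std_err

# Invented toy data, roughly y = 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]
slope, lower, upper = slope_confidence_interval(x, y)
print(slope, lower, upper)
```

A slope is called statistically significant at the chosen level when the whole interval lies on one side of zero.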
I suspected, however, that there should be a lag between cumulative CO2 fluctuations and temperature fluctuations. It presumably takes some time for heat to be trapped. I proceeded to create a moving average trend line of the temperature residuals. It did in fact have a similar shape to the cumulative CO2 residuals graph, but it appeared to lag it by about 10 years. The reader should be able to roughly see this lag in Figure 1.
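I found the lag by eye, but it can also be looked for numerically. The sketch below is one possible way to do that, not the procedure used for this post: it smooths a series with a simple moving average and scans candidate lags for the one with the highest correlation. The function names and the synthetic series (a random driver and a noisy copy of it shifted by 10 steps) are assumptions made for illustration.

```python
import numpy as np

def moving_average(values, window):
    """Simple moving average; the result is shorter than the input by window - 1."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def lagged_correlation(x, y, lag):
    """Correlate x at time t with y at time t + lag (lag >= 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]  # drop the years that have no partner
    return np.corrcoef(x, y)[0, 1]

# Synthetic illustration: the response follows the driver 10 steps later.
rng = np.random.default_rng(1)
driver = rng.normal(size=200)
response = np.roll(driver, 10) + rng.normal(scale=0.3, size=200)

best_lag = max(range(0, 21), key=lambda k: lagged_correlation(driver, response, k))
print("best lag:", best_lag)
print(moving_average([1, 2, 3, 4, 5], 3))
```

Scanning many lags and keeping the best one is a form of multiple comparison, so a lag found this way deserves more caution than one chosen in advance.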
So I re-ran the whole analysis by only considering the years 1850 to 1997 and correlating CO2 residuals with residuals of temperature 10 years later. The correlation between these two sets of data is remarkable. Let's start with a bar graph of both sets of residuals, Figure 2.
Figure 2 is a good graph to get a subjective sense of the correlation. Let's see if the math confirms this. Figure 3 is the scatter graph of the residuals.
The slope of a linear regression of the scatter is 2.6 × 10^-5, and it is statistically significant (95% confidence interval 1.88 × 10^-5 to 3.33 × 10^-5). Even the 99.99999999% confidence interval is entirely positive. Unless anthropogenic global warming is a reality, there is no apparent reason why the residuals of cumulative human CO2 emissions should correlate so well with the residuals of temperature 10 years later throughout the last 150 years.
The slope of the scatter is actually steeper than expected, if you consider the naive correlation between cumulative CO2 emissions and temperature. There are probably several reasons for this. The one I believe to be the most likely is that over time CO2 does get removed from the atmosphere. Adding this consideration to the analysis should produce a more accurate slope. The other potential reasons don't bode so well for our species.
[Update 2/22/2010: I have written a follow-up titled Statistical Proof of Anthropogenic Global Warming v2.0.]