Wednesday, December 23, 2009

On the Sun, Planets, CO2 and Models

Climate forcing due to solar irradiance is not that difficult to figure out. Per the Stefan–Boltzmann law, we know that the equilibrium temperature (in kelvins) of a black body in the solar system is given by:

T' = k·S^(1/4)

S is the solar irradiance and k is a constant of proportionality. The value of S for planet Earth is often quoted as 1367.6 W/m².

It turns out that the temperature formula above is generally applicable to almost all the planets in our solar system. Figure 1 is a scatter chart of solar irradiance (to the 1/4th power) vs. the mean temperature of planets in our solar system.



Except for the planet Venus, it's clear we have a correlation so strong that it could only be the reflection of a straightforward law of Physics. Planet Earth actually falls a little bit outside the model. It's about 8 degrees Celsius warmer than the formula would predict. (I understand this is called the natural greenhouse effect.) If we remove Earth from the analysis, the correlation coefficient improves to 0.9995 and the slope of the linear trend becomes 46.163. Hence, the formula for the equilibrium temperature of any arbitrary planet, provided nothing unusual is going on with it, is the following.

T' = 46.163·S^(1/4)
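
As a quick sanity check, assuming the planets behave roughly like rapidly rotating black spheres with no albedo (so T = (S/(4σ))^(1/4) per Stefan–Boltzmann), the theoretical constant comes out very close to the fitted slope:

```python
# Sanity check: theoretical constant for a rapidly rotating black sphere.
# Absorbed power S*pi*R^2 balances emitted power 4*pi*R^2*sigma*T^4,
# so T = (S / (4*sigma))**0.25 and the constant is (1 / (4*sigma))**0.25.
sigma = 5.670e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)
k_theory = (1.0 / (4.0 * sigma)) ** 0.25
print(round(k_theory, 1))  # ~45.8, close to the fitted slope of 46.163
```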

Per Figure 1, again, this appears to be quite precise. With this formula we can begin to estimate what might happen if solar irradiance were to increase from, say, 1363.4 W/m² to 1366.7 W/m². This is the range of variation reported in the Lean (2000) solar irradiance reconstruction between the years 1610 and 2000.

With an increase of that magnitude, I estimate that equilibrium temperature would rise by only 0.2 degrees. It can't possibly explain a 0.8- to 1.6-degree anomaly by itself.
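
Here's the arithmetic behind that estimate, using the fitted formula above:

```python
# Equilibrium temperature change from the fitted formula T = 46.163 * S**(1/4),
# for the Lean (2000) irradiance range quoted above.
def equilibrium_temp(S):
    return 46.163 * S ** 0.25

delta = equilibrium_temp(1366.7) - equilibrium_temp(1363.4)
print(round(delta, 2))  # roughly 0.17 K, i.e. about 0.2 degrees
```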

CO2 Model

When I first started to look at data and claims related to climate science, and noticed the concept of temperature sensitivity to CO2 doubling, my impression was that equilibrium temperature could be approximated by a formula such as this one:

T' = baseline + a·ln(C)

C is the concentration of CO2, most likely in ppmv, and a is an unknown constant. Climate sensitivity would be given by a·ln(2).
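
Spelling that out: under this model, doubling the concentration from C to 2C changes the equilibrium temperature by a·ln(2·C) − a·ln(C) = a·ln(2), regardless of the starting concentration C.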

There are other kinds of formulas that model how CO2 affects radiative forcing, as opposed to equilibrium temperature. Those work in a different manner, but all of these formulas can only be approximations of the real world: they may be roughly applicable to our planetary reality, but they won't hold in general.

The first problem with the conventional doubling model is that it can't work for really small concentrations of the greenhouse gas. Imagine there are only 100 molecules of CO2 in the entire atmosphere, and we subsequently double this concentration to 200 molecules. Should we expect equilibrium temperature to increase by about 3 degrees as a result?

The second problem is that greenhouse forcing is temperature dependent. In particular, the model can't work at really low temperatures. It's clear from Figure 1 that a planet receiving zero solar irradiance would have a temperature of absolute zero. It would radiate no energy at all, and no (theoretical) greenhouse gas would change this.

Let me propose a model that subsumes the previous models and addresses the two problems outlined.

T' = 46.163·(S·(1 + a·ln(b·C + 1)))^(1/4)

When b·C is sufficiently large, the extra greenhouse forcing is proportional to the logarithm of C. When C is zero, the formula is reduced to the one derived from Figure 1. When solar irradiance is zero, temperature is still zero, regardless of C. When C increases from 100 to 200 molecules, the effect is negligible.
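
Here's a minimal sketch of the model, using arbitrary placeholder values for the unknown constants a and b, just to illustrate the limiting behavior described above:

```python
import math

def equilibrium_temp(S, C, a=0.1, b=0.01):
    """Proposed model: T' = 46.163 * (S * (1 + a*ln(b*C + 1)))**(1/4).
    The values of a and b are arbitrary placeholders, not fitted constants."""
    return 46.163 * (S * (1.0 + a * math.log(b * C + 1.0))) ** 0.25

print(equilibrium_temp(1367.6, 0.0))   # C = 0: reduces to 46.163 * S**(1/4)
print(equilibrium_temp(0.0, 380.0))    # S = 0: temperature is 0 regardless of C
# Doubling a minuscule concentration has a negligible effect:
print(equilibrium_temp(1367.6, 2e-19) - equilibrium_temp(1367.6, 1e-19))
```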

That's still an approximation in other ways, no doubt, but I expect to use it in the future.

Monday, December 7, 2009

A trick to "hide the decline" in autism data

The SwiftHack controversy reminded me of something that is done routinely in epidemiology when considering prevalence by birth year data.

I was thinking of what is perhaps the most cited email of SwiftHack; this one by Phil Jones:

From: Phil Jones
To: ray bradley ,mann@xxxxx.xxx, mhughes@xxxx.xxx
Subject: Diagram for WMO Statement
Date: Tue, 16 Nov 1999 13:31:15 +0000
Cc: k.briffa@xxx.xx.xx,t.osborn@xxxx.xxx

Dear Ray, Mike and Malcolm,
Once Tim’s got a diagram here we’ll send that either later today or first thing tomorrow. I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd from 1961 for Keith’s to hide the decline. Mike’s series got the annual land and marine values while the other two got April-Sept for NH land N of 20N. The latter two are real for 1999, while the estimate for 1999 for NH combined is +0.44C wrt 61-90. The Global estimate for 1999 with data through Oct is +0.35C cf. 0.57 for 1998.

Thanks for the comments, Ray.

Cheers
Phil


Some people seem to be under the impression that the "decline" refers to the decline in temperatures shown in some databases when you cherry-pick 1998 as the starting year in temperature slope calculations. The email was written in 1999, so that can't be it, first of all.

Second, how exactly do you fudge data by adding in "real" data? That doesn't make any sense on the surface.

Once you parse the statement further, you realize that "Mike's Nature trick" is a reference to Figure 5b of Mann, Bradley & Hughes (1998), a paper published in Nature whose first author is Mike Mann. The figure follows.



The figure is not very clear, unfortunately, because it's in black and white, but you can see it's simply a graph of multiple time series: (1) A historical temperature reconstruction up to 1980, (2) The instrumental record (referred to as "actual data" in the figure, and "real temps" by Phil Jones), and (3) a 50-year low-pass filter of the reconstructed series. Really, there's nothing underhanded or nefarious about plotting multiple series in one graph. It's not much of a "trick" at all.

In addition to this, Phil Jones says he added the real temps to Keith Briffa's series, starting in 1961, in order to "hide the decline." That's the interesting part, isn't it? After some digging (hat tip Deltoid) it's clear the "decline" after 1961 refers to something related to tree-ring data that was documented in Briffa et al. (1998). Figure 6 of this paper follows.



It's pretty clear why you'd want to add the real temps starting at about 1961. Again, I don't think there's anything wrong with this.

Some people might insist, however, that throwing away the last part of a series because it doesn't show what you think it should show is just wrong. It's not valid scientific methodology, and so on and so forth, ad nauseam.

Is it wrong? Are you sure? Let me show you something that I bet most readers of this blog have not seen.



This is a graph of the administrative prevalence of autism by birth year. The X axis is the year of birth. For example, the administrative prevalence of autism for children born in 1992, according to special education counts, was 34.5 in 10,000 as reported in 2003. The prevalence is much lower for children born in the year 2000: 12 in 10,000.

Is it actually true that the prevalence of autism is that low for children born in 2000? Absolutely not. That's what the 2003 report said. The 2009 report will tell you something very different. You see, it takes time for children to get diagnosed with autism. If they are too young, they might not have any label at all, or they might be classified under a different label.

Autism prevalence by birth year series will always decline in recent years. They also change as the series are revised year after year. (That's why I suggest using prevalence at age by report year instead, but I digress.)

If you must use a prevalence by birth year series, there's a "trick" you can use to "hide the decline," and this "trick" has a name: you left-censor the series. In an administrative autism series, you probably should not consider children below the age of 8.
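
Concretely, left-censoring just means dropping the birth cohorts that haven't had time to be diagnosed yet. A minimal sketch (only the 1992 and 2000 figures are from the graph above; the data structure is hypothetical):

```python
# Administrative prevalence (per 10,000) by birth year, as in a 2003 report.
prevalence_by_birth_year = {1992: 34.5, 2000: 12.0}

report_year = 2003
min_age = 8  # don't trust cohorts younger than about 8 at report time

left_censored = {year: prev for year, prev in prevalence_by_birth_year.items()
                 if report_year - year >= min_age}
print(left_censored)  # {1992: 34.5} -- the 2000 cohort is dropped
```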

I have used this "trick" myself in a blog post titled Is Precipitation Associated with Autism? Now I'm Quite Sure It's Not. I said: "I'm left-censoring autism caseload starting at 2000." So there you go. Joseph, from the Residual Analysis blog, has admitted to using a "trick" to "hide the decline." Feel free to quote me on that.

Once again, I think AGW deniers need to grow up a little.

Thursday, December 3, 2009

Statistical Proof of Anthropogenic Global Warming v2.0

Given that I'm again looking at climate data, I decided to write a follow-up to the post that launched this blog (originally here.) That was an analysis intended for skeptical laypersons, illustrated graphically at each step. Essentially, I demonstrated that you can associate CO2 emissions with temperatures in a way that eliminates the possibility of coincidence.

You see, it's very easy to associate CO2 with temperatures in a "naive" way. It's also very easy to associate the number of pirates with temperatures this way. In fact, you can associate any two trends in this manner.

I suggested removing the trends from the data, and then comparing the resulting detrended series. I explained it a little differently back then, but that's what it amounts to. Since that time I have learned this is called detrended cross-correlation analysis.

There were a number of criticisms of my analysis posted in comments. I didn't find any of them very convincing, to be honest, but they can be addressed as follows.
  1. Instead of using northern-hemisphere land temperatures, I will use global sea-surface temperatures (HadSST2 data set.) The criticism was that fluctuations of CO2 emissions could be associated with (i.e. confounded by) fluctuations of the urban heat island effect. I think that's a bit of a stretch, and this sort of objection has been addressed elsewhere, but I thought I would simply use sea-surface temperatures to eliminate the potential confound altogether.
  2. Instead of using cumulative CO2 emissions, I will use an ice core-based CO2 reconstruction from Etheridge et al. (1998), combined with Mauna Loa CO2 data. I will use Mauna Loa data only for 1979 onwards, and I will adjust it by subtracting 0.996 ppmv from each data point. There's an excellent match between the Etheridge et al. data and Mauna Loa data for the years 1958 to 1978, but there's a tiny offset of 0.996 ppmv between the two.
  3. I will use the logarithm of the CO2 concentration. This is theoretically more accurate. The equilibrium temperature of the planet depends on the concentration of greenhouse gases, logarithmically. That's why climate scientists talk about sensitivity to CO2 doubling. (A sketch of this data preparation follows the list.)
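
Here is roughly what that data preparation looks like. This is a minimal sketch; the two dictionaries are placeholders for the actual Etheridge et al. and Mauna Loa values, and only the procedure (splice at 1979, subtract the offset, take the log) is from the description above:

```python
import math

OFFSET = 0.996  # ppmv offset between the two series over 1958-1978

def combined_log_co2(co2_etheridge, co2_mauna_loa):
    """Combine ice-core CO2 (up to 1978) with offset-adjusted Mauna Loa CO2
    (1979 onwards), then take the logarithm of each value."""
    combined = {}
    for year, ppmv in co2_etheridge.items():
        if year <= 1978:
            combined[year] = ppmv
    for year, ppmv in co2_mauna_loa.items():
        if year >= 1979:
            combined[year] = ppmv - OFFSET  # align Mauna Loa with the ice-core series
    return {year: math.log(ppmv) for year, ppmv in sorted(combined.items())}
```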

I will start by posting a graph of the original sea surface temperature (SST) and (logarithm of) CO2 series. This is Figure 1.

I said we would remove the trend from the series. What exactly is the trend, you might ask. There are many ways to model a trend. We could use a straight line, but then you could say that the series are not linear. The wobbles around each of the linear trends might also coincide. A polynomial trendline can model trends that are not linear. A 2nd-order polynomial trendline might be enough. I went straight for the cubic or 3rd-order polynomial trendline, which gives a slightly better fit. Those are the yellow lines that you see in Figure 1.

To detrend a series, you simply subtract the trendline from the series. The result of the operation can be seen in Figure 2 below.
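
In code, the detrending step might look like this (a minimal sketch, assuming `years` and `values` are plain NumPy arrays holding one of the series):

```python
import numpy as np

def detrend_cubic(years, values):
    """Fit a 3rd-order polynomial trendline and subtract it from the series."""
    coeffs = np.polyfit(years, values, deg=3)   # cubic trendline
    trendline = np.polyval(coeffs, years)
    return values - trendline                   # the detrended series
```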



The scales are a little different, but you should be able to tell, visually, how Figure 2 is obtained from Figure 1.

I added a vertical dashed brown line around the year 1890. I suspect the data are simply wrong prior to 1890, for no other reason than that they don't look right visually. I will analyze both the entire range (1850-2008) and the shorter one (1890-2008). Curiously, I had previously found something similar in a comparison of SSTs and named storms, but I assumed the storm data was the one in error.

Obviously, changes in CO2 won't be reflected in temperature fluctuations instantaneously. It takes time for heat to be trapped. Or to put it in more technical terms, the CO2 concentration determines the equilibrium temperature (logarithmically), but the actual temperature only approaches that equilibrium over time. So there should be a lag.

Looking at the whole series, the association between the detrended series starts to become statistically significant at a lag of 6 years. The best lag (based on correlation coefficient) is 15 years. At a lag of 15 years, the association is significant with 99.997% confidence.
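
The lag search itself is straightforward. A minimal sketch, assuming `detrended_co2` and `detrended_sst` are aligned yearly NumPy arrays produced by the detrending step above:

```python
from scipy.stats import pearsonr

def correlation_at_lag(detrended_co2, detrended_sst, lag):
    """Correlate detrended log-CO2 with detrended SST `lag` years later."""
    if lag > 0:
        x, y = detrended_co2[:-lag], detrended_sst[lag:]
    else:
        x, y = detrended_co2, detrended_sst
    return pearsonr(x, y)  # (correlation coefficient, two-sided p-value)

# Scan lags 0..20 and pick the one with the highest correlation coefficient:
# best_lag = max(range(21),
#                key=lambda lag: correlation_at_lag(detrended_co2, detrended_sst, lag)[0])
```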

In my previous analysis I had found a lag of 10 years was the best lag, not 15. Interestingly, a lag of 10 years is exactly the best lag in the current analysis if I only consider data starting at 1890. In my estimation, 10 years is the true best lag, and this is completely justifiable theoretically by other means.

When you look at data from 1890 onwards, even a lag of zero will result in a statistically significant association. With a lag of 10 years, the association is significant with 99.99996% confidence. I think it's worth looking at a graphical representation of this association: the following scatter chart (Figure 3.)



In Figure 3, each dot represents a year. The graph tells us that the higher the detrended CO2 concentration in a given year, the higher the detrended sea-surface temperature, 10 years later. We can also calculate the 95% confidence interval of the slope of the trend, which is 13.7 to 29.7 in this case.
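
The slope and its confidence interval come from an ordinary least-squares fit of the lagged scatter. A minimal sketch, assuming `x` is detrended log-CO2 and `y` is detrended SST ten years later, using a normal approximation for the interval (the post's exact method may differ):

```python
from scipy import stats

def slope_with_ci(x, y, z=1.96):
    """OLS slope and an approximate 95% confidence interval."""
    result = stats.linregress(x, y)
    return result.slope, (result.slope - z * result.stderr,
                          result.slope + z * result.stderr)
```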

Given the methodology used, and the direction of the lag, this result can't be anything but indicative of a causal association between CO2 and sea-surface temperatures.

The association can't be explained by:
  • Coincidence.- Because we removed series trends, and because the associations are highly significant, mere-chance coincidences are exceedingly improbable.
  • Correlation is not causation.- For the same reasons, because of the lag, and because there appear to be no confounds, the only plausible explanation for the correlation is causation in this case.
  • Urban heat island.- The urban heat island effect should not be relevant to sea-surface temperatures.
  • Error and bias.- Errors would tend to hinder the analysis rather than help, just like we see with the data prior to 1890. Any systematic bias should be taken care of by the detrending step.
  • Conspiracy.- It would be preposterous to suppose that someone doctored the data sets (in just the right manner) anticipating this type of analysis several years in advance.
  • Assorted Non-sequiturs.- Arguments such as "Al Gore sucks" can be dismissed off-hand.

Wednesday, December 2, 2009

Skeptic's Circle #125

The latest Skeptic's Circle (#125) is up at Effort Sisyfus, which is run by TechSkeptic. I have a couple of posts in the circle this time around. Hint: Take the blue pill :)