Wednesday, December 23, 2009

On the Sun, Planets, CO2 and Models

Climate forcing due to solar irradiance is not that difficult to figure out. Per the Stefan–Boltzmann law, the equilibrium temperature (in kelvins) of a black body in the solar system is given by:

T' = k·S^(1/4)

S is the solar irradiance. The value of S for planet Earth is often quoted as 1367.6 W/m².

It turns out that the temperature formula above is generally applicable to almost all the planets in our solar system. Figure 1 is a scatter chart of solar irradiance (to the 1/4th power) vs. the mean temperature of planets in our solar system.

[Figure 1: Solar irradiance (to the 1/4th power) vs. mean temperature of planets in the solar system]

Except for the planet Venus, it's clear we have a correlation so strong that it could only be the reflection of a straightforward law of physics. Planet Earth actually falls a little outside the model: it's about 8 degrees Celsius warmer than the formula would predict. (I understand this is called the natural greenhouse effect.) If we remove Earth from the analysis, the correlation coefficient improves to 0.9995 and the slope of the linear trend becomes 46.163. Hence, the formula for the equilibrium temperature of an arbitrary planet, provided nothing unusual is going on with it, is the following.

T' = 46.163·S^(1/4)

Per Figure 1, again, this appears to be quite precise. With this formula we can begin to estimate what might happen if solar irradiance were to increase from, say, 1363.4 W/m² to 1366.7 W/m². This is the range of variation reported in the Lean (2000) solar irradiance reconstruction between the years 1610 and 2000.

With an increase of that magnitude, I estimate that equilibrium temperature would rise by only 0.2 degrees. It can't possibly explain a 0.8- to 1.6-degree anomaly by itself.
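For readers who want to check the arithmetic, here's a quick sketch in Python. The slope is the value fitted from Figure 1, and the two S values are the Lean (2000) endpoints quoted above:

```python
# Equilibrium temperature from the formula fitted in Figure 1.
def equilibrium_temp(s):
    """Equilibrium temperature (K) for solar irradiance s (W/m^2)."""
    return 46.163 * s ** 0.25

t_1610 = equilibrium_temp(1363.4)  # irradiance circa 1610
t_2000 = equilibrium_temp(1366.7)  # irradiance circa 2000
print(round(t_2000 - t_1610, 2))   # roughly 0.17 K
```

That's the roughly 0.2-degree rise mentioned above.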

CO2 Model

When I first started to look at data and claims related to climate science, and noticed the concept of temperature sensitivity to CO2 doubling, my impression was that equilibrium temperature could be approximated by a formula such as this one:

T' = baseline + a·ln(C)

C is the concentration of CO2, most likely in ppmv, and a is an unknown constant. Climate sensitivity would be given by a·ln(2).

There are other types of formulas that model how CO2 affects radiative forcing, as opposed to equilibrium temperature. They work in a different manner, but in general they can only be approximations of the real world: roughly applicable to our planetary reality, but not valid in general.

The first problem with the conventional doubling model is that it can't work for really small concentrations of the greenhouse gas. Imagine there are only 100 molecules of CO2 in the entire atmosphere, and we subsequently double this concentration to 200 molecules. Should we expect equilibrium temperature to increase by about 3 degrees as a result?

The second problem is that greenhouse forcing is temperature dependent. In particular, it can't work for really low temperatures. It's clear from Figure 1 that a planet receiving zero solar irradiance would have a temperature of absolute zero. It would not irradiate out any energy due to its temperature, and no (theoretical) greenhouse gas would change this.

Let me propose a model that subsumes the previous models and addresses the two problems outlined.

T' = 46.163·(S·(1 + a·ln(b·C + 1)))^(1/4)

When b·C is sufficiently large, the extra greenhouse forcing is proportional to the logarithm of C. When C is zero, the formula is reduced to the one derived from Figure 1. When solar irradiance is zero, temperature is still zero, regardless of C. When C increases from 100 to 200 molecules, the effect is negligible.
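Here's a quick Python sketch of the proposed model; note that the values of a and b below are arbitrary placeholders, not fitted constants:

```python
import math

# T' = 46.163 * (S * (1 + a*ln(b*C + 1)))^(1/4)
# The constants a and b are illustrative placeholders only.
def equilibrium_temp(s, c, a=0.1, b=0.01):
    """Equilibrium temperature (K) given irradiance s and CO2 concentration c."""
    return 46.163 * (s * (1 + a * math.log(b * c + 1))) ** 0.25

# C = 0 reduces to the formula derived from Figure 1:
print(equilibrium_temp(1367.6, 0) == 46.163 * 1367.6 ** 0.25)

# Zero irradiance gives absolute zero, regardless of C:
print(equilibrium_temp(0, 1000))

# Doubling a vanishingly small concentration has a negligible effect:
print(equilibrium_temp(1367.6, 2e-19) - equilibrium_temp(1367.6, 1e-19))
```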

That's still an approximation in other ways, no doubt, but I expect to use it in the future.

Monday, December 7, 2009

A trick to "hide the decline" in autism data

The SwiftHack controversy reminded me of something that is done routinely in epidemiology when considering prevalence by birth year data.

I was thinking of what is perhaps the most cited email of SwiftHack; this one by Phil Jones:

From: Phil Jones
To: ray bradley ,,
Subject: Diagram for WMO Statement
Date: Tue, 16 Nov 1999 13:31:15 +0000
Cc: k.briffa@xxx.xx.xx,

Dear Ray, Mike and Malcolm,
Once Tim’s got a diagram here we’ll send that either later today or first thing tomorrow. I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd from 1961 for Keith’s to hide the decline. Mike’s series got the annual land and marine values while the other two got April-Sept for NH land N of 20N. The latter two are real for 1999, while the estimate for 1999 for NH combined is +0.44C wrt 61-90. The Global estimate for 1999 with data through Oct is +0.35C cf. 0.57 for 1998.

Thanks for the comments, Ray.


Some people seem to be under the impression that the "decline" refers to the decline in temperatures shown in some databases when you cherry-pick 1998 as the starting year in temperature slope calculations. The email was written in 1999, so that can't be it, first of all.

Second, how exactly do you fudge data by adding in "real" data? That doesn't make any sense on the surface.

Once you parse the statement further, you realize that "Mike's Nature trick" is a reference to Figure 5b of Mann, Bradley & Hughes (1998), a paper published in Nature, and whose first author is Mike Mann. The figure follows.

The figure is not very clear, unfortunately, because it's in black and white, but you can see it's simply a graph of multiple time series: (1) A historical temperature reconstruction up to 1980, (2) The instrumental record (referred to as "actual data" in the figure, and "real temps" by Phil Jones), and (3) a 50-year low-pass filter of the reconstructed series. Really, there's nothing underhanded or nefarious about plotting multiple series in one graph. It's not much of a "trick" at all.

In addition to this, Phil Jones says he added the real temps to Keith Briffa's series, starting in 1961, in order to "hide the decline." That's the interesting part, isn't it? After some digging (hat tip Deltoid) it's clear the "decline" after 1961 refers to something related to tree-ring data that was documented in Briffa et al. (1998). Figure 6 of this paper follows.

It's pretty clear why you'd want to add the real temps starting at about 1961. Again, I don't think there's anything wrong with this.

Some people might insist, however, that throwing away the last part of a series because it doesn't show what you think it should show is just wrong. It's not valid scientific methodology, and so on and so forth, ad nauseam.

Is it wrong? Are you sure? Let me show you something that I bet most readers of this blog have not seen.

This is a graph of the administrative prevalence of autism by birth year. The X axis is the year of birth. For example, the administrative prevalence of autism for children born in 1992, according to special education counts, was 34.5 in 10,000 as reported in 2003. The prevalence is much lower for children born in the year 2000: 12 in 10,000.

Is it actually true that the prevalence of autism is that low for children born in 2000? Absolutely not. That's what the 2003 report said. The 2009 report will tell you something very different. You see, it takes time for children to get diagnosed with autism. If they are too young, they might not have any label at all, or they might be classified under a different label.

Autism prevalence by birth year series will always decline in recent years. They also change as the series are revised year after year. (That's why I suggest using prevalence at age by report year instead, but I digress.)

If you must use a prevalence by birth year series, there's a "trick" you can use to "hide the decline," and this "trick" has a name: you left-censor the series. In an administrative autism series, you probably should not consider children below the age of 8.
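For the sake of illustration, here's what left-censoring looks like in Python. The prevalence figures below are made up, and the age-8 cutoff follows the rule of thumb above:

```python
# Left-censoring a prevalence-by-birth-year series: drop cohorts
# too young to have been reliably diagnosed by the report year.
# These prevalence figures (per 10,000) are invented for illustration.
report_year = 2008
min_age = 8

prevalence_by_birth_year = {
    1994: 33.0, 1996: 35.2, 1998: 36.1,
    2000: 30.4, 2002: 18.9,  # recent cohorts look artificially low
}

censored = {year: p for year, p in prevalence_by_birth_year.items()
            if report_year - year >= min_age}
print(sorted(censored))  # the 2002 cohort is dropped
```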

I have used this "trick" myself in a blog post titled Is Precipitation Associated with Autism? Now I'm Quite Sure It's Not. I said: "I'm left-censoring autism caseload starting at 2000." So there you go. Joseph, from the Residual Analysis blog, has admitted to using a "trick" to "hide the decline." Feel free to quote me on that.

Once again, I think AGW deniers need to grow up a little.

Thursday, December 3, 2009

Statistical Proof of Anthropogenic Global Warming v2.0

Given that I'm again looking at climate data, I decided to write a follow-up of the post that launched this blog (originally here.) That was an analysis intended for skeptical laypersons, illustrated graphically at each step. Essentially, I demonstrated that you can associate CO2 emissions with temperatures in a way that eliminates the possibility of coincidence.

You see, it's very easy to associate CO2 with temperatures in a "naive" way. It's also very easy to associate the number of pirates with temperatures this way. In fact, you can associate any two trends in this manner.

I suggested removing the trends from the data, and then comparing the resulting detrended series. I explained it a little differently back then, but that's what it amounts to. Since that time I have learned this is called detrended cross-correlation analysis.

There were a number of criticisms of my analysis posted in comments. I didn't find any of them very convincing, to be honest, but they can be addressed as follows.
  1. Instead of using northern-hemisphere land temperatures, I will use global sea-surface temperatures (HadSST2 data set.) The criticism was that fluctuations of CO2 emissions could be associated with (i.e. confounded by) fluctuations of the urban heat island effect. I think that's a bit of a stretch, and this sort of objection has been addressed elsewhere, but I thought I would simply use sea-surface temperatures to eliminate the potential confound altogether.
  2. Instead of using cumulative CO2 emissions, I will use an ice core-based CO2 reconstruction from Etheridge et al. (1998), combined with Mauna Loa CO2 data. I will use Mauna Loa data only for 1979 onwards, and I will adjust it by subtracting 0.996 ppmv from each data point. There's an excellent match between the Etheridge et al. data and Mauna Loa data for the years 1958 to 1978, but there's a tiny offset of 0.996 ppmv between the two.
  3. I will use the logarithm of the CO2 concentration. This is theoretically more accurate: the equilibrium temperature of the planet depends logarithmically on the concentration of greenhouse gases. That's why climate scientists talk about sensitivity to CO2 doubling.

I will start by posting a graph of the original sea surface temperature (SST) and (logarithm of) CO2 series. This is Figure 1.

I said we would remove the trend from the series. What exactly is the trend, you might ask. There are many ways to model a trend. We could use a straight line, but then you could say that the series are not linear. The wobbles around each of the linear trends might also coincide. A polynomial trendline can model trends that are not linear. A 2nd-order polynomial trendline might be enough. I went straight for the cubic or 3rd-order polynomial trendline, which gives a slightly better fit. Those are the yellow lines that you see in Figure 1.

To detrend a series, you simply subtract the trendline from the series. The result of the operation can be seen in Figure 2 below.
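For those who want to replicate the detrending step, here's a sketch in Python, with a synthetic series standing in for the actual HadSST2 and CO2 data:

```python
import numpy as np

# Synthetic stand-in for a temperature series: a linear trend plus noise.
years = np.arange(1850, 2009)
rng = np.random.default_rng(0)
series = 0.005 * (years - 1850) + 0.2 * rng.standard_normal(years.size)

# Fit a cubic (3rd-order polynomial) trendline; centering the years
# first keeps the polynomial fit numerically well-conditioned.
t = years - years.mean()
coeffs = np.polyfit(t, series, deg=3)
trend = np.polyval(coeffs, t)

# Detrending is just subtracting the trendline from the series.
detrended = series - trend
print(abs(detrended.mean()) < 1e-8)  # residuals average out to ~0
```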

The scales are a little different, but you should be able to tell, visually, how Figure 2 is obtained from Figure 1.

I added a vertical dashed brown line around the year 1890. I suspect the data is simply wrong prior to 1890, for no other reason than the fact that it doesn't look good visually. I will analyze both the entire range (1850-2008) and the shorter one (1890-2008). Curiously, I had previously found something similar in a comparison of SSTs and named storms, but I assumed the storm data was the one in error.

Obviously, changes in CO2 won't be reflected in temperature fluctuations instantaneously. It takes time for heat to be trapped. Or to put it in more technical terms, the CO2 concentration determines the equilibrium temperature, which the actual temperature only approaches gradually. So there should be a lag.

Looking at the whole series, the association between the detrended series becomes statistically significant at a lag of 6 years. The best lag (based on correlation coefficient) is 15 years. At a lag of 15 years, the association is significant with 99.997% confidence.

In my previous analysis I had found a lag of 10 years was the best lag, not 15. Interestingly, a lag of 10 years is exactly the best lag in the current analysis if I only consider data starting at 1890. In my estimation, 10 years is the true best lag, and this is completely justifiable theoretically by other means.

When you look at data from 1890 onwards, even a lag of zero will result in a statistically significant association. With a lag of 10 years, the association is significant with 99.99996% confidence. I think it's worth looking at a graphical representation of this association: the following scatter chart (Figure 3.)
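Here's a sketch of the lagged-correlation calculation in Python, with synthetic detrended series (x leads y by 10 years by construction) and scipy's pearsonr standing in for the significance test:

```python
import numpy as np
from scipy import stats

# Synthetic detrended series: y is x shifted forward by 10 steps, plus noise.
rng = np.random.default_rng(42)
x = rng.standard_normal(150)
y = np.roll(x, 10) + 0.5 * rng.standard_normal(150)

def lagged_corr(x, y, lag):
    """Correlate x[t] with y[t + lag]; returns (r, p-value)."""
    return stats.pearsonr(x[:-lag], y[lag:])

r, p = lagged_corr(x, y, 10)
print(r > 0.7, p < 1e-6)  # strong, highly significant association at lag 10
```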

In Figure 3, each dot represents a year. The graph tells us that the higher the detrended CO2 concentration in a given year, the higher the detrended sea-surface temperature, 10 years later. We can also calculate the 95% confidence interval of the slope of the trend, which is 13.7 to 29.7 in this case.

Given the methodology used, and the direction of the lag, this result can't be anything but indicative of a causal association between CO2 and sea-surface temperatures.

The association can't be explained by:
  • Coincidence: Because we removed the series trends, and because the associations are highly significant, mere-chance coincidences are exceedingly improbable.
  • "Correlation is not causation": For the same reasons, because of the lag, and because there appear to be no confounds, the only plausible explanation for the correlation in this case is causation.
  • Urban heat island: The urban heat island effect should not be relevant to sea-surface temperatures.
  • Error and bias: Errors would tend to hinder the analysis rather than help, just like we see with the data prior to 1890. Any systematic bias should be taken care of by the detrending step.
  • Conspiracy: It would be preposterous to suppose that someone doctored the data sets (in just the right manner) anticipating this type of analysis several years in advance.
  • Assorted non sequiturs: Arguments such as "Al Gore sucks" can be dismissed offhand.

Wednesday, December 2, 2009

Skeptic's Circle #125

The latest Skeptic's Circle (#125) is up at Effort Sisyfus, which is run by TechSkeptic. I have a couple of posts in the circle this time around. Hint: Take the blue pill :)

Monday, November 30, 2009

DIY - Very Simple "Hockey Stick"

A while back, when I analyzed raw data from Mann & Jones (2003), I came up with my own "hockey stick" graph. I didn't post it then, but I thought it might be interesting now, in light of the CRU incident.

AGW "skeptics" are pushing the idea that Phil Jones' "trick to hide the decline", and the VERY ARTIFICIAL correction I discussed in a prior post are essentially evidence of tampering with the "hockey stick" reconstruction.

In reality, the artificial correction refers to rudimentary, probably temporary code (apparently marked with all-caps comments to caution CRU researchers not to use it as final code) that corrects temperatures derived from tree-ring widths, due to a problem known as "tree-ring divergence." It's highly improbable the artificial correction was ever used in any published paper.

This "hockey stick" does not require you to write any algorithms. The only "tricks" involved in producing it are the following:

  1. Temperature data up to 1980 comes from Mann & Jones (2003) (data made available by NOAA.)
  2. Temperature from 1981 onwards comes from the CRUTEM3v global data set.
  3. The red line is a 25-year central moving average of the temperature series.
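Here's a Python sketch of the recipe above, with constant stand-in values in place of the real Mann & Jones (2003) and CRUTEM3v series:

```python
import pandas as pd

# Stand-in anomaly values: a flat reconstruction up to 1980 and a
# warmer instrumental record from 1981 on (toy numbers, not real data).
recon = pd.Series(0.0, index=range(1000, 1981))
instr = pd.Series(0.5, index=range(1981, 2004))
combined = pd.concat([recon, instr])

# The red line: a 25-year central moving average of the spliced series.
cma = combined.rolling(window=25, center=True).mean()
print(cma.loc[1500], round(cma.loc[1985], 2))
```

With real data, the flat pre-1980 values are replaced by the reconstruction and the recent values by the instrumental anomalies; the moving average does the rest.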

You can try this yourself with different data sets. It's not very difficult. If you don't trust CRU temperature data, use GISTEMP. If you don't think the Mann & Jones (2003) reconstruction should be used, there are plenty of other historical reconstructions that use methods other than tree rings. Do report back if it doesn't work. Comment moderation is never enabled here.

[Errata 11/30/2009: The post initially said the red line was a 75-year central moving average. It's actually a 25-year CMA.]

Thursday, November 26, 2009

VERY ARTIFICIAL quote-mining

The CRU stolen emails incident is a big mess, isn't it? What I mean is that it could easily hinder real work that needs to get done, not just in climate science.

I thought I'd lend a hand as a computer scientist who is entirely independent of the politics of AGW. As you can see, I have taken an interest in the topic in the past, but most of the time I'm involved in science debates of a completely different nature.

I've read through some of the quote-mined emails, and I don't really see anything that looks like conspiracy talk of any sort, attempts to cover up data and such. In my view, it mostly amounts to innocuous chatter among scientists, discussion of statistical techniques, speculation, and some honest skepticism. For example, that 2008 was a "cold-ish" year is no secret, and I myself had said that if 2009 is also a cold year, it's possible IPCC predictions might be falsified. (Incidentally, it's not.)

When it comes to accusations of "data cooking," there is one post that caught my attention: Hiding the Decline: Part 1 – The Adventure Begins by Eric S. Raymond. Admittedly, it initially caught my attention because I've thought highly of Mr. Raymond ever since I read The Cathedral and the Bazaar, about 10 years ago. Mr. Raymond opens the post with a snippet of IDL code:

; Apply a VERY ARTIFICAL correction for decline!!
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'

This certainly looked dodgy to me at first glance. Basically, it looks like a correction that arbitrarily reduces temperatures in the 1930s, and increases them starting in the 1970s. Mr. Raymond posts a graph of the valadj array (which he calls "coefficients") and proclaims that "this isn’t just a smoking gun, it’s a siege cannon with the barrel still hot."

I decided to take a closer look. I found a copy of the file, still available at RapidShare. I'm not providing a link, because I'm not completely sure that's legal.

I will post the entire code from the file in question below. I'll highlight the snippet quoted by Mr. Raymond in yellow, and I'll highlight in green other parts of the file that I'd like to discuss.

; Now prepare for plotting

if ! eq 'X' then begin
endif else begin
/ystyle,yrange=[-3,3],ytitle='Normalised anomalies',$
; title='Northern Hemisphere temperatures, MXD and corrected MXD'
title='Northern Hemisphere temperatures and MXD reconstruction'
; Apply a VERY ARTIFICAL correction for decline!!
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'


;legend,['Northern Hemisphere April-September instrumental temperature',$
; 'Northern Hemisphere MXD',$
; 'Northern Hemisphere MXD corrected for decline'],$
; colors=[22,21,20],thick=[3,3,3],margin=0.6,spacing=1.5
legend,['Northern Hemisphere April-September instrumental temperature',$
'Northern Hemisphere MXD'],$

Let's talk about the most important finding first. Notice the second section highlighted in green, right below the snippet quoted by Mr. Raymond. There are 4 statements there. The first two start with ";" which means they are commented out. Why is that important? The adjusted yearly data is assigned to variable yearlyadj. The only reference in the file to variable yearlyadj is in the first line that is commented out, where it says yyy+yearlyadj. Notice that the corresponding line that is not commented out only uses yyy in place of that. In other words, as this code stands, the adjusted yearly data is not used at all.

You might ask, why is this "VERY ARTIFICIAL" correction there at all then? I can only guess and speculate. When you're writing software, and you find bugs, a non-brute-force way to debug code is to propose hypotheses as to what is causing the bugs. Then you test these hypotheses. One way to test hypotheses might involve fudging code; trying out different ideas. When I do this I might add strings like "%%%%" to the code, so I know I need to remove that code later. I suppose adding something like "VERY ARTIFICIAL" would work as well.

My guess is that at some point the scientists wanted to see what the plot would look like with this correction, but this correction was obviously not part of the final version of the code.

Note also that this is code for plotting data, not code for producing a data set. Claims to the effect that the "data was cooked" seem spurious and overly dramatic. At worst, a graph might not have exactly reflected the raw data, but I wouldn't worry about raw data sets being compromised by the code above. (When it comes to the "hockey stick," what matters is what the raw data tells us.)

To borrow the words of blogger Skeptico regarding a similar incident 4 years ago, I think AGW deniers need to grow up a little.

Thursday, April 30, 2009

Early Swine Flu Trend in the US

I searched Google News for the number of confirmed US cases of Swine flu reported by the Associated Press each day from April 23 to April 28. For the April 29 data point, I used the count currently posted at the CDC website. The chart I came up with follows.

The trend could be exponential, which would be the mathematical expectation. An exponential fit gives an R² (goodness of fit) of 0.94, a very good fit. If the exponential trend were to be maintained, by May 7 there should be over 4,000 confirmed cases of Swine flu in the US. If this prediction fails, it could be an indication that containment measures are having an effect. We'll see.
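Here's a sketch of that kind of exponential fit in Python; the daily counts below are illustrative, not the actual AP numbers:

```python
import numpy as np

# Illustrative daily case counts (roughly doubling each day).
days = np.arange(7)  # April 23 .. April 29
cases = np.array([10, 20, 45, 91, 160, 280, 640])

# An exponential fit is a linear fit in log space:
# log(cases) = a*day + b, i.e. cases ~ exp(b) * exp(a*day).
a, b = np.polyfit(days, np.log(cases), deg=1)

# R^2 of the log-linear fit.
resid = np.log(cases) - (a * days + b)
r2 = 1 - np.sum(resid ** 2) / np.sum((np.log(cases) - np.log(cases).mean()) ** 2)
print(round(np.exp(a), 2), round(r2, 3))  # daily growth factor, R^2
```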

Update 5/7/2009

I've continued to follow the count of confirmed cases in the US. While it didn't go into the thousands by May 7, an exponential model continues to fit the series quite well.

While cases have reportedly plateaued in Mexico, the same is not true of the US just yet.