Monday, November 30, 2009

DIY - Very Simple "Hockey Stick"

A while back, when I analyzed raw data from Mann & Jones (2003), I came up with my own "hockey stick" graph. I didn't post it then, but I thought it might be interesting now, in light of the CRU incident.

AGW "skeptics" are pushing the idea that Phil Jones' "trick to hide the decline" and the VERY ARTIFICIAL correction I discussed in a prior post are essentially evidence of tampering with the "hockey stick" reconstruction.

In reality, the artificial correction refers to rudimentary, probably temporary code (apparently marked with all-caps comments to caution CRU researchers not to use it as final code) that adjusts temperatures derived from tree-ring data to compensate for a problem known as "tree-ring divergence." It's highly improbable the artificial correction was ever used in any published paper.

[Figure: "hockey stick" temperature reconstruction]

This "hockey stick" does not require you to write any algorithms. The only "tricks" involved in producing it are the following:

  1. Temperature data up to 1980 comes from Mann & Jones (2003) (data made available by NOAA).
  2. Temperature from 1981 onwards comes from the CRUTEM3v global data set.
  3. The red line is a 25-year central moving average of the temperature series.
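
If you would rather script the three steps above, here is a minimal Python sketch. The function names and the switch_year parameter are placeholders chosen for illustration, and the numbers are synthetic, not actual Mann & Jones (2003) or CRUTEM3v values. It also assumes one value per year with no gaps, so treat it as a starting point rather than the exact procedure behind the graph.

```python
def splice(reconstruction, instrumental, switch_year=1981):
    """Use the reconstruction before switch_year and the instrumental
    record from switch_year onwards. Both inputs map year -> anomaly."""
    merged = {y: t for y, t in reconstruction.items() if y < switch_year}
    merged.update({y: t for y, t in instrumental.items() if y >= switch_year})
    years = sorted(merged)
    return years, [merged[y] for y in years]


def central_moving_average(values, window=25):
    """Centered moving average. Positions that lack a full window on
    either side are left as None instead of being padded."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        chunk = values[i - half:i + half + 1]
        smoothed[i] = sum(chunk) / window
    return smoothed


# Synthetic placeholder anomalies, NOT real temperature data.
recon = {year: 0.0 for year in range(1950, 1981)}
instr = {year: 0.5 for year in range(1975, 2009)}

years, temps = splice(recon, instr)
smooth = central_moving_average(temps, window=25)
```

To use real data, replace recon and instr with dictionaries loaded from whichever reconstruction and instrumental series you prefer. Note that the smoothed line stops 12 years short of each end, because a 25-year centered window needs 12 values on both sides.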


You can try this yourself with different data sets. It's not very difficult. If you don't trust CRU temperature data, use GISSTemp. If you don't think the Mann & Jones (2003) reconstruction should be used, there are plenty of other historical reconstructions that use methods other than tree-rings. Do report back if it doesn't work. Comment moderation is never enabled here.

[Errata 11/30/2009: The post initially said the red line was a 75-year central moving average. It's actually a 25-year CMA.]

Thursday, November 26, 2009

VERY ARTIFICIAL quote-mining

The CRU stolen emails incident is a big mess, isn't it? What I mean is that it could easily hinder real work that needs to get done, not just in climate science.

I thought I'd lend a hand as a computer scientist who is entirely independent of the politics of AGW. As you can see, I have taken an interest in the topic in the past, but most of the time I'm involved in science debates of a completely different nature.

I've read through some of the quote-mined emails, and I don't really see anything that looks like conspiracy talk of any sort, attempts to cover up data and such. In my view, it mostly amounts to innocuous chatter among scientists, discussion of statistical techniques, speculation, and some honest skepticism. For example, that 2008 was a "cold-ish" year is no secret, and I myself had said that if 2009 is also a cold year, it's possible IPCC predictions might be falsified. (Incidentally, 2009 is not turning out to be a cold year.)

When it comes to accusations of "data cooking," there is one post that caught my attention: Hiding the Decline: Part 1 – The Adventure Begins by Eric S. Raymond. Admittedly, it initially caught my attention because I've thought highly of Mr. Raymond ever since I read The Cathedral and the Bazaar, about 10 years ago. Mr. Raymond opens the post with a snippet of IDL code:

;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
;
yearlyadj=interpol(valadj,yrloc,timey)

This certainly looked dodgy to me at first glance. Basically, it looks like a correction that arbitrarily reduces temperatures in the 1930s, and increases them starting in the 1970s. Mr. Raymond posts a graph of the valadj array (which he calls "coefficients") and proclaims that "this isn’t just a smoking gun, it’s a siege cannon with the barrel still hot."
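
To see what the snippet would compute, here is a rough Python equivalent. It copies the yrloc and valadj values from the IDL and assumes that interpol does plain linear interpolation between the listed years (its default behavior, as far as I know). The helper names are mine, and the year range 1881 to 1994 is borrowed from the plot's xrange, since the actual contents of timey are not shown.

```python
def build_adjustment():
    """Recreate yrloc and valadj from the IDL snippet."""
    # IDL: yrloc=[1400,findgen(19)*5.+1904]
    yrloc = [1400.0] + [1904.0 + 5.0 * i for i in range(19)]
    raw = [0., 0., 0., 0., 0., -0.1, -0.25, -0.3, 0., -0.1,
           0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]
    valadj = [0.75 * v for v in raw]  # the "fudge factor"
    if len(yrloc) != len(valadj):
        raise ValueError("Oooops!")
    return yrloc, valadj


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation for x inside [xs[0], xs[-1]].
    Unlike IDL's interpol, this sketch does not extrapolate."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            y0, y1 = ys[i], ys[i + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the tabulated range")


yrloc, valadj = build_adjustment()

# Assumed year range, taken from the plot's xrange=[1881,1994].
yearlyadj = {year: interpolate(year, yrloc, valadj)
             for year in range(1881, 1995)}
```

Under these assumptions the adjustment is zero until 1919, dips to its lowest point in 1934, and levels off at its highest value from 1974 onwards, which matches the shape described above.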

I decided to take a closer look. I found a copy of the FOI2009.zip file, still available at RapidShare. I'm not providing a link, because I'm not completely sure that's legal.

I will post the entire code from the file in question below. I'll highlight the snippet quoted by Mr. Raymond in yellow, and I'll highlight in green other parts of the file that I'd like to discuss.

;
; Now prepare for plotting
;

loadct,39
multi_plot,nrow=3,layout='caption'
if !d.name eq 'X' then begin
window,ysize=800
!p.font=-1
endif else begin
!p.font=0
device,/helvetica,/bold,font_size=18
endelse
def_1color,20,color='red'
def_1color,21,color='blue'
def_1color,22,color='black'
;
restore,'compbest_fixed1950.idlsave'
;
plot,timey,comptemp(*,3),/nodata,$
/xstyle,xrange=[1881,1994],xtitle='Year',$
/ystyle,yrange=[-3,3],ytitle='Normalised anomalies',$
; title='Northern Hemisphere temperatures, MXD and corrected MXD'
title='Northern Hemisphere temperatures and MXD reconstruction'
;
yyy=reform(comptemp(*,2))
;mknormal,yyy,timey,refperiod=[1881,1940]
filter_cru,5.,/nan,tsin=yyy,tslow=tslow
oplot,timey,tslow,thick=5,color=22
yyy=reform(compmxd(*,2,1))
;mknormal,yyy,timey,refperiod=[1881,1940]
;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
;
yearlyadj=interpol(valadj,yrloc,timey)

;
;filter_cru,5.,/nan,tsin=yyy+yearlyadj,tslow=tslow
;oplot,timey,tslow,thick=5,color=20
;
filter_cru,5.,/nan,tsin=yyy,tslow=tslow
oplot,timey,tslow,thick=5,color=21
;

oplot,!x.crange,[0.,0.],linestyle=1
;
plot,[0,1],/nodata,xstyle=4,ystyle=4
;legend,['Northern Hemisphere April-September instrumental temperature',$
; 'Northern Hemisphere MXD',$
; 'Northern Hemisphere MXD corrected for decline'],$
; colors=[22,21,20],thick=[3,3,3],margin=0.6,spacing=1.5
legend,['Northern Hemisphere April-September instrumental temperature',$
'Northern Hemisphere MXD'],$
colors=[22,21],thick=[3,3],margin=0.6,spacing=1.5
;
end

Let's talk about the most important finding first. Notice the second section highlighted in green, right below the snippet quoted by Mr. Raymond. There are 4 statements there. The first two start with ";" which means they are commented out. Why is that important? The adjusted yearly data is assigned to variable yearlyadj. The only reference in the file to variable yearlyadj is in the first line that is commented out, where it says yyy+yearlyadj. Notice that the corresponding line that is not commented out only uses yyy in place of that. In other words, as this code stands, the adjusted yearly data is not used at all.
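
You can check this mechanically instead of by eye. The Python sketch below scans an excerpt of the file for lines that mention yearlyadj and labels each one as an assignment, an active use, or a commented-out reference. The function name is made up for illustration, and the comment handling is deliberately naive: it treats everything after the first ";" on a line as a comment and ignores the possibility of a semicolon inside a string.

```python
import re

# Excerpt copied from the listing above.
IDL_EXCERPT = """\
yearlyadj=interpol(valadj,yrloc,timey)
;
;filter_cru,5.,/nan,tsin=yyy+yearlyadj,tslow=tslow
;oplot,timey,tslow,thick=5,color=20
;
filter_cru,5.,/nan,tsin=yyy,tslow=tslow
oplot,timey,tslow,thick=5,color=21
"""


def classify_references(source, name):
    """Return (kind, line) pairs for every line that mentions name.
    kind is 'assigned', 'used', or 'commented out'."""
    # IDL identifiers are case-insensitive, hence re.IGNORECASE.
    word = re.compile(r"\b%s\b" % re.escape(name), re.IGNORECASE)
    assign = re.compile(r"^\s*%s\s*=" % re.escape(name), re.IGNORECASE)
    results = []
    for line in source.splitlines():
        code = line.split(";", 1)[0]  # naive: drop text after first ';'
        if word.search(code):
            kind = "assigned" if assign.match(code) else "used"
        elif word.search(line):
            kind = "commented out"
        else:
            continue
        results.append((kind, line.strip()))
    return results


refs = classify_references(IDL_EXCERPT, "yearlyadj")
for kind, line in refs:
    print(kind + ": " + line)
```

On this excerpt the variable shows up once as an assignment and once in a commented-out line, and never as an active use.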

You might ask, why is this "VERY ARTIFICIAL" correction there at all then? I can only speculate. When you're writing software and you find bugs, a non-brute-force way to debug code is to propose hypotheses as to what is causing the bugs. Then you test these hypotheses. One way to test hypotheses might involve fudging code to try out different ideas. When I do this I might add strings like "%%%%" to the code, so I know I need to remove that code later. I suppose adding something like "VERY ARTIFICIAL" would work as well.

My guess is that at some point the scientists wanted to see what the plot would look like with this correction, but this correction was obviously not part of the final version of the code.

Note also that this is code for plotting data. It's not code for producing a data set. Claims to the effect that the "data was cooked" seem spurious and overly dramatic. At worst, a graph might not have exactly reflected the raw data, but I wouldn't worry about raw data sets being compromised by the code above. (When it comes to the "hockey stick," what matters is what the raw data tells us.)

To borrow the words of blogger Skeptico regarding a similar incident 4 years ago, I think AGW deniers need to grow up a little.