Monday, November 30, 2009

DIY - Very Simple "Hockey Stick"

A while back, when I analyzed raw data from Mann & Jones (2003), I came up with my own "hockey stick" graph. I didn't post it then, but I thought it might be interesting now, in light of the CRU incident.

AGW "skeptics" are pushing the idea that Phil Jones' "trick to hide the decline" and the VERY ARTIFICIAL correction I discussed in a prior post are essentially evidence of tampering with the "hockey stick" reconstruction.

In reality, the artificial correction refers to rudimentary, probably temporary code (apparently marked with all-caps comments to caution CRU researchers not to use it as final code) that corrects temperatures derived from tree-ring widths, due to a problem known as "tree-ring divergence." It's highly improbable the artificial correction was ever used in any published paper.

[Figure: "hockey stick" temperature reconstruction]

This "hockey stick" does not require you to write any algorithms. The only "tricks" involved in producing it are the following:

  1. Temperature data up to 1980 comes from Mann & Jones (2003) (data made available by NOAA).
  2. Temperature from 1981 onwards comes from the CRUTEM3v global data set.
  3. The red line is a 25-year central moving average of the temperature series.
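A spreadsheet is enough for this, but for readers who prefer code, here is a minimal Python sketch of the three steps. It assumes each series has already been loaded into a `{year: anomaly}` dict with no missing years; the helper names `splice` and `central_moving_average` are hypothetical. It does not re-baseline the two series, so if they use different reference periods you may want to align them first.

```python
def splice(recon, instrumental, cutoff=1980):
    """Use `recon` for years <= cutoff and `instrumental` for later years.

    Both arguments are {year: anomaly} dicts (hypothetical input format).
    """
    merged = {y: v for y, v in recon.items() if y <= cutoff}
    merged.update({y: v for y, v in instrumental.items() if y > cutoff})
    return dict(sorted(merged.items()))


def central_moving_average(values, window=25):
    """Centered moving average over `window` consecutive points (must be odd).

    Positions where the full window does not fit are returned as None.
    Assumes `values` are annual values with no gaps.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


# Usage with made-up numbers (not real data):
recon = {y: 0.0 for y in range(1900, 1981)}
instrumental = {y: 0.5 for y in range(1975, 2009)}
series = splice(recon, instrumental)
smoothed = central_moving_average(list(series.values()), window=25)
```

With a 25-year window, the first and last 12 years of the smoothed line are undefined, which is why a central moving average stops short of both ends of the series.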


You can try this yourself with different data sets. It's not very difficult. If you don't trust CRU temperature data, use GISTEMP. If you don't think the Mann & Jones (2003) reconstruction should be used, there are plenty of other historical reconstructions that use methods other than tree rings. Do report back if it doesn't work. Comment moderation is never enabled here.

[Errata 11/30/2009: The post initially said the red line was a 75-year central moving average. It's actually a 25-year CMA.]

Thursday, November 26, 2009

VERY ARTIFICIAL quote-mining

The CRU stolen emails incident is a big mess, isn't it? What I mean is that it could easily hinder real work that needs to get done, not just in climate science.

I thought I'd lend a hand as a computer scientist who is entirely independent of the politics of AGW. As you can see, I have taken an interest in the topic in the past, but most of the time I'm involved in science debates of a completely different nature.

I've read through some of the quote-mined emails, and I don't really see anything that looks like conspiracy talk of any sort, attempts to cover up data and such. In my view, it mostly amounts to innocuous chatter among scientists, discussion of statistical techniques, speculation, and some honest skepticism. For example, that 2008 was a "cold-ish" year is no secret, and I myself had said that if 2009 were also a cold year, IPCC predictions might be falsified. (Incidentally, 2009 is not turning out to be a cold year.)

When it comes to accusations of "data cooking," there is one post that caught my attention: Hiding the Decline: Part 1 – The Adventure Begins by Eric S. Raymond. Admittedly, it initially caught my attention because I've thought highly of Mr. Raymond ever since I read The Cathedral and the Bazaar, about 10 years ago. Mr. Raymond opens the post with a snippet of IDL code:

;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
;
yearlyadj=interpol(valadj,yrloc,timey)

This certainly looked dodgy to me at first glance. Basically, it looks like a correction that arbitrarily reduces values in the 1920s through the early 1940s, and increases them from the late 1940s onward, with the largest increases from the late 1960s. Mr. Raymond posts a graph of the valadj array (which he calls "coefficients") and proclaims that "this isn't just a smoking gun, it's a siege cannon with the barrel still hot."
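To see exactly what the snippet computes, here is a rough NumPy translation. It is a sketch, not CRU code: the plotting years in `timey` are an assumption (the real values come from the restored .idlsave file), and `np.interp` stands in for IDL's `interpol`. The two agree for linear interpolation inside the knot range of 1400 to 1994, though `np.interp` clamps rather than extrapolates outside it.

```python
import numpy as np

# Knot years: 1400, then 1904, 1909, ..., 1994 (IDL: [1400, findgen(19)*5.+1904])
yrloc = np.concatenate(([1400.0], np.arange(19) * 5.0 + 1904.0))
# Adjustment at each knot, scaled by 0.75 as in the IDL code
valadj = np.array([0., 0., 0., 0., 0., -0.1, -0.25, -0.3, 0., -0.1, 0.3, 0.8,
                   1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]) * 0.75
if yrloc.size != valadj.size:  # same sanity check as the IDL 'Oooops!' line
    raise ValueError("Oooops!")

# Assumed plotting years (the x-range used in the plotting file)
timey = np.arange(1881, 1995)
# Linear interpolation between knots, like interpol(valadj, yrloc, timey)
yearlyadj = np.interp(timey, yrloc, valadj)

for year in (1934, 1949, 1959, 1974):
    print(year, round(float(yearlyadj[timey == year][0]), 3))
```

Printing a few years makes the shape of the adjustment plain: slightly negative around the 1930s, turning positive around 1949 and levelling off from the mid-1970s.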

I decided to take a closer look. I found a copy of the FOI2009.zip file, still available at RapidShare. I'm not providing a link, because I'm not completely sure that's legal.

I will post the entire code from the file in question below. I'll highlight the snippet quoted by Mr. Raymond in yellow, and I'll highlight in green other parts of the file that I'd like to discuss.

;
; Now prepare for plotting
;

loadct,39
multi_plot,nrow=3,layout='caption'
if !d.name eq 'X' then begin
window,ysize=800
!p.font=-1
endif else begin
!p.font=0
device,/helvetica,/bold,font_size=18
endelse
def_1color,20,color='red'
def_1color,21,color='blue'
def_1color,22,color='black'
;
restore,'compbest_fixed1950.idlsave'
;
plot,timey,comptemp(*,3),/nodata,$
/xstyle,xrange=[1881,1994],xtitle='Year',$
/ystyle,yrange=[-3,3],ytitle='Normalised anomalies',$
; title='Northern Hemisphere temperatures, MXD and corrected MXD'
title='Northern Hemisphere temperatures and MXD reconstruction'
;
yyy=reform(comptemp(*,2))
;mknormal,yyy,timey,refperiod=[1881,1940]
filter_cru,5.,/nan,tsin=yyy,tslow=tslow
oplot,timey,tslow,thick=5,color=22
yyy=reform(compmxd(*,2,1))
;mknormal,yyy,timey,refperiod=[1881,1940]
;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
;
yearlyadj=interpol(valadj,yrloc,timey)

;
;filter_cru,5.,/nan,tsin=yyy+yearlyadj,tslow=tslow
;oplot,timey,tslow,thick=5,color=20
;
filter_cru,5.,/nan,tsin=yyy,tslow=tslow
oplot,timey,tslow,thick=5,color=21
;

oplot,!x.crange,[0.,0.],linestyle=1
;
plot,[0,1],/nodata,xstyle=4,ystyle=4
;legend,['Northern Hemisphere April-September instrumental temperature',$
; 'Northern Hemisphere MXD',$
; 'Northern Hemisphere MXD corrected for decline'],$
; colors=[22,21,20],thick=[3,3,3],margin=0.6,spacing=1.5
legend,['Northern Hemisphere April-September instrumental temperature',$
'Northern Hemisphere MXD'],$
colors=[22,21],thick=[3,3],margin=0.6,spacing=1.5
;
end

Let's talk about the most important finding first. Notice the second section highlighted in green, right below the snippet quoted by Mr. Raymond. There are 4 statements there. The first two start with ";", which means they are commented out. Why is that important? The adjusted yearly data is assigned to the variable yearlyadj. Apart from the line that assigns it, the only reference in the file to yearlyadj is in the first commented-out line, where it says yyy+yearlyadj. Notice that the corresponding line that is not commented out uses only yyy in its place. In other words, as this code stands, the adjusted yearly data is not used at all.
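This check is easy to automate. The sketch below (hypothetical helper `live_references`, run on the relevant lines copied from the file above) lists the non-comment lines that mention a variable. It is deliberately simplistic: a line counts as a comment only if it starts with ";", and trailing comments or semicolons inside strings are not handled, so it is not a real IDL parser.

```python
import re

# Lines around the highlighted section of the file quoted above.
IDL_SOURCE = """\
yearlyadj=interpol(valadj,yrloc,timey)

;
;filter_cru,5.,/nan,tsin=yyy+yearlyadj,tslow=tslow
;oplot,timey,tslow,thick=5,color=20
;
filter_cru,5.,/nan,tsin=yyy,tslow=tslow
oplot,timey,tslow,thick=5,color=21
"""


def live_references(source, name):
    """Return non-comment lines that mention `name` as a whole word.

    Simplification: a line counts as a comment only if its first non-blank
    character is ';'. Trailing comments and ';' inside strings are ignored.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for line in source.splitlines():
        code = line.strip()
        if code.startswith(";"):
            continue
        if pattern.search(code):
            hits.append(code)
    return hits


hits = live_references(IDL_SOURCE, "yearlyadj")
# Drop the assignment itself; whatever is left would be an actual use.
uses = [h for h in hits if not h.startswith("yearlyadj=")]
print(hits)
print(uses)
```

The only live line is the assignment, and the list of actual uses is empty.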

You might ask why this "VERY ARTIFICIAL" correction is there at all, then. I can only guess and speculate. When you're writing software and find bugs, a non-brute-force way to debug is to propose hypotheses about what is causing them, and then test those hypotheses. One way to test a hypothesis is to fudge the code: try out different ideas. When I do this, I might add strings like "%%%%" to the code, so I know I need to remove that code later. I suppose adding something like "VERY ARTIFICIAL" would work as well.

My guess is that at some point the scientists wanted to see what the plot would look like with this correction, but this correction was obviously not part of the final version of the code.

Note also that this is code for plotting data. It's not code for producing a data set. Claims to the effect that the "data was cooked" seem spurious and overly dramatic. At worst, a graph might not have exactly reflected the raw data, but I wouldn't worry about raw data sets being compromised by the code above. (When it comes to the "hockey stick," what matters is what the raw data tells us.)

To borrow the words of blogger Skeptico regarding a similar incident 4 years ago, I think AGW deniers need to grow up a little.

Thursday, April 30, 2009

Early Swine Flu Trend in the US

I searched Google News for the number of confirmed US cases of Swine flu reported by the Associated Press each day from April 23 to April 28. For the April 29 data point, I used the count currently posted at the CDC website. The chart I came up with follows.
[Chart: swine flu trend series]
The trend could be exponential, which would be the mathematical expectation. An exponential fit gives an R² (coefficient of determination, a measure of goodness of fit) of 0.94, which is a very good fit. If the exponential trend were maintained, by May 7 there would be over 4,000 confirmed cases of Swine flu in the US. If this prediction fails, it could be an indication that containment measures are having an effect. We'll see.
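For anyone who wants to reproduce this kind of fit, here is a minimal Python sketch of one common approach: a least-squares straight line through ln(count), with R² computed on that log scale. Whether this is exactly how the chart's trendline was computed is an assumption, and the counts below are made up, not the AP or CDC numbers. If day 0 is April 23, then May 7 is day 14.

```python
import math


def fit_exponential(days, counts):
    """Fit count ~ A * exp(b * day) by least squares on ln(count).

    Returns (A, b, r2), where r2 is the coefficient of determination of the
    straight-line fit to ln(count). All counts must be positive.
    """
    n = len(days)
    ys = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(days, ys))
    return math.exp(a), b, 1.0 - ss_res / ss_tot


# Made-up counts for days 0..6 (NOT the real data): doubling every day
days = list(range(7))
counts = [5 * 2 ** d for d in days]
A, b, r2 = fit_exponential(days, counts)
doubling_time = math.log(2) / b          # days per doubling
forecast_day_14 = A * math.exp(b * 14)   # extrapolated count on day 14
```

Note that an R² computed on ln(count) is not the same number as one computed on the raw counts, so it is worth being explicit about which one is reported.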

Update 5/7/2009

I've continued to follow the count of confirmed cases in the US. While it didn't go into the thousands by May 7, an exponential model continues to fit the series quite well.



While cases have reportedly plateaued in Mexico, the same is not true of the US just yet.

Friday, August 15, 2008

Graph of NH SSTs and Named Storms Questioned

I have written about the association between the number of named storms in the Atlantic basin and Northern Hemisphere sea surface temperature anomalies several times now (last time here). I am quite confident there's a causal association there (even considering the possibility of coincidental trends).

The problem is that my posts on the subject have been met with disbelief. You see, the scientific literature is not clear on the matter, and not even top climate scientists seem to agree on whether the association exists. That's why I'm making this spreadsheet available.

In particular, there is a graph that is very difficult to deny. Sometimes you can express doubt about mathematical analyses on technical grounds, but clear and easily reproducible graphs are difficult to argue with. The graph in question is that of 17-year central moving averages of northern hemisphere sea surface temperature anomalies, and the number of named storms in the Atlantic basin, from the 1850s to the present time.

In the new spreadsheet I'm making available, I calculated both 15-year and 21-year central moving averages of both data sets. Comments in the column headers give the URLs of the raw data sources. Having to do this seems over the top, but there really are people who apparently don't believe the original graph is real; they also seem to be misunderstanding the graph completely, as you can see in the comments section of this post at AccuWeather.com.

The 15-year and 21-year CMA graphs are posted below, in that order.





Comment Policy

I will state my comment policy here, for future reference. I do not enable comment moderation. The only comments I delete are those that clearly violate Blogger's Content Policy. Scrutiny is more than welcome. If you believe I made a mistake, tell me. If you believe I'm making things up, you absolutely should tell me, but you'd better be right.

Tuesday, August 12, 2008

NOAA Study Seems To Confirm Observation From 07/14 Post

Not so long ago I wrote a follow-up to an earlier analysis of the association between the number of named storms in the Atlantic basin and northern hemisphere sea surface temperatures. At the end of the post I listed a number of conclusions, one of which was the following.

The graph provides support for the contention that old storm records are unreliable. I would not recommend using storm counts prior to 1890.


I had posted a graph of 17-year central moving averages of NH sea surface temperature and named storm series, reproduced below. You will note I had placed a vertical line around the year 1890 in order to indicate there was some sort of point of change there.



I didn't use any mathematical analysis to determine that 1890 was in any way special. It was simply obvious, visually, that something was not right in the named storms series prior to 1890. Of course, the central moving average smoothing helped make that visible.

Enter Vecchi & Knutson (2008), a NOAA study of North Atlantic historical cyclone activity. The authors determined, based on known ship tracks, that early ships missed many storms, especially in the 19th century.

Now, this study is being touted as evidence that global warming and the number of storms in the Atlantic are not associated. Clearly, that is nonsense, if you just look at the figure above. If you'd like to see some math, I have done a detrended cross-correlation analysis as well. All that is needed to demonstrate an association is to linearly detrend series running from 1900 to the present. The detrending should take care of any problems related to the unreliability of old storm counts. I can further report that even after detrending the series with 6th-order polynomial fits, a statistically significant association is still there, provided storms are presumed to lag temperatures by at least one year.
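Here is a minimal NumPy sketch of the detrend-then-correlate step, with storms lagging temperatures by a chosen number of years. It assumes two annual series of equal length stored as arrays; the function names are hypothetical, and it computes only Pearson's r, so significance testing is not shown. Setting `deg=6` gives the 6th-order variant.

```python
import numpy as np
from numpy.polynomial import Polynomial


def detrend(years, values, deg=1):
    """Residuals after removing a least-squares polynomial trend of degree `deg`."""
    # Polynomial.fit rescales x internally, which keeps high degrees better conditioned
    trend = Polynomial.fit(years, values, deg)
    return values - trend(years)


def lagged_detrended_corr(years, temps, storms, lag=0, deg=1):
    """Pearson r between detrended temps[t] and detrended storms[t + lag].

    `lag` >= 0 means storms are assumed to lag temperatures by `lag` years.
    """
    rt = detrend(years, temps, deg)
    rs = detrend(years, storms, deg)
    if lag > 0:
        rt, rs = rt[:-lag], rs[lag:]
    return float(np.corrcoef(rt, rs)[0, 1])


# Synthetic example (not real data): different trends, shared wiggles lagged by 1 year
years = np.arange(1900, 1960, dtype=float)
i = years - 1900
temps = 0.01 * i + np.sin(i)
storms = 0.06 * i + np.sin(i - 1)
r0 = lagged_detrended_corr(years, temps, storms, lag=0)
r1 = lagged_detrended_corr(years, temps, storms, lag=1)
```

In the synthetic example the correlation is strongest at a lag of one year, because that is how the shared fluctuations were constructed; the trends themselves are removed before correlating.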

About The Disingenuous "Global Warming Challenge" by JunkScience.com

I read somewhere that JunkScience.com had issued a "global warming challenge" some time back that is promoted as follows.

$500,000 will be awarded to the first person to prove, in a scientific manner, that humans are causing harmful global warming.


That's also what people will say whenever they tout the "challenge." If you are certain anthropogenic global warming is real, you should be able to prove it. Who wouldn't want to make $500,000?

But as you can imagine, there's a catch. You need to falsify two hypotheses.


UGWC Hypothesis 1

Manmade emissions of greenhouse gases do not discernibly, significantly and predictably cause increases in global surface and tropospheric temperatures along with associated stratospheric cooling.

UGWC Hypothesis 2

The benefits equal or exceed the costs of any increases in global temperature caused by manmade greenhouse gas emissions between the present time and the year 2100, when all global social, economic and environmental effects are considered.


Now, hypothesis #1 should be falsifiable today. The only issue I have with it is that they have made it unnecessarily difficult (to cover their asses, no doubt) by including stratospheric cooling as a requirement. Don't get me wrong. I'm sure stratospheric cooling is an important matter to climate scientists, but why does it matter to the challenge? Isn't surface temperature warming due to anthropogenic causes interesting enough?

Technically, the issue is that there's not a lot of data on stratospheric temperatures, as far as I know. Considering lags and so forth, it's probably difficult to demonstrate an association in a decisive way. I haven't run the numbers, but this is my preliminary guess.

Hypothesis #2 is not falsifiable right now. We'd have to wait until about 2100 to either validate it or falsify it. Peak oil is probably looming or behind us, so we can't say what might happen by 2100. There are policy decisions to consider. There might be technological advances that change the general outlook. If we make certain assumptions, then sure, it's theoretically possible to give confidence ranges on certain predictions, such as sea level rises or changes in storm intensity.

Clearly, the "challenge" is designed such that it's impossible or nearly impossible to win. Despite its name, JunkScience.com is not a site about junk science. If you visit it, you will see it's nothing but a propaganda outlet for global warming denialism books and videos. A site that is truly about junk science would probably discuss things like the paranormal, homeopathy, the vaccine-autism hypothesis, etc. JunkScience.com does not.

In fact, what is the evidence that JunkScience.com has $500,000 to give out? Have they been collecting pledges? If they have collected funds, and there's no winner to their challenge, which I can almost certainly assure you there won't be, will they keep the money?

Call me cynical, but I doubt JunkScience.com is either able or willing to give out $500,000 to anybody, regardless of the entries they receive.

Counter-Challenge

Here's a counter-challenge for JunkScience.com. Reduce the stakes if you need to. Then change the requirements of the challenge to include a single hypothesis to falsify, as follows.

Manmade emissions of greenhouse gases do not discernibly, significantly and predictably cause increases in global temperatures.


What's there to fear, JunkScience.com?

Friday, August 8, 2008

Just in case there are any doubts about anthropogenic influence in atmospheric CO2

You would think this is the least controversial aspect of the global warming debate, but you'd be surprised. I realized this after reading some of the comments in a post by Anthony Watts about a recent correction in the way Mauna Loa data is calculated (see also reactions by Tamino and Lucia).

Tamino subsequently wrote an interesting post on differences in CO2 trends as observed at three different sites: Mauna Loa (Hawaii), Barrow (Alaska) and South Pole station. Most notably, there's a pronounced difference in the annual cycle between these stations, which, according to Tamino, is explained by there being more land mass in the Northern Hemisphere. I would imagine higher CO2 emissions in the Northern Hemisphere might also play a role, but I'm speculating.

In this post I want to show that available data is quite clear about anthropogenic influence in atmospheric CO2. Additionally, I want to discuss how we can tell that excess CO2 stays in the atmosphere for a long time.

I will use about 170 years of data for this. There's a reconstruction of CO2 concentrations from 1832 to 1978 made available by CDIAC, and derived by Etheridge et al. (1998) from the Law Dome DE08, DE08-2, and DSS ice cores. You will note that there's an excellent match between these data and Mauna Loa data for the period 1958 to 1978. Mauna Loa data has an offset of 0.996 ppmv relative to Etheridge et al. (1998), so I applied this simple adjustment to it in order to end up with a dataset that goes from 1832 to 2004.
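The offset adjustment can be sketched as follows. This is an illustration under assumptions, not the exact procedure used here: it takes the offset to be the mean of (Mauna Loa minus ice core) over the 1958-1978 overlap, subtracts it from Mauna Loa values after 1978, and uses made-up numbers in place of the CDIAC series. The helper names are hypothetical.

```python
def estimate_offset(ice, mauna_loa, start=1958, end=1978):
    """Mean of (Mauna Loa - ice core) over years present in both, within [start, end]."""
    years = [y for y in range(start, end + 1) if y in ice and y in mauna_loa]
    if not years:
        raise ValueError("the two series do not overlap in that window")
    return sum(mauna_loa[y] - ice[y] for y in years) / len(years)


def merge_co2(ice, mauna_loa, offset, last_ice_year=1978):
    """Ice-core values through `last_ice_year`, offset-adjusted Mauna Loa after."""
    merged = {y: v for y, v in ice.items() if y <= last_ice_year}
    merged.update({y: v - offset for y, v in mauna_loa.items() if y > last_ice_year})
    return dict(sorted(merged.items()))


def base(year):
    """Made-up smooth curve standing in for real CO2 values (ppmv)."""
    return 285.0 + 0.25 * (year - 1832)


# Made-up series: Mauna Loa reads exactly 1.0 ppmv above the ice core
ice = {y: base(y) for y in range(1832, 1979)}
mauna_loa = {y: base(y) + 1.0 for y in range(1958, 2005)}
offset = estimate_offset(ice, mauna_loa)
co2 = merge_co2(ice, mauna_loa, offset)
```

The result is a single annual series from 1832 to 2004 with no step at the 1978/1979 join.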

CDIAC also provides data on global CO2 emissions. What we need, however, is an estimate of the excess anthropogenic CO2 that would be expected to remain in the atmosphere at any given point in time. We could simply calculate cumulative emissions since 1751 for any given year, but this is not necessarily accurate. Some excess CO2 is probably reclaimed by the planet every year. What I will do is make an assumption about the atmospheric half-life of CO2 in order to obtain a dataset of presumed excess CO2. I will use a half-life of 24.4 years (i.e., a fraction 0.972 of excess CO2 remains after one year). I should note that I have tried this same analysis with half-lives of 50, 70 and 'infinite' years, and the general results are the same.
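The half-life model can be written as a simple running stock. In this sketch, which is one possible reading and not necessarily the exact bookkeeping used for the figures, each year's stock is last year's stock times the one-year retention factor 0.5^(1/half-life), plus that year's emissions. An infinite half-life reduces it to plain cumulative emissions. The function names are hypothetical.

```python
import math


def retention_factor(half_life_years):
    """Fraction of excess CO2 left after one year; an infinite half-life means no decay."""
    if math.isinf(half_life_years):
        return 1.0
    return 0.5 ** (1.0 / half_life_years)


def excess_co2(emissions, half_life_years):
    """Presumed excess CO2 stock for each year.

    Modeling choice (an assumption): last year's stock decays by one year's
    retention factor, then this year's emissions are added undecayed.
    """
    r = retention_factor(half_life_years)
    stock, out = 0.0, []
    for e in emissions:
        stock = stock * r + e
        out.append(stock)
    return out


print(round(retention_factor(24.4), 3))  # 0.972
```

A single pulse of emissions followed by zeros decays to half its size after one half-life, which is a quick way to check the bookkeeping.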

Figure 1 shows the time series of the two data sets.

[Figure 1: CO2 concentration and emissions]

The trends are clear enough. CO2 emissions appear to accumulate in the atmosphere and are then observed in ice cores (and at various other sites like Mauna Loa). Whenever we compare time series, though, there's a possibility that we're looking at coincidental trends. One technique for controlling for potentially coincidental trends is detrended cross-correlation analysis (Podobnik & Stanley, 2007). In our case, the detrended cross-correlation is obvious enough graphically, so we'll leave it at that. See Figure 2. Basically, we take the time series and remove their trends, which are given by third-order polynomial fits. You can try the same thing with linear or second-order fits first. The third-order fit is a better fit and leaves more up-and-down fluctuations around the trend, which makes the correlation more obvious and less likely to be explained by coincidence.

[Figure 2: detrended residuals of CO2 concentration and emissions]

With that out of the way, how do we know that excess CO2 stays in the atmosphere for a long time? First, let's check what the scientific literature says on the subject, specifically, Moore & Braswell (1994):

If one assumes a terrestrial biosphere with a fertilization flux, then our best estimate is that the single half-life for excess CO2 lies within the range of 19 to 49 years, with a reasonable average being 31 years. If we assume only regrowth, then the average value for the single half-life for excess CO2 increases to 72 years, and if we remove the terrestrial component completely, then it increases further to 92 years.


In general, it is widely accepted that the atmospheric half-life of CO2 is measured in decades, not years.

One type of analysis I have attempted is to select the half-life that maximizes Pearson's correlation coefficient between the two series from Figure 1. If I do this, I find that the best half-life is about 24.4 years. However, when I previously attempted the same exercise with the Mauna Loa series (1958-2004), the best half-life seemed to be about 70 years. It varies depending on the time frame, and there's not necessarily a trend in the half-life. This just goes to show that there's uncertainty in the calculation, and that the half-life model is a simplification of the real world.
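The half-life search described above amounts to a grid search. Here is a minimal NumPy sketch, assuming annual emissions and concentration series aligned year by year; the candidate grid, the helper names, and the test data (generated with a known 30-year half-life) are all made up for illustration.

```python
import numpy as np


def excess_stock(emissions, half_life):
    """Presumed excess CO2: stock[t] = stock[t-1] * 0.5**(1/half_life) + emissions[t]."""
    r = 0.5 ** (1.0 / half_life)
    stock, out = 0.0, []
    for e in emissions:
        stock = stock * r + e
        out.append(stock)
    return np.array(out)


def best_half_life(emissions, concentration, candidates):
    """Return (half_life, r) for the candidate that maximizes Pearson's r."""
    best = None
    for h in candidates:
        r = float(np.corrcoef(excess_stock(emissions, h), concentration)[0, 1])
        if best is None or r > best[1]:
            best = (h, r)
    return best


# Made-up data: concentration generated from a 30-year half-life
emissions = [1.0 + 0.05 * t + (t % 7) for t in range(150)]
concentration = 280.0 + 0.01 * excess_stock(emissions, 30)
print(best_half_life(emissions, concentration, [10, 20, 30, 50, 70, 100]))
```

On real data the correlations for neighbouring half-lives tend to be very close, which fits the observation above that the "best" half-life shifts with the time frame.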

Another approach is to estimate the mass of excess CO2 currently in the atmosphere and see how it compares with data on emissions. The current excess of atmospheric CO2 is agreed to be roughly 100 ppmv. If by 'atmosphere' we mean the layer up to 20 km above ground (this is fairly arbitrary), then the volume of the atmosphere is about 1.03×10^10 km^3. This would mean that the total volume of excess CO2 is 1.03×10^6 km^3, or 1.03×10^15 m^3. The density of CO2 is 1.98 kg/m^3 (at 0 °C and sea-level pressure), so the total mass of excess CO2 should be about 2.04×10^15 kg, or 2,040,000 millions of metric tons.

Something is not right, though. If we add all annual CO2 emissions from 1751 to 2004, we come up with 334,000 millions of metric tons total. CDIAC expresses these emissions as carbon mass, however; converted to CO2 mass (multiplying by 44/12), that is about 1,225,000 millions of metric tons, which is still well short of the estimate above. I'd suggest that CDIAC data does not count all sources of anthropogenic CO2 emissions. It obviously can't be considering feedbacks either. Furthermore, our assumptions in the calculations above might not be accurate (specifically, that a 100 ppmv excess is maintained up to an altitude of 20 km at sea-level density). In any case, it's hard to see how these numbers would support the notion that the half-life of CO2 is low.
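The back-of-envelope numbers can be checked with a few lines of Python. The constants are the figures quoted in the post, plus two facts not stated there: Earth's mean radius (about 6,371 km), used only to sanity-check the 1.03×10^10 km^3 volume, and the 44/12 ratio of CO2 to carbon molar mass, needed because CDIAC's totals are carbon mass. Like the original estimate, it assumes CO2 at sea-level density all the way up to 20 km, which overstates the mass because air thins with altitude.

```python
import math

# Figures as stated in the post (assumptions of that estimate)
ATMOS_VOLUME_KM3 = 1.03e10    # air up to 20 km, per the text
EXCESS_PPMV = 100.0           # excess CO2, parts per million by volume
CO2_DENSITY_KG_M3 = 1.98      # CO2 density at 0 degrees C and 1 atm
CUM_EMISSIONS_MT_C = 334_000  # CDIAC total 1751-2004, millions of metric tons of carbon

M3_PER_KM3 = 1e9
KG_PER_MILLION_TONNES = 1e9   # 1 metric ton = 1,000 kg


def shell_volume_km3(radius_km=6371.0, height_km=20.0):
    """Volume of a spherical shell; 6371 km is Earth's mean radius."""
    return 4.0 / 3.0 * math.pi * ((radius_km + height_km) ** 3 - radius_km ** 3)


excess_m3 = ATMOS_VOLUME_KM3 * M3_PER_KM3 * EXCESS_PPMV * 1e-6
excess_mt_co2 = excess_m3 * CO2_DENSITY_KG_M3 / KG_PER_MILLION_TONNES
# Convert carbon mass to CO2 mass using molar masses 44 (CO2) and 12 (C)
cum_emissions_mt_co2 = CUM_EMISSIONS_MT_C * 44.0 / 12.0

print(f"shell volume:     {shell_volume_km3():.3g} km^3")
print(f"excess CO2:       {excess_mt_co2:,.0f} million t")
print(f"cumulative (CO2): {cum_emissions_mt_co2:,.0f} million t")
```

The shell-volume check comes out at about 1.02×10^10 km^3, close to the figure used in the text.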