Saturday, June 26, 2010

Sprawling Cities Getting Hotter Faster

An interesting new paper has been published (ahead of print) in Environmental Health Perspectives: Stone et al. (2010). It's being widely reported in the media (e.g Yahoo! News.) It finds that:
Our results find the rate of increase in the annual number of extreme heat events between 1956 and 2005 in the most sprawling metropolitan regions to be more than double the rate of increase observed in the most compact metropolitan regions.
The primary author, Brian Stone, is also quoted by OurAmazingPlanet as follows:
"These findings show that the pace of climate change is greater in sprawling cities than in others, which has not been shown before."
The study uses "urban sprawl" as the independent variable, which is more sophisticated than simply using a city's population size. That said, these findings essentially confirm what I stated in my post titled Urban Heat Island Effect - A Model. Briefly, I had determined that the UHI effect on the 130-year temperature trend depends logarithmically on the size of a station's associated town, but only if the town's population is greater than about a million people.

Wednesday, April 7, 2010

Intensity or Frequency?

I have previously argued that – in my estimation – there's a strong causal association between sea-surface temperatures and the number of named storms (or tropical cyclones) in the Atlantic Basin. Statistically, the association is quite significant, and graphically, it is evident once you apply very simple smoothing filters.

This is not the prevailing scientific view, which essentially says that the intensity of storms should increase with global warming, and the frequency of storms should actually decline. This prevailing view, largely based on computer modeling and not observations, is best summarized by the IPCC in 4AR WGI

I was wondering if both could be true: global warming increases the frequency and intensity of tropical cyclones. How can we test this idea using available observations? You can't just look at, say, Accumulated Cyclone Energy (ACE.) If the frequency of storms increases, ACE should also increase, even if the average intensity of each storm doesn't change.

It occurred to me that a much better test would be to look at the ratio of hurricanes to all named storms, and the ratio of major hurricanes to storms. I've done this, but I'll leave it as an exercise for the reader. It's a very easy analysis. You can use the named storm count data from the Hurricane Research Division of NOAA. If you have concerns that tropical storms were under-counted in the past relative to hurricanes (a reasonable assumption), you can use data starting in 1944, which is when systematic aircraft recognizance started. But remember, causality matters more than the trend in this case.

To make a long story short, observations do not appear to support the view that global warming will cause storm intensity to increase. The historical data is telling me the opposite of what the IPCC claims. What, if anything, am I missing? Could it be that things will work differently in the future?

Tuesday, April 6, 2010

Urban Heat Island Effect - A Model

In the addendum of my last post on the Urban Heat Island (UHI) Effect, I noted that GHCN v2 apparently does contain data that we can use to verify the existence of the effect, even though UHI doesn't seem to have a discernible impact on global temperature trends. This is interesting because it's at odds with some well known findings from the literature, such as Peterson et al. (2003), and it addresses a "mystery" of sorts about the instrumental temperature record.

I wrote some code in order to carry out a more thorough analysis of a possibly systematic effect in the raw data, hypothesizing the effect depends on the size of the station's associated town. Basically, I divided stations in population size groups, using 1.25-fold increments. That is, the first group consists of towns whose population is between 10,000 and 12,500. The second group has between 12,500 and 15,625 people, and so on. The last group consists of towns with populations between 15.8 million and 19.7 million. For each group, I got a global temperature series, in a way equivalent to how GHCN Processor would produce them. This is what I came up with:

This is a highly significant effect. It doesn't even make sense to post a confidence level, because it's exceedingly close to 100%.

It is obvious from the graph, nonetheless, that the number of cities declines rapidly with population size. It's a good idea in these cases to look at a logarithmic scale of the X axis.

This logarithmic relationship is clearly a good candidate for segmented regression. When the population is less than about 1.04 million, there is no discernible effect. A linear regression of the left-hand "segment" has a slight downward slope, which is not statistically significant. The average temperature slope between 1880 and 2009 is 0.0056°C/year (which is what the red line represents.)

We can thus derive a straightforward model for UHI, applicable to the GHCN v2 raw data file, which follows.

C = -0.0039·[ln(P + 1) - ln(1042)]

  • C is a correction (in °C/year) that should be added to the temperature slope of a station only if the population of its associated town is greater than 1.04 million.
  • P is the population of the town associated with the station, in thousands.

Monday, April 5, 2010

Urban Heat Island Effect - Probably Negligible

Previously I had discussed the difference between rural and urban temperature stations in the U.S. Commenter steven argued that population assessments (R, S and U) in GHCN v2 might be outdated and – in general – not very good proxies of what we really want to measure.

I then compared rural stations in the Mid-West (a low-population-density region of the U.S.) with all rural stations. There wasn't a major difference between these two sets of stations either. Commenter steven was not convinced, however. He posted some satellite pictures of rural stations that are located in what appear to be sub-urban areas.

How could we measure the impact of human populations on station temperature with the data available to us? It's clearly not enough to express doubt and speculate about what might be going on.

Here's what I came up with. There's a vegetation property in the station metadata. If you look at stations in regions that are forested (FO), marshes (MA) or deserts (DE), they appear to be actually rural. I looked at a subset of such stations in Google Maps, and they are not close to human settlements, with few exceptions. The GHCN Processor command I used to obtain a temperature series is the following.

ghcnp -dt mean -include "population_type=='R' && (vegetation=='FO' || vegetation=='MA' || vegetation=='DE')" -o /tmp/global-rural-plus.csv

575 stations fit these characteristics. For comparison, I got temperature series for big cities (population > 0.5 million), and small towns and cities (population <= 0.5 million.) I calculated 12-year moving averages in each case, which is what you see in the figure below.

There might be some differences, but they are always small, and we've compared several different stations sets now, globally and at the U.S. level.

An argument could also be made that small human settlements increase the albedo of an area, so they might have a cooling effect.

Addendum (4/5/2010)

Here's an actual UHI finding of interest. I compared cities of population over 2 million with towns whose population is between 10,000 and 15,000. The difference is more pronounced in this case.

The overall effect is still negligible, nevertheless. The number of cities decreases exponentially with population size.

Tuesday, March 30, 2010

US Rural vs. Urban Temperature Stations

There's an article by Edward R. Long titled "Contiguous U.S. Temperature Trends Using NCDC Raw and Adjusted Data For One-Per-State Rural and Urban Station Sets." It claims to show that in the raw/unadjusted NCDC data, urban U.S. stations have a warming trend that diverges from that of rural stations, whereas in the adjusted data, the rural trend has been adjusted to "take on the time-line characteristics of urban data." In not so many words, it claims that NCDC data has been fudged. Not surprisingly, Long's article appears to be quite popular in "sceptic" circles.

The methodology of the article is peculiar. First, why only analyze the U.S.? Even though the U.S. has more stations than any other single country, its surface area is only 2% that of Earth.

More importantly, why pick only one rural and one urban station from each state? What was the criteria used to pick each state's stations? Was it random? The article does not clarify, so it lends itself to accusations of cherry-picking.

It's fairly easy to verify Long's claims with GHCN Processor. A quick verification takes perhaps 10 minutes if you're familiar with the tool's options.

First, we can get rural and urban temperature anomaly series for the U.S. from the adjusted data, with the following commands, respectively:

ghcnp -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R'" -reg -o /tmp/us-rural.csv

ghcnp -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'U'" -reg -o /tmp/us-urban.csv

GHCN Processor informs us that it processed 867 rural stations, and 117 urban stations. That's plenty more than the article analyzes. Let's take a look at a graph of both series.

This is still consistent with Long's claims. What we want to confirm is whether the rural trend is significantly less steep in the raw unadjusted data. To get rural and urban series from the unadjusted data file, I used the following commands, respectively:

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R'" -reg -o /tmp/us-rural-raw.csv

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'U'" -reg -o /tmp/us-urban-raw.csv

In this case, the tool informs us that 1046 rural stations have been analyzed, compared to 392 urban stations. A graph of the series follows.

To be thorough, let's also get 12-year running averages of the unadjusted series. That's what the article does.

This graph is very different to Figure 6 from Long's article, and it doesn't support Long's conclusions by any stretch of the imagination. It's also clear that while Long's urban trend is roughly correct, the 48 rural stations he picked are not representative of the 1046 stations GHCN Processor retrieves out of the raw data file. Why they are not representative can only be speculated upon, but I have some ideas.

The divergence between urban and rural stations that exists prior to the reference period (1950 to 1981) might be something that needs to be looked into further, but it's not too surprising. The farther back you go, the fewer the number of stations that report. There's simply more error in older data.

Addendum (3/31/2010)

In comments, steven suggests that GHCN v2 population assessments are old and may no longer be applicable. For example, a station might be near a town that used to have less than 10,000 people, and classified as 'R', but then the town grew.

Intuitively, it doesn't seem likely that this would be sufficient to explain away the findings, and it certainly doesn't address Dr. Long's choice of only 48 rural stations in the U.S. But I try to keep an open mind about these types of arguments, within reason.

Fully addressing steven's objection would take substantial work, but we can do the next best thing. Steven indicates that population density is what really matters. Let's take a look at a population density map of the United States (from Wikimedia):

Here's another such map. A portion of the U.S. (basically, the mid-west) has considerably lower population density than the rest of the country. Let's define this low-density region as that bounded by longitudes -95° and -115°, which excludes California. A longitude map of the US can be found here.

A typical rural station in the low-density region should not be as likely to be near an urban area as a rural station in high-density areas of the U.S. Additionally, the population density around the station should be lower for rural stations in the low-density region, in average. Does that make sense?

With GHCN Processor we can easily obtain an unadjusted temperature anomaly series only for rural stations in the low-density region, as follows.

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R' && longitude < -95 && longitude > -115" -reg -o /tmp/us-really-rural-raw.csv

I've calculated 12-year running averages of the rural low-density series, and plotted it along with the full-rural series.

Things haven't really changed, have they?

There seems to be somewhat of an offset between both station sets, which is interesting to some extent. Apparently, "really rural" stations were warm relative to all rural stations during the baseline period (1950 to 1981.)

BTW, there are 432 "really rural" stations in the unadjusted data file.

Sunday, March 28, 2010

The Average Temperature of Earth

If you Google the average temperature of Earth, you'll find a couple of frequent estimates: 13°C and 15°C. GISTemp data files carry a note that says:
Best estimate for absolute global mean for 1951-1980 is 14.0 deg-C or 57.2 deg-F...

GHCN Processor 1.1 has a -abs option that causes the tool to write out "absolute measurement" averages, as opposed to temperature anomalies. Additionally, simulations I've run indicate that the tool's default cell and station combination method (the linear-equations-based method) is adequate for this sort of application.

You can get a global absolute measurement series with the following command.

ghchp -gt seg -abs -o /tmp/ghcn-global-abs.csv

In this case the tool says 4771 stations have been analyzed. GHCN v2 has considerably more stations, however. A lot of them are dropped in the adjusted data set because they don't have enough data points. To get a series based on the raw data, you can run:

ghcnp -dt mean -gt seg -abs -o /tmp/ghcn-global-abs-noadj.csv

The tool now analyzes 7067 out of 7280 stations. (GHCN Processor 1.1 has an implicit filter that drops any station without at least 10 data points in any given month.)

It probably shouldn't come as a surprise that there are differences in the results we obtain with each data set. With the adjusted data, the average for the baseline period 1950 to 1981 is 14.7°C. With the raw data, the average is 14.1°C. The reason for the discrepancy probably has to do with the sorts of stations that don't make it into the adjusted data set, typically because they haven't reported long enough. They might be cold stations, like stations near Antarctica, a region with a substantial scarcity of stations in the adjusted data set.

Let me post a graph of both series, just so there's no confusion.

So it's basically an offset difference between the two. If I'm correct about it being due to stations near Antarctica, one caveat would be elevation. It appears that stations in Antarctica are high up, and this is a problem. We could filter stations by elevation, but then we basically drop Antarctica, like the adjusted data set does.

I believe 14.1°C is probably a low estimate. It's also pretty clear that GHCN v2 is land-biased.

There does seem to be a slight slope difference (0.15°C/century) between the adjusted and raw series. This can't be anywhere near significance, and again, it probably has to do with the sorts of stations that don't make it into the adjusted data set. I wouldn't be surprised if "sceptics" manage to make a big deal of it, though.

GHCN Processor 1.1

Version 1.1 of GHCN Processor is now available for download. Relative to Version 1.0, the highlights are:

  • Implemented the first difference station combination method (Peterson et al. 1998.)
  • Added -abs option, which causes the tool to produce an "absolute measurement" series, as opposed to an "anomaly" series.
  • Added a -ccm option (similar to -scm) that sets the grid cell/box combination method.
  • The default station and grid cell combination method is the linear-equation-based method (olem.) This decision was based on simulations I ran comparing all 3 supported methods, involving series with equal and different underlying slopes.
  • Added -og option, which lets you write grid partitioning information to a CSV-format file. This feature is informative if you use the -gt seg option. (Zeke suggested producing a map. This is the next best thing.)

See the release-notes.txt and readme.txt files for additional details.

Monday, March 22, 2010

How GHCN Processor Combines Stations

To combine temperature series from stations that occupy the same grid cell/box, it's normally not a good idea to simply calculate an average of the measurements. Actually, this would work just fine if all stations had complete data for the entire period of interest (e.g. 1880 to 2009.) They typically do not, however. Moreover, the absolute temperature measurements at one station can be systematically higher or lower than those of another station, even if the stations are close to one another. Because of this, if you just average stations, you will probably end up with "step" changes that do not accurately reflect reality.

Tamino proposed an optimal solution to the problem not long ago. Then RomanM wrote about a "more optimal" method, which he later revised in his Plan C post.

The problem both Tamino and RomanM took on involves finding an offset for each station-month (μi for station i.) Once you find the right offsets, you can add each temperature series to their corresponding offsets in order to "normalize" the stations, so to speak. Then you can average out the "normalized" series.

I believe that's the right problem to solve. My approach to solving it was different, nonetheless, and rather simple, in my view. The approach is already implemented and shipped with GHCN Processor 1.0. You can browse the source code of the implementation in CVS. (You need to pass a -scm olem option to the tool in order to use the method, otherwise it just uses a simple "temperature anomaly" method.)

Let's take as a given that you can always find a difference or delta between two temperature stations. Between stations i and j, let's call the difference ΔTi,j.

Note that the difference between offsets μi and μj is equal to -ΔTi,j. That's obvious enough, isn't it? The solution is easier to explain with an example. Let's take a grid cell with only 3 stations. The following two equations hold for such a set of stations:

(1) μ0 - μ1 = -ΔT0,1
(2) μ0 - μ2 = -ΔT0,2

Additionally, the sum of all offsets should be zero. This way, if you have data from all stations in a given year, the sum of the "normalized" measurements will be the same as the sum of the original absolute measurements. Hence, we have a third equation:

(3) μ0 + μ1 + μ2 = 0

Equations 1, 2 and 3 are nothing more than a system of linear equations: 3 equations with 3 variables. It's actually simpler that a typical system of linear equations. Notice that if you simply add the equations, you can easily solve for μ0:

μ0 = -(ΔT0,1 + ΔT0,2) / 3

Once you have μ0, you can easily solve for μ1 (with equation 1) and μ2 (with equation 2).


You might have noticed that there's one more equation we could have used: one involving μ1 and μ2. In fact, a valid alternative system of 3 equations is the following:

(1) μ1 - μ2 = -ΔT1,2
(2) μ1 - μ0 = -ΔT1,0
(3) μ0 + μ1 + μ2 = 0

While both equation systems are valid, the first equation system was biased toward station 0, whereas the second equation system is biased toward station 1. If you only solve one system of n equations biased toward station i, the solution is similar to the reference station method (Hansen & Lebedeff 1987.) The main limitation of this method is that it's dependent on the reliability of a single station: the reference station.

The full algorithm solves n systems of n equations each, with station i (from 0 to n-1) being the reference station in each case. Then it averages out the offsets that result from solving each of the equation systems. A straight average would be completely unbiased. The implementation uses a weighted average, where the weight of a reference station is the number of years with available data. The solution is thus biased, but only to the extent that some stations have more complete data than other stations.


I thought that was a sound plan, but I wondered if it worked in practice. I wrote a simulation to find out.

The simulation consists of 15 stations with a common non-linear trend, and some common noise. On top of that, each station has noise and seasonality unique to it. Stations start out with complete data, and we calculate a "expected" temperature series with these data. Then we remove data points at random from each station – anywhere from 0% to 80% of them. We then apply a station combination algorithm and come up with a "estimated" temperature series. I have uploaded the source code of the simulation into CVS, where you will find all other details.

Let's start by evaluating how a simple "temperature anomaly" method does with this simulation. I'm interested in how well it predicts the absolute temperature measurements, not just the anomalies. It's not too bad. A linear regression of "expected" vs. "estimated" temperatures has a correlation coefficient R2 of 0.987, with a slope of 0.99. Ideally, the slope should be 1. There's also an intercept of -0.251. This should ideally be zero.

Now let's see how the linear equation-based method does. The correlation coefficient R2 is 0.997 in this case. The slope is basically 1, and the intercept is basically zero. It's nearly an ideal solution.

Saturday, March 20, 2010

GHCN Processor 1.0

As widely reported, Tamino debunked a key "sceptic" claim to the effect that the drop-out of stations in GHCN around 1990 introduced a false warming trend. Tamino's result has been replicated multiple times.

There's probably no point in producing yet another replication of Tamino's work. The theoretical problem itself did peak my interest in an academic sense, nevertheless. In particular, Tamino had mentioned in his preliminary results post that he was making all grid rows 10° tall, except for the northernmost row, which was 20° tall. However, he was counting its surface area as if it were 10° tall. Evidently, this has to do with the fact that there are relatively few temperature stations near the Arctic. That row is not equivalent, in a statistical sense, to other rows.

It occurred to me that an alternative way to partition a grid would be to create grid cells (or grid boxes, if you prefer) that have the same number of stations in them. I call them statistically equivalent grid cells. So I started to write some code.

I ended up writing not just a script, but a full-fledged command-line tool that can produce a temperature record for a pretty-much arbitrary set of stations. For example, it can produce a temperature series for urban areas of Brazil, or hilly regions of the Northern Hemisphere. All you do is pass a simple expression to the tool. Additionally, it is written in a modular manner, so it's relatively easy for others to contribute new grid partitioning methods, new adjustments, new station combination methods, station filters, and output formats. At least a couple alternatives are already supported in each case.

I have set up a SourceForge project page for this project, and other projects and code that I might release through my blogs in the future.


GHCN Processor is open source software. It is released under the terms of the Apache License 2.0.


You need to have a Java Runtime Environment 1.5+ installed and in your PATH. It's not uncommon for PCs to already have it. You can check with:

java -version


GHCN Processor can be downloaded from the Files Section of the project page.


You can unzip the ZIP file in a directory of your choice. Then you need to add the bin directory of the distribution to your PATH environment variable. In Windows, you need to right-click My Computer, then select Properties / Advanced / Environment Variables.


Documentation on tool options can be found in the readme.txt file shipped with the GHCN Processor distribution. Any other posts I write about GHCN Processor should be available via the ghcn processor tag.

Global Examples

A default global temperature record can be created with:

ghcnp -o /tmp/ghcn-global.csv

The first time you run the tool, it will download inventory and data files from the GHCN v2 FTP site. Note that if you happen to interrupt the download, you could end up with incomplete files. In this case, run the tool again with the -f option.

The output is in CSV format, readable by spreadsheet software. By default it will contain one year per row, with 14 columns each. You can also produce a "monthly format" file with one month per row, as follows.

ghcnp -of monthly -o /tmp/ghcn-global-monthly.csv

By default, the tool uses the adjusted data set from GHCN v2. To use the raw, unadjusted data, try:

ghcnp -dt mean -o /tmp/ghcn-global-raw.csv

To use what I call "statistically equivalent grid cells", try:

ghcnp -gt seg -o /tmp/ghcn-global-seg.csv

The default combination method for stations that are in the same grid cell is a simple "temperature anomaly" method, with a base period if 1950 to 1981 by default. I've implemented a experimental "optimal" method that you can use as follows.

ghcnp -scm olem -o /tmp/ghcn-global-olem.csv

I'll discuss this in more detail on another occasion.

Regional Examples

To produce a regional temperature record, you should always pass a -reg option. The grid partitioning algorithm works differently when it's not attempting to produce a global grid. The following command produces a temperature series for airports in the United Kingdom.

ghcnp -reg -include " country=='UNITED KINGDOM' && is_airport " -o /tmp/uk-airports.csv

This alternative syntax also works:

ghcnp -reg -include " country eq 'UNITED KINGDOM' && is_airport " -o /tmp/uk-airports.csv

The expression after the -include option should be enclosed in double-quotes, and its syntax is that of EL (basically the same as Java or C expression syntax.) String literals in the expression must be enclosed in single quotes. The names of countries must be written exactly as they appear in the GHCN v2 country codes file.

As another example, let's get a temperature series for stations in the southern hemisphere that are in forested areas.

ghcnp -reg -include "latitude < 0 && vegetation eq 'FO' " -o /tmp/sh-forested.csv

GHCN Processor in this case tells us that there are only 7 such stations with temperature data in the adjusted data set.

Tamino Reproduction

In addition to station properties you find in the inventory file of GHCN, there are a number of properties added by GHCN Processor: slope, variance, first_year, last_year, and mean_temperature. The properties first_year and last_year tell you the years when the station started and finished reporting, respectively.

To reproduce Tamino's work, we can simply run the following commands:

ghcnp -include "last_year <= 1991" -o /tmp/wattergate-dropped.csv

ghcnp -include "last_year > 1991" -o /tmp/wattergate-kept.csv

GHCN Processor tells us that 2899 stations reported only until 1991 or earlier, whereas 1872 stations were reporting after 1991. That's in the adjusted data set. Notice that they are completely disjoint sets of stations. I can assure you; any half-decent grid partitioning method and station combination method will produce essentially the same results as Tamino's.


GHCN Processor relies on the following third-party libraries and environments:

Of course, this tool wouldn't be possible without the GHCN v2 database.

Friday, March 5, 2010

Essentially The Same Graph, Different Smoothing

This is probably the last post I will write on the topic of tropical cyclones in a long time. I don't intend to make this an all-tropical-cyclones-all-the-time blog. I promise.

I thought I should mention a paper I found – Mann & Emanuel (2006) – that contains a graph of storm counts and sea surface temperatures, shown below.

The smoothing is somewhat different to that of the graphs I've produced. It's decadal smoothing. They apparently use SST data for the Atlantic averaged over August-September-October. (I did something like this in my first analysis, but I do not believe it makes sense anymore.)

Mann & Emanuel then compare the decadally smoothed series, and find a correlation coefficient R of 0.73 (p < 0.001 one-tailed.) Based on these results, the authors concluded that:
There is a strong historical relationship between tropical Atlantic SST and tropical cyclone activity extending back through the late nineteenth century.

The way I see correlation coefficients, 0.5 is "not bad," 0.75 is "pretty good," 0.9 is "very good," and 0.99+ is "law of physics."

The confidence level is what really matters when it comes to causality, in my view. In my statistical analysis, where data had all the original noise, I found a detrended association with 99.993% confidence (equivalent to p < 0.00007) – allowing for a lag of 1 year.

I wondered, nevertheless: What would the correlation coefficient be like if we compare the 15-year central moving averages of the data I've been using? Let's do this only considering data since 1870, like Mann & Emanuel do. Let's also allow for a lag, given that such is evident in the graph.

I found something surprising. The optimal lag is 7 years, not a few years like I had presumed. The correlation coefficient R at this lag is 0.924 – well within "very good" territory.

You know what's interesting about the 7-year lag? 2005 minus 1998 is 7. I might come back to that some other time.

I conclude that Mann & Emanuel were on the right track, but they didn't make a strong enough case. There's a presentation by Tom Knutson of NOAA where he mentions the graph from Mann & Emanuel (2006), apparently thinking it's interesting, but wonders if the storm record is reliable enough to produce a graph like that. Then he goes on to discuss the long-term trend, which may or may not be statistically significant (as I already explained, causality and trend significance are different things), and the projections of computer simulations.

Thursday, March 4, 2010

Literature on the Number of Tropical Cyclones

It's not exactly true that my claims about the number of tropical cyclones tracking SSTs are at odds with observational science. They might be at odds with some models, as discussed in 4AR WGI

See, for example, Webster et al. (2005). This paper has some interesting graphs too, but the time span is short.
We conclude that global data indicate a 30-year trend toward more frequent and intense hurricanes, corroborated by the results of the recent regional assessment.

I should also note that determining causality and determining whether there's a statistically significant upward trend are technically different endeavors. You can have one without the other. This applies to other effects of AGW as well, like sea level rise. Does that make sense?

Wednesday, March 3, 2010

Recent Storm Seasons Have Been Mild, Right?

OK. The graph with the 15-year (or 17-year, or 21-year) smooth NH SST and Atlantic storm series looks good, almost too good. I was reminded of that fact recently. But then, you might ask, how do we reconcile this with the fact that Atlantic storm seasons after 2005 have been mild? Should I believe my lying eyes, or... my lying eyes?

Mild compared to what? Anything is mild when you compare it to the Katrina season.

You can search Wikipedia for each year's count, e.g. google "2006 Atlantic Hurricane Season." These are the figures:

2006: 9
2007: 15
2008: 16
2009: 9

The average is 12.25. The average between 1944 and 1990 was 9.8. I'm not saying recent seasons are significantly more busy, but you can't claim they are milder than the 1944-1990 average either.

Clearly, 2007 and 2008 were both considerably more busy than normal. They probably went unnoticed because, again, what could possibly compare to the 2005 season?

Note that I never claimed storms resulting from global warming were going to be the end of us all, or anything of the sort. I haven't even attempted to model it yet. Any projection I made carried a caveat: that the association would have to be linear. But it can't really be linear, can it? I can't say, for example, if the number of storms saturates after a certain temperature is reached.

I never even claimed that there has been a statistically significant rise in the number of storms. It seems reasonable this is the case, but I just haven't done this analysis. What I did is determine whether storms track sea-surface temperatures, and they do.

I will say one thing, though. Do not be surprised if the 2011 or 2012 seasons are quite busy. I'm saying that because 2010 is an El Niño year. It's not the case that busy seasons always follow El Niño years, or that all El Niño years end up producing busy seasons. But based on a quick comparison, it often does happen that way.

Too Easy To Be True?

Back in June of 2008, I wrote a statistical analysis of the association between northern hemisphere sea surface temperatures and the number of named storms in the Atlantic basin. I determined, with 99.993% confidence, that indeed there was such an association. I had controlled for coincidental trends (otherwise known as a spurious correlation) by means of detrending.

A commenter over at Climate Audit tacitly admitted reproducing my analysis, but pointed out that if he detrended the series using 6th-order polynomial trendlines, the association no longer held. I noted that if you allow for a lag of 1 year between the series, even the 6th-order detrending resulted in a statistically significant association, despite the loss of information that necessarily results from such a detrending. The existence of the lag between temperatures and named storms would soon become crystal clear to me and my (not very many) readers.

During the back and forth with the Climate Audit commenter, I realized that if you simply smooth out the noise from both series, the association becomes graphically evident, and a lot more convincing – I thought – than a statistical analysis. The first graph I posted used nothing more than a 21-year central moving average for smoothing. The results were remarkable and the graph was remarkably easy to produce.

As it turns out, it was also remarkably difficult to believe. A few months went by, and a reader (paulm) posted a link to my analysis over at the GW blog. The graph was met with a got-to-be-fake type of response.

When I found out, this obviously made me upset. You can tell I was upset as I was explaining my very lenient comment policy (see sidebar) in my response to the incident. I even posted the spreadsheet for verification. There were no further falsification accusations after I did this, and I thought that was the end of it.

Fast forward a year and a half. Deltoid has a recent post on the topic, quoting various IPCC statements, and I basically commented that the IPCC was wrong, in my view, in regards to the number of tropical cyclones not changing in response to global warming. The data told me otherwise.

I guess questioning an IPCC claim was a mistake, wasn't it? I might have also broken some social norm I'm not aware of or something. Some of the regulars started to talk to me as if I were one of the resident trolls, like El Gordo or Spangled Drongo. They basically accused me of fraud and trying to deceive, in a way that is not dissimilar to how the CRU team are accused of fraud and so forth.

Things calmed down after I, once again, posted the spreadsheet. I appreciate Bernard J's semi-apology.

For reference, below are the links to the data and the spreadsheet. I posted the spreadsheet at a more permanent location.

What else can I do? You've seen my comment policy. Should I post screenshots too?

I'd hate to think that my most interesting graphs are assumed to be fake a priori. Is my graph of Red Sea sea level and Vostok temperatures a fake? What about my graph of the natural spline interpolation of the Law Dome CO2 data? What about the one with the Mann et al. (2008) reconstruction and CO2 at the time of the industrial revolution? What about the graph with the Greenland temperature reconstruction?

I realize I often post claims that you can't just Google to confirm, and I realize that people are sometimes paranoid. That's why I have the comment policy I have.

Sunday, February 28, 2010

Sea Level Rise - Part 3

Thermal Expansion

In Sea Level Rise - Part 2, I had estimated the magnitude of sea level rise (SLR) due to ice melt alone, since the last glacial maximum (LGM). It was 97 meters. This was based on an ice coverage and ice topography reconstruction made available by Peltier (1993).

Total SLR since the LGM is about 130 meters according to Fleming et al. (1998). The IPCC figure is 120 meters (4AR WGI FAQ 5.1.) The Red Sea reconstruction provided by Siddall et al. (2003) also indicates it's in the order of 120 meters.

So we're missing at least 23 meters of SLR. Could this difference be the result of thermal expansion? As it turns out, no.

I will estimate a ballpark figure for SLR since the LGM due to thermal expansion alone. For this, I will use a simplified model of the ocean: A rectangular pool of water with 3.5% salinity, a surface area of 361·106 km2, and a depth of 3,600 meters. The temperature at the surface of the pool has increased from 12C to 16C. The temperature at a given depth is calculated using the ocean water temperature profile model I derived in the last post. I will also assume that water pressure does not affect our calculations significantly.

In order to carry out the calculation, I divided our simplified ocean in 36 layers of 100 meters each. For each layer, I calculated current temperature and LGM temperature.

You can also calculate the density of water given its temperature and salinity. The following graph illustrates the relationship between temperature and the inverse of density (or volume of 1 kg of water in liters.)

The density of water with 3.5% salinity is, thus, well approximated by:

T is the temperature in degrees Celsius. The units for density are kilograms per liter.

With this equation, we can calculate the density of each layer of our simplified ocean, at present and during the LGM. The volume of each layer at present is 3.61·107 km3. The volume of each layer in the LGM can be calculated by multiplying 3.61·107 km3 by the current density and dividing it by the LGM density.

The total volumetric difference turns out to be 9.5·104 km3. This translates to 0.26 meters of SLR, or 26 cm.

This should not be taken as a precise figure. The point is that it's nowhere near 23 meters. Clearly, the change in ice volume I calculated from the Peltier (1993) reconstruction must be an underestimate.

Ice melt is obviously a much more significant problem, in the long term – meaning thousands of years – than thermal expansion. However, if the temperature is rising rapidly, thermal expansion can temporarily contribute more SLR than ice melt. Presumably, the effects of thermal expansion are immediate. Ice melt is slow.

Friday, February 26, 2010

The Temperature of Ocean Water at a Given Depth

In order to corroborate the results of the last post on sea level rise, I would like to estimate the impact of thermal expansion alone since the last glacial maximum. (I anticipate I will need to post a correction, but first things first.) In order to do this properly, I need to know how the temperature of ocean water varies with depth. I've tried to look for a straightforward formula that solves this problem, to no avail. (I did find a graph that was illustrative.)

So I went ahead and derived such a formula, based on ocean water characteristic data from the Argo database. More specifically, I used the ensemble mean grid made available by Asia-Pacific Data-Research Center of the University of Hawaii.

For each depth file, I calculated temperature means for 3 different latitude groups, which I've called 0S, 30S, and 60S. The 0S group includes latitudes -15S to 15N. The 30S group includes -45S to -15S. The last group includes -75S to -45S. Data is not available for all latitudes, so the mean values obtained should not be considered latitudinal averages; that's not the purpose of this exercise.

Let's start by looking at a graph of mean temperature at a given depth for each of the groups.

It doesn't look very easy, does it? Clearly, the temperature of ocean water must depend on variables other than the sea surface temperature and depth. But I was able to come up with a model that fits the data quite well (R2=0.988.) The formulas follow.

  • S is the sea surface temperature plus 0.338, in degrees Celsius.
  • D is the depth in meters.
  • T(D) is the temperature at a given depth D, in degrees Celsius.


The model was empirically derived. The first thing I noticed is that the depth D is approximately inversely proportional to T(D). In fact, the following formula is a rough approximation of temperature at a given depth.

The problem with this formula is that it doesn't work very well at shallow depths. You might have noticed in the figure that the temperature is roughly stable at a depth of 30 meters or less. In order to fix this problem, the approach I came up with is to transform D into D', where D' tends to zero when D is small, but tends to D when D is big.

If we multiply D by a logistic function, we obtain something close to the desired result for D'.

Once I had the general form of the equation, all I had to do is figure out the 4 coefficients involved. I used genetic programming to solve this.

Monday, February 22, 2010

The Greenland Canard

A favorite argument of AGW "sceptics" has to do with Greenland and what is known as the medieval warm period (MWP). The idea is that if the Earth was warmer some time in the past, this would undermine AGW theory. I don't find this line of argumentation to be robust either way, but let's examine it, shall we?

The Vikings colonized Greenland from 986 AD. They were able to farm, fish and raise cattle. The settlements disappeared by the 15th century, presumably because of the little ice age (LIA).

The first thing that needs to be pointed out is that Greenland was still a very cold place during the MWP. It was not a green paradise of any sort. Skeptical Science and A Few Things Ill Considered have the details on that.

Another counter-argument I've encountered is that Greenland might have been considerably warmer than it is now during the MWP, but this was most likely a local or North-Atlantic phenomenon, not a global one. Nearly all climate reconstructions that cover the MWP (and there are many of them, by many different authors, using several different methods) do not show it to be warmer than today.

Is there a way to corroborate that Greenland was indeed unusually warm during the MWP, unlike the rest of the northern hemisphere? Absolutely.

Alley (2000) provides a 50,000 year temperature reconstruction from Central Greenland whose resolution is not bad at all. The reconstruction is ice-core based, and it provides temperatures as absolute values. In order to calculate a "temperature anomaly" for purposes of graphical comparison, I added 31.29 to the temperature values provided by Alley (2000).

Let's start by looking at data from 200 AD to 1850 AD. I will use the CPS Northern-Hemisphere reconstruction from Mann et al. (2008) for comparison.

The difference between the MWP and the LIA in Greenland was around 1.6°C. Interestingly, the MWP temperature peak in Greenland occurs almost exactly at the time of the Viking colonization. Presumably, that's not a coincidence. Additionally, there are already some indications in this graph that the climate of Greenland experiences abrupt changes from time to time.

Not convinced? Let's look at 50,000 years of data. For comparison, I will use the temperature reconstruction from Vostok station, Antarctica made available by Petit et al. (1999). This is also an ice-core based reconstruction.

Here we see that the temperature of Greenland fluctuates in a manner that is not matched by the temperature of Antarctica. In particular, notice the effect of the Younger Dryas stadial on each of the series.

In synthesis, the climate of Greenland is quite peculiar, and as a result, it should not be thought of as a proxy of global or even hemispheric climate.

Wednesday, February 17, 2010

Sea Level Rise - Part 2

Ice Melt

In the previous post on Sea Level Rise, I argued there is a clear association between the Red Sea sea level reconstruction from Siddall et al. (2003) and the Vostok temperature reconstruction from Petit et al. (2000), with sea level lagging temperatures by about 4,700 years, at least for the last 50,000 years. This is corroborated by other reconstructions, such as the SL reconstruction from Arz et al. (2007) for the time span 83,000 years to 13,000 years before present.

The magnitude of SLR since the last glacial maximum 20,000 years ago is about 130 meters (427 feet), which again, is non-trivial. If we had similar SLR today, essentially all coastal cities around the world would be under water.

Much of that SLR has to be the result of ice melt. The rest, if any, would have to be the result of thermal expansion. Ice that is floating in the ocean should not cause SLR when it melts. What matters in this sense is ice over land.

In this post I'd like to estimate the magnitude of sea level rise due to ice melt alone.

NOAA provides a reconstruction contributed by Peltier (1993), along with applets that let you visualize ice coverage and ice topography in a world map. The resolution is 1,000 years, and the data goes all the way back to the last glacial maximum.

Fortunately, most of the ice coverage is over land, as you can see. So I took the 21K-year-old ice coverage data, and used it as a baseline for the calculation of changes in ice volume. The topography data includes land topography, evidently, but we're interested in ice volume change (not so much absolute ice volume) so it will do for now. In a footnote I will provide the processed data, along with an explanation of how it is obtained. The following graph shows the ice volume data along with the Vostok temperature reconstruction.

So we're talking about a change in ice volume in the order of 3.9·107 km3 in the last 21K years. The density of glacial ice is about 90% that of ocean water. (The tip of an iceberg is about 10% of the iceberg.) So the change in ice volume translates to 3.5·107 km3 of additional ocean water.

Now, the surface area of the ocean is 3.61·108 km2. If sea level rises, this surface area will change in a manner that is negligible, even if some areas are flooded. Does that make sense? Therefore, a good approximation for the change in ocean volume (ΔV) that results from a rise (ΔL) in sea level, is:

ΔV = 3.61·108·ΔL

Solving for ΔL, we have that:

ΔL = 0.097 km

That is 97 meters. It follows that thermal expansion must be responsible for the remaining 33 meters of SLR since the LGM.

[Correction 2/28/2010: As it turns out, SLR due to thermal expansion can't be anywhere near 33 meters. See Part 3 of the series.]

What if we lost all ice?

Let's assume, conservatively, that only 2.5·107 km3 of ice remain over land. If we managed to melt all of it, the ocean would get an extra 2.24·107 km3 of water. Dividing this volume by the surface area of the ocean, we get 62 meters of SLR.

That scenario is, of course, not something we'll see in our lifetimes, by a long shot. Even if we raised the temperature of Earth enough, it would take thousands of years to be realized, no doubt.

It's not an impossible scenario, however. About 50 million years ago, during the Paleocene-Eocene Thermal Maximum, global sea level was about 200 meters higher than today.

  • Summarized Ice Coverage and Ice Volume from Peltier (1993)
    This is obtained by processing all ice*.asc and top*.asc files in "ASCII special" format. The area of a grid cell is calculated by dividing the area of a spherical ring 1-degree wide (corresponding to each latitude in the data) by 360. Only grid cells with ice coverage in the 21K-year-old data are considered when calculating land+ice volume.

Thursday, February 11, 2010

Sea Level Rise - Part 1

I have begun to take a closer look at available sea level data. Frankly, I'm a bit surprised. Of course I've heard of sea level rise (SLR) before, but it was like a fuzzy abstraction. I didn't have specific figures to ponder.

In this particular post I will only discuss paleoclimate data. There's some data that I imagine is pretty well known. The figure to your right, for example, comes from Wikipedia. It's a sea level reconstruction from Fleming et al. (1998) of the last 20 thousand years or so. That's roughly the time span since the last glacial maximum (LGM), when the global mean temperature was 5°C lower than today, give or take. As the LGM ended, and the planet entered the current interglacial period, sea level rose by about 130 meters (427 feet.) Of course, much of that is probably the result of ice melt, and there was a lot of ice in the LGM.

Then there's data that is not well known as far as I can tell. For example, there's a 380,000-year reconstruction contributed by Siddall et al. (2003) based on oxygen isotope records from Red Sea sediment cores. The following graph shows this sea level reconstruction along with the Vostok station temperature reconstruction provided by Petit et al. (2000).

If there were any doubts that temperature drives sea level, I believe the graph above is enough to dispel them. This is the basic premise I wanted to establish with this first post.

The data has some features that I wish to explore further, but not today. In particular, notice that sea level lags temperature by several thousand years. (This is not so clear with sea level data older than about 250,000 years. I'm guessing there's some sort of dating error either in the Red Sea data or in the Vostok data once you go that deep.) If I only consider data for the last 50,000 years, I'm estimating that the best lag is about 4,700 years. I'm not sure I can emphasize enough how important this lag is, but I'll certainly try.

[In Part 2 I estimate SLR due to ice melt alone since the LGM.]

Friday, February 5, 2010

The Twin GHGs Paradox

The means by which a greenhouse gas (GHG) forces climate change is sometimes called radiative forcing. You can think of it as the additional irradiance necessary to bring the system back into balance after a change in the concentration of the greenhouse gas.

Apparently, the radiative forcing contribution of different greenhouse gases is typically added to come up with the total contribution. For example, if radiative forcing between 1750 and 1998 is 1.46 W/m2 for CO2 and 0.48 W/m2 for methane, then the total for the two gases is 1.94 W/m2. This makes complete sense on the surface, does it not?

But if the effect of a change in the concentration of a GHG is non-linear, is it really correct to linearly add such effects? I was thinking this would be similar to trying to add decibels.

After some digging around, my impression was that the IPCC had addressed this issue in section 2.8.4 of 4AR WGI. If you read the papers cited, nevertheless, it appears that they refer to the equivalence of the addition of forcings and the addition of temperature changes. I don't doubt these operations are roughly equivalent for relatively small effects. But that's not what I'm talking about at all. I'm questioning the validity of either operation, considering that both forcings and temperature shifts are non-linear responses.

In order to illustrate the problem, I've come up with a thought experiment that I've dubbed the twin GHGs paradox.

Imagine there are two GHGs that happen to be identical in their effects. If you have trouble picturing it, suppose one gas is industrially-produced CO2 and the other gas is naturally-produced CO2. Their effects are logarithmic.

Let's say the concentration of both gases has changed from 100 ppmv to 200 ppmv. Then their combined radiative forcing would be given by:

ΔF = k·ln(200/100) + k·ln(200/100) = 2·k·ln(2)

Since the gases are identical, the change just described should be equivalent to, say, that of one gas going from 190 ppmv to 390ppmv and the other gas remaining at 10 ppmv, not changing at all (i.e. the combined total is 200 ppmv initially and 400 ppmv later.) Therefore,

ΔF = k·ln(390/190) + k·ln(10/10) = k·ln(2.05)

Combining both equations, we have that:

ln(2.05) = 2·ln(2)

Evidently, we end up with reductio ad absurdum.

To get more technical...

When you're looking at the absorption bands of two GHGs, it seems clear that it matters whether the bands overlap or not. The net absorption calculation will depend on this.

Wednesday, January 6, 2010

Smoothing Splines and Law Dome CO2 Data

I've been reading about a paper by Ernst-Georg Beck where it is claimed that CO2 levels in the past 150 years have fluctuated widely. Apparently, when Mauna Loa measurements started, CO2 levels magically became more stable. It's not surprising these claims have been made fun of. But that's not what I really want to discuss.

Beck's paper got me thinking about the Law Dome ice-core data I've been using to associate CO2 with temperatures. I suspected it might actually be too smooth. You see, Etheridge et al. (1998) provides two convenience data sets: One is a 20-year smooth series spanning 1832-1978. The other is a 75-year smooth series spanning 1010-1975. They are obtained using smoothing splines.

You can actually see that the Etheridge et al. 20-year smooth data is even more smooth than Mauna Loa data (e.g. in this figure, before and after 1978.) The problem with smoothing out noise is that you can easily lose information.

I went ahead and calculated the natural spline interpolation of the raw data from Etheridge et al. (which I'm making available HERE.) The natural interpolation only uses the spline function to estimate data for years that are missing. It does not smooth out the raw data that is known. In the Etheridge et al. data set, what this will mean is that recent data will have most of the original noise, while old data will be smooth, but not 75-year smooth, hopefully.

Let's take a look at the 1850-1978 20-year smooth series provided by Etheridge et al. (1998), along with the natural interpolation of the raw data.

Interesting, isn't it? I think it makes a lot of sense. In particular, notice the flat CO2 trend between the years 1933 and 1952. That's 20 years without an increase in atmospheric CO2. With the smooth series, this is not so evident. You see perhaps 12 years of pause or so.

Is this important? Let's take a look at the HadCRUT3 temperature series along with the logarithm of the new CO2 series, from 1850 to 2008. (Mauna Loa data is added after 1978, minus a small offset.)

It would appear that the climate reacts rapidly to CO2 fluctuations, which again, argues for a small amount of "warming in the pipeline."

There are other things you can't see very well with the overly smooth data. Let's take a look at the time span between 1700 and 1900. For temperature, I will use the NH-SH average from the CPS temperature reconstruction of Mann et al. (2008).

Keeping in mind that both the temperature and CO2 data in the figure are reconstructed, I think this is pretty interesting. It suggests there might have been slight warming of about 0.1C right at the beginning of the industrial revolution. This is consistent with estimates of climate sensitivity.

Tuesday, January 5, 2010

Warming in the Pipeline

How much "warming in the pipeline" is there? Is there a way to be sure? NASA GISS findings suggest it's about 0.6C.

I was contemplating a method of estimating "warming in the pipeline" from available temperature and CO2 data. It's sort of a heuristic method, with all this implies, but it's interesting that I get somewhat different results.

CO2 appears to be the main causal agent of recent temperature shifts. I've demonstrated that temperature fluctuations lag CO2 fluctuations by about 10 years. It comes to reason that observed temperature fluctuations lag equilibrium temperature fluctuations by about 10 years as well.

Imagine you have an equilibrium temperature series that looks like a sinusoid. Observed temperature will lag the hypothetical series, also looking like a sinusoid, by some number of years. If we use Newtonian cooling as an approximation, the expected rate of temperature change (R) will be given by:

R = r·(T' - T)

T' is the equilibrium temperature and T is the observed temperature. The constant r is something I will call the rate coefficient.

The question is: If you know the lag between the sinusoid series, can you estimate the value of the rate coefficient r? Then, if you know r, can you estimate "warming in the pipeline"? I think the answer is yes.

I gave up trying to solve it with calculus. Perhaps a reader can give it a shot. I instead solved it by means of a Monte Carlo simulation.

It turns out that the rate coefficient r depends on the period of the sinusoid and the lag between the series, but not the amplitude of the sinusoid.

The period of the CO2 sinusoid that results from the 3rd-order detrending of the CO2 series is about 85 years (pulsation is 0.074.) You can see that in the figure above. For this period, and a lag of 10 years, my simulations indicate that the rate coefficient r should be just about 0.08 in units of year-1.

Let's assume that the current rate of temperature change (without weather noise) is about 0.018 degrees Celcius per year. Then we have that:

0.018 = 0.08·(T' - T)

So the temperature imbalance is:

ΔT = T' - T = 0.018 / 0.08 = 0.23°C

This is not too bad. If correct, I'd take it as good news.

Climate Sensitivity

I'm essentially claiming that the global equilibrium temperature is knowable and that it's perhaps 0.7°C relative to the HadCRUT3 baseline. It also appears (based on various reconstructions) that the temperature in the 18th century was fairly stable at about -0.4°C. We can probably assume that's an equilibrium temperature. The difference is 1.1°C.

The concentration of CO2 in the 18th century was about 277 ppmv. The concentration as of 2008 is more like 385 ppmv. This means that:

1.1 = k·ln(385/277)


k = 3.34

We can estimate the climate sensitivity to CO2 doubling as follows:

λ = k·ln(2) = 2.32°C

This is actually a tad lower than the best estimates available. There are some uncertainties in the estimate, to be sure. For example, were temperatures really stable at -0.4°C in the 18th century? Then there are some errors that are immediately obvious. Some of the warming could be due to methane and other greenhouse gases. You also have cooling due to aerosols, which would confound the estimate in a manner whose magnitude is not well understood.