Saturday, June 26, 2010

Sprawling Cities Getting Hotter Faster

An interesting new paper has been published (ahead of print) in Environmental Health Perspectives: Stone et al. (2010). It's being widely reported in the media (e.g., Yahoo! News). It finds that:
Our results find the rate of increase in the annual number of extreme heat events between 1956 and 2005 in the most sprawling metropolitan regions to be more than double the rate of increase observed in the most compact metropolitan regions.
The primary author, Brian Stone, is also quoted by OurAmazingPlanet as follows:
"These findings show that the pace of climate change is greater in sprawling cities than in others, which has not been shown before."
The study uses "urban sprawl" as the independent variable, which is more sophisticated than simply using a city's population size. That said, these findings essentially confirm what I stated in my post titled Urban Heat Island Effect - A Model. Briefly, I had determined that the UHI effect on the 130-year temperature trend depends logarithmically on the size of a station's associated town, but only if the town's population is greater than about a million people.

Wednesday, April 7, 2010

Intensity or Frequency?

I have previously argued that – in my estimation – there's a strong causal association between sea-surface temperatures and the number of named storms (or tropical cyclones) in the Atlantic Basin. Statistically, the association is quite significant, and graphically, it is evident once you apply very simple smoothing filters.

This is not the prevailing scientific view, which essentially says that the intensity of storms should increase with global warming, and the frequency of storms should actually decline. This prevailing view, largely based on computer modeling and not observations, is best summarized by the IPCC in AR4 WGI.

I was wondering if both could be true: global warming increases both the frequency and the intensity of tropical cyclones. How can we test this idea using available observations? You can't just look at, say, Accumulated Cyclone Energy (ACE). If the frequency of storms increases, ACE should also increase, even if the average intensity of each storm doesn't change.

It occurred to me that a much better test would be to look at the ratio of hurricanes to all named storms, and the ratio of major hurricanes to all named storms. I've done this, but I'll leave it as an exercise for the reader. It's a very easy analysis. You can use the named storm count data from the Hurricane Research Division of NOAA. If you have concerns that tropical storms were under-counted in the past relative to hurricanes (a reasonable assumption), you can use data starting in 1944, which is when systematic aircraft reconnaissance started. But remember, causality matters more than the trend in this case.
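As a minimal sketch of the ratio test described above, the following Python computes both ratios for a season. The function name and the season counts are assumptions made up for illustration; they are not real NOAA totals. The two example seasons show why the ratios are a better test than ACE: doubling every count doubles the frequency but leaves both ratios unchanged.

```python
def intensity_ratios(named_storms, hurricanes, major_hurricanes):
    """Return (hurricane ratio, major-hurricane ratio) for one season.

    Both ratios are None when the season had no named storms.
    """
    if named_storms == 0:
        return None, None
    return hurricanes / named_storms, major_hurricanes / named_storms


# Made-up counts for illustration only; these are NOT real season totals.
# Each tuple is (named storms, hurricanes, major hurricanes).
example_seasons = {
    "season A": (10, 5, 2),
    "season B": (20, 10, 4),  # twice as many storms, same mix of intensities
}

for label, counts in example_seasons.items():
    hurricane_ratio, major_ratio = intensity_ratios(*counts)
    print(label, hurricane_ratio, major_ratio)
```

With real data you would compute these ratios per year and compare them against sea-surface temperature, the same way the named storm counts were compared.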

To make a long story short, observations do not appear to support the view that global warming will cause storm intensity to increase. The historical data is telling me the opposite of what the IPCC claims. What, if anything, am I missing? Could it be that things will work differently in the future?

Tuesday, April 6, 2010

Urban Heat Island Effect - A Model

In the addendum of my last post on the Urban Heat Island (UHI) Effect, I noted that GHCN v2 apparently does contain data that we can use to verify the existence of the effect, even though UHI doesn't seem to have a discernible impact on global temperature trends. This is interesting because it's at odds with some well known findings from the literature, such as Peterson et al. (2003), and it addresses a "mystery" of sorts about the instrumental temperature record.

I wrote some code in order to carry out a more thorough analysis of a possibly systematic effect in the raw data, hypothesizing the effect depends on the size of the station's associated town. Basically, I divided stations into population size groups, using 1.25-fold increments. That is, the first group consists of towns whose population is between 10,000 and 12,500. The second group has between 12,500 and 15,625 people, and so on. The last group consists of towns with populations between 15.8 million and 19.7 million. For each group, I got a global temperature series, in a way equivalent to how GHCN Processor would produce it. This is what I came up with:
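The grouping described above can be sketched in a few lines of Python. This is not the code used for the analysis. The names are hypothetical, and two details are assumptions: that each group includes its lower bound and excludes its upper bound, and that there are 34 groups, which is the number needed for the last one to span roughly 15.8 to 19.7 million.

```python
from bisect import bisect_right

BASE_POPULATION = 10_000   # lower bound of the first group
GROWTH_FACTOR = 1.25       # each upper bound is 1.25 times the lower bound
NUM_GROUPS = 34            # assumption: enough groups to reach ~19.7 million

# edges[k] is the lower bound of group k; edges[k + 1] is its upper bound.
edges = [BASE_POPULATION * GROWTH_FACTOR ** k for k in range(NUM_GROUPS + 1)]


def group_index(population):
    """Return the 0-based group of a town population, or None if out of range."""
    if population < edges[0] or population >= edges[-1]:
        return None
    return bisect_right(edges, population) - 1


print(edges[:3])                   # bounds of the first two groups
print(group_index(13_000))         # falls in the second group (index 1)
print(edges[-2] / 1e6, edges[-1] / 1e6)  # bounds of the last group, in millions
```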

This is a highly significant effect. It doesn't even make sense to post a confidence level, because it's exceedingly close to 100%.

It is obvious from the graph, nonetheless, that the number of cities declines rapidly with population size. It's a good idea in these cases to look at a logarithmic scale of the X axis.

This logarithmic relationship is clearly a good candidate for segmented regression. When the population is less than about 1.04 million, there is no discernible effect. A linear regression of the left-hand "segment" has a slight downward slope, which is not statistically significant. The average temperature slope between 1880 and 2009 is 0.0056°C/year (which is what the red line represents).
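For readers unfamiliar with segmented regression, here is a simplified sketch in Python. It is not the fitting procedure used for the graph. It assumes the breakpoint is already known and that the left-hand segment is flat, so only a level and a right-hand slope are estimated. The function name and the population values are hypothetical, and the data is synthetic, generated from a known hinge so the fit can be checked.

```python
import math


def fit_hinge(xs, ys, knot):
    """Least-squares fit of y = a + b * max(0, x - knot).

    The left-hand segment is forced to be flat, and the knot (breakpoint)
    is supplied by the caller rather than estimated from the data.
    """
    hinge = [max(0.0, x - knot) for x in xs]
    n = len(xs)
    mean_h = sum(hinge) / n
    mean_y = sum(ys) / n
    sxx = sum((h - mean_h) ** 2 for h in hinge)
    if sxx == 0:
        raise ValueError("the hinge term has no variation; cannot fit a slope")
    sxy = sum((h - mean_h) * (y - mean_y) for h, y in zip(hinge, ys))
    b = sxy / sxx
    a = mean_y - b * mean_h
    return a, b


# Synthetic example: populations in thousands (made up), with slopes
# generated from a known hinge so the recovered values can be compared.
knot = math.log(1042)
populations = [50, 200, 800, 2000, 5000, 12000]
xs = [math.log(p + 1) for p in populations]
ys = [0.0056 + 0.0039 * max(0.0, x - knot) for x in xs]

level, right_slope = fit_hinge(xs, ys, knot)
print(level, right_slope)
```

A full segmented regression would also estimate the breakpoint, for example by trying candidate knots and keeping the one with the smallest residual error.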

We can thus derive a straightforward model for UHI, applicable to the GHCN v2 raw data file, which follows.

C = -0.0039·[ln(P + 1) - ln(1042)]

  • C is a correction (in °C/year) that should be added to the temperature slope of a station only if the population of its associated town is greater than 1.04 million.
  • P is the population of the town associated with the station, in thousands.
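The model above translates directly into a small function. This is a sketch; the function and constant names are my own, while the coefficient and threshold come from the formula.

```python
import math

THRESHOLD_THOUSANDS = 1042   # about 1.04 million people
COEFFICIENT = -0.0039        # °C/year per unit of ln(population)


def uhi_correction(population_thousands):
    """Correction C (°C/year) to add to a station's temperature slope.

    Returns 0.0 for towns at or below the threshold, where the model
    says no correction applies.
    """
    if population_thousands <= THRESHOLD_THOUSANDS:
        return 0.0
    return COEFFICIENT * (
        math.log(population_thousands + 1) - math.log(THRESHOLD_THOUSANDS)
    )


# A town of 10 million people (P = 10,000) gets a correction of roughly
# -0.0088 °C/year; a town of 500,000 gets none.
print(uhi_correction(10_000))
print(uhi_correction(500))
```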

Monday, April 5, 2010

Urban Heat Island Effect - Probably Negligible

Previously I had discussed the difference between rural and urban temperature stations in the U.S. Commenter steven argued that population assessments (R, S and U) in GHCN v2 might be outdated and – in general – not very good proxies of what we really want to measure.

I then compared rural stations in the Mid-West (a low-population-density region of the U.S.) with all rural stations. There wasn't a major difference between these two sets of stations either. Commenter steven was not convinced, however. He posted some satellite pictures of rural stations that are located in what appear to be sub-urban areas.

How could we measure the impact of human populations on station temperature with the data available to us? It's clearly not enough to express doubt and speculate about what might be going on.

Here's what I came up with. There's a vegetation property in the station metadata. If you look at stations in regions that are forested (FO), marshes (MA) or deserts (DE), they appear to be actually rural. I looked at a subset of such stations in Google Maps, and they are not close to human settlements, with few exceptions. The GHCN Processor command I used to obtain a temperature series is the following.

ghcnp -dt mean -include "population_type=='R' && (vegetation=='FO' || vegetation=='MA' || vegetation=='DE')" -o /tmp/global-rural-plus.csv

575 stations fit these characteristics. For comparison, I got temperature series for big cities (population > 0.5 million), and small towns and cities (population <= 0.5 million). I calculated 12-year moving averages in each case, which is what you see in the figure below.

There might be some differences, but they are always small, and we've compared several different station sets now, globally and at the U.S. level.
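For reference, the 12-year moving averages used in these comparisons can be computed with something like the following Python sketch. It assumes a trailing window (each value averages the current year and the 11 before it) and a series with no missing years. Whether the graphs use a trailing or centered window is not stated, so treat that choice as an assumption.

```python
def moving_average(values, window=12):
    """Trailing moving average; positions without a full window yield None."""
    result = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        result.append(running_sum / window if i >= window - 1 else None)
    return result


# Small example with a 2-year window so the arithmetic is easy to follow.
print(moving_average([1, 2, 3, 4], window=2))  # [None, 1.5, 2.5, 3.5]
```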

An argument could also be made that small human settlements increase the albedo of an area, so they might have a cooling effect.

Addendum (4/5/2010)

Here's an actual UHI finding of interest. I compared cities of population over 2 million with towns whose population is between 10,000 and 15,000. The difference is more pronounced in this case.

The overall effect is still negligible, nevertheless. The number of cities decreases exponentially with population size.

Tuesday, March 30, 2010

US Rural vs. Urban Temperature Stations

There's an article by Edward R. Long titled "Contiguous U.S. Temperature Trends Using NCDC Raw and Adjusted Data For One-Per-State Rural and Urban Station Sets." It claims to show that in the raw/unadjusted NCDC data, urban U.S. stations have a warming trend that diverges from that of rural stations, whereas in the adjusted data, the rural trend has been adjusted to "take on the time-line characteristics of urban data." In not so many words, it claims that NCDC data has been fudged. Not surprisingly, Long's article appears to be quite popular in "sceptic" circles.

The methodology of the article is peculiar. First, why only analyze the U.S.? Even though the U.S. has more stations than any other single country, its surface area is only about 2% of Earth's.

More importantly, why pick only one rural and one urban station from each state? What were the criteria used to pick each state's stations? Was it random? The article does not clarify, so it lends itself to accusations of cherry-picking.

It's fairly easy to verify Long's claims with GHCN Processor. A quick verification takes perhaps 10 minutes if you're familiar with the tool's options.

First, we can get rural and urban temperature anomaly series for the U.S. from the adjusted data, with the following commands, respectively:

ghcnp -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R'" -reg -o /tmp/us-rural.csv

ghcnp -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'U'" -reg -o /tmp/us-urban.csv

GHCN Processor informs us that it processed 867 rural stations, and 117 urban stations. That's plenty more than the article analyzes. Let's take a look at a graph of both series.

This is still consistent with Long's claims. What we want to confirm is whether the rural trend is significantly less steep in the raw unadjusted data. To get rural and urban series from the unadjusted data file, I used the following commands, respectively:

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R'" -reg -o /tmp/us-rural-raw.csv

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'U'" -reg -o /tmp/us-urban-raw.csv

In this case, the tool informs us that 1046 rural stations have been analyzed, compared to 392 urban stations. A graph of the series follows.

To be thorough, let's also get 12-year running averages of the unadjusted series. That's what the article does.

This graph is very different to Figure 6 from Long's article, and it doesn't support Long's conclusions by any stretch of the imagination. It's also clear that while Long's urban trend is roughly correct, the 48 rural stations he picked are not representative of the 1046 stations GHCN Processor retrieves out of the raw data file. Why they are not representative can only be speculated upon, but I have some ideas.

The divergence between urban and rural stations that exists prior to the reference period (1950 to 1981) might be something that needs to be looked into further, but it's not too surprising. The farther back you go, the fewer stations report. There's simply more error in older data.

Addendum (3/31/2010)

In comments, steven suggests that GHCN v2 population assessments are old and may no longer be applicable. For example, a station might be near a town that used to have less than 10,000 people, and classified as 'R', but then the town grew.

Intuitively, it doesn't seem likely that this would be sufficient to explain away the findings, and it certainly doesn't address Dr. Long's choice of only 48 rural stations in the U.S. But I try to keep an open mind about these types of arguments, within reason.

Fully addressing steven's objection would take substantial work, but we can do the next best thing. Steven indicates that population density is what really matters. Let's take a look at a population density map of the United States (from Wikimedia):

Here's another such map. A portion of the U.S. (basically, the mid-west) has considerably lower population density than the rest of the country. Let's define this low-density region as that bounded by longitudes -95° and -115°, which excludes California. A longitude map of the US can be found here.

A typical rural station in the low-density region should not be as likely to be near an urban area as a rural station in high-density areas of the U.S. Additionally, the population density around the station should be lower for rural stations in the low-density region, on average. Does that make sense?

With GHCN Processor we can easily obtain an unadjusted temperature anomaly series only for rural stations in the low-density region, as follows.

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R' && longitude < -95 && longitude > -115" -reg -o /tmp/us-really-rural-raw.csv

I've calculated 12-year running averages of the rural low-density series, and plotted it along with the full-rural series.

Things haven't really changed, have they?

There seems to be somewhat of an offset between the two station sets, which is interesting to some extent. Apparently, "really rural" stations were warm relative to all rural stations during the baseline period (1950 to 1981).

BTW, there are 432 "really rural" stations in the unadjusted data file.

Sunday, March 28, 2010

The Average Temperature of Earth

If you Google the average temperature of Earth, you'll find a couple of frequent estimates: 13°C and 15°C. GISTemp data files carry a note that says:
Best estimate for absolute global mean for 1951-1980 is 14.0 deg-C or 57.2 deg-F...

GHCN Processor 1.1 has a -abs option that causes the tool to write out "absolute measurement" averages, as opposed to temperature anomalies. Additionally, simulations I've run indicate that the tool's default cell and station combination method (the linear-equations-based method) is adequate for this sort of application.

You can get a global absolute measurement series with the following command.

ghcnp -gt seg -abs -o /tmp/ghcn-global-abs.csv

In this case the tool says 4771 stations have been analyzed. GHCN v2 has considerably more stations, however. A lot of them are dropped in the adjusted data set because they don't have enough data points. To get a series based on the raw data, you can run:

ghcnp -dt mean -gt seg -abs -o /tmp/ghcn-global-abs-noadj.csv

The tool now analyzes 7067 out of 7280 stations. (GHCN Processor 1.1 has an implicit filter that drops any station without at least 10 data points in any given month.)

It probably shouldn't come as a surprise that there are differences in the results we obtain with each data set. With the adjusted data, the average for the baseline period 1950 to 1981 is 14.7°C. With the raw data, the average is 14.1°C. The reason for the discrepancy probably has to do with the sorts of stations that don't make it into the adjusted data set, typically because they haven't reported long enough. They might be cold stations, like stations near Antarctica, a region with a substantial scarcity of stations in the adjusted data set.

Let me post a graph of both series, just so there's no confusion.

So it's basically an offset difference between the two. If I'm correct about it being due to stations near Antarctica, one caveat would be elevation. It appears that stations in Antarctica are high up, and this is a problem. We could filter stations by elevation, but then we basically drop Antarctica, like the adjusted data set does.

I believe 14.1°C is probably a low estimate. It's also pretty clear that GHCN v2 is land-biased.

There does seem to be a slight slope difference (0.15°C/century) between the adjusted and raw series. This can't be anywhere near significance, and again, it probably has to do with the sorts of stations that don't make it into the adjusted data set. I wouldn't be surprised if "sceptics" manage to make a big deal of it, though.
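A slope difference like this one can be estimated by fitting an ordinary least-squares line to the year-by-year difference between the two series. The Python below is a sketch of that idea, not the calculation behind the figure quoted above. The function names are hypothetical, and the two series are synthetic, built with trends that differ by exactly 0.15°C/century.

```python
def trend_per_century(years, values):
    """Ordinary least-squares slope of values against years, per century."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return 100.0 * sxy / sxx


def slope_difference(years, series_a, series_b):
    """Trend (per century) of the difference series_a - series_b."""
    differences = [a - b for a, b in zip(series_a, series_b)]
    return trend_per_century(years, differences)


# Synthetic series: constant offset plus trends that differ by 0.0015 °C/year.
years = list(range(1900, 2000))
adjusted = [14.7 + 0.0065 * (y - 1900) for y in years]
raw = [14.1 + 0.0050 * (y - 1900) for y in years]
print(slope_difference(years, adjusted, raw))  # approximately 0.15
```

Note that the constant offset between the series has no effect on the result; only the difference in trends does.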

GHCN Processor 1.1

Version 1.1 of GHCN Processor is now available for download. Relative to Version 1.0, the highlights are:

  • Implemented the first difference station combination method (Peterson et al. 1998).
  • Added -abs option, which causes the tool to produce an "absolute measurement" series, as opposed to an "anomaly" series.
  • Added a -ccm option (similar to -scm) that sets the grid cell/box combination method.
  • The default station and grid cell combination method is the linear-equation-based method (olem). This decision was based on simulations I ran comparing all 3 supported methods, involving series with equal and different underlying slopes.
  • Added -og option, which lets you write grid partitioning information to a CSV-format file. This feature is informative if you use the -gt seg option. (Zeke suggested producing a map. This is the next best thing.)

See the release-notes.txt and readme.txt files for additional details.
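For readers unfamiliar with the first difference method listed in the highlights, here is a rough Python sketch of the general idea: take year-to-year differences within each station, average those differences across stations, and accumulate the averages. This is not GHCN Processor's implementation. The function name and data layout are assumptions, and the sketch simply stops at the first year with no differences.

```python
def first_difference_series(stations):
    """Combine station series using first differences.

    `stations` is a list of dicts mapping year -> temperature.
    Returns a dict mapping year -> combined value, relative to the first
    year of the combined series (which is set to 0.0).
    """
    diffs_by_year = {}
    for series in stations:
        for year, value in series.items():
            # A difference exists only when the previous year was reported.
            if year - 1 in series:
                diffs_by_year.setdefault(year, []).append(value - series[year - 1])

    if not diffs_by_year:
        return {}

    first_year = min(diffs_by_year)
    combined = {first_year - 1: 0.0}
    total = 0.0
    for year in range(first_year, max(diffs_by_year) + 1):
        if year not in diffs_by_year:
            break  # this simple sketch stops at the first gap
        year_diffs = diffs_by_year[year]
        total += sum(year_diffs) / len(year_diffs)
        combined[year] = total
    return combined


# Two made-up stations with very different absolute temperatures.
station_a = {2000: 10.0, 2001: 11.0, 2002: 12.0}
station_b = {2001: 20.0, 2002: 22.0}
print(first_difference_series([station_a, station_b]))
# {2000: 0.0, 2001: 1.0, 2002: 2.5}
```

Because only differences are averaged, the 10-degree gap between the two stations never enters the combined series.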