Tuesday, March 30, 2010

US Rural vs. Urban Temperature Stations

There's an article by Edward R. Long titled "Contiguous U.S. Temperature Trends Using NCDC Raw and Adjusted Data For One-Per-State Rural and Urban Station Sets." It claims to show that in the raw/unadjusted NCDC data, urban U.S. stations have a warming trend that diverges from that of rural stations, whereas in the adjusted data, the rural trend has been adjusted to "take on the time-line characteristics of urban data." In not so many words, it claims that NCDC data has been fudged. Not surprisingly, Long's article appears to be quite popular in "sceptic" circles.

The methodology of the article is peculiar. First, why only analyze the U.S.? Even though the U.S. has more stations than any other single country, its surface area is only 2% that of Earth.

More importantly, why pick only one rural and one urban station from each state? What were the criteria used to pick each state's stations? Was the selection random? The article does not say, which leaves it open to accusations of cherry-picking.

It's fairly easy to verify Long's claims with GHCN Processor. A quick verification takes perhaps 10 minutes if you're familiar with the tool's options.

First, we can get rural and urban temperature anomaly series for the U.S. from the adjusted data, with the following commands, respectively:

ghcnp -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R'" -reg -o /tmp/us-rural.csv

ghcnp -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'U'" -reg -o /tmp/us-urban.csv

GHCN Processor informs us that it processed 867 rural stations, and 117 urban stations. That's plenty more than the article analyzes. Let's take a look at a graph of both series.

This is still consistent with Long's claims. What we want to confirm is whether the rural trend is significantly less steep in the raw unadjusted data. To get rural and urban series from the unadjusted data file, I used the following commands, respectively:

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R'" -reg -o /tmp/us-rural-raw.csv

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'U'" -reg -o /tmp/us-urban-raw.csv

In this case, the tool informs us that 1046 rural stations have been analyzed, compared to 392 urban stations. A graph of the series follows.

To be thorough, let's also get 12-year running averages of the unadjusted series. That's what the article does.
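For readers who want to reproduce the smoothing step, here is a minimal sketch of a 12-year running average over a yearly anomaly series. It assumes a trailing window (each value averages the current year and the 11 before it); neither this post nor the article's description here says whether the window is trailing or centered, and the function name `running_mean` is made up for illustration.

```python
def running_mean(values, window=12):
    """Trailing running mean.

    Output entry k is the average of values[k : k + window], so the result
    has len(values) - window + 1 entries (none if the series is too short).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            # Drop the value that just slid out of the window.
            total -= values[i - window]
        if i >= window - 1:
            out.append(total / window)
    return out
```

With yearly anomalies in a list, `running_mean(series)` gives the smoothed series, starting at the 12th year of data.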

This graph is very different to Figure 6 from Long's article, and it doesn't support Long's conclusions by any stretch of the imagination. It's also clear that while Long's urban trend is roughly correct, the 48 rural stations he picked are not representative of the 1046 stations GHCN Processor retrieves out of the raw data file. Why they are not representative can only be speculated upon, but I have some ideas.
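One way to put a number on "steepness" when comparing two series is an ordinary least-squares slope. The sketch below is my own illustration, not something GHCN Processor or Long's article is known to do; `trend_per_decade` is a hypothetical helper that takes parallel lists of years and anomalies.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, in units per decade."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two or more (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    if sxx == 0:
        raise ValueError("all years are identical")
    return 10.0 * sxy / sxx
```

Running it on a 48-station subset and on the full rural set would show directly how far apart the two trends are.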

The divergence between urban and rural stations that exists prior to the reference period (1950 to 1981) might be something that needs to be looked into further, but it's not too surprising. The farther back you go, the fewer stations report, so there's simply more error in older data.
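It helps to remember what a reference period does. Anomalies are temperatures minus the mean over the reference years, so any two series are forced to agree on average inside that period, and differences show up outside it. Here is a simplified sketch of that idea, assuming one value per year and an inclusive 1950 to 1981 window; the tool's actual procedure is not described in this post, and the function name is hypothetical.

```python
def anomalies(temps_by_year, base_start=1950, base_end=1981):
    """Subtract the reference-period mean from every yearly value.

    temps_by_year maps year -> temperature. The window is treated as
    inclusive on both ends, which is an assumption.
    """
    base = [t for y, t in temps_by_year.items() if base_start <= y <= base_end]
    if not base:
        raise ValueError("no data inside the reference period")
    base_mean = sum(base) / len(base)
    return {y: t - base_mean for y, t in temps_by_year.items()}
```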

Addendum (3/31/2010)

In comments, steven suggests that GHCN v2 population assessments are old and may no longer be applicable. For example, a station might be near a town that used to have fewer than 10,000 people, and so was classified as 'R', but then the town grew.

Intuitively, it doesn't seem likely that this would be sufficient to explain away the findings, and it certainly doesn't address Dr. Long's choice of only 48 rural stations in the U.S. But I try to keep an open mind about these types of arguments, within reason.

Fully addressing steven's objection would take substantial work, but we can do the next best thing. Steven indicates that population density is what really matters. Let's take a look at a population density map of the United States (from Wikimedia):

Here's another such map. A portion of the U.S. (basically, the mid-west) has considerably lower population density than the rest of the country. Let's define this low-density region as that bounded by longitudes -95° and -115°, which excludes California. A longitude map of the US can be found here.

A typical rural station in the low-density region should not be as likely to be near an urban area as a rural station in high-density areas of the U.S. Additionally, the population density around the station should be lower, on average, for rural stations in the low-density region. Does that make sense?

With GHCN Processor we can easily obtain an unadjusted temperature anomaly series only for rural stations in the low-density region, as follows.

ghcnp -dt mean -include "country eq 'UNITED STATES OF AMERICA' && population_type eq 'R' && longitude < -95 && longitude > -115" -reg -o /tmp/us-really-rural-raw.csv
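If you'd rather apply the same selection to your own copy of the station metadata, the filter expression above translates to a simple predicate. This is only a sketch: the dictionary keys mirror the field names used in the `-include` expression, but the record layout itself is hypothetical.

```python
def is_really_rural(station):
    """Same test as the -include expression: U.S., rural, longitude in (-115, -95)."""
    return (
        station["country"] == "UNITED STATES OF AMERICA"
        and station["population_type"] == "R"
        and -115 < station["longitude"] < -95
    )


# Made-up records, purely for illustration.
stations = [
    {"country": "UNITED STATES OF AMERICA", "population_type": "R", "longitude": -104.2},
    {"country": "UNITED STATES OF AMERICA", "population_type": "R", "longitude": -81.0},
    {"country": "UNITED STATES OF AMERICA", "population_type": "U", "longitude": -105.0},
]
selected = [s for s in stations if is_really_rural(s)]
```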

I've calculated 12-year running averages of the rural low-density series, and plotted them along with the full-rural series.

Things haven't really changed, have they?

There seems to be somewhat of an offset between the two station sets, which is interesting to some extent. Apparently, "really rural" stations were warm relative to all rural stations during the baseline period (1950 to 1981).

BTW, there are 432 "really rural" stations in the unadjusted data file.


steven said...

Ah well the problem is that you are trusting the GHCN metadata.

The U and R codes in GHCN metadata have various problems. Problems so bad that they really are not used anymore by anyone seriously looking at the problem.

The GHCN population data (see the ipop metadata) is quite old, and the U/S/R code derives from it.

R is meant to pick out towns that have less than 10K people and U is meant to pick out places that have more than 50K people.
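As a rough sketch, the thresholds just described amount to the rule below. Two things are assumptions here: that 'S' covers everything between the two cut-offs, and how the exact boundary values (10,000 and 50,000) are treated.

```python
def population_class(population):
    """Map a town population to a U/S/R-style code using the thresholds above."""
    if population < 10_000:
        return "R"  # rural: fewer than 10K people
    if population > 50_000:
        return "U"  # urban: more than 50K people
    return "S"      # assumed: everything in between


# A town that grew from 8,000 to 12,000 people would no longer be 'R',
# but old population data would still label it that way.
```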

1. The population data is old.
2. It's population density that probably matters more.
3. It's the 3D profile of the landscape that also matters.
4. The effect of UHI going from zero population density to 10K people in a town can be larger than going from 10K to 50K. UHI is a THRESHOLDED phenomenon. For example, going from 1 million people in a town to 5 million won't drive UHI as much as going from zero people to 10K (it's a logistic function).
5. Land use in the rural areas (particularly the building of dams and irrigation) is also a contaminating feature. (Tour the sites in Google Maps. I have KMZ files; it's a blast.)
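To make point 4 concrete, here is a toy logistic curve. Every number in it (maximum bias, midpoint, steepness) is invented purely for illustration and is not fitted to any data; it only shows the shape of the argument, where most of the effect arrives early and then saturates.

```python
import math

# All three parameters are hypothetical, chosen only to show the shape.
MAX_BIAS = 1.0      # arbitrary units
MIDPOINT = 5_000    # population at which half the maximum is reached
STEEPNESS = 1 / 2_000


def uhi_bias(population):
    """Toy logistic response of warming bias to town population."""
    return MAX_BIAS / (1.0 + math.exp(-STEEPNESS * (population - MIDPOINT)))


def bias_increase(pop_from, pop_to):
    """Change in the toy bias when a town grows from pop_from to pop_to."""
    return uhi_bias(pop_to) - uhi_bias(pop_from)
```

With these made-up parameters, `bias_increase(0, 10_000)` is much larger than `bias_increase(10_000, 50_000)`, and growing from 1 million to 5 million adds essentially nothing.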

Anyways, I think your tool is great. The problem is of course that the underlying data isn't in that great of shape, but it will get better.

Zeke said...

I got similar results a while back: http://rankexploits.com/musings/wp-content/uploads/2010/03/Picture-1101.png

However, as Mosh mentioned, the metadata isn't always perfect. I took a more detailed look at various urbanity proxies (nightlights, pop density, and GRUMP urban boundaries): http://rankexploits.com/musings/2010/uhi-in-the-u-s-a/

Joseph said...

@steven: I've written an addendum to address your point.

BTW, it should be possible to fix these data issues in software rather than the data itself. If I'm bored, that's one more thing I could do :)

steven said...

Not sure if you got it as zeke points out.

Let's see if I can explain from a physics perspective what UHI is all about.

UHI is a localized heating bias. The size of the area affected and the size of the bias are variable.

1. The bias is thresholded. That means, for example, that it doesn't grow linearly with population: the bias in growing from 0 people to 1 million people is greater than the bias in moving from 3 million to 4 million.

2. The extent of the bias is variable as well, having to do with the exact conditions on the ground.

With that said, let's look at the basic PHYSICAL causes of UHI.

A. Radiative Canyons
B. Surface material heat storage
C. Albedo changes
D. Land use changes
E. Waste heat from human activity
F. Boundary layer disruptions which impede turbulent mixing (convection of the heat away from the surface).
G. Shading.

Now, to some extent each of these is correlated with population and population density. But let's take a high-density place with low buildings versus a high-density place with tall buildings. What's the difference to UHI? It can be substantial.

Let's take a low-density place. What difference does land use make? Well, if you look at some low-density places, the low-density places are... airports and dams. So if you do a tour (Google Earth is your friend) of the stations in a "rural" set, you will find them at airports (low population, low building height, but lots of surface property differences). You will also find them at dams. Dams built early in the century. Dams built to support what? Agriculture.

So when we talk about UHI we are not just talking about population. Population and population density are only a proxy for UHI. More people tends to mean more tall buildings, more waste heat, etc., but fewer people can also mean more land use changes.

Anyways, Ron (the whiteboard) has some more metadata and I've put it all together in a common place under Creative Commons. Drop me a note and I'll hook you up.

steven said...


I don't regard Long's work as a credible analysis.

steven said...

OK, here is what you really looked at in the 434 stations.

I pulled up the metadata on them.

Of the 434 "rural stations" in your zone, you actually have this:

1. Of the 434, 100 are at airports. In the study performed by NOAA to calibrate the best sites (CRN, Climate Reference Network) against a site that they thought was good (an airport nearby), they found a .25C bias. Airports do not equal RURAL and they are certainly not UHI free, although the lack of large structures does help them.

2. 84 of the stations are rated as URBAN by the GRUMP database. That is, a combined metric of nightlights and population density.

From the GPW population density studies, your areas were as follows: the mean population density was 11 people per sq km, but the max was 232 and the SD was 26.

Now, according to the measure of urbanity that NASA GISS uses (in their 2010 paper in draft), new nightlights numbers (1995 values, I believe):

A nightlights value of 10 is considered rural (might be 14, I have to check). Anyways, your mean was 9.49, your max was 58 and your SD was 10.

Only 222 of the 434 had a nightlights value less than 10, and 296 had a nightlights value less than 14.
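Expressed as shares of the 434 stations, the counts quoted in this comment work out as follows. The snippet only redoes the arithmetic on the numbers given above; the labels are mine.

```python
TOTAL = 434

# Counts exactly as quoted in the comment.
counts = {
    "at airports": 100,
    "GRUMP urban": 84,
    "nightlights < 10": 222,
    "nightlights < 14": 296,
}


def share(count, total=TOTAL):
    """Percentage of the station set that a count represents."""
    return 100.0 * count / total


for label, count in counts.items():
    print(f"{label}: {count}/{TOTAL} = {share(count):.1f}%")
```

So roughly half of the stations fall under the stricter nightlights cut-off, and about two thirds under the looser one.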

steven said...


So I cut the data a bit differently to show you what you have:

1. GHCN Rural = TRUE
2. GRUMP urban extent = TRUE
3. Nightlights (v2) is less than 11.
4. Population density (1sqkm) less than 10.
5. No airports.
6. No coasts or lake stations.
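For anyone who wants to repeat this cut on their own metadata table, the six criteria can be written as a single predicate. This sketch mirrors the list literally, exactly as written; all field names are hypothetical, since the layout of the metadata file isn't shown here.

```python
def matches_cut(station):
    """True if a station record satisfies all six criteria as listed above."""
    return bool(
        station["ghcn_rural"]                 # 1. GHCN Rural = TRUE
        and station["grump_urban"]            # 2. GRUMP urban extent = TRUE (as listed)
        and station["nightlights"] < 11       # 3. Nightlights (v2) less than 11
        and station["pop_density_km2"] < 10   # 4. Population density (1 sq km) less than 10
        and not station["airport"]            # 5. No airports
        and not station["coastal_or_lake"]    # 6. No coast or lake stations
    )


# Made-up record, for illustration only.
example = {
    "ghcn_rural": True,
    "grump_urban": True,
    "nightlights": 4,
    "pop_density_km2": 3.5,
    "airport": False,
    "coastal_or_lake": False,
}
```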

The best station by these criteria


What's it look like close up ( micro site)




Now the "worst" by those criteria.



If you want this list just ask and I'll post it. There are 239 such sites.

Joseph said...

@steven: I'm sure there's some warming bias due to UHI effect at some stations. The relevant points are:

- Is the warming bias significant at a global scale? Note that the US is a small fraction of the Earth's surface, which is mostly ocean.

- Even at a US-only level, is it significant?

- What is the magnitude of the bias? Let's see a methodology to quantify it. It's clearly not enough to look at pictures, or list the theoretical causes of UHI.