I wrote some code to analyze more thoroughly a possibly systematic effect in the raw data, hypothesizing that the effect depends on the size of the town associated with each station. Basically, I divided stations into population size groups, using 1.25-fold increments. That is, the first group consists of towns whose population is between 10,000 and 12,500. The second group has between 12,500 and 15,625 people, and so on. The last group consists of towns with populations between 15.8 million and 19.7 million. For each group, I computed a global temperature series, in a way equivalent to how GHCN Processor would produce it. This is what I came up with:
This is a highly significant effect. It doesn't even make sense to post a confidence level, because it's exceedingly close to 100%.
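For readers who want to reproduce the grouping, the 1.25-fold scheme described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the names `population_group` and `group_bounds` are hypothetical, and I am assuming each group includes its lower bound and excludes its upper bound.

```python
BASE_POPULATION = 10_000  # lower bound of the first group (from the text)
GROWTH_FACTOR = 1.25      # each group's upper bound is 1.25x its lower bound


def population_group(population):
    """Return the zero-based group index of a town, or None below 10,000.

    Assumption: groups are half-open, [lower, upper).
    """
    if population < BASE_POPULATION:
        return None
    index = 0
    upper = BASE_POPULATION * GROWTH_FACTOR
    # Walk up the group boundaries until the population fits below one.
    while population >= upper:
        index += 1
        upper *= GROWTH_FACTOR
    return index


def group_bounds(index):
    """Return (lower, upper) population bounds of the group with this index."""
    lower = BASE_POPULATION * GROWTH_FACTOR ** index
    return lower, lower * GROWTH_FACTOR


if __name__ == "__main__":
    for town in (10_000, 14_000, 1_042_000, 16_000_000):
        idx = population_group(town)
        print(town, idx, group_bounds(idx))
```

Under this scheme the last group mentioned above (15.8 to 19.7 million) has index 33, so there are 34 groups in total.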
It is obvious from the graph, nonetheless, that the number of cities declines rapidly with population size. It's a good idea in these cases to look at a logarithmic scale of the X axis.
This logarithmic relationship is clearly a good candidate for segmented regression. When the population is less than about 1.04 million, there is no discernible effect. A linear regression of the left-hand "segment" has a slight downward slope, which is not statistically significant. The average temperature slope between 1880 and 2009 is 0.0056°C/year (which is what the red line represents).
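To make the idea of segmented regression concrete, here is a minimal sketch in plain Python. It is not the procedure I actually ran. It assumes a simplified "hinge" model in which the temperature slope is flat to the left of a breakpoint and linear in ln(population) to the right of it, and it picks the breakpoint by trying a list of candidates. The function name `fit_hinge` and its parameters are hypothetical.

```python
def fit_hinge(log_pop, trend, candidate_breaks):
    """Fit trend = level + slope * max(0, log_pop - x0) by least squares.

    log_pop: natural log of each group's population
    trend: temperature slope of each group (deg C / year)
    candidate_breaks: breakpoints x0 to try (same units as log_pop)
    Returns (x0, level, slope) for the candidate with the smallest
    sum of squared residuals.
    """
    n = len(log_pop)
    mean_y = sum(trend) / n
    best = None
    for x0 in candidate_breaks:
        # Hinge regressor: zero left of the break, linear right of it.
        hinge = [max(0.0, x - x0) for x in log_pop]
        mean_h = sum(hinge) / n
        var_h = sum((h - mean_h) ** 2 for h in hinge)
        if var_h == 0.0:
            continue  # no points right of the break; slope not identifiable
        cov = sum((h - mean_h) * (y - mean_y) for h, y in zip(hinge, trend))
        slope = cov / var_h
        level = mean_y - slope * mean_h
        sse = sum((level + slope * h - y) ** 2 for h, y in zip(hinge, trend))
        if best is None or sse < best[0]:
            best = (sse, x0, level, slope)
    if best is None:
        raise ValueError("no candidate break has data on its right-hand side")
    return best[1], best[2], best[3]
```

In this simplified model, `level` plays the role of the flat left-hand slope and `slope` is the change in temperature trend per unit of ln(population) beyond the break.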
We can thus derive a straightforward model for UHI, applicable to the GHCN v2 raw data file, which follows.
C = -0.0039·[ln(P + 1) - ln(1042)]
- C is a correction (in °C/year) that should be added to the temperature slope of a station only if the population of its associated town is greater than 1.04 million.
- P is the population of the town associated with the station, in thousands.
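As a sketch of how the model above could be applied, the following Python function evaluates C for a given population. The name `uhi_correction` is hypothetical. I am also assuming that the correction is simply zero wherever the bracketed term is not positive, which is my reading of "only if the population is greater than 1.04 million".

```python
import math

COEFFICIENT = -0.0039  # deg C/year per unit of ln(population), from the model
BREAK_THOUSANDS = 1042  # the ln(1042) term in the model, population in thousands


def uhi_correction(population_thousands):
    """Return C, in deg C/year, to add to a station's temperature slope.

    population_thousands: population of the associated town, in thousands (P).
    Assumption: the correction is zero at or below the break.
    """
    excess = math.log(population_thousands + 1) - math.log(BREAK_THOUSANDS)
    if excess <= 0.0:
        return 0.0
    return COEFFICIENT * excess


if __name__ == "__main__":
    # A town of 500,000 gets no correction; a city of 10 million does.
    for p in (500, 10_000):
        print(p, uhi_correction(p))
```

The corrected slope of a station is then its raw slope plus `uhi_correction(P)`. For a city of 10 million (P = 10,000) the correction works out to roughly -0.0088°C/year.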