Weather Normalization of US Electricity Consumption Using Population-Weighted Degree Days

Together with a natural resources investment firm we investigated the link between US electricity consumption and GDP. The hypothesis was that increases or decreases in GDP were accompanied by corresponding increases or decreases in electricity consumption, and that it might be possible to assess the current state of the US economy by looking at recent US electricity-consumption data. US electricity-consumption data is published weekly, so it could potentially provide a valuable early indication of US GDP, which is published only quarterly.

Naturally US electricity consumption is also influenced by the weather: cold weather means more heating and hot weather means more cooling, and both of these influence total consumption. Using multiple regression analysis with heating and cooling degree days we normalized the electricity-consumption data to remove (or at least significantly reduce) the influence of the weather, so we could better assess the link with the economy.

Our investigation led to two articles:

the article you are reading now which explains how we weather normalized US electricity-consumption data using population-weighted degree days and regression; and
a separate article that discussed the link between weather-normalized electricity consumption and GDP.

Unfortunately the second article (about the link with GDP) is no longer online, but we have kept this article (the one you are reading now) because the methodology it describes is much more broadly applicable and could be useful for anyone analyzing the energy consumption of any country or large region.

Calculating US population-weighted degree days

US aggregate electricity-consumption data comes as weekly figures, with the weeks starting at midnight on Sundays. So each figure covers consumption for Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday. The date that is listed with each published figure is that of the Saturday that is the last day of the weekly period. (Note that this is different to our system which labels weekly degree-day figures with the first day of the 7-day period that they cover.)
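To line up the two labeling conventions, each Saturday end date needs converting to the Sunday that starts the same week. The following is a minimal sketch of that conversion in Python. The function name week_start_from_end is our own invention for illustration, and it assumes every published date really is a Saturday.

```python
from datetime import date, timedelta


def week_start_from_end(saturday: date) -> date:
    """Return the Sunday that starts the 7-day week ending on `saturday`.

    Assumption: the electricity figure is labeled with the Saturday that
    ends its week, and the degree-day figure with the first day (Sunday).
    """
    # date.weekday() counts Monday as 0, so Saturday is 5.
    if saturday.weekday() != 5:
        raise ValueError(f"{saturday} is not a Saturday")
    return saturday - timedelta(days=6)


if __name__ == "__main__":
    # Saturday 4 January 2014 ends the week that began on Sunday 29 December 2013.
    print(week_start_from_end(date(2014, 1, 4)))
```

Raising an error for non-Saturday dates is a design choice: it catches misaligned data early rather than silently pairing the wrong weeks.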

To weather-normalize this US aggregate electricity-consumption data we wanted US population-weighted degree days. The assumption is that electricity consumption is distributed around the country according to population, so the weather in more populated areas has a stronger influence (a higher-weighted influence) on total consumption than the weather in less populated areas. Population distribution is not a perfect proxy for electricity-usage distribution, but it's a pretty good one, and it has the advantage of requiring only degree days and readily-available population data.

To calculate the US population-weighted degree days we started with a list of US cities by population from Wikipedia (which is great for this sort of data). Conveniently this list includes a longitude/latitude position for each city, which meant that, using the Degree Days.net API (or the desktop app for non-programmers), we could get our system to choose the most appropriate weather station to represent each city automatically. Using just the top 100 or so cities would probably be pretty good, but, since we'd automated the process, we used the full list for greater accuracy. At the time there were 295 cities in the list.

For each of the 295 cities we collected weekly HDD and CDD in a wide range of base temperatures, from January 2000 through to the present day. This was quick and easy with the API / desktop app (see our products page for more on these). Then, for HDD and CDD, we combined the figures for all the cities using the following simple formula for population-weighting:

Population-weighted degree days

Population-weighted average = ((p₁ × dd₁) + (p₂ × dd₂) + (p₃ × dd₃) + ...) / (p₁ + p₂ + p₃ + ...)

Where:

p₁, p₂, p₃ are the populations of cities 1, 2, 3 etc.
dd₁, dd₂, dd₃ are the degree days for weather stations in cities 1, 2, 3 etc.

This gave us weekly US population-weighted HDD, and weekly US population-weighted CDD, in a wide range of base temperatures.
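As a rough illustration of the population-weighting formula, here is a short Python sketch. It is not the program we actually used. The function name is hypothetical, and the populations and degree-day values in the example are made-up numbers chosen to keep the arithmetic easy to check.

```python
def population_weighted_degree_days(populations, degree_days):
    """Population-weighted average of per-city degree days for one period.

    populations[i] and degree_days[i] must refer to the same city.
    """
    if len(populations) != len(degree_days):
        raise ValueError("need exactly one degree-day value per city")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(p * dd for p, dd in zip(populations, degree_days))
    return weighted_sum / total_population


if __name__ == "__main__":
    # Two hypothetical cities: one three times the size of the other.
    # (3 * 10 + 1 * 30) / (3 + 1) = 15
    print(population_weighted_degree_days([3_000_000, 1_000_000], [10.0, 30.0]))
```

You would apply this once per week, separately for HDD and CDD, and separately for each base temperature.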

Being a software company, we wrote a little program to assemble this population-weighted data for us automatically using our API. But for non-programmers we also came up with a simple process to calculate population-weighted degree days using our desktop app and Excel. You can use this process to assemble similar population-weighted data of your own for the United States, or any other country or region, or for the world as a whole. Download this spreadsheet to see how.

Testing thousands of regressions to determine the best base temperatures

Base temperature is a very important factor in any regression analysis involving degree days. Different buildings have different base temperatures, and those with both heating and cooling almost always have different base temperatures for each. Aggregate energy-consumption data is similar, and it is important to choose appropriate heating and cooling base temperatures for its regression analysis.

An automated way to help determine the best base temperatures is to run test regressions using data in a range of base temperatures, to see which give the best statistical fit as measured by e.g. R-squared, the standard error, or CVRMSE (the coefficient of variation of the root-mean-square error). These measures usually agree on which base temperatures fit best. This is essentially what our regression tool does, but we couldn't use it here because it uses degree days from a specific location (for analysis of energy data from a specific building), whilst we needed to use the population-weighted data that we had calculated. So we did it ourselves using the weekly US electricity data and the population-weighted HDD and CDD in a wide range of base temperatures, running regressions covering a range of different periods:

January 2000 right through to the present day.
Each calendar year from 2000 onwards.
Each 4-quarter period from 2000 onwards e.g. quarter 1 2000 to quarter 4 2000, quarter 2 2000 to quarter 1 2001, quarter 3 2000 to quarter 2 2001 etc. (This approach made sense as the US GDP data comes as quarterly figures. And it's always good to cover a multiple of 12 months when running regressions using degree days, since it ensures that all seasons are given equal weighting.)
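The rolling 4-quarter periods in the list above can be enumerated with a few lines of code. This is only a sketch, and the function name four_quarter_windows and its return format are assumptions made for illustration.

```python
def four_quarter_windows(first_year, last_year, last_quarter):
    """List every rolling 4-quarter window as ((year, q), (year, q)).

    Windows start at quarter 1 of `first_year`, advance one quarter at a
    time, and stop once a window would run past `last_quarter` of `last_year`.
    """
    if last_quarter not in (1, 2, 3, 4):
        raise ValueError("last_quarter must be 1, 2, 3 or 4")
    # Number the quarters consecutively: index = year * 4 + (quarter - 1).
    first_index = first_year * 4
    final_index = last_year * 4 + (last_quarter - 1)
    windows = []
    for start in range(first_index, final_index - 3 + 1):
        end = start + 3  # a window spans four consecutive quarters
        windows.append(((start // 4, start % 4 + 1), (end // 4, end % 4 + 1)))
    return windows


if __name__ == "__main__":
    for start, end in four_quarter_windows(2000, 2001, 2):
        print(f"quarter {start[1]} {start[0]} to quarter {end[1]} {end[0]}")
```

Each window covers exactly 12 months, so every season carries equal weight in the regression for that window.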

Analyzing the best-fitting regressions for each analysis period, we found that overall it worked best to use HDD with a base temperature of 55°F and CDD with a base temperature of 69°F. It is reassuring to note that these statistically-chosen base temperatures make good logical sense as well.

You'd probably need a custom-built program in order to test many thousands of different regressions like we did. But for your own analysis of US electricity-consumption data you could just use the base temperatures that our analysis determined.
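To give a flavor of what such a program involves, here is a minimal pure-Python sketch of a base-temperature search. It is not the program we used. It assumes you already have weekly HDD and CDD in each candidate base temperature, stored in dictionaries keyed by base temperature, and it scores each HDD/CDD pairing by R-squared alone. All function and parameter names are hypothetical. The least-squares fit uses the normal equations for brevity, whereas a real tool would more likely use a numerical library.

```python
from itertools import product


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-9:
            raise ValueError("singular system, e.g. an all-zero degree-day column")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(hdd, cdd, consumption):
    """R-squared of consumption regressed on HDD, CDD and a constant."""
    rows = [(h, c, 1.0) for h, c in zip(hdd, cdd)]
    # Normal equations: (X^T X) coefficients = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, consumption)) for i in range(3)]
    coefs = solve_linear_system(xtx, xty)
    predicted = [sum(k * x for k, x in zip(coefs, r)) for r in rows]
    mean_y = sum(consumption) / len(consumption)
    ss_res = sum((y - p) ** 2 for y, p in zip(consumption, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in consumption)
    return 1.0 - ss_res / ss_tot


def best_base_temperatures(hdd_by_base, cdd_by_base, consumption):
    """Try every HDD/CDD base pairing and return (best_pair, best_r_squared)."""
    best_pair, best_score = None, float("-inf")
    for hdd_base, cdd_base in product(hdd_by_base, cdd_by_base):
        try:
            score = r_squared(hdd_by_base[hdd_base], cdd_by_base[cdd_base], consumption)
        except ValueError:
            continue  # skip pairings that cannot be fitted
        if score > best_score:
            best_pair, best_score = (hdd_base, cdd_base), score
    return best_pair, best_score
```

You would call best_base_temperatures once per analysis period (the full range, each calendar year, each 4-quarter window) and then look at which base temperatures come out on top most consistently.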

Running further regressions using the chosen base temperatures

With the electricity data and population-weighted HDD (with a base temperature of 55°F) and CDD (with a base temperature of 69°F), we ran a multiple regression for each calendar year from 2000 onwards. For each year this gave us a regression equation like the following for 2014:

2014 regression equation

US electricity consumption = (182.00741 × HDD) + (396.05980 × CDD) + (8632.88222 × days)


Where:

days is the number of days covered by the period you have HDD and CDD for
US electricity consumption is the total usage predicted over the period, in millions of kWh
HDD is the population-weighted heating degree days with base temperature 55°F over the period
CDD is the population-weighted cooling degree days with base temperature 69°F over the period

With the HDD and CDD for any given period, and the length (in days) of that period, you can use this regression equation to calculate the 2014-predicted energy consumption of the period (i.e. the energy consumption according to the 2014 baseline). You can then compare that figure with the actual consumption over the period, and use the comparison as an indication of whether normalized energy consumption is increasing or decreasing.
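The comparison described above is simple arithmetic, sketched here in Python. The three coefficients are those of the 2014 regression equation given above. The function names and the example inputs are hypothetical.

```python
# Coefficients from the 2014 regression equation in this article.
HDD_COEFFICIENT = 182.00741    # millions of kWh per HDD (base 55°F)
CDD_COEFFICIENT = 396.05980    # millions of kWh per CDD (base 69°F)
DAYS_COEFFICIENT = 8632.88222  # millions of kWh per day


def predicted_consumption_2014(hdd, cdd, days):
    """2014-baseline consumption (millions of kWh) for a period."""
    return HDD_COEFFICIENT * hdd + CDD_COEFFICIENT * cdd + DAYS_COEFFICIENT * days


def percent_difference_from_baseline(actual, hdd, cdd, days):
    """Actual consumption relative to the 2014 baseline, as a percentage.

    A positive result suggests weather-normalized consumption has risen
    relative to 2014; a negative result suggests it has fallen.
    """
    predicted = predicted_consumption_2014(hdd, cdd, days)
    return 100.0 * (actual - predicted) / predicted


if __name__ == "__main__":
    # Hypothetical week with 10 population-weighted HDD and 5 CDD.
    baseline = predicted_consumption_2014(hdd=10, cdd=5, days=7)
    print(f"2014-baseline prediction: {baseline:.2f} million kWh")
    # Hypothetical actual figure, for illustration only.
    print(f"{percent_difference_from_baseline(65000, 10, 5, 7):+.2f}% vs baseline")
```

For a week with no heating or cooling degree days, the prediction reduces to the days term alone: 7 × 8632.88222 million kWh.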

For comparisons in any given year we used the previous calendar year as the baseline. An improved approach might have been to use the previous four quarters as the baseline for comparisons in any given quarter.

For our analysis we were more interested in short-term trends than longer-term trends, so this is why we used a different baseline regression for each calendar year. If you were simply looking to see how energy consumption had changed since, say, 2000, you would need just one baseline regression for the year 2000, and you would make all your comparisons against that.

If you want to do similar analysis yourself in Excel...

The spreadsheet mentioned above should help you calculate population-weighted degree days using our desktop app and Excel. That should be relatively easy.

We used a custom-built program to run our multiple regressions (energy data regressed against both HDD and CDD together), but you can also run multiple regressions in Excel using the Data Analysis ToolPak add-in:

In Excel 2007 and above go to the "Data" tab, then "Data Analysis", and select "Regression". (If you don't see "Data Analysis" in the "Data" tab, go to "File" -> "Options" -> "Add-Ins", click "Go..." to manage "Excel Add-Ins", and enable the "Analysis ToolPak" there.)
In Excel 2003 and below go to "Tools" -> "Data Analysis" and select "Regression". (If you don't see "Data Analysis" in the "Tools" menu, go to "Tools" -> "Add-Ins" and enable the "Analysis ToolPak" there.)

You can't easily use the Data Analysis ToolPak to test regressions using lots of different base temperatures, but for analysis of US electricity consumption you could just use the base temperatures we found (55°F for HDD and 69°F for CDD).

For non-US data it would be better to determine the optimal base temperatures yourself. This would be difficult in Excel, but you may be able to do it in R, or with custom-programmed software. Failing that you could just estimate the optimal base temperatures rather than determining them statistically.

Anyway, quite likely you are looking to do something a little different to what we did here. People analyze aggregate energy-consumption data for all sorts of different reasons. So it might not make sense for you to emulate our approach, but hopefully this article has given you some useful ideas at least!

What next?

We have several other articles on degree days and how to use them effectively, and answers to frequently asked questions.

You can download degree days from our free website and use our regression tool from there too, by choosing "Regression" as the "Data type". It will automatically test your energy data against degree days in lots of different base temperatures, to find the ones that give the best statistical fit. It won't do this with population-weighted data like the data we used for the analysis described in this article, but it is ideal for the much-more-common case of analyzing energy data from a building using degree days from a nearby weather station. It's one of many reasons to choose Degree Days.net over alternative data sources.

You might also like to read an overview of the Degree Days.net products that cater for the more sophisticated needs of many energy professionals, multi-site organizations, academic/government researchers, and energy-software developers who use our system. If you're looking for additional data, data for lots of locations, or automated access to data (in large or small quantities), our products can help!