Degree Days

Weather Data for Energy Professionals

Together with Sinclair & Co. we investigated the link between US electricity consumption and GDP figures. The hypothesis was that increases or decreases in GDP were accompanied by corresponding increases or decreases in electricity consumption, and that it might be possible to assess the current state of the US economy by looking at recent US electricity-consumption data.

Naturally US electricity consumption is also influenced by the weather: cold weather means more heating and hot weather means more cooling, and both of these influence total consumption. Using multiple regression analysis with heating and cooling degree days we normalized the electricity-consumption data to remove (or at least significantly reduce) the influence of the weather, so that the influence of the economy could be better assessed.

The main article is here. On this page we have some more details on the regressions and the degree days that were used as inputs to those regressions.

US aggregate electricity-consumption data comes as weekly figures, with the weeks starting at midnight on Sundays. So each figure covers consumption for Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday. The date that is listed with each published figure is that of the Saturday that is the last day of the weekly period. (Note that this is different to our system which labels weekly degree-day figures with the first day of the 7-day period that they cover.)
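The difference between the two labelling conventions is easy to trip over when lining up consumption figures with weekly degree days. As a minimal sketch (the function name is our own, purely for illustration), the Saturday date published with each consumption figure can be converted to the first day of the same Sunday-to-Saturday week like this:

```python
from datetime import date, timedelta


def week_start_from_saturday_label(saturday: date) -> date:
    """Return the Sunday that starts the 7-day week ending on `saturday`."""
    # date.weekday() counts Monday as 0, so Saturday is 5.
    if saturday.weekday() != 5:
        raise ValueError("expected the Saturday that ends the weekly period")
    return saturday - timedelta(days=6)
```

The returned Sunday is the label to look for in degree-day data that is dated by the first day of each weekly period.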

To weather-normalize this US aggregate electricity-consumption data we wanted US population-weighted degree days. The assumption is that electricity consumption is distributed around the country according to population, and so the weather in more populated areas has a stronger influence (a higher-weighted influence) on total consumption than the weather in less populated areas. Population distribution is not a perfect proxy for electricity-usage distribution, but it's a pretty good one, and it has the advantage of requiring only degree days and readily-available population data.

To calculate the US population-weighted degree days we started with the top 295 US cities by population, as listed on Wikipedia. Conveniently, this list includes a longitude/latitude position for each city, which meant that, using the Degree Days.net API (or the desktop app for non-programmers), we could get our system to choose the most appropriate weather station to represent each city. Using just the top 100 or so cities would probably have given good results, but, since we'd automated the process, we decided to err on the side of excess in the name of accuracy.

For each of the 295 cities we collected weekly HDD and CDD in a wide range of base temperatures, from January 2000 through to the present day. Then we combined the HDD and CDD in each base temperature, using the following simple formula for population-weighting:

$$\text{Population-weighted average} = \frac{(p_{1} \cdot dd_{1}) + (p_{2} \cdot dd_{2}) + (p_{3} \cdot dd_{3}) + \dots}{p_{1} + p_{2} + p_{3} + \dots}$$

Where:

- $p_{1}, p_{2}, p_{3}$ are the populations of cities 1, 2, 3 etc.
- $dd_{1}, dd_{2}, dd_{3}$ are the degree days from weather stations in cities 1, 2, 3 etc.
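The same population-weighting can be sketched in a few lines of Python. This is only an illustration of the formula, not the program we used; the function name is hypothetical, and it assumes you already have one population figure and one degree-day figure (same week, same base temperature) for each city:

```python
def population_weighted_degree_days(populations, degree_days):
    """Population-weighted average of per-city degree days for one period.

    `populations` and `degree_days` must be in the same city order.
    """
    if len(populations) != len(degree_days):
        raise ValueError("need exactly one degree-day figure per city")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    # Sum of (population * degree days), divided by the sum of populations.
    weighted_sum = sum(p * dd for p, dd in zip(populations, degree_days))
    return weighted_sum / total_population
```

You would call this once per week for HDD and once per week for CDD, repeating for each base temperature of interest.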

At the end of it we had weekly US population-weighted HDD and CDD in a wide range of base temperatures.

Being a software company, we wrote a little program to assemble this population-weighted data for us automatically using our API. But for non-programmers we also came up with a simple process to calculate population-weighted degree days using our desktop app and Excel. You can use this process to assemble similar population-weighted data of your own for the US or any other country or region. Download this spreadsheet to see how.

Base temperature is a very important factor in any regression analysis involving degree days. Different buildings have different base temperatures, and those with both heating and cooling almost always have different base temperatures for each. Aggregate energy-consumption data is similar, and it is important to choose appropriate heating and cooling base temperatures for its regression analysis.

An automated way to help determine the best base temperatures is to run test regressions using data in a range of base temperatures to see which give the best statistical fit using e.g. R-squared, the standard error, or CVRMSE (all of which usually give the same results). We did this using the weekly electricity data and the population-weighted HDD and CDD in a wide range of base temperatures, running regressions covering a range of different periods:

- January 2000 right through to the present day.
- Each calendar year from 2000 onwards.
- Each 4-quarter period from 2000 onwards e.g. quarter 1 2000 to quarter 4 2000, quarter 2 2000 to quarter 1 2001, quarter 3 2000 to quarter 2 2001 etc. (This approach made sense as the US GDP data comes as quarterly figures. And it's always good to cover a multiple of 12 months when running regressions using degree days, since it ensures that all seasons are given equal weighting.)
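A base-temperature search of this kind can be sketched as follows. This is a simplified illustration rather than our actual program: it assumes NumPy, uses R-squared as the only measure of fit, and the function names and the dictionary layout (base temperature mapped to an array of weekly degree days) are our own choices for the example. You would run it once for each analysis period.

```python
import numpy as np


def r_squared(y, X):
    """R-squared of a least-squares fit of y against the columns of X."""
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coeffs
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def best_base_temperatures(consumption, days, hdd_by_base, cdd_by_base):
    """Try every HDD/CDD base-temperature pair; return (R-squared, HDD base, CDD base)."""
    y = np.asarray(consumption, dtype=float)
    d = np.asarray(days, dtype=float)
    best = None
    for hdd_base, hdd in hdd_by_base.items():
        for cdd_base, cdd in cdd_by_base.items():
            # Regress consumption on HDD, CDD, and the number of days
            # in each period (which plays the role of the baseload term).
            X = np.column_stack([hdd, cdd, d])
            score = r_squared(y, X)
            if best is None or score > best[0]:
                best = (score, hdd_base, cdd_base)
    return best
```

With weekly data every period has 7 days, so the days column behaves like an ordinary intercept.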

Analyzing the best-fitting regressions for each analysis period, we found that **overall it worked best to use HDD with a base temperature of 55 F and CDD with a base temperature of 69 F**. It is reassuring to note that these statistically-chosen base temperatures make good logical sense as well.

You'd probably need a custom-built program in order to test many thousands of different regressions like we did. But for your own analysis of US electricity-consumption data you can just use the base temperatures that our analysis determined.

With the electricity data and population-weighted HDD (with a base temperature of 55 F) and CDD (with a base temperature of 69 F), we ran a multiple regression for each calendar year from 2000 onwards. For each year this gave us a formula like the following one for 2014:

US electricity consumption = (182.00741 * HDD) + (396.05980 * CDD) + (8632.88222 * noDays)

Where:

- noDays is the number of days covered by the period you have HDD and CDD for
- US electricity consumption is the total usage predicted over the period, in millions of kWh
- HDD is the population-weighted heating degree days with base temperature 55 F over the period
- CDD is the population-weighted cooling degree days with base temperature 69 F over the period

With the HDD and CDD for any given period, and the length (in days) of that period, you can use this formula to calculate the 2014-predicted energy consumption of the period (i.e. the energy consumption according to the 2014 baseline). You can then compare that figure with the actual consumption over the period, and use the comparison as an indication of whether normalized energy consumption is increasing or decreasing.
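As a small sketch of that comparison in Python: the three coefficients below are the 2014 figures from the formula above, while the function names and the percentage-difference measure are our own illustrative choices.

```python
# Coefficients from the 2014 regression (results in millions of kWh).
HDD_COEFF = 182.00741
CDD_COEFF = 396.05980
DAY_COEFF = 8632.88222


def predicted_consumption_2014(hdd, cdd, no_days):
    """Consumption predicted by the 2014 baseline, in millions of kWh.

    hdd: population-weighted HDD, base 55 F, over the period
    cdd: population-weighted CDD, base 69 F, over the period
    no_days: number of days in the period
    """
    return HDD_COEFF * hdd + CDD_COEFF * cdd + DAY_COEFF * no_days


def percent_difference_from_baseline(actual, predicted):
    """Positive when actual consumption is above the baseline prediction."""
    return 100.0 * (actual - predicted) / predicted
```

A positive percentage suggests that weather-normalized consumption has risen relative to the 2014 baseline, and a negative one suggests that it has fallen.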

For comparisons in any given year we used the previous calendar year as the baseline. An improved approach might have been to use the previous four quarters as the baseline for comparisons in any given quarter.

The spreadsheet mentioned above should help you to calculate population-weighted degree days using our desktop app and Excel.

We used a custom-built program to run our multiple regressions (energy data regressed against both HDD and CDD together), but you can also run multiple regressions in Excel using the Data Analysis ToolPak add-in:

- In Excel 2007 and above go to the "Data" tab, then "Data Analysis", and select "Regression". (If you don't see "Data Analysis" in the "Data" tab, go to "File" -> "Options" -> "Add-Ins", click "Go..." to manage "Excel Add-Ins", and enable the "Analysis ToolPak" there.)
- In Excel 2003 and below go to "Tools" -> "Data Analysis" and select "Regression". (If you don't see "Data Analysis" in the "Tools" menu, go to "Tools" -> "Add-Ins" and enable the "Analysis ToolPak" there.)
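If you would rather script the regression than use Excel, a fit of the same shape can be sketched in Python. This assumes NumPy and is not the custom-built program we used; the function name is hypothetical. It fits consumption against HDD, CDD, and the number of days in each period, with no separate intercept, to match the form of the formula given earlier.

```python
import numpy as np


def fit_consumption_model(consumption, hdd, cdd, days):
    """Least-squares fit of: consumption = a*HDD + b*CDD + c*noDays.

    Returns the coefficients (a, b, c) as floats.
    """
    y = np.asarray(consumption, dtype=float)
    # One column per regressor; the days column carries the baseload term.
    X = np.column_stack([hdd, cdd, days]).astype(float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b, c = (float(v) for v in coeffs)
    return a, b, c
```

Pass in one value per week for each argument, using HDD with a base temperature of 55 F and CDD with a base temperature of 69 F.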

You can't easily use the Data Analysis ToolPak to test regressions using lots of different base temperatures, but you shouldn't need to for regression of US electricity consumption against population-weighted degree days, as we already established that those regressions work best using HDD with a base temperature of 55 F and CDD with a base temperature of 69 F.

Please refer back to the main article for ideas on comparing weather-normalized electricity data with GDP figures, and don't be afraid to use the approaches described in both these articles as inspiration for your own analysis!