Degree Days

Weather Data for Energy Professionals

Degree Baseline Regression Tool

Regression is at the core of most analysis of heating/cooling energy consumption. A baseline regression describes energy consumption over a chosen baseline period, and is typically used to compare later energy consumption against baseline levels (e.g. to track ongoing performance or prove savings from changes made after the baseline period).

The regression tool tests thousands of regressions against your energy-usage data to help you choose the best regression (with the best HDD/CDD base temperatures) to represent baseline energy consumption.

  1. You choose a weather station and copy/paste in your energy-usage data from a spreadsheet.
  2. Degree generates HDD and CDD in a wide range of base temperatures and uses them to test thousands of regressions against your energy-usage data.
  3. You download a spreadsheet of the regressions that give the best statistical fit, together with a range of statistics to help you assess their quality.
  4. You choose the best regression (usually the first listed) and use it as the baseline for future analysis.
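Here is a greatly simplified Python sketch of what step 2 involves. It is not Degree's actual algorithm. It assumes HDD are calculated from daily mean temperatures with a simple "base minus mean" method. It tests only the HDD-only day-normalized formula y = a*HDD + c*days (described under Day normalization below), and it ranks candidate base temperatures by R² alone, whereas the real tool compares many formulas and statistics. All function names are hypothetical.

```python
import numpy as np


def hdd_for_period(daily_mean_temps, base):
    """HDD for one period: sum of (base - mean temp) over the days colder than base.

    Assumption: a simple daily-mean method, used here only for illustration.
    """
    temps = np.asarray(daily_mean_temps, dtype=float)
    return float(np.clip(base - temps, 0.0, None).sum())


def fit_hdd_only(y, hdd, days):
    """Ordinary least squares for y = a*HDD + c*days. Returns (a, c, r_squared)."""
    X = np.column_stack([hdd, days])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return float(coef[0]), float(coef[1]), 1.0 - ss_res / ss_tot


def search_base_temperatures(periods, usage, bases):
    """Fit one regression per candidate base temperature and return them best R^2 first.

    periods: one array of daily mean temperatures per metered period.
    usage:   metered energy usage for each period.
    """
    y = np.asarray(usage, dtype=float)
    days = np.array([len(p) for p in periods], dtype=float)
    results = []
    for base in bases:
        hdd = np.array([hdd_for_period(p, base) for p in periods])
        a, c, r2 = fit_hdd_only(y, hdd, days)
        results.append({"base": float(base), "a": a, "c": c, "r2": r2})
    return sorted(results, key=lambda r: r["r2"], reverse=True)
```

For example, `search_base_temperatures(periods, usage, np.arange(10.0, 20.5, 0.5))` would test 21 base temperatures from 10°C to 20°C in 0.5°C steps and return them best-first.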

To use the regression tool, go to the Degree web tool and select "Regression" as the "Data type". Or read on to find out more about the regression tool and how best to use it.

Why use the regression tool instead of Excel?

A simple regression of energy consumption against heating degree days or cooling degree days is easy enough to do in Excel. But the Degree regression tool offers a lot more:

Copy/pasting in your energy data

The regression tool takes your energy-usage data and runs regressions against it. Your energy data is probably already in a spreadsheet, so you just have to check (and perhaps modify) its format and then copy/paste the relevant data into the regression tool.

Step-by-step instructions for copy/pasting in your energy data

  1. Select the relevant data in your spreadsheet (see the two allowed formats below), and press Ctrl-C to copy it to the clipboard (Command-C on a Mac).
  2. Go back to the regression tool, click in the box, and press Ctrl-V to paste (Command-V on a Mac).
  3. The regression tool should then show you a table of your data. Check it over to make sure it interpreted everything correctly. If it didn't, edit the data in your spreadsheet and copy/paste it again.

Don't worry if your spreadsheet contains a lot of other analysis too - you can select/copy/paste only the cells that the regression tool needs.

Getting your spreadsheet data into the right format

Your energy data can be in one of two formats. The examples below show the same data specified in each format:

The first-day-only format is a common format for spreadsheets of energy-usage data. The specified dates must be the first day of each period of measured energy usage. The last day of each period is assumed to be the day before the first day of the next. You may need extra rows with dates to specify any gaps in the data and to help the regression tool figure out when the last period ends:

First-day-only format

If your data is regular daily, weekly, monthly (each month starting on the same day), or yearly data, you shouldn't need the final row as the regression tool should figure out the end of the last period automatically. But it's not a bad idea to include it anyway, for clarity.
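The first-day-only rule can be sketched in a few lines of Python. This sketch assumes, for illustration only, that a gap or end marker is a row with a date but no usage value (represented here as None), and that the final row is always such a marker. The real tool can infer the end of the last period for regular data, which this sketch does not attempt. The function name is hypothetical.

```python
from datetime import date, timedelta


def first_day_only_to_periods(rows):
    """Convert first-day-only rows into (first_day, last_day, usage) periods.

    rows: (first_day, usage) tuples in date order. usage is None for a row that
    only marks where the previous period ends (a gap or the end of the data).
    """
    if not rows or rows[-1][1] is not None:
        raise ValueError("the final row must be an end marker with no usage")
    periods = []
    for (start, usage), (next_start, _) in zip(rows, rows[1:]):
        if usage is None:
            continue  # marker row: no metered usage for the days it starts
        # The last day of each period is the day before the next row's date.
        periods.append((start, next_start - timedelta(days=1), usage))
    return periods


rows = [
    (date(2019, 1, 1), 1200.0),
    (date(2019, 2, 1), 1050.0),
    (date(2019, 3, 1), None),  # gap: no data for March
    (date(2019, 4, 1), 800.0),
    (date(2019, 5, 1), None),  # end marker for the April period
]
for first, last, usage in first_day_only_to_periods(rows):
    print(first, last, usage)
# 2019-01-01 2019-01-31 1200.0
# 2019-02-01 2019-02-28 1050.0
# 2019-04-01 2019-04-30 800.0
```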

The first-and-last-day format can be a good choice if you have gaps in your data, as you can typically add an extra column (column 2 in the example below) and insert a few dates without affecting other parts of your spreadsheet:

First-and-last-day format

With the first-and-last-day format you can specify the last day of every period (in column 2), but for the normal case (the last day of one period being the day before the first day of the next) you can just leave it blank.
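A matching Python sketch for the first-and-last-day format treats a blank last day (None here) as "the day before the next row's first day". The function name is hypothetical, and the sketch assumes the final row always has an explicit last day, which the real tool may not require.

```python
from datetime import date, timedelta


def first_and_last_day_to_periods(rows):
    """Convert (first_day, last_day_or_None, usage) rows into complete periods."""
    periods = []
    for i, (start, end, usage) in enumerate(rows):
        if end is None:
            if i + 1 == len(rows):
                raise ValueError("the final row needs an explicit last day")
            # Normal case: the period runs up to the day before the next one starts.
            end = rows[i + 1][0] - timedelta(days=1)
        periods.append((start, end, usage))
    return periods


rows = [
    (date(2019, 1, 1), None, 1200.0),
    (date(2019, 2, 1), date(2019, 2, 28), 1050.0),  # explicit end, so March is a gap
    (date(2019, 4, 1), date(2019, 4, 30), 800.0),
]
print(first_and_last_day_to_periods(rows))
```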

Date formats

Date formats are a source of much confusion for computer systems. Something like 10/11/12 is highly ambiguous as it can be interpreted as mm/dd/yy, or dd/mm/yy, or even yy/mm/dd.
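A quick Python check shows the same string turning into three different dates depending on which format is assumed. The format strings are standard `strptime` codes; the `interpretations` function is just an illustration.

```python
from datetime import date, datetime

FORMATS = {"mm/dd/yy": "%m/%d/%y", "dd/mm/yy": "%d/%m/%y", "yy/mm/dd": "%y/%m/%d"}


def interpretations(text):
    """Return every date that `text` could mean under the three slash formats."""
    out = {}
    for label, fmt in FORMATS.items():
        try:
            out[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # not a valid date in this format
    return out


for label, d in interpretations("10/11/12").items():
    print(f"{label}: {d.isoformat()}")
# mm/dd/yy: 2012-10-11
# dd/mm/yy: 2012-11-10
# yy/mm/dd: 2010-11-12

# An ISO yyyy-mm-dd string has only one reading.
print(date.fromisoformat("2012-11-10"))  # 2012-11-10
```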

We like the ISO date format yyyy-mm-dd because it is totally unambiguous. But we've programmed the regression tool to do its best to make sense of a variety of other formats as well. People from all over the world use Degree and we don't want to force them to change their spreadsheets any more than necessary before copy/pasting their data in.

So we suggest you try copy/pasting your data as it is. The regression tool will tell you if it can't make sense of it, and if it interprets your dates incorrectly, you should be able to see that from the table it displays immediately after you paste your data into the box.

If it's not working correctly, try changing the format of all your dates to yyyy-mm-dd. This is easy to do in Excel: select all the date cells, right-click, select "Format Cells...", then "Custom", type yyyy-mm-dd in the "Type:" box, and click "OK". If your original date format was a common one that you would expect to work automatically, please email us so we can see if there's a way we can improve the system.

Day normalization

If in doubt, just choose "Weighted", as it works well in all cases. Or read on for more information.

Day normalization is important for dealing with energy-usage data that has periods of different length (such as monthly data). The formulas below show the key difference between regressions that are day normalized and those that aren't:

With day normalization (weighted or unweighted):

y = a*HDD + c*days
y = b*CDD + c*days
y = a*HDD + b*CDD + c*days

Without day normalization:

y = a*HDD + c
y = b*CDD + c
y = a*HDD + b*CDD + c

Where:

y is the energy usage over the period in question;
HDD is the heating degree days over the period in question;
CDD is the cooling degree days over the period in question;
days is the length (in days) of the period in question;
a, b, and c are regression coefficients (the regression tool calculates these).

Day-normalized regression formulas require you to plug in the length (in days) of the period that you want to calculate baseline-predicted energy consumption for. In contrast, regressions that aren't day normalized only work for periods of the same length as the periods in the original baseline data - that period length is effectively built into the constant coefficient (c) already.
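To make this concrete, the Python sketch below fits y = a*HDD + c*days to made-up monthly data and then predicts baseline usage for two periods with the same HDD but different lengths. It uses plain ordinary least squares, so it is not claimed to match either of the tool's weighted or unweighted options exactly. The function names are hypothetical.

```python
import numpy as np


def fit_day_normalized(hdd, days, usage):
    """Ordinary least squares for y = a*HDD + c*days (no separate constant). Returns (a, c)."""
    X = np.column_stack([hdd, days])
    coef, *_ = np.linalg.lstsq(X, np.asarray(usage, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])


def predict_day_normalized(a, c, hdd, days):
    """Baseline-predicted usage for a period of any length: plug in its HDD and its days."""
    return a * hdd + c * days


# Made-up monthly baseline data with mixed period lengths (usage = 2*HDD + 10*days exactly).
hdd = np.array([420.0, 380.0, 300.0, 180.0, 90.0, 20.0])
days = np.array([31.0, 28.0, 31.0, 30.0, 31.0, 30.0])
usage = 2.0 * hdd + 10.0 * days

a, c = fit_day_normalized(hdd, days, usage)
print(round(a, 6), round(c, 6))                       # 2.0 10.0
print(round(predict_day_normalized(a, c, 250.0, 28)))  # 28-day period: 780
print(round(predict_day_normalized(a, c, 250.0, 31)))  # same HDD over 31 days: 810
```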

If your baseline data (what the regressions are calculated from) has periods that are all the same length (e.g. daily or weekly data), day normalization is not important. With such data, regression with weighted and unweighted day normalization will give the same results, and regression with no day normalization will give a constant coefficient (c) that is simply the period length (in days) multiplied by the constant coefficient given by day-normalized regression. The other coefficients (a and b) will be the same as those given by day-normalized regression.
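This equal-length claim is easy to check numerically. In the Python sketch below (made-up weekly data with some random noise added, fitted by ordinary least squares), the HDD coefficient comes out the same with or without day normalization, and the non-normalized constant equals 7 times the day-normalized one.

```python
import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients for y ~ X @ coef."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
hdd = np.array([350.0, 280.0, 190.0, 90.0, 30.0, 5.0])
days = np.full(hdd.shape, 7.0)  # weekly data: every period is 7 days long
y = 1.8 * hdd + 12.0 * days + rng.normal(0.0, 5.0, hdd.size)

a_norm, c_norm = ols(np.column_stack([hdd, days]), y)                 # y = a*HDD + c*days
a_plain, c_plain = ols(np.column_stack([hdd, np.ones_like(hdd)]), y)  # y = a*HDD + c

print(a_norm, a_plain)        # same HDD coefficient
print(c_norm * 7.0, c_plain)  # same constant once scaled by the period length
```

The reason is that a days column full of 7s is just 7 times a column of ones, so both regressions produce identical fitted values and only the scale of c differs.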

However, day normalization is important for regressions from data with periods of different length, as it will improve the accuracy of the coefficients. The more variation there is in the period lengths, the more difference there will be in the calculated coefficients.

In summary:

Note that monthly data typically has periods of different length (as calendar months can be 28, 29, 30, or 31 days in length), so it's definitely best to use day normalization (and preferably weighted day normalization) when running regressions against it. For consistency of your post-regression calculation processes we recommend using day normalization (and preferably weighted day normalization) for all the data you work with, whether it has different-length periods or not.
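Python's standard `calendar` module illustrates the point about monthly data: calendar-month periods vary in length by up to three days.

```python
import calendar


def month_lengths(year):
    """Number of days in each calendar month of `year`, January first."""
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


print(month_lengths(2019))  # [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
print(month_lengths(2020))  # February has 29 days in a leap year
```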

Specifying base temperatures to include in the results

The "Include in results" option lets you specify a heating base temperature and a cooling base temperature for which regressions will be included in the results along with the auto-selected ones.

As the regression tool tests thousands of base-temperature combinations automatically, your chosen base temperatures will probably be tested whether you specify them or not. But by specifying them here you can see how their regressions compare with those in the auto-selected shortlist. If the statistics are close, you might want to use them instead of the auto-selected ones.

If you don't have a good idea of the base temperature(s) you want, you can just leave them on the default values to see how much better it is to choose optimal base temperatures than to stick with historically-prescribed defaults like 15.5°C or 65°F.

Interpreting the results

After testing thousands of regressions against your energy-usage data, the Degree regression tool returns a spreadsheet with details of the regression(s) that gave the best statistical fit (the "shortlist"), and any others that were notable (e.g. for data that looks like it was from a heating-only building the regression tool will typically also return the best CDD-only regression, even though it's unlikely to make it into the shortlist).

The spreadsheet output contains the following columns of data:

Watch out for negative coefficients!

A negative coefficient (on HDD, CDD, or the constant) is usually an indication that the regression is not a good one. A regression with one or more negative coefficients can often look good in other respects (i.e. good statistics), but it is unlikely to be justifiable in real-world terms, so is typically best ignored.

For informational purposes the regression tool will return regressions with negative coefficients if they fit better than any other regressions with the same formula (e.g. y = a*HDD + c*days, y = b*CDD + c*days, or y = a*HDD + b*CDD + c*days), but it will always list them below any regressions with only non-negative coefficients.
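The ordering rule above (regressions with any negative coefficient listed below all those without) can be expressed as a two-part sort key. This Python sketch assumes, purely for illustration, that regressions within each group are ranked by R² alone; the regression tool's actual ranking process is more sophisticated. The `Regression` class and its figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    formula: str
    coefficients: dict  # e.g. {"a": 2.1, "c": 9.8}
    r_squared: float


def has_negative_coefficient(reg):
    return any(value < 0 for value in reg.coefficients.values())


def rank(regressions):
    """Regressions with only non-negative coefficients first; best R^2 first within each group."""
    return sorted(regressions, key=lambda r: (has_negative_coefficient(r), -r.r_squared))


candidates = [
    Regression("y = a*HDD + b*CDD + c*days", {"a": 2.2, "b": -0.4, "c": 10.1}, 0.97),
    Regression("y = a*HDD + c*days", {"a": 2.0, "c": 9.9}, 0.95),
    Regression("y = b*CDD + c*days", {"b": 1.1, "c": 25.0}, 0.40),
]
for reg in rank(candidates):
    print(reg.formula, reg.r_squared)
```

Note that the HDD-and-CDD regression ends up last despite having the highest R², because its CDD coefficient is negative.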

Choosing the best regression

The regression tool has a sophisticated process for comparing and ranking the thousands of regressions that it tests against each set of energy-usage data. The shortlist regressions are the ones it considers to be likely candidates, and the first-listed regression is the one it thinks best. But the regression tool knows nothing about the building that your energy-usage data came from. And, although we are always looking to improve the regression tool's algorithms, it is based on statistics and probabilities so it will never be possible for it to be correct 100% of the time.

If a building has no cooling then you're unlikely to want a regression involving CDD, even if the first-listed regression involves CDD. For a building with no heating you're unlikely to want a regression involving HDD. (Although if the numbers for such surprise regressions look much better than they do for the others it may be worth you questioning your assumptions about the building and the equipment that your metered energy is feeding.)

An experienced energy professional will often have a rough idea of the likely base temperature(s) of a building, the likely baseload energy consumption (expressed in the constant coefficient of the regression formula), and the likely split of energy usage between heating, cooling, and baseload, over the baseline period of energy-usage data provided. This knowledge can help further in choosing the best regression.

If you have good knowledge of the building, use it!

  • Favour regressions with the predictors you expect (e.g. HDD only, CDD only, HDD and CDD together).
  • Favour regressions with base temperatures that you can justify in terms of the building and its operation (our article on estimating base temperatures should help).
  • Look at the regression coefficients and the HDD total and CDD total figures to see how much energy usage each regression attributes to heating, cooling, and baseload (non-weather dependent consumption accounted for by the constant), over the baseline period your energy-usage data covers. Favour regressions with usage breakdowns that fit with your expectations.
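The usage breakdown in the last bullet is simple arithmetic on a day-normalized regression: heating is a multiplied by the HDD total, cooling is b multiplied by the CDD total, and baseload is c multiplied by the total number of days covered. A Python sketch with made-up figures (the function name is hypothetical):

```python
def usage_breakdown(a, b, c, hdd_total, cdd_total, total_days):
    """Split baseline usage from y = a*HDD + b*CDD + c*days into its three components."""
    parts = {
        "heating": a * hdd_total,
        "cooling": b * cdd_total,
        "baseload": c * total_days,
    }
    total = sum(parts.values())
    shares = {name: value / total for name, value in parts.items()}
    return parts, shares


# Made-up example: a year of data (365 days) with 2000 HDD and 300 CDD.
parts, shares = usage_breakdown(a=2.0, b=3.0, c=10.0, hdd_total=2000, cdd_total=300, total_days=365)
for name in parts:
    print(f"{name}: {parts[name]:.0f} ({shares[name]:.0%})")
# heating: 4000 (47%)
# cooling: 900 (11%)
# baseload: 3650 (43%)
```

For a regression without day normalization, the constant applies once per period, so baseload would be c multiplied by the number of periods instead.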

That said, do check the statistics before choosing any regression that isn't in the shortlist. Here are some tips on comparing regressions based on the statistics:

Bear in mind that the regression tool already aims to use the statistics as best it can when choosing and ranking the regressions it returns from the thousands that it tests against each set of energy-usage data. It will always be possible to improve the algorithms, but it should be doing a pretty good job. However, a statistics-only approach can only go so far, and, with knowledge of the building, you will often find that regression 2 or below will be a better choice than the one the regression tool put first.

Help us improve the regression tool

We always like constructive feedback, but it's particularly important to us at the moment while the regression tool is in beta. We'd love to hear what you think of it: what you like about it, what you don't like about it, and what else it could do to make it more useful to you.

Please email us at with any feedback about the regression tool. We'll be most appreciative, and as a way of saying thanks we'll gladly set you up with temporary free access to Degree Pro so you can use the regression tool with larger data sets or data going further back in time.

About sending us data...

For a year after we launched the regression tool in beta in October 2015 we were particularly keen for people to send us real-world energy-usage data with which we could test and improve the regression tool's algorithms. We have now received a lot of data to work with (thank you!) and getting more is no longer such a priority for us. But if you do have an interesting data set that you would like to discuss, or that highlights a good or bad aspect of the regression tool, please do send it along, together with the following information:

  1. The fuel that the energy data represents (e.g. gas or electricity).
  2. The location of the building that it came from (so we can choose one or more weather stations to test the data against).
  3. Whether the building has heating, or cooling, or both.
  4. What other fuels supply the building.
  5. Which fuel(s) supply which components of the heating/cooling system (e.g. gas heating, electric cooling).
  6. What temperature the building is heated/cooled to (these are often different).
  7. Whether it is heated/cooled 24/7 or intermittently (e.g. for office hours only).
  8. Whether it is well insulated.
  9. Whether it has any significant internal heat gains (e.g. equipment that generates a lot of heat).
  10. Whether it has any significant refrigeration or freezer loads.
  11. Anything else that you think is likely to be relevant.

Sorry about the long list; we need to know about the building and its usage to figure out whether the regression tool is giving useful results for any given set of data.

Thank you!

© 2019 BizEE Software