How to Calculate or Prove Energy Savings Using Degree Days and Regression

So you've been working to reduce your energy consumption... Maybe you've installed new insulation, or upgraded your HVAC system, or campaigned to encourage your coworkers to switch things off, or made some other improvements to increase efficiency... At some point you'll probably want to work out how much energy you've saved...

But heating and cooling energy consumption varies with the weather, and this complicates things. Colder weather means more energy for heating, and warmer weather means more energy for cooling. You might have energy-usage data from before and after your improvements, but you can't just compare the before-and-after figures like-for-like if the period before your improvements was hotter or colder than the period after your improvements. To calculate or prove the energy savings you've made, you'll need to correct for the weather variations somehow.

Fortunately there is a well-established process for calculating energy savings in situations like these. It involves degree days and regression analysis. You might hear it loosely called "weather correction", or "weather normalization", but these terms can imply a variety of calculations, so we need to be more precise.

We'll explain the process step by step:

Step 1: Assemble your energy data

You'll need energy-usage data from before the changes/improvements you've made, and from after too. We'll call this the "before data" and the "after data".

Either the before data, or the after data, or both, will need to be broken down into dated periods (e.g. weeks or months), with an energy-consumption figure for each period. You'll need this for the baseline regression, which we will explain shortly. Weekly data is typically best if you can get it.

If you have daily data, you can use it directly if your building has similar usage patterns on all 7 days of the week. But if it doesn't (e.g. because it is unoccupied on weekends) you should sum it into weekly totals. Or, if you are particularly keen, you could split it into occupied and unoccupied days and analyze both sets of data separately for the rest of the process described in this article. (This would involve running a baseline regression for both sets of data separately, then making comparisons of baseline-predicted consumption with actual consumption for both sets of data separately, then combining the results together at the end. This article doesn't explain this more-complicated approach in any more detail than this, but hopefully you can figure out the details if you want to give it a go.)
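To make the weekly summing concrete, here is a minimal Python sketch. The function name weekly_totals, the choice of Monday as the start of each week, and the decision to drop any week without a full 7 days of readings are all our own assumptions for illustration, and the readings themselves are made up.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_usage):
    """Sum {date: energy} readings into totals keyed by the Monday of each week.

    Assumption: weeks with fewer than 7 daily readings are dropped, so a
    partial week is never compared against full weeks.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, energy in daily_usage.items():
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        totals[week_start] += energy
        counts[week_start] += 1
    return {week: totals[week] for week in sorted(totals) if counts[week] == 7}


# Made-up readings: 14 days from Monday 2018-01-01, lower usage at weekends
start = date(2018, 1, 1)
daily = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    daily[day] = 40.0 if day.weekday() >= 5 else 100.0

weekly = weekly_totals(daily)
for week_start, total in weekly.items():
    print(week_start, total)
```

If your building's week effectively starts on a different day, you'd shift the week_start calculation accordingly.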

If you have interval data from an automated smart meter (e.g. 5-, 10-, 15-, 20-, 30-, or 60-minute data), you should also sum it into weekly totals, or daily totals if you are comfortable with the complexities described in the paragraph above. Our energy management software can help with this. "Isn't it bad to lose the detail?" you might wonder... For many sorts of analysis, yes, but for analysis of heating/cooling energy consumption, quite the opposite, thanks to the complicated time lags between temperature changes outside and heating/cooling energy consumption inside. This is explained in more detail in our frequently asked questions.

Monthly data is less ideal, but it's common, and you can use it. You should really have at least a year's worth (12 months) though, and ideally more.

In fact, however your data is broken down, you should really have at least a year's worth of before data and a year's worth of after data. You can do the calculations with less, but they won't be as reliable.

If you have multiple meters

You might have different meters for different fuels (like electricity or gas), or you might have different meters for different buildings or different submeters within individual buildings. Either way it's almost always best to do this analysis for each meter individually. When you've completed this process for all your meters, you can combine the results to calculate the total overall savings. This will almost certainly work much better than combining the energy data first.

Step 2: Choose your baseline period

Your "baseline period" is the period over which you'll do your "baseline regression".

Your baseline period should:

Cover a period over which the building and its usage patterns saw no big changes.
Have energy data broken down into multiple periods (like weekly or monthly energy data, as discussed above).
Cover at least a year, ideally longer. Generally it's best to cover a whole number of years, like 2 years or 3 years rather than 2.5 years, as this way the seasons are all represented equally.
Come exclusively from the before period or the after period. (This is probably obvious, but we mention it just in case.)

Most people would typically take the baseline period from the before data, but actually it's fine to take it from the after data instead if the after data better fits the criteria above. If both the before data and the after data fit the above criteria similarly, it generally makes sense to choose the one with more dated periods (of measured energy usage) within it. And if they both have the same number of dated periods, you can calculate a baseline regression for both (see below for instructions), and choose whichever regression has the better statistics (e.g. the higher R² value).
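The tie-breaking rule in the paragraph above can be sketched in a few lines of Python. This assumes both candidates already fit the baseline criteria similarly. The dictionary keys (name, periods, r_squared) and the figures are hypothetical, and you would take the R² values from whatever software calculated your regressions.

```python
def choose_baseline(candidates):
    """Prefer the candidate with more dated periods; break ties on higher R²."""
    return max(candidates, key=lambda c: (c["periods"], c["r_squared"]))


# Hypothetical candidates with the same number of weekly periods
candidates = [
    {"name": "before", "periods": 52, "r_squared": 0.91},
    {"name": "after", "periods": 52, "r_squared": 0.94},
]

print(choose_baseline(candidates)["name"])
```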

Step 3: Calculate your baseline regression

The "baseline regression" will give you an equation that describes the energy consumption over the baseline period in terms of heating degree days (HDD) and/or cooling degree days (CDD). Typically this "baseline regression equation" will be one of the following:

Regression equations

If your meter supplies heating (but not cooling):

E = b*days + h*HDD

If your meter supplies cooling (but not heating):

E = b*days + c*CDD

If your meter supplies both heating and cooling:

E = b*days + h*HDD + c*CDD

Where:

E is the energy usage over the period in question;
days is the length (in days) of the period in question;
HDD is the heating degree days over the period in question;
CDD is the cooling degree days over the period in question;
b, h, and c are regression coefficients (see below).
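If it helps to see the three equations as code, the sketch below wraps them in one small Python function. The name baseline_energy is our own invention, and the coefficients and degree days are made-up values chosen only to make the arithmetic easy to follow. Leaving h or c at zero gives the heating-only or cooling-only form.

```python
def baseline_energy(days, b, hdd=0.0, h=0.0, cdd=0.0, c=0.0):
    """Evaluate E = b*days + h*HDD + c*CDD for a single period."""
    return b * days + h * hdd + c * cdd


# Made-up coefficients and degree days for one 7-day period
heating_only = baseline_energy(days=7, b=100.0, hdd=50.0, h=4.0)
heating_and_cooling = baseline_energy(
    days=7, b=100.0, hdd=50.0, h=4.0, cdd=10.0, c=6.0
)

print(heating_only, heating_and_cooling)
```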

Regression coefficients?

The b, h, and c in the regression equations above are called "regression coefficients". In a real regression equation (calculated using real energy data and real degree days) these regression coefficients would be real numbers (like 12.563, 539.1, or 4,092.271).

Regression coefficients are calculated as part of the regression process, but you don't need to understand how, as it will be done automatically by whatever software you use to calculate your regression equation (most likely our online regression tool, or Excel).
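For the curious, here is a rough Python sketch of how b and h could be calculated for the heating-only equation using ordinary least squares. It is only an illustration under our own assumptions: a fixed, already-chosen base temperature, made-up data, and a hypothetical function name. It is not a description of how any particular tool works internally.

```python
def fit_heating_baseline(periods):
    """Least-squares fit of E = b*days + h*HDD.

    periods: list of (days, hdd, energy) tuples, one per dated period.
    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s_dd = sum(d * d for d, _, _ in periods)
    s_dh = sum(d * x for d, x, _ in periods)
    s_hh = sum(x * x for _, x, _ in periods)
    s_de = sum(d * e for d, _, e in periods)
    s_he = sum(x * e for _, x, e in periods)

    det = s_dd * s_hh - s_dh * s_dh
    if det == 0:
        raise ValueError("HDD per day must vary between periods to fit b and h")

    b = (s_de * s_hh - s_dh * s_he) / det
    h = (s_dd * s_he - s_dh * s_de) / det
    return b, h


# Made-up weekly data generated from b = 10 and h = 2, with no noise
data = [(7, 50.0, 170.0), (7, 20.0, 110.0), (7, 0.0, 70.0), (7, 80.0, 230.0)]
b, h = fit_heating_baseline(data)
print(b, h)
```

Real data will be noisy, so the fitted line won't pass through every point as it does here.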

Note that all 3 regression equations above allow for non-weather-dependent energy usage on the meter (from things with energy consumption that doesn't vary with the weather). So don't worry if your metered energy consumption covers heating and/or cooling along with non-weather-dependent things like hot water or office equipment.

Note also that the choice of base temperature is very important for the HDD and/or the CDD. This is a big part of getting a good baseline regression.

This may all be sounding rather complicated... Getting a good baseline regression is undoubtedly the hardest part of calculating energy savings... But fortunately the regression tool on our website can do most of the hard work for you.

To use the regression tool to generate your baseline regression: go to the Degree Days.net web tool, select "Regression" as the "Data type", and follow the instructions from there.

To use the regression tool effectively you should really read through the instructions... Yes, we know, instructions are boring and nobody likes reading them, but a good understanding of the regression tool will really help you get good results from it.

Also, although the regression tool will automatically help you figure out appropriate base temperature(s) to use for your HDD and/or CDD, you should make sure that the base temperature(s) you decide upon make sense for your building. Our article on choosing base temperatures should help with this.

You can also calculate your baseline regression in Excel, and our article on regression analysis has more on how to do this. But our regression tool has some useful features that are impractical to reproduce in Excel, so we'd generally recommend you stick with the regression tool to get the best possible baseline regression.

Step 4: Compare baseline-predicted consumption with actual consumption

The baseline regression equation describes energy usage as it was over the baseline period. But we can also use it to "predict" the energy that would have been used over any other period if the building had been operating as it did over the baseline period.

This is a simple concept, but difficult to explain, so don't worry if it doesn't immediately make sense. We'll explain in more detail:

Using the baseline regression equation to calculate predicted energy usage

Over the baseline period the building had a certain construction, a certain set of equipment, and a certain pattern of usage. These factors all combined to determine its energy usage over that baseline period.

The weather varied over the baseline period, so the heating/cooling energy usage would have varied over the baseline period too. This is what the baseline regression equation accounts for with HDD and/or CDD.

Let's consider a heating-only example:

Example heating-only baseline regression equation

E = 1954.298*days + 220.624*HDD

You'll notice that this regression equation has values for the b and h coefficients that were introduced further above. They're just example values; your real baseline regression equation will have different values (likely very different). The regression tool will calculate them automatically based on the energy-usage data you give it and the degree days it generates automatically for your chosen location.

Let's now consider the whole of the baseline period. Using Degree Days.net we can get the HDD for the whole of the baseline period (making sure to get them in the HDD base temperature used by our baseline regression equation). Let's say the baseline period covered the whole of 2018 (i.e. from the start of January 1st, 2018 to the end of December 31st, 2018). We could get the HDD for that exact same period, either by selecting "Custom" as the "Breakdown" option and copy/pasting in our dates, or by downloading daily data and summing it in Excel. Let's say there were 2,678 HDD over that baseline period of 365 days. We can feed these figures into the regression equation:

E = 1954.298*days + 220.624*HDD

E = 1954.298*365 + 220.624*2678

E = 1,304,150

Note that the units of the calculated energy usage E will be the same as the units of the energy-usage data that was used to make the baseline regression – it could be kWh, or therms, or whatever. Either way, we have just used the baseline regression equation to calculate the total energy usage over the baseline period. And this calculated energy usage should be the same as the actual energy usage over the baseline period, assuming we calculated the baseline regression equation properly (which our regression tool should do automatically).
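If you'd like to check the arithmetic of the example above yourself, a few lines of Python will do it. The variable names are our own; the numbers are the example coefficients, days, and HDD used above.

```python
b = 1954.298  # example coefficient for days
h = 220.624   # example coefficient for HDD
days = 365    # length of the example baseline period
hdd = 2678    # example HDD over the baseline period

energy = b * days + h * hdd
print(round(energy))
```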

We can also consider a sub-period (like a week) within the baseline period, and we can get the HDD for that sub-period, and we can work out its length in days (7 for a week). Feeding these numbers into the regression equation will give us an approximate value for the energy usage over that sub-period. It won't be exactly the same as the energy that was actually used over that sub-period, because the baseline regression equation is a simplification of reality (we essentially got the regression equation by fitting a straight line to a scatter plot), but it should be fairly close.

Now the clever part: we can take any period for which we can get HDD (in the appropriate base temperature of course), and we can feed the HDD and the length of the period (whether it's a day, a week, a year, or whatever) into our baseline regression equation, and we'll get the baseline-predicted energy usage – the energy usage that we would expect the building to have used over that period if everything other than the weather had been exactly the same as it was over the baseline period.

For a period outside the baseline period, whether before it or after it, things could be different in the building: the construction, the equipment, or the pattern of usage could be different. But it doesn't matter to our calculation of the baseline-predicted energy usage: all we need is the HDD over our period and we can calculate what the energy usage would have been if everything other than the weather had been the same as it was over the baseline period.
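One handy consequence of the regression equation being a simple sum of terms: the baseline-predicted usage for a long period is the same whether you feed in the totals for the whole period or predict each sub-period and add up the results. The Python sketch below illustrates this using the example coefficients from above and some made-up weekly HDD figures. The function name predicted is our own.

```python
B = 1954.298  # example b coefficient
H = 220.624   # example h coefficient


def predicted(days, hdd):
    """Baseline-predicted energy usage for a period of the given length and HDD."""
    return B * days + H * hdd


# Made-up HDD for four consecutive weeks outside the baseline period
weekly_hdd = [92.5, 81.0, 104.3, 67.2]

sum_of_weeks = sum(predicted(7, hdd) for hdd in weekly_hdd)
whole_period = predicted(7 * len(weekly_hdd), sum(weekly_hdd))

print(sum_of_weeks, whole_period)
```

The two figures agree apart from tiny floating-point rounding differences.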

The final comparison of baseline-predicted usage with actual usage

If we calculated our baseline regression from our before data, we'd get the HDD and/or CDD over the period that our after data covers, and the length in days of that period, and we'd feed these figures into our baseline regression equation to get the baseline-predicted energy usage for that after period. We'd then compare that baseline-predicted energy usage with the actual total energy usage from our after data to see how much energy was (or wasn't) saved.

If we calculated our baseline regression from our after data, we'd get the HDD and/or CDD over the period that our before data covers, and the length in days of that period, and we'd feed these figures into our baseline regression equation to get the baseline-predicted energy usage for that before period. We'd then compare that baseline-predicted energy usage with the actual total energy usage from our before data to see how much energy was (or wasn't) saved. In this case the baseline-predicted figure represents the improved building, so savings show up as the actual before usage being higher than the baseline-predicted usage.

Either way we will end up with two numbers: the baseline-predicted energy usage and the actual energy usage. With these figures it should be easy to calculate percentage savings and so on. Fingers crossed they come out as good as we are hoping!
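As a rough sketch of that final calculation in Python: the function below returns the energy saved and the percentage saving. The function name, the baseline_from parameter, and the figures are all hypothetical. We have also assumed that the percentage should be expressed relative to the usage of the unimproved building, which is the baseline-predicted figure if the baseline came from the before data, and the actual figure if the baseline came from the after data.

```python
def energy_savings(predicted, actual, baseline_from="before"):
    """Return (energy saved, percentage saved).

    baseline_from="before": predicted is what the unimproved building would
    have used over the after period, so saved = predicted - actual.
    baseline_from="after": predicted is what the improved building would
    have used over the before period, so saved = actual - predicted.
    """
    if baseline_from == "before":
        saved, reference = predicted - actual, predicted
    elif baseline_from == "after":
        saved, reference = actual - predicted, actual
    else:
        raise ValueError("baseline_from must be 'before' or 'after'")
    return saved, 100.0 * saved / reference


# Made-up figures, in whatever units the energy data uses
saved, percent = energy_savings(predicted=1_250_000, actual=1_100_000)
print(saved, percent)
```

A negative result would mean that consumption went up rather than down after correcting for the weather.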

What next?

We have several other articles on degree days and how to use them effectively, and answers to frequently asked questions.

You can download degree days from our free website and use our regression tool from there too, by choosing "Regression" as the "Data type". It is very useful for getting a good baseline regression, as required for step 3 of the process described in this article, and for most degree-day analysis in fact. It's one of many reasons to choose Degree Days.net over alternative data sources.

You might also like to read an overview of the Degree Days.net products that cater for the more sophisticated needs of many energy professionals, multi-site organizations, academic/government researchers, and energy-software developers who use our system. If you're looking for additional data, data for lots of locations, or automated access to data (in large or small quantities), our products can help!
