Degree Days

Weather Data for Energy Professionals

So you've been working to reduce your energy consumption... Maybe you've installed new insulation, or upgraded your HVAC system, or campaigned to encourage your coworkers to switch things off, or made some other improvements to increase efficiency... At some point you'll probably want to **work out how much energy you've saved**...

But heating and cooling energy consumption varies with the weather, and this complicates things. Colder weather means more energy for heating, and warmer weather means more energy for cooling. You might have energy-usage data from before and after your improvements, but you can't just compare the before-and-after figures like-for-like if the period before your improvements was hotter or colder than the period after your improvements. To calculate or prove the energy savings you've made, you'll need to correct for the weather variations somehow.

Fortunately there is a well-established process for calculating energy savings in situations like these. It involves degree days and regression analysis. You might hear it loosely called "**weather correction**", or "**weather normalization**", but these terms can imply a variety of calculations, so we need to be more precise.

We'll explain the process step by step:

You'll need energy-usage data from before the changes/improvements you've made, and from after too. We'll call this the "**before data**" and the "**after data**".

Either the *before data*, or the *after data*, or both, will need to be broken down into dated periods (e.g. weeks or months), with an energy-consumption figure for each period. You'll need this for the *baseline regression*, which we will explain shortly. Weekly data is typically best if you can get it.

If you have daily data, you can use it directly if your building has similar usage patterns on all 7 days of the week. But if it doesn't (e.g. because it is unoccupied on weekends) you should sum it into weekly totals. Or, if you are particularly keen, you could split it into occupied and unoccupied days and analyze both sets of data separately for the rest of the process described in this article. (This would involve running a baseline regression for both sets of data separately, then making comparisons of baseline-predicted consumption with actual consumption for both sets of data separately, then combining the results together at the end. This article doesn't explain this more-complicated approach in any more detail than this, but hopefully you can figure out the details if you want to give it a go.)
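As a rough illustration of the split described above, the following sketch separates daily readings into occupied and unoccupied days. It assumes the building is unoccupied on Saturdays and Sundays; the function name `split_by_occupancy` and the sample figures are made up for this example.

```python
from datetime import date

def split_by_occupancy(daily_readings, unoccupied_weekdays=(5, 6)):
    """Split (day, energy) pairs into occupied and unoccupied lists.

    unoccupied_weekdays uses Python's weekday() numbering (Monday=0, Sunday=6),
    so the default treats Saturday and Sunday as unoccupied (an assumption).
    """
    occupied, unoccupied = [], []
    for day, energy in daily_readings:
        if day.weekday() in unoccupied_weekdays:
            unoccupied.append((day, energy))
        else:
            occupied.append((day, energy))
    return occupied, unoccupied

# Hypothetical daily energy figures (e.g. kWh)
daily_readings = [
    (date(2018, 1, 5), 410.0),  # Friday
    (date(2018, 1, 6), 150.0),  # Saturday
    (date(2018, 1, 7), 140.0),  # Sunday
    (date(2018, 1, 8), 420.0),  # Monday
]
occupied, unoccupied = split_by_occupancy(daily_readings)
```

Each of the two lists would then go through the rest of the process separately, with its own degree days and its own baseline regression.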

If you have interval data from an automated smart meter (e.g. 5-, 10-, 15-, 20-, 30-, or 60-minute data), you should also sum it into weekly totals, or daily totals if you are comfortable with the complexities described in the paragraph above. Our energy management software can help with this. "Isn't it bad to lose the detail?" you might wonder... For many sorts of analysis, yes, but for analysis of heating/cooling energy consumption, quite the opposite, thanks to the complicated time lags between temperature changes outside and heating/cooling energy consumption inside. This is explained in more detail in our frequently asked questions.
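If you would rather do the summing yourself, here is a minimal sketch of turning interval readings into weekly totals. It assumes each reading is a (timestamp, energy) pair with the timestamp marking the start of the interval, and that weeks run Monday to Sunday; the function name `weekly_totals` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

def weekly_totals(readings):
    """Sum (timestamp, energy) readings into totals keyed by the Monday starting each week."""
    totals = defaultdict(float)
    for timestamp, energy in readings:
        day = timestamp.date()
        week_start = day - timedelta(days=day.weekday())  # step back to Monday
        totals[week_start] += energy
    return dict(sorted(totals.items()))

# Hypothetical half-hourly readings (e.g. kWh)
readings = [
    (datetime(2018, 1, 1, 0, 0), 1.5),    # Monday
    (datetime(2018, 1, 1, 0, 30), 1.0),   # Monday
    (datetime(2018, 1, 7, 23, 30), 2.0),  # Sunday of the same week
    (datetime(2018, 1, 8, 0, 0), 3.0),    # Monday of the next week
]
totals = weekly_totals(readings)
```

Any week that is only partly covered by your readings (typically the first or last) should be dropped before the regression, as its total would be misleadingly low.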

Monthly data is less ideal, but it's common, and you can use it, though you should really have at least a year's worth (12 months), and ideally more.

In fact, however your data is broken down, you should really have at least a year's worth of *before data* and a year's worth of *after data*. You *can* do the calculations with less, but they won't be so reliable.

You might have different meters for different fuels (like electricity or gas), or you might have different meters for different buildings or different submeters within individual buildings. Either way it's almost always best to do this analysis for each meter individually. When you've completed this process for all your meters, you can combine the results to calculate the total overall savings. This will almost certainly work much better than combining the energy data first.

Your "**baseline period**" is the period over which you'll do your "**baseline regression**".

Your *baseline period* should:

- Cover a period over which the building and its usage patterns saw no big changes.
- Have energy data broken down into multiple periods (like weekly or monthly energy data, as discussed above).
- Cover at least a year, ideally longer. Generally it's best to cover a whole number of years, like 2 years or 3 years rather than 2.5 years, as this way the seasons are all represented equally.
- Come exclusively from the *before period* or the *after period*. (This is probably obvious, but we mention it just in case.)

Most people would typically take the *baseline period* from the *before data*, but actually it's fine to take it from the *after data* instead if the *after data* better fits the criteria above. If both the *before data* and the *after data* fit the above criteria similarly, it generally makes sense to choose the one with more dated periods (of measured energy usage) within it. And if they both have the same number of dated periods, you can calculate a *baseline regression* for both (see below for instructions), and choose whichever regression has the better statistics (e.g. the higher R² value).
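For reference, R² can be computed from the actual energy figures and the figures the regression formula gives for the same periods. The sketch below uses one common definition (1 minus the residual sum of squares over the total sum of squares); regression software may define or adjust it slightly differently, so treat this as an illustration rather than a reproduction of any particular tool. The function name and numbers are made up.

```python
def r_squared(actual, predicted):
    """R-squared as 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical energy figures for three periods, with the regression's values for each
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
r2 = r_squared(actual, predicted)
print(round(r2, 2))  # 0.96
```

A value closer to 1 means the regression formula tracks the measured energy usage more closely.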

The "**baseline regression**" will give you a formula that describes the energy consumption over the *baseline period* in terms of heating degree days (HDD) and/or cooling degree days (CDD). Typically this "**baseline regression formula**" will be one of the following:

If your meter supplies heating (but not cooling):

y = a*HDD + c*days

If your meter supplies cooling (but not heating):

y = b*CDD + c*days

If your meter supplies both heating and cooling:

y = a*HDD + b*CDD + c*days

Where:

- y is the energy usage over the period in question;
- HDD is the heating degree days over the period in question;
- CDD is the cooling degree days over the period in question;
- days is the length (in days) of the period in question;
- a, b, and c are regression coefficients (see below).

The *a*, *b*, and *c* in the formulas above are called "regression coefficients". In a real regression formula (calculated using real energy data and real degree days) these regression coefficients would be real numbers (like 12.563, 539.1, or 4,092.271).

Regression coefficients are calculated as part of the regression process, but you don't need to understand how, as it will be done automatically by whatever software you use to calculate your regression formula (most likely our online regression tool, or Excel).
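For the curious, here is a minimal sketch of what such a calculation can look like for the heating-only formula y = a*HDD + c*days. It is an ordinary least-squares fit written out by hand, and it is only an illustration: it is not a description of how our regression tool or Excel works internally, and the function name `fit_heating_only` and the sample data are invented for the example.

```python
def fit_heating_only(periods):
    """Least-squares fit of y = a*HDD + c*days.

    periods is a list of (hdd, days, energy) tuples, one per dated period.
    Returns the coefficients (a, c).
    """
    s_hh = sum(h * h for h, d, y in periods)
    s_hd = sum(h * d for h, d, y in periods)
    s_dd = sum(d * d for h, d, y in periods)
    s_hy = sum(h * y for h, d, y in periods)
    s_dy = sum(d * y for h, d, y in periods)

    # Solve the two normal equations for a and c
    det = s_hh * s_dd - s_hd ** 2
    if det == 0:
        raise ValueError("HDD per day must differ between periods to fit both coefficients")
    a = (s_hy * s_dd - s_hd * s_dy) / det
    c = (s_hh * s_dy - s_hd * s_hy) / det
    return a, c

# Made-up weekly data that follows y = 2*HDD + 10*days exactly
periods = [(50, 7, 170), (30, 7, 130), (10, 7, 90)]
a, c = fit_heating_only(periods)
print(a, c)  # 2.0 10.0
```

Real data never fits this neatly, and this sketch does nothing to help choose a base temperature, which is a large part of getting a good regression.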

Note that all 3 formulas above allow for non-weather-dependent energy usage on the meter (from things with energy consumption that doesn't vary with the weather). So don't worry if your metered energy consumption covers heating and/or cooling along with non-weather-dependent things like hot water or office equipment.

Note also that the choice of base temperature is very important for the HDD and/or the CDD. This is a big part of getting a good *baseline regression*.

This may all be sounding rather complicated... Getting a good *baseline regression* is undoubtedly the hardest part of calculating energy savings... But fortunately the regression tool on our website can do most of the hard work for you.

**To use the regression tool to generate your baseline regression**: go to the Degree Days.net web tool, select "Regression" as the "Data type", and follow the instructions from there.

To use the regression tool effectively you should really read through the instructions... Yes, we know, instructions are boring and nobody likes reading them, but a good understanding of the regression tool will really help you get good results from it.

Also, although the regression tool will automatically help you figure out appropriate base temperature(s) to use for your HDD and/or CDD, you should make sure that the base temperature(s) you decide upon make sense for your building. Our article on choosing base temperatures should help with this.

You can also calculate your *baseline regression* in Excel, and our article on regression analysis has more on how to do this. But our regression tool has some useful features that are impractical to reproduce in Excel, so we'd generally recommend you stick with the regression tool to get the best possible *baseline regression*.

The *baseline regression formula* describes energy usage as it was over the *baseline period*. But we can also use it to "predict" the energy that would have been used over any other period **if the building had been operating as it did over the baseline period**.

This is a simple concept, but difficult to explain, so don't worry if it doesn't immediately make sense. We'll explain in more detail:

Over the *baseline period* the building had a certain construction, a certain set of equipment, and a certain pattern of usage. These factors all combined to determine its energy usage over that *baseline period*.

The weather varied over the *baseline period*, so the heating/cooling energy usage would have varied over the *baseline period* too. This is what the *baseline regression formula* accounts for with HDD and/or CDD.

Let's consider a heating-only example:

y = 220.624*HDD + 1954.298*days

You'll notice that this formula has values for the *a* and *c* coefficients that were introduced further above. They're just example values; your real *baseline regression formula* will have different values (likely very different). The regression tool will calculate them automatically based on the energy-usage data you give it and the degree days it generates automatically for your chosen location.

Let's now consider the whole of the *baseline period*. Using Degree Days.net we can get the HDD for the whole of the *baseline period* (making sure to get them in the HDD base temperature used by our *baseline regression formula*). Let's say the *baseline period* covered the whole of 2018 (i.e. from the start of January 1st, 2018 to the end of December 31st, 2018). We could get the HDD for that exact same period, either by selecting "Custom" as the "Breakdown" option and copy/pasting in our dates, or by downloading daily data and summing it in Excel. Let's say there were 2,678 HDD over that *baseline period* of 365 days. We can feed these figures into the formula:

y = 220.624*HDD + 1954.298*days

y = 220.624*2678 + 1954.298*365

y = 1,304,150
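The same arithmetic can be written as a small function, which is handy once you start applying the formula to other periods. The coefficients are the example values from this article, and the function name `baseline_predicted` is just an illustrative choice.

```python
def baseline_predicted(hdd, days, a=220.624, c=1954.298):
    """Apply the example heating-only formula y = a*HDD + c*days."""
    return a * hdd + c * days

# The example baseline period: 2,678 HDD over 365 days
y = baseline_predicted(hdd=2678, days=365)
print(round(y))  # 1304150
```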

Note that the units of the calculated energy usage *y* will be the same as the units of the energy-usage data that was used to make the *baseline regression* - it could be kWh, or therms, or whatever. Either way, we have just used the *baseline regression formula* to calculate the total energy usage over the *baseline period*. And this calculated energy usage should be the same as the actual energy usage over the *baseline period*, assuming we calculated the *baseline regression formula* properly (which our regression tool should do automatically).

We can also consider a sub-period (like a week) within the *baseline period*, and we can get the HDD for that sub-period, and we can work out its length in days (7 for a week). Feeding these numbers into the formula will give us an approximate value for the energy usage over that sub-period. It won't be exactly the same as the energy that was actually used over that sub-period, because the *baseline regression formula* is a simplification of reality (we essentially got the regression formula by fitting a straight line to a scatter plot), but it should be fairly close.
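To make the sub-period calculation concrete, here is a sketch that sums daily HDD over one week, counts the days, and applies the example formula. The daily HDD values are invented for illustration (real ones would come from a degree-day download in the right base temperature), and the function name `period_prediction` is hypothetical.

```python
from datetime import date, timedelta

A, C = 220.624, 1954.298  # example coefficients from this article

def period_prediction(daily_hdd, first_day, last_day):
    """Return (HDD, days, predicted energy) for first_day to last_day inclusive."""
    days = (last_day - first_day).days + 1
    hdd = sum(daily_hdd[first_day + timedelta(days=i)] for i in range(days))
    return hdd, days, A * hdd + C * days

# Made-up daily HDD for the week starting January 1st, 2018
values = [12.0, 11.5, 13.0, 10.5, 9.0, 12.5, 11.5]
daily_hdd = {date(2018, 1, 1) + timedelta(days=i): v for i, v in enumerate(values)}

hdd, days, predicted = period_prediction(daily_hdd, date(2018, 1, 1), date(2018, 1, 7))
print(hdd, days)  # 80.0 7
```

With these figures the formula gives roughly 31,330 in whatever units the energy data used.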

**Now the clever part**: we can take any period for which we can get HDD (in the appropriate base temperature of course), and we can feed the HDD and the length of the period (whether it's a day, a week, a year, or whatever) into our *baseline regression formula*, and we'll get the baseline-predicted energy usage - the energy usage that we would expect the building to have used over that period if everything other than the weather had been exactly the same as it was over the *baseline period*.

For a period outside the *baseline period*, whether before it or after it, things could be different in the building: the construction, the equipment, or the pattern of usage could be different. But it doesn't matter to our calculation of the baseline-predicted energy usage: all we need is the HDD over our period and we can calculate what the energy usage would have been if everything other than the weather had been the same as it was over the *baseline period*.
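One useful consequence of the formula's form is that it doesn't matter whether you apply it week by week and add up the results, or apply it once to the total HDD and total days: the answer is the same, because the formula is a straight sum of multiples of HDD and days. The sketch below checks this with the example coefficients from this article and some made-up weekly HDD figures.

```python
import math

A, C = 220.624, 1954.298  # example coefficients from this article

def baseline_predicted(hdd, days):
    """Apply the example heating-only formula y = a*HDD + c*days."""
    return A * hdd + C * days

# Made-up (HDD, days) figures for three consecutive weeks
weeks = [(62.5, 7), (48.0, 7), (55.5, 7)]

sum_of_weekly = sum(baseline_predicted(hdd, days) for hdd, days in weeks)

total_hdd = sum(hdd for hdd, _ in weeks)
total_days = sum(days for _, days in weeks)
whole_period = baseline_predicted(total_hdd, total_days)

print(math.isclose(sum_of_weekly, whole_period))  # True
```

So for a savings calculation over a long period, the total HDD and the total number of days are all you need.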

If we calculated our *baseline regression* from our *before data*, we'd take the HDD and/or CDD over the period that our *after data* covers, and the length in days of that period, and we'd feed these figures into our *baseline regression formula* to get the baseline-predicted energy usage for that *after*-period. We'd then compare that baseline-predicted energy usage with the actual total energy usage from our *after data* to see how much energy was (or wasn't) saved.

If we calculated our *baseline regression* from our *after data*, we'd take the HDD and/or CDD over the period that our *before data* covers, and the length in days of that period, and we'd feed these figures into our *baseline regression formula* to get the baseline-predicted energy usage for that *before*-period. We'd then compare that baseline-predicted energy usage with the actual total energy usage from our *before data* to see how much energy was (or wasn't) saved.

Either way we will end up with two numbers: the baseline-predicted energy usage and the actual energy usage. With these figures it should be easy to calculate percentage savings and so on. Fingers crossed they come out as good as we are hoping!
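Here is a minimal sketch of that final step. The one thing to watch is direction: with a baseline from the *before data*, the baseline-predicted figure represents the building without the improvements, whereas with a baseline from the *after data* it represents the building with them. Expressing the percentage relative to the without-improvements figure is an assumption made for this example, and the function name and numbers are made up.

```python
def energy_savings(baseline_predicted, actual, baseline_from="before"):
    """Return (energy saved, percentage saved) for the comparison period.

    baseline_from says which data set the baseline regression came from.
    The percentage is relative to the without-improvements figure (an assumption).
    """
    if baseline_from == "before":
        without_improvements, with_improvements = baseline_predicted, actual
    elif baseline_from == "after":
        without_improvements, with_improvements = actual, baseline_predicted
    else:
        raise ValueError("baseline_from must be 'before' or 'after'")
    saved = without_improvements - with_improvements
    return saved, 100 * saved / without_improvements

# Baseline from the before data: predicted 1000 for the after-period, actually used 850
print(energy_savings(1000.0, 850.0))  # (150.0, 15.0)

# Baseline from the after data: predicted 850 for the before-period, actually used 1000
print(energy_savings(850.0, 1000.0, baseline_from="after"))  # (150.0, 15.0)
```

A negative result would mean that consumption went up rather than down once the weather is accounted for.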