We get a lot of questions along the lines of "how do I do this using degree days?" It's very common for the answers to involve linear regression analysis.
There are many textbooks and online resources that explain what linear regression analysis is... But the theory can get a little heavy going... So we wrote this short article to explain just the basics of regression analysis using energy consumption figures and degree days.
Before diving into regression analysis, it rather helps if you understand what a degree day actually is... This article provides a good introduction.
To do linear regression analysis, you need to correlate energy-consumption data with degree-day data:
You might have detailed interval data from a smart meter, but more likely you'll have weekly or monthly records of energy consumption that you've collected yourself, or energy bills from a utility or energy supplier.
Most buildings follow a weekly routine, which means that weekly energy-consumption data is typically a good option for regression analysis. Although the occupancy of the building and the heating patterns might vary throughout the week, the patterns are usually fairly consistent from one week to the next.
Monthly data is usually OK too, but it's rarely as good as weekly data, because the days of the week don't line up with calendar months (e.g. one month might have 5 weekends, the next might have 4). Unless your building is heated and used in the same way on weekends as it is on weekdays, these calendar mismatches will cause inaccuracies in your calculations.
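To see how much the calendar mismatch can matter, this short Python sketch counts the weekend days in each month of a given year using the standard-library `calendar` module. The year 2024 and the function name `weekend_days` are just examples chosen for illustration.

```python
import calendar

def weekend_days(year, month):
    """Count the Saturdays and Sundays in one calendar month."""
    days_in_month = calendar.monthrange(year, month)[1]
    # calendar.weekday returns 0 for Monday ... 5 for Saturday, 6 for Sunday.
    return sum(1 for day in range(1, days_in_month + 1)
               if calendar.weekday(year, month, day) >= 5)

year = 2024  # example year
for month in range(1, 13):
    print(calendar.month_abbr[month], weekend_days(year, month))
```

In 2024, for instance, January has 8 weekend days while March has 10, so a building that is mostly empty at weekends would use noticeably different amounts of energy in those two months even if the weather were identical.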
If you have detailed interval energy consumption data (typically readings taken automatically every 60 minutes or less), you can use our Energy Lens software to turn it into daily, weekly, or monthly kWh data.
You might not have much choice about what energy-consumption data you use for your correlation. But you should try to get data for at least a few periods of measured energy consumption. If you've got daily, weekly, or monthly data, try to cover at least one full heating or cooling season. If you've got annual data, try to cover at least a few years of consumption.
If you're using meter readings provided by your utility or energy supplier, make sure not to use any estimated meter readings. Estimated readings are no use at all for this analysis!
If you're investigating heating energy consumption you'll want heating degree days; if you're investigating cooling energy consumption you'll want cooling degree days.
You will need one degree-day figure for each period of measured energy consumption. If your periods of measured energy consumption are irregular, you'll need to get daily degree days and sum them together to make a total for each period.
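If your readings cover irregular periods, summing daily degree days is easy to script. This Python sketch assumes the daily figures are held in a dict keyed by date, and that each period runs from one reading date (inclusive) up to the next reading date (exclusive). That convention is an assumption, so check it against how your own readings were taken. The function name `sum_hdd_for_periods` and the sample values are hypothetical.

```python
from datetime import date, timedelta

def sum_hdd_for_periods(daily_hdd, periods):
    """Sum daily degree days into one total per period.

    daily_hdd: dict mapping datetime.date -> degree days for that day.
    periods: list of (start, end) date pairs; start is inclusive and end is
    exclusive (assumed convention, e.g. the dates of two meter readings).
    """
    totals = []
    for start, end in periods:
        total = 0.0
        day = start
        while day < end:
            # A KeyError here means a day is missing from the daily data.
            total += daily_hdd[day]
            day += timedelta(days=1)
        totals.append(total)
    return totals

# Hypothetical daily HDD values, for illustration only.
daily = {date(2024, 1, 1) + timedelta(days=i): v
         for i, v in enumerate([10.0, 12.5, 9.0, 11.0, 8.5, 7.0])}
periods = [(date(2024, 1, 1), date(2024, 1, 3)),   # a 2-day period
           (date(2024, 1, 3), date(2024, 1, 7))]   # a 4-day period
print(sum_hdd_for_periods(daily, periods))
```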
Our Degree Days.net tool enables you to access data in a variety of timescales, including daily data (which you can sum into figures matching any period spanning multiple days).
Above we explained how to get the two sets of data (energy consumption and degree days). Next you need to correlate these two sets of data...
The method that follows is ideal if your periods of measured energy consumption are all the same length (like weeks, which are all 7 days long)... But there's a slightly more complicated method, which we'll introduce shortly, that works better for monthly data and is usually necessary for regression analysis of irregular periods of consumption.
Our explanation of the improved method is based on this one, so please do go through this method first.
You might want to start by making 3 columns in Excel (or whatever spreadsheet software package you have):
For an example, see the screenshot to the right.
You can then use the second and third columns of data to plot an X-Y scatter chart of HDD (or CDD) against energy consumption.
Once you've made the basic scatter chart, there are a few important extras that you'll probably want: a trend line, an equation, and an R2 value. To get these using Excel:
You should end up with something like the chart below:
Most importantly the equation enables you to estimate kWh from degree days. By plugging a known HDD or CDD figure into the equation you can calculate the predicted energy consumption for the period that the HDD/CDD covered. You can then compare the predicted energy consumption with the actual energy consumption for that period. You would typically do this to see whether the energy efficiency has got better or worse than it was in the period that you did the original regression analysis for.
The R2 value is basically a measure of how good the correlation is. The closer the R2 value is to 1, the better the correlation. A good correlation between degree days and energy consumption indicates that the methodology is sound (the main pitfalls in degree-day analysis have been avoided or corrected for), and that the heating/cooling system is working well (the "control" of the system is good). In other words, the higher the R2, the better.
Generally speaking, an R2 of 0.75 indicates a reasonable correlation between energy consumption and degree days, and 0.9 or above is very good. An R2 much below 0.7 or so is likely an indication that either the heating control is very poor or the analysis methodology needs to be improved (e.g. wrong base temperature, irregular building occupancy that hasn't been corrected for, heating/cooling metered together with other energy consumption that varies considerably throughout the year).
The example chart above shows a pretty good correlation. In this instance you don't need the R2 value to see that - it's clear from looking at the chart... But R2 is useful for assessing the strength of a correlation objectively.
As explained in this article, the base temperature of the degree days makes a big difference to how well they correlate with the energy consumption of any particular building.
The optimal base temperature varies from building to building. It's difficult to estimate the correct base temperature accurately for any particular building using logic alone, so it can be helpful to make a rough estimate and then try correlating kWh with degree days calculated to various base temperatures around that point. R2 gives a way to compare the strength of the different correlations.
In theory, the base temperature that produces the highest R2 should be the optimal base temperature of the building. However, it doesn't always work out quite so perfectly, because of the other factors that make degree-day-based analysis less than perfect. Nonetheless, testing various base temperatures can give you a useful indication. It shouldn't replace your intuition of what the base temperature should be, approximately, but it can help you to decide on what exact number to use. Generally speaking, the better your correlations (the higher your R2 values), the more faith you can reasonably place in the numbers.
The 3 relevant Excel functions are:
What's great about these functions is that you can quickly apply them to multiple correlations, using degree days with a range of base temperatures. It's a question of copying functions across a spreadsheet rather than creating lots of individual charts.
The screenshot below shows a spreadsheet containing one set of energy-consumption data, multiple sets of degree days (all with different base temperatures), and gradient, intercept, and R2 values for each energy/degree-days correlation:
The above spreadsheet can be a little overwhelming on first glance, but it's clearer when broken down into steps:
If you're new to using $ symbols and functions/formulas in Excel, it's likely that you found some of the steps above a little confusing. It's well worth taking the time to learn how those features of Excel work - they make it possible to do all sorts of things in seconds that would take minutes or hours otherwise.
The intercept (baseload) should be roughly zero or positive. For base temperatures that are too high, you may see a negative intercept (so you can take that as an indication that you should be looking towards lower base temperatures).
Things get messier when the heating or cooling consumption that you are analyzing is metered together with other energy uses. If those other energy uses are significant, you should expect a significant baseload energy consumption (positive intercept). If you have a good idea of how much energy those other energy uses consume, you can compare your predicted figure with the intercept values to look for the base temperatures with intercepts that fit your expectations.
Do bear in mind that "baseload" is a fuzzy concept, and it often varies throughout the year (meaning it's not really a "baseload" at all). Analysis is less precise when heating and cooling aren't metered separately from each other and from everything else. Do what you can with the figures you have, but don't be surprised if the numbers don't line up as neatly as you hope.
Also take a look at the R2 values. In theory, the base temperature of the building should be the base temperature with the highest R2 value. That's in theory though... In reality the various inaccuracies in degree-day-based analysis tend to muddy things up, and can cause misleading figures. Use the R2 values as an indication rather than an absolute, especially if your correlations aren't strong (i.e. low R2 values across the board).
If the numbers indicate that the optimal base temperature might be higher or lower than the range that you have tested, you should probably download more degree days with higher/lower base temperatures so that you can include them in your analysis.
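The two warning signs described above (a negative intercept, and a best R2 sitting at the edge of the range you've tested) are easy to check automatically once you have a table of results. The Python sketch below assumes the results are (base temperature, gradient, intercept, R2) tuples sorted by base temperature; the function name `review_scan` and the sample results are hypothetical. In this made-up example the two signals pull in different directions, which is a reminder to treat R2 as an indication rather than an absolute.

```python
def review_scan(rows):
    """Flag two warning signs in base-temperature scan results.

    rows: list of (base temperature, gradient, intercept, R2) tuples, sorted
    by base temperature. Returns the base temperature with the highest R2
    and a list of notes. These are hints to guide judgement, not a verdict.
    """
    notes = []
    for base, _gradient, intercept, _r2 in rows:
        if intercept < 0:
            notes.append(f"base {base}: negative intercept ({intercept:.1f}), "
                         "so look towards lower base temperatures")
    best_index = max(range(len(rows)), key=lambda i: rows[i][3])
    if best_index in (0, len(rows) - 1):
        notes.append("highest R2 is at the edge of the tested range, so "
                     "consider getting degree days for more base temperatures")
    return rows[best_index][0], notes

# Hypothetical scan results for three base temperatures.
rows = [
    (14.5, 2.0, 120.0, 0.86),
    (15.5, 1.9, 60.0, 0.90),
    (16.5, 1.8, -5.0, 0.91),
]
best_base, notes = review_scan(rows)
print("highest R2 at base", best_base)
for note in notes:
    print("-", note)
```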
Correlating energy usage with degree days, as described above, works well when all the energy-consumption records cover identical periods of time. It's ideal for linear regression analysis of daily or weekly data.
However, the above method doesn't work properly for irregular periods of consumption, like those gathered from records of oil deliveries...
The problem is with the baseload energy consumption. The method above assumes that the baseload is a constant number, but this assumption only makes sense if the periods of consumption are all the same length.
When records of energy consumption cover periods of various lengths, the baseload energy consumption depends on the lengths in question. If the baseload is 20 kWh for a 1 week period, it will be 40 kWh for a 2 week period, and 60 kWh for a 3 week period.
Baseload energy consumption can't be expressed as a constant unless the length of the period is also a constant.
(If the above statement doesn't make sense, you might be confusing kWh with kW... Many people do! If you're in any doubt, take a look at our article on kW and kWh - it explains both units in detail.)
Because different months have different lengths, using a constant figure for baseload kWh causes slight inaccuracies in correlations of monthly energy consumption with monthly degree days. The more irregular your consumption records, the greater these inaccuracies become.
Fortunately there's another approach that works just as well for irregular data as it does for regular data:
Instead of correlating energy consumption with degree days, correlate energy consumption per day with degree days per day.
To explain this, let's consider the example data that we used previously... Previously we simply correlated the kWh with the HDD, but the improved method involves a correlation of the kWh per day with the HDD per day... Here's how we might arrange this data in a spreadsheet:
Let's explain each column in turn:
Once we have the figures we can create a scatter chart, just like before, except with HDD-per-day and kWh-per-day instead of HDD and kWh:
Like before, we can also add a trendline and the equation of that trendline (see the chart above).
This equation is very similar to the one described earlier, and you can apply it similarly. Just remember that x, y, and the constant, are per-day figures. Once you've calculated the energy consumption per day from an HDD-per-day figure, you can of course multiply it by the number of days in the period to work out the predicted kWh over the whole of the period.
We have already explained the process for investigating the effect of base temperature on the regression analysis (see here). We used kWh figures and HDD figures above, but it's easy to apply the same approach using kWh-per-day figures and HDD-per-day figures. We need to insert a few additional steps between steps 3 and 4 above:
The screenshot below shows one way in which you could organize the data. It might be a little neater to put the HDD/day and kWh/day figures to the right of the original figures, but putting them below makes it easier to fit them all in a screenshot:
In Excel, the trick to calculating HDD/day figures across all periods and base temperatures is to:
In the example above, cell B22 contained the formula "= B8 / $P8". This meant that the "Days" column P was fixed in the formula, but everything else in the formula was relative. So copying the formula across the base temperatures and down the periods worked as desired.
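Outside a spreadsheet, the same division can be sketched in Python: every HDD figure in a row is divided by that row's day count, just as "= B8 / $P8" keeps column P fixed while the row follows along. The function name `per_day` and the table of figures are hypothetical.

```python
def per_day(table, days):
    """Divide every figure in each row by that row's number of days."""
    if len(table) != len(days):
        raise ValueError("need one day count per row")
    return [[value / d for value in row] for row, d in zip(table, days)]

# Hypothetical HDD totals: one row per period, one column per base temperature.
base_temps = [14.5, 15.5, 16.5]
hdd_table = [
    [42, 56, 70],     # a 7-day period
    [98, 126, 154],   # a 14-day period
]
days = [7, 14]        # plays the role of the "Days" column P

for row in per_day(hdd_table, days):
    print(dict(zip(base_temps, row)))
```

The same function works for kWh/day: pass a single-column table of kWh totals with the same list of day counts.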
Provided you understood the previous examples, you should hopefully find this slightly modified method pretty straightforward. But please let us know if anything is unclear. We appreciate that these instructions might be a little intimidating to anyone unfamiliar with Excel formulas and so on, but we're trying to make them as accessible as possible!