Linear Regression Analysis of Energy Consumption Data
We get a lot of questions along the lines of "how do I do this using degree days?" It's very common for the answer to involve linear regression analysis.
There are many text books and online resources that explain what linear regression analysis is... But the theory can get a little heavy going... So we wrote this short article to explain just the basics of regression analysis using energy consumption figures and degree days.
First things first, do you know what a degree day is?
Before diving into regression analysis, it rather helps if you understand what a degree day actually is... This article provides a good introduction.
Getting the raw data
To do linear regression analysis, you need to correlate energy-consumption data with degree-day data:
Getting the energy consumption data
You might have detailed interval data from a smart meter but more likely you'll have weekly or monthly records of energy consumption that you've made yourself, or energy bills from a utility or energy supplier.
For most buildings, which aren't heated/cooled every day of the week, weekly energy-consumption data is ideal, as most weeks are similar in terms of their heating-timer patterns. In contrast, because the days of the week don't line up with calendar months (e.g. one month may have 5 weekends, the next may have 4), monthly data isn't usually as good. (Unless your building is heated every day of the year, in which case daily, weekly, and monthly data should all work well.)
If you have detailed interval energy consumption data (typically readings taken automatically every 60 minutes or less), you can use our Energy Lens software to turn it into daily, weekly, or monthly kWh data.
You might not have much choice about what energy-consumption data you use for your correlation. But you should try to get data for at least a few periods of measured energy consumption. If you've got daily, weekly, or monthly data, try to cover at least one full heating or cooling season. If you've got annual data, try to cover at least a few years of consumption.
If you're using meter readings provided by your utility or energy supplier, make sure not to use any estimated meter readings. Estimated readings are no use at all for this analysis!
Getting the degree-day data
If you're investigating heating energy consumption you'll want heating degree days; if you're investigating cooling energy consumption you'll want cooling degree days.
You will need one degree-day figure for each period of measured energy consumption. If your periods of measured energy consumption are irregular, you'll need to get daily degree days and sum them together to make a total for each period.
Our Degree Days.net tool enables you to access data in a variety of timescales, including daily data (which you can sum into figures matching any period spanning multiple days).
Correlating energy usage with degree days
Above we explained how to get the two sets of data (energy consumption and degree days). Next you need to correlate these two sets of data...
You might want to start by making 3 columns in Excel (or whatever spreadsheet software package you have):
- Start of period (you don't need this for the correlation, but it'll probably be useful for keeping track of the data).
- Degree days (either HDD or CDD, depending on whether you're investigating heating or cooling energy consumption).
- kWh (or BTU, or litres of oil, or whatever units your records of energy consumption have).
For an example, see the screenshot to the right.
You can then use the second and third columns of data to plot an X-Y scatter chart of HDD (or CDD) against energy consumption.
Enhancing the scatter chart
Once you've made the basic scatter chart, there are a few important extras that you'll probably want: a trend line, an equation, and an R2 value. To get these using Excel 2003:
- Right-click one of the data points and select "Add Trendline...".
- For the "Type" select "Linear" (we're doing linear regression analysis).
- Click the "Options" tab and check the boxes to "Display equation on chart" and "Display R-squared value on chart".
The exact steps might be a little different on Excel 2007, but hopefully you'll figure it out if you're using that version of Excel...
Either way, you should end up with something like the chart below:
What does the equation mean?
- The "y" corresponds to kWh (or BTU, or whatever your units of energy consumption are).
- The "x" corresponds to your degree days (HDD or CDD, whichever you used).
- The figure that multiplies the x (3.3215 in the example chart above) represents the gradient of the trend line.
- The constant at the end (48.186 in the example chart above) represents the point at which the trend line crosses the y axis. In theory this should represent the "baseload energy consumption" (see here for more on this).
Most importantly the equation enables you to estimate kWh from degree days. By plugging a known HDD or CDD figure into the equation you can calculate the predicted energy consumption for the period that the HDD/CDD covered. You can then compare the predicted energy consumption with the actual energy consumption for that period. You would typically do this to see whether the energy efficiency has got better or worse than it was in the period that you did the original regression analysis for.
What does the R2 show?
The R2 value is basically a measure of how good the correlation is. The closer the R2 value is to 1, the better the correlation.
The example chart above shows a very good correlation. In this instance you don't need the R2 value to see that - it's clear from looking at the chart... But correlations between degree days and energy consumption are rarely that good, so the R2 value can be useful for making objective comparisons.
As explained in this article, the base temperature of the degree days makes a big difference to how well they correlate with the energy consumption of any particular building.
It's very difficult to estimate the correct base temperature accurately for any particular building using logic alone, so it's often best to make a rough estimate (e.g. 15.5C) and then try correlating kWh with degree days calculated to various base temperatures around that point.
The R2 values make it easier to see which base temperature gives the best correlation - this helps you to decide which base temperature you should use for your degree-day-based analysis of the building's energy consumption. (The optimal base temperature varies from building to building.)
Using a formula to calculate R2 and determine the optimal base temperature
The RSQ formula in Excel enables you to calculate R2 without making a chart. What's great about this is that you can quickly calculate R2 for multiple correlations, using degree days with a range of base temperatures. And this makes it easy (or at least fairly easy) to determine the optimal base temperature of the building.
The screenshot below shows a spreadsheet containing one set of energy-consumption data, multiple sets of degree days (all with different base temperatures), and R2 values for each energy/degree-days correlation:
Confused? Yes, it is a little confusing! Here are some step-by-step instructions explaining how to use this method to estimate a building's base temperature:
- Make a rough estimate of the building's base temperature (you can find some basic information and guidance here).
- Select your estimated base temperature on Degree Days.net, check the "Include base temperatures nearby" box, and generate and download the degree days. You'll notice that the data has been calculated to a range of base temperatures either side of your estimate. It'll look a lot like the spreadsheet above, but without the blue cells, which are the ones we added in ourselves (you don't need to make them blue!).
- Add your energy-consumption figures to the right of the last column of degree days. (The energy-consumption periods need to match the periods that you generated the degree days for.)
- Under the last column of degree days, use the RSQ formula, selecting your energy-consumption values as the known_y's (these are the values in column "O" in the screenshot above).
- After selecting the known_y's, hit F4 to make Excel insert $ symbols in front of the row and column references. The $ symbols "fix" the referenced cells so that you can copy the formula without the referenced cells changing. (Strictly speaking you only need the $ symbols in front of the column references, to fix the column on the energy data, but in this instance it doesn't hurt to have them in front of the row references as well.)
- Next, for the known_x's part of the formula, select the degree days from the last column of degree-day data (column "N" in the screenshot above). You don't want any $ symbols here.
- You should have a formula something like the one you can see at the top of the image above.
- Now, copy that formula into the cells under each of the columns of degree-day data.
- You should now be able to see the R2 value for each base temperature. For our screenshot above, we reduced the column widths to make the screenshot smaller, but you should be able to see more accurate R2 values on your spreadsheet. Your R2 values will probably be lower too - energy/degree-days correlations are rarely as good as those shown above.
- The column with the highest R2 value should give you a pretty good approximation of the building's base temperature. If it's the lowest or the highest base temperature you should probably generate more degree days with lower/higher base temperatures and test them out to see if you can improve your R2 further.
Still confused? Completely understandable if you're new to $ symbols and formulas in Excel...
But this method is worth figuring out - it's the best method we know of for determining a building's base temperature. And determining the base temperature properly should significantly improve the accuracy of all further degree-day-based calculations for that building, so it's worth doing it well if you can.