Degree Days

Weather Data for Energy Professionals

Weather Underground

Linear Regression Analysis of Energy Consumption Data

We get a lot of questions along the lines of "how do I do this using degree days?" It's very common for the answers to involve linear regression analysis.

There are many textbooks and online resources that explain what linear regression analysis is, but the theory can get a little heavy going. So we wrote this short article to explain just the basics of regression analysis using energy-consumption figures and degree days.

First things first, do you know what a degree day is?

Before diving into regression analysis, it rather helps if you understand what a degree day actually is... This article provides a good introduction.

Getting the raw data

To do linear regression analysis, you need to correlate energy-consumption data with degree-day data, so first you need to collect both sets of data.

Getting the energy consumption data

You might have detailed interval data from a smart meter, but more likely you'll have weekly or monthly records of energy consumption that you've collected yourself, or energy bills from a utility or energy supplier.

Most buildings follow a weekly routine, which means that weekly energy-consumption data is typically a good option for regression analysis. Although the occupancy of the building and the heating patterns might vary throughout the week, the patterns are usually fairly consistent from one week to the next.

Monthly data is usually OK too, but it's rarely as good as weekly data, because the days of the week don't line up with calendar months (e.g. one month might have 5 weekends, the next might have 4). Unless your building is heated and used in the same way on weekends as it is on weekdays, these calendar mismatches will cause inaccuracies in your calculations.

If you have detailed interval energy consumption data (typically readings taken automatically every 60 minutes or less), you can use our Energy Lens software to turn it into daily, weekly, or monthly kWh data.

You might not have much choice about what energy-consumption data you use for your correlation. But you should try to get data for at least a few periods of measured energy consumption. If you've got daily, weekly, or monthly data, try to cover at least one full heating or cooling season. If you've got annual data, try to cover at least a few years of consumption.

If you're using meter readings provided by your utility or energy supplier, make sure not to use any estimated meter readings. Estimated readings are no use at all for this analysis!

Getting the degree-day data

If you're investigating heating energy consumption you'll want heating degree days; if you're investigating cooling energy consumption you'll want cooling degree days.

You will need one degree-day figure for each period of measured energy consumption. If your periods of measured energy consumption are irregular, you'll need to get daily degree days and sum them together to make a total for each period.

Our Degree Days.net tool enables you to access data in a variety of timescales, including daily data (which you can sum into figures matching any period spanning multiple days).
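If you prefer to do the summing in code rather than in a spreadsheet, here is a minimal Python sketch. It assumes you have already loaded the daily degree days into a dictionary keyed by date, and it treats each period as running from its start date up to, but not including, its end date (a hypothetical convention for meter readings taken on the morning of the end date; adjust it to match how your readings were actually taken). The function name sum_degree_days and the figures are illustrative only.

```python
from datetime import date, timedelta


def sum_degree_days(daily_dd, periods):
    """Sum daily degree days into one total per consumption period.

    daily_dd: dict mapping datetime.date -> degree days for that day.
    periods:  list of (start, end) date pairs; end is exclusive (an assumed
              convention, e.g. a meter read on the morning of `end`).
    """
    totals = []
    for start, end in periods:
        total = 0.0
        day = start
        while day < end:
            total += daily_dd[day]  # raises KeyError if a day is missing
            day += timedelta(days=1)
        totals.append(total)
    return totals


# Hypothetical daily HDD for 1-5 January 2010
daily = {
    date(2010, 1, 1) + timedelta(days=i): hdd
    for i, hdd in enumerate([10.0, 12.5, 9.0, 11.5, 8.0])
}
# Two irregular periods: 2 days, then 3 days
periods = [(date(2010, 1, 1), date(2010, 1, 3)), (date(2010, 1, 3), date(2010, 1, 6))]
print(sum_degree_days(daily, periods))  # [22.5, 28.5]
```

Raising an error on a missing day is deliberate: silently skipping it would under-count the degree days for that period.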

Correlating energy usage with degree days

Above we explained how to get the two sets of data (energy consumption and degree days). Next you need to correlate these two sets of data...

The method that follows is ideal if your periods of measured energy consumption are all the same length (like weeks, which are all 7 days long). Shortly we'll introduce a slightly more complicated method that works better for monthly data and is usually necessary for regression analysis of irregular periods of consumption.

Our explanation of the improved method is based on this one, so please do go through this method first.

Energy Consumption and HDD in Excel

You might want to start by making 3 columns in Excel (or whatever spreadsheet software package you have):

  1. Start of period (you don't need this for the correlation, but it'll probably be useful for keeping track of the data).
  2. Degree days (either HDD or CDD, depending on whether you're investigating heating or cooling energy consumption).
  3. kWh (or BTU, or litres of oil, or whatever units your records of energy consumption have).

For an example, see the screenshot to the right.

You can then use the second and third columns of data to plot an X-Y scatter chart of HDD (or CDD) against energy consumption.

Enhancing the scatter chart

Once you've made the basic scatter chart, there are a few important extras that you'll probably want: a trendline, an equation, and an R² value. To get these using Excel 2003:

  1. Right-click one of the data points and select "Add Trendline...".
  2. For the "Type" select "Linear" (we're doing linear regression analysis).
  3. Click the "Options" tab and check the boxes to "Display equation on chart" and "Display R-squared value on chart".

The exact steps might be a little different in Excel 2007, but the same trendline options (linear type, equation, and R-squared value) should be available.

Either way, you should end up with something like the chart below:

Regression analysis chart of kWh against HDD
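Excel's linear trendline is an ordinary least-squares fit, so you can reproduce the slope and constant of the equation outside Excel if you want to check them. The sketch below is a minimal Python version; the function name fit_line is illustrative, and the data is hypothetical, chosen to lie exactly on a straight line so the result is easy to verify.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hdd = [10.0, 20.0, 30.0, 40.0]
kwh = [250.0, 450.0, 650.0, 850.0]  # hypothetical data on a perfect line
slope, intercept = fit_line(hdd, kwh)
print(slope, intercept)  # 20.0 50.0
```

With real data the points won't lie on a perfect line, but the calculation is the same: the slope is the kWh per degree day and the intercept is the constant (baseload) term.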

What does the equation mean?

Most importantly the equation enables you to estimate kWh from degree days. By plugging a known HDD or CDD figure into the equation you can calculate the predicted energy consumption for the period that the HDD/CDD covered. You can then compare the predicted energy consumption with the actual energy consumption for that period. You would typically do this to see whether the energy efficiency has got better or worse than it was in the period that you did the original regression analysis for.
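As a sketch of that comparison, the Python below plugs a degree-day figure into the equation and expresses the actual consumption as a percentage above or below the prediction. The function names, slope, constant, and readings are all hypothetical placeholders; substitute the values from your own trendline equation.

```python
def predicted_kwh(hdd, slope, intercept):
    """Apply the trendline equation y = slope * x + intercept."""
    return slope * hdd + intercept


def percent_difference(actual, predicted):
    """Positive: more energy used than predicted; negative: less."""
    return 100.0 * (actual - predicted) / predicted


# Hypothetical coefficients and readings, for illustration only
slope, intercept = 20.0, 50.0
expected = predicted_kwh(35.0, slope, intercept)
print(expected, percent_difference(825.0, expected))  # 750.0 10.0
```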

What does the R² show?

The R² value is basically a measure of how good the correlation is. The closer the R² value is to 1, the better the correlation.
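For reference, the usual definition of R² for a fitted straight line is shown below, where y_i is the measured energy consumption for period i, ŷ_i is the value predicted by the trendline equation for that period, and ȳ is the mean of the measured values. For a straight-line fit with a constant, like the Excel trendline above, this equals the square of the correlation coefficient between the degree days and the energy consumption, which is also what Excel's RSQ function returns.

```latex
R^2 = 1 - \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\sum_i \left( y_i - \bar{y} \right)^2}
```

If every point sits exactly on the trendline, the top sum is zero and R² is 1; the more scatter around the line, the smaller R² becomes.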

The example chart above shows a very good correlation. In this instance you don't need the R² value to see that; it's clear from looking at the chart. But correlations between degree days and energy consumption are rarely that good, so the R² value can be useful for making objective comparisons.

As explained in this article, the base temperature of the degree days makes a big difference to how well they correlate with the energy consumption of any particular building.

It's very difficult to estimate the correct base temperature accurately for any particular building using logic alone, so it's often best to make a rough estimate (e.g. 15.5°C) and then try correlating kWh with degree days calculated to various base temperatures around that point.

The R² values make it easier to see which base temperature gives the best correlation, and this helps you to decide which base temperature you should use for your degree-day-based analysis of the building's energy consumption. (The optimal base temperature varies from building to building.)

Using a formula to calculate R² and determine the optimal base temperature

The RSQ formula in Excel enables you to calculate R² without making a chart. What's great about this is that you can quickly calculate R² for multiple correlations, using degree days with a range of base temperatures. And this makes it easy (or at least fairly easy) to determine the optimal base temperature of the building.

The screenshot below shows a spreadsheet containing one set of energy-consumption data, multiple sets of degree days (all with different base temperatures), and R² values for each energy/degree-days correlation:

Using Excel's RSQ function to determine the optimal base temperature

Confused? Yes, it is a little confusing! Here are some step-by-step instructions explaining how to use this method to estimate a building's base temperature:

  1. Make a rough estimate of the building's base temperature (you can find some basic information and guidance here).
  2. Select your estimated base temperature on Degree Days.net, check the "Include base temperatures nearby" box, and generate and download the degree days. You'll notice that the data has been calculated to a range of base temperatures either side of your estimate. It'll look a lot like the spreadsheet above, but without the blue cells, which are the ones we added in ourselves (you don't need to make them blue!).
  3. Add your energy-consumption figures to the right of the last column of degree days. (The energy-consumption periods need to match the periods that you generated the degree days for.)
  4. Under the last column of degree days, use the RSQ formula, selecting your energy-consumption values as the known_y's (these are the values in column "O" in the screenshot above).
  5. After selecting the known_y's, hit F4 to make Excel insert $ symbols in front of the row and column references. The $ symbols "fix" the referenced cells so that you can copy the formula without the referenced cells changing. (Strictly speaking you only need the $ symbols in front of the column references, to fix the column on the energy data, but in this instance it doesn't hurt to have them in front of the row references as well.)
  6. Next, for the known_x's part of the formula, select the degree days from the last column of degree-day data (column "N" in the screenshot above). You don't want any $ symbols here.
  7. You should have a formula something like the one you can see at the top of the image above.
  8. Now, take the cell that you entered the formula into, and copy and paste it across the row, filling the cells under each column of degree-day data.
  9. You should now be able to see the R² value for each base temperature. We reduced the column widths to make the screenshot above smaller, so your spreadsheet should show the R² values to more decimal places. Your R² values will probably be lower too; energy/degree-days correlations are rarely as good as those shown above.
  10. The column with the highest R² value should give you a pretty good approximation of the building's base temperature. If it's the lowest or the highest base temperature, you should probably generate more degree days with lower/higher base temperatures and test them out to see if you can improve your R² further.
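The same search can be sketched in Python, which may be handy if you have many base temperatures to test. The rsq function below computes the square of the Pearson correlation coefficient, which is what Excel's RSQ returns; best_base_temperature picks the base temperature with the highest R² and, in the spirit of step 10, flags when that best value is the lowest or highest base temperature you tested. Both function names and all the figures are hypothetical.

```python
def rsq(known_ys, known_xs):
    """Square of the Pearson correlation coefficient, as Excel's RSQ returns."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    syy = sum((y - mean_y) ** 2 for y in known_ys)
    return sxy * sxy / (sxx * syy)


def best_base_temperature(kwh, hdd_by_base):
    """Return (base, r_squared, at_edge) for the base with the highest R^2.

    hdd_by_base maps each base temperature to its HDD figures, one per period,
    in the same order as kwh. at_edge is True when the best base is the lowest
    or highest one tested, suggesting you should widen the range.
    """
    scores = {base: rsq(kwh, hdd) for base, hdd in hdd_by_base.items()}
    best = max(scores, key=scores.get)
    at_edge = best in (min(scores), max(scores))
    return best, scores[best], at_edge


# Hypothetical data: four periods, HDD to three base temperatures (degrees C)
kwh = [250.0, 450.0, 650.0, 850.0]
hdd_by_base = {
    14.5: [7.0, 16.0, 25.0, 36.0],
    15.5: [10.0, 20.0, 30.0, 40.0],
    16.5: [13.0, 24.0, 35.0, 44.0],
}
best, r2, at_edge = best_base_temperature(kwh, hdd_by_base)
print(best, at_edge)  # 15.5 False
```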

Still confused? Completely understandable if you're new to $ symbols and formulas in Excel...

But this method is worth figuring out - it's the best method we know of for determining a building's base temperature. And determining the base temperature properly should significantly improve the accuracy of all further degree-day-based calculations for that building, so it's worth doing it well if you can.

Improved correlation method for irregular data

Correlating energy usage with degree days, as described above, works well when all the energy-consumption records cover identical periods of time. It's ideal for linear regression analysis of daily or weekly data.

However, the above method doesn't work properly for irregular periods of consumption, like those gathered from records of oil deliveries...

What's wrong with correlating energy usage with degree days covering irregular periods?

The problem is with the baseload energy consumption - the constant in the equation of the straight line (48.186 in the example above). The method above assumes that the baseload is a constant number, but this assumption only makes sense if the periods of consumption are all the same length.

When records of energy consumption cover periods of various lengths, the baseload energy consumption depends on the lengths in question. If the baseload is 20 kWh for a 1 week period, it will be 40 kWh for a 2 week period, and 60 kWh for a 3 week period.

Baseload energy consumption can't be expressed as a constant unless the length of the period is also a constant.

(If the above statement doesn't make sense, you might be confusing kWh with kW... Many people do! If you're in any doubt, take a look at our article on kW and kWh - it explains both units in detail.)

Because different months have different lengths, using a constant figure for baseload kWh causes slight inaccuracies in correlations of monthly energy consumption with monthly degree days. The more irregular your consumption records, the greater these inaccuracies become.

Fortunately there's another approach that works just as well for irregular data as it does for regular data:

The slightly-more-complicated solution that works well for irregular data

Instead of correlating energy consumption with degree days, correlate energy consumption per day with degree days per day.
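Written as an equation, with E_i the kWh, D_i the HDD, and d_i the number of days in period i, c the baseload in kWh per day, and m the kWh per degree day, the per-day model and the period-total form you get by multiplying both sides by d_i are:

```latex
\frac{E_i}{d_i} = c + m \, \frac{D_i}{d_i}
\quad\Longrightarrow\quad
E_i = c \, d_i + m \, D_i
```

So the baseload term c·d_i grows in proportion to the length of the period: with c = 20/7 kWh per day, it gives the 20, 40, and 60 kWh for 1, 2, and 3 weeks mentioned above.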

To explain this, let's consider the example data that we used previously... Previously we simply correlated the kWh with the HDD, but the improved method involves a correlation of the kWh per day with the HDD per day... Here's how we might arrange this data in a spreadsheet:

Excel data for a more sophisticated correlation of energy per day with HDD per day

Let's explain each column in turn:

Once we have the figures we can create a scatter chart, just like before, except with HDD per day and kWh per day instead of HDD and kWh:

Regression analysis chart of kWh per day against HDD per day

Like before, we can also add a trendline and the equation of that trendline (see the chart above).
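The per-day columns are simple divisions of each period's kWh and HDD by the number of days in that period. The short Python sketch below shows that step for a few hypothetical monthly records; the resulting pairs are what you would plot and fit with a trendline, exactly as before. The function name per_day is illustrative only.

```python
def per_day(rows):
    """Convert (days, hdd, kwh) rows into (hdd_per_day, kwh_per_day) pairs."""
    return [(hdd / days, kwh / days) for days, hdd, kwh in rows]


# Hypothetical monthly records: (days in period, HDD, kWh)
rows = [(31, 310.0, 1240.0), (28, 224.0, 980.0), (30, 150.0, 900.0)]
hdd_per_day, kwh_per_day = zip(*per_day(rows))
print(hdd_per_day)  # (10.0, 8.0, 5.0)
print(kwh_per_day)  # (40.0, 35.0, 30.0)
```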

What does the equation mean now?

This equation is very similar to the one described earlier, and you can apply it similarly. Just remember that x, y, and the constant, are per-day figures. Once you've calculated the energy consumption per day from an HDD per day figure, you can of course multiply it by the number of days in the period to work out the predicted kWh over the whole of the period.
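Here is a minimal Python sketch of that calculation. The function name is hypothetical, and the coefficients are placeholders for the slope (kWh per HDD) and constant (baseload kWh per day) that you would read from your own per-day trendline equation.

```python
def predicted_period_kwh(hdd_total, days, slope, intercept):
    """Predict total kWh for a period from a per-day regression equation.

    slope:     kWh per degree day (from the per-day trendline).
    intercept: baseload kWh per day (the constant of the per-day trendline).
    """
    hdd_per_day = hdd_total / days
    kwh_per_day = slope * hdd_per_day + intercept
    return kwh_per_day * days


# Hypothetical coefficients from a per-day fit
print(predicted_period_kwh(hdd_total=280.0, days=28, slope=2.5, intercept=15.0))  # 1120.0
```

Multiplying out shows that this is the same as slope × total HDD + constant × days, so the baseload part of the prediction scales with the length of the period.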

Using this modified method to determine the optimal base temperature

We have already explained how to determine the optimal base temperature using Excel's RSQ function, energy consumption figures, and corresponding degree-day figures for a range of base temperatures (see here). We used kWh figures and HDD figures above, but it's easy to apply the same approach using kWh-per-day figures and HDD-per-day figures. We need to insert a few additional steps between steps 3 and 4 above:

The screenshot below shows one way in which you could organize the data. It might be a little neater to put the HDD/day and kWh/day figures to the right of the original figures, but putting them below makes it easier to fit them all in a screenshot:

Determining the optimal base temperature using kWh-per-day and HDD-per-day figures

In Excel, the trick to calculating HDD/day figures across all periods and base temperatures is to:

In the example above, cell B22 contained the formula "= B8 / $P8". This meant that the "Days" column P was fixed in the formula, but everything else in the formula was relative. So copying the formula across the base temperatures and down the periods worked as desired.
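The same idea can be sketched in Python: divide each period's kWh, and each base temperature's HDD, by that period's number of days (the job done by the fixed $P column in the formula above), then compute R² for each base temperature. The rsq helper mirrors Excel's RSQ; the function names and all the figures are hypothetical.

```python
def rsq(known_ys, known_xs):
    """Square of the Pearson correlation coefficient, as Excel's RSQ returns."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    syy = sum((y - mean_y) ** 2 for y in known_ys)
    return sxy * sxy / (sxx * syy)


def per_day_rsq_by_base(days, kwh, hdd_by_base):
    """R^2 of kWh/day against HDD/day for each base temperature.

    The same days[i] divides the kWh and every base temperature's HDD for
    period i, like the fixed "Days" column in the spreadsheet formula.
    """
    kwh_per_day = [k / d for k, d in zip(kwh, days)]
    return {
        base: rsq(kwh_per_day, [h / d for h, d in zip(hdd, days)])
        for base, hdd in hdd_by_base.items()
    }


# Hypothetical monthly records of different lengths
days = [31, 28, 31, 30]
kwh = [1240.0, 980.0, 930.0, 750.0]
hdd_by_base = {
    14.5: [279.0, 210.0, 155.0, 90.0],
    15.5: [310.0, 224.0, 186.0, 120.0],
}
scores = per_day_rsq_by_base(days, kwh, hdd_by_base)
best = max(scores, key=scores.get)
print(best)  # 15.5
```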

Provided you understood the previous examples, you should hopefully find this slightly-modified method pretty straightforward. But please let us know if anything is unclear. We appreciate that these instructions might be a little intimidating to anyone unfamiliar with Excel formulas and so on, but we're trying to make them as accessible as possible!

© 2010 BizEE Software Limited - About | Contact | Privacy Policy | FAQ