Regression analysis is a statistical technique to assess the relationship between two or more variables. In business, you can use it to help prove whether adjusting one variable is likely to impact another e.g. if we invest in more training will our staff become more productive?
This article won’t go into the detailed mathematics behind the calculation, but will cover the basics you need to know in order to run a regression analysis in Excel and make sense of the findings.
In this example we look at how two independent variables might affect the final league position of a football team. Logically, you might expect the number of cup games played will negatively affect league position whereas the amount of money spent on players will positively affect it, but our regression analysis will test this. We test over a 5 year period for two teams, Liverpool and Chelsea.
Here are the step by step instructions to perform a regression analysis on this data using Excel.
1) Firstly you must have the Analysis ToolPak add-in installed if you haven’t already. Go to File – Options – Add-ins and it will show at the top of the list.
2) Go to the Developer tab, and within Add-ins, select Excel Add-ins. Select the Analysis ToolPak.
3) Data Analysis tools will now be available to you from the Data tab. For this example, choose Regression.
4) The Y range is your dependent variable (League Position) and the X range your independent variable(s), in this case money spent on players each season and cup games played. Choose an Output Range of your choice and if you only have a reasonably low number of records it can be useful to tick to return a Residuals output too:
5) Your results will output to the range you selected. As you can see there is a lot of statistical terminology returned, so for the purposes of this article I’ve just highlighted the key ones and will explain what they are showing.
The above are the results for Liverpool FC over course of 5 seasons.
R Square = 0.76 indicates that 76% of the variance in Liverpool’s final league position is explained by the money invested on new players and amount of cup games played. R Square can be anywhere between 0 and 1 and the closer to 1 the better regression. An R Square of 0.76 is a relatively good fit but isn’t statistically significant in its own right, it should be judged alongside the Residuals (see below) to come to a more accurate conclusion.
Significance F = 0.24 within the ANOVA section helps check the statistical significance and reliability of your results. As a rule, a value less than 0.05 is deemed reliable. With this value of 0.24 if you had a bigger dataset you would probably change to a different independent variable as opposed to league position to see if you can find a lower Significance F.
Coefficients X Variable 1 refers to money spent (in millions). And the -0.06 value indicates for every million pounds Liverpool spent they improve their league position by 0.06 places. X Variable 2 refers to the number of cup games played and suggests for every additional cup game Liverpool finish a massive 0.5 places lower in the league. This is a huge figure, but largely influenced by the 2013/14 season where they finished 2nd and only played 5 additional cup games.
P-values are low but not low enough to reach the same 0.05 significance threshold.
Residual Output at the bottom indicates how far each data point (each season) differed from its predicted value based on the money spent and cup games played. E.g. for Observation 1 (the 2014/15 season) the Predicted position would have been 6.09 and they finished 6th, with Residuals of just -0.09. Glancing through the 5 residuals they are all pretty close which might indicate that the two independent variables do have a predictable impact on the final league position based on these 5 seasons.
Chelsea’s results are below, but I’ll leave you to work through the findings using the Liverpool example as reference.
If you’ve found this article helpful, unhelpful, or have any other feedback… please leave us a comment below.