Correlation


Correlation is a way of measuring the relationship between two variables.

 

The correlation coefficient ranges from -1 to +1. A value of +1 indicates a perfect positive correlation, meaning an increase in one variable is associated with an increase in the other. A value of -1 indicates a perfect negative correlation, where an increase in one variable is associated with a decrease in the other. In Excel, you can use the CORREL function or the PEARSON function to return the correlation coefficient between two variables.
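For readers working outside Excel, the following is a minimal sketch of the same calculation in plain Python, using the standard Pearson formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the sample data are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical sample data, invented purely for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]
print(round(pearson_r(hours_studied, exam_score), 3))
```

A result close to +1 here would indicate that the two made-up columns rise together almost in a straight line.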

 

As a rough guide to interpreting a correlation coefficient, usually written ‘r’:

  • r = -1 (a perfect negative linear relationship)
  • r = -0.70 (a strong negative linear relationship)
  • r = -0.50 (a moderate negative linear relationship)
  • r = -0.30 (a weak negative linear relationship)
  • r = 0 (no linear relationship)
  • r = 0.30 (a weak positive linear relationship)
  • r = 0.50 (a moderate positive linear relationship)
  • r = 0.70 (a strong positive linear relationship)
  • r = 1 (a perfect positive linear relationship)

The correlation coefficient described above measures linear relationships only; it does not capture curved relationships. A U-shaped scatterplot, for example, could show a strong relationship between x and y yet have a correlation coefficient of 0. The calculation also differs depending on which coefficient you use, as statisticians such as Spearman and Pearson devised different ways of measuring the strength of a relationship.
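The U-shaped case can be checked with a small sketch. Here y is exactly x squared, so y is completely determined by x, yet the linear correlation coefficient comes out as zero because the positive and negative halves cancel. The helper pearson_r and the data are illustrative assumptions, not part of any library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Symmetric x values and a U-shaped response: y = x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_u = pearson_r(x, y)
print(r_u)  # zero: no linear relationship, despite a perfect curved one
```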

 

It is important to remember that correlation does not necessarily imply causation. Variables that move together are said to covary, but that alone does not show that one has a direct impact on the other. It could be that x is causing y in the anticipated way. However, it could also be that reverse causation is taking place (y is causing x), or that a third variable (z) is causing both x and y.

 

 

Spearman’s rank-order correlation coefficient

Spearman’s formula for calculating the correlation coefficient is based on ranking all the values of each variable and analysing the ranks rather than the raw values. As with other correlation measures, the result compares two variables, with 1 or -1 indicating perfect correlation and values nearer 0 indicating weaker correlation. Because the result is based on ranks, Spearman’s coefficient can return 1 or -1 for data where Pearson’s would not: any relationship that consistently increases (or consistently decreases) scores a perfect value, even if it is not a straight line.
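The ranking idea can be sketched as follows: replace each value with its rank (tied values share the average of their ranks), then apply the Pearson calculation to the ranks. This is one common way to compute Spearman’s coefficient; the function names and the data are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's calculation applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A relationship that always increases but is far from a straight line.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)   # perfect, because the rank orders agree exactly
r = pearson_r(x, y)        # below 1, because the points do not lie on a line
print(rho, round(r, 3))
```

Comparing the two printed values shows the point made above: the ranks agree perfectly, so Spearman’s measure is 1, while Pearson’s measure on the raw values falls short of 1.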

 

 

Pearson’s product-moment correlation coefficient

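The Pearson correlation uses the raw values of two continuous variables to evaluate the linear relationship between them. Pearson’s calculation is preferable to Spearman’s when measuring exact continuous values, such as temperature, that are expected to be linearly related. Because it works on raw values, however, Pearson’s is sensitive to outliers; where the dataset contains outliers, Spearman’s rank-based measure is generally the more robust choice.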