The p-value is a function of the observed sample results used for testing a statistical hypothesis. Before the test is performed, an agreed threshold for the p-value should be chosen; this threshold is known as the significance level and is usually set somewhere between 1% and 5%. The observed p-value is then compared against the significance level to provide evidence for or against the null hypothesis.


The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. In other words, it answers the question: if the null hypothesis were true and only random chance were at work, how likely is a result like this one?
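As a hedged sketch of that definition, consider a hypothetical coin-fairness test: we observe 60 heads in 100 flips, the null hypothesis is that the coin is fair, and the one-sided p-value is the probability of seeing 60 or more heads under that null. The numbers here are illustrative, not from the text above.

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value
    for observing k or more successes under the null."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical data: 60 heads in 100 flips of a supposedly fair coin.
p_value = binom_tail(100, 60)
print(f"one-sided p-value: {p_value:.4f}")  # roughly 0.028, below the common 0.05 level
```

Because the p-value falls below 0.05, this result would typically be reported as statistically significant evidence against the fair-coin null.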


As a general rule, a p-value of 0.05 (5%) or less is deemed statistically significant evidence against the null hypothesis:

p > 0.10 = Little evidence against the null hypothesis

0.05 < p <= 0.10 = Weak evidence against the null hypothesis

0.01 < p <= 0.05 = Moderate evidence against the null hypothesis

0.001 < p <= 0.01 = Strong evidence against the null hypothesis

p <= 0.001 = Very strong evidence against the null hypothesis
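The bands above translate directly into a small lookup function; this is a minimal sketch of those thresholds, with the function name chosen for illustration:

```python
def evidence_strength(p):
    """Map a p-value to the evidence bands described above.
    (Hypothetical helper; the band labels follow the list in the text.)"""
    if p > 0.10:
        return "little evidence"
    if p > 0.05:
        return "weak evidence"
    if p > 0.01:
        return "moderate evidence"
    if p > 0.001:
        return "strong evidence"
    return "very strong evidence"

print(evidence_strength(0.03))  # "moderate evidence"
```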


Whilst a p-value is useful in establishing significance, it is often more informative to calculate a confidence interval to understand the precision of the result. The p-value may simply tell you to reject the null hypothesis; a confidence interval, computed at a chosen confidence level (often 95%), supports the same accept/reject decision while also offering a range of values within which the true result probably lies.
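A minimal sketch of a 95% confidence interval for a mean, using the large-sample normal approximation (critical value 1.96) and the Python standard library; the sample data is hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

def ci_95(sample):
    """Approximate 95% confidence interval for the sample mean,
    assuming the sample is large enough for the normal approximation."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))  # standard error of the mean
    return (m - 1.96 * se, m + 1.96 * se)

# Hypothetical sample for illustration.
lo, hi = ci_95(list(range(1, 101)))
print(f"95% CI: ({lo:.1f}, {hi:.1f})")
```

Unlike the bare p-value, the interval communicates both the direction and the precision of the estimate: a narrow interval means the result is pinned down tightly, a wide one means considerable uncertainty remains.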




P-hacking, also known as data dredging, is the practice of searching for patterns in data until a test can be deemed statistically significant, when in reality there is no significant underlying effect.
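One way to see why p-hacking works is the arithmetic of multiple testing: each test on pure noise has a 5% chance of a false positive at the 0.05 level, so running many tests makes at least one "significant" finding very likely. A short illustrative calculation:

```python
# Probability of at least one false positive when running n independent
# tests on pure noise at significance level alpha = 0.05.
alpha = 0.05
for n_tests in (1, 5, 20, 100):
    p_any = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:>3} tests: P(at least one false positive) = {p_any:.2f}")
```

With 20 tests the chance of at least one spurious "discovery" already exceeds 60%, which is why a result found by trawling many comparisons cannot be read at face value against the usual 0.05 threshold.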