**CALCULATIONS**

### Mode

The most common value of a variable, based on how frequently each value occurs. It can be calculated from either qualitative or quantitative data.

### Mean

The average value of a quantitative variable, calculated as the sum of the observed values divided by the number of observations.

### Median

The central value of a quantitative variable when its observations are sorted in order. Using the median instead of the mean lessens the impact of outliers.

Example: The median UK salary in 2017 was around £22,000, whereas the mean was closer to £26,500 because it was more heavily influenced by large outliers among the top earners.
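As a minimal sketch of how the three measures differ, the snippet below uses Python's standard `statistics` module on a made-up list of salaries (the figures are illustrative only, not real data). One very large value pulls the mean upwards while the mode and median stay where most of the data sits.

```python
import statistics

# Hypothetical annual salaries in £ thousands; the last value is a deliberate outlier.
salaries = [18, 20, 21, 22, 22, 24, 25, 150]

mode_value = statistics.mode(salaries)      # most frequent value
mean_value = statistics.mean(salaries)      # sum of the values divided by the count
median_value = statistics.median(salaries)  # middle of the sorted values

print(f"mode={mode_value}, mean={mean_value}, median={median_value}")
```

With this sample the mean lands well above every salary except the outlier, which is why the median is often the more representative figure for skewed data.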

### Outlier

An observation that falls far from the rest of the data, which can have a misleading impact on the mean. Outliers can be identified in a couple of different ways: as any observation more than 1.5 multiplied by the IQR (interquartile range) below the first quartile or above the third quartile, or alternatively as any observation more than two standard deviations from the mean.
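Both rules can be sketched in a few lines of Python. The function names and sample data below are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, so other tools (including Excel) may place the quartile boundaries slightly differently.

```python
import statistics

def iqr_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def sd_outliers(values, k=2):
    """Flag values more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

# Hypothetical data with one extreme value.
data = [18, 20, 21, 22, 22, 24, 25, 150]
by_iqr = iqr_outliers(data)
by_sd = sd_outliers(data)
print(by_iqr, by_sd)
```

The two rules will not always agree, because a single extreme value also inflates the standard deviation that the second rule relies on.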

### Leading indicators

An indicator that may signal a future event.

Example: A creche getting attached to a restaurant could lead to a higher reported accident rate for the restaurant.

### Lagging indicators

An indicator that follows an event.

Example: The recent performance of a company’s share price is a lagging indicator. It reflects events that have already happened, even when it is used to predict what might happen in the future.

### Moving average

An average calculated over a fixed, rolling time period. It is a trend-following (or lagging) indicator because it is based on past values.

Example: An opinion poll average based on the previous 10 days. Each day, the newest day is added and the oldest day drops off.
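A simple moving average can be sketched as below. The function name, the window of 3 and the poll figures are assumptions chosen to keep the example short; a 10-day poll average would use `window=10`.

```python
def moving_average(values, window):
    """Return the simple moving average for each full window of values."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical daily poll shares (%), oldest first.
polls = [40, 42, 41, 43, 45]
averages = moving_average(polls, 3)
print(averages)
```

Each result averages the current day and the two before it, so the first two days produce no value and the series always trails the latest movement in the raw data.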

### Range

The difference between the maximum and minimum values of a quantitative variable in a data set.

### Percentile

The observed values of a variable divided into hundredths. The first percentile (P1) divides the bottom 1% of values from the rest of the data set, the second percentile (P2) the bottom 2%, etc. The median is the 50th percentile (P50).

### Decile

The observed values of a variable divided into tenths. The second decile is the 20th percentile, represented as either D2 or P20.

### Quartile

The most common type of percentile used, dividing the observed values into quarters. There are three quartiles: Q1 divides at 25%, Q2 at 50% (the median) and Q3 at 75%.

### Interquartile range

The difference between the first and third quartiles of a variable (Q3 – Q1). This is the preferred measure of variation for a skewed distribution, because it is not affected by outliers.
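The relationship between percentiles, deciles, quartiles and the interquartile range can be checked with `statistics.quantiles`. This is a sketch on hypothetical data (the whole numbers 1 to 101), using the `inclusive` method; other interpolation methods can give slightly different cut points on real data.

```python
import statistics

# Hypothetical data: the whole numbers 1 to 101.
scores = list(range(1, 102))

# quantiles() returns the n - 1 cut points that divide the data into n groups.
percentiles = statistics.quantiles(scores, n=100, method="inclusive")
deciles = statistics.quantiles(scores, n=10, method="inclusive")
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

p20 = percentiles[19]  # 20th percentile (list index starts at 0)
p50 = percentiles[49]  # 50th percentile
d2 = deciles[1]        # second decile
iqr = q3 - q1          # interquartile range

print(p20, d2, p50, q2, iqr)
```

Here P20 and D2 are the same cut point, and P50, Q2 and the median all coincide.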

### Variance

A measure of the dispersion of the data around the mean. Variance is the average of the squared differences (deviations) between each observation and the mean. We have to square each of these deviations to keep them as positive values; if we didn’t, the positive and negative deviations would cancel out and always sum to zero.

In Excel, use VAR.P if you have the full population available, or VAR.S to estimate the variance from a sample.
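The calculation can be followed step by step in Python. The data below is made up for illustration; `statistics.pvariance` and `statistics.variance` play roughly the same roles as VAR.P and VAR.S respectively.

```python
import statistics

# Hypothetical observations.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(observations)

deviations = [x - mean for x in observations]
raw_sum = sum(deviations)                      # unsquared deviations cancel out
squared_sum = sum(d ** 2 for d in deviations)  # squaring keeps every term positive

population_variance = squared_sum / len(observations)    # divide by n
sample_variance = squared_sum / (len(observations) - 1)  # divide by n - 1

print(raw_sum, population_variance, sample_variance)
```

The sample version divides by one fewer than the number of observations, which makes the estimate slightly larger than the population figure.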

### Standard deviation

The most frequently used measure of variability, showing how tightly the observed values cluster around the mean. The standard deviation is the square root of the variance, making it a much easier measurement to interpret.

Once you have the standard deviation of your sample, you can measure how many observations are within one standard deviation of the mean, how many are within two standard deviations, etc. In Excel, the standard deviation can be calculated with STDEV.P (full population) or STDEV.S (sample).

In a perfect normal distribution, you would expect 68% of observations to fall within one standard deviation of the mean, 95% within two standard deviations and 99.7% within three standard deviations. For example, with 49 balls in the National Lottery the mean is 25. If the results had a normal distribution, you’d expect 68% of balls to be within one standard deviation of 25 (in practice lottery numbers are drawn uniformly, so this is only an illustration).
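To compare a data set against those benchmarks, you can count the share of observations within a given number of standard deviations of the mean. The helper name and data below are hypothetical, and the sketch uses the population standard deviation.

```python
import statistics

def share_within(values, k):
    """Return the proportion of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    inside = [v for v in values if abs(v - mean) <= k * sd]
    return len(inside) / len(values)

# Hypothetical observations.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.pstdev(observations)
within_one = share_within(observations, 1)
within_two = share_within(observations, 2)

print(sd, within_one, within_two)
```

Shares that differ noticeably from 68% and 95% are a hint that the data may not be normally distributed, although a sample this small proves little either way.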

### Standard error

The standard error indicates how close the sample mean is likely to be to the true population mean, giving us an idea of how reliable our results will be. It is calculated as se = s / √n (where se is the standard error, s is the sample’s standard deviation and n is the total number of observations).
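As a small sketch, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. The function name and data are illustrative assumptions.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation
    n = len(sample)               # number of observations
    return s / math.sqrt(n)

# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(sample)
print(round(se, 4))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.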

### Probability

The proportion of times a particular outcome would occur in a long run of repeated observations.

### Point estimate

A single number calculated from the data set that is the best single guess for an unknown parameter.

### Interval estimate

A range of numbers around the point estimate, within which an unknown parameter is expected to fall.

### Confidence interval

An interval estimate reported with a confidence level, such as 95%. The level describes how often intervals calculated in this way would contain the true parameter if the sampling were repeated many times.
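One common way to build a confidence interval for a mean is the point estimate plus or minus a multiple of the standard error. The sketch below assumes a normal approximation (reasonable for large samples; for small samples a t-distribution would usually be preferred), and the function name and data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)                # point estimate
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    # z is about 1.96 for a 95% level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * se
    return mean - margin, mean + margin

# Hypothetical sample measurements.
sample = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]
low, high = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, level=0.99)
print(low, high)
```

Asking for a higher confidence level widens the interval, so the 99% interval here contains the 95% one.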

### Triangulation

Combining information from multiple sources to help arrive at the most accurate conclusion possible, often by testing the same hypothesis using numerous different methods.

**DECIMALS AND FRACTIONS**

### Decimal places

Decimal places allow you to specify how many digits after the decimal point you wish to round a continuous quantitative variable to. As standard, you round to the nearest value at that decimal place, either up or down, although Excel has a variety of rounding functions you can use, including fixed round-ups and round-downs.

Example: 2.46874 to 3 decimal places would be 2.469.
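The same rounding choices can be sketched in Python with the standard `decimal` module, which avoids the surprises of binary floating point. The helper name is hypothetical; `ROUND_UP` and `ROUND_DOWN` round away from and towards zero, which is the assumed equivalent of a fixed round-up and round-down.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_UP

def round_places(value, places, rounding=ROUND_HALF_UP):
    """Round a number (given as a string) to a fixed number of decimal places."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=3 gives 0.001
    return Decimal(value).quantize(quantum, rounding=rounding)

nearest = round_places("2.46874", 3)                    # round to nearest
always_up = round_places("2.46874", 3, ROUND_UP)        # fixed round-up
always_down = round_places("2.46874", 3, ROUND_DOWN)    # fixed round-down

print(nearest, always_up, always_down)
```

Passing the number as a string keeps the exact decimal digits, so the result does not depend on how the value would be stored as a float.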

### Significant figures

You can specify a number to a specific number of significant figures to ensure you are reaching your required degree of accuracy. Unlike decimal places, this count includes the digits before the decimal point.

Example: 2.46874 to 3 significant figures would be 2.47, or to 2 significant figures it would simply be 2.5.

There is no built-in function in Excel for rounding to significant figures, but it can be achieved using a combination of the ROUND, INT, LOG10 and ABS functions.
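The same idea (use the base-10 logarithm of the absolute value to find the position of the first digit, then round relative to that position) can be sketched in Python. The function name is hypothetical and this is one possible approach rather than a definitive one.

```python
import math

def round_sig(x, sig):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0
    # Position of the first significant digit: 0 for 2.4, 4 for 12345, -3 for 0.0045.
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - magnitude)

three_sf = round_sig(2.46874, 3)
two_sf = round_sig(2.46874, 2)
large = round_sig(12345, 2)

print(three_sf, two_sf, large)
```

For whole numbers the digit count passed to `round` becomes negative, which rounds to the nearest ten, hundred, thousand and so on.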

### Fractions in simplest form

Fractions can be converted to their simplest form by reducing both the top and bottom figures to the lowest possible whole numbers. To do this you need to find the greatest common factor of the two numbers involved and divide both numbers by it.
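As a short sketch, Python's `math.gcd` finds the greatest common factor, and the standard `fractions.Fraction` type applies the same reduction automatically. The `simplify` helper and the example fraction are illustrative assumptions.

```python
import math
from fractions import Fraction

def simplify(numerator, denominator):
    """Reduce a fraction by dividing both parts by their greatest common factor."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must not be zero")
    factor = math.gcd(numerator, denominator)
    return numerator // factor, denominator // factor

common_factor = math.gcd(12, 18)  # greatest common factor of 12 and 18
simplified = simplify(12, 18)     # 12/18 reduced to lowest terms
automatic = Fraction(12, 18)      # Fraction reduces itself on creation

print(common_factor, simplified, automatic)
```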