The most common value for a variable based on its frequency, can be calculated from either qualitative or quantitative data. The MODE function can be used in Excel to return this.
The average value based on a variable of quantitative data.
The arithmetic mean is the most commonly used. In Excel, the AVERAGE function returns the arithmetic mean which simply divides the total of all values by the number of observations.
The geometric mean is returned by the GEOMEAN function, which is often used to measure the average rate of growth e.g. when calculating interest rates over a number of years.
The harmonic mean is calculated using the HARMEAN function in Excel. One potential use would be calculating the average rate of speed e.g. if you travel 10 miles at 30 miles per hour and then another 10 miles at 70 miles per hour the formula =HARMEAN(30,70) will return the result of 42.
The central value of a variable of quantitative data. Using the median instead of the mean lessens the impact of outliers.
Example: The median UK salary in 2017 was around £22,000 whereas the mean was closer to £26,500 and more heavily influenced by some of be large outliers from the top earners.
Found by taking the average of the minimum and maximum values from the data set.
A unit that falls far from the rest of the data, which can have a misleading impact on the mean. Outliers can be measured in a couple of different ways, simply as any observations outside of 1.5 multiplied by the IQR (interquartile range) or alternatively outliers can be deemed as any observations outside of two standard deviations from the mean.
The TRIMMEAN function in Excel can be used to exclude outliers from your measure of central tendency. You can use TRIMMEAN to specify a percentage of the top and bottom values to exclude from the mean calculation.
An indicator that may signal a future event.
Example: A creche getting attached to a restaurant could lead to a higher reported accident rate for the restaurant.
An indicator that follows an event.
Example: Reporting the recent performance of a company’s share price to predict what might happen to it in the future.
An average based on a specific time period which generates a trend-following (or lagging) indicator because it is based on the past.
Example: Opinion polls average based on the previous 10 days, with each day the 11th day dropping off and the new day added.
The difference between the maximum and minimum values of a quantitative variable in a data set.
The observed values of a variable divided into hundredths. The first percentile (P1) divides the bottom 1% of values from the rest of the data set, the second percentile (P2) the bottom 2%, etc. The median is the 50th percentile (P50).
The observed values of a variable divided into tenths. The second decile is the 20th percentile, represented as either D2 or P20.
The most common type of percentile used, dividing the observed values into quarters. There are three quartiles: Q1 divides at 25%, Q2 at 50% (the median) and Q3 at 75%.
The difference between the first (lower) and third (upper) quartiles of a variable (Q3 – Q1). This is the preferred measure of variation where there is a skewed distribution, in order to disregard outliers.
To measure the lower quartile, where n is the number of observations you need to calculate 1/4 of n+1. E.g. if n=18 then the IQR is 4.75, which means it is three quarters of the way between the 4th and 5th ordered values.
To measure the upper quartile, you need to calculate 1/4 of n+1 *3. E.g. if n=18, 1/4 of 57 = 14.25 so the third quartile lies one quarter of the way between the 14th and 15th ordered values.
The dispersion of the data from the mean. Variance measures the sum of the difference (deviation) between each observation and the mean. We have to square each of these deviations to keep them as positive values, if we didn’t the variance would always sum to zero.
In Excel, VAR.P is the function to use if you have the full population available alternatively VAR.S or VAR will estimate the variance if you just have a sample.
The standard deviation is a measure of the average deviation within a batch. It is the most frequently used measure of variability, used to explain how tightly the observed values cluster around the mean. When batches are more spread out, they have larger standard deviations.
If you were to sum the deviation from the mean for each value in a batch, it will always equal zero. However, if you sum the squared deviation that isn’t the case, that gives you the variance. The standard deviation takes the square root of the variance, changing the unit of measurement back to something much easier to interpret.
Once you have the standard deviation of your sample, you can measure how many observations are within one standard deviation of the mean, how many within two standard deviations, etc. Standard deviation can easily be calculated within Excel.
In a perfect normal distribution, you would expect 68% of observations to fall within one standard deviation of the mean, 95% within two standard deviations and 99.7% within three standard deviations. For example with 49 balls in the National Lottery the mean is 25, assuming the results have a normal distribution you’d expect 68% of balls will be within one standard deviation of 25.
Measures of location and spread
The mean and the median are measures of location. The standard deviation and interquartile range are measures of spread.
The mean and standard deviation are commonly used together, these are less resistant measures which are sensitive to outliers. The median and interquartile range are commonly used together, they are more resistant measures not impacted by outliers.
The standard error can be used to measure how trustworthy our samples are. It indicates how close the sample mean is from the true population mean, giving us an idea of how reliable the results of our experiment will be. A standard error of 0.01, for example, means that on average our results are likely to be 1% out from the true population and statistically very reliable.
Standard error is calculated as se = s / n (where se is the standard error, s is the sample’s standard deviation and n is the square root of the total number of observations).
The proportion of times a particular outcome would occur in a long run of repeated observations.
A single number calculated from the data set, that is the best single guess for an unknown parameter.
A range of numbers around the point estimate, within which an unknown parameter is expected to fall.
A calculation which provides a percentage confidence of the probability of a parameter falling within particular values, based on known values for related variables.
For example your Z test could return a confidence interval on a survey question where respondents have scored 1-10 which may state 95% confidence that customer satisfaction is between 7 and 8 out of 10.
Combining information from multiple sources to help arrive at the most accurate conclusion possible, often by testing the same hypothesis using numerous different methods.
DECIMALS AND FRACTIONS
Decimal places allow you to specify how many decimals you wish to round a continuous qualitative variable to. As a standard, you will round to the nearest decimal place, either up or down, although Excel has a variety of rounding functions you can use including fixed round-ups and round-downs.
Example: 2.46874 to 3 decimal places would be 2.469.
You can specify a number to a specific amount of significant figures to ensure you are reaching your required degree of accuracy. Unlike decimal places, this will include whole numbers before the decimal.
Example: 2.46874 to 3 significant figures would be 2.47, or to two significant figures it would simply be 2.5.
There is no set formula in Excel for setting significant figures but it can be achieved using a combination of the ROUND, INT, LOG10 and ABS functions.
Fractions simplest form
Fractions can be converted to their simplest form by reducing both the top and bottom figures to the lowest possible whole numbers. To do this you need to find the greatest common factor of the two numbers involved.
Parallel lines go in the same direction but do not intersect. E.g. two sides of a piece of paper.
Perpendicular lines are lines that intersect at a right angle. E.g. two lines crossing to form an ‘x’, if any one of the angles are 90° then all four of the angles will be 90°. If the lines intersect but none of the angles are at 90° then they are neither parallel not perpendicular.
Lines of symmetry perfectly divide a shape into multiple identical parts. E.g. dividing a square into four equally sized quadrants.
Right angles are formed when a horizontal line is met by a perfect vertical line at 90°.
Acute angles are any angles smaller than a right angle and therefore less than 90°.
Obtuse angles are any angles larger than a right angle and above 90°.
Shapes and area
Triangles can be defined by the length of their sides or by their angle types.
Scalene triangles are where none of the sides are equal. Isosceles triangles are where at least two of the sides have equal length. Equilateral triangles have all three sides of equal length.
All triangles have three angles adding to 180°.
Acute triangles are where all three angles are less than 90°. Right triangles have one angle which is exactly 90°. Obtuse triangles have one angle which is greater than 90°.
Polygons are any shape with at least three straight sides and angles, typically five or more. A ‘regular polygon’ is equilateral (all sides have the same length) and also has all angles equal in measure.
Quadrilaterals are polygons with four sides and four corners. Therefore all squares and rectangles are quadrilaterals. Parallelograms are quadrilaterals with two pairs of parallel sides.
The perimeter is the length of an outline of a shape, calculated by adding together the length of each side.
The area of a rectangle is measured in square units, which could be square inches, square feet, square metres, etc. The calculation is simply length multiplied by width, represented by A = L * W.
The area of a triangle can be calculated using an adapted version of the formula for the area of a rectangle. If you copy a triangle, flip it 180° and place it next to the original you end up with a parallelogram, a type of rectangle. You need to half the ordinary rectangle calculation, because you only want the area of the triangle. Therefore the formula is half the base multiplied by the height, A = W * 0.5 * L.
To calculate the area of a circle you first need to be able to understand and define the following:
Radius – The distance from the centre of a circle to the edge
Diameter – The full width of a circle
Circumference – The distance around the circle
Pi – The ratio of a circle’s circumference to its diameter, which is 3.14159265.
The area of a circle is calculated as pi multiplied by the square of the radius: A = π * r^2.
Pythagoras’ theorem can be used when we know two sides of a triangle in order to calculate the third side. Pythagoras discovered that on a right triangle, the square of the hypotenuse (the side opposite the right angle, which is always the longest side) is equal to the sum of the squares of the other two sides a^2 + b^2 = c^2.
Trigonometry is the study of triangles. Three trigonometric functions that are specific to right-angled triangles are sine, cosine and tangent, often shortened to sin, cos and tan.
Firstly it is important to understand the terms used to describe the three sides of a right triangle:
Hypotenuse – The side opposite the right angle, always the longest side
Opposite – The side opposite the angle you’re working with
Adjacent – The side next to the angle you’re working with
Sin = Opposite / Hypotenuse
Cos = Adjacent / Hypotenuse
Tan = Opposite / Adjacent
The acronym SOHCAHTOA is helpful to remember which sides go which each function.
These functions can be used to relate the angles in the triangle to the side lengths.
Algebra deals with letters and symbols and rules to manipulate those symbols used to represent numbers and quantities within formulae and equations.
Algebraic expressions commonly include a mixture of letters, numbers and other symbols. E.g. x + 5 = y. To represent multiplication you don’t need to use a multiplication symbol, 5 multiplied by x can simply be represented by 5x.
The Distributive Property is a mathematical law which means we can distribute the same operation to terms with parenthesis in the same equation.
The order of operations is used in both mathematics and computer programming to define which procedures are performed first in a complex formula. PEMDAS is an acronym to help remember the order:
Multiplication and Division (from left to right)
Addition and Subtraction (from left to right))
Linear equations are equations between two variables that give a straight line when plotted on a chart. While we can’t determine definite values of either variable, we can solve the equation for how to calculate one of the variables with respect to the other.
Firstly isolate the y term: 2y=2+4x
Divide both sides by two: y=1+2x
Now we know the value of y for any given value of x. All the possible values of x and y could be plotted on a chart to visualise the linear growth of the equation.
Equations that can be rearranged into the form y = mx + c will produce what is known as a straight line graph.
Example: x + y = 3 can be rearranged into y = 3 – x
Quadratic equations in algebra follow the form ax^2 + bx + c = 0 where x represents an unknown variable but a, b and c have known values. The value of a must be >0 for the equation to be quadratic, otherwise it is linear.
Polynomials can include many different terms: numeric values, variables (like x or y) and exponents (like squaring numbers). They can be combined by addition, subtraction, multiplication and division but not by division within a variable. E.g. 5xy^2 = 3x + 4y is a polynomial but 3xy^-2 is not.
An exponent is a quantity representing the power to which a given value is to be raised. For example 2^4 is equivalent to 2*2*2*2. Logarithms explain how many times a value needs to be multiplied to achieve another value. Using the same example of 2^4 = 16, this could also be represented as the log of 16 equalling 4: log(16)=4.
A linear scale can be visualised similar to a ruler or a tape measure, where the distance between all numbers are the same. This can also be applicable to minus numbers on a linear scale. A logarithmic scale, however, is based on multiplication of numbers rather than addition. When you move distance between numbers on a logarithmic number line you are multiplying by the previous number. E.g. a linear scale may add 10 to each value (0, 10, 20, 30, 40, 50) whereas a logarithmic scale which is scaling by 10 would multiply each value by 10 (1, 10, 100, 1000, 10000, 100000).
Linear growth is when a constant amount is fixed as the increase between each value e.g. 2, 5, 8, 11 which indicates a linear relationship. Exponential growth multiplies between values e.g. 1, 3, 9, 27 multiplying each by 3 and indicating an exponential relationship.
Vectors are quantities which explain the position of a point in space relative to another, defined by directional co-ordinates.
A matrix is an array of numbers that can be arranged into a tabular format with rows and columns. You can add or subtract matrices of the same size and structure and you can also transpose matrices to switch the row and column orientation.