Category: Data Science: Statistics

Decimals and fractions

Decimal places: The number of digits after the decimal point that a continuous quantitative variable is rounded to. As a standard, you round to the nearest value at that decimal place, either up or down, although Excel has a variety of rounding functions you can use, including fixed round-ups and round-downs.

Example: 2.46874 to Read More …
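As a rough illustration of the three rounding behaviours mentioned here, the sketch below uses Python's `decimal` module rather than Excel. The helper name `round_to` and the chosen numbers of decimal places are assumptions made for this example; only the value 2.46874 comes from the text.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_UP


def round_to(value: str, places: int, mode: str = ROUND_HALF_UP) -> Decimal:
    """Round a decimal string to `places` decimal places using `mode`."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=mode)


# Round to the nearest value (halves go away from zero).
nearest = round_to("2.46874", 2)
# Fixed round-up: always away from zero, whatever the dropped digits are.
always_up = round_to("2.46874", 1, ROUND_UP)
# Fixed round-down: always toward zero.
always_down = round_to("2.46874", 2, ROUND_DOWN)

print(nearest, always_up, always_down)
```

The value is passed as a string so that it is stored exactly, avoiding binary floating-point surprises.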

Standard error

Standard error: A measure of how trustworthy our samples are. It indicates how close the sample mean is likely to be to the true population mean, giving us an idea of how reliable the results of our experiment will be. A standard error of 0.01, for example, means that on average our results Read More …
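A minimal sketch of the usual calculation, assuming the standard error of the mean is estimated as the sample standard deviation divided by the square root of the sample size. The function name and the data are made up for illustration.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    return statistics.stdev(sample) / math.sqrt(n)


measurements = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
se = standard_error(measurements)
print(round(se, 4))
```

Because n sits under a square root in the denominator, quadrupling the sample size only halves the standard error.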

Standard deviation

Standard deviation: A measure of the typical deviation from the mean within a batch. It is the most frequently used measure of variability, used to explain how tightly the observed values cluster around the mean. When batches are more spread out, they have larger standard deviations.

If you were to sum the deviation Read More …
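To make the "more spread out means larger standard deviation" point concrete, here is a small sketch with two made-up batches that share the same mean. It uses `statistics.pstdev` (the population form); whether the population or sample form is appropriate depends on the data, so treat that choice as an assumption.

```python
import statistics

tight_batch = [9, 10, 11]  # hypothetical values close to the mean
wide_batch = [5, 10, 15]   # hypothetical values far from the mean

# Both batches have the same mean, so only the spread differs.
tight_sd = statistics.pstdev(tight_batch)
wide_sd = statistics.pstdev(wide_batch)

print(statistics.mean(tight_batch), statistics.mean(wide_batch))
print(tight_sd < wide_sd)
```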

Variance

Variance: The dispersion of the data around the mean. Variance is the average of the squared differences (deviations) between each observation and the mean. We square each of these deviations to keep them as non-negative values; if we didn’t, the deviations would always sum to zero. In Excel, VAR.P is the function to use Read More …
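The reason for squaring can be checked directly. This sketch uses made-up data and the population form of the variance (dividing by n), computed by hand and then compared with `statistics.pvariance`.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean = statistics.mean(data)

# Raw deviations cancel out: positives and negatives sum to zero.
deviations = [x - mean for x in data]
raw_sum = sum(deviations)

# Squaring removes the sign, so the average is a usable spread measure.
squared = [d ** 2 for d in deviations]
population_variance = sum(squared) / len(data)

print(raw_sum, population_variance)
```

The square root of this figure is the population standard deviation, which brings the measure back to the original units.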

Interquartile range

Interquartile range: The difference between the third (upper) and first (lower) quartiles of a variable (Q3 – Q1). This is the preferred measure of variation where there is a skewed distribution, because it disregards outliers.

To measure the lower quartile, where n is the number of observations, you need to calculate 1/4 of Read More …
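A short sketch of why the interquartile range resists outliers, using made-up data. Quartile conventions differ between tools; this example assumes the `inclusive` method of Python's `statistics.quantiles`, which may not match the quartile rule described in the full post.

```python
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) using the 'inclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # hypothetical data
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # same data, one extreme value

# The full range reacts strongly to the outlier, the IQR does not.
print(max(clean) - min(clean), iqr(clean))
print(max(with_outlier) - min(with_outlier), iqr(with_outlier))
```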

Percentiles

Percentile: The observed values of a variable divided into hundredths. The first percentile (P1) separates the bottom 1% of values from the rest of the data set, the second percentile (P2) the bottom 2%, and so on. The median is the 50th percentile (P50).

Decile: The observed values of a variable divided into tenths. The Read More …
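The relationship between percentiles, deciles and the median can be sketched with `statistics.quantiles`, which returns the cut points that divide the data into n groups. The data and the `inclusive` interpolation method are assumptions for this example; other tools may interpolate differently.

```python
import statistics

data = list(range(1, 102))  # hypothetical observations: 1, 2, ..., 101

# 99 cut points split the data into hundredths, 9 cut points into tenths.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")

p50 = percentiles[49]  # index 49 holds the 50th percentile
d5 = deciles[4]        # index 4 holds the 5th decile

print(len(percentiles), len(deciles), p50, d5, statistics.median(data))
```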

Moving averages

Moving average: An average calculated over a specific, rolling time period. It is a trend-following (or lagging) indicator because it is based on the past.

Example: An opinion poll average based on the previous 10 days; each day the new day is added and the oldest day (now the 11th) drops off.
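The opinion poll example can be sketched as a simple moving average. The function name and the poll figures are invented for illustration; a `deque` with `maxlen` drops the oldest day automatically when a new one is appended.

```python
from collections import deque


def moving_average(values, window):
    """Return the simple moving average for each full window of values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # oldest value falls off when full
    averages = []
    for value in values:
        recent.append(value)
        if len(recent) == window:
            averages.append(sum(recent) / window)
    return averages


# Hypothetical poll share (%) for 11 consecutive days.
polls = list(range(40, 51))
ten_day = moving_average(polls, 10)
print(ten_day)
```

With 11 days of data and a 10-day window there are two averages: one for days 1 to 10 and one for days 2 to 11.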

Leading and lagging indicators

Leading indicator: An indicator that may signal a future event.

Example: A creche being attached to a restaurant could lead to a higher reported accident rate for the restaurant.

Lagging indicator: An indicator that follows an event.

Example: Reporting the recent performance of a company’s share price to predict what might Read More …

Measures of location

Mode: The most common value of a variable, based on its frequency. It can be calculated from either qualitative or quantitative data. The MODE function can be used in Excel to return this.

Mean: The average value of a variable of quantitative data.

The arithmetic mean is the most commonly used. In Read More …
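As a small illustration outside Excel, Python's `statistics` module covers both measures. The data below is made up; note that the mode works on labels as well as numbers, while the mean needs numeric data.

```python
import statistics

colours = ["red", "blue", "red", "green"]  # hypothetical qualitative data
scores = [2, 4, 9]                         # hypothetical quantitative data

# Mode: the most frequent value, valid for qualitative data too.
most_common_colour = statistics.mode(colours)

# Arithmetic mean: sum of the values divided by how many there are.
average_score = statistics.mean(scores)

print(most_common_colour, average_score)
```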