Data modelling
There are numerous techniques you can use to model your data; which one you choose depends on the problem you want to solve (a short sketch of all three follows the list):
- Classification may be used to answer a yes/no question
- Regression can be used to predict a numerical value
- Clustering can group observations into similar-looking groups
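A minimal sketch of the three techniques, assuming scikit-learn and NumPy are available; the toy data and variable names are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # 100 observations, 2 features

# Classification: answer a yes/no question (labels 0 or 1).
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
print("Predicted class:", clf.predict([[0.5, 0.5]]))

# Regression: predict a numerical value.
y_value = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
reg = LinearRegression().fit(X, y_value)
print("Predicted value:", reg.predict([[0.5, 0.5]]))

# Clustering: group observations into similar-looking groups (no labels needed).
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("Cluster sizes:", np.bincount(clusters))
```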
Predictive modelling
When you have specific definitions to group your data by, predictive modelling can be a useful alternative to clustering. Variables found to be statistically significant predictors of another variable can be used to define segmentations for your analysis.
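As a minimal sketch of this idea (assuming pandas and statsmodels are available; the column names and data are purely illustrative), you might keep only the predictors whose p-values fall below a chosen threshold and segment on those:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=200),
    "visits": rng.poisson(5, size=200),
})
# Outcome loosely driven by 'visits' so it should show up as significant.
df["purchased"] = (df["visits"] + rng.normal(size=200) > 5).astype(int)

# Fit a logistic regression and inspect each predictor's p-value.
model = sm.Logit(df["purchased"], sm.add_constant(df[["age", "visits"]])).fit(disp=0)
pvals = model.pvalues.drop("const")
significant = pvals[pvals < 0.05].index.tolist()
print("Statistically significant predictors:", significant)

# Use the significant predictors to define simple high/low segments.
for col in significant:
    df[col + "_segment"] = np.where(df[col] > df[col].median(), "high", "low")
```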
Statistical learning
Statistical learning places greater emphasis on mathematics and statistical models, and on their interpretation and precision.
Data mining
The technology used for collecting, storing, processing, transforming and analysing raw data in order to make it useful for gaining insights.
Feature engineering
One way to fine-tune a data model is by adding and improving its features. Feature engineering maps raw data to machine learning features; this might take the form of mapping text data into numeric categories that can then be more easily analysed within the model.
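A minimal sketch of mapping text data into numeric features, assuming pandas is available; the column names and values are made up:

```python
import pandas as pd

raw = pd.DataFrame({"colour": ["red", "blue", "green", "blue"],
                    "price": [10.0, 12.5, 9.0, 11.0]})

# One-hot encode the text column so each category becomes a 0/1 feature.
features = pd.get_dummies(raw, columns=["colour"])
print(features)

# Alternatively, map each category to a single integer code.
raw["colour_code"] = raw["colour"].astype("category").cat.codes
print(raw)
```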
Overfitting
A term describing a model that has been iterated on so many times that it performs more accurately on the test data than it would on any new data.
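The sketch below (assuming scikit-learn, with synthetic data) illustrates the symptom: an unconstrained model scores far better on the data it has already seen than on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained decision tree can effectively memorise the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Accuracy on data the model has seen:", tree.score(X_train, y_train))
print("Accuracy on new, unseen data:       ", tree.score(X_test, y_test))
```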
Regularisation
Regularisation, also referred to as ‘shrinkage’, is the process of adding information, typically a penalty on model complexity, as a technique for avoiding overfitting in your machine learning model.
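A minimal sketch (assuming scikit-learn and NumPy, with synthetic data) comparing an unregularised linear model with ridge (L2) and lasso (L1) regularisation, both of which shrink coefficients towards zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=50)   # only feature 0 matters

ols = LinearRegression().fit(X, y)                 # no penalty
ridge = Ridge(alpha=1.0).fit(X, y)                 # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)                 # L1 penalty can zero them out

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
```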
Cross validation
Cross validation is a process for evaluating a machine learning algorithm and also a technique for preventing overfitting. Nested cross validation is a method for tuning the parameters of an algorithm.
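A minimal sketch (assuming scikit-learn, with synthetic data) of plain cross validation and of nested cross validation, where an inner loop tunes a parameter and an outer loop estimates performance:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# Plain 5-fold cross validation: train on four folds, evaluate on the fifth.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())

# Nested cross validation: the inner GridSearchCV tunes the regularisation
# strength C, while the outer loop estimates how well the tuned model performs.
inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
nested_scores = cross_val_score(inner, X, y, cv=5)
print("Nested cross-validated accuracy:", nested_scores.mean())
```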