There are numerous techniques you can use to model your data; which one you choose depends on the problem you want to solve:
- Classification may be used to solve a Yes/No question
- Regression can be used to predict a numerical value
- Clustering can group observations into similar-looking groups
When you have specific definitions to group your data by, predictive modelling can be a useful alternative to clustering. Variables found to be statistically significant predictors of another variable can be used to define segmentations for your analysis.
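As an illustration of the clustering technique above, k-means is one common approach: it groups observations around k "centre" values. The sketch below is a minimal one-dimensional version in plain Python; the age data and the choice of k are invented purely for illustration.

```python
# Minimal 1-D k-means sketch: group values around k centroids.
def kmeans_1d(values, k, iterations=10):
    # Initialise centroids with the first k distinct values (a naive choice).
    centroids = sorted(set(values))[:k]
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        clusters = {c: [] for c in centroids}
        for v in values:
            nearest = min(centroids, key=lambda c: abs(v - c))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its assigned values.
        centroids = sorted(
            sum(vs) / len(vs) for vs in clusters.values() if vs
        )
    return centroids

ages = [18, 21, 22, 45, 48, 50, 70, 72]
print(kmeans_1d(ages, k=3))  # three group centres emerge from the data
```

In a segmentation context, each resulting centre could seed a customer group; the alternative described above is to define such groups up front from statistically significant predictor variables instead.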
Statistical learning places more emphasis on mathematics and statistical models, along with their various interpretations and levels of precision.
The technology used for collecting, storing, processing, transforming and analysing raw data in order to make it useful for gaining insights.
One way to fine-tune a data model is by adding and improving its features. Feature engineering maps raw data to machine learning features; for example, text data might be mapped to numeric categories that can then be more easily analysed within the model.
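One common way to map text data to numeric features, as described above, is one-hot encoding: each known category becomes its own 0/1 flag. A minimal sketch in plain Python (the colour categories here are invented for illustration):

```python
# Feature engineering sketch: turn a text category into numeric features
# via one-hot encoding, so a model can work with numbers rather than strings.
def one_hot(value, categories):
    """Return a list of 0/1 flags, one per known category."""
    return [1 if value == c else 0 for c in categories]

colours = ["red", "green", "blue"]
raw_rows = ["green", "red", "blue"]
encoded = [one_hot(row, colours) for row in raw_rows]
print(encoded)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Real pipelines would typically use a library encoder for this, but the idea is the same: each raw text value becomes a row of numeric features the model can consume.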
A term describing a model that has been iterated against the same data so many times that it performs better on that data than it ever would on new, unseen data.
Regularisation, also referred to as ‘shrinkage’, is the process of adding a penalty for complexity to your model, shrinking its coefficients towards zero as a technique for avoiding overfitting.
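The shrinkage effect can be seen in a minimal sketch of ridge regression (one well-known regularisation technique) on a single feature with no intercept; the data points and penalty strengths below are made up for illustration. As the penalty `lam` grows, the fitted coefficient shrinks towards zero:

```python
# Regularisation sketch: ridge regression on one feature (no intercept).
# Closed-form solution minimising sum((y - w*x)^2) + lam * w^2,
# which gives w = sum(x*y) / (sum(x^2) + lam).
def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]
for lam in (0.0, 1.0, 10.0):
    # A larger penalty produces a smaller (shrunken) coefficient.
    print(lam, ridge_slope(xs, ys, lam))
```

With `lam = 0` this reduces to ordinary least squares; each larger penalty pulls the coefficient further towards zero, trading a little fit on the known data for better behaviour on new data.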
Cross-validation is a process for evaluating a machine learning algorithm that also helps to detect and prevent overfitting. Nested cross-validation extends this with an inner loop for tuning the algorithm’s parameters, so the outer evaluation remains unbiased.
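The core of cross-validation is the k-fold split: the data is divided into k roughly equal folds, and each fold takes a turn as the held-out test set while the rest form the training set. A minimal index-generating sketch in plain Python (a real workflow would train and score a model inside the loop, which is omitted here):

```python
# Cross-validation sketch: generate k-fold train/test index splits.
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    # Distribute any remainder across the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i not in test]
        yield train, test
        start += size

# Each of the 10 observations appears in exactly one test fold.
for train, test in k_fold_indices(10, 3):
    print(len(train), test)
```

Averaging the model’s score across the k held-out folds gives a more reliable estimate of performance on new data than a single train/test split; nested cross-validation simply repeats this splitting inside each training fold to choose parameters.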