Machine learning overview

Machine learning

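Machine learning is the practice of having computers learn to solve problems from data rather than from explicitly programmed rules, including problems that are too large or complex for humans to work through by hand.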
An application of artificial intelligence, machine learning places an emphasis on big data and large-scale applications. A machine learning algorithm is trained on data and uses the statistical patterns it finds there to make forecasts and predictions.
In supervised machine learning, algorithms use the numeric features of examples whose outcomes are already known to detect patterns and trends in the variable we are trying to predict. That variable is called the ‘label’ and is commonly written as ‘y’. Once trained, the model maps new examples to predicted labels (‘y’) based on its internally learned parameters.
Supervised learning methods include regression and classification. Regression models predict continuous values, while classification models predict discrete categorical values, e.g. Yes/No.
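The difference between regression and classification can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the study-hours data is invented, the function names are hypothetical, and ordinary least squares and a nearest-centroid rule are just two simple stand-ins for the many supervised methods available.

```python
# Toy supervised learning in pure Python. All data and names are illustrative.

def fit_linear_regression(xs, ys):
    """Learn a slope and intercept from one feature by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_regression(params, x):
    """Map a new example to a continuous predicted label."""
    slope, intercept = params
    return slope * x + intercept


def fit_nearest_centroid(xs, labels):
    """Learn one centroid (mean feature value) per class label."""
    grouped = {}
    for x, label in zip(xs, labels):
        grouped.setdefault(label, []).append(x)
    return {label: sum(values) / len(values) for label, values in grouped.items()}


def predict_class(centroids, x):
    """Map a new example to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Feature: hours studied (invented values).
hours = [1.0, 2.0, 3.0, 4.0]

# Regression: the label y is a continuous exam score.
scores = [52.0, 54.0, 56.0, 58.0]
reg_params = fit_linear_regression(hours, scores)
predicted_score = predict_regression(reg_params, 5.0)

# Classification: the label y is a discrete category, Yes/No.
passed = ["No", "No", "Yes", "Yes"]
centroids = fit_nearest_centroid(hours, passed)
predicted_label = predict_class(centroids, 3.6)

print(predicted_score, predicted_label)
```

In both cases the "learned parameters" are small: a slope and intercept for the regression, and one centroid per class for the classifier.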
Unsupervised machine learning doesn’t focus on a particular known variable; instead it looks for similarities among observations across all of their variables. In clustering, a common unsupervised method, observations with similar characteristics are grouped together. Once the model is trained, new observations are assigned to their relevant cluster based on their characteristics.
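As a rough sketch of how observations end up in clusters, the following uses k-means, which is one common clustering algorithm among several. The observations, the starting centroids and the function names are assumptions made for illustration, and the starting centroids are fixed by hand so the result is repeatable.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        centroids = [
            # New centroid is the mean of its members; keep the old one if empty.
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


# Invented observations with two variables each and no label.
observations = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
]

# Training: find two cluster centres from the data alone.
cluster_centres = k_means(observations, initial_centroids=[observations[0], observations[3]])

# A new observation is assigned to the cluster whose centre is nearest.
new_cluster = nearest((7.5, 8.0), cluster_centres)
print(cluster_centres, new_cluster)
```

No label is supplied at any point; the grouping comes only from how close the observations are to one another.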

Deep learning

A subset of machine learning, deep learning is an artificial intelligence technique that uses layered artificial neural networks, loosely modelled on the workings of the human brain, to find patterns in the data it processes and assist with decision making.
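Deep learning models are typically built as artificial neural networks with several stacked layers. A minimal sketch of a forward pass through such a network is shown here. The weights and biases are hypothetical, hand-picked values, whereas a real network learns them from data, and the two-layer shape is an assumption chosen to keep the example small.

```python
import math


def relu(z):
    """Pass positive values through and clip negatives to zero."""
    return max(0.0, z)


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One layer: a weighted sum of the inputs per neuron, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical parameters, hand-picked for illustration only.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]  # two hidden neurons, two inputs each
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, 1.0]]               # one output neuron
output_biases = [-1.0]


def forward(inputs):
    """Feed the inputs through the hidden layer, then the output layer."""
    hidden = dense_layer(inputs, hidden_weights, hidden_biases, relu)
    return dense_layer(hidden, output_weights, output_biases, sigmoid)[0]


probability = forward([2.0, 1.0])
print(probability)
```

Each layer transforms the output of the one before it, which is where the "deep" in deep learning comes from; practical networks have many more layers and neurons than this.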

Big Data

Big data refers to extremely large datasets which are too complex to analyse for insight without the use of sophisticated statistical software. Typical sources include e-commerce platforms, internet search engines and social media applications.

Full stack

The term ‘full stack’ in data science generally refers to someone with ability and experience in all areas of data science:

  • Machine Learning
  • Big Data
  • ETL (Extract, Transform, Load)
  • Analysis techniques (Regression, Classification, etc.)
  • Statistics
  • Programming (usually Python or R)
  • Data modelling
  • Data visualisation and presentation

Beyond the technical side, a ‘full stack’ data scientist should have good business acumen and soft skills such as communication, leadership and the ability to influence others.