**DATA MINING**

### Data mining

The technology used for collecting, storing, processing, transforming and analysing raw data so that it can be used to gain insights.

**SOFTWARE**

### Microsoft Azure Machine Learning Studio

A drag-and-drop tool with a graphical user interface for building, testing and deploying predictive analytics solutions on your data.

### SAS Enterprise Miner

A solution for creating accurate predictive and descriptive data models using data mining and statistical techniques such as linear regression, clustering and classification (decision trees).

### Jupyter Notebooks

Jupyter Notebooks are used to explore datasets through an interactive browser-based environment in which you can add notes and run code to manipulate and visualise data. They support languages regularly used by data scientists such as R and Python.

**PROGRAMMING LANGUAGES**

### Object-oriented vs Procedural programming

Object-oriented programming is based on the concept of objects, which bundle structured data (fields, or attributes) together with the operations (methods) that can be applied to that data. Procedural programming, a form of imperative programming, focuses on explicit sequences of instructions that carry out a task.
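
Example: a minimal sketch of the same calculation written in both styles, using Python. The function name `mean_procedural`, the class name `Sample` and the data are hypothetical and chosen only for illustration.

```python
# Procedural style: the data and the steps that act on it are kept separate.
def mean_procedural(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)


# Object-oriented style: the object bundles its data with its methods.
class Sample:  # hypothetical class name
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


scores = [2, 4, 6, 8]  # made-up data
procedural_result = mean_procedural(scores)
oop_result = Sample(scores).mean()
print(procedural_result, oop_result)
```

Both approaches give the same answer; the difference lies in how the code is organised.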

### R

R is an open source programming language and software environment widely used for statistical analysis, testing and modelling.

### Python

Python is another open source language used for statistical analysis, testing and modelling. It supports object-oriented as well as procedural styles and is often used for building reusable code.

**VISUALISATIONS**

### Histogram

Similar in look to a bar chart except that the bars touch each other, a histogram is formed from grouped numerical data and displays the frequency or relative frequency (percentage) of each class in a sample.
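
Example: a rough sketch of how the frequencies behind a histogram could be computed in Python. The sample values and the class width of 10 are made up for illustration, and the bars are printed as text rather than drawn.

```python
from collections import Counter

values = [3, 7, 12, 15, 18, 21, 24, 25, 29, 33]  # made-up sample
class_width = 10  # assumed class width

# Assign each value to the class whose lower bound is a multiple of class_width.
counts = Counter((v // class_width) * class_width for v in values)

# Frequency of each class, labelled by its lower and upper bounds.
frequencies = {}
for lower in sorted(counts):
    label = f"{lower}-{lower + class_width - 1}"
    frequencies[label] = counts[lower]

# Relative frequency: the share of the sample that falls in each class.
relative = {label: n / len(values) for label, n in frequencies.items()}

# Text version of the histogram: one '#' per observation.
for label, n in frequencies.items():
    print(f"{label:>5} | {'#' * n}")
```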

### Scattergrams

A method of displaying the relationship (correlation) between two variables by plotting each observation as a point. A line of best fit can be added to show the overall trend and how far each observation deviates from it.
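
Example: a minimal sketch of fitting a line of best fit by ordinary least squares, which is one common way to draw such a line. The data points are made up, and the variable names are assumptions for illustration.

```python
xs = [1, 2, 3, 4, 5]  # made-up observations
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept of the line y = intercept + slope * x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residuals: how far each observation sits above or below the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

The residuals measure each point's distance from the line; for a least-squares fit they sum to zero, up to rounding.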

### Frequency polygon

A line chart in which the frequency of each class is plotted at the class mid-point, with the data grouped into classes, e.g. 0-10, 11-20, etc.
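
Example: a small sketch of how the points of a frequency polygon could be derived in Python. The class boundaries follow the grouping above; the frequencies are made up for illustration.

```python
# Each class is (lower bound, upper bound); frequencies are made-up counts.
classes = [(0, 10), (11, 20), (21, 30)]
frequencies = [4, 7, 3]

# The x-coordinate of each point is the mid-point of its class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Joining these (mid-point, frequency) points with lines gives the polygon.
points = list(zip(midpoints, frequencies))
print(points)
```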

### Venn diagram

Presented as two or more overlapping circles to demonstrate relationships between groups (sets) of items.

Example: Animals with two legs and animals that can fly. Some animals would appear in one group or the other, and some would fall in the overlap between both groups.
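
The regions of this Venn diagram can be sketched with Python sets. The animals listed are an illustrative selection, not an exhaustive classification.

```python
two_legs = {"human", "ostrich", "sparrow", "eagle"}
can_fly = {"sparrow", "eagle", "bee", "dragonfly"}

# The overlap of the two circles: animals in both groups.
both = two_legs & can_fly

# The non-overlapping part of each circle.
only_two_legs = two_legs - can_fly
only_can_fly = can_fly - two_legs

print(sorted(both), sorted(only_two_legs), sorted(only_can_fly))
```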

### Tree diagram

A branching diagram that shows the possible outcomes of a sequence of choices or events. In probability, each branch is labelled with the chance of that outcome.

Example: The first branch could be Europe, the second-level branches split out into Germany, France and Spain, and the third-level branches split out the various cities in those countries.
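
For the probability use of a tree diagram, a minimal Python sketch is shown here. It assumes two tosses of a fair coin, a scenario chosen purely for illustration: each path through the tree is one outcome, and its probability is the product of the probabilities along its branches.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities at each stage (a fair coin is assumed).
branches = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Walk every path through a two-stage tree and multiply along the branches.
outcomes = {}
for first, second in product(branches, repeat=2):
    outcomes[first + second] = branches[first] * branches[second]

for path, probability in outcomes.items():
    print(path, probability)
```

The probabilities of all the paths add up to 1, which is a useful check on any probability tree.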

### Box plots

A one-dimensional graph based on the five-number summary of numerical data: the minimum, lower quartile, median, upper quartile and maximum.
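
Example: a sketch of computing a five-number summary with Python's standard `statistics` module. The data are made up, and the `"inclusive"` quartile method is an assumption; other quartile conventions can give slightly different values.

```python
import statistics

data = sorted([7, 1, 13, 5, 9, 3, 11])  # made-up sample

# Quartiles: lower quartile, median and upper quartile.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Five-number summary: minimum, Q1, median, Q3, maximum.
summary = (min(data), q1, median, q3, max(data))
print(summary)
```

In a box plot the box spans the lower to the upper quartile, a line inside the box marks the median, and the whiskers typically extend to the minimum and maximum.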