Data Science

SOFTWARE

Microsoft Azure Machine Learning Studio

A drag-and-drop tool with a graphical user interface for building, testing and deploying predictive analytics solutions on your data.

 

SAS Enterprise Miner

A solution for creating predictive and descriptive data models using data mining and statistical techniques such as linear regression, clustering and classification (for example, decision trees).

 

Jupyter Notebooks

Jupyter Notebooks are used to explore datasets through an interactive browser-based environment in which you can add notes and run code to manipulate and visualise data. They support languages regularly used by data scientists such as R and Python.

 

 

PROGRAMMING LANGUAGES

R

R is an open source programming language and software environment widely used for statistical analysis, testing and modelling.

 

Python

Python is a general-purpose, open source programming language that is also widely used for statistical analysis, testing and modelling.

 

 

VISUALISATIONS

Histogram

Similar in look to a bar chart except that the bars touch each other, a histogram is built from grouped numerical data and displays the frequency or relative frequency (percentage) of each class in a sample.
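The grouping step behind a histogram can be sketched in a few lines of Python. This is only an illustration: the sample values, the class width of 10 and the function name class_frequencies are all made up for the example, and the text bars stand in for a real chart.

```python
from collections import Counter

def class_frequencies(values, class_width):
    """Count how many whole-number values fall into each class of the given width."""
    # Each value is mapped to the lower bound of its class, e.g. 17 -> 10.
    counts = Counter((v // class_width) * class_width for v in values)
    return dict(sorted(counts.items()))

ages = [3, 7, 12, 15, 18, 21, 24, 25, 33, 38]  # made-up sample
freq = class_frequencies(ages, 10)

# Relative frequency: each class as a percentage of the whole sample.
rel_freq = {start: 100 * n / len(ages) for start, n in freq.items()}

# Crude text histogram: one '#' per observation in the class.
for start, n in freq.items():
    print(f"{start}-{start + 9}: {'#' * n} ({rel_freq[start]:.0f}%)")
```

The same counts could be handed to any charting tool; the point here is only how raw values become class frequencies.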

 

Scattergrams

A method of displaying the relationship (correlation) between two variables by plotting each observation as a point. A line of best fit can be added to show the overall trend and how far each observation deviates from it.
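As a rough sketch of what a line of best fit involves, the Python below fits a straight line by ordinary least squares and measures how far each observation sits from it. Least squares is assumed here as the fitting method, and the data and the function name least_squares_line are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (left unnormalised, the n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4, 5]        # made-up x values
scores = [52, 55, 61, 64, 68]  # made-up y values

slope, intercept = least_squares_line(hours, scores)

# Residual: vertical distance of each observation from the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(hours, scores)]
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Small residuals mean the points sit close to the line; large ones mark observations that stray from the overall trend.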

 

Frequency polygon

A line chart plotted through the mid-point of each class of grouped data, e.g. classes of 0-10, 11-20, etc., with the height of each point showing the frequency of that class.
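The points of a frequency polygon can be worked out directly from the class boundaries. The short Python sketch below assumes the classes from the description and uses made-up frequencies.

```python
classes = [(0, 10), (11, 20), (21, 30), (31, 40)]  # class boundaries
frequencies = [4, 9, 6, 2]                         # made-up counts per class

# Each point sits at the mid-point of its class, at the height of its frequency.
points = [((low + high) / 2, count) for (low, high), count in zip(classes, frequencies)]

for mid_point, count in points:
    print(mid_point, count)
```

Joining these points with straight lines gives the polygon.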

 

Venn diagram

Presented as two or more overlapping circles to show the relationships between groups (sets) of items.

Example: Animals with two legs and animals that can fly. Some animals fall into only one group, and some fall into the overlap because they belong to both.
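The regions of a two-circle Venn diagram correspond to simple set operations. The Python sketch below uses a small, made-up list of animals to show which fall in each region.

```python
two_legs = {"ostrich", "penguin", "human", "eagle", "sparrow"}
can_fly = {"eagle", "sparrow", "bee", "dragonfly"}

both = two_legs & can_fly           # the overlap of the two circles
only_two_legs = two_legs - can_fly  # left circle only
only_can_fly = can_fly - two_legs   # right circle only

print(sorted(only_two_legs), sorted(both), sorted(only_can_fly))
```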

 

Tree diagram

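Shows the possible outcomes as a set of branches, with each level of branching representing a further input or choice. Probabilities can be attached to the branches to work out how likely each combined outcome is.

Example: The first branch could be Europe; the second level of branches splits out Germany, France and Spain, and the third level splits out the various cities in those countries.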
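When probabilities are attached to a tree diagram, the chance of reaching any end point is found by multiplying the probabilities along its branches. The Python sketch below applies that to the Europe example; every probability and the choice of cities are invented purely for illustration.

```python
from fractions import Fraction as F

# Made-up tree: country -> (probability of country, {city: probability given country}).
tree = {
    "Germany": (F(1, 2), {"Berlin": F(3, 5), "Munich": F(2, 5)}),
    "France": (F(3, 10), {"Paris": F(4, 5), "Lyon": F(1, 5)}),
    "Spain": (F(1, 5), {"Madrid": F(1, 2), "Barcelona": F(1, 2)}),
}

# Multiply along each path from the root to a leaf.
leaf_probabilities = {
    (country, city): p_country * p_city
    for country, (p_country, cities) in tree.items()
    for city, p_city in cities.items()
}

for path, probability in leaf_probabilities.items():
    print(path, probability)
```

Because the branches at each split add up to 1, the probabilities of all the end points also add up to 1.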

 

Box plots

A one-dimensional graph drawn from the five-number summary of numerical data: the minimum, lower quartile, median, upper quartile and maximum.
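The five-number summary can be computed with Python's standard statistics module (version 3.8 or later). This is a minimal sketch on made-up data; note that several conventions exist for calculating quartiles, and the "inclusive" method chosen here is just one of them, so other tools may give slightly different values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

sample = [7, 2, 9, 4, 12, 5, 8, 4, 11]  # made-up sample
summary = five_number_summary(sample)
print(summary)
```

In a box plot, the box spans the lower to upper quartile, the line inside it marks the median, and the whiskers typically reach out to the minimum and maximum.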

 
