Category: Data Science: Machine learning

Machine learning software

Data visualisation software: there are numerous programs for creating data visualisations and dashboard reports, some of the most popular being Microsoft Power BI, Qlik, Tableau and Spotfire. Other products which specialise in infographics, animations and other visualisations include Google Sites, Illustrator and Unity. Microsoft Azure Machine Learning Studio: a drag-and-drop tool with a graphical user interface Read More …

Knowledge discovery

KDD stands for Knowledge Discovery in Databases. It is an attempt to formalise the knowledge discovery process and covers the creation of knowledge from structured and unstructured sources. There are five steps: Selection, Preprocessing, Transformation, Data Mining and Interpretation / Evaluation. CRISP-DM (Cross-Industry Standard Process for Data Mining) is another process to formalise the Read More …

Supervised vs Unsupervised learning

In supervised learning, the algorithm is given the dependent variable and looks at all the independent variables in order to make a prediction about the dependent variable. Examples include classification and regression. When training a supervised learning model, it is standard practice to split the data into a training dataset and a Read More …
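As a minimal sketch of the split described above, the snippet below shuffles a dataset and holds part of it back. The excerpt is cut off before it names the second dataset, so the held-out "test" portion here is an assumption, as are the function name `train_test_split`, the 25% fraction and the toy data. It uses only the Python standard library rather than any particular machine learning package.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: (independent variable, dependent variable) pairs.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data, test_fraction=0.25, seed=42)
```

The model would be fitted on `train` only, with `test` kept aside to check how well it predicts the dependent variable for observations it has not seen.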

Sentiment analysis

The process of categorising opinions expressed within a piece of text, usually to determine whether the comment was positive, negative or neutral. This type of analysis is prominent when analysing customer feedback on social media platforms. You can build a sentiment analysis model using Python, or even within Excel with the Read More …
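To illustrate the positive / negative / neutral categorisation in Python, here is a deliberately simple word-counting sketch. The word lists and the function name `classify_sentiment` are hypothetical and chosen only for illustration; a practical model would normally be trained on labelled text rather than rely on a hand-written list.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def classify_sentiment(text):
    """Label text as 'positive', 'negative' or 'neutral' by counting words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify_sentiment("I love this, great service")` counts two positive words and no negative ones, so it returns `"positive"`.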

Random forests

Random forests are an ensemble learning method which consists of numerous decision trees and can be used for both classification and regression analyses. An extension to the standard decision tree, a random forest can create numerous decision trees for a classification model and output the mode, or for regression the mean prediction of each decision Read More …
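The aggregation step described above (the mode for classification, the mean for regression) can be sketched as follows. This shows only how the individual trees' outputs are combined, not how the trees are built; the function names and the example predictions are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the most common class predicted by the trees (the mode)."""
    # On a tie, Counter.most_common keeps the class that was seen first.
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the mean of the numeric predictions made by the trees."""
    return mean(tree_predictions)


# Hypothetical outputs from three trees for a single observation.
class_result = aggregate_classification(["yes", "no", "yes"])
value_result = aggregate_regression([10.0, 12.0, 14.0])
```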

Classification

Classification is a data mining technique for predicting which class (or category) our data observations belong to, the simplest case being a Yes / No question. Whereas regression analysis predicts a numeric value, classification predicts a class label. A decision tree is one common classification method and is a particularly useful technique for breaking down very large datasets for Read More …

Clustering

Clustering is an ‘unsupervised’ analysis which categorises your observations into groups, or ‘clusters’. There are numerous variations, but in each case some form of distance measurement determines how close together or far apart the observations are. k-Means clustering: k-Means clustering is probably the most common partitioning method for Read More …
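As a rough sketch of k-Means with a distance measurement, the snippet below alternates between assigning each point to its nearest centroid (by Euclidean distance) and moving each centroid to the mean of its points. The function name `k_means`, the fixed number of iterations, the random choice of starting centroids and the sample points are all assumptions made for illustration; it needs Python 3.8 or later for `math.dist`.

```python
import math
import random


def k_means(points, k, iterations=20, seed=0):
    """Return (centroids, labels) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points

    def nearest(point):
        # Index of the centroid with the smallest Euclidean distance.
        return min(range(k), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to the mean of its cluster; keep it if empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest(p) for p in points]
    return centroids, labels


# Hypothetical observations forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(points, k=2)
```

The cluster numbers themselves are arbitrary; what matters is that the first three points share one label and the last three share the other.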

Customer segmentation

Sometimes referred to as market segmentation, this is the process of breaking down a population into groups, or samples, that share similar characteristics and differ identifiably from one another. This can be as simple as splitting your samples between men and women, or could be based on any other attribute about the population. Often a core Read More …
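The simplest form of segmentation mentioned above, splitting a population on a single attribute, might look like the sketch below. The customer records, the field names and the function name `segment_by` are hypothetical.

```python
from collections import defaultdict


def segment_by(customers, attribute):
    """Group customer names by the value of one attribute."""
    segments = defaultdict(list)
    for customer in customers:
        segments[customer[attribute]].append(customer["name"])
    return dict(segments)


# Hypothetical customer records.
customers = [
    {"name": "Ana", "gender": "F", "region": "North"},
    {"name": "Ben", "gender": "M", "region": "South"},
    {"name": "Cara", "gender": "F", "region": "South"},
]
by_gender = segment_by(customers, "gender")
by_region = segment_by(customers, "region")
```

Passing a different attribute name, such as `"region"`, segments the same population in another way without changing the function.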