Natural language processing (NLP)
Natural Language Processing is a branch of artificial intelligence that uses computer algorithms to understand human language. NLP aims to read, understand and learn from human languages and to provide insights about the data.
Category: Data Science: Machine learning
Machine learning software
Data visualisation software: There are numerous programs for creating data visualisations and dashboard reports, some of the most popular being Microsoft Power BI, Qlik, Tableau and Spotfire. Other products which specialise in infographics, animations and other visualisations include Google Sites, Illustrator and Unity. Microsoft Azure Machine Learning Studio: A drag-and-drop tool with a graphical user interface Read More …
Knowledge discovery
KDD stands for Knowledge Discovery in Databases, which covers the creation of knowledge from structured and unstructured sources in an attempt to formalise the knowledge discovery process. There are five steps: Selection, Preprocessing, Transformation, Data Mining, and Interpretation / Evaluation. CRISP-DM (CRoss Industry Standard Process for Data Mining) is another process to formalise the Read More …
Data modelling
Supervised vs Unsupervised learning
In supervised learning, the algorithm is given the dependent variable for each training observation and uses the independent variables to make a prediction about it, e.g. classification and regression. When training a supervised learning model, it is standard practice to split the data into a training dataset and a Read More …
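The excerpt is cut off before it names the second dataset, so the following is only a minimal sketch of the splitting step, assuming the remainder of the data is held back to evaluate the model. The function name `train_test_split`, the 25% hold-out fraction and the toy data are illustrative assumptions, not taken from the original post.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and return (train, held_out) lists.

    test_fraction and seed are illustrative defaults (assumptions).
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: (independent variable, dependent variable) pairs.
data = [(x, 2 * x + 1) for x in range(20)]
train, held_out = train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), len(held_out))
```

The model would be fitted on `train` only, so that `held_out` contains observations it has never seen.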
Sentiment analysis
Sentiment analysis is the process of categorising opinions expressed within a piece of text, usually to determine whether the comment is positive, negative or neutral. This type of analysis is prominent when analysing customer feedback on social media platforms. You can build a sentiment analysis model using Python, or even within Excel with the Read More …
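As a rough illustration of the positive / negative / neutral categorisation described above, here is a minimal word-list sketch in Python. The two word lists and the function name `classify_sentiment` are hypothetical, and a real model would normally be trained on labelled text rather than rely on a hand-written lexicon.

```python
import re

# Tiny hand-written word lists, purely illustrative (assumptions).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def classify_sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in ["I love this, great service",
                "Terrible and slow delivery",
                "The parcel arrived on Tuesday"]:
    print(comment, "->", classify_sentiment(comment))
```

This approach ignores negation and sarcasm ("not good" would count as positive), which is one reason trained models are usually preferred for customer feedback.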
Random forests
Random forests are an ensemble learning method consisting of numerous decision trees, and can be used for both classification and regression analyses. An extension of the standard decision tree, a random forest can create numerous decision trees for a classification model and output the mode, or for regression the mean prediction of each decision Read More …
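The aggregation step described above (the mode for classification, the mean for regression) can be sketched as follows. The "trees" here are hypothetical stand-in functions rather than trained decision trees, and the names `predict_class` and `predict_value` are assumptions made for illustration only.

```python
from statistics import mean, mode


def predict_class(trees, observation):
    """Classification: return the most common class across the trees (the mode)."""
    return mode([tree(observation) for tree in trees])


def predict_value(trees, observation):
    """Regression: return the mean of the trees' numeric predictions."""
    return mean([tree(observation) for tree in trees])


# Hypothetical stand-ins for trained classification trees.
class_trees = [
    lambda o: "yes" if o["age"] > 30 else "no",
    lambda o: "yes" if o["income"] > 40000 else "no",
    lambda o: "yes" if o["visits"] > 5 else "no",
]

# Hypothetical stand-ins for trained regression trees.
value_trees = [
    lambda o: 10.0,
    lambda o: 12.0,
    lambda o: 14.0,
]

customer = {"age": 35, "income": 30000, "visits": 8}
print(predict_class(class_trees, customer))
print(predict_value(value_trees, customer))
```

In a real random forest each tree is trained on a different random sample of the data, which is what makes the combined vote more stable than any single tree.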
Classification
Classification is a data mining technique for answering categorical questions, such as Yes / No. Whereas regression analysis predicts a numeric value, classification helps us predict which class (or category) our data observations belong to. Classification is often carried out with a decision tree, which is a particularly useful technique for breaking down very large datasets for Read More …
Clustering
Clustering is an ‘unsupervised’ analysis which categorises your observations into groups, or ‘clusters’. There are numerous variations but in each case there is some form of distance measurement to determine how close or far apart observations are within each cluster. k-Means clustering: k-Means clustering is probably the most common partitioning method for Read More …
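To make the role of the distance measurement concrete, here is a minimal k-Means sketch in plain Python using Euclidean distance. The sample points, the choice of the first k points as starting centroids and the iteration cap are all assumptions for illustration; library implementations use more careful initialisation.

```python
import math


def kmeans(points, k, iterations=10):
    """Return (centroids, clusters) after alternating assign and update steps.

    clusters holds the groups from the most recent assignment step.
    """
    # Naive initialisation (assumption): use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D observations forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The result depends on the starting centroids, so in practice the algorithm is often run several times with different starting points.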
Customer segmentation
Sometimes referred to as market segmentation, customer segmentation is the process of breaking down a population into groups, or samples, whose members share similar characteristics and which differ identifiably from one another. This can be as simple as splitting your samples between men and women, or could be based on any other attribute about the population. Often a core Read More …