The tidytext library in R, developed by Julia Silge and David Robinson, is one of the most innovative I’ve come across within the language, and it is a widely used library for text mining in R. Here, I conduct a […]

# Category: Data Science

# kNN: K-Nearest Neighbours Algorithm in R

The purpose of a k-nearest neighbours (kNN) algorithm is to classify observations. kNN is one of the simplest machine learning algorithms, and is very useful for solving classification problems. Here, we are using a winery dataset […]
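
As a rough illustration of the idea (in Python rather than R, and not the code from the post itself), a kNN classifier can be sketched in a few lines: measure the distance from a new observation to every labelled observation, keep the k closest, and take a majority vote. The function name `knn_predict` and the two-feature data below are made up for this sketch and are unrelated to the winery dataset.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical data: two well-separated groups labelled "A" and "B".
train_points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
                (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.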

# Big Data Helps You Target The Right Market

Companies in their infancy often make the same mistake: they sell the right product to the wrong market. The most frequent reason for this is that the company fails to take the time to understand its customers properly. […]

# psycopg2: Connect Python to PostgreSQL Database – Part II

In a previous tutorial, we looked at how to create a simple PostgreSQL database of temperatures across different world cities. The PostgreSQL database was created through a Linux terminal, and was then connected to R to import data/commit […]

# Create PostgreSQL Database In Linux And Connect To R – Part I

PostgreSQL is a widely used relational database management system for storing and managing large amounts of data effectively. Here, you will see how to: 1) create a PostgreSQL database using the Linux terminal 2) connect the PostgreSQL database to R using the […]

# Creating functions in R

Functions are used to simplify a series of calculations. For instance, let us suppose that there exists an array of numbers which we wish to add to another variable. Instead of carrying out separate calculations for each number in the […]

# Poisson and Cumulative Binomial Probabilities

A Poisson distribution is a probability distribution that gives the probability of a given number of independent events occurring within a fixed interval of time or space. For example, let us suppose that a prestigious college receives 1000 applications in a particular time interval. On […]
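
To make the definition concrete, here is a minimal Python sketch of the Poisson probability mass function, P(X = k) = e^(−λ) λ^k / k!, together with its cumulative form. The rate `lam = 4.0` and the helper names `poisson_pmf` and `poisson_cdf` are assumptions chosen for illustration; they are not the figures from the college example in the post.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k, lam):
    """Probability of observing at most k events (sum of the PMF from 0 to k)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 4.0  # hypothetical mean number of events per interval

print(f"P(X = 2)  = {poisson_pmf(2, lam):.4f}")
print(f"P(X <= 2) = {poisson_cdf(2, lam):.4f}")
```

Summing the PMF directly is fine for small k; for large counts a statistics library would be the safer choice numerically.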

# Creating maps in R using ggplot2 and maps libraries

Here is how we can use the maps, mapdata and ggplot2 libraries to create maps in R. In this particular example, we’re going to create a world map marking the locations of Beijing and Shanghai, both cities in China. For […]

# Python: Implementing a K-Means Algorithm With sklearn

Below is an example of how sklearn in Python can be used to develop a k-means clustering algorithm. The purpose of k-means clustering is to partition the observations in a dataset into a specific number of clusters […]

# Variance-Covariance Matrix in R (corpcor, covmat)

The following tutorial demonstrates how to calculate a variance-covariance matrix in R, along with a shrinkage estimate of the covariance and the conversion of a covariance matrix into a correlation matrix. The purpose of a variance-covariance matrix is to illustrate the variance of […]
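
The conversion step can be sketched as follows (in Python rather than R, and not the tutorial's own code): each covariance entry is divided by the product of the two corresponding standard deviations, which are the square roots of the diagonal entries. The function name `cov_to_corr` and the 2×2 matrix are invented for illustration.

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    n = len(cov)
    # Standard deviations are the square roots of the diagonal variances.
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    # corr[i][j] = cov[i][j] / (std[i] * std[j])
    return [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]


# Hypothetical covariance matrix: variances 4 and 9, covariance 2.
cov = [[4.0, 2.0],
       [2.0, 9.0]]

corr = cov_to_corr(cov)
for row in corr:
    print([round(value, 4) for value in row])
```

The diagonal of the result is always 1, since each variable is perfectly correlated with itself.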
