Let’s take a look at how we can do some statistical analysis and visualizations with JavaScript. Admittedly, JavaScript is not a language one particularly associates with data science; it has traditionally belonged to web developers. That said, I’ve […]

# Category: Data Science

# K-Nearest Neighbors (KNN): Solving Classification Problems

In this tutorial, we are going to use the K-Nearest Neighbors (KNN) algorithm to solve a classification problem. Firstly, what exactly do we mean by classification? Classification across a variable means that results are categorised into a particular group. e.g. […]

# VLOOKUP and SUMIF: Replicate in Python

Often, a new Python user will wish to replicate analysis previously done in Excel. Two major instances of this are the VLOOKUP and SUMIF functions. VLOOKUP: Combining data through a common index. SUMIF: Summing up values by category […]
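As a rough illustration of the two operations named above, here is a minimal sketch using only the Python standard library. The sample data and the helper names `vlookup` and `sumif` are hypothetical, invented for this example; they are not taken from the tutorial, which may use a different approach or library.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration only.
prices = {"A100": 2.50, "B200": 4.00, "C300": 1.25}  # lookup table keyed by product code
orders = [
    {"code": "A100", "category": "fruit", "qty": 4},
    {"code": "B200", "category": "dairy", "qty": 2},
    {"code": "C300", "category": "fruit", "qty": 8},
]


def vlookup(key, table, default=None):
    """VLOOKUP-style helper: fetch the value stored under a common index."""
    return table.get(key, default)


def sumif(rows, category_field, value_field):
    """SUMIF-style helper: total a numeric field for each category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category_field]] += row[value_field]
    return dict(totals)


# Combine the two data sets through the shared "code" index.
for order in orders:
    order["price"] = vlookup(order["code"], prices)

# Sum quantities by category.
qty_by_category = sumif(orders, "category", "qty")
print(qty_by_category)
```

A dictionary lookup plays the role of the common index here, and the grouped sum accumulates one running total per category.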

# neuralnet: Train and Test Neural Networks Using R

A neural network is a computational system that creates predictions based on existing data. Let us train and test a neural network using the neuralnet library in R. How To Construct A Neural Network? A neural network consists of: Input […]

# matplotlib: Generating line and pie charts in Python

Let’s take a look at how we can generate plots in Python. matplotlib is a particularly powerful library that we can use to generate visualisations. Let’s see how this works using a couple of examples. In a previous tutorial, we […]

# Cross Correlation Analysis: Analysing Currency Pairs in Python

When working with a time series, one important thing we wish to determine is whether one series “causes” changes in another. In other words, is there a strong correlation between a time series and another given a number of lags? […]
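To make the idea of correlation "given a number of lags" concrete, the sketch below computes the Pearson correlation between one series and a lagged copy of another, using only the standard library. The series values and the function names `pearson` and `cross_correlation` are assumptions made for this illustration, not the tutorial's own code or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cross_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            result[lag] = pearson(x, y)
        else:
            # Drop the last `lag` points of x and the first `lag` points of y
            # so that x[t] lines up with y[t + lag].
            result[lag] = pearson(x[:-lag], y[lag:])
    return result


# Made-up series: y repeats x two steps later.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
y = [0.0, 0.0] + x[:-2]

ccf = cross_correlation(x, y, max_lag=3)
best_lag = max(ccf, key=ccf.get)
print(best_lag, ccf)
```

A high correlation at some lag shows only that one series tends to move after the other; on its own it does not establish that one causes the other.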

# Huber vs. Ridge Regressions: Accounting for Outliers

In a previous tutorial, we saw how we can use Huber and Bisquare weightings to adjust for outliers in a dataset. These weightings allow us to adjust our regression analysis to give less weight to extreme values. The previous analysis […]

# pykalman: Analysis of USD/CHF with Kalman Filter

In a previous tutorial, we saw how the Kalman Filter can account for “shocks”, or sudden changes in a time series. The analysis was done within R. Let’s now see how we can analyse the USD/CHF currency pair with the […]

# Kalman Filter: Modelling Time Series Shocks with KFAS in R

We have already seen how time series models such as ARIMA can be used to make time series forecasts. While these models can prove to have high degrees of accuracy, they have one major shortcoming – they do not account […]

# Working with panel data in R: Fixed vs. Random Effects (plm)

Panel data, along with cross-sectional and time series data, are the main data types that we encounter when working with regression analysis. Types of data: Cross-Sectional: Data collected at one particular point in time. Time Series: Data collected across several […]
