When it comes to time series forecasts, conventional models such as ARIMA are a popular option. While these models can be highly accurate, they have one major shortcoming – they do not typically account for […]

# Working with panel data in R: Fixed vs. Random Effects (plm)

Panel data, along with cross-sectional and time series data, are the main data types that we encounter when working with regression analysis.

# Robust Regressions: Dealing with Outliers

It is often the case that a dataset contains significant outliers, that is, observations that lie far outside the range of most other observations in the dataset. Let us see how we can use robust regressions to deal […]

# Variance-Covariance Matrix: Stock Price Analysis in R (corpcor, covmat)

A variance-covariance matrix shows the variance of each variable on its diagonal, while the off-diagonal entries show the covariance between every pair of variables.
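To make that layout concrete, here is a minimal sketch in plain Python rather than the R packages named in the title. The function name `covariance_matrix` and the two return series are hypothetical, made up purely for illustration, and the sample (n - 1) divisor is an assumption.

```python
from statistics import mean

def covariance_matrix(columns):
    """Return (names, matrix) with the sample variance-covariance matrix
    for a dict of equal-length numeric series."""
    names = list(columns)
    n = len(columns[names[0]])
    means = {name: mean(columns[name]) for name in names}
    matrix = []
    for a in names:
        row = []
        for b in names:
            # Sum of cross-products of deviations from each series' mean
            total = sum(
                (x - means[a]) * (y - means[b])
                for x, y in zip(columns[a], columns[b])
            )
            row.append(total / (n - 1))  # sample divisor (assumption)
        matrix.append(row)
    return names, matrix

# Hypothetical return series, invented for this example
returns = {
    "stock_a": [1.0, 2.0, 3.0, 4.0],
    "stock_b": [2.0, 4.0, 6.0, 8.0],
}
names, cov = covariance_matrix(returns)

for name, row in zip(names, cov):
    print(name, [round(value, 4) for value in row])
```

The diagonal entries `cov[0][0]` and `cov[1][1]` are the variances of each series, and the matrix is symmetric because the covariance of A with B equals that of B with A.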

# Sentiment Analysis with twitteR and tidytext

A sentiment analysis is a useful way of gauging group opinion on a certain topic at a particular point in time. Using social media data, let us see how we can use the twitteR library to stream tweets from Twitter […]

# Decision Trees with Python

Let’s take a look at how we can construct decision trees in Python. A decision tree is a model used to solve classification and regression tasks. As we saw in our example for R, the model allows us to generate […]

# Voice Recognition with Python (speech_recognition and PyAudio)

Python has quite a handy library called speech_recognition, which we can use to create a program where a user’s voice can be transcribed into text. Let’s have a look at how we can do this. Note that I’m using Python […]

# Cumulative Binomial Probability with R and Shiny

In probability analysis, the two variables that determine the chance of an event happening are N (number of observations) and λ (lambda – our hit rate/chance of occurrence in a single interval). When we talk about a […]
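As a rough illustration of how N and the hit rate combine, the sketch below computes a cumulative binomial probability in plain Python rather than R and Shiny. The function names and the example values (10 observations, a hit rate of 0.5) are assumptions chosen for illustration; `p` plays the role of the hit rate that the text calls λ.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k hits in n observations with hit rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def cumulative_binomial(k, n, p):
    """Probability of at most k hits in n observations, P(X <= k)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Hypothetical inputs, chosen only for this example
n_obs = 10      # N: number of observations
hit_rate = 0.5  # chance of occurrence in a single interval

at_most_2 = cumulative_binomial(2, n_obs, hit_rate)
at_least_1 = 1 - cumulative_binomial(0, n_obs, hit_rate)

print(f"P(X <= 2) = {at_most_2:.4f}")
print(f"P(X >= 1) = {at_least_1:.4f}")
```

The "at least one hit" case is taken as the complement of "no hits", which avoids summing over every positive count.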

# Linear and Logistic Regression Modelling in Python

The statsmodels and sklearn libraries are frequently used to generate regression output. However, it is often the case that a user needs to work with different libraries depending on […]

# plyr and dplyr: Data Manipulation in R

The purpose of the plyr and dplyr libraries in R is to manipulate data with ease. As we’ve seen in a previous post, there are various methods of wrangling and summarising data in R. However, wouldn’t it be great if […]
