Here is how we can use the maps, mapdata and ggplot2 libraries to create maps in R. In this particular example, we’re going to create a world map showing the points of Beijing and Shanghai, both cities in China. For […]

# Category: Data Science

# Python: Implementing a K-Means Algorithm With sklearn

Below is an example of how sklearn in Python can be used to implement k-means clustering. The purpose of k-means clustering is to partition the observations in a dataset into a specific number of clusters […]
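
As a minimal sketch of the idea, the snippet below partitions a small synthetic dataset into two clusters with sklearn's `KMeans`. The data, the choice of two clusters and the variable names are assumptions made for illustration and are not taken from the full post.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): two well-separated blobs in 2D.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Partition the observations into a chosen number of clusters (here, 2).
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)      # cluster index assigned to each observation
centers = model.cluster_centers_   # one centroid per cluster, shape (2, 2)
```

The number of clusters has to be chosen up front; on real data it is usually worth comparing several values rather than fixing it as done here.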

# Variance-Covariance Matrix in R (corpcor, covmat)

The following tutorial demonstrates how to calculate a variance-covariance matrix in R, along with a shrinkage estimate of covariance and the conversion of a covariance matrix into a correlation matrix. The purpose of a variance-covariance matrix is to illustrate the variance of […]

# Linear Models in R: OLS and Logistic Regressions

We use linear models primarily to analyse cross-sectional data; i.e. data collected at one specific point in time across several observations. We can also use such models with time series data, but need to be cautious of issues such as serial […]

# Cross-Correlation Function In R (ccf)

When working with time series, one important thing we wish to determine is whether one series “causes” changes in another. In other words, is there a strong correlation between one time series and another at a given number of lags? […]

# Machine Learning and Statistics: Recommended Texts

For learning the latest machine learning and statistics techniques, I’ve found certain guides to be much more useful than others. The two main languages that I currently rely on are R and Python. I’ve found that R comes out on […]

# Implement an ARIMA model using statsmodels (Python)

In a previous tutorial, I elaborated on how an ARIMA model can be implemented using R. The model was fitted on a stock price dataset, with a (0,1,0) configuration being used for ARIMA. Here, I detail how to implement an […]

# Chow Test For Structural Breaks in Time Series

A Chow test is designed to determine whether a structural break exists in a time series, that is to say, a sharp change in trend that merits further study. For instance, a structural break in one […]
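
To make the idea concrete, here is a minimal sketch of the Chow F-statistic using only NumPy. It compares the residual sum of squares of one pooled linear fit with that of two separate fits either side of a candidate break. The synthetic series, the break position and the helper names `rss` and `chow_f_statistic` are all hypothetical choices for illustration.

```python
import numpy as np

def rss(x, y):
    """Residual sum of squares from an OLS fit of y on an intercept and x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def chow_f_statistic(x, y, break_index, k=2):
    """Chow F-statistic for a break at break_index; k = parameters per fit."""
    rss_pooled = rss(x, y)
    rss_split = rss(x[:break_index], y[:break_index]) + rss(x[break_index:], y[break_index:])
    n = len(y)
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

# Synthetic series (an assumption): the trend flips from rising to falling at t = 50.
rng = np.random.default_rng(0)
t = np.arange(100.0)
y = np.where(t < 50, 1 + 0.5 * t, 26 - 0.5 * (t - 50)) + rng.normal(0, 1, 100)

f_stat = chow_f_statistic(t, y, break_index=50)
```

Under the null hypothesis of no break, the statistic follows an F distribution with `k` and `n - 2k` degrees of freedom, so a large value relative to that distribution points to a structural break at the chosen position.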

# Text Mining and Search Analytics Using Python and R

The following example illustrates how text mining capabilities in Python and R can be used to analyse a text file containing a set of words, and how these words can be split into separate categories with the frequency […]

# neuralnet: Train and Test Neural Networks Using R

A neural network is a computational system frequently employed in machine learning to create predictions based on existing data. In this example, we will use the neuralnet package in R to train and test a neural network model. A typical […]
