Below is an example of how scikit-learn (sklearn) in Python can be used to implement k-means clustering. The purpose of k-means clustering is to partition the observations in a dataset into a specified number of clusters […]
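As a rough illustration of the idea, the sketch below fits k-means to two synthetic blobs of points. The data, the choice of two clusters, and the variable names are assumptions made for this example and are not taken from the full post.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Partition the observations into a chosen number of clusters (here, 2).
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)      # cluster index assigned to each row of X
centers = model.cluster_centers_   # one centroid per cluster

print(labels[:5], labels[-5:])
print(centers)
```

With blobs this far apart, each blob should end up in its own cluster; on real data the number of clusters usually has to be chosen by inspection or with a diagnostic such as an elbow plot.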

# Category: Python

# Implement an ARIMA model using statsmodels (Python)

In a previous tutorial, I showed how an ARIMA model can be implemented in R. The model was fitted to a stock price dataset using an ARIMA(0,1,0) configuration. Here, I detail how to implement an […]

# Text Mining and Search Analytics Using Python and R

The following example illustrates how text mining capabilities in Python and R can be used to analyse a text file containing a set of words, and how these words can be split into separate categories with the frequency […]
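A word-frequency count is the usual starting point for this kind of analysis. The sketch below uses only the Python standard library; the sample sentence and the helper name `word_frequencies` are hypothetical and stand in for the text file discussed in the post.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lower-case the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text standing in for the contents of a file.
sample = "the cat sat on the mat and the dog sat too"
freq = word_frequencies(sample)

# The two most frequent words and their counts.
top = freq.most_common(2)
print(top)
```

To analyse a real file, the `sample` string could be replaced with the result of reading the file's contents.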

# Python-SQL Interaction (MySQLdb): Azure ML Studio

The following tutorial illustrates how to use Microsoft Azure to create a database and execute SQL commands using Python. Specifically, this tutorial is divided into the following two segments: How to configure Azure to set up a storage account, SQL […]

# MySQLdb: Connect Python and MySQL Databases Together

In a previous tutorial, we set up a financial database using a range of MySQL queries, and used those queries to create separate tables and distinguish among the data in those tables. However, there are many occasions when a user needs […]

# Linear regression in Python: Use of numpy, scipy, and statsmodels

The numpy, scipy, and statsmodels libraries are frequently used to generate regression output. Although all three are common in regression analysis, a user might choose different libraries depending on the […]

# Poisson Distribution: Generate Using Python and R

A Poisson distribution is a probability distribution that gives the probability of a given number of independent events occurring within a fixed interval of time or space. For example, suppose that a trader makes 10 trades per day. On average, 4 trades are […]
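As a sketch of the underlying calculation, the code below evaluates the Poisson probability mass function directly and draws random samples with numpy. The rate of 4 events per interval is an assumed value chosen for illustration, and `poisson_pmf` is a hypothetical helper name.

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4  # assumed average number of events per interval

# Probability of observing exactly k events, for k = 0..10.
probabilities = [poisson_pmf(k, lam) for k in range(11)]

# Generate simulated counts from the same distribution.
rng = np.random.default_rng(0)
samples = rng.poisson(lam=lam, size=10_000)

print(probabilities[:3])
print(samples.mean())
```

The sample mean should land close to the assumed rate, since the mean of a Poisson distribution equals its rate parameter.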

# Normal Distributions, Monte Carlo Simulations and Random Walks

A key concept in dealing with statistical data is the central limit theorem (often conflated with the law of large numbers): the more observations we average across any particular dataset, the more closely the distribution of those averages will resemble a normal distribution where the majority of results […]
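A small Monte Carlo simulation can make this tangible. The sketch below simulates many simple random walks with numpy and summarises where they end up; the number of walks, the number of steps, and the ±1 step size are all assumptions chosen for this example.

```python
import numpy as np

n_walks, n_steps = 5000, 100   # assumed simulation size
rng = np.random.default_rng(0)

# Each step is +1 or -1 with equal probability.
steps = rng.choice([-1, 1], size=(n_walks, n_steps))

# A walk is the running sum of its steps; keep the final position of each.
walks = np.cumsum(steps, axis=1)
final_positions = walks[:, -1]

print(final_positions.mean())  # expected to be near 0
print(final_positions.std())   # expected to be near sqrt(n_steps) = 10
```

A histogram of `final_positions` should look approximately bell-shaped, because each final position is a sum of many independent steps.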
