In a previous tutorial, I elaborated on how an ARIMA model can be implemented using R. The model was fitted on a stock price dataset, with a (0,1,0) configuration being used for ARIMA.

Here, I detail how to implement an ARIMA model in Python using the pandas and statsmodels libraries. I cover most of the theoretical “need-to-knows” in my previous article – this post is mainly to demonstrate how the same can be implemented in Python.

# 1. Load Libraries

First, we load our libraries. The most important one here is statsmodels, which we use to calculate the ACF and PACF statistics and to fit the ARIMA model:

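    import pandas
    import matplotlib.mlab as mlab
    import matplotlib.pyplot as plt
    import numpy as np
    import math
    from statsmodels.tsa.stattools import acf, pacf
    import statsmodels.tsa.stattools as ts
    # Legacy import path: available in older statsmodels releases only.
    # Newer releases provide ARIMA under statsmodels.tsa.arima.model instead.
    from statsmodels.tsa.arima_model import ARIMA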

# 2. Import the CSV and define the “price” variable using pandas

    variables = pandas.read_csv('jnj.csv')
    price = variables['price']

# 3. Autocorrelation and Partial Autocorrelation Plots

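    # Work with the natural log of price
    lnprice = np.log(price)
    lnprice
    plt.plot(lnprice)
    plt.show()

    # Autocorrelation for lags 1 to 19 (lag 0 is always 1, so it is skipped)
    acf_1 = acf(lnprice)[1:20]
    plt.plot(acf_1)
    plt.show()
    test_df = pandas.DataFrame([acf_1]).T
    test_df.columns = ['Pandas Autocorrelation']
    test_df.index += 1
    test_df.plot(kind='bar')
    plt.show()

    # Partial autocorrelation for lags 1 to 19
    pacf_1 = pacf(lnprice)[1:20]
    plt.plot(pacf_1)
    plt.show()
    test_df = pandas.DataFrame([pacf_1]).T
    test_df.columns = ['Pandas Partial Autocorrelation']
    test_df.index += 1
    test_df.plot(kind='bar')
    plt.show()

    # Augmented Dickey-Fuller test on the log price, with a maximum lag of 1
    result = ts.adfuller(lnprice, 1)
    result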

We see that statsmodels produces the autocorrelation and partial autocorrelation plots:
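For intuition about what these autocorrelation values measure, here is a minimal sketch of the sample autocorrelation computed by hand with NumPy. The `sample_acf` helper and the synthetic random walk are my own illustrations, not part of statsmodels or the JNJ dataset, and I am assuming the common estimator that divides each lagged sum by the lag-0 sum of squares.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation of x at lags 0..nlags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[lag:] * centered[:len(x) - lag]) / denom
        for lag in range(nlags + 1)
    ])


# Synthetic random walk standing in for a log price series (not the JNJ data)
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(0.0, 0.01, size=500))

acf_values = sample_acf(walk, nlags=5)
print(acf_values)
```

For a trending or random-walk series like this one, the values decay very slowly from 1, which is the pattern that usually suggests differencing.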

Moreover, the log price series behaves like an AR(1) process, and once we take its first difference we see a stationary process (one with a constant mean, variance, and autocorrelation):

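    # First difference of the log price (the first value is NaN, so drop it)
    lnprice_diff = lnprice - lnprice.shift()
    diff = lnprice_diff.dropna()

    # Autocorrelation of the differenced series
    acf_1_diff = acf(diff)[1:20]
    test_df = pandas.DataFrame([acf_1_diff]).T
    test_df.columns = ['First Difference Autocorrelation']
    test_df.index += 1
    test_df.plot(kind='bar')
    plt.show()

    # Partial autocorrelation of the differenced series
    pacf_1_diff = pacf(diff)[1:20]
    plt.plot(pacf_1_diff)
    plt.show()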
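To illustrate why differencing matters, the sketch below compares the lag-1 autocorrelation of a series in levels with that of its first difference. The data is a simulated random walk with drift (an assumption for illustration, not the JNJ prices), and `lag1_autocorr` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical log prices: a random walk with a small drift
log_price = np.log(100.0) + np.cumsum(rng.normal(0.0005, 0.01, size=1000))

# First difference of the log price, i.e. the log returns
log_returns = np.diff(log_price)


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation (hypothetical helper)."""
    centered = x - x.mean()
    return np.sum(centered[1:] * centered[:-1]) / np.sum(centered ** 2)


level_r1 = lag1_autocorr(log_price)
diff_r1 = lag1_autocorr(log_returns)
print("lag-1 autocorrelation, levels:      ", level_r1)
print("lag-1 autocorrelation, differences: ", diff_r1)
```

In this simulation the levels are almost perfectly correlated with their own past, while the differences show close to no autocorrelation, which is what we would hope to see before settling on d=1.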

# 4. ARIMA Model Generation

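    # Convert the log price Series to a NumPy array
    # (as_matrix() has been removed from recent pandas releases)
    price_matrix = lnprice.values

    # Fit ARIMA(0,1,0): no AR terms, one difference, no MA terms
    model = ARIMA(price_matrix, order=(0,1,0))
    model_fit = model.fit(disp=0)
    print(model_fit.summary())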

Using the (0,1,0) configuration, our ARIMA model is generated:
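An ARIMA(0,1,0) model with a constant is a random walk with drift, so its forecasts can be reasoned about by hand. The sketch below is my own simplified illustration of that idea, not the statsmodels implementation: it assumes the drift is estimated as the mean of the first differences, and `random_walk_forecast` and the sample prices are made up for the example.

```python
import numpy as np


def random_walk_forecast(log_series, steps):
    """Forecast a random walk with drift `steps` periods ahead (hypothetical helper)."""
    log_series = np.asarray(log_series, dtype=float)
    # Drift estimated as the average one-period change
    drift = np.mean(np.diff(log_series))
    horizons = np.arange(1, steps + 1)
    # Each forecast is the last observed value plus the accumulated drift
    return log_series[-1] + drift * horizons


# Made-up prices for illustration only
log_series = np.log([100.0, 101.0, 103.0, 102.0, 105.0])
forecast = random_walk_forecast(log_series, steps=3)
print(forecast)
```

The forecasts form a straight line in log terms, starting from the last observation and rising by the estimated drift each period.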

As previously mentioned, our data is in logarithmic format. Since we are analysing a stock price, this format lets us account for compounding returns. However, once we have obtained the forecasts (for six periods out in this case, observations 122 to 127), we can obtain the real price forecast by taking the exponential of the logarithmic figure:

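    # Forecast the log price (in levels) for observations 122 to 127
    predictions = model_fit.predict(122, 127, typ='levels')
    predictions

    # Convert the log forecasts back to prices
    predictionsadjusted = np.exp(predictions)
    predictionsadjusted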
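As a quick check on the back-transformation, the sketch below uses made-up log forecasts (not the model output above) to show that `np.exp` reverses `np.log`, and that a 0.01 step in log terms corresponds to roughly a 1% change in price.

```python
import numpy as np

# Made-up log price forecasts for illustration only
log_forecasts = np.array([4.60, 4.61, 4.62])

# Exponentiate to get back to price units
price_forecasts = np.exp(log_forecasts)

# Taking the log again recovers the original figures
round_trip = np.log(price_forecasts)

# A 0.01 increase in log price is approximately a 1% increase in price
pct_change = price_forecasts[1] / price_forecasts[0] - 1

print(price_forecasts)
print(pct_change)
```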

# Dataset

jnj.csv

Disclosure: The author is long JNJ at the time of writing. However, the example above is for illustration purposes only, and in no way constitutes promotion of a security or any other form of investment advice.