Implement an ARIMA model using statsmodels (Python)

In a previous tutorial, I elaborated on how an ARIMA model can be implemented using R. The model was fitted on a stock price dataset, with a (0,1,0) configuration being used for ARIMA.

Here, I detail how to implement an ARIMA model in Python using the pandas and statsmodels libraries. I cover most of the theoretical “need-to-knows” in my previous article – this post is mainly to demonstrate how the same can be implemented in Python.

1. Load Libraries

Firstly, we load our libraries as standard. The library of major importance in this case is statsmodels, since we are using this library to calculate the ACF and PACF statistics, and also to formulate the ARIMA model:

import pandas
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import numpy as np
import math
from statsmodels.tsa.stattools import acf, pacf
import statsmodels.tsa.stattools as ts
from statsmodels.tsa.arima_model import ARIMA

2. Import csv and define “price” variable using pandas

variables = pandas.read_csv('jnj.csv')
price = variables['price']

3. Autocorrelation and Partial Autocorrelation Plots

lnprice=np.log(price)
lnprice
plt.plot(lnprice)
plt.show()
acf_1 =  acf(lnprice)[1:20]
plt.plot(acf_1)
plt.show()
test_df = pandas.DataFrame([acf_1]).T
test_df.columns = ['Pandas Autocorrelation']
test_df.index += 1
test_df.plot(kind='bar')
pacf_1 =  pacf(lnprice)[1:20]
plt.plot(pacf_1)
plt.show()
test_df = pandas.DataFrame([pacf_1]).T
test_df.columns = ['Pandas Partial Autocorrelation']
test_df.index += 1
test_df.plot(kind='bar')
result = ts.adfuller(lnprice, 1)
result

We see that statsmodels produces the autocorrelation and partial autocorrelation plots:

acf

pacf

Moreover, we have confirmation that our data follows an AR(1) stationary process (one with a constant mean, variance, and autocorrelation), and we see that the price plot now shows a stationary process:

lnprice_diff=lnprice-lnprice.shift()
diff=lnprice_diff.dropna()
acf_1_diff =  acf(diff)[1:20]
test_df = pandas.DataFrame([acf_1_diff]).T
test_df.columns = ['First Difference Autocorrelation']
test_df.index += 1
test_df.plot(kind='bar')
pacf_1_diff =  pacf(diff)[1:20]
plt.plot(pacf_1_diff)
plt.show()

4. ARIMA Model Generation

price_matrix=lnprice.as_matrix()
model = ARIMA(price_matrix, order=(0,1,0))
model_fit = model.fit(disp=0)
print(model_fit.summary())

Using the (0,1,0) configuration, our ARIMA model is generated:
arima

As previously mentioned, our data is in logarithmic format. Since we are analysing stock price, this format is necessary to account for compounding returns. However, once we have obtained the forecasts (for seven periods out in this case), then we can obtain the real price forecast by converting the logarithmic figure to an exponent:

predictions=model_fit.predict(122, 127, typ='levels')
predictions
predictionsadjusted=np.exp(predictions)
predictionsadjusted

predictions
 

Dataset

jnj.csv
 
Disclosure: The author is long JNJ at the time of writing. However, the example above is for illustration purposes only, and in no way constitutes promotion of a security or any other form of investment advice.