In a previous tutorial, I elaborated on how an ARIMA model can be implemented using R. The model was fitted on a stock price dataset, with a (0,1,0) configuration being used for ARIMA.
Here, I detail how to implement an ARIMA model in Python using the pandas and statsmodels libraries. I cover most of the theoretical “need-to-knows” in my previous article – this post is mainly to demonstrate how the same can be implemented in Python.
1. Load Libraries
Firstly, we load our libraries as standard. The library of major importance in this case is statsmodels, since we are using this library to calculate the ACF and PACF statistics, and also to formulate the ARIMA model:
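import pandas
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import numpy as np
import math
from statsmodels.tsa.stattools import acf, pacf
import statsmodels.tsa.stattools as ts
# Legacy import path: deprecated in statsmodels 0.12 and no longer usable from 0.13,
# where the ARIMA class lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima_model import ARIMA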
2. Import csv and define “price” variable using pandas
variables = pandas.read_csv('jnj.csv')
price = variables['price']
3. Autocorrelation and Partial Autocorrelation Plots
# Work with the natural log of the price
lnprice = np.log(price)
lnprice
plt.plot(lnprice)
plt.show()

# Autocorrelation for lags 1 to 19
acf_1 = acf(lnprice)[1:20]
plt.plot(acf_1)
plt.show()
test_df = pandas.DataFrame([acf_1]).T
test_df.columns = ['Pandas Autocorrelation']
test_df.index += 1
test_df.plot(kind='bar')

# Partial autocorrelation for lags 1 to 19
pacf_1 = pacf(lnprice)[1:20]
plt.plot(pacf_1)
plt.show()
test_df = pandas.DataFrame([pacf_1]).T
test_df.columns = ['Pandas Partial Autocorrelation']
test_df.index += 1
test_df.plot(kind='bar')

# Augmented Dickey-Fuller test with a maximum lag of 1
result = ts.adfuller(lnprice, 1)
result
We see that statsmodels produces the autocorrelation and partial autocorrelation plots:
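When reading these plots, a common rule of thumb is that a sample autocorrelation outside roughly ±1.96/√n is distinguishable from zero at the 5% level. The following is a minimal sketch of that check rather than part of the original workflow. It assumes a simulated random walk standing in for the log price series (jnj.csv is not reproduced here), and `sample_acf` is a hypothetical helper written for illustration, not a statsmodels function.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    denom = np.dot(centred, centred)
    return np.array([
        np.dot(centred[:len(x) - k], centred[k:]) / denom
        for k in range(nlags + 1)
    ])

# Assumption: a simulated random walk stands in for the log price series.
rng = np.random.default_rng(0)
sim_lnprice = np.log(100) + np.cumsum(rng.normal(0, 0.01, size=127))

acf_vals = sample_acf(sim_lnprice, nlags=19)
bound = 1.96 / np.sqrt(len(sim_lnprice))  # approximate 5% significance band

# Lags (1..19) whose autocorrelation lies outside the band
significant_lags = [k for k in range(1, 20) if abs(acf_vals[k]) > bound]
print(significant_lags)
```

For a series that behaves like a random walk, the autocorrelations typically decay slowly across lags, which is a common sign that differencing is needed.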
Moreover, after first-differencing the log price (see the code below), we have confirmation that our data follows an AR(1) stationary process (one with a constant mean, variance, and autocorrelation), and the differenced series now shows a stationary process:
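# First difference of the log price; the first value is NaN and is dropped
lnprice_diff = lnprice - lnprice.shift()
diff = lnprice_diff.dropna()

# Autocorrelation of the differenced series for lags 1 to 19
acf_1_diff = acf(diff)[1:20]
test_df = pandas.DataFrame([acf_1_diff]).T
test_df.columns = ['First Difference Autocorrelation']
test_df.index += 1
test_df.plot(kind='bar')

# Partial autocorrelation of the differenced series for lags 1 to 19
pacf_1_diff = pacf(diff)[1:20]
plt.plot(pacf_1_diff)
plt.show()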
4. ARIMA Model Generation
# Series.as_matrix() was removed in pandas 1.0; .values returns the same NumPy array
price_matrix = lnprice.values
model = ARIMA(price_matrix, order=(0,1,0))
model_fit = model.fit(disp=0)
print(model_fit.summary())
As previously mentioned, our data is in logarithmic format. Since we are analysing stock prices, this format is necessary to account for compounding returns. However, once we have obtained the forecasts (for seven periods out in this case), we can recover the real price forecast by taking the exponential of each logarithmic figure:
predictions = model_fit.predict(122, 127, typ='levels')
predictions
predictionsadjusted = np.exp(predictions)
predictionsadjusted
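To make the back-transformation concrete: for an ARIMA(0,1,0) model with drift, the h-step-ahead forecast of the log price is the last observed log price plus h times the drift, and exponentiating returns it to price units. Here is a worked sketch with made-up numbers; the last price of 100 and the drift of 0.001 per period are purely illustrative and are not estimates from jnj.csv.

```python
import numpy as np

last_price = 100.0          # hypothetical last observed price
drift = 0.001               # hypothetical mean log return per period
horizons = np.arange(1, 8)  # 1 to 7 periods ahead

# Forecast in logs, then convert back to price units
log_forecast = np.log(last_price) + drift * horizons
price_forecast = np.exp(log_forecast)

# Equivalent view: the price compounds at exp(drift) per period
compounded = last_price * np.exp(drift) ** horizons

print(np.round(price_forecast, 3))
```

The two calculations agree, which is why a constant step in log terms corresponds to compounding growth in price terms.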