In a previous tutorial, I elaborated on how an ARIMA model can be implemented using R. The model was fitted on a stock price dataset, with a (0,1,0) configuration being used for ARIMA.
Here, I detail how to implement an ARIMA model in Python using the pandas and statsmodels libraries. I cover most of the theoretical “need-to-knows” in my previous article – this post is mainly to demonstrate how the same can be implemented in Python.
1. Load Libraries
Firstly, we load our libraries as standard. The library of major importance in this case is statsmodels, since we are using this library to calculate the ACF and PACF statistics, and also to formulate the ARIMA model:
    import pandas
    import matplotlib.pyplot as plt
    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf
    import statsmodels.tsa.stattools as ts
    # Recent statsmodels releases no longer provide statsmodels.tsa.arima_model;
    # the ARIMA class now lives in statsmodels.tsa.arima.model
    from statsmodels.tsa.arima.model import ARIMA
2. Import csv and define “price” variable using pandas
    variables = pandas.read_csv('jnj.csv')
    price = variables['price']
3. Autocorrelation and Partial Autocorrelation Plots
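    # Work with the natural log of the price series
    lnprice = np.log(price)
    plt.plot(lnprice)
    plt.show()

    # Autocorrelation for lags 1 to 19 (lag 0 is always 1, so it is skipped)
    acf_1 = acf(lnprice, nlags=20)[1:20]
    plt.plot(acf_1)
    plt.show()
    test_df = pandas.DataFrame([acf_1]).T
    test_df.columns = ['Pandas Autocorrelation']
    test_df.index += 1
    test_df.plot(kind='bar')
    plt.show()

    # Partial autocorrelation for the same lags
    pacf_1 = pacf(lnprice, nlags=20)[1:20]
    plt.plot(pacf_1)
    plt.show()
    test_df = pandas.DataFrame([pacf_1]).T
    test_df.columns = ['Pandas Partial Autocorrelation']
    test_df.index += 1
    test_df.plot(kind='bar')
    plt.show()

    # Augmented Dickey-Fuller test with a maximum lag of 1
    result = ts.adfuller(lnprice, maxlag=1)
    print(result)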
We see that statsmodels produces the autocorrelation and partial autocorrelation plots:
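A stationary process is one with a constant mean, variance, and autocorrelation. The (0,1,0) configuration treats the log price as a random walk, that is, an AR(1) process with a unit root, which is not stationary in levels. We therefore take the first difference of the log price and check that the differenced series looks stationary: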
    # First difference of the log price; the first value is NaN and is dropped
    lnprice_diff = lnprice - lnprice.shift()
    diff = lnprice_diff.dropna()

    acf_1_diff = acf(diff, nlags=20)[1:20]
    test_df = pandas.DataFrame([acf_1_diff]).T
    test_df.columns = ['First Difference Autocorrelation']
    test_df.index += 1
    test_df.plot(kind='bar')
    plt.show()

    pacf_1_diff = pacf(diff, nlags=20)[1:20]
    plt.plot(pacf_1_diff)
    plt.show()
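To make the differencing step concrete, here is a minimal sketch in plain Python that computes the sample autocorrelation by hand, once for a series in levels and once for its first difference. The data is a simulated random walk, not the JNJ series, and `sample_acf` is a hypothetical helper written for this illustration using the usual unadjusted estimator. It is not how statsmodels implements `acf` internally.

```python
import math
import random

def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag (unadjusted estimator)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    denom = sum(c * c for c in centred)  # lag-0 sum of squares
    return [
        sum(centred[t] * centred[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

# Hypothetical data: a simulated log-price random walk (assumption, not JNJ).
random.seed(0)
log_price = [math.log(100.0)]
for _ in range(499):
    log_price.append(log_price[-1] + random.gauss(0.0, 0.01))

# First difference: the same operation as lnprice - lnprice.shift(), then dropna().
diff = [b - a for a, b in zip(log_price, log_price[1:])]

acf_level = sample_acf(log_price, 5)
acf_diff = sample_acf(diff, 5)
print("levels:", [round(r, 3) for r in acf_level])
print("diffs: ", [round(r, 3) for r in acf_diff])
```

For a random walk, the autocorrelations in levels stay close to 1 and decay slowly, while those of the differenced series sit near zero. That is the pattern to look for when comparing the bar charts before and after differencing.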
4. ARIMA Model Generation
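    # as_matrix() has been removed from pandas; to_numpy() is the current equivalent
    price_matrix = lnprice.to_numpy()

    # This call assumes ARIMA comes from statsmodels.tsa.arima.model.
    # That class omits the constant by default when d=1, so trend='t' is passed
    # to keep the drift term that the older ARIMA class estimated by default.
    model = ARIMA(price_matrix, order=(0,1,0), trend='t')
    model_fit = model.fit()  # the newer fit() does not take a disp argument
    print(model_fit.summary())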
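It can help to see how little an ARIMA(0,1,0) model with a constant actually estimates. There are no AR or MA coefficients. The model says that each log price equals the previous one plus a constant drift plus a random shock. The sketch below estimates the drift and the shock standard deviation directly from the first differences. The prices are made up for illustration, and `fit_random_walk_with_drift` is a hypothetical helper. statsmodels fits by maximum likelihood, so its figures may differ slightly from these.

```python
import math
import statistics

def fit_random_walk_with_drift(log_series):
    """Estimate drift and shock std. dev. from the first differences."""
    diffs = [b - a for a, b in zip(log_series, log_series[1:])]
    drift = statistics.fmean(diffs)   # average log return per period
    sigma = statistics.stdev(diffs)   # sample standard deviation of the shocks
    return drift, sigma

# Hypothetical prices, for illustration only (not the JNJ data).
prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5]
log_price = [math.log(p) for p in prices]

drift, sigma = fit_random_walk_with_drift(log_price)
print(f"drift={drift:.5f}, sigma={sigma:.5f}")
```

Because the differences telescope, the drift estimate is simply the total log change divided by the number of periods.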
Our data is in logarithmic format. Working in logs is standard practice for stock prices, since differences in log prices correspond to continuously compounded returns. Once we have obtained the forecasts (for observations 122 through 127, i.e. six periods, in this case), we recover the price forecast by exponentiating the log forecast:
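    # Forecast observations 122 to 127 (inclusive) in log terms.
    # Assuming the newer ARIMA class, predictions are already in levels of the
    # log series, so typ='levels' is not needed.
    predictions = model_fit.predict(122, 127)
    print(predictions)

    # Exponentiate to convert the log forecasts back to prices
    predictionsadjusted = np.exp(predictions)
    print(predictionsadjusted)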
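The back-transformation can also be sketched by hand. Under a random walk with drift, the point forecast h periods ahead in logs is the last observed log price plus h times the drift, and exponentiating turns that additive drift into compounded growth in the price. The last price and drift below are assumed values chosen for illustration, not estimates from the JNJ data, and `forecast_log_random_walk` is a hypothetical helper.

```python
import math

def forecast_log_random_walk(last_log_value, drift, steps):
    """Point forecasts in logs for horizons 1..steps: last value plus h * drift."""
    return [last_log_value + h * drift for h in range(1, steps + 1)]

# Hypothetical inputs (assumptions, not fitted values).
last_price = 120.0
drift = 0.002  # assumed average log return per period

log_forecasts = forecast_log_random_walk(math.log(last_price), drift, steps=6)

# Same role as np.exp(predictions): convert log forecasts back to prices.
price_forecasts = [math.exp(v) for v in log_forecasts]

for h, p in enumerate(price_forecasts, start=1):
    print(f"h={h}: {p:.2f}")
```

Each step multiplies the previous price forecast by exp(drift), which is why the log format handles compounding naturally.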
Disclosure: The author is long JNJ at the time of writing. However, the example above is for illustration purposes only, and in no way constitutes promotion of a security or any other form of investment advice.