Serial correlation (also known as autocorrelation) violates the Ordinary Least Squares assumption that the observations of the error term in a dataset are uncorrelated with one another. In a model with first-order serial correlation, the current value of the error term is a function of the value immediately preceding it:
$e_t = \rho e_{t-1} + u_t$

where e = error term of the equation in question; ρ = first-order autocorrelation coefficient; u = classical (not serially correlated) error term.
This issue is endemic in time-series models, because time series data is rarely random: past values often carry information about future values.
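To make the definition concrete, here is a minimal sketch in base R that simulates an error term following the AR(1) equation above and compares its lag-1 correlation with that of the underlying shocks. The values of `rho`, `n` and the seed are illustrative assumptions, not figures from the oil and S&P 500 data.

```r
# Simulate the AR(1) error process e_t = rho * e_{t-1} + u_t.
# rho, n and the seed are illustrative choices, not values from the article.
set.seed(42)
n   <- 500
rho <- 0.9
u   <- rnorm(n)            # classical, serially uncorrelated shocks
e   <- numeric(n)
e[1] <- u[1]
for (t in 2:n) {
  e[t] <- rho * e[t - 1] + u[t]
}

# Lag-1 sample correlation of each series
cor_u <- cor(u[-1], u[-n])
cor_e <- cor(e[-1], e[-n])
print(c(shocks = cor_u, ar1_errors = cor_e))
```

The shocks should show a lag-1 correlation near zero, while the simulated errors should show one near the chosen `rho`.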
In this particular example, the relationship between oil prices and fluctuations in the S&P 500 stock market index is analysed for the period June 2015 – October 2016. An Ordinary Least Squares regression is run to model this relationship, and a Durbin-Watson test and the Cochrane-Orcutt procedure are applied to test for and remedy serial correlation respectively.
The OLS model used to describe this relationship is:
$Y_{\text{S\&P 500 Prices}} = \text{Intercept} + X_{\text{Oil Prices}}$

    Call:
    lm(formula = gspc ~ oil)

    Residuals:
         Min       1Q   Median       3Q      Max
    -195.309  -46.802    6.726   45.612  139.918

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 1768.1278    20.2744   87.21   <2e-16 ***
    oil            6.5421     0.4488   14.58   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 70.44 on 499 degrees of freedom
    Multiple R-squared:  0.2986, Adjusted R-squared:  0.2972
    F-statistic: 212.5 on 1 and 499 DF,  p-value: < 2.2e-16
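Next, we test the model for the presence of serial correlation using the Durbin-Watson test. The statistic of 0.047 is far below 2 (the value expected when there is no autocorrelation), and the p-value is well below 0.05, so we reject the null hypothesis of no positive serial correlation. Serial correlation is present in the model and needs to be remedied.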
    > dwtest(reg1)

        Durbin-Watson test

    data:  reg1
    DW = 0.047108, p-value < 2.2e-16
    alternative hypothesis: true autocorrelation is greater than 0
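The Durbin-Watson statistic itself is simple to compute: it is the sum of squared changes in the residuals divided by the sum of squared residuals, and it is approximately 2(1 − ρ). The sketch below computes it by hand in base R on simulated data; `dw_stat()` is a hypothetical helper and the data are stand-ins, not the article's series.

```r
# Durbin-Watson statistic computed directly from a vector of residuals.
# dw_stat() is a hypothetical helper written for this illustration.
dw_stat <- function(res) {
  sum(diff(res)^2) / sum(res^2)
}

# Simulated data with AR(1) errors (not the article's data).
set.seed(1)
n   <- 500
x   <- rnorm(n)
e   <- as.numeric(arima.sim(list(ar = 0.9), n = n))
y   <- 1 + 2 * x + e
fit <- lm(y ~ x)

dw      <- dw_stat(resid(fit))
rho_hat <- 1 - dw / 2        # rule of thumb implied by DW ~ 2 * (1 - rho)
print(c(DW = dw, implied_rho = rho_hat))

# The same rule of thumb applied to the DW value reported above
implied_rho_article <- 1 - 0.047108 / 2
print(implied_rho_article)
```

Applied to the reported DW of 0.047108, this rule of thumb implies a ρ of roughly 0.976, which is why a statistic this close to zero signals very strong positive serial correlation.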
Notably, the Cochrane-Orcutt remedy only works when the process in question is a stationary AR(1) process. Here, this is checked by taking first differences of the data, which should yield a stationary series with a constant mean, variance and autocorrelation structure.
[Figure: Residuals of OLS Regression (Serial Correlation Present)]
[Figure: Residuals of First Differenced OLS Regression (Serial Correlation Eliminated)]
Consequences of Serial Correlation
According to Studenmund (2010), a textbook which I find gives a solid introduction to the particulars of serial correlation, the consequences of this condition for a regression model are as follows:
► Ordinary Least Squares is no longer the minimum variance estimator among all linear unbiased estimators.
► The OLS estimates of the standard errors are biased in the face of serial correlation, which makes hypothesis tests unreliable and increases the risk of making a Type I or Type II error.
► Our coefficient estimates remain unbiased in the face of serial correlation.
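These consequences can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden toy (simulated AR(1) regressor and errors, true slope of 2, 500 replications), not a reproduction of Studenmund's results: it compares the average slope estimate with the true value, and the actual spread of the estimates with the standard error that OLS reports.

```r
# Monte Carlo sketch: the OLS slope stays unbiased under AR(1) errors, but the
# reported standard error understates the true sampling variability.
# All settings (n, reps, AR coefficients of 0.9, true slope 2) are illustrative.
set.seed(123)
n    <- 100
reps <- 500
beta_hat <- numeric(reps)
se_hat   <- numeric(reps)

for (r in seq_len(reps)) {
  x  <- as.numeric(arima.sim(list(ar = 0.9), n = n))  # persistent regressor
  e  <- as.numeric(arima.sim(list(ar = 0.9), n = n))  # AR(1) errors
  y  <- 1 + 2 * x + e
  cf <- summary(lm(y ~ x))$coefficients
  beta_hat[r] <- cf["x", "Estimate"]
  se_hat[r]   <- cf["x", "Std. Error"]
}

results <- c(
  mean_slope       = mean(beta_hat),  # compare with the true slope of 2
  sd_of_slopes     = sd(beta_hat),    # actual spread across simulations
  mean_reported_se = mean(se_hat)     # spread that OLS claims
)
print(results)
```

In this setup, where both the regressor and the errors are positively autocorrelated, the reported standard error should come out noticeably smaller than the actual spread of the slope estimates, so t-statistics look more significant than they should.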
The purpose of first differencing - as mentioned - is to transform a non-stationary time series into a stationary one. This is necessary in order to ensure that the data in question follows a stationary AR(1) process.
Therefore, once the first differences of both variables are obtained, the regression is re-estimated on the first differenced variables and summarised:
    Call:
    lm(formula = diff_gspc ~ diff_oil)

    Residuals:
        Min      1Q  Median      3Q     Max
    -65.068  -3.908  -0.183   5.016  73.286

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.1826     0.6836   0.267     0.79
    diff_oil      5.6864     0.6935   8.199 2.07e-15 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 15.28 on 498 degrees of freedom
    Multiple R-squared:  0.1189, Adjusted R-squared:  0.1172
    F-statistic: 67.23 on 1 and 498 DF,  p-value: 2.067e-15
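For reference, first differences can be obtained with base R's `diff()`. The sketch below reuses the variable names from the output above, but `gspc` and `oil` here are simulated random walks standing in for the actual price series, so the estimates will not match the article's.

```r
# First differences with base R's diff(). gspc and oil are simulated random
# walks standing in for the article's price series (an assumption).
set.seed(7)
n    <- 501
oil  <- 50 + cumsum(rnorm(n))
gspc <- 2000 + 6 * oil + cumsum(rnorm(n, sd = 15))

diff_gspc <- diff(gspc)    # gspc[t] - gspc[t-1]; one observation is lost
diff_oil  <- diff(oil)

reg_diff <- lm(diff_gspc ~ diff_oil)
print(length(diff_gspc))
print(df.residual(reg_diff))
print(coef(reg_diff))
```

Differencing drops the first observation, which is consistent with the residual degrees of freedom falling from 499 in the levels regression to 498 in the differenced one.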
To check that the data follows a stationary AR(1) process, a formal augmented Dickey-Fuller (ADF) test for stationarity can be run and the autocorrelation functions plotted.
When ADF tests are run on both the ordinary and first differenced variables, we see that the former have p-values above 0.05 (indicating non-stationarity), while the latter have p-values below this threshold (indicating stationarity).
    > adf.test(gspc)

        Augmented Dickey-Fuller Test

    data:  gspc
    Dickey-Fuller = -2.38, Lag order = 7, p-value = 0.4174
    alternative hypothesis: stationary

    > adf.test(oil)

        Augmented Dickey-Fuller Test

    data:  oil
    Dickey-Fuller = -2.1883, Lag order = 7, p-value = 0.4986
    alternative hypothesis: stationary

    > adf.test(diff_gspc)

        Augmented Dickey-Fuller Test

    data:  diff_gspc
    Dickey-Fuller = -8.2973, Lag order = 7, p-value = 0.01
    alternative hypothesis: stationary

    Warning message:
    In adf.test(diff_gspc) : p-value smaller than printed p-value

    > adf.test(diff_oil)

        Augmented Dickey-Fuller Test

    data:  diff_oil
    Dickey-Fuller = -8.1493, Lag order = 7, p-value = 0.01
    alternative hypothesis: stationary

    Warning message:
    In adf.test(diff_oil) : p-value smaller than printed p-value
A plot of the autocorrelation functions shows a sudden drop in correlations after lag 1 for the first differenced regression, also indicating a stationary series.
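The contrast between a series in levels and its first difference can be sketched with `acf()` from base R's stats package. The series below is a simulated random walk, used purely as an assumed stand-in for the price data.

```r
# Sample autocorrelations with stats::acf(); the series is a simulated
# stand-in, not the article's data.
set.seed(11)
level <- cumsum(rnorm(500))   # random walk: non-stationary in levels
d     <- diff(level)          # its first difference: white noise here

# plot = FALSE returns the values instead of drawing; [-1] drops lag 0
acf_level <- acf(level, lag.max = 5, plot = FALSE)$acf[-1]
acf_diff  <- acf(d,     lag.max = 5, plot = FALSE)$acf[-1]

print(round(rbind(level = acf_level, differenced = acf_diff), 3))
```

For the series in levels the autocorrelations should decay very slowly from values close to 1, while for the differenced series they should sit close to zero. Calling `acf()` without `plot = FALSE` draws the corresponding plot.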
Given that the presence of a stationary AR(1) series has been established, the Cochrane-Orcutt method is appropriate to use in this case to remedy serial correlation.
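The method works by estimating ρ, the correlation between the residuals and their first lag, and then using the estimate ρ̂ to transform the variables:

$y_t^* = y_t - \hat{\rho}\, y_{t-1}$

$x_t^* = x_t - \hat{\rho}\, x_{t-1}$

The regression is then re-estimated on the transformed variables y* and x*.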
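The procedure can be sketched by hand in base R. This is a minimal illustration under stated assumptions, not the implementation used by any particular package: `cochrane_orcutt_sketch()` is a hypothetical helper, the data are simulated with a known ρ of 0.8, and the stopping rule (change in ρ̂ below `tol`) is one reasonable choice among several.

```r
# Iterative Cochrane-Orcutt sketch for y = a + b * x + e with AR(1) errors.
# cochrane_orcutt_sketch() is a hypothetical helper written for illustration.
cochrane_orcutt_sketch <- function(y, x, tol = 1e-6, max_iter = 50) {
  n   <- length(y)
  ols <- lm(y ~ x)                        # start from plain OLS
  a   <- unname(coef(ols)[1])
  b   <- unname(coef(ols)[2])
  rho <- 0
  for (iter in seq_len(max_iter)) {
    res     <- y - a - b * x                             # residuals, original scale
    rho_new <- sum(res[-1] * res[-n]) / sum(res[-n]^2)   # e_t regressed on e_{t-1}
    y_star  <- y[-1] - rho_new * y[-n]                   # transformed variables
    x_star  <- x[-1] - rho_new * x[-n]
    fit     <- lm(y_star ~ x_star)
    a       <- unname(coef(fit)[1]) / (1 - rho_new)      # undo the (1 - rho) scaling
    b       <- unname(coef(fit)[2])
    converged <- abs(rho_new - rho) < tol
    rho <- rho_new
    if (converged) break
  }
  list(rho = rho, intercept = a, slope = b, iterations = iter, fit = fit)
}

# Simulated example with a known rho of 0.8 and slope of 3 (assumptions).
set.seed(2024)
n <- 500
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))
e <- as.numeric(arima.sim(list(ar = 0.8), n = n))
y <- 10 + 3 * x + e

co <- cochrane_orcutt_sketch(y, x)
r  <- resid(co$fit)
dw_transformed <- sum(diff(r)^2) / sum(r^2)   # Durbin-Watson on transformed fit
print(c(rho = co$rho, slope = co$slope, DW_transformed = dw_transformed))
```

On data like these, the estimated ρ should land near 0.8 and the Durbin-Watson statistic of the transformed regression should be close to 2.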
When the autocorrelation function was run on the residuals of the initial regression, the following autocorrelation values were returned:
    Lag   Autocorrelation
      0   1
      1   0.975
      2   0.954
      3   0.936
      4   0.914
      5   0.895
      6   0.879
      7   0.863
      8   0.848
      9   0.832
     10   0.815
     11   0.798
     12   0.782
     13   0.764
     14   0.745
     15   0.73
     16   0.711
     17   0.692
     18   0.676
     19   0.659
     20   0.641
     21   0.623
     22   0.605
     23   0.591
     24   0.578
     25   0.561
     26   0.548
The Cochrane-Orcutt estimator in R estimates the appropriate value of ρ̂ to use in estimating the new regression. The purpose of ρ̂ is to formulate a regression in which the correlation between one error term and the previous one is removed, so that the errors become IID (independent and identically distributed).
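A short derivation shows why this works. The symbols α and β for the intercept and slope are notation assumed here for illustration; the error process is the AR(1) equation from the start of the article. Lagging the model one period, multiplying by ρ and subtracting leaves only the classical error term:

```latex
\begin{aligned}
y_t &= \alpha + \beta x_t + e_t, \qquad e_t = \rho e_{t-1} + u_t \\
\rho\, y_{t-1} &= \rho\alpha + \rho\beta x_{t-1} + \rho e_{t-1} \\
y_t - \rho\, y_{t-1} &= \alpha(1 - \rho) + \beta\,(x_t - \rho x_{t-1}) + (e_t - \rho e_{t-1}) \\
&= \alpha(1 - \rho) + \beta\,(x_t - \rho x_{t-1}) + u_t
\end{aligned}
```

The slope β is unchanged by the transformation, while the intercept of the transformed regression equals α(1 − ρ) and has to be divided by (1 − ρ) to recover α. In practice ρ is unknown and is replaced by its estimate ρ̂.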
When the Cochrane-Orcutt procedure is run, the updated regression is displayed. The estimated ρ̂ of 0.977051 is very close to the lag-1 autocorrelation of 0.975 calculated initially. This value is used to calculate the new regression output below:
    > orcuttreg1
    Cochrane-orcutt estimation for first order autocorrelation

    Call:
    lm(formula = gspc ~ oil)

     number of interaction: 4
     rho 0.977051

    Durbin-Watson statistic
    (original):    0.04711 , p-value: 4.902e-107
    (transformed): 2.08847 , p-value: 8.393e-01

     coefficients:
    (Intercept)         oil
    1811.922439    5.737714
Moreover, the Durbin-Watson statistic for the transformed model is 2.08847 with a p-value of 0.8393, well above 0.05, indicating that there is no longer evidence of serial correlation in the model.
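For readers who want to run the whole workflow, the sketch below strings the steps together. It assumes the `lmtest` package (for `dwtest()`) and the `orcutt` package (for `cochrane.orcutt()`) are installed; the use of `cochrane.orcutt()` is inferred from the printed output above rather than stated in the text, and the data are simulated stand-ins, so the numbers will differ from those reported here.

```r
# Sketch of the workflow with the lmtest and orcutt packages (assumed to be
# installed). The data are simulated stand-ins, not the article's series.
if (requireNamespace("lmtest", quietly = TRUE) &&
    requireNamespace("orcutt", quietly = TRUE)) {
  set.seed(99)
  n    <- 501
  oil  <- 50 + as.numeric(arima.sim(list(ar = 0.9), n = n))
  gspc <- 1800 + 6 * oil + 20 * as.numeric(arima.sim(list(ar = 0.9), n = n))

  reg1 <- lm(gspc ~ oil)
  print(lmtest::dwtest(reg1))                   # Durbin-Watson test on plain OLS

  orcuttreg1 <- orcutt::cochrane.orcutt(reg1)   # iterative Cochrane-Orcutt fit
  print(orcuttreg1)                             # rho, DW before/after, coefficients
}
```

The `requireNamespace()` guard simply skips the example when either package is missing.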