Serial Correlation: Durbin-Watson and Cochrane-Orcutt Remedy

Serial correlation (also known as autocorrelation) is a violation of the Ordinary Least Squares assumption that the observations of the error term are uncorrelated with one another. In a model with first-order serial correlation, the current value of the error term is a function of the value immediately preceding it:

  e_t = ρ e_{t-1} + u_t

  where e = error term of the equation in question; ρ = first-order autocorrelation coefficient; u = classical (not serially correlated) error term
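To make the definition concrete, here is a minimal sketch in base R that builds an error series from the recursion above. The value of ρ (0.9), the sample size and the seed are illustrative assumptions and are not taken from the oil and S&P 500 data.

```r
# Simulate e_t = rho * e_(t-1) + u_t with an assumed, illustrative rho.
set.seed(42)
n   <- 500
rho <- 0.9                 # assumed first-order autocorrelation coefficient
u   <- rnorm(n)            # classical (not serially correlated) error term
e   <- numeric(n)
e[1] <- u[1]
for (t in 2:n) {
  e[t] <- rho * e[t - 1] + u[t]
}

# Sample correlation between e_t and e_(t-1); it should land near rho.
rho_hat <- cor(e[-1], e[-n])
print(round(rho_hat, 3))
```

Setting rho to 0 in this sketch gives back a classical error term, which is a quick way to see what the OLS assumption requires.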

This issue is common in time-series models, because time series data is rarely random and often shows persistent patterns in which past values are related to future values.

In this particular example, the relationship between oil prices and fluctuations in the S&P 500 stock market index is analysed for the period June 2015 to October 2016. An Ordinary Least Squares regression is run to model this relationship, and a Durbin-Watson test and the Cochrane-Orcutt procedure are applied to test for and remedy serial correlation, respectively.

The OLS model used to describe this relationship is:

  Y_(S&P 500 price) = Intercept + Slope × X_(oil price)

Call:
lm(formula = gspc ~ oil)

Residuals:
     Min       1Q   Median       3Q      Max 
-195.309  -46.802    6.726   45.612  139.918 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1768.1278    20.2744   87.21   <2e-16 ***
oil            6.5421     0.4488   14.58   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 70.44 on 499 degrees of freedom
Multiple R-squared:  0.2986,	Adjusted R-squared:  0.2972 
F-statistic: 212.5 on 1 and 499 DF,  p-value: < 2.2e-16

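Next, we test the model for the presence of serial correlation using the Durbin-Watson test. The statistic lies between 0 and 4, with values near 2 indicating no first-order serial correlation and values near 0 indicating strong positive serial correlation. With a DW statistic of 0.047 and a p-value below 0.05 as shown, the null hypothesis of no serial correlation is rejected: serial correlation is present in the model and needs to be remedied.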

> dwtest(reg1)

Durbin-Watson test

data:  reg1
DW = 0.047108, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0
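The statistic reported above can also be computed by hand from the residuals, which helps show what it measures. The sketch below uses simulated data (an assumption, since the real series is not reproduced here) and base R only; it does not call dwtest and does not compute a p-value.

```r
# Durbin-Watson statistic by hand on simulated data with AR(1) errors.
set.seed(1)
n <- 500
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))  # assumed rho = 0.9
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- resid(fit)

# DW = sum of squared successive differences / sum of squared residuals
dw <- sum(diff(res)^2) / sum(res^2)

# Rule of thumb: DW is approximately 2 * (1 - rho_hat)
rho_hat   <- cor(res[-1], res[-n])
dw_approx <- 2 * (1 - rho_hat)

print(c(dw = dw, dw_approx = dw_approx))
```

Under this rule of thumb, the DW value of 0.047 reported for the oil and S&P 500 regression corresponds to a residual autocorrelation of roughly 0.98.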

Notably, the Cochrane-Orcutt remedy is only appropriate for a stationary AR(1) process. For the data used here, this means that taking a first difference should result in a stationary process, that is, one with a constant mean, variance and autocorrelation structure over time.

[Figure: Residuals of OLS Regression (Serial Correlation Present)]

[Figure: Residuals of First Differenced OLS Regression (Serial Correlation Eliminated)]

Consequences of Serial Correlation

According to Studenmund (2010), a textbook which I find gives a solid introduction to the particulars of serial correlation, the consequences of this condition for a regression model are as follows:

► Ordinary Least Squares is no longer the minimum variance estimator among all linear unbiased estimators.

► The OLS standard errors are biased in the face of serial correlation, which makes hypothesis tests unreliable and increases the risk of making a Type I or Type II error.

► Our coefficient estimates remain unbiased in the face of serial correlation.
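These consequences can be illustrated with a small Monte Carlo experiment. The following is a sketch under assumed settings (500 replications, 100 observations, ρ = 0.9 for both the regressor and the error, true slope of 2); none of these values come from Studenmund or from the dataset.

```r
# Monte Carlo sketch: OLS slope under AR(1) errors (all settings are assumptions).
set.seed(123)
n_rep <- 500
n     <- 100
rho   <- 0.9
beta1 <- 2

slopes      <- numeric(n_rep)
reported_se <- numeric(n_rep)

for (i in seq_len(n_rep)) {
  x <- as.numeric(arima.sim(model = list(ar = rho), n = n))
  e <- as.numeric(arima.sim(model = list(ar = rho), n = n))
  y <- 1 + beta1 * x + e
  fit <- lm(y ~ x)
  slopes[i]      <- coef(fit)[["x"]]
  reported_se[i] <- summary(fit)$coefficients["x", "Std. Error"]
}

mean_slope       <- mean(slopes)       # close to beta1: estimates stay unbiased
empirical_sd     <- sd(slopes)         # actual sampling variability of the slope
mean_reported_se <- mean(reported_se)  # what OLS claims the variability is

print(c(mean_slope = mean_slope,
        empirical_sd = empirical_sd,
        mean_reported_se = mean_reported_se))
```

In this setup the average slope sits near the true value, while the standard error reported by OLS is noticeably smaller than the spread of the estimates across replications, which is the bias described in the second point.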
 

First Differencing

The purpose of first differencing, as mentioned, is to transform a non-stationary time series into a stationary one. This is necessary to ensure that the data in question follows a stationary AR(1) process.

Once first differences of both variables are obtained, the regression is re-run on the first differenced variables, with the summary shown below:

Call:
lm(formula = diff_gspc ~ diff_oil)

Residuals:
    Min      1Q  Median      3Q     Max 
-65.068  -3.908  -0.183   5.016  73.286 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.1826     0.6836   0.267     0.79    
diff_oil      5.6864     0.6935   8.199 2.07e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.28 on 498 degrees of freedom
Multiple R-squared:  0.1189,	Adjusted R-squared:  0.1172 
F-statistic: 67.23 on 1 and 498 DF,  p-value: 2.067e-15
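Note that the residual degrees of freedom fall from 499 to 498, because differencing drops the first observation. A minimal sketch of the differencing step in base R is given below; the two series are simulated random walks and the names diff_x and diff_y are hypothetical stand-ins for diff_oil and diff_gspc.

```r
# First differencing with diff() on simulated level series (assumed data).
set.seed(7)
n <- 200
x <- cumsum(rnorm(n))               # non-stationary level series (random walk)
y <- 5 + 2 * x + cumsum(rnorm(n))   # second level series related to x

diff_x <- diff(x)                   # x_t - x_(t-1), length n - 1
diff_y <- diff(y)

diff_fit <- lm(diff_y ~ diff_x)

print(length(diff_x))               # one observation fewer than the levels
print(df.residual(diff_fit))        # (n - 1) observations minus 2 coefficients
```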

To check that the differenced data follows a stationary process, a formal augmented Dickey-Fuller (ADF) test can be run and the autocorrelation functions plotted.

When ADF tests are run on both the original and first differenced variables, we see that the former have p-values above 0.05 (the null hypothesis of a unit root cannot be rejected, indicating non-stationarity), while the latter have p-values below this threshold (indicating stationarity).

> adf.test(gspc)

Augmented Dickey-Fuller Test

data:  gspc
Dickey-Fuller = -2.38, Lag order = 7, p-value = 0.4174
alternative hypothesis: stationary

> adf.test(oil)

Augmented Dickey-Fuller Test

data:  oil
Dickey-Fuller = -2.1883, Lag order = 7, p-value = 0.4986
alternative hypothesis: stationary

> adf.test(diff_gspc)

Augmented Dickey-Fuller Test

data:  diff_gspc
Dickey-Fuller = -8.2973, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(diff_gspc) : p-value smaller than printed p-value

> adf.test(diff_oil)

Augmented Dickey-Fuller Test

data:  diff_oil
Dickey-Fuller = -8.1493, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(diff_oil) : p-value smaller than printed p-value

A plot of the autocorrelation functions shows a sudden drop in correlations after lag 1 for the first differenced regression, also indicating a stationary series.
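The contrast between the two autocorrelation patterns can be reproduced on simulated data. This sketch uses a random walk as an assumed stand-in for a non-stationary price series and relies only on acf() from base R, with plotting switched off so the values can be inspected directly.

```r
# ACF of a simulated random walk versus its first difference (assumed data).
set.seed(2016)
level <- cumsum(rnorm(500))

# acf() returns lag 0 first, so element 2 is the lag 1 autocorrelation.
acf_level <- acf(level,       lag.max = 5, plot = FALSE)$acf[, 1, 1]
acf_diff  <- acf(diff(level), lag.max = 5, plot = FALSE)$acf[, 1, 1]

print(round(acf_level, 3))   # decays very slowly: typical of non-stationarity
print(round(acf_diff, 3))    # drops to near zero after lag 0
```

Calling acf() with plot = TRUE (the default) draws the corresponding correlogram instead of returning the values silently.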

Cochrane-Orcutt Remedy

Given that the presence of a stationary AR(1) series has been established, the Cochrane-Orcutt method is appropriate to use in this case to remedy serial correlation.

The method works by estimating a value of ρ, that is, the correlation between the residuals and their lagged values, and using the estimate ρ̂ to transform the variables:

  y*_t = y_t − ρ̂ y_{t-1}
  x*_t = x_t − ρ̂ x_{t-1}
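To see why this transformation removes the serial correlation, it helps to write out the algebra. The derivation below is a sketch that assumes the simple two-variable model used in this example; the symbols β0 and β1 for the intercept and slope are introduced here for illustration and do not appear in the R output.

```latex
\begin{align*}
y_t &= \beta_0 + \beta_1 x_t + e_t, \qquad e_t = \rho e_{t-1} + u_t \\
\rho y_{t-1} &= \rho \beta_0 + \beta_1 \rho x_{t-1} + \rho e_{t-1} \\
y_t - \rho y_{t-1} &= \beta_0 (1 - \rho) + \beta_1 (x_t - \rho x_{t-1}) + (e_t - \rho e_{t-1}) \\
&= \beta_0 (1 - \rho) + \beta_1 (x_t - \rho x_{t-1}) + u_t
\end{align*}
```

The error term of the transformed equation is u_t, the classical error, so OLS on the transformed variables no longer suffers from first-order serial correlation. The intercept of the transformed regression estimates β0(1 − ρ) and has to be divided by (1 − ρ) to recover the intercept on the original scale.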

When the autocorrelation function was run on the residuals of the initial regression, the following autocorrelation values were returned:

Lag   Autocorrelation
0     1
1     0.975
2     0.954
3     0.936
4     0.914
5     0.895
6     0.879
7     0.863
8     0.848
9     0.832
10    0.815
11    0.798
12    0.782
13    0.764
14    0.745
15    0.73
16    0.711
17    0.692
18    0.676
19    0.659
20    0.641
21    0.623
22    0.605
23    0.591
24    0.578
25    0.561
26    0.548
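The lag 1 value in this table is closely related to the ρ that Cochrane-Orcutt estimates. As a rough illustration on simulated data (the true ρ of 0.95 and all variable names are assumptions), the sketch below compares the lag 1 autocorrelation of the OLS residuals with the slope from regressing the residuals on their own lag.

```r
# Two ways to estimate rho from OLS residuals (simulated data, assumed rho = 0.95).
set.seed(99)
n <- 500
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.95), n = n))
y <- 1 + 2 * x + e

res <- resid(lm(y ~ x))

# (a) Regress e_t on e_(t-1) without an intercept.
res_now <- res[-1]
res_lag <- res[-n]
rho_reg <- coef(lm(res_now ~ 0 + res_lag))[[1]]

# (b) Lag 1 value of the autocorrelation function.
rho_acf <- acf(res, lag.max = 1, plot = FALSE)$acf[2, 1, 1]

print(round(c(rho_reg = rho_reg, rho_acf = rho_acf), 4))
```

The two numbers differ only slightly, because they use almost the same sums of products with slightly different denominators.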

The Cochrane-Orcutt estimator in R estimates the appropriate value of ρ̂ to use in estimating the new regression. The purpose of ρ̂ is to formulate a regression in which the correlation between one error term and the previous one is removed, so that the errors become IID (independent and identically distributed).

When the Cochrane-Orcutt procedure is run, the updated regression is displayed, and the ρ̂ value of 0.977051 it generates is very close to the lag 1 autocorrelation of 0.975 calculated initially. This value is used to calculate the new regression output below:

> orcuttreg1
Cochrane-orcutt estimation for first order autocorrelation 
 
Call:
lm(formula = gspc ~ oil)

 number of interaction: 4
 rho 0.977051

Durbin-Watson statistic 
(original):    0.04711 , p-value: 4.902e-107
(transformed): 2.08847 , p-value: 8.393e-01
 
 coefficients: 
(Intercept)         oil 
1811.922439    5.737714

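Moreover, the Durbin-Watson statistic for the transformed model is 2.08847 with a p-value of 0.839, well above 0.05, so the null hypothesis of no serial correlation can no longer be rejected. This indicates that the serial correlation in the model has been eliminated.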
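For readers who want to see the mechanics behind this output, the following is a hand-rolled sketch of the iterative procedure in base R. It runs on simulated data with assumed parameters (intercept 10, slope 3, error ρ of 0.8), uses a hypothetical helper dw_stat, and is not the implementation used by the R estimator above, whose stopping rule may differ.

```r
# Hand-rolled Cochrane-Orcutt iteration on simulated data (all values assumed).
set.seed(2024)
n <- 500
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
y <- 10 + 3 * x + e

# Hypothetical helper: Durbin-Watson statistic from a residual vector.
dw_stat <- function(res) sum(diff(res)^2) / sum(res^2)

fit <- lm(y ~ x)
b   <- unname(coef(fit))            # current (intercept, slope) on the original scale
dw_original <- dw_stat(resid(fit))

rho <- 0
for (iter in 1:50) {
  res     <- y - b[1] - b[2] * x    # residuals on the original scale
  rho_new <- sum(res[-1] * res[-n]) / sum(res[-n]^2)

  # Quasi-difference both variables with the current rho estimate.
  y_star   <- y[-1] - rho_new * y[-n]
  x_star   <- x[-1] - rho_new * x[-n]
  fit_star <- lm(y_star ~ x_star)

  # Transformed intercept estimates beta0 * (1 - rho); rescale it.
  b <- c(coef(fit_star)[[1]] / (1 - rho_new), coef(fit_star)[[2]])

  converged <- abs(rho_new - rho) < 1e-6
  rho <- rho_new
  if (converged) break
}

dw_transformed <- dw_stat(resid(fit_star))
print(c(rho = rho, intercept = b[1], slope = b[2]))
print(c(dw_original = dw_original, dw_transformed = dw_transformed))
```

In this sketch the Durbin-Watson statistic moves from well below 2 to roughly 2 after the transformation, mirroring the change from 0.04711 to 2.08847 in the output above.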


Dataset

oilgspc.csv