# Chow Test For Structural Breaks in Time Series

A Chow test is designed to determine whether a structural break exists in a time series, that is, a sharp change in trend that merits further study. For instance, a structural break in one series can give useful clues as to whether such a change is being propagated across other variables, assuming that they are significantly correlated under normal circumstances.

The Chow test is conducted by running three separate regressions: 1) a pooled regression with data before and after the structural break, 2) a regression with data before the structural break, and 3) a regression with data after the structural break. The residual sum of squares for each regression is used to calculate the Chow statistic using the following formula:

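```
CHOW = ((RSSP - (RSSA + RSSB)) / k) / ((RSSA + RSSB) / (NA + NB - 2k))

where RSS = residual sum of squares (P = pooled, A = before the break, B = after the break), k = number of regressors (including the intercept), NA and NB = residual degrees of freedom of regressions A and B, as used in the calculations in this post (the textbook form of the test uses the number of observations in each sub-sample instead)
```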
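As a rough sketch, the formula can be wrapped in a small R function. The function name `chow_statistic` and its argument names are assumptions made for illustration; `n_a` and `n_b` stand for whatever values are plugged in as NA and NB. The example call uses the RSS values and degrees of freedom reported for the first test later in this post.

```r
# Hypothetical helper: Chow statistic from three residual sums of squares.
chow_statistic <- function(rss_p, rss_a, rss_b, n_a, n_b, k) {
  rss_split <- rss_a + rss_b                      # RSS of the two sub-sample regressions combined
  numerator <- (rss_p - rss_split) / k            # loss of fit from pooling, per regressor
  denominator <- rss_split / (n_a + n_b - 2 * k)  # scaled RSS of the split fit
  numerator / denominator
}

# Values reported for the first test in this post
chow_statistic(rss_p = 2411.655, rss_a = 30.33306, rss_b = 205.8739,
               n_a = 9, n_b = 9, k = 2)
# approximately 64.47
```

A large value means the pooled regression fits much worse than the two separate regressions, which is what a structural break would produce.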

Note that this test can be set up automatically in R using the “strucchange” package. However, I always prefer to calculate the test statistic manually where possible, as it facilitates understanding of why we are applying the test, along with understanding the specific break that we are analysing in the time series.
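For readers who want the automatic route, here is a minimal sketch using `sctest()` from strucchange with `type = "Chow"`. It assumes the package is installed. The data are simulated purely for illustration (the names `sim`, `x` and `y` are not from the heart rate dataset), and the break is placed after observation 20 by construction.

```r
# Sketch only: simulated data with a known break after observation 20.
set.seed(1)
x <- rnorm(40)
y <- c(1 + 0.5 * x[1:20], 4 + 2 * x[21:40]) + rnorm(40, sd = 0.5)
sim <- data.frame(x = x, y = y)

if (requireNamespace("strucchange", quietly = TRUE)) {
  # point = 20 asks for a single candidate break after the 20th observation
  chow_result <- strucchange::sctest(y ~ x, type = "Chow", point = 20, data = sim)
  print(chow_result)
}
```

Note that the package computes the statistic from observation counts, so its output need not match a manual calculation that plugs in residual degrees of freedom.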

The null and alternative hypotheses are as follows:

Null Hypothesis: No structural break in time series

Alternative Hypothesis: Structural break in time series

At the outset, let me say that the Chow test is more academic in nature and is not as commonly used as other time series methods. One reason is that most time series (especially those with an economic or financial trend involved) show many structural breaks. The Chow test is mainly useful for analysing a time series that is normally stationary but in which a significant shift causes a break.

Let us take the example of a time series measuring a person’s heart rate in beats per minute, recorded at 10-minute intervals. It can be assumed that unless that person undergoes significant physical activity, has a sudden illness, is emotionally distressed, or experiences another significant event that changes their heart rate, the resting heart rate will remain more or less constant.

Let us plot heart rate and see what it looks like:

We see that with the exception of a clear structural break where heart rate rapidly rises (due to increased physical activity as measured by number of steps per minute), heart rate shows the properties of a stationary time series (one with a constant mean and variance).

Here is a snippet of what our data looks like overall:

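| Time | Beats per minute | Steps taken per minute |
|---|---|---|
| 10:10:00 | 75 | 67 |
| 10:20:00 | 78 | 0 |
| 10:30:00 | 80 | 0 |
| 10:40:00 | 78 | 0 |
| 10:50:00 | 80 | 0 |
| 11:00:00 | 78 | 90 |
| 11:10:00 | 75 | 0 |
| 11:20:00 | 79 | 0 |
| 11:30:00 | 78 | 0 |
| 11:40:00 | 75 | 0 |
| 11:50:00 | 78 | 0 |
| 12:00:00 | 75 | 79 |
| 12:10:00 | 77 | 68 |
| 12:20:00 | 78 | 80 |
| 12:30:00 | 77 | 77 |
| 12:40:00 | 79 | 80 |
| 12:50:00 | 80 | 67 |
| 13:00:00 | 79 | 73 |
| 13:10:00 | 75 | 79 |
| 13:20:00 | 129 | 173 |
| 13:30:00 | 137 | 178 |
| 13:40:00 | 126 | 187 |
| 13:50:00 | 124 | 190 |
| 14:00:00 | 126 | 188 |
| 14:10:00 | 124 | 169 |
| 14:20:00 | 126 | 192 |
| 14:30:00 | 75 | 0 |
| 14:40:00 | 78 | 0 |
| 14:50:00 | 79 | 0 |
| 15:00:00 | 79 | 0 |
| 15:10:00 | 80 | 0 |
| 15:20:00 | 79 | 0 |
| 15:30:00 | 79 | 0 |
| 15:40:00 | 76 | 0 |
| 15:50:00 | 75 | 0 |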
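To make the stationarity claim concrete, here is a minimal sketch that loads the 35 snippet rows into a data frame, plots heart rate, and compares the mean inside and outside the spike. The object name `hr`, the column names, and the threshold of 100 beats per minute used to flag the spike are assumptions for illustration; the full dataset is longer than this snippet.

```r
# Snippet rows only (10:10:00 to 15:50:00); names are illustrative.
beats <- c(75, 78, 80, 78, 80, 78, 75, 79, 78, 75, 78, 75, 77, 78, 77, 79, 80, 79, 75,
           129, 137, 126, 124, 126, 124, 126,
           75, 78, 79, 79, 80, 79, 79, 76, 75)
steps <- c(67, 0, 0, 0, 0, 90, 0, 0, 0, 0, 0, 79, 68, 80, 77, 80, 67, 73, 79,
           173, 178, 187, 190, 188, 169, 192,
           0, 0, 0, 0, 0, 0, 0, 0, 0)
hr <- data.frame(beats = beats, steps = steps)

# Line plot of heart rate against observation index
plot(hr$beats, type = "l",
     xlab = "Observation (10-minute intervals)", ylab = "Beats per minute")

# Assumed threshold: treat anything above 100 bpm as part of the spike
in_spike <- hr$beats > 100
mean(hr$beats[!in_spike])  # resting level
mean(hr$beats[in_spike])   # level during increased activity
```

Outside the seven spike observations, heart rate stays within a narrow band of 75 to 80 beats per minute.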

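Let us now run the three separate regressions using the following data. In each output, the last value printed (after the residual degrees of freedom) is the residual sum of squares for that regression: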

Pooled Regression (RegP). Observations [22:43] with data containing heart rate before and after the sudden rise as a result of increased physical activity:

```
Call:
lm(formula = Beats.per.minute[22:43] ~ Steps.taken.per.minute[22:43])

Residuals:
    Min      1Q  Median      3Q     Max
-16.520 -11.234   3.483   9.050  17.197

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   68.95046    3.59375  19.186 2.38e-14 ***
Steps.taken.per.minute[22:43]  0.28569    0.03191   8.954 1.96e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.98 on 20 degrees of freedom
F-statistic: 80.17 on 1 and 20 DF,  p-value: 1.96e-08

> regP$df
[1] 20
[1] 2411.655
```

Regression A (RegA). Observations [22:32] with data containing heart rate before increased physical activity:

```
Call:
lm(formula = Beats.per.minute[22:32] ~ Steps.taken.per.minute[22:32])

Residuals:
   Min     1Q Median     3Q    Max
-2.463 -1.297  0.533  1.197  2.586

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                   77.138547   1.053733  73.205 8.38e-14 ***
Steps.taken.per.minute[22:32]  0.004106   0.016357   0.251    0.807
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.836 on 9 degrees of freedom
F-statistic: 0.06302 on 1 and 9 DF,  p-value: 0.8074

> regA$df
[1] 9
[1] 30.33306
```

Regression B (RegB). Observations [33:43] with data containing heart rate after increased physical activity:

```
Call:
lm(formula = Beats.per.minute[33:43] ~ Steps.taken.per.minute[33:43])

Residuals:
    Min      1Q  Median      3Q     Max
-5.3181 -2.8994 -0.0207  0.9793 10.9217

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   78.02066    2.38701   32.69 1.16e-10 ***
Steps.taken.per.minute[33:43]  0.26999    0.01639   16.48 4.98e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.783 on 9 degrees of freedom
F-statistic: 271.5 on 1 and 9 DF,  p-value: 4.976e-08

> regB$df
[1] 9
>
[1] 205.8739
```

With a Chow statistic of 64.47 and an F critical value of 3.179, our Chow statistic is clearly much larger than the critical value, so the null hypothesis is rejected. This is strong evidence of a structural break.

```
k=2

fcrit=qf(.95,df1=regA$df,df2=regB$df)
fcrit
[1] 3.178893
Chow_Statistic
[1] 64.46947
```
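The whole calculation can be sketched end to end. The 22 rows from 11:30:00 to 15:00:00 in the data snippet appear to correspond to observations [22:43], since they give the same regression coefficients; that mapping is an assumption, as are the variable names. The sketch computes the statistic as in this post and, for comparison, the textbook variant that uses observation counts and an F distribution with k and NA + NB - 2k degrees of freedom.

```r
# Snippet rows 11:30:00 to 15:00:00, assumed to be observations [22:43]
beats <- c(78, 75, 78, 75, 77, 78, 77, 79, 80, 79, 75,
           129, 137, 126, 124, 126, 124, 126, 75, 78, 79, 79)
steps <- c(0, 0, 0, 79, 68, 80, 77, 80, 67, 73, 79,
           173, 178, 187, 190, 188, 169, 192, 0, 0, 0, 0)
before <- 1:11   # before increased physical activity
after <- 12:22   # from the start of increased physical activity

regP <- lm(beats ~ steps)
regA <- lm(beats[before] ~ steps[before])
regB <- lm(beats[after] ~ steps[after])

k <- 2
rssP <- deviance(regP)  # deviance() of an lm fit is its residual sum of squares
rssA <- deviance(regA)
rssB <- deviance(regB)

# As calculated in this post: residual degrees of freedom used for NA and NB
dfA <- df.residual(regA)
dfB <- df.residual(regB)
chow_post <- ((rssP - (rssA + rssB)) / k) / ((rssA + rssB) / (dfA + dfB - 2 * k))

# Textbook variant for comparison: observation counts and F(k, NA + NB - 2k)
n <- length(beats)
chow_textbook <- ((rssP - (rssA + rssB)) / k) / ((rssA + rssB) / (n - 2 * k))
fcrit_textbook <- qf(0.95, df1 = k, df2 = n - 2 * k)
p_textbook <- pf(chow_textbook, k, n - 2 * k, lower.tail = FALSE)

c(chow_post = chow_post, chow_textbook = chow_textbook, fcrit_textbook = fcrit_textbook)
```

Here `chow_post` comes out at about 64.47. The textbook variant gives a larger statistic (about 82.9) against a critical value of about 3.55, so both versions reject the null hypothesis.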

However, let us now run a Chow test on heart rate data excluding the increase in physical activity, i.e. on all observations from [1:30] where heart rate remains pretty much consistent with no structural breaks:

Pooled Regression (RegP). Observations [1:30]:

```
Call:
lm(formula = Beats.per.minute[1:30] ~ Steps.taken.per.minute[1:30])

Residuals:
    Min      1Q  Median      3Q     Max
-3.1732 -0.7649  0.2830  1.2772  2.2162

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  78.173175   0.403941 193.526   <2e-16 ***
Steps.taken.per.minute[1:30] -0.005812   0.007742  -0.751    0.459
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.62 on 28 degrees of freedom
F-statistic: 0.5635 on 1 and 28 DF,  p-value: 0.4591

> regP$df
[1] 28
[1] 73.48779
```

Regression A (RegA). Observations [1:15]:

```
Call:
lm(formula = Beats.per.minute[1:15] ~ Steps.taken.per.minute[1:15])

Residuals:
    Min      1Q  Median      3Q     Max
-2.8422 -0.9517  0.5483  1.2396  1.5483

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  78.451726   0.536338  146.27   <2e-16 ***
Steps.taken.per.minute[1:15] -0.009097   0.010454   -0.87      0.4
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.519 on 13 degrees of freedom
F-statistic: 0.7572 on 1 and 13 DF,  p-value: 0.4

> regA$df
[1] 13
[1] 29.98672
```

Regression B (RegB). Observations [16:30]:

```
Call:
lm(formula = Beats.per.minute[16:30] ~ Steps.taken.per.minute[16:30])

Residuals:
    Min      1Q  Median      3Q     Max
-2.8951 -0.7039  0.1049  1.2104  2.2816

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                   77.895134   0.634878  122.69   <2e-16 ***
Steps.taken.per.minute[16:30] -0.002638   0.011972   -0.22    0.829
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.803 on 13 degrees of freedom
F-statistic: 0.04854 on 1 and 13 DF,  p-value: 0.829

> regB$df
[1] 13
>
[1] 42.24226
```

In this instance, we see that the Chow statistic of 0.1917 is well below the F critical value of 2.5769. Therefore, the null hypothesis of no structural break in the time series cannot be rejected, which is in line with what we expect, since no structural break was observed graphically over this range.
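The 0.1917 figure can be checked directly from the RSS values and residual degrees of freedom reported in the three outputs above. This is only an arithmetic sketch with assumed variable names; it follows the same calculation as the rest of this post.

```r
# RSS values and residual degrees of freedom as reported in the outputs above
rssP <- 73.48779
rssA <- 29.98672
rssB <- 42.24226
dfA <- 13
dfB <- 13
k <- 2

chow <- ((rssP - (rssA + rssB)) / k) / ((rssA + rssB) / (dfA + dfB - 2 * k))
fcrit <- qf(0.95, df1 = dfA, df2 = dfB)  # critical value as computed in this post

chow          # about 0.1917
chow > fcrit  # FALSE, so the null hypothesis is not rejected
```

Splitting the sample barely reduces the residual sum of squares (from 73.49 to 72.23), which is why the statistic is so small.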

```
k=2
fcrit=qf(.95,df1=regA$df,df2=regB$df)
fcrit
[1] 2.576927