A Chow test is designed to determine whether a structural break exists in a time series, that is, a sharp change in trend that merits further study. For instance, a structural break in one series can give useful clues as to whether such a change is being propagated to other variables, assuming there is a significant correlation between them under normal circumstances.

The Chow test is conducted by running three separate regressions: 1) a pooled regression with data before and after the structural break, 2) a regression with data before the structural break, and 3) a regression with data after the structural break. The residual sum of squares for each regression is used to calculate the Chow statistic using the following formula:

CHOW = [ (RSS_P − (RSS_A + RSS_B)) / k ] / [ (RSS_A + RSS_B) / (N_A + N_B − 2k) ]

where RSS_P, RSS_A, and RSS_B are the residual sums of squares of the pooled, before-break, and after-break regressions respectively, k is the number of regressors (including the intercept), and N_A and N_B are the number of observations in each sub-sample. Under the null hypothesis, the statistic follows an F distribution with k and N_A + N_B − 2k degrees of freedom.
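As a minimal sketch, the formula translates directly into a small R helper (the function and argument names here are illustrative, not part of the original analysis):

```r
# Chow statistic from the three residual sums of squares.
# rss_p: pooled regression; rss_a, rss_b: sub-sample regressions;
# k: number of regressors (incl. intercept); n_a, n_b: sub-sample sizes.
chow_statistic <- function(rss_p, rss_a, rss_b, k, n_a, n_b) {
  numerator   <- (rss_p - (rss_a + rss_b)) / k
  denominator <- (rss_a + rss_b) / (n_a + n_b - 2 * k)
  numerator / denominator
}
```

The resulting value is then compared against the critical value `qf(0.95, df1 = k, df2 = n_a + n_b - 2 * k)`.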

Note that this test can be run automatically in R using the “strucchange” package. However, I always prefer to calculate the test statistic manually where possible, as doing so builds understanding of why we are applying the test and of the specific break we are analysing in the time series.
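For completeness, here is a hedged sketch of the automated route; it assumes a data frame `hr` (a name of my choosing) holding the columns used later in this post, with a candidate break after the 11th observation of the window being tested:

```r
# Sketch only: automated Chow test via the strucchange package
library(strucchange)
sctest(Beats.per.minute ~ Steps.taken.per.minute,
       data = hr[22:43, ], type = "Chow", point = 11)
```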

The null and alternative hypotheses are as follows:

**Null Hypothesis: No structural break in time series**

**Alternative Hypothesis: Structural break in time series**

At the outset, let me say that the Chow test is more academic in nature and is not as commonly used as other time series methods, chiefly because most time series (especially those with an economic or financial trend) show many structural breaks. The Chow test is mainly useful for analysing series that are normally stationary but where a significant shift causes a break in the series.

Let us take the example of a time series measuring a person’s heart rate (in beats per minute) at 10-minute intervals. It can be assumed that unless that person undergoes significant physical activity, suffers a sudden illness, is emotionally distressed, or another significant event changes their heart rate, the resting heart rate will remain more or less constant.

Let us plot heart rate and see what it looks like:
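A plot of this kind can be produced with base R; the snippet below is a sketch, assuming the series sits in a data frame `hr` (a name of my choosing):

```r
# Sketch: plot the heart rate series as a line chart
plot(hr$Beats.per.minute, type = "l",
     xlab = "10-minute interval", ylab = "Beats per minute",
     main = "Heart rate over time")
```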

We see that with the exception of a clear structural break where heart rate rapidly rises (due to increased physical activity as measured by number of steps per minute), heart rate shows the properties of a stationary time series (one with a constant mean and variance).

Here is a snippet of what our data looks like overall:

| Time | Beats per minute | Steps taken per minute |
| --- | --- | --- |
| 10:10:00 | 75 | 67 |
| 10:20:00 | 78 | 0 |
| 10:30:00 | 80 | 0 |
| 10:40:00 | 78 | 0 |
| 10:50:00 | 80 | 0 |
| 11:00:00 | 78 | 90 |
| 11:10:00 | 75 | 0 |
| 11:20:00 | 79 | 0 |
| 11:30:00 | 78 | 0 |
| 11:40:00 | 75 | 0 |
| 11:50:00 | 78 | 0 |
| 12:00:00 | 75 | 79 |
| 12:10:00 | 77 | 68 |
| 12:20:00 | 78 | 80 |
| 12:30:00 | 77 | 77 |
| 12:40:00 | 79 | 80 |
| 12:50:00 | 80 | 67 |
| 13:00:00 | 79 | 73 |
| 13:10:00 | 75 | 79 |
| 13:20:00 | 129 | 173 |
| 13:30:00 | 137 | 178 |
| 13:40:00 | 126 | 187 |
| 13:50:00 | 124 | 190 |
| 14:00:00 | 126 | 188 |
| 14:10:00 | 124 | 169 |
| 14:20:00 | 126 | 192 |
| 14:30:00 | 75 | 0 |
| 14:40:00 | 78 | 0 |
| 14:50:00 | 79 | 0 |
| 15:00:00 | 79 | 0 |
| 15:10:00 | 80 | 0 |
| 15:20:00 | 79 | 0 |
| 15:30:00 | 79 | 0 |
| 15:40:00 | 76 | 0 |
| 15:50:00 | 75 | 0 |

Let us now run the three separate regressions using the following data:
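As a sketch, the commands behind the regression output shown below take this form (assuming the columns are available as vectors, e.g. via `attach()`):

```r
# Sketch: the three regressions for the Chow test
regP <- lm(Beats.per.minute[22:43] ~ Steps.taken.per.minute[22:43])  # pooled
regA <- lm(Beats.per.minute[22:32] ~ Steps.taken.per.minute[22:32])  # before break
regB <- lm(Beats.per.minute[33:43] ~ Steps.taken.per.minute[33:43])  # after break
```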

**Pooled Regression (RegP).** Observations [22:43] with data containing heart rate before and after the sudden rise as a result of increased physical activity:

```r
Call:
lm(formula = Beats.per.minute[22:43] ~ Steps.taken.per.minute[22:43])

Residuals:
    Min      1Q  Median      3Q     Max 
-16.520 -11.234   3.483   9.050  17.197 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   68.95046    3.59375  19.186 2.38e-14 ***
Steps.taken.per.minute[22:43]  0.28569    0.03191   8.954 1.96e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.98 on 20 degrees of freedom
Multiple R-squared:  0.8003,    Adjusted R-squared:  0.7904
F-statistic: 80.17 on 1 and 20 DF,  p-value: 1.96e-08

> regP$df
[1] 20
> rssP <- sum(residuals(regP)^2)
> rssP
[1] 2411.655
```

**Regression A (RegA).** Observations [22:32] with data containing heart rate before increased physical activity:

```r
Call:
lm(formula = Beats.per.minute[22:32] ~ Steps.taken.per.minute[22:32])

Residuals:
   Min     1Q Median     3Q    Max 
-2.463 -1.297  0.533  1.197  2.586 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   77.138547   1.053733  73.205 8.38e-14 ***
Steps.taken.per.minute[22:32]  0.004106   0.016357   0.251    0.807    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.836 on 9 degrees of freedom
Multiple R-squared:  0.006954,    Adjusted R-squared:  -0.1034
F-statistic: 0.06302 on 1 and 9 DF,  p-value: 0.8074

> regA$df
[1] 9
> rssA <- sum(residuals(regA)^2)
> rssA
[1] 30.33306
```

**Regression B (RegB).** Observations [33:43] with data containing heart rate after increased physical activity:

```r
Call:
lm(formula = Beats.per.minute[33:43] ~ Steps.taken.per.minute[33:43])

Residuals:
    Min      1Q  Median      3Q     Max 
-5.3181 -2.8994 -0.0207  0.9793 10.9217 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   78.02066    2.38701   32.69 1.16e-10 ***
Steps.taken.per.minute[33:43]  0.26999    0.01639   16.48 4.98e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.783 on 9 degrees of freedom
Multiple R-squared:  0.9679,    Adjusted R-squared:  0.9644
F-statistic: 271.5 on 1 and 9 DF,  p-value: 4.976e-08

> regB$df
[1] 9
> rssB <- sum(residuals(regB)^2)
> rssB
[1] 205.8739
```

Note that `regA$df` and `regB$df` are the residual degrees of freedom (N − k) of each sub-sample regression, so their sum already equals the denominator degrees of freedom N_A + N_B − 2k = 18, while the numerator degrees of freedom equal k = 2. With a Chow statistic of **82.89** against an F critical value of **3.555**, the statistic clearly exceeds the critical value and the null hypothesis is rejected, showing strong evidence of a structural break.

```r
> k <- 2
> fcrit <- qf(.95, df1 = k, df2 = regA$df + regB$df)
> fcrit
[1] 3.554557
> Chow_Statistic <- ((rssP - (rssA + rssB)) / k) / ((rssA + rssB) / (regA$df + regB$df))
> Chow_Statistic
[1] 82.88931
```

However, let us now run a Chow test on the heart rate data excluding the period of increased physical activity, i.e. on observations [1:30], where heart rate remains broadly constant with no structural break:

**Pooled Regression (RegP).** Observations [1:30]:

```r
Call:
lm(formula = Beats.per.minute[1:30] ~ Steps.taken.per.minute[1:30])

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1732 -0.7649  0.2830  1.2772  2.2162 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  78.173175   0.403941 193.526   <2e-16 ***
Steps.taken.per.minute[1:30] -0.005812   0.007742  -0.751    0.459    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.62 on 28 degrees of freedom
Multiple R-squared:  0.01973,    Adjusted R-squared:  -0.01528
F-statistic: 0.5635 on 1 and 28 DF,  p-value: 0.4591

> regP$df
[1] 28
> rssP <- sum(residuals(regP)^2)
> rssP
[1] 73.48779
```

**Regression A (RegA).** Observations [1:15]:

```r
Call:
lm(formula = Beats.per.minute[1:15] ~ Steps.taken.per.minute[1:15])

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8422 -0.9517  0.5483  1.2396  1.5483 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  78.451726   0.536338  146.27   <2e-16 ***
Steps.taken.per.minute[1:15] -0.009097   0.010454   -0.87      0.4    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.519 on 13 degrees of freedom
Multiple R-squared:  0.05504,    Adjusted R-squared:  -0.01765
F-statistic: 0.7572 on 1 and 13 DF,  p-value: 0.4

> regA$df
[1] 13
> rssA <- sum(residuals(regA)^2)
> rssA
[1] 29.98672
```

**Regression B (RegB).** Observations [16:30]:

```r
Call:
lm(formula = Beats.per.minute[16:30] ~ Steps.taken.per.minute[16:30])

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8951 -0.7039  0.1049  1.2104  2.2816 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   77.895134   0.634878  122.69   <2e-16 ***
Steps.taken.per.minute[16:30] -0.002638   0.011972   -0.22    0.829    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.803 on 13 degrees of freedom
Multiple R-squared:  0.00372,    Adjusted R-squared:  -0.07292
F-statistic: 0.04854 on 1 and 13 DF,  p-value: 0.829

> regB$df
[1] 13
> rssB <- sum(residuals(regB)^2)
> rssB
[1] 42.24226
```

In this instance, the Chow statistic of **0.2266** is well below the F critical value of **3.369** (with df1 = k = 2 and df2 = 15 + 15 − 4 = 26). The null hypothesis of no structural break therefore cannot be rejected, in line with what we expect, since no structural break was observed graphically in this part of the series.

```r
> k <- 2
> fcrit <- qf(.95, df1 = k, df2 = regA$df + regB$df)
> fcrit
[1] 3.369016
> Chow_Statistic <- ((rssP - (rssA + rssB)) / k) / ((rssA + rssB) / (regA$df + regB$df))
> Chow_Statistic
[1] 0.2265646
```

As mentioned, the Chow test is best suited to data that resembles a stationary process. Since many (if not most) time series follow a non-stationary process, such an analysis will typically flag many structural breaks across the series (some more pronounced than others). The Chow test is therefore most useful for detecting a single sharp deviation from an otherwise stable trend.
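Finally, the manual procedure used throughout this post can be collected into one reusable function; the sketch below uses names and structure of my own choosing:

```r
# Sketch: manual Chow test for a single known break point
# y, x: numeric vectors; break_at: index of the last pre-break observation
chow_test <- function(y, x, break_at, alpha = 0.05) {
  n     <- length(y)
  reg_p <- lm(y ~ x)                                      # pooled
  reg_a <- lm(y[1:break_at] ~ x[1:break_at])              # before break
  reg_b <- lm(y[(break_at + 1):n] ~ x[(break_at + 1):n])  # after break
  rss_p <- sum(residuals(reg_p)^2)
  rss_a <- sum(residuals(reg_a)^2)
  rss_b <- sum(residuals(reg_b)^2)
  k     <- 2                     # intercept + slope
  df2   <- n - 2 * k             # denominator degrees of freedom
  stat  <- ((rss_p - (rss_a + rss_b)) / k) / ((rss_a + rss_b) / df2)
  list(statistic   = stat,
       f_critical  = qf(1 - alpha, df1 = k, df2 = df2),
       reject_null = stat > qf(1 - alpha, df1 = k, df2 = df2))
}
```

For example, `chow_test(Beats.per.minute[22:43], Steps.taken.per.minute[22:43], break_at = 11)` would run the first test above in a single call.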

## Code Scripts and Datasets

Hope you enjoyed this tutorial!

The full code is available by **subscribing to my mailing list**.

Upon subscription, you will receive full access to the codes and datasets for my tutorials, as well as a comprehensive course in regression analysis in both Python and R.