When working with time series, one important thing we wish to determine is whether one series “causes” changes in another.
In other words, is there a strong correlation between one time series and lagged values of another?
One way to detect this is by measuring the cross-correlation between the two series.
For instance, one time series could serve as a lagging indicator: the effect of a change in another series shows up in it several periods later. This is quite common in economic data; e.g., an economic shock may affect GDP two quarters later.
But how do we find the lag at which this relationship is significant? One very handy way of doing so in R is the ccf (cross-correlation) function.
Running this function allows us to determine the lag at which the correlation between two time series is strongest.
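To make concrete what ccf computes, here is a minimal Python sketch (not R's implementation; `sample_ccf` is a hypothetical helper name). It assumes R's documented convention that the value at lag k of ccf(x, y) estimates the correlation between x[t+k] and y[t]. Like R, it divides each cross-covariance by n rather than by the number of overlapping pairs.

```python
from math import sqrt


def sample_ccf(x, y, max_lag):
    """Sample cross-correlation of x and y for lags -max_lag..max_lag.

    Assumes R's ccf(x, y) convention: the value at lag k estimates
    cor(x[t + k], y[t]). Covariances are divided by n, as in R.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    result = {}
    for k in range(-max_lag, max_lag + 1):
        # Only time points where both x[t + k] and y[t] exist contribute.
        overlap = range(max(0, -k), min(n, n - k))
        cov = sum((x[t + k] - mean_x) * (y[t] - mean_y) for t in overlap) / n
        result[k] = cov / (sd_x * sd_y)
    return result
```

Under this convention, if y simply repeats x two periods later (x leads y), the largest value appears at lag -2. A negative lag in ccf(x, y) therefore means x leads y.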
Two important things we must ensure when running a cross-correlation:
- Our time series are stationary.
- Once we have chosen a suitable lag, we detect and, if necessary, correct for serial correlation.
If you are unfamiliar with these, please review two of my previous posts on stationarity and serial correlation.
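This post does not name a specific serial-correlation test. One standard choice is the Durbin-Watson statistic on the regression residuals; in R it is available, for example, as dwtest() in the lmtest package. Below is a minimal Python sketch of the statistic itself (`durbin_watson` is a hypothetical helper name).

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals.

    Values near 2 suggest no first-order serial correlation; values well
    below 2 suggest positive, and well above 2 negative, serial correlation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (signs alternate)
```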
ccf Plot for Campus Gym Data
The sample dataset I use for this example is the campus gym dataset from Kaggle.
The goal is to determine whether the change in temperature over each 10-minute interval is correlated with the number of people using the campus gym.
When the number of people and the temperature are plotted at 10-minute intervals, both variables show a possible trend, so non-stationarity may be present.
Let us now formally test for stationarity using the ADF (augmented Dickey-Fuller) and KPSS tests.
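In R, both tests are usually called from a package, for example adf.test() and kpss.test() in tseries. To show what the ADF statistic measures, here is a minimal Python sketch of the plain Dickey-Fuller regression with a constant. It omits the lagged-difference terms that make the test "augmented", and `dickey_fuller_t` is a hypothetical helper name. The statistic is compared with Dickey-Fuller critical values rather than normal ones; for this specification the large-sample 5% value is roughly -2.86.

```python
import random
from math import sqrt


def dickey_fuller_t(y):
    """t-statistic on gamma in  dy[t] = alpha + gamma * y[t-1] + e[t].

    Plain (non-augmented) Dickey-Fuller regression with a constant.
    Strongly negative values reject a unit root.
    """
    lagged = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    m = len(dy)
    mean_l, mean_d = sum(lagged) / m, sum(dy) / m
    sxx = sum((v - mean_l) ** 2 for v in lagged)
    sxy = sum((v - mean_l) * (d - mean_d) for v, d in zip(lagged, dy))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_l
    rss = sum((d - alpha - gamma * v) ** 2 for v, d in zip(lagged, dy))
    se_gamma = sqrt(rss / (m - 2) / sxx)
    return gamma / se_gamma


# Simulated example: white noise (stationary) vs. its running sum (a random walk).
rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(500)]
walk, level = [], 0.0
for e in noise:
    level += e
    walk.append(level)
print(dickey_fuller_t(noise))  # far below -2.86
print(dickey_fuller_t(walk))   # typically above -2.86
```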
While our ADF test indicates stationarity, the KPSS test conversely suggests that a unit root is present. Despite this inconsistency, the plots still indicate a trend, so we difference both series.
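First differencing, which is what R's diff() does with its default lag = 1, replaces each value with its change from the previous period. For our data, that is the change over each 10-minute interval. Here is a minimal Python sketch (`difference` is a hypothetical helper name).

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the result is `lag` values shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


trend = [10 + 3 * t for t in range(6)]  # 10, 13, 16, 19, 22, 25
print(difference(trend))  # [3, 3, 3, 3, 3]
```

A linear trend becomes a constant after one difference, which is why differencing removes the kind of trend seen in the plots.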
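After differencing, the ADF test still gives a p-value below 0.05, while the KPSS test now gives a p-value above 0.05. The ADF null hypothesis is a unit root and the KPSS null hypothesis is stationarity, so the two tests now agree that the differenced series are stationary.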
Moreover, the ACF plots of the two differenced variables also indicate stationarity.
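An ACF plot, as drawn by R's acf(), shows the sample autocorrelation at each lag with dashed bounds at about ±1.96/sqrt(n). For a stationary series the autocorrelations should die out quickly. Here is a minimal Python sketch of both pieces (`sample_acf` and `lags_outside_bounds` are hypothetical helper names).

```python
from math import sqrt


def sample_acf(x, max_lag):
    """Sample autocorrelations r(0..max_lag), dividing covariances by n as R does."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t + k] - mean) * (x[t] - mean) for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


def lags_outside_bounds(x, max_lag, z=1.96):
    """Lags 1..max_lag whose |r(k)| exceeds the approximate 95% bound z / sqrt(n)."""
    bound = z / sqrt(len(x))
    r = sample_acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


alternating = [1.0, -1.0] * 50
print(round(sample_acf(alternating, 2)[1], 2))  # -0.99
print(lags_outside_bounds(alternating, 3))      # [1, 2, 3]
```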
Cross-Correlation: Output and Plot
Let us now plot the cross-correlation (using the ccf function) for the two differenced variables.
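Reading a ccf plot amounts to finding the lags whose bars cross the dashed bounds (about ±1.96/sqrt(n), as for the ACF) and picking the largest. Here is a minimal Python sketch of that step. It assumes R's convention that lag k pairs x[t+k] with y[t]; `cross_corr` and `strongest_lags` are hypothetical helper names.

```python
from math import sqrt


def cross_corr(x, y, k):
    """Estimate cor(x[t + k], y[t]) with covariances divided by n (R's convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = sqrt(sum((v - my) ** 2 for v in y) / n)
    overlap = range(max(0, -k), min(n, n - k))
    return sum((x[t + k] - mx) * (y[t] - my) for t in overlap) / n / (sx * sy)


def strongest_lags(x, y, max_lag, z=1.96):
    """Lags in -max_lag..max_lag with |r| above z / sqrt(n), strongest first."""
    bound = z / sqrt(len(x))
    r = {k: cross_corr(x, y, k) for k in range(-max_lag, max_lag + 1)}
    hits = [k for k in r if abs(r[k]) > bound]
    return sorted(hits, key=lambda k: abs(r[k]), reverse=True)
```

The sketch only ranks the lags. Deciding which of them are worth modelling, as done below, is still a judgement call.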
From the plot, we see that the strongest cross-correlations occur at lag 0 and lag -39. So, let's go ahead and run regressions at these lags.
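Running a regression "at" a ccf lag means lining the two series up at that offset before fitting. Here is a minimal Python OLS sketch. It assumes the lag comes from ccf(x, y), so lag k pairs y[t] with x[t + k] and lag -39 regresses y[t] on x[t - 39]. Which of our two variables played the role of x is an assumption, and `lagged_ols` is a hypothetical helper name. The t-statistic uses ordinary OLS standard errors, which are only reliable once serial correlation in the residuals has been ruled out or corrected.

```python
from math import sqrt


def lagged_ols(x, y, k):
    """OLS fit of y[t] = a + b * x[t + k] over every t where both values exist.

    Assumes k comes from ccf(x, y), whose lag k pairs x[t + k] with y[t];
    k = -39 therefore regresses y[t] on x[t - 39].
    Returns (intercept, slope, slope t-statistic, number of pairs).
    """
    n = len(x)
    overlap = range(max(0, -k), min(n, n - k))
    xs = [x[t + k] for t in overlap]
    ys = [y[t] for t in overlap]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(xs, ys))
    # Plain OLS standard error: not valid if the residuals are serially correlated.
    se_slope = sqrt(rss / (m - 2) / sxx)
    return intercept, slope, slope / se_slope, m
```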
After running both regressions and correcting for serial correlation, we find a positive and significant independent variable at lag 0, but a negative and insignificant independent variable at lag -39.
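This post does not say which correction was applied. One common option is the Cochrane-Orcutt procedure, available in R through, for example, cochrane.orcutt() in the orcutt package. It estimates the first-order autocorrelation rho of the residuals and refits the model on quasi-differenced data. Here is a minimal iterative Python sketch assuming AR(1) errors (`ols` and `cochrane_orcutt` are hypothetical helper names).

```python
def ols(xs, ys):
    """Intercept and slope of a simple OLS regression of ys on xs."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
             / sum((a - mean_x) ** 2 for a in xs))
    return mean_y - slope * mean_x, slope


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt estimate of y[t] = a + b * x[t] + u[t],
    assuming AR(1) errors u[t] = rho * u[t-1] + e[t].

    Returns (a, b, rho).
    """
    a, b = ols(x, y)  # start from plain OLS
    rho = 0.0
    for _ in range(max_iter):
        # Residuals of the current fit on the original (untransformed) data.
        u = [yt - a - b * xt for xt, yt in zip(x, y)]
        new_rho = (sum(u[t] * u[t - 1] for t in range(1, len(u)))
                   / sum(v * v for v in u[:-1]))
        # Quasi-difference both series and refit.
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols(xs, ys)
        # The transformed intercept equals a * (1 - rho); map it back.
        a = a_star / (1 - new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho
```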
The regression output suggests that the correlation is indeed strongest, and significant, when no lag is present.
Of course, the cross-correlation function could instead show the strongest correlation at a nonzero lag; this did not happen in this instance.
Nonetheless, when looking at correlations between time series, it always helps to experiment with different lags to obtain the strongest predictive model.