# Cross Correlation Analysis: Analysing Currency Pairs in Python

When working with a time series, one important thing we wish to determine is whether one series “causes” changes in another. In other words, is there a strong correlation between a time series and another given a number of lags? The way we can detect this is through measuring cross correlation.

For instance, one time series could serve as a lagging indicator. This is where the effect of a change in one time series transfers to the other time series several periods later. This is quite common in economic data; e.g. an economic shock having an effect on GDP two quarters later.
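To make the idea of a lagging indicator concrete, here is a small sketch on hypothetical data (not from this post): we build one series as a delayed copy of another and check that the cross correlation peaks at the known lag. The sign convention follows `np.correlate`, where a positive lag here means the second series trails the first.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_lag = 500, 3
x = rng.standard_normal(n)

# y is a lagging indicator of x: changes in x show up in y three periods later
y = np.empty(n)
y[true_lag:] = x[:-true_lag]
y[:true_lag] = 0.0
y += 0.1 * rng.standard_normal(n)

# Full cross correlation; with this argument order, a positive lag means y trails x
xc = np.correlate(y - y.mean(), x - x.mean(), mode='full')
lags = np.arange(-n + 1, n)
print(lags[np.argmax(xc)])  # the peak sits at lag +3
```
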

Two important things that we must ensure when we run a cross correlation:

• Our time series is stationary.
• Once we have chosen the suitable lag, we are then able to detect and correct for serial correlation if necessary.

In a previous post, we looked at how we can determine the extent of cross correlation among different currency pairs using the ccf library in R. Let’s now see how this analysis can be conducted using Python.

Firstly, we will import our libraries and download currency data for the EUR/USD and GBP/USD from quandl.

```
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.tsa.stattools as ts
from statsmodels.tsa.stattools import acf, pacf
import matplotlib as mpl
import matplotlib.pyplot as plt
import quandl
import scipy.stats as ss

# Note: in Python, quandl.get returns a pandas DataFrame;
# the type="xts" argument belongs to the R API and is not needed here.
eurusd = quandl.get("FRED/DEXUSEU", start_date="2015-05-01", end_date="2015-10-01")
gbpusd = quandl.get("FRED/DEXUSUK", start_date="2015-05-01", end_date="2015-10-01")

eurusd.head()
             Value
Date
2015-05-01  1.1194
2015-05-04  1.1145
2015-05-05  1.1174
2015-05-06  1.1345
2015-05-07  1.1283

gbpusd.head()
             Value
Date
2015-05-01  1.5137
2015-05-04  1.5118
2015-05-05  1.5178
2015-05-06  1.5244
2015-05-07  1.5223
```

Each DataFrame contains 107 daily observations over the period 2015-05-01 to 2015-10-01.

We will now check the data types and extract the currency values from each DataFrame as the series x and y:

```
# Check type
type(eurusd)
<class 'pandas.core.frame.DataFrame'>
type(gbpusd)
<class 'pandas.core.frame.DataFrame'>

# Extract the Value column from each DataFrame and save as x and y
x = eurusd[eurusd.columns[0]]
y = gbpusd[gbpusd.columns[0]]
```

When dealing with financial data, it is good practice to work with returns rather than raw prices.

Since an investor is subject to compounding when holding an asset, returns are conventionally expressed in logarithmic form. As a first step, we take the natural log of each price series (the first differences of a log-price series are the log returns):

```
# Log-transform the price series
x = np.log(x)
y = np.log(y)
```
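For clarity on the distinction: `np.log` alone gives log prices, while actual log returns are the first differences of the log series. A minimal sketch on a hypothetical price series:

```python
import numpy as np
import pandas as pd

# Hypothetical prices for illustration only
prices = pd.Series([1.10, 1.12, 1.11, 1.15])

log_prices = np.log(prices)               # what the snippet above computes
log_returns = log_prices.diff().dropna()  # period-over-period log returns

# For small moves, a log return approximates the percentage change
print(log_returns.tolist())
```
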

Let us now plot the autocorrelation and partial autocorrelation functions for each series:

```
acfx = acf(x)
plt.plot(acfx)
plt.title("Autocorrelation Function")
plt.show()
```

```
pacfx = pacf(x)
plt.plot(pacfx)
plt.title("Partial Autocorrelation Function")
plt.show()
```

```
acfy = acf(y)
plt.plot(acfy)
plt.title("Autocorrelation Function")
plt.show()
```

```
pacfy = pacf(y)
plt.plot(pacfy)
plt.title("Partial Autocorrelation Function")
plt.show()
```

Here are our currency plots:

```
# Plot the log price series
plt.plot(x)
plt.title("EUR/USD")
plt.show()
```

```
plt.plot(y)
plt.title("GBP/USD")
plt.show()
```

## Dickey-Fuller Test and First Differencing

As mentioned, we wish to ensure that our time series are stationary before obtaining a cross correlation reading.

To test for stationarity, we will use the Dickey-Fuller test. Its null hypothesis is that the series contains a unit root (i.e. is non-stationary), so a p-value below 0.05 lets us reject the null and treat the series as stationary, while a p-value above this threshold indicates non-stationarity.

```
# Dickey-Fuller tests
xdf = ts.adfuller(x)
xdf
(-3.0704779047168596, 0.028816508715839483, 0, 106, {'1%': -3.4936021509366793, '5%': -2.8892174239808703, '10%': -2.58153320754717}, -723.247574137278)
ydf = ts.adfuller(y)
ydf
(-2.949959856756157, 0.03983919029636401, 1, 105, {'1%': -3.4942202045135513, '5%': -2.889485291005291, '10%': -2.5816762131519275}, -815.3639322514784)
```

Since both p-values are below 0.05 (0.0288 for EUR/USD, 0.0398 for GBP/USD), we can treat the series as stationary and do not need to first-difference them.

## Cross Correlation Analysis

Now, we will calculate the cross correlation between these two currency pairs. The following guide gives a great overview of how to calculate cross correlations in Python, and I recommend viewing it for more detail.

Firstly, we will calculate the lag-zero cross correlation between x and y:

```
# Cross correlation at lag 0: remove the means first
cc1 = np.correlate(x - x.mean(), y - y.mean())[0]
cc1
0.0016363869247897089

# Normalise by the number of points and the product of standard deviations
cc1 /= (len(x) * x.std() * y.std())
cc1
0.09356342030097958

# Compare with the Pearson correlation coefficient
cc2 = np.corrcoef(x, y)[0, 1]
cc2
0.09444609407740386

print(cc1, cc2)
0.09356342030097958 0.09444609407740386
```
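The small discrepancy between cc1 and cc2 comes from the normalisation: pandas' `Series.std` defaults to the sample standard deviation (ddof=1), while `np.corrcoef` uses population moments. Passing `ddof=0` makes the manual calculation match exactly. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
x = pd.Series(rng.standard_normal(100))
y = pd.Series(rng.standard_normal(100))

cc_manual = np.correlate(x - x.mean(), y - y.mean())[0]
# Population standard deviation (ddof=0), not the pandas default ddof=1
cc_manual /= len(x) * x.std(ddof=0) * y.std(ddof=0)

cc_np = np.corrcoef(x, y)[0, 1]
print(abs(cc_manual - cc_np) < 1e-12)  # True: the two now agree
```
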

Now, we will generate the lags and calculate the full cross correlation between the two series:

```
# Generate the lags: -(n-1) .. (n-1)
n = len(x)
lags = np.arange(-n + 1, n)

# Remove the sample means
xr = x - x.mean()
yr = y - y.mean()

# Full cross correlation across all lags, normalised so that
# the lag-0 value matches the correlation coefficient above
ccf_xy = np.correlate(xr, yr, mode='full')
ccf_xy /= (n * x.std() * y.std())
```

Now, we can plot the cross correlation:

```
fig, ax = plt.subplots()
ax.plot(lags, ccf_xy, 'b')
ax.set_xlabel('lag')
ax.set_ylabel('correlation coefficient')
ax.grid(True)
plt.title("EUR/USD vs GBP/USD")
plt.show()
```

We see that while the correlations weaken as the lags increase (as we would expect), there are notably negative correlations around lags t = -50 and t = 50, with coefficients below -0.2.

Overall, the cross correlation between EUR/USD and GBP/USD appears more negative than positive.

Let’s compare this to two other currency pairs. We will choose JPY/USD vs CHF/USD:

Interestingly, we see more frequent negative correlations, while the positive correlations are stronger than those between the EUR and GBP. Given that the CHF and JPY are two “safe haven” currencies that typically rise during “risk-off” periods in the market, it is not surprising that we see stronger positive correlations between the two, along with significantly negative correlations when demand for them is falling.

## Conclusion

In this tutorial, you have learned:

• How to analyse financial data in Python using Quandl
• How to test a time series for stationarity
• How to conduct a cross correlation analysis in Python

Many thanks for reading this tutorial, and please leave any questions you may have in the comments below.