A key concept in dealing with statistical data is the **law of large numbers**: the more observations we have in a dataset, the closer the sample average converges to the true mean, and, for normally distributed data, the more closely the histogram of observations comes to resemble the familiar bell curve with the majority of results centered around the mean.

This has implications for time series analysis, since the more observations we have, the more likely we are to see our results converge towards the mean. Interestingly, however, as the number of time periods increases we are also more likely to observe highly significant deviations from the mean, which can substantially affect our overall results.
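As a brief sketch of this convergence (using the 0.6% mean and 2.5% standard deviation of the security modelled below, and NumPy's `default_rng` generator, which is an illustrative choice rather than part of the original code), we can watch the sample mean approach the true mean as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.6, 2.5  # daily mean return (%) and standard deviation (%)

# As the number of observations grows, the sample mean converges towards mu
for n in (5, 100, 10_000, 1_000_000):
    sample = rng.normal(mu, sigma, n)
    print(n, round(sample.mean(), 3))
```

With only 5 draws the sample mean can land far from 0.6, while at a million draws it is typically accurate to about two decimal places.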

# Normal Distributions and the Law of Large Numbers

Let’s illustrate this with an example, using a normal distribution model generated in **Python**. Suppose that we have a financial security with a **0.6%** daily return and a **2.5%** daily standard deviation:

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0.6, 2.5  # mean and standard deviation
s = np.random.normal(mu, sigma, 5)

# Plot the histogram of draws along with the normal density curve
count, bins, ignored = plt.hist(s, 30, density=True)
plt.plot(bins,
         1 / (sigma * np.sqrt(2 * np.pi))
         * np.exp(-(bins - mu) ** 2 / (2 * sigma ** 2)),
         linewidth=2, color='r')
plt.show()
```

In this context, we are running 5 trials; i.e. generating 5 random numbers with a mean of **0.6%** and a standard deviation of **2.5%**:

**Trial 4**

We see that the distribution of our data is not particularly consistent across trials, and often our results fall within a fairly narrow range of, for instance, -1% to 3.5%, which we reasonably expect as the numbers are centered around the mean. However, let us suppose that we now run our program on **1000** observations:

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0.6, 2.5  # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)

# Plot the histogram of draws along with the normal density curve
count, bins, ignored = plt.hist(s, 30, density=True)
plt.plot(bins,
         1 / (sigma * np.sqrt(2 * np.pi))
         * np.exp(-(bins - mu) ** 2 / (2 * sigma ** 2)),
         linewidth=2, color='r')
plt.show()
```

**Trial 1**

**Trial 2**

**Trial 3**

**Trial 4**

We now see that our data is largely centered around the mean, and the histogram takes the general shape of a normal distribution, i.e. the distribution follows the 68-95-99.7 (three-sigma) rule, where 68% of values lie within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three.
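These percentages can be verified empirically. A minimal sketch, assuming the same 0.6%/2.5% parameters and a large sample (the `default_rng` generator is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.6, 2.5
s = rng.normal(mu, sigma, 1_000_000)

# Fraction of observations within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    frac = np.mean(np.abs(s - mu) <= k * sigma)
    print(k, round(frac, 4))
```

With a million draws, the printed fractions land very close to 0.6827, 0.9545 and 0.9973.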

Note that while the majority of our values are now centered around the mean, the distribution also yields “extreme” values: we see that returns can differ from the mean by as much as **10** percent. This has important implications for trading strategies, since the model implies that the longer a security is held, the greater the chance of observing extreme values.
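The holding-period claim can be quantified with a short sketch (an illustration using `scipy.stats.norm`, not part of the original code): if daily returns are normally distributed, the chance of seeing at least one 3-sigma day grows quickly with the number of days held.

```python
from scipy.stats import norm

# Probability that a single day's return stays within 3 sigma of the mean
p_inside = norm.cdf(3) - norm.cdf(-3)  # about 0.9973

# Probability of at least one 3-sigma day over a holding period of n days
for days in (5, 250, 1000):
    p_extreme = 1 - p_inside ** days
    print(days, round(p_extreme, 3))
```

Over 250 trading days (roughly one year) the chance of at least one 3-sigma move is close to 50%, and over 1000 days it exceeds 90%.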

# Monte Carlo Simulation and Random Walk Generation

The purpose of a Monte Carlo simulation is to observe a range of potential outcomes based on a numerical simulation. For instance, if an investor chooses to hold an asset with a given level of return and volatility, then these parameters can be used to simulate a range of potential gains and losses over a specified period.

The below shows how to calculate the price path of a stock using a random walk for a given level of **return (mu)** and **volatility (vol)**. Moreover, our histogram plots allow us to observe the range of potential returns throughout the period, as well as the frequency of those returns.

```python
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import norm

# Define variables
T = 250     # Number of trading days (we also run at 1000 for the purposes of comparison)
mu = 0.09   # Return
vol = 0.1   # Volatility

# Daily gross returns drawn from a normal distribution
daily_returns = np.random.normal(mu / T, vol / math.sqrt(T), T) + 1

# Build the price path from a starting price of 200
price_list = [200]
for x in daily_returns:
    price_list.append(price_list[-1] * x)

# Generate plots
plt.plot(price_list)
# plt.hist(daily_returns - 1, 100)
# Note that we run the line plot and histogram separately, not simultaneously.
plt.show()
```
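A natural extension, sketched here with a hypothetical 10,000-path run rather than the single path above, is to simulate many independent price paths at once and summarise the distribution of terminal prices:

```python
import numpy as np

rng = np.random.default_rng(1)
T, mu, vol, start = 250, 0.09, 0.1, 200  # same parameters as the single path

# Simulate 10,000 independent paths: one row of daily gross returns per path
n_paths = 10_000
daily = rng.normal(mu / T, vol / np.sqrt(T), size=(n_paths, T)) + 1
final_prices = start * daily.prod(axis=1)

print(round(final_prices.mean(), 2))              # average terminal price
print(round(np.percentile(final_prices, 5), 2),   # 5th percentile
      round(np.percentile(final_prices, 95), 2))  # 95th percentile
```

The average terminal price lands close to 200 × (1 + 0.09/250)^250 ≈ 218.8, while the percentile spread shows the range of outcomes an investor might plausibly face.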

**Observations and Graphs**

We see that when we run the simulation across **1000** trading days as opposed to **250**, the distribution of returns shows neither positive nor negative skew, since the extremities of the returns are identical at **+/-0.1**. In this regard, the higher our number of observations (trading days), the more closely our histogram comes to represent a normal distribution. With the lower number of trading days, by contrast, we see negative skew, in that the lowest return was **-0.02** whereas the highest return was **0.015**.
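The skew observation can be made precise with `scipy.stats.skew` (a sketch with illustrative sample sizes, not part of the original code): for normally distributed daily returns the true skewness is zero, and the sample skewness of small samples fluctuates far more widely around zero than that of large ones.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
mu, vol = 0.09, 0.1

# Sample skewness of simulated daily returns at increasing sample sizes
for T in (250, 1000, 100_000):
    daily = rng.normal(mu / T, vol / np.sqrt(T), T)
    print(T, round(skew(daily), 3))
```

At 250 observations the measured skew can easily stray well away from zero, while at 100,000 observations it stays very near it.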

**10% Volatility Over 250 Trading Days**

**10% Volatility Over 1000 Trading Days**

As we can see, our distribution takes on more of a normal bell shape as the number of trading days is increased from 250 to 1000.

**Random Walk at 10% Volatility**

**Random Walk at 30% Volatility**

When we increase the **vol** variable from 0.1 (10%) to 0.3 (30%), we see that while our random walk shows higher peaks and valleys, the overall return is lower. This demonstrates that, because returns compound, the higher the volatility, the lower the effective return.
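This effect is often called volatility drag, and a brief sketch (same 9% mean return, the two volatility levels above, and many simulated runs, with the run count chosen for illustration) makes it visible: larger swings lower the typical compounded outcome even though the average daily return is unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
T, mu = 1000, 0.09

# Same mean return, different volatility: compare typical compounded growth
for vol in (0.1, 0.3):
    daily = rng.normal(mu / T, vol / np.sqrt(T), size=(20_000, T)) + 1
    median_growth = np.median(daily.prod(axis=1))
    print(vol, round(median_growth, 3))
```

The median compounded growth tracks exp(mu − vol²/2): about 1.089 at 10% volatility but only about 1.046 at 30%, even though the arithmetic mean return is 9% in both cases.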