In this example, neural networks are used to forecast the energy consumption of the Dublin City Council Civic Offices from March 2011 to February 2013.

## Summary of Study

This analysis is divided into two parts:

- The **neuralnet** library in R is used to predict electricity consumption from various explanatory variables
- An **LSTM** network is generated using Keras to predict electricity consumption from the time series alone, exclusive of any explanatory variables

The relevant data was sourced from data.gov.ie and met.ie. Electricity consumption data was provided on an hourly basis, but converted to daily data for the purpose of this analysis.
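As a rough illustration of the hourly-to-daily conversion, the sketch below groups hourly readings by calendar day. The original does not state how the readings were aggregated; summing is assumed here, and the function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_to_daily(readings):
    """Sum (timestamp, kwh) hourly readings into one total per calendar day."""
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        totals[timestamp.date()] += kwh
    # Return the days in chronological order
    return dict(sorted(totals.items()))


# Hypothetical hourly readings: two hours on 1 March, one on 2 March
hourly = [
    (datetime(2011, 3, 1, 0), 150.0),
    (datetime(2011, 3, 1, 1), 160.5),
    (datetime(2011, 3, 2, 0), 149.5),
]
daily = hourly_to_daily(hourly)
for day, total in daily.items():
    print(day, total)
```

If a daily mean were wanted instead of a daily total, the per-day sum would be divided by the number of readings for that day.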

The variables are as follows:

- **eurgbp:** EUR/GBP currency rate
- **rain:** Rainfall
- **maxt:** Maximum temperature
- **mint:** Minimum temperature
- **wdsp:** Wind speed
- **sun:** Sunlight hours
- **kwh:** KWH (consumption)

With Ireland obtaining about 45% of its electricity from natural gas, 96% of which is imported from Scotland, EUR/GBP currency fluctuations have a significant impact on the cost of electricity in Ireland. The exchange rate was therefore included as an explanatory variable.

Weather conditions also significantly influence electricity usage, so weather data for the Dublin region was included for the same dates.

## Key Findings

It was found that of the two models, LSTM was able to predict electricity consumption more accurately, with the training and test predictions closely mirroring actual consumption:

The model showed a root mean squared error (RMSE) of **353.25** on the training dataset and **255.13** on the test dataset (against daily consumption in the thousands of kilowatt-hours).

# Part 1: neuralnet

A neural network consists of:

- **Input layers:** Layers that take inputs based on existing data
- **Hidden layers:** Layers whose weights are optimised through backpropagation in order to improve the predictive power of the model
- **Output layers:** Output of predictions based on the data from the input and hidden layers

### 1.1. Data Normalization

The data is normalized and split into training and test data:

    # MAX-MIN NORMALIZATION
    normalize <- function(x) {
      return((x - min(x)) / (max(x) - min(x)))
    }
    maxmindf <- as.data.frame(lapply(fullData, normalize))

    # TRAINING AND TEST DATA
    trainset <- maxmindf[1:378, ]
    testset <- maxmindf[379:472, ]
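For readers following along in Python, the same max-min scaling and its inverse can be sketched with NumPy. The function names and the sample consumption values are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def min_max_normalize(x):
    """Scale values to [0, 1]; also return the min and max needed to undo it."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo


# Hypothetical daily consumption values in kWh
kwh = np.array([4200.0, 4600.0, 5000.0, 4400.0])
scaled, lo, hi = min_max_normalize(kwh)
restored = min_max_invert(scaled, lo, hi)

print(scaled)
print(restored)
```

Keeping `lo` and `hi` around matters because predictions come out on the scaled range and have to be mapped back to kWh before they can be interpreted.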

### 1.2. Neural Network Output

The neural network is then run and the parameters are generated:

    # NEURAL NETWORK
    library(neuralnet)
    nn <- neuralnet(kwh ~ eurgbp + rain + maxt + mint + wdsp + sun,
                    data = trainset, hidden = c(5, 2),
                    linear.output = TRUE, threshold = 0.01)
    nn$result.matrix

Output:

                                          1
    error                    2.168927756297
    reached.threshold        0.008657878909
    steps                  994.000000000000
    Intercept.to.1layhid1   -0.943475389102
    eurgbp.to.1layhid1       1.221792852624
    rain.to.1layhid1         0.222508044224
    maxt.to.1layhid1         1.356892947349
    mint.to.1layhid1        -0.377284881968
    wdsp.to.1layhid1         0.749993672528
    sun.to.1layhid1         -0.250669884677
    Intercept.to.1layhid2    3.424295572041
    eurgbp.to.1layhid2      -4.921292790902
    rain.to.1layhid2         3.380551856044
    maxt.to.1layhid2        -2.353604121342
    mint.to.1layhid2         0.877423599705
    wdsp.to.1layhid2        -0.581900515451
    sun.to.1layhid2         -7.083263552687
    Intercept.to.1layhid3    0.352457802915
    eurgbp.to.1layhid3       3.715376984054
    rain.to.1layhid3        -1.030450129246
    maxt.to.1layhid3        -0.672907974572
    mint.to.1layhid3         0.898040603876
    wdsp.to.1layhid3        -1.474470972212
    sun.to.1layhid3         -1.793900522508
    Intercept.to.1layhid4    0.819225033685
    eurgbp.to.1layhid4     -16.770362105816
    rain.to.1layhid4        -2.483557437596
    maxt.to.1layhid4        -0.059472312293
    mint.to.1layhid4         2.650852686615
    wdsp.to.1layhid4         3.863732942893
    sun.to.1layhid4          0.224801123127
    Intercept.to.1layhid5  -13.987427433833
    eurgbp.to.1layhid5      -1.661519269508
    rain.to.1layhid5       -52.279711798215
    maxt.to.1layhid5        22.717540151979
    mint.to.1layhid5        11.670399514036
    wdsp.to.1layhid5         9.713301368020
    sun.to.1layhid5         10.804887927196
    Intercept.to.2layhid1   -0.834412474581
    1layhid.1.to.2layhid1    1.629948945316
    1layhid.2.to.2layhid1   -3.064448233097
    1layhid.3.to.2layhid1    0.197497636177
    1layhid.4.to.2layhid1   -0.370098281335
    1layhid.5.to.2layhid1   -0.402324278545
    Intercept.to.2layhid2   -1.176093680811
    1layhid.1.to.2layhid2    1.312897190062
    1layhid.2.to.2layhid2    0.593640022150
    1layhid.3.to.2layhid2    1.906008701982
    1layhid.4.to.2layhid2    1.811035017074
    1layhid.5.to.2layhid2   -0.725078284924
    Intercept.to.kwh        -0.093973916107
    2layhid.1.to.kwh         0.700847362516
    2layhid.2.to.kwh         0.922218125575
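To make the parameter names in the result matrix more concrete, the sketch below shows how a network of this shape (6 inputs, hidden layers of 5 and 2 units, 1 output) turns a row of inputs into a prediction. It assumes neuralnet's default logistic activation in the hidden layers and a linear output unit, matching `linear.output=TRUE`. The weights are random placeholders rather than the fitted values, and all function names are illustrative.

```python
import numpy as np


def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, layers):
    """Forward pass: logistic hidden layers followed by a linear output layer.

    x has shape (n_samples, n_inputs); layers is a list of (weights, intercepts).
    """
    activation = x
    for weights, intercepts in layers[:-1]:
        activation = logistic(activation @ weights + intercepts)
    out_weights, out_intercepts = layers[-1]
    return activation @ out_weights + out_intercepts


rng = np.random.default_rng(0)
layer_sizes = [6, 5, 2, 1]  # inputs, hidden layer 1, hidden layer 2, output

# Placeholder weights (random, NOT the fitted values from the result matrix)
layers = [
    (rng.normal(size=(n_in, n_out)), rng.normal(size=n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

# Three hypothetical rows of normalized inputs
x = rng.uniform(size=(3, 6))
pred = forward(x, layers)

# Weights plus intercepts across all layers
n_params = sum(w.size + b.size for w, b in layers)
print(pred.shape, n_params)
```

The parameter count works out to 50 (35 for the first hidden layer, 12 for the second, 3 for the output), which matches the number of `Intercept.to.*` and `*.to.*` rows in the result matrix.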

Here is what our neural network looks like in visual format:

### 1.3. Model Validation

We then validate the model (i.e. test its accuracy) by comparing the consumption estimated by the neural network with the actual consumption in the test set. Both columns are still on the normalized scale at this point:

    # Assumed step: generate test-set predictions with neuralnet's compute(),
    # passing only the explanatory variables
    temp_test <- subset(testset, select = c("eurgbp", "rain", "maxt", "mint", "wdsp", "sun"))
    nn.results <- compute(nn, temp_test)

    results <- data.frame(actual = testset$kwh, prediction = nn.results$net.result)
    results

Output:

              actual    prediction
    379 0.8394856269 0.72836479401
    380 0.7976933676 0.72836479401
    381 0.8125463657 0.72836479401
    382 0.8377382154 0.72836479401
    383 0.8394856269 0.72836479401
    384 0.8415242737 0.72836479401
    ..........
    467 0.7464359625 0.80778769677
    468 0.7018769682 0.82063018370
    469 0.7004207919 0.78094824279
    470 0.6726078249 0.77185373598
    471 0.7176036721 0.91671846789
    472 0.7199335541 0.80974222504

### 1.4. Accuracy

The code below converts the predicted and actual values back to their original scale (they were previously scaled using max-min normalization) and reports an accuracy of 98%. Note that this figure is computed as one minus the absolute value of the mean relative deviation, so over-predictions and under-predictions offset one another. It indicates that the forecasts are off by about 2% on average in net terms, not that each individual forecast is within 2% of actual consumption:

    # kwh is the original, unscaled consumption series
    predicted <- results$prediction * abs(diff(range(kwh))) + min(kwh)
    actual <- results$actual * abs(diff(range(kwh))) + min(kwh)
    deviation <- (actual - predicted) / actual
    comparison <- data.frame(predicted, actual, deviation)
    accuracy <- 1 - abs(mean(deviation))
    accuracy
    # [1] 0.9828191884
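Because the accuracy above is derived from the mean of the signed deviations, it can differ noticeably from an accuracy based on absolute deviations. The small sketch below contrasts the two on hypothetical values (not taken from the dataset) in which over- and under-predictions cancel out.

```python
import numpy as np

# Hypothetical actual and predicted consumption in kWh
actual = np.array([4000.0, 5000.0, 4000.0, 5000.0])
predicted = np.array([4400.0, 4500.0, 3600.0, 5500.0])

# Relative deviation per observation, as in the R code above
deviation = (actual - predicted) / actual

# Signed version: positive and negative deviations offset each other
signed_accuracy = 1 - abs(deviation.mean())

# Absolute version: every miss counts, regardless of direction
absolute_accuracy = 1 - np.abs(deviation).mean()

print(round(signed_accuracy, 4), round(absolute_accuracy, 4))
```

Here every forecast misses by 10%, yet the signed measure reports an accuracy of essentially 100% while the absolute measure reports about 90%. Reporting both gives a fuller picture of forecast quality.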

A mean accuracy of 98% is obtained using a (5,2) hidden configuration. However, note that since this is a mean accuracy, it does not necessarily imply that all predictions generated by the model will have such high accuracy. Indeed, accuracy is lower in certain cases as can be observed from the histogram below.

When we plot a histogram of the deviation (with 100 breaks), we see that the majority of forecasts fall within 10% of the actual consumption.

When plotting the predicted and actual consumption, we observe that while the prediction series generated by the neural network follows the general range of the actual series (i.e. between 4,200 and 5,000 kWh), the model is not particularly adept at predicting the peaks and valleys in the series (periods of abnormally high or low usage).

# Part 2: LSTM (Long Short-Term Memory Network)

A shortcoming of traditional feedforward neural network models is that they do not account for dependencies across time series data.

When a neural network was generated using neuralnet, it was assumed that all observations are independent of each other. However, this is not necessarily the case.

### 2.1. Issue of Stationarity

When observing line charts for both KWH (consumption) and the EUR/GBP, we can see that the KWH time series shows a stationary pattern (stationary meaning that the mean, variance, and autocorrelation are constant):

However, when the EUR/GBP currency fluctuations are plotted over the same time period, the data is clearly non-stationary, i.e. the mean, variance, and autocorrelation differ over time:
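One crude way to put numbers on what the charts show is to compare the mean and variance of the first and second halves of a series. The sketch below does this on two hypothetical series, one oscillating around a fixed level and one trending upward. It is only an informal check and not a formal stationarity test; the function name and data are illustrative.

```python
import numpy as np


def split_half_stats(series):
    """Return (mean, variance) for the first and the second half of a series."""
    series = np.asarray(series, dtype=float)
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())


# Hypothetical series: one fluctuating around a constant level, one trending
level = 10.0 + np.tile([1.0, -1.0], 50)
trend = np.linspace(0.0, 1.0, 100)

level_first, level_second = split_half_stats(level)
trend_first, trend_second = split_half_stats(trend)

print("level means:", level_first[0], level_second[0])
print("trend means:", trend_first[0], trend_second[0])
```

For the level series the two halves have the same mean and variance, whereas the mean of the trending series shifts clearly between halves, which is the pattern described for EUR/GBP.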

Given that non-stationarity was present in certain explanatory variables, the LSTM model will now be used to predict future values of KWH against the test set - independent of any other explanatory variables.

In other words, only the values of KWH will be predicted using LSTM. The analysis is carried out using the Keras library in Python. The following guide also provides a detailed overview of predictions with LSTM using a separate example.

### 2.2. Data Processing

Firstly, the relevant libraries are imported and data processing is carried out:

    # Import libraries
    import numpy as np
    import matplotlib.pyplot as plt
    from pandas import read_csv
    import math
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import LSTM
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.metrics import mean_squared_error
    import os

    path = "filepath"
    os.chdir(path)
    os.getcwd()

    # Form dataset matrix
    def create_dataset(dataset, previous=1):
        dataX, dataY = [], []
        for i in range(len(dataset) - previous - 1):
            a = dataset[i:(i + previous), 0]
            dataX.append(a)
            dataY.append(dataset[i + previous, 0])
        return np.array(dataX), np.array(dataY)

    # fix random seed for reproducibility
    np.random.seed(7)

    # load dataset
    dataframe = read_csv('data.csv', usecols=[0], engine='python', skipfooter=3)
    dataset = dataframe.values
    dataset = dataset.astype('float32')

    # normalize dataset with MinMaxScaler
    scaler = MinMaxScaler(feature_range=(0, 1))
    dataset = scaler.fit_transform(dataset)

    # Training and Test data partition
    train_size = int(len(dataset) * 0.8)
    test_size = len(dataset) - train_size
    train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

    # reshape into X=t and Y=t+1
    previous = 1
    X_train, Y_train = create_dataset(train, previous)
    X_test, Y_test = create_dataset(test, previous)

    # reshape input to be [samples, time steps, features]
    X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
    X_test = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))
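To see what `create_dataset` and the final reshape actually produce, it can help to run them on a tiny toy series. The sketch below reuses the function as defined above on six hypothetical values; only NumPy is needed.

```python
import numpy as np


def create_dataset(dataset, previous=1):
    """Build (X, Y) pairs where X holds `previous` lagged values and Y the next one."""
    dataX, dataY = [], []
    for i in range(len(dataset) - previous - 1):
        dataX.append(dataset[i:(i + previous), 0])
        dataY.append(dataset[i + previous, 0])
    return np.array(dataX), np.array(dataY)


# Toy single-column series: 0, 1, 2, 3, 4, 5
toy = np.arange(6, dtype="float32").reshape(-1, 1)

X, Y = create_dataset(toy, previous=1)

# Reshape to [samples, time steps, features], as expected by the LSTM layer
X3d = np.reshape(X, (X.shape[0], 1, X.shape[1]))

print(X.ravel())  # inputs at time t
print(Y)          # targets at time t+1
print(X3d.shape)
```

Each input is paired with the value that follows it. Note that with the loop bound `len(dataset) - previous - 1`, the final available pair (4 followed by 5 in this toy series) is not generated, so the function yields four samples rather than five.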

### 2.3. LSTM Generation and Predictions

Then, the LSTM model is generated and predictions are yielded:

    # Generate LSTM network
    model = Sequential()
    model.add(LSTM(4, input_shape=(1, previous)))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X_train, Y_train, epochs=100, batch_size=1, verbose=2)

    # Generate predictions
    trainpred = model.predict(X_train)
    testpred = model.predict(X_test)

    # Convert predictions back to normal values
    trainpred = scaler.inverse_transform(trainpred)
    Y_train = scaler.inverse_transform([Y_train])
    testpred = scaler.inverse_transform(testpred)
    Y_test = scaler.inverse_transform([Y_test])

    # calculate RMSE
    trainScore = math.sqrt(mean_squared_error(Y_train[0], trainpred[:, 0]))
    print('Train Score: %.2f RMSE' % (trainScore))
    testScore = math.sqrt(mean_squared_error(Y_test[0], testpred[:, 0]))
    print('Test Score: %.2f RMSE' % (testScore))

    # Train predictions, shifted to line up with the original series
    trainpredPlot = np.empty_like(dataset)
    trainpredPlot[:, :] = np.nan
    trainpredPlot[previous:len(trainpred) + previous, :] = trainpred

    # Test predictions, shifted to line up with the original series
    testpredPlot = np.empty_like(dataset)
    testpredPlot[:, :] = np.nan
    testpredPlot[len(trainpred) + (previous * 2) + 1:len(dataset) - 1, :] = testpred

    # Plot all predictions (separate names for the line handles so that
    # trainpred and testpred are not overwritten)
    actual_line, = plt.plot(scaler.inverse_transform(dataset))
    train_line, = plt.plot(trainpredPlot)
    test_line, = plt.plot(testpredPlot)
    plt.title("Predicted vs. Actual Consumption")
    plt.show()

The model is trained over **100** epochs, and the predictions are generated.

### 2.4. Accuracy

When plotting the actual consumption (blue line) with the training and test predictions (orange and green lines), the two series follow each other quite closely, with the exception of certain spikes downward (or periods of abnormally low usage):

Here is the output from the final training epochs, followed by the RMSE calculation:

    Epoch 94/100
     - 1s - loss: 0.0108
    Epoch 95/100
     - 1s - loss: 0.0108
    Epoch 96/100
     - 1s - loss: 0.0107
    Epoch 97/100
     - 1s - loss: 0.0108
    Epoch 98/100
     - 1s - loss: 0.0108
    Epoch 99/100
     - 1s - loss: 0.0108
    Epoch 100/100
     - 1s - loss: 0.0109
    >>> # calculate RMSE
    ... trainScore = math.sqrt(mean_squared_error(Y_train[0], trainpred[:,0]))
    >>> print('Train Score: %.2f RMSE' % (trainScore))
    Train Score: 353.25 RMSE
    >>> testScore = math.sqrt(mean_squared_error(Y_test[0], testpred[:,0]))
    >>> print('Test Score: %.2f RMSE' % (testScore))
    Test Score: 255.13 RMSE

The model has a root mean squared error (RMSE) of **353.25** on the training dataset and **255.13** on the test dataset (against daily consumption in the thousands of kilowatt-hours).
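Since RMSE is not a plain average of the errors, a small worked example may help with interpreting these scores. The sketch below computes RMSE by hand on hypothetical values (not from the dataset) and expresses it relative to mean consumption; the function name is illustrative.

```python
import math

import numpy as np


def rmse(actual, predicted):
    """Root mean squared error: square the errors, average them, take the root."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return math.sqrt(np.mean((actual - predicted) ** 2))


# Hypothetical daily consumption and forecasts in kWh
actual = np.array([4500.0, 4700.0, 4300.0, 4600.0])
predicted = np.array([4400.0, 4900.0, 4300.0, 4500.0])

score = rmse(actual, predicted)

# RMSE relative to the average level of the series
relative = score / actual.mean()

print('RMSE: %.2f' % score)
print('Relative to mean consumption: %.3f' % relative)
```

Because the errors are squared before averaging, the single 200 kWh miss contributes more to the score than the two 100 kWh misses combined. Dividing by mean consumption gives a sense of scale, which is the comparison being made when the scores are read against consumption in the thousands.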

# Conclusion

Of the two neural networks, LSTM proved to be more accurate at predicting fluctuations in electricity consumption.

In the case of neuralnet, the model was not completely adept at handling non-stationary data present in various explanatory variables.

Moreover, factors such as temperature already follow set historical trends generally (with the exception of abnormal weather patterns which might have an effect on consumption).

In this regard, a traditional neural network with explanatory variables proved less effective in this instance than LSTM, which was able to model fluctuations in consumption without the need for explanatory data.
