**Keras** is a high-level API for building and training neural networks. It runs on top of TensorFlow and was developed at Google.

The main competitor to Keras at this point in time is PyTorch, developed by Facebook. While PyTorch enjoys somewhat broader community support, it is a notably more verbose library, and I personally prefer Keras for its simplicity and ease of use when building and deploying models.

In this particular example, a neural network will be built in Keras to solve a regression problem, i.e. one where our dependent variable (y) is continuous and we are trying to predict its value as accurately as possible.

## What Is A Neural Network?

A neural network is a computational system that creates predictions based on existing data. In this tutorial, we will train and test one using Keras in Python.

A neural network consists of:

- **Input layers:** layers that take inputs based on existing data
- **Hidden layers:** layers that use backpropagation to optimise the weights of the input variables in order to improve the predictive power of the model
- **Output layers:** output of predictions based on the data from the input and hidden layers
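A minimal sketch of these three layer types as a single forward pass in plain NumPy (the weights here are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(42)

# Input layer: five features (e.g. age, gender, miles, debt, income)
x = rng.random((1, 5))

# Randomly initialised placeholder weights -- training would tune these
W_hidden = rng.random((5, 8))   # input layer -> hidden layer (8 neurons)
W_output = rng.random((8, 1))   # hidden layer -> output layer (1 neuron)

def relu(z):
    return np.maximum(0, z)

hidden = relu(x @ W_hidden)     # hidden layer with ReLU activation
prediction = hidden @ W_output  # linear output layer for regression

print(prediction.shape)         # one predicted value per input row
```

During training, backpropagation would adjust `W_hidden` and `W_output` so that `prediction` moves closer to the observed target.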

## Our Example

For this example, we use a **linear activation function** within the **keras** library to create a regression-based neural network. We will use the cars dataset. Essentially, we are trying to predict the value of a potential car sale (i.e. how much a particular person will spend on buying a car) for a customer based on the following attributes:

- Age
- Gender
- Average miles driven per day
- Personal debt
- Monthly income

Firstly, we import our libraries. Note that you will need TensorFlow installed on your system to be able to execute the code below. A tutorial on installing TensorFlow on Windows 10 is available on my YouTube channel.

## Libraries

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense
from tensorflow.python.keras.wrappers.scikit_learn import KerasRegressor
```

## Set Directory

```python
import os

path = "C:/yourdirectory"
os.chdir(path)
os.getcwd()
```

Since we are implementing a neural network, the variables need to be normalized in order for the neural network to interpret them properly. Therefore, our variables are transformed using the **MinMaxScaler()**:

```python
# Variables
dataset = np.loadtxt("cars.csv", delimiter=",")
x = dataset[:, 0:5]
y = dataset[:, 5]
y = np.reshape(y, (-1, 1))

# Fit a separate scaler for the features and for the target --
# re-fitting a single scaler on y would overwrite its fit on x
scaler_x = MinMaxScaler()
scaler = MinMaxScaler()
print(scaler_x.fit(x))
print(scaler.fit(y))
xscale = scaler_x.transform(x)
yscale = scaler.transform(y)
```

The data is then split into training and test data:

```python
X_train, X_test, y_train, y_test = train_test_split(xscale, yscale)
```
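When `test_size` is not specified, as above, `train_test_split` holds out 25% of the rows for the test set. A quick sketch on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)   # 20 rows of dummy features
y = np.arange(20)                  # 20 dummy targets

# Default test_size is 0.25, so 15 rows train / 5 rows test
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)
```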

## Keras Model Configuration: Neural Network API

Now, we define the neural network. We use the five **input variables** (age, gender, miles, debt, and income), along with **two hidden layers** of **12** and **8** neurons respectively, and finally a **linear activation function** to produce the output.

```python
model = Sequential()
model.add(Dense(12, input_dim=5, kernel_initializer='normal', activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='linear'))
model.summary()
```

The **mean_squared_error (mse)** is our loss function, and the **mean_absolute_error (mae)** is tracked as an additional metric – i.e. estimates of how accurately the neural network predicts the data. With validation_split set to 0.2, 80% of the training data is used to train the model, while the remaining 20% is held out for validation.

```python
model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae'])
```

From the output, we can see that the more epochs are run, the lower our **MSE** and **MAE** become, indicating improvement in accuracy across each iteration of our model.

## Neural Network Output

Let’s now fit our model.

```python
history = model.fit(X_train, y_train, epochs=150, batch_size=50, verbose=1,
                    validation_split=0.2)
```

```
Train on 577 samples, validate on 145 samples
Epoch 1/150
577/577 [==============================] - 1s 1ms/step - loss: 0.1522 - mean_squared_error: 0.1522 - mean_absolute_error: 0.3003 - val_loss: 0.1368 - val_mean_squared_error: 0.1368 - val_mean_absolute_error: 0.2714
Epoch 2/150
577/577 [==============================] - 0s 56us/step - loss: 0.1153 - mean_squared_error: 0.1153 - mean_absolute_error: 0.2524 - val_loss: 0.1027 - val_mean_squared_error: 0.1027 - val_mean_absolute_error: 0.2341
Epoch 3/150
577/577 [==============================] - 0s 56us/step - loss: 0.0843 - mean_squared_error: 0.0843 - mean_absolute_error: 0.2183 - val_loss: 0.0770 - val_mean_squared_error: 0.0770 - val_mean_absolute_error: 0.2095
...
Epoch 149/150
577/577 [==============================] - 0s 64us/step - loss: 0.0153 - mean_squared_error: 0.0153 - mean_absolute_error: 0.0897 - val_loss: 0.0161 - val_mean_squared_error: 0.0161 - val_mean_absolute_error: 0.0935
Epoch 150/150
577/577 [==============================] - 0s 59us/step - loss: 0.0153 - mean_squared_error: 0.0153 - mean_absolute_error: 0.0901 - val_loss: 0.0162 - val_mean_squared_error: 0.0162 - val_mean_absolute_error: 0.0934
```

Here, we can see that Keras calculates both the **training loss** and **validation loss**, i.e. the deviation between the predicted y and the actual y as measured by the mean squared error.

As you can see, we have specified 150 epochs for our model. This means that we are essentially training our model over 150 **forward** and **backward** passes, with the expectation that our loss will decrease with each epoch, meaning that our model is predicting the value of y more accurately as we continue to train the model.
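The intuition that loss falls with each forward and backward pass can be illustrated with a toy gradient-descent loop in plain NumPy (a simple linear model on synthetic data, not our Keras network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on five features, plus a little noise
X = rng.random((100, 5))
true_w = np.array([[1.0], [2.0], [0.5], [1.5], [0.2]])
y = X @ true_w + 0.01 * rng.standard_normal((100, 1))

w = np.zeros((5, 1))      # model weights, initialised at zero
lr = 0.1                  # learning rate
losses = []

for epoch in range(150):
    pred = X @ w                       # forward pass
    error = pred - y
    losses.append(np.mean(error ** 2)) # MSE loss for this epoch
    grad = 2 * X.T @ error / len(X)    # backward pass: gradient of MSE
    w -= lr * grad                     # weight update

print(losses[0], losses[-1])           # loss shrinks across the epochs
```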

Let’s see what this looks like when we plot our respective losses:

```python
print(history.history.keys())
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
```

Both the training and validation loss decrease roughly exponentially as the number of epochs increases, suggesting that the model gains accuracy with each additional forward and backward pass.

## Predictions

So, we’ve seen how we can train a neural network model and then validate it against our test data in order to determine its accuracy.

However, what if we now wish to use the model to estimate unseen data?

Let’s take the following array as an example:

```python
Xnew = np.array([[40, 0, 26, 9000, 8000]])
```

Using this data, let’s plug in the new values to see what our calculated figure for car sales would be. As in the previous example, the input values are scaled with MinMaxScaler before prediction, and the predicted output is then converted back to the original scale.
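A sketch of those scaling steps, using small illustrative data in place of the cars dataset and a `fake_predict` function standing in for `model.predict` (both are assumptions for demonstration, so the numbers differ from the real output below):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative min/max rows standing in for the cars dataset
x = np.array([[20, 0, 10, 1000, 2000],
              [60, 1, 40, 20000, 12000]], dtype=float)
y = np.array([[5000.0], [30000.0]])

scaler_x = MinMaxScaler().fit(x)   # scaler for the five input features
scaler_y = MinMaxScaler().fit(y)   # scaler for the target (car sales)

# New, unseen customer
Xnew = np.array([[40, 0, 26, 9000, 8000]])
xscalenew = scaler_x.transform(Xnew)     # inputs scaled to the [0, 1] range

def fake_predict(x_scaled):
    # Stand-in for model.predict(x_scaled): returns a scaled prediction
    return np.array([[0.4066297]])

yscalenew = fake_predict(xscalenew)
yoriginal = scaler_y.inverse_transform(yscalenew)  # back to original units
print(yoriginal)
```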

```
>>> xscalenew
array([[-0.01563243, -0.01699178, -0.0161082 ,  0.28886019,  0.25487664]])
>>> yscalenew
array([[0.4066297]], dtype=float32)
>>> yoriginal = scaler.inverse_transform(yscalenew)
>>> yoriginal
array([[12465.485]], dtype=float32)
```

The prediction is as follows:

```
>>> print("X=%s, Predicted=%s" % (Xnew[0], yoriginal[0]))
X=[ 40 0 26 9000 8000], Predicted=[12465.485]
```

## Conclusion

In this tutorial, you have learned how to:

- Construct neural networks with Keras
- Scale data appropriately with MinMaxScaler
- Calculate training and test losses
- Make predictions using the neural network model

Many thanks for your time, and please feel free to leave any questions you have in the comments below.


The data is partitioned into training and test sets to gauge how well the model might perform on unseen data. That said, if we wish to further guard against the possibility of overfitting, k-fold cross-validation could be a solution here.
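As a sketch of that idea, the following runs 5-fold cross-validation with scikit-learn's KFold on synthetic data, using a linear model as a lightweight stand-in for the Keras network (a `KerasRegressor` wrapper could be scored the same way):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic stand-in for the scaled cars data: 5 features, 1 target
X = rng.random((100, 5))
y = X @ np.array([1.0, 2.0, 0.5, 1.5, 0.2]) + 0.01 * rng.standard_normal(100)

# 5 folds: each fold serves once as the held-out validation set
kfold = KFold(n_splits=5, shuffle=True, random_state=7)
model = LinearRegression()  # lightweight stand-in for the Keras model

scores = cross_val_score(model, X, y, cv=kfold,
                         scoring='neg_mean_squared_error')
print(scores)          # one negative MSE per fold
print(-scores.mean())  # average MSE across the folds
```

Averaging the score across folds gives a more stable estimate of generalisation error than a single train/test split.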