Keras: Regression-based neural networks

Keras is a high-level API for building and training neural networks. It runs on top of TensorFlow, which was developed by Google.

The main competitor to Keras at this point in time is PyTorch, developed by Facebook. While PyTorch has a somewhat higher level of community support, it is a comparatively verbose framework, and I personally prefer Keras for its greater simplicity and ease of use in building and deploying models.

In this example, a neural network is built in Keras to solve a regression problem, i.e. one where our dependent variable (y) is numeric (in interval format) and we are trying to predict the value of y as accurately as possible.

What Is A Neural Network?

A neural network is a computational system that creates predictions based on existing data.

A neural network consists of:

  • Input layers: Layers that take inputs based on existing data
  • Hidden layers: Layers whose weights are optimised through backpropagation during training in order to improve the predictive power of the model
  • Output layers: Output of predictions based on the data from the input and hidden layers
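
To make the three layer types concrete, here is a minimal NumPy sketch of a single forward pass through a toy network. The layer sizes, the random weights, and the forward function are illustrative assumptions and not part of the Keras model built later; no training (backpropagation) takes place here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: 5 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(size=(5, 4))   # input-to-hidden weights
b1 = np.zeros(4)               # hidden biases
W2 = rng.normal(size=(4, 1))   # hidden-to-output weights
b2 = np.zeros(1)               # output bias

def forward(x):
    """Run one forward pass for a batch of rows with 5 features each."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # hidden layer with ReLU activation
    return hidden @ W2 + b2                # linear output, suited to regression

x = rng.uniform(size=(3, 5))   # three made-up input rows
y_hat = forward(x)
print(y_hat.shape)
```

Training would then adjust W1, b1, W2 and b2 so that the outputs move closer to the observed values of y.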


Our Example

For this example, we use a linear activation function within the keras library to create a regression-based neural network. We will use the cars dataset. Essentially, we are trying to predict the value of a potential car sale (i.e. how much a particular person will spend on buying a car) for a customer based on the following attributes:

  • Age
  • Gender
  • Average miles driven per day
  • Personal debt
  • Monthly income

Firstly, we import our libraries. Note that you will need TensorFlow installed on your system to be able to execute the code below. If you are on Windows 10, one of my YouTube tutorials covers how to install it.

Libraries

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
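# Import from the public tensorflow.keras namespace rather than the private tensorflow.python path
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# KerasRegressor is not used below, and recent TensorFlow releases no longer ship the scikit_learn wrappers, so that import is dropped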

Set Directory

import os
path="C:/yourdirectory"
os.chdir(path)
os.getcwd()

Since we are implementing a neural network, the variables need to be normalized so that features measured on very different scales (e.g. age and monthly income) contribute comparably during training. Therefore, our variables are transformed using MinMaxScaler():

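#Variables
dataset=np.loadtxt("cars.csv", delimiter=",")
x=dataset[:,0:5]
y=dataset[:,5]
y=np.reshape(y, (-1,1))
# Use one scaler per array: refitting a single scaler on y would overwrite its fit on x
scaler_x = MinMaxScaler()
scaler_y = MinMaxScaler()
print(scaler_x.fit(x))
print(scaler_y.fit(y))
xscale=scaler_x.transform(x)
yscale=scaler_y.transform(y)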
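
As a rough illustration of what the scaling does, the sketch below applies the min-max formula by hand to a small made-up array. The values are invented for the example; with its default settings, MinMaxScaler performs the same per-column calculation.

```python
import numpy as np

# Made-up values for two columns: age and monthly income.
x = np.array([[25.0, 2000.0],
              [40.0, 8000.0],
              [55.0, 5000.0]])

col_min = x.min(axis=0)
col_max = x.max(axis=0)

# Min-max scaling: (value - min) / (max - min), computed per column.
x_scaled = (x - col_min) / (col_max - col_min)
print(x_scaled)

# Inverting the scaling recovers the original units.
x_restored = x_scaled * (col_max - col_min) + col_min
```

Every column now lies between 0 and 1, and the inverse step is what allows a scaled prediction to be converted back into the original units.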

The data is then split into training and test data:

X_train, X_test, y_train, y_test = train_test_split(xscale, yscale)
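
When no size is given, train_test_split shuffles the rows and holds out 25% of them as test data. The sketch below mimics that behaviour in plain NumPy on a toy array; shuffle_split is a hypothetical helper written for illustration, not scikit-learn's implementation.

```python
import numpy as np

def shuffle_split(x, y, test_fraction=0.25, seed=0):
    """Shuffle the row indices, then hold out test_fraction of the rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(np.ceil(test_fraction * len(x)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

x_toy = np.arange(16).reshape(8, 2)   # 8 made-up rows with 2 features
y_toy = np.arange(8)

x_tr, x_te, y_tr, y_te = shuffle_split(x_toy, y_toy)
print(x_tr.shape, x_te.shape)
```

Because the split is random, passing a fixed random_state to train_test_split is a common way to make the results reproducible between runs.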

Keras Model Configuration: Neural Network API

Now, we train the neural network. We are using the five input variables (age, gender, miles, debt, and income), along with two hidden layers of 12 and 8 neurons respectively, and finally using the linear activation function to process the output.

model = Sequential()
model.add(Dense(12, input_dim=5, kernel_initializer='normal', activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='linear'))
model.summary()
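
The parameter counts that model.summary() reports can be checked by hand: each Dense layer has one weight per input-unit pair plus one bias per unit. The helper name dense_params below is an assumption made for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [5, 12, 8, 1]   # inputs, hidden layer 1, hidden layer 2, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)   # [72, 104, 9] 185
```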

We compile the model with the mean squared error (mse) as the loss function, i.e. the quantity that the optimizer minimises, and track both mse and the mean absolute error (mae) as metrics of how far the predictions are from the actual values. When the model is fitted with validation_split set to 0.2, 80% of the training data is used to train the model, while the remaining 20% is held out for validation.

model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])
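
To see what a validation_split of 0.2 means in practice, the sketch below reproduces the split arithmetic. The row count of 722 is taken from the training log in this tutorial (577 + 145), and the rounding rule is my assumption about how Keras computes the cut-off; Keras takes the validation rows from the end of the data, before any shuffling.

```python
import numpy as np

n_train = 722             # 577 + 145, the counts reported in the training log
validation_split = 0.2

# Assumed rule: the first int(n * (1 - split)) rows train, the rest validate.
split_at = int(n_train * (1 - validation_split))
n_val = n_train - split_at
print(split_at, n_val)

rows = np.arange(n_train)   # stand-in for the rows of X_train
fit_rows, val_rows = rows[:split_at], rows[split_at:]
```

Note that this validation set is carved out of X_train; the X_test array from train_test_split is not touched during fitting.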

As the training output in the next section shows, the more epochs are run, the lower our MSE and MAE become, indicating that the accuracy of the model improves as training proceeds.

Neural Network Output

Let’s now fit our model.

history = model.fit(X_train, y_train, epochs=150, batch_size=50,  verbose=1, validation_split=0.2)
Train on 577 samples, validate on 145 samples
Epoch 1/150
577/577 [==============================] - 1s 1ms/step - loss: 0.1522 - mean_squared_error: 0.1522 - mean_absolute_error: 0.3003 - val_loss: 0.1368 - val_mean_squared_error: 0.1368 - val_mean_absolute_error: 0.2714
Epoch 2/150
577/577 [==============================] - 0s 56us/step - loss: 0.1153 - mean_squared_error: 0.1153 - mean_absolute_error: 0.2524 - val_loss: 0.1027 - val_mean_squared_error: 0.1027 - val_mean_absolute_error: 0.2341
Epoch 3/150
577/577 [==============================] - 0s 56us/step - loss: 0.0843 - mean_squared_error: 0.0843 - mean_absolute_error: 0.2183 - val_loss: 0.0770 - val_mean_squared_error: 0.0770 - val_mean_absolute_error: 0.2095
Epoch 4/150
577/577 [==============================] - 0s 57us/step - loss: 0.0626 - mean_squared_error: 0.0626 - mean_absolute_error: 0.1952 - val_loss: 0.0583 - val_mean_squared_error: 0.0583 - val_mean_absolute_error: 0.1935
Epoch 5/150
577/577 [==============================] - 0s 58us/step - loss: 0.0475 - mean_squared_error: 0.0475 - mean_absolute_error: 0.1774 - val_loss: 0.0454 - val_mean_squared_error: 0.0454 - val_mean_absolute_error: 0.1798
...
Epoch 145/150
577/577 [==============================] - 0s 64us/step - loss: 0.0154 - mean_squared_error: 0.0154 - mean_absolute_error: 0.0902 - val_loss: 0.0161 - val_mean_squared_error: 0.0161 - val_mean_absolute_error: 0.0932
Epoch 146/150
577/577 [==============================] - 0s 61us/step - loss: 0.0154 - mean_squared_error: 0.0154 - mean_absolute_error: 0.0903 - val_loss: 0.0162 - val_mean_squared_error: 0.0162 - val_mean_absolute_error: 0.0936
Epoch 147/150
577/577 [==============================] - 0s 61us/step - loss: 0.0155 - mean_squared_error: 0.0155 - mean_absolute_error: 0.0913 - val_loss: 0.0161 - val_mean_squared_error: 0.0161 - val_mean_absolute_error: 0.0939
Epoch 148/150
577/577 [==============================] - 0s 63us/step - loss: 0.0153 - mean_squared_error: 0.0153 - mean_absolute_error: 0.0900 - val_loss: 0.0164 - val_mean_squared_error: 0.0164 - val_mean_absolute_error: 0.0934
Epoch 149/150
577/577 [==============================] - 0s 64us/step - loss: 0.0153 - mean_squared_error: 0.0153 - mean_absolute_error: 0.0897 - val_loss: 0.0161 - val_mean_squared_error: 0.0161 - val_mean_absolute_error: 0.0935
Epoch 150/150
577/577 [==============================] - 0s 59us/step - loss: 0.0153 - mean_squared_error: 0.0153 - mean_absolute_error: 0.0901 - val_loss: 0.0162 - val_mean_squared_error: 0.0162 - val_mean_absolute_error: 0.0934

Here, we can see that Keras is calculating both the training loss and the validation loss, i.e. the deviation between the predicted y and the actual y as measured by the mean squared error.
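
For reference, both error measures are straightforward to compute by hand. The arrays below are made-up values on the scaled 0 to 1 range, used purely to illustrate the two formulas.

```python
import numpy as np

y_true = np.array([0.2, 0.5, 0.9])   # made-up scaled targets
y_pred = np.array([0.1, 0.5, 0.7])   # made-up model outputs

errors = y_pred - y_true
mse = np.mean(errors ** 2)       # mean squared error: penalises large misses more
mae = np.mean(np.abs(errors))    # mean absolute error: average size of a miss
print(mse, mae)
```

Because the targets were scaled before training, the losses in the log are expressed in scaled units rather than in the original units of y.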

As you can see, we have specified 150 epochs for our model. This means that the full training set is passed through the network 150 times, with forward and backward passes run on batches of 50 samples at a time. We expect the loss to decrease, though not necessarily at every single epoch, as training continues, meaning that the model predicts the value of y more accurately.
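
The relationship between epochs, batch size and weight updates can be worked out with a little arithmetic. This sketch uses the 577 training rows reported in the log together with the batch_size and epochs passed to model.fit.

```python
import math

n_samples = 577   # training rows reported in the log
batch_size = 50
epochs = 150

# Each batch triggers one forward and backward pass; the last batch holds the 27 leftover rows.
updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)   # 12 1800
```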

Let’s see what this looks like when we plot our respective losses:

print(history.history.keys())
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

[Figure: Keras training and validation loss by epoch]

Both the training and validation loss fall steeply over the first few epochs and then level off, in a roughly exponential fashion. This suggests that the model becomes more accurate as the number of epochs increases, with diminishing gains in the later epochs.

Predictions

So, we’ve seen how we can train a neural network model and track its loss on held-out validation data in order to gauge the accuracy of our model.

However, what if we now wish to use the model to estimate unseen data?

Let’s take the following array as an example:

Xnew = np.array([[40, 0, 26, 9000, 8000]])

Using this data, let’s plug in the new values to see what our calculated figure for car sales would be:

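Xnew = np.array([[40, 0, 26, 9000, 8000]])
# The model was trained on scaled data: scale the input, then unscale the prediction
scaler_x, scaler_y = MinMaxScaler().fit(x), MinMaxScaler().fit(y)
ynew = scaler_y.inverse_transform(model.predict(scaler_x.transform(Xnew)))
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))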

The prediction is as follows:

X=[  40    0   26 9000 8000], Predicted=[2653.6077]

Conclusion

In this tutorial, you have learned how to:

  • Construct neural networks with Keras
  • Scale data appropriately with MinMaxScaler
  • Calculate training and validation losses
  • Make predictions using the neural network model

Many thanks for your time, and please feel free to leave any questions you have in the comments below.
