Keras with R: Predicting car sales


Keras is a high-level neural networks API. In this example, the model runs on top of TensorFlow, which was developed by Google.

In this particular example, a neural network will be built in Keras to solve a regression problem, i.e. one where our dependent variable (y) is continuous and we are trying to predict its value as accurately as possible.

Previously, I demonstrated how such a neural network can be devised in Python. However, Keras also has an R interface, so it is possible to conduct this analysis through R as well.

What Is A Neural Network?

A neural network is a computational system that creates predictions based on existing data. In this tutorial, we train and test such a network using the keras library in R.

A neural network consists of:

  • Input layers: Layers that take inputs based on existing data
  • Hidden layers: Layers that use backpropagation to optimise the weights of the input variables in order to improve the predictive power of the model
  • Output layers: Output of predictions based on the data from the input and hidden layers
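To make the three layer types concrete, here is a minimal base-R sketch of a single forward pass through a tiny network. The weights, biases, and the relu helper are invented purely for illustration; in a real network, backpropagation would adjust these weights during training:

```r
# Forward pass through a toy network: 2 inputs -> 3 hidden neurons -> 1 output
relu <- function(x) pmax(x, 0)

x <- c(0.5, -1.2)                       # input layer: two features
W1 <- matrix(c(0.1, -0.3,
               0.4,  0.2,
              -0.5,  0.6), nrow = 3, byrow = TRUE)  # hidden-layer weights
b1 <- c(0.0, 0.1, -0.1)                 # hidden-layer biases

hidden <- relu(W1 %*% x + b1)           # hidden-layer activations
w2 <- c(0.7, -0.2, 0.3)                 # output-layer weights
prediction <- sum(w2 * hidden)          # linear output neuron
```

This is exactly the computation a dense layer performs: a matrix product, a bias, and an activation function, repeated layer by layer.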


Our Example

For this example, we use the keras library to build a regression-based neural network with a linear activation function at the output. We will use the cars dataset. Essentially, we are trying to predict the value of a potential car sale (i.e. how much a particular person will spend on buying a car) for a customer based on the following attributes:

  • Age
  • Gender
  • Average miles driven per day
  • Personal debt
  • Monthly income

Firstly, we import our libraries and set the directory.

Libraries and Set Directory

library(keras)

setwd("/home/michaeljgrogan/Documents/a_documents/computing/data science/datasets")

cars<-read.csv("cars.csv")

Since we are implementing a neural network, the variables need to be normalized in order for the neural network to interpret them properly. Therefore, our variables are transformed using max-min normalization:

#Max-Min Normalization
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

maxmindf <- as.data.frame(lapply(cars, normalize))
attach(maxmindf)
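To see what this transformation does, here is the same normalize function applied to a toy vector (the values are made up, not taken from the cars data). Max-min normalization maps any numeric vector onto the [0, 1] interval:

```r
# Max-min normalization, as defined above
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

incomes <- c(20000, 35000, 50000, 80000)  # hypothetical values
normalize(incomes)
# the smallest value maps to 0, the largest to 1
```

Because every column ends up on the same [0, 1] scale, no single variable dominates the weight updates simply by having larger raw magnitudes.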

The data is then split into training and test data:

# Random sample indexes
train_index <- sample(1:nrow(maxmindf), 0.8 * nrow(maxmindf))
test_index <- setdiff(1:nrow(maxmindf), train_index)

# Build X_train, y_train, X_test, y_test
X_train <- as.matrix(maxmindf[train_index, -15])
y_train <- as.matrix(maxmindf[train_index, "sales"])

X_test <- as.matrix(maxmindf[test_index, -15])
y_test <- as.matrix(maxmindf[test_index, "sales"])
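The sample/setdiff pattern above can be illustrated on a toy data frame (the data and seed here are arbitrary, chosen only so the sketch is reproducible). 80% of the row indices are drawn at random for training, and setdiff guarantees the remaining 20% never overlap with them:

```r
set.seed(42)
df <- data.frame(x = 1:10, y = rnorm(10))

# Random sample indexes, as in the split above
train_index <- sample(1:nrow(df), 0.8 * nrow(df))
test_index  <- setdiff(1:nrow(df), train_index)

length(train_index)                 # 8 rows for training
length(test_index)                  # 2 rows held out
intersect(train_index, test_index)  # integer(0): no overlap
```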

Keras Model Configuration: Neural Network API

Now, we train the neural network, using the five input variables (age, gender, miles, debt, and income), two hidden layers of 12 and 8 neurons respectively, and finally a linear activation function to process the output.

model <- keras_model_sequential() 

model %>% 
  layer_dense(units = 12, activation = 'relu', kernel_initializer='RandomNormal', input_shape = c(6)) %>% 
  layer_dense(units = 8, activation = 'relu') %>%
  layer_dense(units = 1, activation = 'linear')

summary(model)

The mean_squared_error (mse) and mean_absolute_error (mae) are our loss function and metric – i.e. estimates of how far the neural network's predictions deviate from the actual values. With validation_split set to 0.2, 80% of the training data is used to train the model, while the remaining 20% is held out for validation.

model %>% compile(
  loss = 'mean_squared_error',
  optimizer = 'adam',
  metrics = c('mae')
)
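To be explicit about what the loss and metric above actually measure, here they are computed by hand on toy vectors (the values are made up for illustration):

```r
actual    <- c(0.2, 0.5, 0.9)
predicted <- c(0.25, 0.45, 0.8)

mse <- mean((actual - predicted)^2)   # loss = 'mean_squared_error'
mae <- mean(abs(actual - predicted))  # metric = 'mae'
```

The mse penalises large errors more heavily because the deviations are squared, while the mae reports the average error in the same units as y, which makes it easier to interpret.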

Neural Network Output

Let's now fit our model.

history <- model %>% fit(
  X_train, y_train, 
  epochs = 150, batch_size = 50, 
  validation_split = 0.2
)

Here is the output.

Train on 616 samples, validate on 154 samples
Epoch 1/150
2018-12-15 20:37:30.173521: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
616/616 [==============================] - 1s 2ms/step - loss: 0.1673 - mean_absolute_error: 0.3181 - val_loss: 0.1276 - val_mean_absolute_error: 0.2660
Epoch 2/150
616/616 [==============================] - 0s 82us/step - loss: 0.1014 - mean_absolute_error: 0.2360 - val_loss: 0.0786 - val_mean_absolute_error: 0.2119
Epoch 3/150
616/616 [==============================] - 0s 100us/step - loss: 0.0607 - mean_absolute_error: 0.1902 - val_loss: 0.0486 - val_mean_absolute_error: 0.1812
Epoch 4/150
616/616 [==============================] - 0s 94us/step - loss: 0.0396 - mean_absolute_error: 0.1698 - val_loss: 0.0350 - val_mean_absolute_error: 0.1653
Epoch 5/150
616/616 [==============================] - 0s 101us/step - loss: 0.0306 - mean_absolute_error: 0.1546 - val_loss: 0.0286 - val_mean_absolute_error: 0.1516
Epoch 6/150
616/616 [==============================] - 0s 82us/step - loss: 0.0250 - mean_absolute_error: 0.1402 - val_loss: 0.0231 - val_mean_absolute_error: 0.1355
......
Epoch 145/150
616/616 [==============================] - 0s 91us/step - loss: 2.5544e-07 - mean_absolute_error: 1.8456e-04 - val_loss: 5.7741e-06 - val_mean_absolute_error: 5.6603e-04
Epoch 146/150
616/616 [==============================] - 0s 101us/step - loss: 2.3567e-07 - mean_absolute_error: 1.6128e-04 - val_loss: 5.7590e-06 - val_mean_absolute_error: 5.4604e-04
Epoch 147/150
616/616 [==============================] - 0s 83us/step - loss: 2.2469e-07 - mean_absolute_error: 1.2962e-04 - val_loss: 5.7538e-06 - val_mean_absolute_error: 5.2605e-04
Epoch 148/150
616/616 [==============================] - 0s 94us/step - loss: 2.1616e-07 - mean_absolute_error: 1.2723e-04 - val_loss: 5.6361e-06 - val_mean_absolute_error: 5.6658e-04
Epoch 149/150
616/616 [==============================] - 0s 106us/step - loss: 2.1198e-07 - mean_absolute_error: 1.5137e-04 - val_loss: 5.5534e-06 - val_mean_absolute_error: 5.4657e-04
Epoch 150/150
616/616 [==============================] - 0s 80us/step - loss: 2.1450e-07 - mean_absolute_error: 1.8006e-04 - val_loss: 5.6074e-06 - val_mean_absolute_error: 5.6903e-04

Now, we can evaluate the loss (mean squared error) as well as mean absolute error.

> model %>% evaluate(X_test, y_test)

193/193 [==============================] - 0s 23us/step
$loss
[1] 9.17802e-07

$mean_absolute_error
[1] 0.0002379419

Here is the configuration of our model:

> model
Model
__________________________________________________________________
Layer (type)                 Output Shape               Param #   
==================================================================
dense_1 (Dense)              (None, 12)                 84        
__________________________________________________________________
dense_2 (Dense)              (None, 8)                  104       
__________________________________________________________________
dense_3 (Dense)              (None, 1)                  9         
==================================================================
Total params: 197
Trainable params: 197
Non-trainable params: 0
__________________________________________________________________

Here, we can see that Keras calculates both the training loss and the validation loss, i.e. the deviation between the predicted y and the actual y as measured by the mean squared error.

As you can see, we have specified 150 epochs for our model. This means that we are essentially training our model over 150 forward and backward passes, with the expectation that our loss will decrease with each epoch, meaning that our model is predicting the value of y more accurately as we continue to train the model.

Let's see what this looks like when we plot our respective losses:

[Plot: training and validation loss decreasing over the 150 epochs]

Both the training and validation loss decrease roughly exponentially as the number of epochs increases, suggesting that the model gains a high degree of accuracy as the epochs (or number of forward and backward passes) are increased.

Performance

So, we've seen how we can train a neural network model and then evaluate it against our test data in order to determine the accuracy of our model.

Unlike when working with classification data, a simple accuracy reading does not particularly help us in this instance.

The predicted car sales may be within 1% of the actual figure, but unless the predicted value is exact, it will never be counted as "accurate".

Therefore, we should instead measure performance as the degree of deviation between the predicted and actual values.

Let's take a look at how we can calculate this:

> pred <- data.frame(y = predict(model, as.matrix(X_test)))
> df<-data.frame(pred,X_test)
> attach(df)
The following objects are masked from maxmindf:

    age, debt, gender, income, miles, sales

> deviation=((pred-sales)/sales)
> mean(deviation$y)*100
[1] 0.02819805

We see that the mean deviation between predicted and actual car sales stands at just 0.028%.
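One caveat worth noting: the figure above is a signed average, so positive and negative deviations can cancel each other out. A minimal base-R sketch (with made-up numbers) contrasts the signed mean deviation with the stricter mean absolute percentage deviation:

```r
actual    <- c(100, 200, 400)
predicted <- c(110, 180, 400)

# Signed mean deviation: over- and under-predictions offset each other
signed_dev <- mean((predicted - actual) / actual) * 100

# Mean absolute percentage deviation: every error counts
abs_dev <- mean(abs(predicted - actual) / actual) * 100
```

Here the signed deviation is 0% even though two of the three predictions are 10% off, while the absolute version reports the average error honestly. Checking both gives a fuller picture of the model's performance.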

Conclusion

In this tutorial, you have learned how to:

  • Construct neural networks with Keras
  • Scale data appropriately with max-min normalization
  • Calculate training and test losses
  • Make predictions using the neural network model

Many thanks for your time, and please feel free to leave any questions you have in the comments below.
