**Keras** is a high-level API for building and training neural networks. It runs on top of TensorFlow, the machine learning framework developed by Google.

In this particular example, a neural network will be built in Keras to solve a regression problem, i.e. one where our dependent variable (y) is continuous and we are trying to predict its value with as much accuracy as possible.

Previously, I demonstrated how such a neural network can be built in Python. However, Keras also has an R interface, so the same analysis can be conducted through R.

## What Is A Neural Network?

A neural network is a computational system that creates predictions based on existing data. Let us train and test a neural network using the **keras** library in R.

A neural network consists of:

- **Input layer:** takes inputs based on existing data
- **Hidden layers:** use backpropagation to optimise the weights of the input variables in order to improve the predictive power of the model
- **Output layer:** outputs predictions based on the data from the input and hidden layers

## Our Example

For this example, we use a **linear activation function** within the **keras** library to create a regression-based neural network. We will use the cars dataset. Essentially, we are trying to predict the value of a potential car sale (i.e. how much a particular person will spend on buying a car) for a customer based on the following attributes:

- Age
- Gender
- Average miles driven per day
- Personal debt
- Monthly income

Firstly, we import our libraries and set the directory.

## Libraries and Set Directory

```r
library(keras)
setwd("/home/michaeljgrogan/Documents/a_documents/computing/data science/datasets")
cars <- read.csv("cars.csv")
```

Since we are implementing a neural network, the variables need to be normalized in order for the neural network to interpret them properly. Therefore, our variables are transformed using **max-min normalization**:

```r
# Max-Min Normalization
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(cars, normalize))
attach(maxmindf)
```
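As a quick sanity check, the normalize function maps any numeric vector linearly onto the [0, 1] range: the minimum maps to 0 and the maximum to 1. A small illustrative example (not part of the original script):

```r
# Max-min normalization maps a vector linearly onto [0, 1]
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

# The minimum maps to 0, the maximum to 1, and values in between scale linearly
normalize(c(10, 20, 40))
# [1] 0.0000000 0.3333333 1.0000000
```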

The data is then split into training and test data:

```r
# Random sample indexes
train_index <- sample(1:nrow(maxmindf), 0.8 * nrow(maxmindf))
test_index <- setdiff(1:nrow(maxmindf), train_index)

# Build X_train, y_train, X_test, y_test
X_train <- as.matrix(maxmindf[train_index, -15])
y_train <- as.matrix(maxmindf[train_index, "sales"])
X_test <- as.matrix(maxmindf[test_index, -15])
y_test <- as.matrix(maxmindf[test_index, "sales"])
```

## Keras Model Configuration: Neural Network API

Now, we train the neural network. We are using the five **input variables** (age, gender, miles, debt, and income), along with **two hidden layers** of **12** and **8** neurons respectively, and finally using the **linear activation function** to process the output.

```r
model <- keras_model_sequential()
model %>%
  layer_dense(units = 12, activation = 'relu',
              kernel_initializer = 'RandomNormal', input_shape = c(6)) %>%
  layer_dense(units = 8, activation = 'relu') %>%
  layer_dense(units = 1, activation = 'linear')
summary(model)
```

The **mean squared error (mse)** is our loss function, and the **mean absolute error (mae)** is tracked as a metric – both estimate how far the neural network's predictions deviate from the actual values. With validation_split set to 0.2, 80% of the training data is used to train the model, while the remaining 20% is held out for validation.

```r
model %>% compile(
  loss = 'mean_squared_error',
  optimizer = 'adam',
  metrics = c('mae')
)
```

## Neural Network Output

Let's now fit our model.

```r
history <- model %>% fit(
  X_train, y_train,
  epochs = 150,
  batch_size = 50,
  validation_split = 0.2
)
```

Here is the output.

```
Train on 616 samples, validate on 154 samples
Epoch 1/150
2018-12-15 20:37:30.173521: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
616/616 [==============================] - 1s 2ms/step - loss: 0.1673 - mean_absolute_error: 0.3181 - val_loss: 0.1276 - val_mean_absolute_error: 0.2660
Epoch 2/150
616/616 [==============================] - 0s 82us/step - loss: 0.1014 - mean_absolute_error: 0.2360 - val_loss: 0.0786 - val_mean_absolute_error: 0.2119
Epoch 3/150
616/616 [==============================] - 0s 100us/step - loss: 0.0607 - mean_absolute_error: 0.1902 - val_loss: 0.0486 - val_mean_absolute_error: 0.1812
Epoch 4/150
616/616 [==============================] - 0s 94us/step - loss: 0.0396 - mean_absolute_error: 0.1698 - val_loss: 0.0350 - val_mean_absolute_error: 0.1653
Epoch 5/150
616/616 [==============================] - 0s 101us/step - loss: 0.0306 - mean_absolute_error: 0.1546 - val_loss: 0.0286 - val_mean_absolute_error: 0.1516
Epoch 6/150
616/616 [==============================] - 0s 82us/step - loss: 0.0250 - mean_absolute_error: 0.1402 - val_loss: 0.0231 - val_mean_absolute_error: 0.1355
...
Epoch 145/150
616/616 [==============================] - 0s 91us/step - loss: 2.5544e-07 - mean_absolute_error: 1.8456e-04 - val_loss: 5.7741e-06 - val_mean_absolute_error: 5.6603e-04
Epoch 146/150
616/616 [==============================] - 0s 101us/step - loss: 2.3567e-07 - mean_absolute_error: 1.6128e-04 - val_loss: 5.7590e-06 - val_mean_absolute_error: 5.4604e-04
Epoch 147/150
616/616 [==============================] - 0s 83us/step - loss: 2.2469e-07 - mean_absolute_error: 1.2962e-04 - val_loss: 5.7538e-06 - val_mean_absolute_error: 5.2605e-04
Epoch 148/150
616/616 [==============================] - 0s 94us/step - loss: 2.1616e-07 - mean_absolute_error: 1.2723e-04 - val_loss: 5.6361e-06 - val_mean_absolute_error: 5.6658e-04
Epoch 149/150
616/616 [==============================] - 0s 106us/step - loss: 2.1198e-07 - mean_absolute_error: 1.5137e-04 - val_loss: 5.5534e-06 - val_mean_absolute_error: 5.4657e-04
Epoch 150/150
616/616 [==============================] - 0s 80us/step - loss: 2.1450e-07 - mean_absolute_error: 1.8006e-04 - val_loss: 5.6074e-06 - val_mean_absolute_error: 5.6903e-04
```

Now, we can evaluate the loss (mean squared error) as well as the mean absolute error on the test set.

```
> model %>% evaluate(X_test, y_test)
193/193 [==============================] - 0s 23us/step
$loss
[1] 9.17802e-07

$mean_absolute_error
[1] 0.0002379419
```

Here is the configuration of our model:

```
> model
Model
__________________________________________________________________
Layer (type)                  Output Shape              Param #
==================================================================
dense_1 (Dense)               (None, 12)                84
__________________________________________________________________
dense_2 (Dense)               (None, 8)                 104
__________________________________________________________________
dense_3 (Dense)               (None, 1)                 9
==================================================================
Total params: 197
Trainable params: 197
Non-trainable params: 0
__________________________________________________________________
```

Here, we can see that keras is calculating both the **training loss** and **validation loss**, i.e. the deviation between the predicted y and actual y as measured by the mean squared error.

As you can see, we have specified 150 epochs for our model. This means the model makes 150 full **forward** and **backward** passes over the training data, with the expectation that the loss will decrease with each epoch, meaning that the model predicts the value of y more accurately as training continues.

Let's see what this looks like when we plot our respective losses:
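With the keras R package, the history object returned by fit can be plotted directly. Assuming `history` holds the fit results from the training step above, a minimal sketch is:

```r
# Plots training and validation loss (and any tracked metrics) per epoch
plot(history)
```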

Both the training and validation loss decrease roughly exponentially as the number of epochs increases, suggesting that the model gains a high degree of accuracy as the number of forward and backward passes grows.

## Performance

So, we've seen how we can train a neural network model, and then validate our training data against our test data in order to determine the accuracy of our model.

Unlike when working with classification data, an accuracy reading does not particularly help us in this instance.

The predicted car sales may be within 1% of the actual figure, but unless a prediction matches the actual value exactly, a classification-style accuracy metric will not count it as correct.

Therefore, we should redefine accuracy as the degree of deviation between the predicted and actual values.

Let's take a look at how we can calculate this:

```
> pred <- data.frame(y = predict(model, as.matrix(X_test)))
> df <- data.frame(pred, X_test)
> attach(df)
The following objects are masked from maxmindf:

    age, debt, gender, income, miles, sales

> deviation <- ((pred - sales) / sales)
> mean(deviation$y) * 100
[1] 0.02819805
```

We see that the mean deviation between predicted and actual car sales stands at approximately 0.028%.

## Conclusion

In this tutorial, you have learned how to:

- Construct neural networks with Keras
- Scale data appropriately with max-min normalization
- Calculate training and test losses
- Make predictions using the neural network model

Many thanks for your time, and please feel free to leave any questions you have in the comments below.
