neuralnet: Train and Test Neural Networks Using R


A neural network is a computational system frequently employed in machine learning to make predictions based on existing data. In this example, we will use the neuralnet package in R to train and test a neural network model.

A typical neural network consists of:

  • Input layers: Layers that take inputs based on existing data
  • Hidden layers: Layers that use backpropagation to optimize the weights of the input variables in order to improve the predictive power of the model
  • Output layers: Output of predictions based on the data from the input and hidden layers

Model Background

The neural network below is based on Milica Stojković’s example of using such a model to classify wines by winery based on thirteen attributes (the actual dataset can be found here):

  1. Alcohol
  2. Malic Acid
  3. Ash
  4. Ash Alcalinity
  5. Magnesium
  6. Total Phenols
  7. Flavanoids
  8. Nonflavanoid Phenols
  9. Proanthocyanins
  10. Color Intensity
  11. Hue
  12. OD280/OD315 of diluted wines
  13. Proline

Data Normalization

One of the most important procedures when forming a neural network is normalization of the data, i.e. adjusting the data to a common scale so that the predicted values of the neural network can be accurately compared with the actual data. Failure to normalize the data will typically result in the prediction value remaining the same across all observations regardless of the input.

We can choose to do this in two ways in R:

  • Scale the data frame automatically using the scale function in R
  • Transform the data using what is called a max-min normalization technique

For the purposes of this tutorial, both are shown, but we use the max-min normalization technique.

#Scaled Normalization
scaleddata<-scale(mydata)

#Max-Min Normalization
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

maxmindf <- as.data.frame(lapply(mydata, normalize))

Please see this link for further details on how to use the normalization function.

Our new dataset is now normalized and saved into a data frame titled maxmindf:

maxmindf
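As a quick sanity check (an illustrative sketch, not part of the original tutorial), the normalize function defined above maps any numeric vector onto the [0, 1] interval:

```r
# Max-min normalization as defined earlier: rescales a vector to [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

v <- c(2, 5, 11)
normalize(v)  # -> 0, 1/3, 1: the minimum maps to 0, the maximum to 1
```

After applying this column by column with lapply, every column of maxmindf spans [0, 1].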

Our training data (trainset) is based on the first 142 observations, and the test data (testset) is based on the remainder of observations.

#Training and Test Data
trainset <- maxmindf[1:142, ]
testset <- maxmindf[143:178, ]

A training dataset typically accounts for around 80% of the observations, with the test dataset covering the remaining 20%; our 142/36 split follows that convention. It is also important to ensure that the sample is representative across the two sets.
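As an aside (a sketch with a stand-in data frame, not from the original tutorial), the same 142/36 split could also be drawn at random rather than by row order, which helps keep the two sets representative if the rows happen to be ordered in some systematic way:

```r
# Stand-in for maxmindf: 178 rows of dummy normalized data (illustrative only)
set.seed(42)  # for reproducibility
maxmindf <- data.frame(winery = runif(178), alcohol = runif(178))

# Random 80/20 split: sample row indices for training, keep the rest for testing
idx <- sample(seq_len(nrow(maxmindf)), size = floor(0.8 * nrow(maxmindf)))
trainset <- maxmindf[idx, ]
testset <- maxmindf[-idx, ]
```

With floor(0.8 * 178) = 142, this reproduces the 142/36 sizes used above.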

Training a Neural Network Model using neuralnet

To set up the neural network, the neuralnet library is loaded into R and from the code below, observe that we are:

  • Using neuralnet to "regress" the dependent winery variable against the other independent variables
  • Setting hidden=c(2,1), i.e. two hidden layers with 2 and 1 neurons respectively
  • Setting linear.output to FALSE, since the impact of the independent variables on the dependent variable (winery) is assumed to be non-linear

Note that the process of deciding on the number of hidden layers in a neural network is not an exact science. As a matter of fact, there are instances where accuracy will likely be higher without any hidden layers. Therefore, trial and error plays a significant role in choosing the number of hidden layers.

One possibility is to compare how the accuracy of the predictions changes as the hidden-layer configuration is modified, giving a more systematic means of choosing it.

However, I found that a (2,1) configuration ultimately yielded 72% classification accuracy, which seems reasonable given only 142 observations in the training data, so I went with this.
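The trial-and-error search described above can be sketched as a small loop (this assumes the neuralnet package and the trainset/testset data frames built earlier; the list of candidate configurations is my own choice, not from the original):

```r
library(neuralnet)

# Candidate hidden-layer configurations to compare (illustrative choices)
configs <- list(c(1), c(2), c(2, 1), c(3, 2))

# Build the model formula from the column names, leaving out the response
predictors <- setdiff(names(trainset), "winery")
f <- as.formula(paste("winery ~", paste(predictors, collapse = " + ")))

for (h in configs) {
  nn <- neuralnet(f, data = trainset, hidden = h,
                  linear.output = FALSE, threshold = 0.01)
  # Predict on the test set and round to the nearest class value
  pred <- round(compute(nn, testset[, predictors])$net.result)
  acc <- mean(pred == round(testset$winery))
  cat("hidden =", paste(h, collapse = ","), " accuracy =", round(acc, 2), "\n")
}
```

Note that neuralnet's weights are randomly initialized, so accuracies will vary between runs unless a seed is set.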

#Neural Network
library(neuralnet)
nn <- neuralnet(winery ~ alcohol + malic + ash + ash_alcalinity + magnesium +
                  phenols + flavanoids + nonflavanoids + proanthocyanins +
                  color_intensity + hue + od280 + proline,
                data = trainset, hidden = c(2, 1),
                linear.output = FALSE, threshold = 0.01)
nn$result.matrix
plot(nn)

Our neural network looks like this:

[Plot of the trained network produced by plot(nn)]

Moreover, the error of the neural network model as well as the weights between the inputs, hidden layers, and outputs are also printed:

> nn$result.matrix
                                            1
error                          0.052103509186
reached.threshold              0.007918868535
steps                       1410.000000000000
Intercept.to.1layhid1         -1.083867861177
alcohol.to.1layhid1            0.058782998726
malic.to.1layhid1             -1.389912517542
ash.to.1layhid1               -0.551614831886
ash_alcalinity.to.1layhid1    -2.118158540493
magnesium.to.1layhid1         -0.648224033756
phenols.to.1layhid1            0.718874765455
flavanoids.to.1layhid1         6.122425327758
nonflavanoids.to.1layhid1      0.373874042298
proanthocyanins.to.1layhid1    8.155770325494
color_intensity.to.1layhid1   -5.466829108812
hue.to.1layhid1                4.482418679851
od280.to.1layhid1              2.784020635719
proline.to.1layhid1           -4.811269297043
Intercept.to.1layhid2          0.205810529136
alcohol.to.1layhid2           -4.411136753329
malic.to.1layhid2             -1.689521058295
ash.to.1layhid2               -3.857094752264
ash_alcalinity.to.1layhid2     4.268766155835
magnesium.to.1layhid2         -1.217728932841
phenols.to.1layhid2            1.769719578190
flavanoids.to.1layhid2         1.020673251351
nonflavanoids.to.1layhid2      0.280279450037
proanthocyanins.to.1layhid2    7.536122244890
color_intensity.to.1layhid2   -2.958468172972
hue.to.1layhid2                5.646866707125
od280.to.1layhid2              1.698191285380
proline.to.1layhid2          -10.388711874839
Intercept.to.2layhid1          1.570686622007
1layhid.1.to.2layhid1         -4.366569866818
1layhid.2.to.2layhid1          2.574428685720
Intercept.to.winery           -5.406306976366
2layhid.1.to.winery           12.252214456773

Testing The Accuracy Of The Model

As already mentioned, our neural network has been trained on the training data, and its predictions are then compared against the test data to gauge the accuracy of the forecast. In the code below:

  1. The "subset" function is used to eliminate the dependent variable from the test data
  2. The "compute" function then creates the prediction variable
  3. A "results" variable then compares the predicted data with the actual data
  4. A confusion matrix is then created with the table function to compare the number of true/false positives and negatives

#Test the resulting output
temp_test <- subset(testset,
                    select = c("alcohol", "malic", "ash", "ash_alcalinity",
                               "magnesium", "phenols", "flavanoids",
                               "nonflavanoids", "proanthocyanins",
                               "color_intensity", "hue", "od280", "proline"))
head(temp_test)
nn.results <- compute(nn, temp_test)

#Accuracy
results <- data.frame(actual = testset$winery, prediction = nn.results$net.result)
results
roundedresults <- sapply(results, round, digits = 0)
roundedresultsdf <- data.frame(roundedresults)
attach(roundedresultsdf)
table(actual, prediction)

The predicted results are compared to the actual results:

> results
    actual     prediction
143    0.5 0.543043044188
144    1.0 0.989204918591
145    1.0 0.991078885094
146    0.5 0.509822224181
147    0.5 0.522207564576
148    0.0 0.009985006935
149    1.0 0.991632930913
150    0.5 0.509800468823
151    0.0 0.012648538383
152    0.0 0.015505457199
153    0.0 0.009193526072
154    0.5 0.942799042540
155    0.5 0.515707638536
156    0.0 0.011294167416
157    0.5 0.074497852675
158    0.0 0.060706499836
159    0.0 0.010810687248
160    1.0 0.992823880777
161    1.0 0.990914522360
162    0.0 0.009788727793
163    1.0 0.987136899504
164    0.5 0.511240323832
165    0.5 0.511415717606
166    1.0 0.991329204858
167    0.0 0.040412959404
168    1.0 0.986658393972
169    0.0 0.089556958227
170    0.5 0.506644026733
171    0.5 0.201560122367
172    1.0 0.988175398882
173    0.5 0.502025119248
174    1.0 0.988233917400
175    0.0 0.010478332645
176    0.0 0.010808559391
177    0.0 0.014391386521
178    0.5 0.384616587455

Confusion Matrix

Then, a confusion matrix is created to compare the number of true/false positives and negatives:

> table(actual,prediction)
      prediction
actual  0  1
     0 16 10
     1  0 10

We see that the number of true negatives is 16 (44% of the total) and the number of true positives is 10 (28%), while there are 10 false positives (28%) and no false negatives.

Ultimately, the confusion matrix indicates that our neural network model has roughly 72% accuracy.
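The 72% figure can be recovered directly from the confusion matrix: the correct predictions sit on the diagonal.

```r
# Rebuild the confusion matrix from the table above (matrix() fills column-wise)
cm <- matrix(c(16, 0, 10, 10), nrow = 2,
             dimnames = list(actual = c("0", "1"), prediction = c("0", "1")))

# Overall accuracy = correct predictions / all predictions
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 2)  # -> 0.72, i.e. 26 of 36 test observations classified correctly
```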