neuralnet: Train and Test Neural Networks Using R

A neural network is a computational system, frequently employed in machine learning, that makes predictions based on existing data. In this example, we will use the neuralnet package in R to train and test a neural network model.

A typical neural network consists of:

  • Input layers: Layers that take inputs based on existing data
  • Hidden layers: Layers that use backpropagation to optimize the weights of the input variables in order to improve the predictive power of the model
  • Output layers: Layers that output predictions based on the data from the input and hidden layers
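
To make these layers concrete, the sketch below runs one forward pass by hand in base R: two inputs, two hidden neurons, one output. The weights, biases, and sigmoid activation here are invented purely for illustration and are not taken from any trained model.

```r
# One forward pass: input layer -> hidden layer -> output layer
# (all weights and biases below are made up for illustration)
sigmoid <- function(z) 1 / (1 + exp(-z))

x  <- c(0.2, 0.7)                           # input layer: two normalized features
W1 <- matrix(c(0.5, -0.3, 0.8, 0.1), 2, 2)  # input-to-hidden weights
b1 <- c(0.1, -0.1)                          # hidden-layer biases
w2 <- c(1.2, -0.7)                          # hidden-to-output weights
b2 <- 0.05                                  # output bias

hidden <- sigmoid(as.vector(W1 %*% x) + b1) # hidden-layer activations
output <- sigmoid(sum(w2 * hidden) + b2)    # a single prediction in (0, 1)
output
```

Backpropagation, mentioned in the hidden-layer bullet above, is the procedure that adjusts W1, b1, w2, and b2 to reduce prediction error; packages such as neuralnet do this automatically.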

Model Background

The neural network below is based on Milica Stojković’s example of using such a model to classify wineries by thirteen attributes (the actual dataset can be found here):

  1. Alcohol
  2. Malic Acid
  3. Ash
  4. Ash Alcalinity
  5. Magnesium
  6. Total Phenols
  7. Flavanoids
  8. Nonflavanoid Phenols
  9. Proanthocyanins
  10. Color Intensity
  11. Hue
  12. OD280/OD315 of diluted wines
  13. Proline

[Image: preview of the winery dataset]

Data Normalization

One of the most important procedures when forming a neural network is normalization of the data, i.e. adjusting the data to a common scale so that the predicted values of the neural network can be accurately compared to the actual data. Failure to normalize will typically result in the prediction remaining the same across all observations, regardless of the input.

We can choose to do this in two ways in R:

  • Scale the data frame automatically using the scale function in R
  • Transform the data using what is called a max-min normalization technique

For the purposes of this tutorial, both are shown below, but we use the max-min normalization technique.

#Scaled Normalization (using R's built-in scale function)
scaleddata <- scale(df)

#Max-Min Normalization: rescale each column of df to the [0, 1] range,
#assuming df holds the winery label followed by the thirteen attributes in order
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
maxmindf <- as.data.frame(lapply(df, normalize))
colnames(maxmindf) <- c("wineryscaled", "alcoholscaled", "malicscaled", "ashscaled",
                        "ash_alcalinity_scaled", "magnesiumscaled", "phenolsscaled",
                        "flavanoidsscaled", "nonflavanoidsscaled", "proanthocyaninsscaled",
                        "color_intensity_scaled", "huescaled", "od280scaled", "prolinescaled")

Our new dataset is now scaled and saved into a data frame titled maxmindf:

[Image: the scaled maxmindf data frame]

Our training data (trainset) is based on the first 107 observations, and the test data (testset) on the remaining 71.

#Training and Test Data
trainset <- maxmindf[1:107, ]
testset <- maxmindf[108:178, ]

While normally a training dataset would account for around 80% of the observations and the test dataset for the remaining 20%, it is also important to ensure a representative sample across the two; given the number of observations, we use a roughly 60%/40% split here.
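
Because a sequential split depends entirely on how the rows happen to be ordered, a random split is one way to improve representativeness. Below is a minimal base-R sketch; the maxmindf here is a stand-in frame with 178 rows, to be replaced by the real normalized data from above:

```r
# Random 60/40 split using sample(); the stand-in frame below mimics
# the 178-row maxmindf from the tutorial
maxmindf <- data.frame(dummy = seq_len(178))   # stand-in: use the real maxmindf
set.seed(123)                                  # make the split reproducible
train_idx <- sample(nrow(maxmindf), size = round(0.6 * nrow(maxmindf)))
trainset  <- maxmindf[train_idx, , drop = FALSE]
testset   <- maxmindf[-train_idx, , drop = FALSE]
c(nrow(trainset), nrow(testset))               # 107 and 71
```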

Training a Neural Network Model using neuralnet

To set up the neural network, the neuralnet library is loaded into R. From the code below, observe that we are:

  • Using neuralnet to "regress" the dependent wineryscaled variable against the other, independent variables
  • Setting the hidden argument to c(5,5), i.e. two hidden layers of five neurons each
  • Setting linear.output to FALSE, since the impact of the independent variables on the dependent variable (wineryscaled) is assumed to be non-linear

Note that deciding on the number of hidden layers and neurons in a neural network is not an exact science. As a matter of fact, there are instances where accuracy will likely be higher without any hidden layers. Therefore, trial and error plays a significant role.

One possibility is to compare how the accuracy of the predictions changes as the number of hidden neurons and layers is modified, giving a more systematic means of choosing the configuration.
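
That comparison can be sketched as a short loop. The example below is illustrative only: it assumes the neuralnet package is installed, uses a small synthetic dataset rather than the wine data, and the configurations tried are arbitrary choices, not recommendations.

```r
# Compare test accuracy as the hidden= argument varies (illustrative sketch
# on synthetic data; swap in the wine trainset/testset in practice)
library(neuralnet)

set.seed(1)
toy <- data.frame(x1 = runif(100), x2 = runif(100))
toy$y <- as.numeric(toy$x1 + toy$x2 > 1)       # a simple learnable rule
train <- toy[1:80, ]
test  <- toy[81:100, ]

for (h in list(1, 2, c(2, 2))) {
  nn   <- neuralnet(y ~ x1 + x2, data = train, hidden = h, linear.output = FALSE)
  pred <- compute(nn, test[, c("x1", "x2")])$net.result
  cat("hidden =", paste(h, collapse = ","),
      "| accuracy =", mean(round(pred) == test$y), "\n")
}
```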

However, while Stojković's example used 17 hidden neurons, I found that a c(5,5) configuration yielded an error below 0.01, and decided to go with this.

#Neural Network
library(neuralnet)
nn <- neuralnet(wineryscaled ~ alcoholscaled + malicscaled + ashscaled +
                  ash_alcalinity_scaled + magnesiumscaled + phenolsscaled +
                  flavanoidsscaled + nonflavanoidsscaled + proanthocyaninsscaled +
                  color_intensity_scaled + huescaled + od280scaled + prolinescaled,
                data = trainset, hidden = c(5, 5), linear.output = FALSE,
                threshold = 0.01)
nn$result.matrix
plot(nn)

Our neural network looks like this:

[Image: plot of the fitted network, as produced by plot(nn)]

Moreover, the error of the neural network model as well as the weights between the inputs, hidden layers, and outputs are also printed:

error                                0.007827069795
reached.threshold                    0.009032751558
steps                              321.000000000000
Intercept.to.1layhid1                0.855507900713
alcoholscaled.to.1layhid1           -2.924011442434
malicscaled.to.1layhid1             -2.456272117846
ashscaled.to.1layhid1                0.834721753266
ash_alcalinity_scaled.to.1layhid1   -1.072472322512
magnesiumscaled.to.1layhid1         -0.946694341396
phenolsscaled.to.1layhid1            1.251378884853
flavanoidsscaled.to.1layhid1         6.834714038272
nonflavanoidsscaled.to.1layhid1      0.704768409962
proanthocyaninsscaled.to.1layhid1    3.665757237050
color_intensity_scaled.to.1layhid1  -7.232860653667
huescaled.to.1layhid1                4.018310663933
od280scaled.to.1layhid1              3.486109224197
prolinescaled.to.1layhid1           -4.366854104402
Intercept.to.1layhid2               -1.681766540233
alcoholscaled.to.1layhid2            3.913840975037
malicscaled.to.1layhid2              0.112007947516
ashscaled.to.1layhid2                0.618303630467
ash_alcalinity_scaled.to.1layhid2   -1.665864518869
magnesiumscaled.to.1layhid2         -1.982693239026
phenolsscaled.to.1layhid2            0.126328036123
flavanoidsscaled.to.1layhid2         1.248614378799
nonflavanoidsscaled.to.1layhid2     -0.267467474199
proanthocyaninsscaled.to.1layhid2   -1.343657346762
color_intensity_scaled.to.1layhid2  -0.463319991584
huescaled.to.1layhid2               -0.548163317427
od280scaled.to.1layhid2              0.551611375591
prolinescaled.to.1layhid2            5.324459877786
Intercept.to.1layhid3                2.208863517296
alcoholscaled.to.1layhid3           -0.720143273061
malicscaled.to.1layhid3             -1.967933150514
ashscaled.to.1layhid3               -3.531014608836
ash_alcalinity_scaled.to.1layhid3    2.703634300206
magnesiumscaled.to.1layhid3          0.992866861345
phenolsscaled.to.1layhid3            0.956858389042
flavanoidsscaled.to.1layhid3        -0.719316956501
nonflavanoidsscaled.to.1layhid3     -0.635816394852
proanthocyaninsscaled.to.1layhid3   -0.461502069611
color_intensity_scaled.to.1layhid3   2.166753492882
huescaled.to.1layhid3                0.913466873946
od280scaled.to.1layhid3              0.058860401488
prolinescaled.to.1layhid3           -5.319551669783
Intercept.to.1layhid4               -1.213064388741
alcoholscaled.to.1layhid4            1.567122456780
malicscaled.to.1layhid4              0.751809428075
ashscaled.to.1layhid4                1.133967364478
ash_alcalinity_scaled.to.1layhid4    0.152279237762
magnesiumscaled.to.1layhid4          0.459611380070
phenolsscaled.to.1layhid4           -0.702348806997
flavanoidsscaled.to.1layhid4        -6.853581486782
nonflavanoidsscaled.to.1layhid4      1.779723001861
proanthocyaninsscaled.to.1layhid4   -4.045133834106
color_intensity_scaled.to.1layhid4   4.902332838409
huescaled.to.1layhid4               -1.565135661148
od280scaled.to.1layhid4             -4.739510461797
prolinescaled.to.1layhid4            3.511647158522
Intercept.to.1layhid5                1.131558148467
alcoholscaled.to.1layhid5           -1.452270698674
malicscaled.to.1layhid5             -0.238199016867
ashscaled.to.1layhid5               -1.725333651656
ash_alcalinity_scaled.to.1layhid5    1.295090061355
magnesiumscaled.to.1layhid5          0.183749935015
phenolsscaled.to.1layhid5           -0.419790511394
flavanoidsscaled.to.1layhid5        -1.371340744649
nonflavanoidsscaled.to.1layhid5      1.102733327858
proanthocyaninsscaled.to.1layhid5    0.482319699528
color_intensity_scaled.to.1layhid5  -0.088195693847
huescaled.to.1layhid5                2.559005229722
od280scaled.to.1layhid5              0.572775315754
prolinescaled.to.1layhid5           -6.392195275180
Intercept.to.2layhid1               -1.073879847141
1layhid.1.to.2layhid1               -6.789053673566
1layhid.2.to.2layhid1               -2.794900401128
1layhid.3.to.2layhid1                0.930777226781
1layhid.4.to.2layhid1                3.450121402952
1layhid.5.to.2layhid1               -1.015985936436
Intercept.to.2layhid2               -0.848267836120
1layhid.1.to.2layhid2                1.157301526297
1layhid.2.to.2layhid2              -16.740473027952
1layhid.3.to.2layhid2               17.805998593591
1layhid.4.to.2layhid2                8.381030088800
1layhid.5.to.2layhid2                9.573064863548
Intercept.to.2layhid3               -1.757086115702
1layhid.1.to.2layhid3               -2.159234217360
1layhid.2.to.2layhid3               -0.447968138719
1layhid.3.to.2layhid3                1.721093259652
1layhid.4.to.2layhid3               -3.264801202778
1layhid.5.to.2layhid3                0.367383579259
Intercept.to.2layhid4                0.937424701172
1layhid.1.to.2layhid4                4.113137737700
1layhid.2.to.2layhid4                5.285945682483
1layhid.3.to.2layhid4                0.432029491231
1layhid.4.to.2layhid4               -0.193692899116
1layhid.5.to.2layhid4                2.102303208197
Intercept.to.2layhid5               -0.179191327029
1layhid.1.to.2layhid5               -7.822137595075
1layhid.2.to.2layhid5               -2.513319984669
1layhid.3.to.2layhid5                0.898812584555
1layhid.4.to.2layhid5                7.521968298428
1layhid.5.to.2layhid5               -3.213013310961
Intercept.to.wineryscaled           -2.214560161127
2layhid.1.to.wineryscaled            1.403943656214
2layhid.2.to.wineryscaled            3.865144038233
2layhid.3.to.wineryscaled            0.482941205331
2layhid.4.to.wineryscaled           -1.689631989453
2layhid.5.to.wineryscaled            9.106153432521

Testing The Accuracy Of The Model

As already mentioned, our neural network has been trained on the training data; its forecasts are then compared against the test data to gauge the accuracy of the model. In the code below:

  1. The "subset" function is used to eliminate the dependent variable from the test data
  2. The "compute" function then creates the prediction variable
  3. A "results" variable then compares the predicted data with the actual data
  4. A confusion matrix is then created with the table function to compare the number of true/false positives and negatives

#Test the resulting output
temp_test <- subset(testset, select = c("alcoholscaled", "malicscaled", "ashscaled",
                                        "ash_alcalinity_scaled", "magnesiumscaled",
                                        "phenolsscaled", "flavanoidsscaled",
                                        "nonflavanoidsscaled", "proanthocyaninsscaled",
                                        "color_intensity_scaled", "huescaled",
                                        "od280scaled", "prolinescaled"))
head(temp_test)
nn.results <- compute(nn, temp_test)

#Accuracy
results <- data.frame(actual = testset$wineryscaled, prediction = nn.results$net.result)
results
roundedresults <- sapply(results, round, digits = 0)
roundedresultsdf <- data.frame(roundedresults)
attach(roundedresultsdf)
table(actual, prediction)

The predicted results are compared to the actual results:

> results
    actual    prediction
108    1.0 0.99995414962
109    0.5 0.50151191316
110    1.0 0.99995228733
111    0.5 0.49930952274
112    0.5 0.50028617438
113    1.0 0.99995875557
114    0.5 0.49716042776
115    1.0 0.99996767672
116    0.5 0.50188649166
117    0.5 0.49076029553
118    1.0 0.99994897566
119    1.0 0.99994161781
120    0.0 0.01992618101
121    0.5 0.49715145653
122    0.5 0.49774208945
123    0.5 0.43024708585
124    0.5 0.49851663486
125    0.0 0.01996328691
126    0.5 0.49803671192
127    1.0 0.99995602207
128    0.0 0.02000726194
129    0.5 0.49986920929
130    0.5 0.49775276574
131    1.0 0.99996243010
132    0.5 0.50475471492
133    0.5 0.50053929771
134    0.5 0.48856017998
135    0.0 0.02011796292
136    0.5 0.50085477400
137    0.0 0.01995194973
138    0.0 0.01994526763
139    0.5 0.50081452856
140    0.5 0.49558658990
141    0.0 0.01994843270
142    0.0 0.01993184134
143    0.5 0.50250871984
144    1.0 0.99991503961
145    1.0 0.99994094143
146    0.5 0.50067301770
147    0.5 0.50147374124
148    0.0 0.01992908579
149    1.0 0.99919464198
150    0.5 0.50195934621
151    0.0 0.01995550670
152    0.0 0.01994112330
153    0.0 0.01989655212
154    0.5 0.63320599427
155    0.5 0.50101543943
156    0.0 0.01996333086
157    0.5 0.49605621369
158    0.0 0.02287871724
159    0.0 0.02000574323
160    1.0 0.99993414737
161    1.0 0.99937373065
162    0.0 0.01993904516
163    1.0 0.99986001548
164    0.5 0.50427691544
165    0.5 0.50326245154
166    1.0 0.99996060911
167    0.0 0.02012344943
168    1.0 0.99990343187
169    0.0 0.02170184981
170    0.5 0.49814056737
171    0.5 0.49371963003
172    1.0 0.99995012871
173    0.5 0.50230482951
174    1.0 0.99993593100
175    0.0 0.01994894854
176    0.0 0.01991605207
177    0.0 0.02000725399
178    0.5 0.47763066724

Confusion Matrix

Then, a confusion matrix is created to compare the number of true/false positives and negatives. Note that, because the winery labels were scaled to 0, 0.5, and 1, rounding collapses the three classes to two: R rounds halves to the nearest even number, so an actual value of 0.5 is rounded down to 0.

> table(actual,prediction)
      prediction
actual  0  1
     0 37 16
     1  0 18

We see that the number of true negatives is 37 (52% of the total) and the number of true positives is 18 (25%), while there are 16 false positives (23%) and no false negatives.

Ultimately, the confusion matrix indicates that our neural network model achieves roughly 78% accuracy ((37 + 18)/71), and, as mentioned, a training error of 0.007 was obtained.
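
As a sanity check, that accuracy figure can be recomputed directly from the confusion matrix counts in base R:

```r
# Accuracy = correct predictions (the diagonal) over all test observations
cm <- matrix(c(37, 0, 16, 18), nrow = 2,
             dimnames = list(actual = c("0", "1"), prediction = c("0", "1")))
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 3)   # 0.775, i.e. roughly 78%
```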