A neural network is a computational system frequently employed in machine learning to make predictions based on existing data. In this example, we will use the **neuralnet** package in R to train and test a neural network model.

A typical neural network consists of:

- **Input layers:** Layers that take inputs based on existing data
- **Hidden layers:** Layers that use backpropagation to optimise the weights of the input variables in order to improve the predictive power of the model
- **Output layers:** Output of predictions based on the data from the input and hidden layers

# Model Background

The below neural network is based on Milica Stojković’s example of using such a model to classify wineries by certain attributes (the actual dataset can be found here):

- Alcohol
- Malic Acid
- Ash
- Ash Alcalinity
- Magnesium
- Total Phenols
- Flavanoids
- Nonflavanoid Phenols
- Proanthocyanins
- Color Intensity
- Hue
- OD280/OD315 of diluted wines
- Proline

# Data Normalization

One of the most important procedures when forming a neural network is normalization of the data, i.e. adjusting the data to a common scale so that the predicted values of the neural network can be accurately compared to that of the actual data. Failure to normalize the data will typically result in the prediction value remaining the same across all observations regardless of the input.
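As a quick numeric illustration of max-min normalization (using a toy vector rather than the wine data), the smallest value maps to 0, the largest to 1, and everything else falls proportionally in between:

```r
# Max-min normalization rescales each value into the [0, 1] range
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

# Toy example: 10 maps to 0, 20 maps to 1, 15 sits halfway
normalize(c(10, 15, 20))  # 0.0 0.5 1.0
```

The same function is applied column by column to the full dataset further below.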

We can choose to do this in two ways in R:

- Scale the data frame automatically using the **scale** function in R
- Transform the data using what is called a **max-min normalization** technique

For the purposes of this tutorial, both are run but we choose to use the max-min normalization technique.

```r
# Scaled Normalization
scaleddata <- scale(mydata)

# Max-Min Normalization
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(mydata, normalize))
```

Please see this link for further details on how to use the normalization function.

Our new dataset is now scaled and saved into a data frame titled **maxmindf**:

Our training data (trainset) is based on the first 142 observations, and the test data (testset) is based on the remainder of observations.

```r
# Training and Test Data
trainset <- maxmindf[1:142, ]
testset <- maxmindf[143:178, ]
```

A training dataset conventionally accounts for **80%** of the observations, with the test dataset accounting for the remaining **20%**. The 142/36 split used here follows that convention (142 of 178 observations is roughly 80%), while it remains important to ensure a representative sample across the two sets.
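If you would rather draw a random 80/20 split than simply take the first 142 rows, one common sketch looks like this (the data frame below is a synthetic stand-in for `maxmindf`, since the real wine data is not reproduced here):

```r
set.seed(123)  # make the random split reproducible

# Stand-in for the normalized wine data frame (178 rows, like the real dataset)
maxmindf <- data.frame(x = runif(178), winery = rbinom(178, 1, 0.5))

# Draw a random 80% sample of row indices for training
train_idx <- sample(seq_len(nrow(maxmindf)), size = floor(0.8 * nrow(maxmindf)))
trainset <- maxmindf[train_idx, ]
testset  <- maxmindf[-train_idx, ]

nrow(trainset)  # 142
nrow(testset)   # 36
```

A random split reduces the risk that an ordered dataset (e.g. sorted by class) leaves one class entirely out of the test set.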

# Training a Neural Network Model using neuralnet

To set up the neural network, the **neuralnet** library is loaded into R and from the code below, observe that we are:

- Using neuralnet to "regress" the dependent **winery** variable against the other independent variables
- Setting the number of hidden layers to (2,1) via the hidden=c(2,1) argument
- Setting the linear.output argument to FALSE, since the impact of the independent variables on the dependent variable (winery) is assumed to be non-linear

Note that the process of deciding on the number of hidden layers in a neural network is not an exact science. As a matter of fact, there are instances where accuracy will likely be higher without any hidden layers. Therefore, trial and error plays a significant role in choosing the number of hidden layers.

One possibility is to compare how the accuracy of the predictions changes as the number of hidden layers is modified, giving a more systematic means of choosing the configuration.
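A minimal sketch of that trial-and-error loop might look like the following. It is run here on synthetic stand-in data rather than the wine training set, and assumes the **neuralnet** package is installed:

```r
library(neuralnet)
set.seed(1)

# Synthetic two-class stand-in data, just to illustrate the comparison loop
df <- data.frame(x1 = runif(100), x2 = runif(100))
df$y <- as.numeric(df$x1 + df$x2 > 1)
train <- df[1:80, ]
test  <- df[81:100, ]

# Try several hidden-layer configurations and compare test accuracy
for (h in list(c(1), c(2), c(2, 1))) {
  nn <- neuralnet(y ~ x1 + x2, data = train, hidden = h, linear.output = FALSE)
  pred <- round(compute(nn, test[, c("x1", "x2")])$net.result)
  cat("hidden =", paste(h, collapse = ","), " accuracy =", mean(pred == test$y), "\n")
}
```

In practice you would also repeat each configuration over several random splits, since a single run can be noisy.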

However, I found that a (2,1) configuration ultimately yielded **72%** classification accuracy, which I deemed reasonable given the 142 observations in our training data, so I decided to go with it.

```r
# Neural Network
library(neuralnet)
nn <- neuralnet(winery ~ alcohol + malic + ash + ash_alcalinity + magnesium +
                  phenols + flavanoids + nonflavanoids + proanthocyanins +
                  color_intensity + hue + od280 + proline,
                data = trainset, hidden = c(2, 1), linear.output = FALSE,
                threshold = 0.01)
nn$result.matrix
plot(nn)
```

Our neural network looks like this:

Moreover, the error of the neural network model as well as the weights between the inputs, hidden layers, and outputs are also printed:

```
> nn$result.matrix
                                          1
error                        0.052103509186
reached.threshold            0.007918868535
steps                     1410.000000000000
Intercept.to.1layhid1       -1.083867861177
alcohol.to.1layhid1          0.058782998726
malic.to.1layhid1           -1.389912517542
ash.to.1layhid1             -0.551614831886
ash_alcalinity.to.1layhid1  -2.118158540493
magnesium.to.1layhid1       -0.648224033756
phenols.to.1layhid1          0.718874765455
flavanoids.to.1layhid1       6.122425327758
nonflavanoids.to.1layhid1    0.373874042298
proanthocyanins.to.1layhid1  8.155770325494
color_intensity.to.1layhid1 -5.466829108812
hue.to.1layhid1              4.482418679851
od280.to.1layhid1            2.784020635719
proline.to.1layhid1         -4.811269297043
Intercept.to.1layhid2        0.205810529136
alcohol.to.1layhid2         -4.411136753329
malic.to.1layhid2           -1.689521058295
ash.to.1layhid2             -3.857094752264
ash_alcalinity.to.1layhid2   4.268766155835
magnesium.to.1layhid2       -1.217728932841
phenols.to.1layhid2          1.769719578190
flavanoids.to.1layhid2       1.020673251351
nonflavanoids.to.1layhid2    0.280279450037
proanthocyanins.to.1layhid2  7.536122244890
color_intensity.to.1layhid2 -2.958468172972
hue.to.1layhid2              5.646866707125
od280.to.1layhid2            1.698191285380
proline.to.1layhid2        -10.388711874839
Intercept.to.2layhid1        1.570686622007
1layhid.1.to.2layhid1       -4.366569866818
1layhid.2.to.2layhid1        2.574428685720
Intercept.to.winery         -5.406306976366
2layhid.1.to.winery         12.252214456773
```

# Testing The Accuracy Of The Model

As already mentioned, our neural network was trained on the training data, and its predictions are then compared against the test data to gauge the model's accuracy. In the code below:

- The "subset" function is used to eliminate the dependent variable from the test data
- The "compute" function then creates the prediction variable
- A "results" variable then compares the predicted data with the actual data
- A confusion matrix is then created with the table function to compare the number of true/false positives and negatives

```r
# Test the resulting output
temp_test <- subset(testset, select = c("alcohol", "malic", "ash",
                                        "ash_alcalinity", "magnesium",
                                        "phenols", "flavanoids",
                                        "nonflavanoids", "proanthocyanins",
                                        "color_intensity", "hue", "od280",
                                        "proline"))
head(temp_test)
nn.results <- compute(nn, temp_test)

# Accuracy
results <- data.frame(actual = testset$winery,
                      prediction = nn.results$net.result)
results
roundedresults <- sapply(results, round, digits = 0)
roundedresultsdf <- data.frame(roundedresults)
attach(roundedresultsdf)
table(actual, prediction)
```

The predicted results are compared to the actual results:

```
> results
    actual     prediction
143    0.5 0.543043044188
144    1.0 0.989204918591
145    1.0 0.991078885094
146    0.5 0.509822224181
147    0.5 0.522207564576
148    0.0 0.009985006935
149    1.0 0.991632930913
150    0.5 0.509800468823
151    0.0 0.012648538383
152    0.0 0.015505457199
153    0.0 0.009193526072
154    0.5 0.942799042540
155    0.5 0.515707638536
156    0.0 0.011294167416
157    0.5 0.074497852675
158    0.0 0.060706499836
159    0.0 0.010810687248
160    1.0 0.992823880777
161    1.0 0.990914522360
162    0.0 0.009788727793
163    1.0 0.987136899504
164    0.5 0.511240323832
165    0.5 0.511415717606
166    1.0 0.991329204858
167    0.0 0.040412959404
168    1.0 0.986658393972
169    0.0 0.089556958227
170    0.5 0.506644026733
171    0.5 0.201560122367
172    1.0 0.988175398882
173    0.5 0.502025119248
174    1.0 0.988233917400
175    0.0 0.010478332645
176    0.0 0.010808559391
177    0.0 0.014391386521
178    0.5 0.384616587455
```

# Confusion Matrix

Then, a confusion matrix is created to compare the number of true/false positives and negatives:

```
> table(actual, prediction)
      prediction
actual  0  1
     0 16 10
     1  0 10
```

We see that the number of true negatives is 16 (44% of the total) and the number of true positives is 10 (28% of the total), while there are 10 false positives (28%) and no false negatives.

Ultimately, the confusion matrix indicates that our neural network model has roughly 72% accuracy.
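As a sanity check, the 72% figure can be reproduced directly from the confusion matrix above: correct predictions sit on the diagonal, so accuracy is the diagonal sum over the total count.

```r
# Reconstruct the confusion matrix from the output above
# (values are filled column-wise: prediction 0 first, then prediction 1)
cm <- matrix(c(16, 0, 10, 10), nrow = 2,
             dimnames = list(actual = c("0", "1"), prediction = c("0", "1")))

# Accuracy = correct predictions (diagonal) / all predictions
accuracy <- sum(diag(cm)) / sum(cm)
accuracy  # 26 / 36, approximately 0.722
```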