R
library(MASS)
data("Boston")
#install.packages("neuralnet")
library(neuralnet)
# recode medv into three classes: 1 (5-20), 2 (20-35), 3 (35-50)
Boston$medv[Boston$medv>=5 & Boston$medv<=20]<-1
Boston$medv[Boston$medv>20 & Boston$medv<=35]<-2
Boston$medv[Boston$medv>35 & Boston$medv<=50]<-3
# read the pre-saved row indices and split Boston into training and test sets
train_idx<-read.csv(file="train_idx.csv")
test_idx<-read.csv(file="test_idx.csv")
train_idx<-as.numeric(train_idx[[1]])
test_idx<-as.numeric(test_idx[[1]])
train<-Boston[train_idx,]
test<-Boston[test_idx,]
The min-max method is used to normalize the Boston, train, and test data. Normalization matters because training on unscaled data can produce useless results or make training very difficult (often the algorithm will not converge within the maximum number of iterations allowed).
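Concretely, min-max scaling maps every column x to the unit interval,

x_scaled = (x - min(x)) / (max(x) - min(x)),

which is exactly what the scale() call below computes with center = mins and scale = maxs - mins.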
#scale data by min-max method
maxs <- apply(Boston, 2, max)
mins <- apply(Boston, 2, min)
scaled <- as.data.frame(scale(Boston, center = mins, scale = maxs - mins))
train_ <- scaled[train_idx,]
test_ <- scaled[test_idx,]
neuralnet function
Usage
neuralnet(formula, data, hidden = 1, threshold = 0.01, stepmax = 1e+05, rep = 1, startweights = NULL, learningrate.limit = NULL, learningrate.factor = list(minus = 0.5, plus = 1.2), learningrate = NULL, lifesign = "none", lifesign.step = 1000, algorithm = "rprop+", err.fct = "sse", act.fct = "logistic", linear.output = TRUE, exclude = NULL, constant.weights = NULL, likelihood = FALSE)
Arguments
formula a symbolic description of the model to be fitted.
data a data frame containing the variables specified in formula.
hidden a vector of integers specifying the number of hidden neurons (vertices) in each layer.
algorithm a string containing the algorithm type to calculate the neural network. The following types are possible: 'backprop', 'rprop+', 'rprop-', 'sag', or 'slr'. 'backprop' refers to backpropagation, 'rprop+' and 'rprop-' refer to the resilient backpropagation with and without weight backtracking, while 'sag' and 'slr' induce the usage of the modified globally convergent algorithm (grprop).
Application of neuralnet
The formula y ~ . is not accepted by the neuralnet() function, so we first have to write out the formula and then pass it as an argument to the fitting function.
n <- names(train_)
f <- as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
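For reference, assuming the standard column names of MASS::Boston, the constructed formula f should read:

medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + tax + ptratio + black + lstat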
ONE LAYER
In the case of one hidden layer, 8 neurons are selected, so the configuration is 13:8:1. The set.seed function is used so that every trial gives the same result: the neural network is initialised with random weights, and without set.seed we would get different starting points in each training run.
#1 layer
set.seed(200)
nn1 <- neuralnet(f,data=train_,hidden=c(8),algorithm='sag',linear.output=T)
We scale medv in the test data back to its original range so that the comparison below is meaningful.
test.r <- (test_$medv)*(max(Boston$medv)-min(Boston$medv))+min(Boston$medv)
The net outputs a normalized prediction, so we scale it back in order to compare it with the test data. The prediction values are then rounded to the nearest integer for comparison with the test medv.
pr.nn1_ <- compute(nn1,test_[,1:13])#normalized
pr.nn1 <- pr.nn1_$net.result*(max(Boston$medv)-min(Boston$medv))+min(Boston$medv)#unnormalized
actual1 = test.r
prediction1= pr.nn1
prediction1<-round(prediction1)
table(actual1,prediction1)
## prediction1
## actual1 1 2 3 4
## 1 40 8 0 0
## 2 6 35 3 0
## 3 0 2 4 2
From this table we can see how many predictions match the test data. The F1 measure is then derived to assess the accuracy of the predictions.
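For each class k, precision is computed down the predicted column of the confusion matrix and recall across the actual row, and F1 is their harmonic mean:

precision_k = TP_k / (TP_k + FP_k)
recall_k = TP_k / (TP_k + FN_k)
F1_k = 2 * precision_k * recall_k / (precision_k + recall_k)

The Macro F1 is the unweighted average of the per-class F1 values; this is what the calculations below carry out by hand with the counts from the table.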
p11<-40/(40+6)
p12<-36/(8+36+2)
p13<-4/(3+4)
r11<-40/(40+8)
r12<-36/(6+36+3)
r13<-4/(2+4+2)
f11<-2*p11*r11/(p11+r11)
f12<-2*p12*r12/(p12+r12)
f13<-2*p13*r13/(p13+r13)
print(c(f11,f12,f13))
## [1] 0.8510638298 0.7912087912 0.5333333333
F1score1<-(f11+f12+f13)/3
F1score1
## [1] 0.7252019848
Since medv is divided into three groups, we get three F1 measures, one for each group. Averaging them gives a Macro F1 of 72.52% for the one-layer network.
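The same per-class precision, recall, and F1 values can also be read directly off the confusion matrix instead of typing the counts by hand. The following is a minimal sketch of such a helper (macro_f1 is a name introduced here for illustration, not part of the original analysis); it assumes, as with table() above, that rows hold the actual classes and columns the predicted classes, and that every actual class also appears among the predicted columns.

# macro-averaged F1 computed directly from a confusion matrix
# (rows = actual classes, columns = predicted classes)
macro_f1 <- function(cm) {
  f1 <- sapply(rownames(cm), function(k) {
    tp <- cm[k, k]                    # correctly predicted cases of class k
    precision <- tp / sum(cm[, k])    # column sum: all cases predicted as class k
    recall    <- tp / sum(cm[k, ])    # row sum: all cases that actually are class k
    2 * precision * recall / (precision + recall)
  })
  mean(f1)
}
macro_f1(table(actual1, prediction1))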
TWO LAYERS
In the case of two hidden layers, we choose 5 and 3 neurons, so the configuration is 13:5:3:1.
#2 layers
set.seed(200)
nn2 <- neuralnet(f,data=train_,hidden=c(5,3),algorithm='sag',linear.output=T)
pr.nn2_ <- compute(nn2,test_[,1:13])#normalized
pr.nn2 <- pr.nn2_$net.result*(max(Boston$medv)-min(Boston$medv))+min(Boston$medv)#unnormalized
actual2 = test.r
prediction2 = pr.nn2
prediction2<-round(prediction2)
table(actual2,prediction2)
## prediction2
## actual2 1 2 3
## 1 34 14 0
## 2 6 35 3
## 3 0 2 6
p21<-34/(34+6)
p22<-36/(14+36+2)
p23<-6/(3+6)
r21<-34/(34+14)
r22<-36/(6+36+3)
r23<-6/(2+6)
f21<-2*p21*r21/(p21+r21)
f22<-2*p22*r22/(p22+r22)
f23<-2*p23*r23/(p23+r23)
print(c(f21,f22,f23))
## [1] 0.7727272727 0.7422680412 0.7058823529
F1score2<-(f21+f22+f23)/3
F1score2
## [1] 0.7402925556
The Macro F1 measure with 2 layers is 74.03%.
THREE LAYERS
In the case of three hidden layers, we choose 13 neurons in each hidden layer, so the configuration is 13:13:13:13:1.
#3 layers
set.seed(200)
nn3 <- neuralnet(f,data=train_,hidden=c(13,13,13),algorithm='sag',linear.output=T)
pr.nn3_ <- compute(nn3,test_[,1:13])#normalized
pr.nn3 <- pr.nn3_$net.result*(max(Boston$medv)-min(Boston$medv))+min(Boston$medv)#unnormalized
actual3 = test.r
prediction3 = pr.nn3
prediction3 = round(prediction3)
table(actual3,prediction3)
## prediction3
## actual3 1 2 3 4
## 1 40 8 0 0
## 2 6 35 3 0
## 3 0 2 4 2
p31<-40/(40+6)
p32<-36/(8+36+2)
p33<-4/(3+4)
r31<-40/(40+8)
r32<-36/(6+36+3)
r33<-4/(2+4+2)
f31<-2*p31*r31/(p31+r31)
f32<-2*p32*r32/(p32+r32)
f33<-2*p33*r33/(p33+r33)
print(c(f31,f32,f33))
## [1] 0.8510638298 0.7912087912 0.5333333333
F1score3<-(f31+f32+f33)/3
F1score3
## [1] 0.7252019848
The Macro F1 measure with 3 layers is 72.52%.
In conclusion, the Macro F1 scores are 72.52%, 74.03%, and 72.52% for 1 layer (13:8:1), 2 layers (13:5:3:1), and 3 layers (13:13:13:13:1), respectively.
Python
Prepare data
The Boston data can also be loaded through a Python library. The following code takes the Boston data from the sklearn library and splits it into training and test sets in an 8:2 ratio.
from sklearn import datasets
from sklearn.model_selection import train_test_split
dataset = datasets.load_boston()
data, target = dataset.data, dataset.target
x_train1, x_test1, y_train1, y_test1 = train_test_split(data, target, train_size=0.8)
However, we prepared the training and test indices in advance so that the results can be compared between the two languages. The following code divides the data using those indices; the x_* variables hold the input variables and the y_* variables hold the response variable MEDV.
import pandas as pd
# read csv files for indexing testing and training data set
train_data = pd.read_csv('train_data.csv')
test_data = pd.read_csv('test_data.csv')
y_train = train_data[['medv']]
x_train = train_data.drop('medv', axis=1)
y_test = test_data[['medv']]
x_test = test_data.drop('medv', axis=1)
x_train=x_train.values ; y_train=y_train.values ; x_test=x_test.values; y_test=y_test.values
# convert two dimensional array into one dimensional array
y_test=y_test.flatten(); y_train=y_train.flatten()
Data preprocessing
We preprocess the input variables using the min-max scale method to improve the model performance. The sklearn library lets us scale data easily with preprocessing.MinMaxScaler(), which by default rescales each input variable to fall between 0 and 1.
from sklearn import preprocessing
data_scaler = preprocessing.MinMaxScaler()
# fit the scaler on the training data and apply the same scaling to the test data
x_train = data_scaler.fit_transform(x_train) ; x_test = data_scaler.transform(x_test)
Next, we divide the response variable into three groups. To include the observations with MEDV = 5, we set the lower bin edge to 4.99. An observation with 5 <= MEDV <= 20 is coded as 0; similarly, 20 < MEDV <= 35 becomes 1 and 35 < MEDV <= 50 becomes 2.
bins = [4.99,20,35,50]
group_names = [0,1,2]
y_train = pd.cut(y_train, bins, labels=group_names)
y_test = pd.cut(y_test, bins, labels=group_names)
Produce an ANN classifier with 1 hidden layer (13 nodes)
Now we produce an ANN classifier with the sklearn library. In the code, MLP stands for Multilayer Perceptron. We use the logistic sigmoid function as the activation function. After fitting on the training data, we predict on the test data and generate a confusion matrix so that we can calculate the F1 score. For this model we use 1 hidden layer with 13 nodes and obtain a macro F1 score of 82.69%, as shown in the following result.
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
# ANN Classifier
mlp1 = MLPClassifier(hidden_layer_sizes=(13),max_iter=100000000, activation='logistic',solver='lbfgs')
mlp1.fit(x_train,y_train)
pred1 = mlp1.predict(x_test)
print(confusion_matrix(y_test,pred1))
precision_recall_fscore_support(y_test,pred1,average="macro")
2 hidden layers (5-3 nodes)
In the same way, we produce a new model with 2 hidden layers of 5 and 3 nodes and obtain a macro F1 score of 84.27%, as shown below.
mlp2 = MLPClassifier(hidden_layer_sizes=(5,3),max_iter=100000000, activation='logistic',solver='lbfgs')
mlp2.fit(x_train,y_train)
pred2 = mlp2.predict(x_test)
print(confusion_matrix(y_test,pred2))
precision_recall_fscore_support(y_test,pred2,average="macro")
3 hidden layers (13-13-13 nodes)
Finally, we produce a model with 3 hidden layers of 13 nodes each and obtain a macro F1 score of 80.73%, as shown below.
mlp3 = MLPClassifier(hidden_layer_sizes=(13,13,13),max_iter=100000000, activation='logistic',solver='lbfgs')
mlp3.fit(x_train,y_train)
pred3 = mlp3.predict(x_test)
print(confusion_matrix(y_test,pred3))
precision_recall_fscore_support(y_test,pred3,average="macro")