
MXNet Introduction

MXNet is a multi-language deep learning framework that allows you to mix the flavours of symbolic and imperative programming to maximize both efficiency and productivity. It has interfaces for R, Python, Julia, and C++. Embedded in the host language, it combines declarative symbolic expressions with imperative tensor computation, and it provides automatic differentiation to derive gradients. MXNet is computation- and memory-efficient and runs on systems ranging from mobile devices to distributed GPU clusters. In recent benchmarks it performed comparably to, or faster than, counterparts such as TensorFlow, Torch, and Caffe.
If you are interested in MXNet or its R API, see https://s3.amazonaws.com/mxnet-prod/docs/R/mxnet-r-reference-manual.pdf or https://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf for more information.
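
To make the mix of programming styles concrete, here is a minimal sketch in R (assuming the mxnet package is installed; see the installation guide below) contrasting imperative NDArray computation, which runs immediately, with declarative symbols, which only describe a computation graph:

library(mxnet)

# imperative style: NDArray operations are executed immediately
a = mx.nd.ones(c(2, 3)) # a small array of ones
b = a + a               # element-wise addition, evaluated right away
as.array(b)             # convert back to a plain R matrix

# declarative style: symbols only describe a computation graph;
# nothing is computed until the graph is bound to data (see Step 3 below)
x   = mx.symbol.Variable("x")
net = mx.symbol.FullyConnected(x, num_hidden = 2)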

Package Installation Guide

Before starting, install the R packages “mxnet” and “caret” if you have not done so already. Installing “mxnet” under R 3.5.1 can fail; the code below may help.
First execute the R code below. If nothing goes wrong, you are good to go. Otherwise, run the shell commands to fix the problem.

In R, do:

cran = getOption("repos")
cran["dmlc"] = "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN/"
options(repos = cran)
install.packages("mxnet")
install.packages("caret")

In command line, do:

# if you've already installed Homebrew, openblas and opencv, you can just skip the following three lines
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install openblas 
brew install opencv

# skip the following two lines if your openblas and opencv are up to date
brew upgrade openblas 
brew upgrade opencv

# link the library name mxnet expects (r0.3.1) to the installed OpenBLAS (r0.3.3)
ln -sf /usr/local/opt/openblas/lib/libopenblasp-r0.3.3.dylib /usr/local/opt/openblas/lib/libopenblasp-r0.3.1.dylib

For more detailed help, you may refer to https://github.com/apache/incubator-mxnet/issues/12066.
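
Once installation finishes, a quick sanity check, shown below as a minimal sketch, confirms that the package loads and can create an NDArray:

library(mxnet)
mx.nd.ones(c(2, 2)) # should print a small NDArray of ones without errors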

Steps:

1. Load the packages and data:

require(mxnet) # provides the functions for building and training the neural network model
library(caret) # provides createDataPartition, which we use to split the data into training and test sets

churn = read.csv('./WA_Fn-UseC_-Telco-Customer-Churn.csv')
churn = churn[complete.cases(churn), ] # keep only complete cases (drop rows with missing values)

2. Define the output and input, and convert them from categorical data to numeric so mxnet can use them:

op = churn[,'Churn']        # output: the Churn column
op = as.numeric(op) - 1     # convert the factor to 0/1 (as.numeric gives 1/2, so subtract 1)
ip = churn[,1:20]           # inputs: the first 20 columns
ip = sapply(ip, as.numeric) # convert every input column to numeric

Create the trainIndex object of row indices and use it to split the data into training and test sets:

set.seed(123) # fix the seed so the random split from createDataPartition is reproducible
trainIndex = createDataPartition(1:dim(churn)[1], p = 0.75, list = FALSE)
train_op = op[trainIndex] 
test_op = op[-trainIndex] 
train_ip = ip[trainIndex, ]
test_ip = ip[-trainIndex, ]
train_ip = data.matrix( scale(train_ip) ) # standardize the training inputs
# scale the test inputs with the training means and standard deviations,
# so no information from the test set leaks into the preprocessing
test_ip = data.matrix( scale( test_ip, attr(train_ip, "scaled:center"),
                              attr(train_ip, "scaled:scale") ) )

3. Train the model in two steps

3a. Configure the model using the symbol parameter.

Here we configure a neural network with two fully connected layers: the first contains 20 neurons with a ReLU activation, and the second contains 2 neurons, one per output class, followed by a softmax output:

# configure a two-layer neural network
data1 = mx.symbol.Variable("data")                      # input placeholder
fc1 = mx.symbol.FullyConnected(data1, num_hidden = 20)  # first layer: 20 neurons
act2 = mx.symbol.Activation(fc1, act_type = "relu")     # ReLU activation
fc2 = mx.symbol.FullyConnected(act2, num_hidden = 2)    # second layer: 2 neurons, one per class
softmax = mx.symbol.SoftmaxOutput(fc2)                  # softmax output with cross-entropy loss

3b. Create the model by calling the mx.model.FeedForward.create() function; you’ll see the progress of model training below:
devices = mx.cpu()
mx.set.seed(0)
# create an MXNet FeedForward neural net model with the specified training parameters
model = 
  mx.model.FeedForward.create(softmax, # the symbolic configuration of the neural network
                              X = train_ip, # the training data
                              y = train_op, # optional label of the data
                              ctx = devices, # the devices used to perform training (GPU or CPU)
                              num.round = 30, # the number of iterations over training data 
                              array.batch.size = 40, # the batch size used for R array training
                              learning.rate = 0.1, 
                              momentum = 0.9,
                              eval.metric = mx.metric.accuracy, # the evaluation function on the results
                              initializer = mx.init.uniform(0.07), # the initialization scheme for parameters
                              epoch.end.callback = mx.callback.log.train.metric(100) ) # callback run when one iteration (pass over the training data) ends
## Start training with 1 devices
## [1] Train-accuracy=0.769696969426039
## [2] Train-accuracy=0.782196971051621
## [3] Train-accuracy=0.782196971503171
## [4] Train-accuracy=0.787689395926215
## [5] Train-accuracy=0.789015151786082
## [6] Train-accuracy=0.792424243959514
## [7] Train-accuracy=0.79375000027093
## [8] Train-accuracy=0.795643941019521
## [9] Train-accuracy=0.796969696879387
## [10] Train-accuracy=0.797159092444362
## [11] Train-accuracy=0.797159090638161
## [12] Train-accuracy=0.800757577925017
## [13] Train-accuracy=0.79848485101353
## [14] Train-accuracy=0.799242426951726
## [15] Train-accuracy=0.799621210856871
## [16] Train-accuracy=0.803030301224102
## [17] Train-accuracy=0.803030302578753
## [18] Train-accuracy=0.801515151605462
## [19] Train-accuracy=0.801136366345666
## [20] Train-accuracy=0.800189394390944
## [21] Train-accuracy=0.804356062954122
## [22] Train-accuracy=0.802272728446758
## [23] Train-accuracy=0.805492424603665
## [24] Train-accuracy=0.805113636634567
## [25] Train-accuracy=0.806628788962509
## [26] Train-accuracy=0.808143940838901
## [27] Train-accuracy=0.809469698053418
## [28] Train-accuracy=0.810795453461734
## [29] Train-accuracy=0.810227273088513
## [30] Train-accuracy=0.809848486474066
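
The accuracy above is measured on the training data only. If you also want to monitor accuracy on held-out data after every round, mx.model.FeedForward.create accepts an eval.data argument; below is a minimal sketch reusing the test split defined above (the variable name model_val is just for illustration):

# sketch: also report accuracy on the held-out test split after each round
model_val =
  mx.model.FeedForward.create(softmax,
                              X = train_ip, y = train_op,
                              eval.data = list(data = test_ip, label = test_op),
                              ctx = devices, num.round = 30, array.batch.size = 40,
                              learning.rate = 0.1, momentum = 0.9,
                              eval.metric = mx.metric.accuracy,
                              initializer = mx.init.uniform(0.07))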

4. Make predictions with the trained model, tabulate the predicted labels against the test labels, and calculate the accuracy rate:

# make predictions on the test inputs with the trained model
preds = predict(model, test_ip)        # a 2 x n matrix of class probabilities
predict_test_df = data.frame(t(preds)) # transpose so each row is one observation
pred_test = predict_test_df
pred_label = max.col(pred_test) - 1    # predicted class = column with the highest probability, recoded to 0/1
df_pred = data.frame( table(pred_label, test_op) )
# tabulate the predicted labels against the true test labels
knitr::kable(df_pred, col.names = c('Prediction Label','Test Output','Frequency'), align = 'c')
Prediction Label   Test Output   Frequency
        0               0           1200
        1               0            103
        0               1            251
        1               1            202
df_pred$pred_label = as.numeric( as.character(df_pred$pred_label) ) # factor -> numeric
df_pred$op = as.numeric( as.character(df_pred$test_op) )            # factor -> numeric
# get the rows of the table where the predicted label matches the true label
ind = which( df_pred$pred_label == df_pred$op )
# calculate the accuracy rate
pred_accuracy = sum(df_pred[,3][ind]) / sum(df_pred[,3])
print(pred_accuracy)
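
Equivalently, as a quick sanity check, the same accuracy can be computed directly from the predicted and true label vectors:

# accuracy computed directly from the label vectors
mean(pred_label == test_op)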

Prediction accuracy:

## [1] 0.7984055

5. To get an idea of what is happening, view the computation graph from R:

graph.viz(model$symbol)

In conclusion