# XOR with Extreme Learning Machine

Extreme Learning Machine (ELM) is a neural network algorithm with very fast learning; it can be seen as a generalization of Support Vector Machines (SVMs). Traditional neural network training is slow because it relies on an iterative gradient algorithm, which can also converge to a locally suboptimal solution. ELM applies to "generalized" single-hidden-layer feed-forward networks (SLFNs), but the hidden layer (also called the feature mapping) need not be tuned. With randomly chosen hidden-layer weights, the network can be trained by computing $$H^\dagger$$, the Moore–Penrose generalized inverse of the matrix $$H$$ of hidden-layer outputs on the training examples. In MATLAB or Julia, this least-squares solution can be obtained with the backslash operator.
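In the regularized form used in the code below, the output weights solve a ridge-regression (regularized least-squares) problem; this is the standard regularized ELM solution, stated here for reference:

$$ \mathrm{outW} = \left( \frac{I}{C} + H H^T \right)^{-1} H Y^T $$

where $$C$$ is the regularization parameter, $$H$$ holds one hidden-layer output vector per column, and $$Y$$ is the row vector of targets. The `\` expression in the code computes exactly this.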
Below is Julia code to train XOR with ELM:

```julia
using LinearAlgebra  # for the identity operator I

# two input neurons, one column per training example
X = [-1 -1; -1 +1; +1 -1; +1 +1]'
# one output neuron
Y = [+1 -1 -1 +1]

nTrainData = size(X, 2)
nInputNeurons = size(X, 1)
nHiddenNeurons = 30

# regularization parameter
C = 100.0

# generate random input weight matrix and bias
inW = 2 .* rand(nHiddenNeurons, nInputNeurons) .- 1
bias = rand(nHiddenNeurons)

# feed forward training data through hidden layer (sigmoid activation)
H = zeros(nHiddenNeurons, nTrainData)
for i = 1:nTrainData
    H[:, i] = 1 ./ (1 .+ exp.(-inW * X[:, i] .- bias))
end

# regularized output weights matrix
outW = (I / C + H * H') \ (H * Y')

# output for the training data
scores = sign.(H' * outW)
```


The regularization parameter C allows a trade-off between fitting the training examples and generalization performance on unseen inputs. The hidden layer uses a sigmoid activation function. Note that the hidden layer has 30 neurons: in theory two hidden nodes suffice for XOR, but because ELM leaves the randomly chosen hidden weights untrained, it needs more neurons to classify the examples correctly. This is a small price to pay for very fast learning.
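As a quick sanity check, the whole recipe can be condensed into a self-contained sketch that trains the network and returns the signed outputs (the helper name `elm_xor` is made up for this example):

```julia
using LinearAlgebra

# condensed version of the training code above; returns the signed outputs
function elm_xor(nHiddenNeurons = 30, C = 100.0)
    X = [-1 -1; -1 +1; +1 -1; +1 +1]'          # inputs, one column per example
    Y = [+1 -1 -1 +1]                          # XOR targets
    inW = 2 .* rand(nHiddenNeurons, size(X, 1)) .- 1
    bias = rand(nHiddenNeurons)
    H = 1 ./ (1 .+ exp.(-(inW * X) .- bias))   # sigmoid hidden layer, all examples at once
    outW = (I / C + H * H') \ (H * Y')
    vec(sign.(H' * outW))
end

scores = elm_xor()
# should reproduce [+1, -1, -1, +1] in almost every run
```

Note that the loop over training examples has been replaced by a single matrix product, with the bias broadcast across columns; this is equivalent to the per-example loop above.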

We can visualize the decision boundary with the PyPlot package:

```julia
using PyPlot

# helper, since Julia has no built-in meshgrid
function meshgrid(vx::AbstractVector{T}, vy::AbstractVector{T}) where T
    m, n = length(vy), length(vx)
    vx = reshape(vx, 1, n)
    vy = reshape(vy, m, 1)
    (repeat(vx, m, 1), repeat(vy, 1, n))
end

# generate a grid
span = range(-1.5, 1.5, length=100)
P1, P2 = meshgrid(span, span)
pp = [P1[:] P2[:]]'

nInputPatterns = size(pp, 2)

# simulate the neural network on the grid:
# feed forward the grid points through the hidden layer
H = zeros(nHiddenNeurons, nInputPatterns)
for i = 1:nInputPatterns
    H[:, i] = 1 ./ (1 .+ exp.(-inW * pp[:, i] .- bias))
end
# output layer
aa = H' * outW
aa = reshape(aa, length(span), length(span))

figure()
clf()
pcolormesh(P1, P2, sign.(aa), cmap="cool")
```


The result is:

The decision boundary can be made sharper by using a larger hidden layer; with 2500 hidden neurons, for example, the boundary becomes: