Each module of a network can itself be composed of Modules, and there are several sub-classes of Module available: container classes like Sequential, Parallel and Concat, which can contain simple layers like Linear, Mean, Max and Reshape, as well as convolutional layers and transfer functions like Tanh.
Loss functions are implemented as sub-classes of Criterion. They are helpful to train a neural network on classical tasks. Common criterions are the mean squared error criterion, implemented in MSECriterion, and the cross-entropy criterion, implemented in ClassNLLCriterion.
This section provides a detailed overview of the neural network package. First the omnipresent Module is examined, followed by some examples for combining modules together. The last part explores facilities for training a neural network.
A neural network is called a Module (or simply
module in this documentation) in Torch.
Module is an abstract
class which defines four main methods:
* forward(input) which computes the output of the module given the input.
* backward(input, gradOutput) which computes the gradients of the module with respect to its own parameters, and its own inputs.
* zeroGradParameters() which zeroes the gradient with respect to the parameters of the module.
* updateParameters(learningRate) which updates the parameters after one has computed the gradients with backward(input, gradOutput).
Two other perhaps less used but handy methods are also defined:
* share(mlp,s1,s2,...,sn) which makes this module share the parameters s1,...,sn of the module mlp. This is useful if you want to have modules that share the same weights.
* clone(...) which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any).
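As a short sketch of these two methods (the layer sizes and parameter names 'weight' and 'bias' here are illustrative, assuming the standard nn.Linear layer):

```lua
require 'nn'

mlp1 = nn.Linear(10, 5)
mlp2 = nn.Linear(10, 5)
mlp2:share(mlp1, 'weight', 'bias') -- mlp2 now uses mlp1's weight and bias
mlp3 = mlp1:clone()                -- deep copy: mlp3 has its own parameters
```

After the share(), updating mlp1's parameters also changes mlp2's output, while mlp3 remains independent.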
Some important remarks:
* output contains only valid values after a forward(input).
* gradInput contains only valid values after a backward(input, gradOutput).
* backward(input, gradOutput) uses certain computations obtained during forward(input). You must call
forward() before calling a
backward(), on the same
input, or your gradients are going to be incorrect!
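The required ordering can be sketched as follows (the layer and tensor sizes are arbitrary examples):

```lua
require 'nn'

mlp = nn.Linear(10, 1)
input = torch.rand(10)

output = mlp:forward(input)                 -- output is now valid
gradOutput = torch.Tensor(1):fill(1)
gradInput = mlp:backward(input, gradOutput) -- gradInput is now valid
```

Calling backward() with a different input than the one last passed to forward() would silently produce incorrect gradients.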
Plug and play
Building a simple neural network can be achieved by constructing an available layer. A linear neural network (perceptron!) is built in only one line:
mlp = nn.Linear(10,1) -- perceptron with 10 inputs
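Evaluating this perceptron on an input is then a single forward() call (the random input here is just for illustration):

```lua
require 'nn'

mlp = nn.Linear(10, 1)       -- perceptron with 10 inputs
input = torch.rand(10)       -- an arbitrary 10-dimensional input
output = mlp:forward(input)  -- a 1-dimensional Tensor
```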
More complex neural networks are easily built using the container classes
Sequential and Concat. Sequential plugs several layers together in a
feed-forward, fully connected manner. Concat concatenates several modules into
one layer: they all take the same input, and their outputs are concatenated.
Creating a one hidden-layer multi-layer perceptron is thus just as easy as:
mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) ) -- 10 inputs, 25 hidden units
mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
mlp:add( nn.Linear(25, 1) ) -- 1 output
Of course, Sequential and Concat can contain other Sequential or
Concat modules, allowing you to try the craziest neural
networks you ever dreamt of! See the [[#nn.Modules|complete list of
available modules]].
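As a sketch of Concat (the choice of layer sizes and of dimension 1 for concatenation is an assumption for illustration):

```lua
require 'nn'

-- Two linear layers applied to the same 10-dimensional input;
-- their outputs are concatenated along dimension 1.
mlp = nn.Concat(1)
mlp:add( nn.Linear(10, 3) )
mlp:add( nn.Linear(10, 7) )

output = mlp:forward(torch.rand(10)) -- a 10-dimensional Tensor (3 + 7)
```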
Training a neural network
Once you built your neural network, you have to choose a particular Criterion to train it. A criterion is a class which describes the cost to be minimized during training.
You can then train the neural network by using the StochasticGradient class.
criterion = nn.MSECriterion() -- Mean Squared Error criterion
trainer = nn.StochasticGradient(mlp, criterion)
trainer:train(dataset) -- train using some examples
StochasticGradient expects as a
dataset an object which implements the operator
dataset[index] and the method
dataset:size(). The size() method returns the number of
examples, and dataset[i] has to return the i-th example.
An example has to be an object which implements the operator
example[field], where field might take the value
1 (input features) or 2 (corresponding label which will be given to the
criterion). The input is usually a Tensor (except if you use special
kinds of gradient modules, like table layers). The
label type depends on the criterion. For example, the
MSECriterion expects a Tensor, but the
ClassNLLCriterion expects an integer (the class).
Such a dataset is easily constructed by using Lua tables, but it could
be any object, a C object for example, as long as the required operators/methods
are implemented. See an example.
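A minimal sketch of such a dataset built from a Lua table (the size, dimensions and the sign-based target here are made up for illustration):

```lua
require 'nn'

dataset = {}
function dataset:size() return 100 end -- 100 examples

for i = 1, dataset:size() do
  local input = torch.randn(10)                                 -- a random input
  local target = torch.Tensor(1):fill(input:sum() > 0 and 1 or -1) -- a made-up label
  dataset[i] = {input, target} -- dataset[i][1] is the input, dataset[i][2] the label
end
```

Since examples support example[1] and example[2] and the table implements size(), this object can be passed directly to trainer:train(dataset).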
StochasticGradient being written in
Lua, it is extremely easy
to cut-and-paste it and create a variant of it adapted to your needs
(if the constraints of
StochasticGradient do not satisfy you).
Low Level Training
If you want to program the equivalent of
StochasticGradient by hand, you
essentially need to control the use of forwards and backwards through
the network yourself. For example, here is the code fragment one
would need to make a gradient step given an input
x, a desired output
y, a network
mlp, a given criterion
criterion and a learning rate
learningRate:
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
end
For example, if you wish to use your own criterion you can simply replace
gradCriterion with the gradient vector of your criterion of choice.
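Putting it together, a hand-written training loop might look like this sketch (the network, dataset object, epoch count and learning rate are illustrative assumptions):

```lua
require 'nn'

mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) )
mlp:add( nn.Tanh() )
mlp:add( nn.Linear(25, 1) )

criterion = nn.MSECriterion()

-- assumes gradUpdate as defined above and a dataset implementing
-- dataset:size() and dataset[i] = {input, target}
for epoch = 1, 10 do
  for i = 1, dataset:size() do
    local example = dataset[i]
    gradUpdate(mlp, example[1], example[2], criterion, 0.01)
  end
end
```

This is essentially what StochasticGradient does internally, minus its bookkeeping (shuffling, error reporting, hooks).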