In this tutorial we will discuss the mathematical basis of single-layer neural network training methods.
Gradient descent method
Gradient descent is a method of finding a local extremum (minimum or maximum) of a function by iteratively moving along the gradient of the error function (against the gradient when minimizing).
According to the gradient descent method, the weights and thresholds of the neurons are updated by the formulas:

w_{ij}(t+1) = w_{ij}(t) - \alpha \frac{\partial E}{\partial w_{ij}}    (1)

T_j(t+1) = T_j(t) - \alpha \frac{\partial E}{\partial T_j}    (2)

Here E is the error function, and \alpha is the learning rate of the training algorithm.
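As a minimal sketch of updates (1) and (2) in Python with NumPy (the function and array names here are ours, chosen for illustration), assuming the gradients have already been computed:

```python
import numpy as np

def gradient_descent_step(w, T, dE_dw, dE_dT, alpha):
    """One step of formulas (1) and (2); all arguments are NumPy arrays
    except the scalar learning rate alpha."""
    w_new = w - alpha * dE_dw   # (1)
    T_new = T - alpha * dE_dT   # (2)
    return w_new, T_new

# Illustrative call with made-up gradients:
w, T = np.zeros((2, 3)), np.zeros(3)
w, T = gradient_descent_step(w, T, np.ones((2, 3)), np.ones(3), alpha=0.1)
```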
Delta rule
The delta rule, also called the Widrow-Hoff learning rule, was introduced by Bernard Widrow and Marcian Hoff to minimize the error over all training patterns. It implies the minimization of the mean-square error of the neural network, determined by the formula

E = \frac{1}{2} \sum_j (y_j - d_j)^2,

where d_j is the target value and y_j is the actual output of the j-th neuron.
Each neuron calculates a weighted sum of its inputs according to the formula

S_j = \sum_i w_{ij} x_i - T_j.

If the linear activation function y_j = S_j is used, then the error functional is equal to:

E = \frac{1}{2} \sum_j \left( \sum_i w_{ij} x_i - T_j - d_j \right)^2    (3)
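A short Python sketch of the forward pass and the error functional (3); the names and shapes are our assumptions (input vector x, weight matrix w with one column per neuron, threshold vector T):

```python
import numpy as np

def forward(x, w, T):
    # Weighted sum S_j = sum_i w_ij * x_i - T_j; with a linear
    # activation the output is y_j = S_j.
    return x @ w - T

def error(y, d):
    # Error functional (3): E = 0.5 * sum_j (y_j - d_j)^2
    return 0.5 * np.sum((y - d) ** 2)
```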
The derivatives of the error function with respect to the outputs, weights, and thresholds are expressed as:

\frac{\partial E}{\partial y_j} = y_j - d_j    (4)

\frac{\partial E}{\partial w_{ij}} = (y_j - d_j) x_i    (5)

\frac{\partial E}{\partial T_j} = -(y_j - d_j)    (6)
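The derivatives (4)-(6) translate directly into code; the following hypothetical helper builds on the forward pass sketched above:

```python
import numpy as np

def gradients(x, y, d):
    delta = y - d                  # (4): dE/dy_j = y_j - d_j
    dE_dw = np.outer(x, delta)     # (5): dE/dw_ij = (y_j - d_j) * x_i
    dE_dT = -delta                 # (6): dE/dT_j = -(y_j - d_j)
    return dE_dw, dE_dT
```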
During the training process the weights and the thresholds of the neurons are updated by the formulas:

w_{ij}(t+1) = w_{ij}(t) - \alpha (y_j - d_j) x_i    (7)

T_j(t+1) = T_j(t) + \alpha (y_j - d_j)    (8)
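Combining the forward pass with updates (7) and (8) gives one Widrow-Hoff training step; this is a sketch under the same naming assumptions as above:

```python
import numpy as np

def delta_rule_step(x, d, w, T, alpha):
    y = x @ w - T                       # forward pass, linear activation
    delta = y - d
    w = w - alpha * np.outer(x, delta)  # (7)
    T = T + alpha * delta               # (8)
    return w, T
```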
Consider a neural network consisting of one layer with three neurons. Here x = (x_1, \dots, x_n) is the input vector and d = (d_1, d_2, d_3) is the target vector. In this case, the error function is equal to

E = \frac{1}{2} \sum_{j=1}^{3} (y_j - d_j)^2,

and the weights and thresholds of the neurons are updated by the formulas:

w_{ij}(t+1) = w_{ij}(t) - \alpha (y_j - d_j) x_i, \quad i = 1, \dots, n, \; j = 1, 2, 3    (9)

T_j(t+1) = T_j(t) + \alpha (y_j - d_j), \quad j = 1, 2, 3    (10)
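For the three-neuron example, a hypothetical usage of the step sketched above might look like this (all sizes and values are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_neurons = 4, 3                         # three neurons in the layer
w = rng.uniform(-0.5, 0.5, (n_inputs, n_neurons))  # random weights
T = rng.uniform(-0.5, 0.5, n_neurons)              # random thresholds
x = rng.normal(size=n_inputs)                      # one input vector
d = np.array([0.0, 1.0, 0.5])                      # illustrative target vector
w, T = delta_rule_step(x, d, w, T, alpha=0.1)      # apply (9) and (10) once
```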
Single-layer neural network training algorithm
The sequence of steps for training a single-layer neural network using the Widrow-Hoff learning rule is:
1. Specify the learning rate \alpha (0 < \alpha < 1) and the desired mean-square error of the network E_{min}.
2. Initialize the weights w_{ij} and the thresholds T_j of the neurons with random numbers.
3. Feed vectors from the training sample to the input of the neural network and calculate the output values of the neurons.
4. Update the weights and thresholds of the neurons according to formulas (9) and (10).
5. Calculate the total error E of the neural network over the whole training sample.
6. If E > E_{min}, then go to step 3; otherwise, stop the execution of the algorithm.
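Putting the six steps together, a minimal self-contained implementation might look as follows (the function name, stopping parameters, and initialization range are our assumptions):

```python
import numpy as np

def train(X, D, alpha=0.1, E_min=1e-4, max_epochs=10_000, seed=0):
    # Step 1: alpha and the desired error E_min are passed in as parameters.
    rng = np.random.default_rng(seed)
    # Step 2: initialize weights and thresholds with random numbers.
    w = rng.uniform(-0.5, 0.5, (X.shape[1], D.shape[1]))
    T = rng.uniform(-0.5, 0.5, D.shape[1])
    for _ in range(max_epochs):
        # Steps 3-4: feed each training vector and update w, T by (9), (10).
        for x, d in zip(X, D):
            y = x @ w - T
            delta = y - d
            w -= alpha * np.outer(x, delta)
            T += alpha * delta
        # Step 5: total error of the network over the training sample.
        E = 0.5 * np.sum((X @ w - T - D) ** 2)
        # Step 6: repeat from step 3 while E exceeds E_min.
        if E <= E_min:
            break
    return w, T, E

# Illustrative run: learn an arbitrary linear mapping of 4 inputs to 3 outputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
D = X @ rng.normal(size=(4, 3))
w, T, E = train(X, D)
```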