In this tutorial we will discuss the mathematical basis of single-layer neural network training methods.

Gradient descent method

Gradient descent is a method for finding a local extremum (minimum or maximum) of a function by moving along the gradient of the error function: against the gradient to find a minimum, along it to find a maximum.

According to the gradient descent method, the weights and thresholds of the neurons are calculated by the formulas:

(1)   \begin{equation*} w_{ij}(t+1)=w_{ij}(t)-\alpha \frac{\partial E}{\partial w_{ij}} \end{equation*}

(2)   \begin{equation*} b_{i}(t+1)=b_{i}(t)-\alpha \frac{\partial E}{\partial b_{i}} \end{equation*}

Here, E is the error function and \alpha is the learning rate of the training algorithm.
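
To make formulas (1) and (2) concrete, here is a minimal Python sketch of gradient descent on an arbitrary differentiable error function. The gradient is approximated numerically, and all names (numerical_gradient, gradient_descent, alpha) are illustrative assumptions, not part of any library:

import numpy as np

def numerical_gradient(E, w, eps=1e-6):
    # Central-difference approximation of dE/dw_i for each parameter.
    grad = np.zeros_like(w)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (E(w_plus) - E(w_minus)) / (2 * eps)
    return grad

def gradient_descent(E, w, alpha=0.1, steps=100):
    # Repeatedly apply formula (1): w(t+1) = w(t) - alpha * dE/dw.
    for _ in range(steps):
        w = w - alpha * numerical_gradient(E, w)
    return w

# Example: minimize E(w) = (w0 - 3)^2 + (w1 + 1)^2; the minimum is at (3, -1).
w_min = gradient_descent(lambda w: (w[0] - 3)**2 + (w[1] + 1)**2,
                         np.array([0.0, 0.0]))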

Delta rule

The delta rule, also called the Widrow-Hoff learning rule, was introduced by Bernard Widrow and Marcian Hoff to minimize the error over all training patterns. It implies the minimization of the squared error of the neural network, determined by the formula E=\frac{1}{2}(Y-d)^2, where d is the target value.

Each neuron calculates a weighted sum of its inputs according to the formula S = w_1X_1 + w_2X_2-b. If the linear activation function is used, so that Y = S, then the error function is equal to:

(3)   \begin{equation*} E=\frac{1}{2}(w_1X_1 + w_2X_2-b-d)^2 \end{equation*}

Applying the chain rule to (3), the derivatives of the error function with respect to the weights and the threshold are expressed as:

(4)   \begin{equation*} \frac{\partial E}{\partial w_1}=(Y-d)X_1 \end{equation*}

(5)   \begin{equation*} \frac{\partial E}{\partial w_2}=(Y-d)X_2 \end{equation*}

(6)   \begin{equation*} \frac{\partial E}{\partial b}=-(Y-d) \end{equation*}

During the training process the weights and the threshold of the neuron are therefore updated by the formulas:

(7)   \begin{equation*} w_{i}(t+1)=w_{i}(t)-\alpha (Y-d)X_i \end{equation*}

(8)   \begin{equation*} b(t+1)=b(t)+\alpha (Y-d) \end{equation*}
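
The update rules (7) and (8) translate directly into a short Python sketch for a single neuron with two inputs; the function and variable names are illustrative assumptions:

import numpy as np

def delta_rule_step(w, b, x, d, alpha):
    # One Widrow-Hoff update for a neuron with output Y = w.x - b.
    y = np.dot(w, x) - b        # S = w1*X1 + w2*X2 - b, linear activation
    error = y - d
    w = w - alpha * error * x   # formula (7): w_i(t+1) = w_i(t) - alpha*(Y - d)*X_i
    b = b + alpha * error       # formula (8): follows from dE/db = -(Y - d)
    return w, b

# Example: one update with alpha = 0.5 for inputs X = (1, 2) and target d = 1.
w, b = delta_rule_step(np.array([0.2, -0.3]), 0.1,
                       np.array([1.0, 2.0]), 1.0, alpha=0.5)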

Now consider a neural network consisting of one layer with three neurons and three inputs.

Here X_1, X_2, X_3 is the input vector and d_1, d_2, d_3 is the target vector.

In this case, the error function is equal to E=\frac{1}{2}\sum_j(Y_j-d_j)^2,
and the weights and thresholds of the neurons are calculated by the formulas:

(9)   \begin{equation*} w_{ij}(t+1)=w_{ij}(t)-\alpha (Y_j-d_j)X_i \end{equation*}

(10)   \begin{equation*} b_{j}(t+1)=b_{j}(t)+\alpha (Y_j-d_j) \end{equation*}
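
Formulas (9) and (10) can be sketched in Python for the whole layer at once, assuming (as an illustrative convention) that W[i, j] stores the weight from input i to neuron j:

import numpy as np

def layer_delta_step(W, b, x, d, alpha):
    # Update all weights and thresholds of the layer from one training pattern.
    y = x @ W - b                       # Y_j = sum_i w_ij * X_i - b_j
    error = y - d                       # (Y_j - d_j) for each neuron j
    W = W - alpha * np.outer(x, error)  # formula (9)
    b = b + alpha * error               # formula (10)
    return W, b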

Single-layer neural network training algorithm 

The sequence of steps for training a single-layer neural network using the Widrow-Hoff learning rule is as follows (a Python sketch implementing these steps appears after the list):

  1. Specify the learning step α (0 < α < 1) and the desired total squared error of the network E_m.
  2. Initialize the weight coefficients w_{ij} and the threshold values b_j of the neurons with random numbers.
  3. Feed vectors from the training sample to the input of the neural network and calculate the output values of the neurons.
  4. Change the weight coefficients and thresholds of the neurons according to formulas (9) and (10).
  5. Calculate the total error of the neural network E.
  6. If E > E_m, go to step 3; otherwise stop the algorithm.
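
Putting steps 1-6 together, a sketch of the complete training loop might look as follows; the training data and the alpha, E_m and max_epochs values are illustrative assumptions:

import numpy as np

def train(X, D, alpha=0.1, E_m=1e-3, max_epochs=10000):
    n_inputs, n_neurons = X.shape[1], D.shape[1]
    rng = np.random.default_rng(0)
    # Step 2: initialize weights and thresholds with small random numbers.
    W = rng.uniform(-0.5, 0.5, (n_inputs, n_neurons))
    b = rng.uniform(-0.5, 0.5, n_neurons)
    for _ in range(max_epochs):
        # Steps 3-4: present each training pair and update W, b by (9) and (10).
        for x, d in zip(X, D):
            y = x @ W - b
            W -= alpha * np.outer(x, y - d)
            b += alpha * (y - d)
        # Step 5: total squared error E over the whole training sample.
        E = 0.5 * np.sum((X @ W - b - D) ** 2)
        # Step 6: stop once the error is small enough.
        if E <= E_m:
            break
    return W, b

# Example: learn the identity mapping on three inputs (illustrative data).
X = np.eye(3)
W, b = train(X, X)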