The backpropagation algorithm is one of the most widely used methods for training multilayer neural networks. Training with the error back-propagation algorithm involves two passes of information through all layers of the network: a forward pass and a backward pass. During the forward pass, an input vector is fed to the input layer of the neural network and propagates through the network layer by layer. As a result, a set of output signals is generated, which is the actual response of the network to this input pattern. During the forward pass all synaptic weights of the network are fixed. During the backward pass, all synaptic weights are adjusted in accordance with the error-correction rule: the desired output is subtracted from the actual output of the network, producing an error signal. This signal then propagates through the network in the direction opposite to that of the synaptic connections. The synaptic weights are adjusted so as to bring the output vector of the network as close as possible to the target vector.

Let us introduce the following notation: X is the input vector, Y is the output vector, w_{i,j}^k is the i-th weight coefficient of the j-th neuron of the k-th layer, b_j^k is the threshold of the j-th neuron of the k-th layer, d_j is the target output of the j-th output neuron, F is the activation function, and S_j^k=\sum_i w_{i,j}^k Y_i^{k-1}-b_j^k is the weighted sum of the j-th neuron of the k-th layer.

The output of the j-th neuron of the k-th layer is calculated by the formula:

(1)   \begin{equation*} Y_j^k=F\left(\sum_i{w_{i,j}^k Y_i^{k-1}}-b_j^k\right) \end{equation*}

The output value of the j-th neuron of the output (n-th) layer is calculated by the formula:

(2)   \begin{equation*} Y_j=F\left(\sum_i{w_{i,j} Y_i^{n-1}}-b_j\right) \end{equation*}
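
To make formulas (1) and (2) concrete, here is a minimal Python sketch of the forward computation of one layer. It assumes NumPy and a sigmoid activation for F (the text leaves the activation function unspecified); all function and variable names are illustrative:

```python
import numpy as np

def F(s):
    # Sigmoid activation; an assumption, since the text leaves F unspecified.
    return 1.0 / (1.0 + np.exp(-s))

def layer_output(Y_prev, W, b):
    # Formulas (1)/(2): Y_j^k = F(sum_i w_{i,j}^k * Y_i^{k-1} - b_j^k).
    # W[i, j] stores w_{i,j}^k and b[j] stores b_j^k.
    S = Y_prev @ W - b    # weighted sums S_j^k
    return F(S), S        # S is kept for the backward pass
```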

The error function of the network is E=\frac{1}{2}\sum_j(Y_j-d_j)^2, and \gamma_j=Y_j-d_j is the error of the j-th neuron of the output layer. The error of the i-th neuron of the last hidden layer (k=n-1) is obtained by the chain rule:

(3)   \begin{equation*} \gamma_i^k=\frac{\partial E}{\partial Y_i^k}=\sum_j \frac{\partial E}{\partial Y_j}  \frac{\partial Y_j}{\partial S_j} \frac{\partial S_j}{\partial Y_i^k}=   \sum_j \frac{\partial E}{\partial Y_j}  \frac{\partial Y_j}{\partial S_j} w_{i,j} \end{equation*}

(4)   \begin{equation*} \gamma_i^k=\sum_j (Y_j-d_j)F'(S_j)w_{i,j}=\sum_j \gamma_j F'(S_j) w_{i,j} \end{equation*}

For deeper hidden layers the same rule is applied recursively: \gamma_i^k=\sum_j \gamma_j^{k+1} F'(S_j^{k+1}) w_{i,j}^{k+1}, where the sum runs over the neurons of the (k+1)-th layer.
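
A sketch of the error computations, under the same sigmoid assumption (so F'(s)=F(s)(1-F(s))); hidden_error implements formula (4) in its recursive form:

```python
def output_error(Y, d):
    # gamma_j = Y_j - d_j for the output layer.
    return Y - d

def dF(s):
    # Derivative of the sigmoid: F'(s) = F(s) * (1 - F(s)); tied to the choice of F above.
    return F(s) * (1.0 - F(s))

def hidden_error(gamma_next, S_next, W_next):
    # Formula (4) in recursive form:
    # gamma_i^k = sum_j gamma_j^{k+1} * F'(S_j^{k+1}) * w_{i,j}^{k+1}.
    return W_next @ (gamma_next * dF(S_next))
```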

The partial derivative of the error function with respect to the weight coefficients of a hidden layer is equal to:

(5)   \begin{equation*} \frac{\partial E}{\partial w_{i,j}^k}=\frac {\partial E}{\partial Y_j^k} \frac {\partial Y_j^k}{\partial S_j^k}\frac {\partial S_j^k}{\partial w_{i,j}^k}=\gamma_j^k F'(S_j^k) Y_i^{k-1} \end{equation*}

The partial derivative of the error function with respect to the weight coefficients of the output layer is equal to:

(6)   \begin{equation*} \frac{\partial E}{\partial w_{i,j}}=\frac {\partial E}{\partial Y_j} \frac {\partial Y_j}{\partial S_j}\frac {\partial S_j}{\partial w_{i,j}}=\gamma_j F'(S_j) Y_i^{n-1} \end{equation*}

The partial derivative of the error function with respect to the thresholds is equal to:

(7)   \begin{equation*} \frac{\partial E}{\partial b_j}=\frac {\partial E}{\partial Y_j} \frac {\partial Y_j}{\partial S_j}\frac {\partial S_j}{\partial b_j}=-\gamma_j F'(S_j) \end{equation*}
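
Since formulas (5)-(7) share the factor \gamma_j F'(S_j), the gradients of one layer can be sketched in a few lines, continuing the code above:

```python
def gradients(gamma, S, Y_prev):
    # delta_j = gamma_j^k * F'(S_j^k), the factor shared by formulas (5)-(7).
    delta = gamma * dF(S)
    dE_dW = np.outer(Y_prev, delta)   # formulas (5)/(6): gamma_j F'(S_j) Y_i
    dE_db = -delta                    # formula (7): minus sign from S_j = sum(...) - b_j
    return dE_dW, dE_db
```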

Therefore, the weight coefficients and thresholds of the neurons are updated by gradient descent according to the formulas:

(8)   \begin{equation*} w_{i,j}^k(t+1)=w_{i,j}^k(t)-\alpha \gamma_j^k F'(S_j^k) Y_i^{k-1} \end{equation*}

(9)   \begin{equation*} b_{j}^k(t+1)=b_{j}^k(t)+\alpha \gamma_j^k F'(S_j^k) \end{equation*}

Here \alpha is the learning rate; the plus sign in (9) follows from the minus sign in formula (7).
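
Continuing the sketch, the update rules (8) and (9) amount to one gradient-descent step per layer:

```python
def update(W, b, dE_dW, dE_db, alpha):
    # Formulas (8) and (9): one gradient-descent step with learning rate alpha.
    return W - alpha * dE_dW, b - alpha * dE_db
```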

Multilayer neural network training algorithm:

  1. Specify the learning rate α (0 < α < 1) and the target value E_m of the network error.
  2. Initialize the synaptic weights w_{i,j}^k and the thresholds b_j^k with random values.
  3. Present the vectors of the training set to the network input sequentially. For each input vector, perform the following actions:
  4. Perform the forward pass of the input vector through the network and calculate the output values Y_j^k of all neurons.
  5. Calculate the errors \gamma_j of the neurons of the output layer and \gamma_j^k of the neurons of the hidden layers by formula (4).
  6. Update the weights and thresholds of the neurons in each layer of the network according to formulas (8) and (9).
  7. Calculate the total error E of the network.
  8. If E > E_m, go to step 3; otherwise the execution of the algorithm is completed. The complete procedure is sketched in code below.
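
Putting steps 1-8 together, here is a minimal sketch of the full training loop built from the helpers above; the data layout (a list of (W, b) pairs) and the max_epochs safety bound are assumptions made for illustration, not part of the algorithm as stated:

```python
def train(X_set, D_set, layers, alpha=0.1, E_m=1e-3, max_epochs=10000):
    # layers: list of (W, b) pairs, one per layer, initialized randomly (step 2).
    # max_epochs is an added safety bound against non-convergence.
    for _ in range(max_epochs):
        E = 0.0
        for X, d in zip(X_set, D_set):        # step 3: present each training vector
            X, d = np.asarray(X), np.asarray(d)
            Ys, Ss = [X], []
            for W, b in layers:               # step 4: forward pass
                Y, S = layer_output(Ys[-1], W, b)
                Ys.append(Y)
                Ss.append(S)
            gamma = output_error(Ys[-1], d)   # step 5: output-layer error
            for k in range(len(layers) - 1, -1, -1):
                W, b = layers[k]
                # step 5 (continued): error of the preceding layer, from the old weights
                gamma_prev = hidden_error(gamma, Ss[k], W) if k > 0 else None
                dW, db = gradients(gamma, Ss[k], Ys[k])    # formulas (5)-(7)
                layers[k] = update(W, b, dW, db, alpha)    # step 6: formulas (8)-(9)
                gamma = gamma_prev
            E += 0.5 * np.sum((Ys[-1] - d) ** 2)           # step 7: total error E
        if E <= E_m:                                       # step 8: stop at target error
            break
    return layers
```

For instance, a network with 2 inputs, one hidden layer of 3 neurons, and 1 output neuron (sizes chosen purely for illustration) could be initialized as layers = [(rng.uniform(-0.5, 0.5, (2, 3)), rng.uniform(-0.5, 0.5, 3)), (rng.uniform(-0.5, 0.5, (3, 1)), rng.uniform(-0.5, 0.5, 1))] with rng = np.random.default_rng().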