Binarized Neural Networks (BNNs) are deep neural networks that use binary values for weights and activations instead of full-precision values. Because most arithmetic operations can be replaced with bitwise operations, BNNs reduce execution time, and their model sizes are much smaller than those of their full-precision counterparts. At training time, the binary weights and activations are used when computing the parameter gradients; during the forward pass, BNNs drastically reduce memory size and memory accesses, which is expected to substantially improve power efficiency. The accuracy of a BNN is generally lower than that of a full-precision model, and although BNNs have been closing this gap, a substantial accuracy degradation remains on large-scale datasets such as ImageNet.

Convolutional Neural Networks (CNNs) have achieved state-of-the-art results on a variety of computer-vision tasks, for example classification, detection and text recognition. Two main approaches, reducing the memory footprint and accelerating inference, allow neural networks to be executed on devices with low computational power; both rely on quantizing weights and activations to lower precision, in the extreme case to binary values requiring a single bit of storage. By representing both weights and inputs with binary values, a BNN can achieve up to a 32× memory saving and a 58× speedup on CPUs.
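
As a rough illustration of the memory-saving claim above, the following sketch (NumPy, with illustrative variable names) packs binarized weights into single bits and compares the storage against the full-precision original.

```python
import numpy as np

# Rough illustration of the claimed 32x memory saving: a float32 weight takes 32 bits,
# while a binarized weight takes 1 bit once the {-1, +1} values are packed into a bit array.
w = np.random.randn(1024).astype(np.float32)   # full-precision weights
w_bin = np.sign(w)                              # {-1, +1} binary weights
w_packed = np.packbits(w_bin > 0)               # 1 bit per weight after packing

print(w.nbytes)         # 4096 bytes for 1024 float32 weights
print(w_packed.nbytes)  # 128 bytes for the same weights binarized, i.e. 32x smaller
```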

Binary Neural Network

In BNNs there are real-valued weights, which are learned, and binary versions of those weights, which are used in the dot product with the binary activations. The activation function is the sign function, so the output of each activation is a binary value. All other values are learned by the network through backpropagation; this includes weights, biases, gains and other parameters. A gain is a scaling factor that is usually learned from statistics. The term scaling factor is sometimes used in the literature, but we use gain here to emphasize its correspondence with the bias. These binary operations also map directly onto the connections and layout of digital hardware.
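
To make the interplay of latent real-valued weights, the sign activation, and a learned gain and bias concrete, here is a minimal PyTorch-style sketch of a binary linear layer. The class and parameter names are illustrative, and the straight-through estimator used for the backward pass is one common choice rather than the only one.

```python
import torch
import torch.nn as nn

class SignSTE(torch.autograd.Function):
    """Sign function with a straight-through estimator: gradients pass where |x| <= 1."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

class BinaryLinear(nn.Module):
    """Latent real-valued weights are kept for learning; their binarized versions are
    used in the dot product with binarized activations, followed by a learned
    per-output gain and bias (names are illustrative)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.gain = nn.Parameter(torch.ones(out_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        wb = SignSTE.apply(self.weight)   # binarize the latent weights
        xb = SignSTE.apply(x)             # binarize the incoming activations
        return nn.functional.linear(xb, wb) * self.gain + self.bias

layer = BinaryLinear(784, 256)
out = layer(torch.randn(8, 784))          # binary dot products, then per-output gain and bias
```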

The paper Deep Learning Binary Neural Network on an FPGA presents the architecture design of a convolutional neural network with binary weights and activations on an FPGA platform. Weights and input activations are binarized to only two values, +1 and -1, and the proposed design uses only on-chip memories. An efficient implementation of the batch normalization operation is also introduced. On the CIFAR-10 benchmark, the proposed FPGA design achieves a processing rate of 332,158 images per second with an accuracy of 86.06% using 1-bit quantized weights and activations.
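
The paper's exact batch-normalization trick is not spelled out here, but one common way to make batch normalization cheap in a binary pipeline is to fold it into the following sign activation, so that the whole operation becomes a single threshold comparison. The sketch below (NumPy, illustrative names) shows the idea and may differ from the implementation used in the paper.

```python
import numpy as np

# Fold batch norm + sign into one comparison:
# sign(gamma * (x - mu) / sigma + beta) reduces to comparing x against a precomputed threshold.
def fold_bn_sign(x, gamma, beta, mu, sigma):
    thresh = mu - beta * sigma / gamma
    if gamma >= 0:
        return np.where(x >= thresh, 1, -1)
    return np.where(x <= thresh, 1, -1)   # a negative gamma flips the comparison

x = np.array([-3, -1, 0, 2, 5], dtype=np.int32)
gamma, beta, mu, sigma = 0.8, 0.1, 1.0, 2.0
ref = np.where(gamma * (x - mu) / sigma + beta >= 0, 1, -1)
assert np.array_equal(fold_bn_sign(x, gamma, beta, mu, sigma), ref)
```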

The authors of Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1 introduced BNNs, DNNs with binary weights and activations at run-time and when computing the parameter gradients at train-time. They conducted two sets of experiments on two different frameworks; in both, BNNs achieved nearly state-of-the-art results on the MNIST, CIFAR-10 and SVHN datasets. During the forward pass (both at run-time and train-time), BNNs drastically reduce memory size and accesses and replace most arithmetic operations with bit-wise operations, which might lead to a great increase in power efficiency. Finally, they wrote a binary matrix multiplication GPU kernel with which it is possible to run their MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
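
The binary matrix multiplication kernel itself is GPU code, but its core idea can be sketched in NumPy: pack the {-1, +1} operands into bit arrays and replace multiply-accumulate with XOR and popcount. The helper names below are illustrative.

```python
import numpy as np

# For vectors in {-1, +1}, encode +1 as bit 1 and -1 as bit 0;
# then a . b = n - 2 * popcount(a_bits XOR b_bits).
def pack_rows(m):
    return np.packbits(m > 0, axis=1)      # one packed bit-row per matrix row

def binary_matmul(a, b):
    n = a.shape[1]
    a_bits = pack_rows(a)                  # packed rows of a
    bt_bits = pack_rows(b.T)               # packed columns of b (as rows of b.T)
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.int32)
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            mismatches = np.unpackbits(np.bitwise_xor(a_bits[i], bt_bits[j]))[:n].sum()
            out[i, j] = n - 2 * int(mismatches)
    return out

rng = np.random.default_rng(1)
A = np.where(rng.random((4, 64)) > 0.5, 1, -1)
B = np.where(rng.random((64, 3)) > 0.5, 1, -1)
assert np.array_equal(binary_matmul(A, B), A @ B)   # matches ordinary matrix multiplication
```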

A novel scheme to train binary convolutional neural networks (CNNs), i.e. CNNs with weights and activations constrained to {-1, +1} at run-time, is presented in the paper Towards Accurate Binary Convolutional Neural Network. The authors address the accuracy degradation with two major innovations: approximating full-precision weights with a linear combination of multiple binary weight bases, and employing multiple binary activations to alleviate information loss. The resulting binary CNN, denoted ABC-Net, is shown to achieve performance much closer to its full-precision counterpart, even reaching comparable prediction accuracy on the ImageNet and forest trail datasets, given adequate binary weight bases and activations.
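
A minimal sketch of the weight-approximation idea follows: real-valued weights are approximated by a linear combination of binary bases obtained by shifting and binarizing the weights, with the combination coefficients fitted by least squares. The fixed shift values and function names here are illustrative assumptions; in the paper the shifts are trainable.

```python
import numpy as np

# Approximate W with sum_m alpha_m * B_m, where each B_m is a shifted, binarized copy of W.
def binary_bases_approx(w, shifts=(-1.0, 0.0, 1.0)):
    mean, std = w.mean(), w.std()
    bases = [np.sign(w - mean + u * std) for u in shifts]     # binary bases B_m
    A = np.stack([b.ravel() for b in bases], axis=1)          # columns are the bases
    alphas, *_ = np.linalg.lstsq(A, w.ravel(), rcond=None)    # least-squares fit of alpha_m
    w_hat = sum(a * b for a, b in zip(alphas, bases))
    return w_hat, alphas

rng = np.random.default_rng(2)
w = rng.normal(size=(64, 64))
w_hat, alphas = binary_bases_approx(w)
err_1 = np.linalg.norm(w - np.abs(w).mean() * np.sign(w))     # single binary base with optimal scale
err_3 = np.linalg.norm(w - w_hat)                             # three binary bases
print(err_1, err_3)   # more bases typically give a closer approximation
```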

In the paper BinaryDenseNet: Developing an Architecture for Binary Neural Networks, the authors study existing BNN architectures and revisit the commonly used technique of including scaling factors. Based on these studies, they propose several architectural design principles for BNNs and develop a novel BNN architecture, BinaryDenseNet. BinaryDenseNet achieves 18.6% and 7.6% relative improvement in top-1 accuracy on ImageNet over the well-known XNOR-Network and the current state-of-the-art Bi-Real Net, respectively. Further, the authors show the competitiveness of BinaryDenseNet with respect to memory requirements and computational complexity.
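
As a rough, hypothetical sketch of what a dense block with binary convolutions might look like (not the authors' exact architecture), the following PyTorch snippet concatenates the output of each binary convolution to its input, so the shortcut connections keep carrying information forward even though the convolutions themselves are binary.

```python
import torch
import torch.nn as nn

class BinaryConv2d(nn.Conv2d):
    """Convolution with sign-binarized weights and inputs (illustrative only;
    training a real BNN would use a straight-through estimator)."""
    def forward(self, x):
        wb = torch.sign(self.weight)
        xb = torch.sign(x)
        return nn.functional.conv2d(xb, wb, self.bias, self.stride, self.padding)

class BinaryDenseBlockSketch(nn.Module):
    """Dense block sketch: each layer's output is concatenated to its input,
    so feature maps accumulate along the block."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                BinaryConv2d(ch, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            ch += growth_rate

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)   # dense connection
        return x

block = BinaryDenseBlockSketch(in_channels=64, growth_rate=64, num_layers=4)
out = block(torch.randn(1, 64, 32, 32))
print(out.shape)   # torch.Size([1, 320, 32, 32])
```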