Natural vs Artificial Neural Networks
Over the past few years, deep learning and Artificial Neural Networks (ANNs) have gained a lot of popularity as machine learning methods in a wide variety of fields. These include computer vision, natural language processing and machine translation, speech processing and generation, robotics, and self-driving cars. Many tasks that were previously reserved exclusively for humans are gradually being automated with ANNs, often with equal or even better performance.
Looking at the many successes of ANNs and deep learning, one may be inclined to believe that they are to some degree able to emulate how humans think. After all, the idea of neural networks evolved from observing biological nervous systems such as the human brain. In this blog post, I would therefore like to highlight some very important differences between ANNs and nervous systems in vertebrates such as humans. I will explain how both of these work and learn. Furthermore, I will outline some current developments in neuromorphic computing. Neuromorphic computing is a novel computing domain trying to more closely emulate biological nervous systems.
Artificial Neural Networks (ANNs)
As already mentioned, ANNs were developed as very crude approximations of nervous systems found in biological organisms. The idea of an Artificial Neural Network is to transport information along a predefined path between “neurons”. Neurons have the ability to add up information from multiple sources and they generally apply a non-linear transformation to this information in order to allow for more complex interactions.
Contrary to some popular beliefs, the idea of ANNs is already very old. One of the first neural networks to be invented was the perceptron. The perceptron was a very simple neural network with only one neuron and the Heaviside function as a non-linearity. In other words, the perceptron implements the following decision function:
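A minimal Python sketch of that decision function: the neuron computes a weighted sum of its inputs plus a bias and passes it through the Heaviside step. The names `heaviside`, `perceptron`, `AND_WEIGHTS` and `AND_BIAS` are illustrative assumptions, and the weights are hand-picked rather than learned.

```python
def heaviside(z):
    """Step non-linearity: 1 if z is positive, otherwise 0."""
    return 1 if z > 0 else 0


def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by the Heaviside step."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return heaviside(z)


# Hypothetical hand-picked parameters that make the neuron act as a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, perceptron([a, b], AND_WEIGHTS, AND_BIAS))
```

With these parameters the neuron only outputs 1 when both inputs are 1, which shows that a single perceptron draws one linear decision boundary.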
Its extension, the multilayer perceptron, can be regarded as the first “real” neural network, with multiple neurons connected in 2 layers:
This neural network has much more expressive power than a single neuron. In fact, it can be shown that the multilayer perceptron is a universal function approximator.
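To illustrate the gain in expressive power, here is a small sketch of a two-layer network of step neurons that computes XOR, a function no single perceptron can represent. The weights are hand-picked for illustration (not trained), and all names are assumptions of this example.

```python
def step(z):
    """Heaviside step non-linearity."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One layer of step neurons; each row of `weights` feeds one neuron."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hidden layer: the first neuron acts as OR, the second as AND (hand-picked values).
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]
# Output neuron fires when OR is active but AND is not, i.e. XOR.
OUT_W = [[1.0, -1.0]]
OUT_B = [-0.5]


def xor_net(a, b):
    """Forward pass through the two layers."""
    hidden = layer([a, b], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUT_W, OUT_B)[0]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

In practice, multilayer perceptrons use differentiable non-linearities instead of the step so that the weights can be learned with gradient descent; the step is kept here only for continuity with the perceptron.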
For specific tasks, more complex ANNs have been invented over time, including the Convolutional Neural Network (CNN), thought to roughly emulate the human visual system, as well as the Recurrent Neural Network (RNN), used to interpret and generate sequential data such as text and video. Although the architectures of these networks are much more complicated than that of the multilayer perceptron, they can be regarded as variations of the same idea. This can be seen for instance in the fact that all types of ANNs may use the same learning algorithms, e.g. gradient descent using backpropagation.
In general, we can summarize the workings of ANNs as follows: ANNs modify the multidimensional representation of some input data by gradually applying (multi-)linear and non-linear transformations to its components and thus changing the representation up to a certain final result. The final result could be some classification or some other output signal, such as translated text. Most importantly, there is no notion of time in these networks. As far as the network is concerned, the output is generated instantaneously. Furthermore, the outputs of ANN neurons are continuously valued.
Biological Nervous Systems (BNSs)
Biological nervous systems, such as those found in vertebrates, operate in a different way. Take a look at the diagram in Figure 2. It shows a neuron cell connecting to another neuron. The cell consists of a cell body, with dendrites acting as connecting wires for other neurons to connect to. In most cases, a neuron has one axon capable of transmitting electric currents actively to other connecting cells. The connections between neurons are established using synapses located at the end of the axon.
These synapses are responsible for a lot of the magic of computing and memory in the nervous system. To see how this works, let us first examine the response of a single neuron to an incoming signal from its dendrites (Figure 3).
We can observe that the membrane of the neuron typically has a resting potential of -70 mV. In the resting state the cell does not carry any signal. In order to transmit a signal, the cell has to be stimulated through its input synapses. In our example, the voltage threshold required to elicit a response is -55 mV. This means that the inputs have to raise the potential by a total of +15 mV. If each input neuron only contributes +5 mV, we need at least 3 such inputs to be active at the same time to provoke a response in the neuron. The mechanism is very similar to that of the artificial neurons in that the inputs get summed up before passing through a non-linearity. Inputs that cause a positive change in the membrane potential are called excitatory. BNSs also have inhibitory inputs, which cause a negative change in the potential, much like negative weights in ANNs. These act to prevent the firing of an action potential.
In Figure 3 we see that if the input is weaker than +15 mV, there is no signal at the output. However, above the threshold, a strong active signal is generated, called the action potential. The potential rises sharply and is carried through the axon to the output synapses. After a short amount of time, the potential declines again. For some time after the decline, the cell cannot be stimulated again (absolute refractory period).
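The threshold behaviour and the refractory period described above can be sketched in a few lines of Python. This is a deliberately simplified toy model: it uses the example values from the text (-70 mV resting potential, -55 mV threshold), treats time as discrete steps, and assumes the potential resets to rest at every step, so inputs do not accumulate over time. The function names and the `refractory_steps` parameter are assumptions of this sketch.

```python
RESTING_MV = -70.0    # resting potential from the example in the text
THRESHOLD_MV = -55.0  # firing threshold from the example in the text


def fires(input_changes_mv):
    """True if the summed inputs lift the potential from rest up to the threshold."""
    potential = RESTING_MV + sum(input_changes_mv)
    return potential >= THRESHOLD_MV


def simulate(inputs_per_step, refractory_steps=2):
    """Binary spike train for a sequence of time steps.

    Each element of `inputs_per_step` lists the potential changes (in mV)
    arriving in that step; negative values model inhibitory inputs.
    After a spike, the cell ignores its inputs for `refractory_steps` steps.
    """
    spikes = []
    cooldown = 0
    for step_inputs in inputs_per_step:
        if cooldown > 0:
            cooldown -= 1
            spikes.append(0)
        elif fires(step_inputs):
            spikes.append(1)
            cooldown = refractory_steps
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(fires([5, 5, 5]))      # three excitatory inputs reach the threshold
    print(fires([5, 5, 5, -5]))  # one inhibitory input prevents firing
    print(simulate([[5, 5, 5]] * 4))
```

A more realistic model, such as a leaky integrate-and-fire neuron, would also let the potential decay gradually between inputs.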
The sharp thresholding of the input is equivalent to the non-linearity present in ANNs (the threshold in BNSs corresponds to a Heaviside non-linearity, just like in the early perceptron). Note that the firing of the action potential is binary: either it occurs or it does not. There is nothing in-between. This is called the all-or-nothing principle. Therefore, BNSs carry binary signals. This is in contrast to ANNs, which carry continuous signals. However, the binary signals in BNSs change over time, which makes up for the reduced expressive power of binary values.
Of course, in order to transfer information in its binary form, BNSs have to use some kind of binary encoding, just as computers do. The most popular theories are that BNSs use frequency-based encoding and/or temporal encoding. In frequency-based encoding, the average frequency of impulses encodes the strength of a signal. Therefore, we might say that BNSs use frequency modulation. Notice that the average frequency is important in this mechanism, not the arrival time of individual impulses. This form of signal transfer is not very efficient because it ignores the small time differences between individual impulses that could in principle carry information. However, frequency encoding is very robust to noise. Noise is a big concern in electrochemical systems such as the nervous system.
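A toy illustration of frequency-based encoding in Python: a signal strength between 0 and 1 is turned into a random binary spike train, and the receiver recovers it from the average firing rate. The Bernoulli spike model and the function names are assumptions of this sketch, not a claim about how real neurons generate spikes.

```python
import random


def rate_encode(strength, n_steps, rng):
    """Binary spike train whose per-step firing probability equals `strength` (0..1)."""
    return [1 if rng.random() < strength else 0 for _ in range(n_steps)]


def rate_decode(spikes):
    """Recover the signal strength as the average firing rate."""
    return sum(spikes) / len(spikes)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    train = rate_encode(0.3, 1000, rng)
    print(rate_decode(train))  # close to 0.3

    # Shuffling the spikes changes every arrival time but not the decoded value.
    jittered = train[:]
    rng.shuffle(jittered)
    print(rate_decode(jittered) == rate_decode(train))
```

Because only the spike count matters, timing jitter leaves the decoded value untouched, which is the robustness to noise mentioned above. The price is that many time steps are needed to transmit a single number.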
Temporal encoding is similar to frequency encoding, except that the arrival time of individual impulses does matter in this case. A number of studies (see e.g. this paper) suggest that temporal encoding may be used at various places in our nervous system, such as in sensory systems. Temporal coding is very sensitive to noise, but may be necessary if a large amount of information needs to be transferred quickly.
Learning Process In ANNs
Due to their different concepts, the process of learning differs between ANNs and BNSs.
For ANNs, we usually use a technique called gradient descent to train the network. Gradient descent requires a large amount of data (the data has to be annotated for supervised learning) in order to determine the gradient of a predefined cost function and to converge to an (ideally global) minimum. Once the minimum is found, the network is considered trained and can be used for inference. Usually, no further training occurs at this stage. Multiple versions of gradient descent have been invented over time, with some of the more prominent being Stochastic Gradient Descent (SGD), Adadelta and Adam.
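As a minimal sketch of the idea, the following Python function fits a single weight `w` in the model `y = w * x` by plain (full-batch) gradient descent on the mean squared error. The model, data, learning rate and number of epochs are illustrative assumptions; real networks have many weights and obtain the gradient through backpropagation.

```python
def train_linear(xs, ys, learning_rate=0.05, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad
    return w


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # annotated data generated with w = 2
    print(train_linear(xs, ys))
```

Once the loop has finished, `w` is fixed and the model is only used for inference, mirroring the separate training and inference stages described above.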
The advantages of gradient descent include fast and simple implementation and relatively fast convergence as compared to algorithms such as Genetic Algorithms (GA).
One of the disadvantages is that in order to apply gradient descent, the network has to be static, i.e. the connections between neurons may change in strength but not in number. Therefore, during the final inference stage, the computational load remains the same regardless of the complexity of the actual problem.
Another issue with gradient descent is its tendency to overfit the training data. Overfitting causes the network to perform well on training data and poorly on test data. For complex problems such as computer vision problems that require networks with millions of parameters, the most reliable way to alleviate this issue is to provide more training data. This can limit the applicability of deep learning for problems with small amounts of available data.
Learning Process In BNSs
The learning process of the human brain is not yet completely understood. However, we can say almost with certainty that the brain doesn’t use a global learning method such as gradient descent. Gradient descent as used in ANNs requires backpropagation, and there is no known biological mechanism for errors to be backpropagated further than a single neuron. Another issue is the definition of the loss function: do we as humans have inherent loss functions for various tasks defined in our brains? This is not very likely.
On the other hand, there is strong evidence (thanks largely to Eric Kandel, a Nobel laureate) that the brain learns by using local methods such as Hebbian learning or Spike-Timing-Dependent Plasticity (STDP). These methods are similar to learning in Restricted Boltzmann Machines in that they strengthen connections between neurons that often transmit signals between each other. The foundational concept of Hebbian learning is best explained by its author, Donald Hebb, himself:
Let us assume that the persistence or repetition of a reverberatory activity (or “trace”) tends to induce lasting cellular changes that add to its stability. […] When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased. (Hebb, D.O. (1949). The Organization of Behavior. New York: Wiley & Sons.)
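Hebb's postulate is often summarized as a simple local update rule: the weight of a connection grows in proportion to the joint activity of the two cells it links. The Python sketch below shows this common textbook formulation; the function name and the learning rate are assumptions of the example.

```python
def hebbian_update(weight, pre_active, post_active, learning_rate=0.1):
    """Strengthen a connection when the pre- and post-synaptic cells are active together.

    `pre_active` and `post_active` are activity levels, e.g. 0 or 1.
    The rule is local: it only uses quantities available at the connection itself.
    """
    return weight + learning_rate * pre_active * post_active


if __name__ == "__main__":
    w = 0.5
    w = hebbian_update(w, pre_active=1, post_active=1)  # both active: weight grows
    print(w)
    w = hebbian_update(w, pre_active=1, post_active=0)  # B did not fire: no change
    print(w)
```

Unlike gradient descent, no global error signal or loss function is needed. In this basic form, however, weights can only grow, so practical variants add some kind of normalization or decay.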
STDP further enhances the Hebbian learning concept by making it time-asymmetric. More specifically, STDP strengthens the connection between neurons if input spikes in the first neuron occur immediately before output spikes in the second neuron. Otherwise, if the input spike of the first neuron occurs immediately after the output spike of the second neuron, the connection is weakened.
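The time asymmetry can be sketched with a pair-based STDP rule. The exponential window and the parameter values (`a_plus`, `a_minus`, `tau_ms`) are assumptions borrowed from a commonly used model; the text above only fixes the sign of the change, not its exact shape.

```python
import math


def stdp_delta(t_pre_ms, t_post_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Weight change for one pair of spikes, given their times in milliseconds.

    Pre-synaptic spike before post-synaptic spike: the connection is strengthened.
    Pre-synaptic spike after post-synaptic spike: the connection is weakened.
    The effect fades exponentially as the spikes move further apart in time.
    """
    dt = t_post_ms - t_pre_ms
    if dt > 0:
        return a_plus * math.exp(-dt / tau_ms)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_ms)
    return 0.0


if __name__ == "__main__":
    print(stdp_delta(10.0, 15.0))  # input just before output: positive change
    print(stdp_delta(15.0, 10.0))  # input just after output: negative change
    print(stdp_delta(10.0, 60.0))  # far apart: change close to zero
```

Spike pairs that are close in time change the weight the most, which matches the emphasis on "immediately before" and "immediately after" in the description above.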
Hebbian learning is probably the most important learning mechanism occurring in the human brain. However, there is evidence that long-term memory may also work by regulating the presence/absence of connections between individual neurons, thus changing the very structure of the network.
Attempts At Emulation Of The Human Nervous System
Despite the current success of ANNs in the field of AI and machine learning, BNSs still exhibit multiple advantages, the most important being their extremely low power consumption. Current consensus among researchers suggests that the brain consumes approx. 20 W of power. Compare that to a far less capable Nvidia GPU consuming 300 W while running a deep neural network. The low power consumption in turn requires much less cooling and allows the brain to build 3-D structures, as opposed to electronic chips, which mostly follow 2-D designs.
Multiple research groups are currently trying to emulate some aspects of the nervous system in silico. Such systems are called neuromorphic. Advantages of such systems should include, among others, high power efficiency, inherent massive parallelism and potentially real-time performance in inference.
One of the most prominent examples of such systems is SpiNNaker, a novel computing architecture developed by the University of Manchester and part of the Human Brain Project (HBP). SpiNNaker implements a concept known as Spiking Neural Network (SNN) that closely resembles biological nervous systems. The goal of the project is to simulate up to a billion neurons in real time. In comparison, the human brain contains approx. 85 billion neurons. To achieve this, the SpiNNaker engine will contain up to 1,036,800 ARM9 cores and 7 TB of RAM distributed throughout the system in 57,600 nodes. The neural network model itself is described in PyNN, the Python Neural Networks language. Multiple learning algorithms, such as back-propagation for supervised learning as well as STDP, can be implemented on SpiNNaker, making it extremely versatile with respect to specific applications.
A concurrent project developing a hardware architecture for Spiking Neural Networks is IBM TrueNorth, part of the DARPA SyNAPSE initiative. The TrueNorth chip is a fully custom neuromorphic CMOS chip consisting of 4,096 hardware cores, each one simulating 256 programmable neurons. Each neuron has 256 programmable synapses for communication. This means that there are just over a million neurons and 268 million programmable synapses on the chip, which puts it roughly on the scale of a small rodent's nervous system. The power consumption of the chip is merely 70 mW. Compared to SpiNNaker, the TrueNorth chip is highly optimized for a specific neural architecture and is currently limited to inference, i.e. the network weights have to be fixed before computation. On the other hand, the power consumption of TrueNorth is much lower for suitable problems.
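The neuron and synapse totals follow directly from the per-core figures quoted above, as this short Python calculation shows (the constant names are just labels for the numbers in the text):

```python
CORES = 4096               # hardware cores on the chip
NEURONS_PER_CORE = 256     # programmable neurons simulated by each core
SYNAPSES_PER_NEURON = 256  # programmable synapses per neuron

total_neurons = CORES * NEURONS_PER_CORE
total_synapses = total_neurons * SYNAPSES_PER_NEURON

if __name__ == "__main__":
    print(f"{total_neurons:,} neurons")    # 1,048,576 (just over a million)
    print(f"{total_synapses:,} synapses")  # 268,435,456 (about 268 million)
```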
A major issue with neuromorphic computing is that it requires completely different thinking in terms of algorithms. This is because the non-von Neumann architecture of the chips is not well-suited for algorithms developed for von Neumann systems. Training spiking neural networks usually involves STDP and its multiple variants. Although back-propagation can be used on some neuromorphic systems (such as SpiNNaker), it was originally proposed for von Neumann computers. As with other novel hardware architectures, computer scientists and developers will first have to get familiar with the architecture in order to design and optimize new and existing machine learning algorithms for it. Another goal of neuromorphic computing is the simulation of mammalian brains, which will definitely require some new algorithmic concepts and a lot of experimentation.
So far, neuromorphic computing has remained a niche field. However, interest in it is slowly rising, and I predict that in the next couple of years we will see whether the technology is here to stay or whether it will vanish due to new developments in GPUs and deep learning-specific hardware, such as the Google TPUs or Microsoft’s Project Brainwave.