Neural Nets

Since the 1980s, interest has been growing in AI systems that aren’t based on a von Neumann computer structure. These systems are called connectionist networks and, because they are modeled on the structure of the brain, they are also called neural networks. In fact, scientists have been attempting to create electronic models of the brain since the 1940s.

A conventional von Neumann computer stores information explicitly, in binary form, within its memory. This information includes both the program that specifies the operations to be carried out by the computer and the data used by the computer. A neural network operates on entirely different principles from the digital computer: information is stored in the interconnections of the neural elements and in the parameters (or weightings) associated with those elements. Moreover, a neural network does not execute a program.

Artificial electronic neurons that simulate biological neurons were first investigated by McCulloch and Pitts in 1943. A primitive artificial neuron is essentially a gate with two or more inputs and an output. By contrast, a real neuron in the human brain might have 10,000 inputs.

 

The neuron

Specific hardware systems or computer languages are often well suited to particular problems. Like the human brain itself, the neural network is well suited to an area of AI called pattern recognition. Even more significantly, the neural network is programmed, or trained, by providing it with examples, so the user of a neural network doesn’t have to know how the network arrives at its answers.

Pattern recognition is concerned with extracting and characterizing features in input data. For example, you might feed the output of an electrocardiogram (EKG) into a neural network, and the network would indicate whether or not the heart was healthy. Similarly, you might feed past and current share prices and economic data into a neural network in order to predict the future behavior of the market.

In principle, pattern recognition is a simple operation. First, you have to divide the space of possible inputs into distinct regions. For example, if you were attempting to recognize text made up of the 26 uppercase characters, you would have to determine the features that distinguish between these letters and divide the feature space into 26 regions. An unknown letter is recognized by extracting its features and then comparing these features with the features of each of the 26 letters. The unknown letter is classified as whichever of the 26 possible letters provides the best match.
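The matching step can be sketched as a nearest-match search. In this minimal Python sketch, each letter is summarized by a hypothetical two-number feature vector (closed loops, straight strokes); the features, their values, and the use of Euclidean distance as the meaning of "best match" are all assumptions made for illustration, and only three of the 26 letters are shown.

```python
import math

# Hypothetical features per letter: (number of closed loops, number of straight strokes)
PROTOTYPES = {
    "A": (1.0, 3.0),
    "B": (2.0, 1.0),
    "L": (0.0, 2.0),
}

def classify(features, prototypes=PROTOTYPES):
    """Return the letter whose stored features are closest to `features`."""
    # math.dist (Python 3.8+) gives the Euclidean distance between two points
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))

print(classify((1.0, 2.8)))   # "A": nearest prototype (distance 0.2)
print(classify((2.0, 1.2)))   # "B": nearest prototype (distance 0.2)
```

A real recognizer would need many more features and all 26 prototypes; the point here is only the extract-compare-pick-best loop.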

The Linear Classifier

The figure below shows a linear classifier that can be used to determine whether a vegetable is plant A or plant B. The figure has two axes: weight and size. The circles represent instances of plant A and the crosses instances of plant B. These points are made by measuring the weight and size of actual vegetables. Because plants vary in weight and size, there is a range of points on the graph for both plant A and plant B. However, you can also see that it is possible to draw a line between the instances of plant A and plant B. If you plot the location of a plant of unknown type on the graph, you can classify it as plant A or plant B depending on which side of the line it falls. Neural networks perform this type of classification and discrimination automatically.

Linear classification

A linear classifier is so called because a straight line can be used to divide the world into two categories. Furthermore, the classification problem is said to be linearly separable because the space represented by the graph can be partitioned into two regions by a single line.
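The side-of-the-line test can be written as the sign of a single linear expression. In this minimal sketch the dividing line is assumed to be 1.0·weight + 2.0·size = 10.0; these coefficients are hypothetical (in practice they would be read off, or fitted to, measured data), and which side is called plant A is an arbitrary labeling choice.

```python
# Hypothetical dividing line: A_COEF * weight + B_COEF * size = C_CONST
A_COEF, B_COEF, C_CONST = 1.0, 2.0, 10.0

def classify_plant(weight, size):
    """Return 'A' or 'B' according to which side of the line the point falls on."""
    side = A_COEF * weight + B_COEF * size - C_CONST
    return "A" if side > 0 else "B"

print(classify_plant(8.0, 3.0))   # 8 + 6 - 10 = 4 > 0   -> A
print(classify_plant(2.0, 1.5))   # 2 + 3 - 10 = -5 < 0  -> B
```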

Consider the system below. In this case, the instances of plant A and plant B are distributed in such a way that two lines are required to separate the two classifications. Such a system is called non-linearly separable. Early research into neural networks indicated that it was impossible to design a simple neuron to classify patterns that were non-linearly separable. However, we shall learn that networks of neurons can handle patterns like this. 

Non-linearly separable regions

 

The McCulloch-Pitts Neuron

The diagram below describes the internal structure of a McCulloch-Pitts neuron. Unlike binary logical AND, OR and NOT gates, the neuron shares many of the characteristics of an analog device; for example, its inputs can take any value within a range rather than the 0 or 1 of a digital system. In this diagram, the neuron’s two inputs x1 and x2 are multiplied by two constants, or weights, w1 and w2 respectively. These are the values we referred to when we said that information in a neural net is stored in the form of parameters, and they determine the behavior of the neuron. The two products w1.x1 and w2.x2 are transmitted to a summer (i.e., adder) to generate the sum w1.x1 + w2.x2. Finally, the output of the summer is passed through a threshold device, or limiter, that produces an output of 1 if w1.x1 + w2.x2 is greater than a threshold T, and an output of 0 otherwise. This device is called a hard limiter and is used by simple neurons.

 

The neuron

Suppose that this neuron has inputs x1 = 1.0 and x2 = 0.5, weights w1 = 1.5 and w2 = 0.6, and a threshold value of 1.2. The weighted sum of the inputs sent to the threshold device is given by w1.x1 + w2.x2  = 1.5 x 1.0 + 0.6 x 0.5 = 1.5 + 0.3 = 1.8, and the output of the neuron is 1 (because 1.8 > 1.2).
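The worked example can be checked with a few lines of Python. This is a minimal sketch of a two-input neuron with a hard limiter; the function name is hypothetical, and a sum exactly equal to the threshold produces 0, matching the rule given for the general neuron.

```python
def two_input_neuron(x1, x2, w1, w2, threshold):
    """Weighted sum followed by a hard limiter: 1 if the sum exceeds the threshold, else 0."""
    weighted_sum = w1 * x1 + w2 * x2
    return 1 if weighted_sum > threshold else 0

# Values from the example: 1.5 * 1.0 + 0.6 * 0.5 = 1.8, and 1.8 > 1.2
print(two_input_neuron(1.0, 0.5, 1.5, 0.6, 1.2))   # 1
```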

In general, the output Z of a neuron with n inputs x0, x1, x2, ..., xn-1, threshold T, and n weightings w0, w1, w2, ..., wn-1  is given by:

$$Z = \begin{cases} 1 & \text{if } \sum_{i=0}^{n-1} w_i x_i > T \\ 0 & \text{otherwise} \end{cases}$$

where wi is the weighting of the ith input, xi. Unlike many digital logic gates, the order of the neuron’s inputs is important, because each input is multiplied by its own weight: a four-input neuron with inputs 1, -0.5, 0.2, 0.4 will (in general) behave differently from the same neuron with inputs 1, 0.4, -0.5, 0.2.
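The general rule translates directly into a short function. In this minimal sketch the weights [0.5, 1.0, 0.2, 0.2] and threshold 0.3 are hypothetical values, chosen only so that the two input orderings from the text give different outputs.

```python
def neuron(inputs, weights, threshold):
    """n-input neuron: Z = 1 if sum(w_i * x_i) > T, else Z = 0."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Hypothetical weights and threshold, chosen to show that input order matters
WEIGHTS = [0.5, 1.0, 0.2, 0.2]
T = 0.3
print(neuron([1, -0.5, 0.2, 0.4], WEIGHTS, T))   # sum is about 0.12, so Z = 0
print(neuron([1, 0.4, -0.5, 0.2], WEIGHTS, T))   # sum is about 0.84, so Z = 1
```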

A neural network is composed of a set of interconnected neurons. The behavior of a neural network is determined by the pattern of the interconnections between the neurons (which is normally fixed) and the values of the weightings used by each neuron. Note that this artificial neuron doesn’t function in exactly the same way as a neuron in the brain; for example, the threshold mechanism of a real neuron is rather more complex. However, our simplified neuron forms the basis of practical neural networks.

What Does a McCulloch-Pitts Neuron Do?

We’re now going to describe what a neuron does. To avoid too high a level of suspense, we will state what a neuron does before we demonstrate how it does it—a neuron is a discriminator or classifier that divides all its possible input patterns into two groups rather like the linear classifier.

We can explain the action of a neuron graphically by plotting its output as a function of its two inputs x1 and x2, which are weighted by w1 and w2 respectively. The input to the summer is the linear sum x1w1 + x2w2. The graph below plots input x1 against x2 for the case w1 = 1.0, w2 = 2.0, and threshold T = 1.5. All points along the line represent the condition 1.0x1 + 2.0x2 = 1.5. Points above the line represent the condition 1.0x1 + 2.0x2 > 1.5, for which the neuron’s output is 1, and points below the line represent 1.0x1 + 2.0x2 < 1.5, for which the output is 0. Note how this neuron discriminates between the region of space above the line and the region of space below the line.

The behavior of a simple two-input neuron
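The region test for this graph can be sketched directly. This minimal example assumes x2 is the vertical axis, so "above the line" means a larger x2 for the same x1; the helper names are hypothetical.

```python
W1, W2, T = 1.0, 2.0, 1.5   # weights and threshold used for the graph

def region(x1, x2):
    """Report which side of the line 1.0*x1 + 2.0*x2 = 1.5 the point (x1, x2) is on."""
    s = W1 * x1 + W2 * x2
    if s > T:
        return "above"        # neuron output 1
    if s < T:
        return "below"        # neuron output 0
    return "on the line"      # the hard limiter outputs 0 here as well

def boundary_x2(x1):
    """Height of the dividing line at a given x1."""
    return (T - W1 * x1) / W2

print(region(1.0, 1.0), region(0.5, 0.25), region(0.5, 0.5))   # above below on the line
print(boundary_x2(0.0), boundary_x2(1.5))                        # 0.75 0.0
```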

Once again, we must stress that the simple 2-input neuron represented by this graph acts as a discriminator—it divides the world into two regions depending on the values of the inputs x1 and x2, and the threshold T. Let’s look at an example of a neuron with weightings w1,w2 = 1,1. The graph below describes the situation for three values of the threshold: T = 0.5, T = 1.0 and T = 1.5.

Interpreting the behavior of a neuron as a function of the threshold

In this example there are three lines for the cases T = 0.5, T = 1.0, and T = 1.5. As you can see, the different values of the threshold, T, don’t affect the slope of the line, but determine where the line crosses the x1 and x2 axes. Changing the value of the threshold T simply modifies the way in which the space is divided into two regions. Let’s take the simplification process one stage further by limiting the range of values that x1 and x2 can take.
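The effect of the threshold can be checked numerically. For the line w1x1 + w2x2 = T, the slope (taking x2 as the vertical axis, an assumption about the plot) is -w1/w2, which does not involve T, while the axis crossings are T/w1 and T/w2. The function name below is hypothetical.

```python
def line_for_threshold(T, w1=1.0, w2=1.0):
    """Slope and axis crossings of the boundary w1*x1 + w2*x2 = T."""
    slope = -w1 / w2          # independent of T
    x1_crossing = T / w1      # where the line meets the x1 axis (x2 = 0)
    x2_crossing = T / w2      # where the line meets the x2 axis (x1 = 0)
    return slope, x1_crossing, x2_crossing

for T in (0.5, 1.0, 1.5):
    print(T, line_for_threshold(T))
# 0.5 (-1.0, 0.5, 0.5)
# 1.0 (-1.0, 1.0, 1.0)
# 1.5 (-1.0, 1.5, 1.5)
```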

The next diagram plots the graph of 1.0x1 + 1.0x2 = T but with the values of x1 and x2 constrained to 0 and 1 (i.e., inputs x1 and x2 are treated as conventional digital signals that have only two possible values). The four possible values of x1,x2 are drawn as circles. In figure (a) the threshold T is set between 0 and 1 (0 ≤ T < 1), and the output of the neuron is 1 if x1,x2 = 0,1, 1,0, or 1,1; that is, the neuron behaves as an OR gate. In (b) the threshold is increased to a value between 1 and 2 (1 ≤ T < 2), and the output is now 1 only if x1,x2 = 1,1; that is, the neuron behaves as an AND gate.

A neuron with inputs constrained to 0 or 1
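A quick way to confirm the two gate behaviors is to tabulate the neuron over all four binary input pairs. This sketch assumes w1 = w2 = 1.0 as in the diagram and picks T = 0.5 and T = 1.5 as representative thresholds from the two ranges.

```python
from itertools import product

def gate(x1, x2, threshold, w1=1.0, w2=1.0):
    """Hard-limited neuron restricted to binary inputs."""
    return 1 if w1 * x1 + w2 * x2 > threshold else 0

def truth_table(threshold):
    return {(x1, x2): gate(x1, x2, threshold) for x1, x2 in product((0, 1), repeat=2)}

print(truth_table(0.5))   # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}  (OR)
print(truth_table(1.5))   # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}  (AND)
```

Only the threshold changes between the two calls; the weights stay the same.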

These figures demonstrate that the neuron’s ability to divide the world into two regions can be used to synthesize logic functions simply by modifying a single parameter—the threshold. In a real system, it is not necessary to have an explicit variable threshold, T. It is perfectly possible to set the threshold at a value, say 0, and use an additional input to control the effective threshold. Consider the equation of the straight line:

w1x1 + w2x2  = T.

We can rewrite this equation as 1.0w0 + w1x1 + w2x2 = 0, where the extra input x0 is fixed at 1.0 and its weight w0 = -T is a bias term. The weight w0 controls the effective threshold of the neuron. We could use the simple neuron we have just described to implement a linear discriminator that distinguishes between plant A and plant B on the basis of weight and size. We just feed the values of the weight and size of a plant into the neuron, and it provides a 1 or 0 output to indicate plant A or plant B.
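The bias trick can be verified by comparing the two formulations side by side. In this minimal sketch (function names are hypothetical), one neuron uses an explicit threshold T, and the other fixes its threshold at 0 and adds an input x0 = 1 with weight w0 = -T; for binary inputs they agree exactly.

```python
from itertools import product

def with_threshold(x1, x2, w1, w2, T):
    return 1 if w1 * x1 + w2 * x2 > T else 0

def with_bias(x1, x2, w1, w2, w0):
    x0 = 1.0                                          # extra input permanently set to 1
    return 1 if w0 * x0 + w1 * x1 + w2 * x2 > 0 else 0   # threshold fixed at 0

for T in (0.5, 1.5):
    same = all(with_threshold(a, b, 1.0, 1.0, T) == with_bias(a, b, 1.0, 1.0, -T)
               for a, b in product((0, 1), repeat=2))
    print(T, same)   # True for both thresholds
```

Because the threshold has become just another weight, a training rule that adjusts weights also adjusts the threshold, with no special case.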

Limitations of the McCulloch-Pitts Neuron

The basic neuron suffers from an important limitation that severely restricted its development in the 1970s. The McCulloch-Pitts neuron can recognize only inputs that are said to be linearly separable (i.e., a single line divides the input space into two regions). Not all patterns are linearly separable, as we have pointed out. Consider the exclusive OR function, EOR. The output of an EOR gate is true if exactly one of its inputs is true, and false if both its inputs are false or both its inputs are true. The simple neuron is unable to synthesize the EOR function.

The next figure illustrates the EOR function—the true outputs corresponding to inputs 1,0 and 0,1 are indicated by filled black circles and the false outputs corresponding to inputs 0,0 and 1,1 are indicated by gray circles. As we have already pointed out, the neuron divides the input space into two regions according to the inputs, their weightings, and the threshold. If you look at this diagram you will see there is no way in which a single line can divide the input space into two parts—one part containing the black circles and one part containing the gray circles.

The EOR function and the neuron
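A short argument confirms the picture. For a single neuron to produce EOR, input 0,0 must give 0, so T ≥ 0; inputs 1,0 and 0,1 must give 1, so w1 > T and w2 > T; adding these gives w1 + w2 > 2T ≥ T, so input 1,1 would also give 1, a contradiction. The brute-force search below is only an illustration, not a proof: it tries every combination of w1, w2 and T on an assumed grid from -2.0 to 2.0 in steps of 0.5 and finds settings for AND and OR but none for EOR.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))     # (0,0), (0,1), (1,0), (1,1)
TARGETS = {
    "AND": [0, 0, 0, 1],
    "OR":  [0, 1, 1, 1],
    "EOR": [0, 1, 1, 0],
}
GRID = [i * 0.5 for i in range(-4, 5)]       # -2.0, -1.5, ..., 2.0 (assumed search range)

def find_single_neuron(target):
    """Return some (w1, w2, T) on the grid that reproduces `target`, or None."""
    for w1, w2, t in product(GRID, repeat=3):
        outputs = [1 if w1 * x1 + w2 * x2 > t else 0 for x1, x2 in INPUTS]
        if outputs == target:
            return w1, w2, t
    return None

for name, target in TARGETS.items():
    print(name, find_single_neuron(target))   # EOR prints None
```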

We can implement an EOR function by using two lines to divide the input space into three regions as the figure below demonstrates. By using two different thresholds, a single region can be created that contains the two true EOR outputs corresponding to inputs 0,1 and 1,0. In order to synthesize an EOR function with neurons, we have to use a system with three neurons. This system makes it possible to use neurons to detect patterns that are non-linearly separable.

EOR space

 

Implementing the EOR function
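One way to realize the three-neuron EOR system is sketched below. The specific weights and thresholds are an assumption (the figure's values may differ): two first-stage neurons with weights 1, 1 implement the two lines x1 + x2 = 0.5 and x1 + x2 = 1.5, and the third neuron fires only when the first fires and the second does not, i.e., when the input lies between the two lines.

```python
def neuron(inputs, weights, threshold):
    """Hard-limited neuron: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def eor(x1, x2):
    above_low = neuron((x1, x2), (1.0, 1.0), 0.5)    # above the line x1 + x2 = 0.5
    above_high = neuron((x1, x2), (1.0, 1.0), 1.5)   # above the line x1 + x2 = 1.5
    # Output neuron: fires for points between the two lines
    return neuron((above_low, above_high), (1.0, -1.0), 0.5)

for x1, x2 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(x1, x2, eor(x1, x2))   # outputs 0, 1, 1, 0
```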

This simple two-layer structure can be generalized to the neural network below. A set of n inputs is connected to each of the n neurons of the input layer. Each of the outputs of the neurons in the input layer is connected to a neuron in the hidden layer. The hidden layer is so called because its terminals cannot be accessed directly from the inputs or the outputs of the neural network. Finally, each of the outputs in the hidden layer is connected to neurons in the output layer. The outputs of these neurons form the outputs of the neural network. The actual number of neurons in each layer depends on the application for which the neural network is intended.

The neural network
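A layered network can be evaluated by feeding each layer's outputs into the next layer. This minimal sketch assumes every neuron receives all outputs of the previous layer (a zero weight stands for a missing connection) and uses hard limiters throughout. The example network is hypothetical, with hand-chosen weights rather than values from the figure: an input layer that passes each input on, a hidden layer, and a single output neuron, which together compute EOR.

```python
def layer_output(inputs, layer):
    """Outputs of one layer; `layer` holds one (weights, threshold) pair per neuron."""
    return [1 if sum(w * x for w, x in zip(weights, inputs)) > t else 0
            for weights, t in layer]

def network_output(inputs, layers):
    """Feed the inputs forward: each layer's outputs become the next layer's inputs."""
    signal = list(inputs)
    for layer in layers:
        signal = layer_output(signal, layer)
    return signal

def count_parameters(layers):
    """Every neuron contributes one weight per input plus one threshold."""
    return sum(len(weights) + 1 for layer in layers for weights, _ in layer)

# Hypothetical network with hand-chosen weights
INPUT_LAYER = [((1.0, 0.0), 0.5), ((0.0, 1.0), 0.5)]    # each neuron passes one input on
HIDDEN_LAYER = [((1.0, 1.0), 0.5), ((1.0, 1.0), 1.5)]
OUTPUT_LAYER = [((1.0, -1.0), 0.5)]
NETWORK = [INPUT_LAYER, HIDDEN_LAYER, OUTPUT_LAYER]

for x in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(x, network_output(x, NETWORK))   # [0], [1], [1], [0]
print(count_parameters(NETWORK))           # 15
```

The parameter count is the number of values that training would have to set for this small network.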

 

Learning and Neural Networks

Having described the properties of an individual neuron and shown how they are interconnected, the next step is to consider how they are programmed to solve a problem. A typical practical neural network is composed of three (or more) layers of interconnected neurons as the above figure demonstrates. The output of each individual neuron is determined by the weighted sum of its inputs, a threshold, and the transfer function of a limiter. The programming of the neural network is carried out by setting all the weights and thresholds of the neural network to the “appropriate” values— the above network has a total of 43 parameters to set.

As we have already stated, neural networks are not programmed like conventional von Neumann digital computers. A digital computer requires you to construct an algorithm and then to express it in a high-level language that is itself translated into machine code by a compiler. Neural networks are not programmed; they are trained by means of examples. Initially, a set of sample input values is applied to the inputs of a neural network and the weights are adjusted until the output has the correct value. You can do this because you use input data for which you already know the output. Suppose, for example, you were trying to design a neural network to predict the weather. You could present yesterday’s weather conditions to the network and adjust the parameters until the output corresponds to today’s weather.

Once you have adjusted the parameters of the neural network, you provide another set of inputs and adjust the weights to give the required output. This entire process is carried out hundreds or thousands of times to train the network. After training, an unknown set of inputs should provide the “correct” output. The training process is similar to the way in which people actually learn. They are presented with examples and specimen solutions and, eventually, they learn to solve new problems.

Training a single neuron is easy. Suppose you have a neuron with four inputs x1 to x4 and five weights w0 to w4 (remember that weight w0 determines the threshold of the limiter). The weights are set to random values and a known input is applied to x1 to x4. The output will then be either correct or incorrect; with random weights, the neuron has roughly a 50% chance of generating the correct output. If the output is correct, you do nothing and start again with a new set of inputs. If the output is incorrect, you have to modify the weights in order to train the neuron.

A simple training rule can be traced back to Donald Hebb, who proposed in 1949 that the connection between two neurons is strengthened when they are active together. Applied to training a single neuron, this idea leads to a rule that increases the weights associated with active inputs whenever the neuron produces an incorrect response; that is, if the output is 0 when it should be 1, each weight associated with a positive (or active) input is increased by a small amount. Rules of this kind are often loosely called Hebbian learning. Because you have to provide the neuron with examples, together with the correct responses, from which it is to learn, this kind of training is called supervised learning.

The simple Hebbian learning mechanism that reinforces active inputs can be improved by also weakening active inputs when the neuron fires wrongly; that is, when the output is 1 and it should be 0, the weights of all active inputs are reduced by a small amount. Consider the following algorithm, in which the extra input x0 is fixed at 1 so that its weight w0 acts as the threshold.

x0 = 1                      (fixed extra input, so that w0 sets the threshold)
Sum = 0
FOR i = 0 to n
     Sum = Sum + wi.xi
END_FOR

IF Sum > 0 THEN Z = 1
           ELSE Z = 0

IF Z is correct THEN
                    Start again with a new set of inputs
                ELSE
                    FOR i = 0 to n
                       IF Z = 0 AND xi = 1 THEN wi = wi + d
                       IF Z = 1 AND xi = 1 THEN wi = wi - d
                    END_FOR

This algorithm changes weight wi by +d or -d whenever the neuron produces an incorrect output, and only for the active inputs (xi = 1), since these are the inputs that contributed to the wrong answer. If the pattern the neuron is being trained to recognize is linearly separable, this training algorithm will eventually converge to a correct solution.
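The error-driven rule can be made runnable in Python. This is a minimal sketch, assuming binary (0/1) inputs, a step size d = 0.1, random starting weights drawn from a seeded generator, and a cap on the number of passes through the training set; the function names are hypothetical. A pass with no mistakes means every pattern is classified correctly, so training stops there.

```python
import random

def neuron_output(weights, inputs):
    """Hard limiter in bias form: x0 = 1 is prepended and the threshold is 0."""
    xs = [1] + list(inputs)                      # x0 = 1, so w0 plays the role of -T
    total = sum(w * x for w, x in zip(weights, xs))
    return 1 if total > 0 else 0

def train_neuron(samples, n_inputs, d=0.1, max_epochs=1000, seed=0):
    """Error-driven training of one neuron on binary input patterns."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]   # random w0..wn
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            z = neuron_output(weights, inputs)
            if z == target:
                continue                         # correct: move on to the next inputs
            mistakes += 1
            for i, x in enumerate([1] + list(inputs)):
                if x == 1:                       # adjust only the active inputs
                    weights[i] += d if z == 0 else -d
        if mistakes == 0:
            return weights                       # a full pass with no errors
    raise RuntimeError("no convergence; the patterns may not be linearly separable")

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_neuron(AND_SAMPLES, n_inputs=2)
print([neuron_output(w, x) for x, _ in AND_SAMPLES])   # [0, 0, 0, 1]
```

Trained on the EOR patterns instead, the loop never completes an error-free pass and the function raises its error after max_epochs passes, in line with the linear-separability limitation.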

Another approach to the training of neurons is to take large steps initially and then fine-tune the weights as the optimum is approached. Widrow and Hoff suggested that the output of a neuron should be compared with the correct value to generate an error function. The size of the error can then be used to determine the amount by which each weight changes. To implement this algorithm, the error must be measured on a graded signal rather than on the 0-or-1 output of a hard limiter; the Widrow-Hoff ADALINE uses the weighted sum itself, and later networks pass the sum through a soft limiter. The following figure shows the transfer function of one of the most widely used soft limiters, the sigmoid (i.e., “S”-shaped) function $y = 1/(1 + e^{-kx})$. As the figure demonstrates, for large positive values of the input x the output approaches 1, and for large negative values it approaches 0. However, for values of x close to the threshold (x = 0), the output is approximately linear. The parameter k determines the size of the linear region: as k increases, the linear region narrows and the sigmoid limiter approaches the hard limiter.

Neurons that employ the Widrow-Hoff algorithm to change their weights are called ADALINEs (adaptive linear neurons).
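The Widrow-Hoff rule in its common textbook (least-mean-squares) form makes each weight change proportional to the error between the target and the neuron's linear output, so large errors produce large steps and small errors produce fine adjustments. In this minimal sketch the learning rate, epoch count, zero starting weights, and the 0.5 cut-off used to turn the trained linear output into a 0/1 decision are all assumptions.

```python
def adaline_train(samples, n_inputs, eta=0.05, epochs=200):
    """Widrow-Hoff (LMS) training: each step is proportional to the size of the error."""
    weights = [0.0] * (n_inputs + 1)             # w0 (bias) followed by w1..wn
    for _ in range(epochs):
        for inputs, target in samples:
            xs = [1.0] + [float(v) for v in inputs]           # x0 = 1 bias input
            y = sum(w * x for w, x in zip(weights, xs))       # linear output, no limiter
            error = target - y                                # graded error
            for i, x in enumerate(xs):
                weights[i] += eta * error * x
    return weights

def adaline_classify(weights, inputs, cut=0.5):
    """Quantize the linear output to 0 or 1 (the 0.5 cut-off is an assumption)."""
    xs = [1.0] + [float(v) for v in inputs]
    return 1 if sum(w * x for w, x in zip(weights, xs)) > cut else 0

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = adaline_train(AND_SAMPLES, n_inputs=2)
print([round(v, 2) for v in w])                            # close to [-0.25, 0.5, 0.5]
print([adaline_classify(w, x) for x, _ in AND_SAMPLES])   # [0, 0, 0, 1]
```

For these four patterns the weights settle near the least-squares fit w0 = -0.25, w1 = w2 = 0.5; because the updates are made one sample at a time with a fixed step size, they keep jittering slightly around that point rather than landing on it exactly.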

             

The sigmoid function $y = 1/(1 + e^{-kx})$, plotted for k = 2.0 and k = 0.5
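The sigmoid is easy to tabulate for different values of k. This sketch prints the curve at a few points for the two plotted values and for an extra, assumed value k = 20 to show the approach to the hard limiter. The function is written in a numerically stable form that avoids overflow in `math.exp` for large |kx|. Near x = 0 the curve is close to the straight line y = 0.5 + kx/4, which is the approximately linear region.

```python
import math

def sigmoid(x, k=1.0):
    """Soft limiter y = 1 / (1 + e^(-k*x)), arranged to avoid overflow for large |k*x|."""
    z = k * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)                  # z < 0, so e lies in (0, 1)
    return e / (1.0 + e)

def hard_limiter(x):
    return 1.0 if x > 0 else 0.0

points = (-2.0, -0.5, 0.0, 0.5, 2.0)
for k in (0.5, 2.0, 20.0):           # k = 20.0 is an extra value, not from the figure
    print(f"k = {k}:", [round(sigmoid(x, k), 3) for x in points])
print("hard:", [hard_limiter(x) for x in points])   # [0.0, 0.0, 0.0, 1.0, 1.0]
```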

The learning or training mechanisms we have just described work for single neurons. Unfortunately, it is much harder to train a multilayer neural network. Indeed, the inability to train multilayer networks dampened interest in neural networks after the 1960s. In the 1980s, David Rumelhart and his colleagues popularized a learning rule for multilayer networks that was to prove very successful. This algorithm employs a mechanism called backpropagation.

A multilayer network consists of groups of neurons arranged in layers. In order to train the network, the weights of the neurons in each layer have to be adjusted in response to an incorrect output. But where do you begin? The backpropagation algorithm begins at the output: it compares the network’s output with the correct value and alters the weights of the neurons in the last layer. The error is then passed back to the preceding layer, and the weights of its neurons are altered according to how much each neuron contributed to the error. This process is repeated until the weights of the neurons in the first layer have been adjusted; hence the term backpropagation.
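The backward flow of the error can be sketched for a small network. This is a minimal sketch in the common textbook formulation (sigmoid neurons, squared error E = 0.5(y - target)², batch gradient descent), not a method taken from the text. The network shape (two inputs, three hidden neurons, one output), the class and method names, the learning rate, and the epoch count are all assumptions. Whether it actually learns EOR depends on the random starting weights, so the final printout is not guaranteed to match the EOR table for every seed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TwoLayerNet:
    """Inputs -> hidden sigmoid neurons -> one sigmoid output neuron.
    Each weight list starts with a bias weight w0 that multiplies a fixed input of 1."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)]
                       for _ in range(n_hidden)]
        self.out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xs = [1.0] + [float(v) for v in x]
        hs = [1.0] + [sigmoid(sum(w * v for w, v in zip(ws, xs))) for ws in self.hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.out, hs)))
        return xs, hs, y

    def loss(self, samples):
        return sum(0.5 * (self.forward(x)[2] - t) ** 2 for x, t in samples)

    def gradients(self, x, target):
        """Backpropagation for one sample: last layer first, then one layer back."""
        xs, hs, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)          # error term of the output neuron
        g_out = [delta_out * h for h in hs]
        g_hidden = []
        for j in range(len(self.hidden)):
            h = hs[j + 1]
            # output error sent back through weight out[j + 1] to hidden neuron j
            delta_h = delta_out * self.out[j + 1] * h * (1.0 - h)
            g_hidden.append([delta_h * v for v in xs])
        return g_hidden, g_out

    def train_epoch(self, samples, lr=0.5):
        """One batch gradient-descent step using gradients summed over all samples."""
        sum_hidden = [[0.0] * len(ws) for ws in self.hidden]
        sum_out = [0.0] * len(self.out)
        for x, t in samples:
            g_hidden, g_out = self.gradients(x, t)
            for i, g in enumerate(g_out):
                sum_out[i] += g
            for j, row in enumerate(g_hidden):
                for i, g in enumerate(row):
                    sum_hidden[j][i] += g
        for i, g in enumerate(sum_out):
            self.out[i] -= lr * g
        for j, row in enumerate(sum_hidden):
            for i, g in enumerate(row):
                self.hidden[j][i] -= lr * g

EOR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = TwoLayerNet(n_in=2, n_hidden=3)
before = net.loss(EOR_SAMPLES)
for _ in range(5000):
    net.train_epoch(EOR_SAMPLES)
print("loss before:", before, "after:", net.loss(EOR_SAMPLES))
print([round(net.forward(x)[2], 2) for x, _ in EOR_SAMPLES])
```

A useful sanity check for any backpropagation code is to compare a computed gradient with a finite-difference estimate obtained by nudging one weight slightly up and down.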

We have only introduced the basic principles of the neural network in this section. Neural networks are often synthesized in software on conventional digital computers; dedicated special-purpose neural network hardware is not yet widely available. However, neural networks have successfully been used to solve a wide range of problems requiring pattern recognition. Moreover, neural networks tend to be robust, or fault-tolerant: they can tolerate noise and errors in the input, as well as errors (e.g., incorrect weights) within individual neurons.