How to build a simple neural network in 9 lines of Python code
As part of my quest to learn about AI, I set myself the goal of building a simple neural network in Python. To ensure I truly understand it, I had to build it from scratch without using a neural network library. Thanks to an excellent blog post by Andrew Trask, I achieved my goal. Here it is in just 9 lines of code:
from numpy import exp, array, random, dot
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T
random.seed(1)
synaptic_weights = 2 * random.random((3, 1)) - 1
for iteration in range(10000):
    output = 1 / (1 + exp(-(dot(training_set_inputs, synaptic_weights))))
    synaptic_weights += dot(training_set_inputs.T, (training_set_outputs - output) * output * (1 - output))
print(1 / (1 + exp(-(dot(array([1, 0, 0]), synaptic_weights)))))
In this blog post, I’ll explain how I did it, so you can build your own. I’ll also provide a longer but more beautiful version of the source code.
But first, what is a neural network? The human brain consists of roughly 100 billion cells called neurons, connected together by synapses. If sufficient synaptic inputs to a neuron fire, that neuron will also fire. We call this process “thinking”.
We can model this process by creating a neural network on a computer. It’s not necessary to model the biological complexity of the human brain at a molecular level, just its higher-level rules. To do this, we use matrices, which are grids of numbers. To make it really simple, we will just model a single neuron, with three inputs and one output.
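We’re going to train the neuron to solve the problem below. The first four examples are called a training set.
- Example 1: inputs 0, 0, 1 → output 0
- Example 2: inputs 1, 1, 1 → output 1
- Example 3: inputs 1, 0, 1 → output 1
- Example 4: inputs 0, 1, 1 → output 0
- New situation: inputs 1, 0, 0 → output ?
Can you work out the pattern? Should the ‘?’ be 0 or 1?
You might have noticed that the output is always equal to the value of the leftmost input column. Therefore, the ‘?’ should be 1.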
But how do we teach our neuron to answer the question correctly? We will give each input a weight, which can be a positive or negative number. An input with a large positive weight or a large negative weight will have a strong effect on the neuron’s output. Before we start, we set each weight to a random number. Then we begin the training process:
- Take the inputs from a training set example, multiply them by the weights, and pass them through a special formula to calculate the neuron’s output.
- Calculate the error, which is the difference between the desired output in the training set example and the neuron’s output.
- Depending on the direction of the error, adjust the weights slightly.
- Repeat this process 10,000 times.
Eventually the weights of the neuron will reach an optimum for the training set. If we allow the neuron to think about a new situation that follows the same pattern, it should make a good prediction.
This process is called backpropagation.
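The four training steps can be sketched as a standalone function. This is a minimal sketch rather than my finished code: the names `sigmoid` and `train_single_neuron` are hypothetical, the seeded starting weights come from NumPy’s `RandomState`, and it also records the mean absolute error on every iteration so you can watch it shrink.

```python
import numpy as np


def sigmoid(x):
    # Squash any real number into the range (0, 1).
    return 1 / (1 + np.exp(-x))


def train_single_neuron(inputs, targets, iterations=10000, seed=1):
    # Hypothetical helper: random starting weights in the range -1 to 1.
    rng = np.random.RandomState(seed)
    weights = 2 * rng.random_sample((inputs.shape[1], 1)) - 1
    mean_errors = []
    for _ in range(iterations):
        # Step 1: weight the inputs and pass the sum through the Sigmoid function.
        output = sigmoid(np.dot(inputs, weights))
        # Step 2: the error is the desired output minus the neuron's output.
        error = targets - output
        # Step 3: nudge each weight slightly in the direction of the error.
        weights += np.dot(inputs.T, error * output * (1 - output))
        mean_errors.append(float(np.mean(np.abs(error))))
    # Step 4 is the loop itself: repeat for the requested number of iterations.
    return weights, mean_errors


inputs = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
targets = np.array([[0, 1, 1, 0]]).T
weights, mean_errors = train_single_neuron(inputs, targets)
print("mean error, first iteration:", mean_errors[0])
print("mean error, last iteration:", mean_errors[-1])
print("prediction for [1, 0, 0]:", sigmoid(np.dot(np.array([1, 0, 0]), weights)))
```

Keeping the error history is only there for illustration; the training itself needs nothing but the weights.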
Formula for calculating the neuron’s output
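You might be wondering what the special formula for calculating the neuron’s output is. First we take the weighted sum of the neuron’s inputs, which is:

$$\sum_i \text{weight}_i \cdot \text{input}_i = \text{weight}_1 \cdot \text{input}_1 + \text{weight}_2 \cdot \text{input}_2 + \text{weight}_3 \cdot \text{input}_3$$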
Next we normalise this, so the result is between 0 and 1. For this, we use a mathematically convenient function called the Sigmoid function:

$$\frac{1}{1 + e^{-x}}$$

If plotted on a graph, the Sigmoid function draws an S-shaped curve.
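To see the S shape numerically, you can evaluate the Sigmoid function at a few points. In this small sketch the function name `sigmoid` and the sample points are my own choices. The outputs stay strictly between 0 and 1, pass through 0.5 at zero, and flatten out towards 0 and 1 as the input grows in either direction.

```python
import numpy as np


def sigmoid(x):
    # 1 / (1 + e^-x), applied element by element.
    return 1 / (1 + np.exp(-x))


# A few evenly spread sample points, from very negative to very positive.
xs = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
ys = sigmoid(xs)
for x, y in zip(xs, ys):
    print(f"sigmoid({x:+.0f}) = {y:.4f}")
```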
So by substituting the first equation into the second, the final formula for the output of the neuron is:

$$\text{Output of neuron} = \frac{1}{1 + e^{-\left(\sum_i \text{weight}_i \cdot \text{input}_i\right)}}$$

To keep things simple, we’re not using a minimum firing threshold.
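Here is the output formula worked through for one example. The weights below are hypothetical, picked only to keep the arithmetic readable. The sketch writes the weighted sum out term by term, checks that it matches NumPy’s `dot`, and then applies the Sigmoid function.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = np.array([0.5, -0.2, 0.1])
example_input = np.array([1, 1, 1])

# Weighted sum written out term by term...
weighted_sum = sum(w * i for w, i in zip(weights, example_input))
# ...and the same sum as a dot product, which is how the code computes it.
assert np.isclose(weighted_sum, np.dot(example_input, weights))

neuron_output = sigmoid(weighted_sum)
print("weighted sum:", weighted_sum)
print("neuron output:", neuron_output)
```

Here the weighted sum is 0.5 − 0.2 + 0.1 = 0.4, and the Sigmoid of 0.4 is about 0.599.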
Formula for adjusting the weights
During the training cycle (Diagram 3), we adjust the weights. But how much do we adjust the weights by? We can use the “Error Weighted Derivative” formula:

$$\text{Adjust weights by} = \text{error} \cdot \text{input} \cdot \text{SigmoidCurveGradient}(\text{output})$$

Why this formula? First, we want to make the adjustment proportional to the size of the error. Second, we multiply by the input, which is either a 0 or a 1. If the input is 0, the weight isn’t adjusted. Finally, we multiply by the gradient of the Sigmoid curve (Diagram 4). To understand this last one, consider that:
- We used the Sigmoid curve to calculate the output of the neuron.
- If the weighted sum of the inputs is a large positive or negative number, the output is close to 1 or 0, which signifies the neuron was quite confident one way or another.
- From Diagram 4, we can see that at large positive or negative inputs, the Sigmoid curve has a shallow gradient.
- If the neuron is confident that the existing weight is correct, it doesn’t want to adjust it very much. Multiplying by the Sigmoid curve gradient achieves this.
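The confidence argument can be checked numerically. This sketch (the function names are hypothetical) evaluates the slope of the Sigmoid curve, which can be written as output × (1 − output), for a very negative, a zero and a very positive weighted sum, and compares it with a finite-difference estimate of the slope. The gradient peaks at 0.25 when the neuron is undecided (output 0.5) and falls close to zero when the output is near 0 or 1.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_gradient_from_output(output):
    # Gradient of the Sigmoid curve, expressed in terms of the neuron's output.
    return output * (1 - output)


for weighted_sum in [-6.0, 0.0, 6.0]:
    output = sigmoid(weighted_sum)
    # Central finite difference as an independent check of the formula.
    h = 1e-5
    numeric = (sigmoid(weighted_sum + h) - sigmoid(weighted_sum - h)) / (2 * h)
    print(f"sum={weighted_sum:+.0f} output={output:.4f} "
          f"gradient={sigmoid_gradient_from_output(output):.5f} numeric={numeric:.5f}")
```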
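The gradient of the Sigmoid curve can be found by taking the derivative, which can be written in terms of the neuron’s output:

$$\text{SigmoidCurveGradient}(\text{output}) = \text{output} \cdot (1 - \text{output})$$

So by substituting the second equation into the first equation, the final formula for adjusting the weights is:

$$\text{Adjust weights by} = \text{error} \cdot \text{input} \cdot \text{output} \cdot (1 - \text{output})$$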
There are alternative formulae, which would allow the neuron to learn more quickly, but this one has the advantage of being fairly simple.
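To make the weight-adjustment formula concrete, here is one adjustment step for a single training example. The starting weights are hypothetical. The example input [1, 0, 1] has a 0 in the middle, so the sketch shows that the middle weight receives no adjustment at all, exactly as described above.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Hypothetical starting weights for a 3-input neuron (a 3 x 1 column).
weights = np.array([[0.5], [-0.2], [0.1]])

# A single training example, as a 1 x 3 row, and its desired output.
example_input = np.array([[1, 0, 1]])
desired_output = np.array([[1]])

output = sigmoid(np.dot(example_input, weights))
error = desired_output - output
# error * input * output * (1 - output), giving one adjustment per weight.
adjustment = np.dot(example_input.T, error * output * (1 - output))

print("output:", output.ravel())
print("adjustment per weight:", adjustment.ravel())
weights += adjustment
```

Because the desired output (1) is above the neuron’s output, the error is positive and both active weights are nudged upwards by the same amount.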
Constructing the Python code
Although we won’t use a neural network library, we will import three functions and one module from a Python mathematics library called numpy. These are:
- exp — the natural exponential
- array — creates a matrix
- dot — multiplies matrices
- random — gives us random numbers
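Here is a quick tour of those four names, assuming a standard NumPy installation. The matrices are arbitrary examples. The seeding check shows why the code calls `random.seed(1)`: the same seed gives the same “random” starting weights every time.

```python
from numpy import exp, array, random, dot

# exp: the natural exponential, applied element by element.
print(exp(array([0.0, 1.0])))

# array: build 2 x 2 matrices from nested lists.
a = array([[1, 2], [3, 4]])
b = array([[5, 6], [7, 8]])

# dot: matrix multiplication.
print(dot(a, b))

# random: numpy's random module; seeding makes the numbers repeatable.
random.seed(1)
first = random.random((3, 1))
random.seed(1)
second = random.random((3, 1))
print(first.ravel())
print("same numbers after reseeding:", (first == second).all())
```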
For example, we can use the array() method to represent the training set shown earlier:
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T
The ‘.T’ attribute transposes the matrix from horizontal to vertical. So the computer stores the inputs as a 4 × 3 grid, one row per example, and the outputs as a 4 × 1 column: [[0], [1], [1], [0]].
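Checking the array shapes makes the transpose easier to follow. This sketch only inspects shapes; the random weights mirror the initialisation used in the code. Without `.T`, subtracting a 1 × 4 row from a 4 × 1 column would broadcast into a 4 × 4 matrix instead of giving one error per example.

```python
from numpy import array, dot, random

training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
row_outputs = array([[0, 1, 1, 0]])
training_set_outputs = row_outputs.T

print(row_outputs.shape, "->", training_set_outputs.shape)

# One weight per input column: a 3 x 1 matrix.
random.seed(1)
synaptic_weights = 2 * random.random((3, 1)) - 1

# (4 x 3) dot (3 x 1) gives one weighted sum per example: a 4 x 1 column,
# the same shape as training_set_outputs, so the two line up element by element.
weighted_sums = dot(training_set_inputs, synaptic_weights)
print(weighted_sums.shape)
```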
Ok. I think we’re ready for the more beautiful version of the source code. Once I’ve given it to you, I’ll conclude with some final thoughts.
I have added comments to my source code to explain everything, line by line. Note that in each iteration we process the entire training set simultaneously, which is why our variables are matrices. Here is a complete working example written in Python:
from numpy import exp, array, random, dot


class NeuralNetwork:
    def __init__(self):
        # Seed the random number generator, so it generates the same numbers
        # every time the program runs.
        random.seed(1)

        # We model a single neuron, with 3 input connections and 1 output connection.
        # We assign random weights to a 3 x 1 matrix, with values in the range -1 to 1
        # and mean 0.
        self.synaptic_weights = 2 * random.random((3, 1)) - 1

    # The Sigmoid function, which describes an S shaped curve.
    # We pass the weighted sum of the inputs through this function to
    # normalise them between 0 and 1.
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))

    # The derivative of the Sigmoid function.
    # This is the gradient of the Sigmoid curve.
    # Note that x here is the Sigmoid output, not the weighted sum.
    # It indicates how confident we are about the existing weight.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # We train the neural network through a process of trial and error,
    # adjusting the synaptic weights each time.
    def train(self, training_set_inputs, training_set_outputs, number_of_training_iterations):
        for iteration in range(number_of_training_iterations):
            # Pass the training set through our neural network (a single neuron).
            output = self.think(training_set_inputs)

            # Calculate the error (the difference between the desired output
            # and the predicted output).
            error = training_set_outputs - output

            # Multiply the error by the input and again by the gradient of the Sigmoid curve.
            # This means less confident weights are adjusted more.
            # This means inputs, which are zero, do not cause changes to the weights.
            adjustment = dot(training_set_inputs.T, error * self.__sigmoid_derivative(output))

            # Adjust the weights.
            self.synaptic_weights += adjustment

    # The neural network thinks.
    def think(self, inputs):
        # Pass inputs through our neural network (our single neuron).
        return self.__sigmoid(dot(inputs, self.synaptic_weights))


if __name__ == "__main__":

    # Initialise a single neuron neural network.
    neural_network = NeuralNetwork()

    print("Random starting synaptic weights: ")
    print(neural_network.synaptic_weights)

    # The training set. We have 4 examples, each consisting of 3 input values
    # and 1 output value.
    training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    training_set_outputs = array([[0, 1, 1, 0]]).T

    # Train the neural network using a training set.
    # Do it 10,000 times and make small adjustments each time.
    neural_network.train(training_set_inputs, training_set_outputs, 10000)

    print("New synaptic weights after training: ")
    print(neural_network.synaptic_weights)

    # Test the neural network with a new situation.
    print("Considering new situation [1, 0, 0] -> ?: ")
    print(neural_network.think(array([1, 0, 0])))
Also available here: https://github.com/miloharper/simple-neural-network
Try running the neural network from the Terminal by passing the file that contains the code to the Python interpreter.
You should get a result that looks like:
Random starting synaptic weights:
[[-0.16595599]
 [ 0.44064899]
 [-0.99977125]]
New synaptic weights after training:
[[ 9.67299303]
 [-0.2078435 ]
 [-4.62963669]]
Considering new situation [1, 0, 0] -> ?:
[ 0.99993704]
We did it! We built a simple neural network using Python!
First the neural network assigned itself random weights, then trained itself using the training set. Then it considered a new situation [1, 0, 0] and predicted 0.99993704. The correct answer was 1. So very close!
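One way to sanity-check the trained neuron is to feed the training examples back through it and round each prediction. This sketch reuses the nine-line version, ported to Python 3, with a hypothetical `think` helper. Rounding at 0.5 is my own choice of decision rule, not part of the network itself.

```python
from numpy import exp, array, random, dot

random.seed(1)
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T
synaptic_weights = 2 * random.random((3, 1)) - 1
for iteration in range(10000):
    output = 1 / (1 + exp(-dot(training_set_inputs, synaptic_weights)))
    synaptic_weights += dot(training_set_inputs.T,
                            (training_set_outputs - output) * output * (1 - output))


def think(inputs):
    # Hypothetical helper: the trained neuron's output for the given inputs.
    return 1 / (1 + exp(-dot(inputs, synaptic_weights)))


# Round each prediction to the nearest whole number and compare with the targets.
training_predictions = think(training_set_inputs).round()
print("training set reproduced:", (training_predictions == training_set_outputs).all())
print("new situation [1, 0, 0]:", think(array([1, 0, 0])))
```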
Traditional computer programs normally can’t learn. What’s amazing about neural networks is that they can learn, adapt and respond to new situations. Just like the human mind.