What is a Neural Network, actually? A quick intro
A neural network is essentially a set of neurons connected together. A neuron takes one or more numeric values as input and maps them to a single output value. Essentially, a neuron is a “multi-input linear-regression function”, except that its output is passed through another function called the “activation function” or “squashing function”. Some commonly used activation functions are:
- Logistic function
- Tanh function
- ReLU function
These functions are called squashing functions because they take any value and map it into a small, pre-defined range (ReLU is the partial exception, since it only squashes the negative half). They introduce a nonlinear mapping into each neuron, which allows the network to learn more complex, nonlinear functions.
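To make this concrete, here is a minimal sketch of those three functions in Python with NumPy (the numbers in the example are just illustrative):

```python
import numpy as np

def logistic(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real number into the range (-1, 1)
    return np.tanh(z)

def relu(z):
    # Passes positive values through unchanged and maps negatives to 0
    # (it only "squashes" the negative half of the number line)
    return np.maximum(0.0, z)

print(logistic(2.0), tanh(2.0), relu(2.0))   # ~0.88, ~0.96, 2.0
```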
Each neuron in a neural network does a relatively simple set of operations:
- Multiplies each input by a weight
- Adds together the results of the multiplication
- Passes the result of the addition (the weighted sum) through an activation function
It is important to note that all connections between neurons in a neural network are directed and have a weight. Additionally, each neuron also has a bias, which effectively sets how high the weighted sum needs to be before the neuron becomes active.
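Putting the three steps and the bias together, a single neuron can be sketched roughly like this (all inputs, weights, and the bias are made-up numbers):

```python
import numpy as np

def neuron(inputs, weights, bias):
    # 1. Multiply each input by its weight and 2. add the results together
    weighted_sum = np.dot(inputs, weights) + bias
    # 3. Pass the weighted sum through an activation function (logistic here)
    return 1.0 / (1.0 + np.exp(-weighted_sum))

# Made-up example: two inputs, two weights, and a bias
inputs = np.array([0.5, -1.0])
weights = np.array([0.8, 0.2])
bias = -0.1
print(neuron(inputs, weights, bias))   # a single output value between 0 and 1
```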
It is helpful to think of the neurons as organized into layers: activations in one layer trigger activations in the next. The main layers are the input layer, the hidden layers, and the output layer.
Generally speaking, it is possible to create many different types of neural networks by changing the:
- number of layers
- number of neurons in each layer
- type of activation functions used
- direction of the connections between layers
- and other parameters
Now, how do you train a neural network?
It essentially involves finding the correct weights for the connections in the network. Now, how do we do that? Well, it helps to think of a single neuron.
Take a relationship that we already know, for instance, the level of gray skies (10/10) and the level of happiness of a person (2/10); more gray skies, less happy. If we present the value of gray skies (10/10) to the network, the neuron will predict an output value, say a level of happiness of 5. We can then compare the neuron's prediction (5) to the expected value (2). By subtracting the prediction (5) from the target value (2) we can measure the neuron's error on that instance (2 - 5 = -3).
This very simplified example illustrates the idea of a cost function, which essentially tells the network how “bad” it is at predicting the expected outputs. In practice, cost functions usually add up the square of the difference between the expected output value and the predicted output value across the training instances.
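As a rough sketch, a sum-of-squared-errors cost could be written like this (it reuses the happiness numbers from the example above; the second pair of numbers is made up):

```python
import numpy as np

def sum_of_squared_errors(targets, predictions):
    # Add up the square of the difference between expected and predicted values
    return np.sum((np.asarray(targets) - np.asarray(predictions)) ** 2)

# The single instance from above: target happiness 2, predicted happiness 5
print(sum_of_squared_errors([2.0], [5.0]))            # (2 - 5)^2 = 9.0

# Over several training instances, the squared errors are simply added up
print(sum_of_squared_errors([2.0, 7.0], [5.0, 6.0]))  # 9.0 + 1.0 = 10.0
```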
With some calculus, we can derive a rule that updates the weights on the connections coming into a neuron so as to reduce its error, based on the measure of the neuron's error. The exact definition depends on the activation function used, but the weight-update rule works more or less like this (a small code sketch follows the list):
- If the error is 0 we don’t need to change the weights on the inputs (as the prediction was okay)
- If the error is positive, we should increase the neuron’s weight for all the connections where the input is positive and decrease the weight for all the connections where the input is negative
- If the error is negative, we should decrease the weight for all the connections where the input is positive and increase the weight for all the connections where the input is negative
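Here is a minimal sketch of such a rule for a single neuron, essentially the classic delta rule with the error defined as target minus prediction (the learning rate and all the numbers are made up):

```python
import numpy as np

def update_weights(weights, inputs, error, learning_rate=0.1):
    # error = target - prediction.
    # If error is 0, nothing changes; if error is positive, weights move up
    # where the input is positive and down where it is negative; if error is
    # negative, the adjustments are reversed.
    return weights + learning_rate * error * np.asarray(inputs)

weights = np.array([0.5, -0.3])
inputs = np.array([1.0, -2.0])
error = -3.0   # a negative error: the neuron over-predicted on this instance
print(update_weights(weights, inputs, error))   # [0.2, 0.3]
```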
One of the difficulties in training a neural network is that the weight-update rule needs an estimate of the error at each neuron. And although it is relatively straightforward to calculate this estimate for the output layer of the network, it becomes complicated to calculate the error for the earlier layers.
The standard way to train a network is the backpropagation algorithm, which is essentially a supervised machine learning algorithm. Training usually begins by assigning random weights to each of the connections in the network, which of course leads to terrible predictions on behalf of the network. The algorithm then iteratively updates the weights by presenting training instances from the data set and adjusting the network until it behaves as expected.
After each training data point is presented to the network, the algorithm passes the error back through the network starting at the output layer, and at each layer, it calculates the error for the neurons in that layer before sharing the error with the preceding layer.
The algorithm works in the following four steps (a minimal code sketch follows the list):
- It calculates the error for the neurons in the output layer and uses the weight-update rule to update the weights coming into these neurons
- It shares the error calculated at each neuron with each of the connected neurons in the preceding layer, in proportion to the weight of the connection between the two neurons
- For each neuron in the preceding layer, it calculates the overall error of the network that the neuron is responsible for by adding up the errors that have been passed (backpropagated) to it and uses the result to update the weights on the connections coming into the neuron
- Then, it continues through the rest of the layers in the network, repeating steps 2 and 3, until the weights on the connections between the input layer and the first hidden layer have been updated
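To make those four steps concrete, here is a minimal, illustrative backpropagation loop for a tiny network with one hidden layer, trained on a single instance with logistic activations and a squared-error cost (all sizes and numbers are made up, and real implementations differ in many details):

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
# Training starts from random weights, as described above.
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

x = np.array([[1.0, 0.4]])   # one training instance with two made-up inputs
t = np.array([[0.2]])        # its expected output (e.g. happiness 2/10, scaled)
learning_rate = 0.5

for step in range(1000):
    # Forward pass: each layer computes weighted sums and squashes them
    h = logistic(x @ W1 + b1)
    y = logistic(h @ W2 + b2)

    # Step 1: error at the output layer (prediction minus target, times the
    # derivative of the logistic activation)
    delta_out = (y - t) * y * (1.0 - y)

    # Steps 2 and 3: share the error backwards in proportion to the connection
    # weights, scaled by the hidden layer's activation derivative
    delta_hidden = (delta_out @ W2.T) * h * (1.0 - h)

    # Weight-update rule applied at every layer
    W2 -= learning_rate * h.T @ delta_out
    b2 -= learning_rate * delta_out.sum(axis=0)
    W1 -= learning_rate * x.T @ delta_hidden
    b1 -= learning_rate * delta_hidden.sum(axis=0)

print(y.item())   # after training, the prediction should be close to 0.2
```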
In each instance, the weights are scaled to reduce, but not eliminate, the error, because the goal of training is to enable the network to generalize to new instances that are not in the training set, rather than simply memorizing the training data.
Essentially, a network that is learning is a network that is minimizing a cost function.
Now, then what is a Deep Learning Network?
They are simply neural networks with multiple layers of hidden units; they are “deep” in terms of the number of hidden layers they have. We can have different numbers of neurons in each layer. It is also possible to have multiple neurons in the output layer, which is useful if the target is a nominal data type with several different levels (one output neuron per level).
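A rough sketch of such a deep network, with two hidden layers and one output neuron per level of a three-level nominal target, might look like this (all layer sizes and weights are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # One output neuron per level of the nominal target; outputs sum to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# A "deep" network: 4 inputs -> 8 hidden -> 5 hidden -> 3 output neurons
layer_sizes = [4, 8, 5, 3]
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                          # hidden layers
    return softmax(h @ weights[-1] + biases[-1])     # output layer

x = rng.normal(size=(1, 4))   # one made-up input instance
print(forward(x))             # three probabilities, one per target level
```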
There are other neural network topologies as well. Recurrent neural networks (RNNs) introduce loops into the network: the output of a neuron is fed back into the neuron during the processing of the next input, giving the network a “memory” that allows it to process each input in the context of the previous inputs. This makes these networks work well with sequential data, like language.
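A minimal sketch of a single recurrent step shows where the “memory” comes from (the sizes, weights, and input sequence are all made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# The previous hidden state is fed back in alongside the next input
W_in = rng.normal(scale=0.5, size=(3, 4))    # input -> hidden
W_rec = rng.normal(scale=0.5, size=(4, 4))   # hidden -> hidden (the loop)
b = np.zeros(4)

def rnn_step(x_t, h_prev):
    return np.tanh(x_t @ W_in + h_prev @ W_rec + b)

sequence = rng.normal(size=(5, 3))   # a made-up sequence of 5 inputs
h = np.zeros(4)                      # the network's memory starts empty
for x_t in sequence:
    h = rnn_step(x_t, h)             # each input is processed in the context
                                     # of the previous ones
print(h)                             # the final hidden state
```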
Another type is the convolutional neural network (CNN), originally developed to work with image data. CNNs handle images by having groups of neurons that share the same set of weights on their inputs. Basically, each group of neurons that shares a set of weights learns to identify a particular visual feature, and each neuron in the group tries to detect that feature. The neurons in each group are organized so that each neuron analyzes a different location in the image, with the group as a whole covering the entire image.
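Weight sharing is easier to see in code: the sketch below slides one shared 2x2 set of weights over a small made-up image, which is the basic idea behind a convolution (a toy illustration, not how CNN libraries actually implement it):

```python
import numpy as np

rng = np.random.default_rng(3)

def detect_feature(image, shared_weights):
    # The same set of weights (a small filter) is applied at every location,
    # so each "neuron" in the group looks for the same visual feature in a
    # different part of the image.
    fh, fw = shared_weights.shape
    h, w = image.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + fh, j:j + fw]
            out[i, j] = np.sum(patch * shared_weights)
    return out

image = rng.random((6, 6))                 # a made-up 6x6 grayscale image
shared_weights = np.array([[1.0, -1.0],    # an illustrative 2x2 filter that
                           [1.0, -1.0]])   # responds to vertical edges
print(detect_feature(image, shared_weights).shape)   # a 5x5 map of responses
```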
Now, deep neural networks are powerful precisely because they can automatically learn useful attributes like the feature detectors in CNNs.
These networks are essentially learning a new representation of the input data that is very good at predicting the target output.
The deep neural network's process of mapping inputs to new attributes and feeding these new attributes as inputs to further functions continues all along the network; as the network gets deeper, it can learn increasingly complex mappings from raw inputs to new attribute representations.
Apparently, it has been known for a long time that making neural networks deeper allows them to learn more complex mappings of the data, but the backpropagation algorithm didn't work well with deep networks. Researchers have since found new types of neurons and adaptations of the backpropagation algorithm that deal with this problem. Two other factors that have helped the explosion in deep learning are the amount of computing power now devoted to training, coupled with much larger amounts of training data.
To conclude, we can say that an entire neural network is nothing more than a function. A very complicated function with lots of inputs and outputs, but a function nonetheless.
Sources
Kelleher, J. D., & Tierney, B. (2018). Data science (MIT Press Essential Knowledge Series). MIT Press.
3Blue1Brown. (2017, October 5). But what is a neural network? | Chapter 1, Deep learning [Video]. YouTube. https://www.youtube.com/watch?v=aircAruvnKk
3Blue1Brown. (2018, April 13). Gradient descent, how neural networks learn | Chapter 2, Deep learning [Video]. YouTube. https://www.youtube.com/watch?v=IHZwWFHWa-w