This post was originally published by Łukasz Gebel at Medium [AI]
One time when bias is not something bad
The term “bias” has a lot of pejorative connotations. When we think of it, we see unfair treatment, prejudice, discrimination, or favoring someone or something. And that’s natural: we live in a world where, unfortunately, all of these things take place. However, words have many meanings depending on the context, and, surprisingly, even bias can be something helpful.
Machine Learning is a domain where we encounter bias in a couple of contexts. Let’s go through these meanings and find the one that makes Neural Networks useful.
For starters, let’s discuss the most general context of bias: the bias inside the data used to train models. Every time we feed a Neural Network or any other model with data, that data determines the model’s behavior. We cannot expect any fair or neutral treatment from algorithms that were built from biased data.
One of the well-known examples of such biased data was Amazon’s recruiting tool. It was supposed to pre-filter resumes so recruiters could choose from the most promising ones. And it was great at filtering out resumes! Especially at filtering out female resumes… Unfortunately, the wonderful, AI-powered solution was biased. The system favored male candidates because the resumes used in the training process came mostly from men.
Another example of a biased model is Tay, a chatbot released by Microsoft. It was supposed to carry conversations by posting Tweets, and it was also able to learn from content posted by users. And that doomed Tay. It learned how to be offensive and aggressive, became biased, and was switched off. Actually, it was made biased by irresponsible users who spoiled it with abusive posts.
So biased data is definitely a negative phenomenon. Being responsible and aware of it is an important part of building models. When you create an artificial brain you must be careful what you put inside it. Otherwise, you may bring to life a monster.
Let’s take a look at the second context of bias. When we train and test our Neural Networks or other Machine Learning models, we can observe two main failure modes:
- The model overfits to the data.
- The model cannot learn patterns from the data.
Overfitting is like learning by heart. Your model memorized a vast majority of your training data; however, when something new comes up, it doesn’t work correctly. You can think of it this way: the model is good at answering questions it has already been asked, but when you ask something out of the box, it fails.
Such an issue can be nicely visualized if we plot validation and training set errors depending on the training set size. These plots, called learning curves, can alert us to the problem.
If we get a relatively low error for the training set but a high error for the validation set, we have a high-variance model. A big gap between the validation and training set error curves visible in the plot is characteristic of overfitting [3,4].
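To make the high-variance case concrete, here is a minimal sketch (the noisy quadratic data and the degree-9 polynomial model are illustrative assumptions, not from the original post): a model with enough parameters to memorize a tiny training set scores almost perfectly on it while doing much worse on fresh validation points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a noisy quadratic relationship.
x_train = rng.uniform(-1, 1, 10)
y_train = x_train**2 + rng.normal(0, 0.05, 10)
x_val = rng.uniform(-1, 1, 50)
y_val = x_val**2 + rng.normal(0, 0.05, 50)

# A degree-9 polynomial has enough parameters to memorize 10 points.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

# Training error is near zero; validation error is clearly larger.
print(f"train MSE: {train_err:.6f}, validation MSE: {val_err:.6f}")
```

The big gap between the two errors is exactly what the learning-curve plot makes visible.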
Let’s get back to bias. When we speak about bias in the context of a model’s performance, we say that the model has high bias. Basically, it means that the model doesn’t do well during training and, as a consequence, during validation either. It behaves like a student who cannot grasp the idea we’re trying to teach them. There might be something wrong with the model or with our data [3,4].
When we take a look at the learning curves and see that the error is high for the training set as well as for the validation set, the model may have high bias. The gap between the training and validation set curves will be small, as the model performs poorly in general. It lacks the ability to generalize and find patterns in the data.
High bias is also something bad. Adding more data probably won’t help much. However, you can try to add extra features to data set samples. This additional information may give the model more clues while searching for patterns.
You may also need to change the model. Sometimes models are too rigid to learn from data. Think of non-linearly distributed data points that look like a parabola. If you try to fit a simple line to this parabola, your model will fail due to high bias. In such a case, a more flexible model with more parameters, like a quadratic function, is needed.
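As a quick numerical sketch of that rigidity (the parabola data here is made up for illustration), fitting a straight line to quadratic data leaves a large error no matter what slope and intercept it picks, while a degree-2 model fits the same points almost exactly:

```python
import numpy as np

# Illustrative parabola-shaped data: y = x^2, noise-free for clarity.
x = np.linspace(-3, 3, 50)
y = x**2

# A degree-1 model (a line) is too rigid for this data: high bias.
line = np.polyfit(x, y, deg=1)
line_err = np.mean((np.polyval(line, x) - y) ** 2)

# A degree-2 model has the flexibility the data demands.
quad = np.polyfit(x, y, deg=2)
quad_err = np.mean((np.polyval(quad, x) - y) ** 2)

print(f"line MSE: {line_err:.4f}, quadratic MSE: {quad_err:.4f}")
```

No amount of extra training data would rescue the line; only a more flexible model class removes the bias.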
Let’s analyze the third context: the bias in a particular Neural Network. In the literature, we can find the term “bias neuron”. Why do we need this special kind of neuron? Take a look at the picture:
This simple neural network consists of 3 types of neurons. The input neuron simply passes a feature (x₁) from the data set. The bias neuron mimics an additional feature, let’s call it x₀; this additional input is always equal to 1. Finally, there is the output neuron, a full-fledged artificial neuron that takes inputs, processes them, and generates the output of the whole network.
Now let’s have a detailed look at our output neuron:
How does it work? We take the inputs (x₀, x₁) and multiply them by the corresponding weights (w₀, w₁). For the sake of simplicity, the output neuron returns the sum of such input-weight products:
In our case the sum runs from i=0 to i=1, and x₀=1. As a result, such a Neural Network is actually a linear regression model:
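A minimal sketch of this single-neuron network in plain Python (the weight values are arbitrary placeholders, not anything from the original post): the bias input x₀ is fixed at 1, so its weight w₀ plays the role of the intercept in linear regression.

```python
# Sketch of the one-neuron network described above; weights are illustrative.
def output_neuron(x1, w0=0.5, w1=2.0):
    x0 = 1.0  # the bias neuron's input is always 1
    # Weighted sum of inputs: w0*x0 + w1*x1 == w1*x1 + w0,
    # i.e. a linear regression with intercept w0.
    return w0 * x0 + w1 * x1

print(output_neuron(3.0))  # 0.5 * 1 + 2.0 * 3 = 6.5
```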
Now the crucial part. To understand why we need the bias neuron, let’s see what happens when there is no bias input at all. Then there is only one input, x₁, and nothing more:
Such a model is not very flexible: the fitted line has to go through the point (0, 0). The slope of the line may change; however, the line is tied to the coordinate system’s origin. Take a look at this visualization:
To gain more flexibility, we need to get back to the original model with bias. It equips us with the weight w₀, which is not tied to any input. This weight allows the model to move the line up and down when needed to fit the data.
That’s the reason why we need bias neurons in neural networks. Without these spare bias weights, our model has quite limited “movement” while searching through the solution space.
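The difference can be checked numerically. In this sketch (the data is generated from the made-up relationship y = 2x + 5), a least-squares fit with a bias column recovers the intercept, while the bias-free version, pinned to the origin, cannot:

```python
import numpy as np

# Illustrative data with a non-zero intercept: y = 2x + 5.
x = np.linspace(0, 10, 20)
y = 2 * x + 5

# With bias: add a column of ones (the x0 = 1 input) to the design matrix.
X_bias = np.column_stack([np.ones_like(x), x])
w_bias, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
err_bias = np.mean((X_bias @ w_bias - y) ** 2)

# Without bias: the fitted line is forced through (0, 0).
X_nobias = x.reshape(-1, 1)
w_nobias, *_ = np.linalg.lstsq(X_nobias, y, rcond=None)
err_nobias = np.mean((X_nobias @ w_nobias - y) ** 2)

print(w_bias)  # close to [5, 2]: intercept and slope recovered
print(f"MSE with bias: {err_bias:.6f}, without bias: {err_nobias:.6f}")
```

The bias-free model is stuck with a non-trivial error because no line through the origin can reproduce the intercept of 5.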
To give you one more example, take a look at a neuron that uses a non-linear activation function, like the sigmoid:
In this scenario, bias also gives our activation function the possibility “to move”. Thanks to it, the sigmoid can be shifted to the left (positive bias) or to the right (negative bias). This situation is visualized in the following diagram, which contains sigmoid plots for different bias values:
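A tiny numerical check of that shift (the weight values are chosen arbitrarily for illustration): with weight w = 1, the sigmoid crosses 0.5 at x = −b, so a non-zero bias slides the whole curve along the x-axis.

```python
import math

def sigmoid_neuron(x, w=1.0, b=0.0):
    # Standard sigmoid of the weighted input plus bias.
    return 1 / (1 + math.exp(-(w * x + b)))

print(sigmoid_neuron(0.0))           # 0.5: no bias, midpoint at x = 0
print(sigmoid_neuron(-2.0, b=2.0))   # 0.5: with b = 2 the midpoint moved to x = -2
```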
Finally, after going through the concepts of biased data and models with high bias, we have reached the positive context of the word bias. We understand why bias neurons are crucial elements of Neural Networks, but there is one last thing that raises a question. Why was something with a positive effect named using a negative word such as bias?
That’s because the bias weight is not tied to any element of the input data; however, it is used to make decisions about it. So the bias neuron, or bias weight, reflects our beliefs or prejudices about the data set examples. It’s like adjusting our thoughts about someone or something using our experience instead of facts. Quite biased, isn’t it?
- Joel Grus, Data Science from Scratch, 2nd Edition, ISBN: 978-1492041139.
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd Edition, ISBN: 978-1492032649.