Multiclass Classification and Information Bottleneck — An example using Keras


This post was originally published by Rakshit Raj at Towards Data Science

Initially, we will train our model for 20 epochs in mini-batches of 512 samples. We will also pass our validation set to the fit method.

Code by rakshitraj hosted on GitHub
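The fit call in the gist might look roughly like the following sketch. The model architecture, optimizer, and data here are assumptions for illustration: the post trains on vectorized text data with 10,000 multi-hot input features and 46 one-hot output classes, so a random stand-in with the same shapes is used to keep the snippet runnable on its own.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in for the vectorized data used in the post:
# 10,000-dimensional multi-hot inputs, 46 one-hot output classes.
num_features, num_classes = 10_000, 46
rng = np.random.default_rng(0)
x_train = rng.integers(0, 2, size=(1_000, num_features)).astype("float32")
y_train = keras.utils.to_categorical(
    rng.integers(0, num_classes, size=1_000), num_classes
)
x_val, y_val = x_train[:200], y_train[:200]  # illustrative validation split

# A plausible model for this task (assumed, not the author's exact layers)
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Train for 20 epochs in mini-batches of 512, passing the validation set;
# fit returns a History object that we keep for later inspection.
history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=0)
```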

Calling the fit method returns a History object. This object contains a member history, which stores all data about the training process, including the values of monitored quantities as the epochs proceed. We will save this object, since the information it holds will help us determine how best to fine-tune the training step.

At the end of training, we have attained a training accuracy of 95% and a validation accuracy of 80.9%.

Now that we have trained our network, we will observe its performance metrics stored in the History object.

As noted above, the History object has an attribute history, which is a dictionary containing four entries: one per monitored metric.

Code by rakshitraj hosted on GitHub
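Accessing that dictionary might look like the sketch below. The values shown are illustrative placeholders standing in for history.history, so the lines run on their own; real values come from your own training run.

```python
# Stand-in for `history.history` as returned after model.fit;
# the numbers are illustrative placeholders, not real results.
history_dict = {
    "loss": [2.5, 1.4, 1.0],
    "accuracy": [0.50, 0.70, 0.79],
    "val_loss": [1.7, 1.3, 1.2],
    "val_accuracy": [0.64, 0.71, 0.74],
}

# One entry per monitored metric, one value per epoch
print(sorted(history_dict.keys()))
# ['accuracy', 'loss', 'val_accuracy', 'val_loss']
```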

history_dict contains values of

  • Training loss
  • Training accuracy
  • Validation loss
  • Validation accuracy

at the end of each epoch.

Let’s use Matplotlib to plot the training and validation loss, and the training and validation accuracy, side by side.

Training and Validation Loss

Loss versus Epochs

Code by rakshitraj hosted on GitHub
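The loss plot might be produced along these lines; the loss values below are illustrative placeholders (substitute your own history.history entries), and the Agg backend and output filename are choices made here so the snippet runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Illustrative placeholders for history.history["loss"] / ["val_loss"]
loss = [2.5, 1.4, 1.0, 0.8, 0.6]
val_loss = [1.7, 1.3, 1.2, 1.2, 1.3]
epochs = range(1, len(loss) + 1)

# Dots for training loss, a solid line for validation loss
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.savefig("loss_vs_epochs.png")
```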

Training and Validation Accuracy

Accuracy of model versus Epochs

Code by rakshitraj hosted on GitHub
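The accuracy plot follows the same pattern; again, the accuracy values are illustrative placeholders rather than results from the post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Illustrative placeholders for history.history["accuracy"] / ["val_accuracy"]
acc = [0.50, 0.70, 0.79, 0.86, 0.91]
val_acc = [0.64, 0.71, 0.74, 0.73, 0.72]
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.savefig("accuracy_vs_epochs.png")
```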

Overfitting: Trends in Loss and Accuracy Data

We observe that the minimum validation loss and maximum validation accuracy are achieved at around 9–10 epochs. After that, we observe two trends:

  • an increase in validation loss and a decrease in training loss
  • a decrease in validation accuracy and an increase in training accuracy

This implies that the model is getting better at classifying the training data but making consistently worse predictions on new, previously unseen data, which is the hallmark of overfitting. After the 10th epoch, the model begins to fit too closely to the training data.

To address overfitting, we will reduce the number of epochs to 9. Your results may vary slightly depending on your machine and on the random initialization of weights, which can differ from one run to another.

In our case, we will stop training after nine epochs.

Now that we know that excessive epochs are causing our model to overfit, we will limit the number of epochs and retrain our model from scratch.

Code by rakshitraj hosted on GitHub
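The retraining step can be sketched under the same assumptions as before: a fresh model built from scratch, trained for only 9 epochs, then evaluated on held-out data. The synthetic stand-in data, model layers, and evaluation split are illustrative choices, not the post's exact code.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Same synthetic stand-in shapes as before: 10,000 multi-hot
# input features, 46 one-hot output classes.
num_features, num_classes = 10_000, 46
rng = np.random.default_rng(0)
x_train = rng.integers(0, 2, size=(1_000, num_features)).astype("float32")
y_train = keras.utils.to_categorical(
    rng.integers(0, num_classes, size=1_000), num_classes
)
x_test, y_test = x_train[:200], y_train[:200]  # illustrative held-out split

def build_model():
    """Build and compile a fresh, untrained model."""
    model = keras.Sequential([
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Retrain from scratch, stopping at 9 epochs to stay out of
# the overfitting regime identified above.
model = build_model()
history = model.fit(x_train, y_train, epochs=9, batch_size=512, verbose=0)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
```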


