A Practical Introduction to Early Stopping in Machine Learning

This post was originally published by B. Chen at Towards Data Science

Next, let’s create X and y. Keras and TensorFlow 2.0 only take NumPy arrays as inputs, so we will have to convert the DataFrames back to NumPy arrays.

# Creating X and y
X = df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
# Convert DataFrame into np array
X = np.asarray(X)

y = df[['label_setosa', 'label_versicolor', 'label_virginica']]
# Convert DataFrame into np array
y = np.asarray(y)

Finally, let’s split the dataset into a training set (80%) and a test set (20%) using train_test_split() from the scikit-learn library.

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.20
)

Great! Our data is ready for building a machine learning model.

There are 3 ways to create a machine learning model with Keras and TensorFlow 2.0. Since we are building a simple fully connected neural network, for simplicity let’s use the easiest way: a Sequential model created with Sequential().

Let’s go ahead and create a function called create_model() to return a Sequential model.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
    model = Sequential([
        Dense(64, activation='relu', input_shape=(4,)),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(3, activation='softmax')
    ])
    return model

Our model has the following specifications:

  • The first layer (also known as the input layer) uses input_shape to set the input size to (4,).
  • The input layer has 64 units, followed by 3 Dense layers, each with 128 units. Then there are 3 more Dense layers, each with 64 units. All these layers use the ReLU activation function.
  • The output Dense layer has 3 units and the softmax activation function.
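
If you want to verify that the architecture matches this description, an optional sanity check (not part of the original walkthrough) is to instantiate the model and print its summary:

# Optional: inspect layer output shapes and parameter counts.
model = create_model()
model.summary()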

Compile and train the model

In order to train a model, we first have to configure our model using compile() and pass the following arguments:

  • Use the Adam (adam) optimization algorithm as the optimizer
  • Use the categorical cross-entropy loss function (categorical_crossentropy) for our multi-class classification problem
  • For simplicity, use accuracy as our evaluation metric to evaluate the model during training and testing.

model.compile(
    optimizer='adam', 
    loss='categorical_crossentropy', 
    metrics=['accuracy']
)

After that, we can call model.fit() to fit our model to the training data.

history = model.fit(
    X_train, 
    y_train, 
    epochs=200, 
    validation_split=0.25, 
    batch_size=40, 
    verbose=2
)

If all runs smoothly, we should get an output like the one below.

Train on 84 samples, validate on 28 samples
Epoch 1/200
84/84 - 1s - loss: 1.0901 - accuracy: 0.3214 - val_loss: 1.0210 - val_accuracy: 0.7143
Epoch 2/200
84/84 - 0s - loss: 1.0163 - accuracy: 0.6905 - val_loss: 0.9427 - val_accuracy: 0.7143
......
Epoch 200/200
84/84 - 0s - loss: 0.5269 - accuracy: 0.8690 - val_loss: 0.4781 - val_accuracy: 0.8929

Plot the learning curves

Finally, let’s plot the loss vs. epochs graph on the training and validation sets.

It is preferable to create a small function for plotting metrics. Let’s go ahead and create a function plot_metric().

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt

def plot_metric(history, metric):
    train_metrics = history.history[metric]
    val_metrics = history.history['val_' + metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics)
    plt.plot(epochs, val_metrics)
    plt.title('Training and validation ' + metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_" + metric, 'val_' + metric])
    plt.show()

Run plot_metric(history, 'loss') to get a picture of the loss progress.

From the above graph, we can see that the model has overfit the training data, performing better on the training set than on the validation set.

Adding Early Stopping

The Keras module contains a built-in callback designed for Early Stopping [2].

First, let’s import the EarlyStopping callback and create an early stopping object, early_stopping.

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping()

EarlyStopping() has a few options; by default:

  • monitor='val_loss': use validation loss as the performance measure to terminate the training.
  • patience=0: the number of epochs with no improvement after which training stops. The value 0 means the training is terminated as soon as the performance measure gets worse from one epoch to the next.
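
In other words, calling EarlyStopping() with no arguments behaves the same as spelling out those two defaults explicitly:

early_stopping = EarlyStopping(
    monitor='val_loss',  # stop based on validation loss
    patience=0           # stop as soon as it gets worse
)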

Next, we just need to pass the callback object to model.fit() method.

history = model.fit(
    X_train, 
    y_train, 
    epochs=200, 
    validation_split=0.25, 
    batch_size=40, 
    verbose=2,
    callbacks=[early_stopping]
)

You can see that early_stopping gets passed in a list to the callbacks argument. It is a list because, in practice, we might pass a number of callbacks to perform different tasks, for example debugging and learning-rate scheduling.
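
As a sketch of what that might look like, the snippet below combines early_stopping with TensorBoard (for debugging/inspection) and LearningRateScheduler (for learning-rate scheduling); these two extra callbacks are purely illustrative and not part of the original walkthrough.

from tensorflow.keras.callbacks import TensorBoard, LearningRateScheduler

# Write training logs that can be inspected in TensorBoard (debugging).
tensorboard = TensorBoard(log_dir='./logs')

# An illustrative schedule: halve the learning rate every 50 epochs.
def schedule(epoch, lr):
    return lr * 0.5 if epoch > 0 and epoch % 50 == 0 else lr

lr_scheduler = LearningRateScheduler(schedule)

# Illustrative only: the results discussed below come from the earlier
# fit() call that uses early_stopping on its own.
history_multi = model.fit(
    X_train,
    y_train,
    epochs=200,
    validation_split=0.25,
    batch_size=40,
    verbose=2,
    callbacks=[early_stopping, tensorboard, lr_scheduler]
)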

By executing the fit() statement with early_stopping, you should get an output like the one below:

Note: your output may differ due to different weight initialization.

The training is terminated at Epoch 6 because val_loss increased, which is exactly the condition set by monitor='val_loss' and patience=0.

It’s often more convenient to look at a plot, so let’s run plot_metric(history, 'loss') to get a clearer picture. In the graph below, validation loss is shown in orange, and it’s clear that the validation error increases at Epoch 6.

Customizing Early Stopping

Apart from the options monitor and patience that we mentioned earlier, the other two options, min_delta and mode, are likely to be used quite often.

  • monitor='val_loss': use validation loss as the performance measure to terminate the training.
  • patience=0: the number of epochs with no improvement after which training stops. The value 0 means the training is terminated as soon as the performance measure gets worse from one epoch to the next.
  • min_delta: the minimum change in the monitored quantity that qualifies as an improvement, i.e. an absolute change of less than min_delta counts as no improvement.
  • mode='auto': should be one of auto, min, or max. In 'min' mode, training will stop when the quantity monitored has stopped decreasing; in 'max' mode it will stop when the quantity monitored has stopped increasing; in 'auto' mode, the direction is automatically inferred from the name of the monitored quantity.

And here is an example of a customized early stopping:

custom_early_stopping = EarlyStopping(
    monitor='val_accuracy', 
    patience=8, 
    min_delta=0.001, 
    mode='max'
)

monitor='val_accuracy' uses validation accuracy as the performance measure to terminate the training. patience=8 means the training is terminated after 8 epochs with no improvement. min_delta=0.001 means the validation accuracy has to improve by at least 0.001 to count as an improvement. mode='max' means training will stop when the monitored quantity has stopped increasing.

Let’s go ahead and run it with the customized early stopping.

history = model.fit(
    X_train, 
    y_train, 
    epochs=200, 
    validation_split=0.25, 
    batch_size=40, 
    verbose=2,
    callbacks=[custom_early_stopping]
)

This time, the training is terminated at Epoch 9, as there have been 8 epochs with no improvement in validation accuracy (an improvement has to be ≥ 0.001 to count). For a clearer picture, let’s look at a plot of accuracy by running plot_metric(history, 'accuracy'). In the graph below, validation accuracy is shown in orange, and it’s clear that it hasn’t improved.

Thanks for reading.
