# A practical introduction to Early Stopping in Machine Learning

*This post was originally published by B. Chen at Towards Data Science.*

Next, let’s create `X` and `y`. Keras and TensorFlow 2.0 only take NumPy arrays as input, so we have to convert the DataFrames back to NumPy arrays.

```
import numpy as np

# Create X from the four feature columns and convert it into a NumPy array
X = df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
X = np.asarray(X)

# Create y from the three one-hot label columns and convert it into a NumPy array
y = df[['label_setosa', 'label_versicolor', 'label_virginica']]
y = np.asarray(y)
```
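As a quick sanity check (a small snippet of our own, not from the original post), we can confirm the shapes: `X` should have 4 feature columns and `y` should have 3 one-hot label columns.

```
# One row per sample; 4 feature columns in X, 3 one-hot label columns in y
print(X.shape, y.shape)
```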

Finally, let’s split the dataset into a training set (80%) and a test set (20%) using `train_test_split()` from the scikit-learn library.

```
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.20
)
```
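Note that `train_test_split()` shuffles the data with a different random seed on every run. If you want a reproducible split, you can pass a fixed `random_state` (the value `42` below is arbitrary):

```
# Optional: fix the seed so the same split is produced on every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
```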

Great! Our data is ready for building a machine learning model.

There are 3 ways to create a machine learning model with Keras and TensorFlow 2.0: the Sequential API, the Functional API, and model subclassing. Since we are building a simple fully connected neural network, let’s use the easiest way: a Sequential model created with `Sequential()`.
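For context, here is a minimal sketch of what a (smaller) model of the same kind would look like with the Functional API; it is not used in the rest of this post:

```
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

# Functional API: wire the layers explicitly from inputs to outputs
inputs = Input(shape=(4,))
x = Dense(64, activation='relu')(inputs)
outputs = Dense(3, activation='softmax')(x)
functional_model = Model(inputs=inputs, outputs=outputs)
```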

Let’s go ahead and create a function called `create_model()` to return a Sequential model.

```
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model():
    model = Sequential([
        Dense(64, activation='relu', input_shape=(4,)),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(3, activation='softmax')
    ])
    return model
```

Our model has the following specifications (a quick way to verify them is shown after the list):

• The first layer (also known as the input layer) uses `input_shape` to set the input size to `(4,)`.
• The input layer has 64 units, followed by 3 Dense layers, each with 128 units. Then there are 3 further Dense layers, each with 64 units. All these layers use the ReLU activation function.
• The output Dense layer has 3 units and uses the softmax activation function.
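To double-check these specifications, we can instantiate the model and print Keras’ built-in layer-by-layer summary:

```
model = create_model()
model.summary()  # prints each layer's output shape and parameter count
```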

### Compile and train the model

In order to train a model, we first have to configure our model using `compile()` and pass the following arguments:

• Use the Adam (`adam`) optimization algorithm as the optimizer
• Use the categorical cross-entropy loss function (`categorical_crossentropy`) for our multi-class classification problem
• For simplicity, use `accuracy` as our evaluation metric to evaluate the model during training and testing

```
model = create_model()
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```

After that, we can call `model.fit()` to fit our model to the training data.

```
history = model.fit(
X_train,
y_train,
epochs=200,
validation_split=0.25,
batch_size=40,
verbose=2
)
```

If all runs smoothly, we should get an output like the one below:

```
Train on 84 samples, validate on 28 samples
Epoch 1/200
84/84 - 1s - loss: 1.0901 - accuracy: 0.3214 - val_loss: 1.0210 - val_accuracy: 0.7143
Epoch 2/200
84/84 - 0s - loss: 1.0163 - accuracy: 0.6905 - val_loss: 0.9427 - val_accuracy: 0.7143
......
Epoch 200/200
84/84 - 0s - loss: 0.5269 - accuracy: 0.8690 - val_loss: 0.4781 - val_accuracy: 0.8929
```

### Plot the learning curves

Finally, let’s plot the loss vs. epochs graph on the training and validation sets.

It is preferable to create a small function for plotting metrics. Let’s go ahead and create a function `plot_metric()`.

```
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt

def plot_metric(history, metric):
    train_metrics = history.history[metric]
    val_metrics = history.history['val_' + metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics)
    plt.plot(epochs, val_metrics)
    plt.title('Training and validation ' + metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_" + metric, 'val_' + metric])
    plt.show()
```

Run `plot_metric(history, 'loss')` to get a picture of the loss progress.

From the above graph, we can see that the model has overfitted the training data, so its performance on the training set is better than on the validation set.

The Keras module contains a built-in callback designed for Early Stopping.

First, let’s import the `EarlyStopping` callback and create an early stopping object, `early_stopping`.

```
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping()
```

`EarlyStopping()` has a few options and by default:

• `monitor='val_loss'`: use validation loss as the performance measure to terminate the training.
• `patience=0`: the number of epochs with no improvement after which training is terminated. The value `0` means training stops as soon as the performance measure gets worse from one epoch to the next (an equivalent explicit call is shown below).
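In other words, the default callback we just created is equivalent to:

```
# EarlyStopping() with its default arguments spelled out
early_stopping = EarlyStopping(monitor='val_loss', patience=0)
```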

Next, we just need to pass the callback object to the `model.fit()` method.

```
# Note: to train from scratch, re-create and re-compile the model first as above
history = model.fit(
X_train,
y_train,
epochs=200,
validation_split=0.25,
batch_size=40,
verbose=2,
callbacks=[early_stopping]
)
```

You can see that `early_stopping` gets passed in a list to the `callbacks` argument. It is a list because in practice we might pass a number of callbacks performing different tasks, for example debugging or learning rate scheduling.
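For instance, here is a minimal sketch of combining `early_stopping` with Keras’ built-in `LearningRateScheduler` callback (the `schedule` function here is a made-up example, not part of this walkthrough):

```
from tensorflow.keras.callbacks import LearningRateScheduler

# Hypothetical schedule: keep the initial rate for 10 epochs, then decay it
def schedule(epoch, lr):
    return lr if epoch < 10 else lr * 0.9

history = model.fit(
    X_train,
    y_train,
    epochs=200,
    validation_split=0.25,
    batch_size=40,
    verbose=2,
    callbacks=[early_stopping, LearningRateScheduler(schedule)]
)
```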

By executing the original `model.fit()` call with `early_stopping`, you should get an output like the one below:

Note: your output can be different due to random weight initialization.

The training is terminated at Epoch 6 due to an increase in the `val_loss` value, which is exactly the behavior specified by `monitor='val_loss'` and `patience=0`.

It’s often more convenient to look at a plot, so let’s run `plot_metric(history, 'loss')` to get a clear picture. In the below graph, validation loss is shown in orange, and it’s clear that the validation error increases at Epoch 6.

### Customizing Early Stopping

Apart from the options `monitor` and `patience` we mentioned earlier, the other two options, `min_delta` and `mode`, are likely to be used quite often.

• `monitor='val_loss'`: use validation loss as the performance measure to terminate the training.
• `patience=0`: the number of epochs with no improvement after which training is terminated. The value `0` means training stops as soon as the performance measure gets worse from one epoch to the next.
• `min_delta`: minimum change in the monitored quantity to qualify as an improvement, i.e., an absolute change of less than `min_delta` counts as no improvement.
• `mode='auto'`: should be one of `auto`, `min` or `max`. In `'min'` mode, training will stop when the quantity monitored has stopped decreasing; in `'max'` mode it will stop when the quantity monitored has stopped increasing; in `'auto'` mode, the direction is automatically inferred from the name of the monitored quantity.

And here is an example of a customized early stopping:

```
custom_early_stopping = EarlyStopping(
monitor='val_accuracy',
patience=8,
min_delta=0.001,
mode='max'
)
```

`monitor='val_accuracy'` uses validation accuracy as the performance measure to terminate the training. `patience=8` means the training is terminated after 8 epochs with no improvement. `min_delta=0.001` means the validation accuracy has to improve by at least 0.001 to count as an improvement. `mode='max'` means training will stop when the quantity monitored has stopped increasing.

Let’s go ahead and run it with the customized early stopping.

```
history = model.fit(
X_train,
y_train,
epochs=200,
validation_split=0.25,
batch_size=40,
verbose=2,
callbacks=[custom_early_stopping]
)
```

This time, the training is terminated at Epoch 9, as there have been 8 epochs with no improvement in validation accuracy (an improvement has to be at least 0.001 to count). For a clear picture, let’s look at a plot of accuracy by running `plot_metric(history, 'accuracy')`. In the below graph, validation accuracy is shown in orange, and it’s clear that it has stopped improving.