Knowing when to stop


This post was originally published by Richard Farnworth at Towards Data Science

At what point should you stop chasing percentage points and label your model “done”?

Image for postPhoto by John Matychuk on Unsplash

In predictive analytics, it can be a tricky thing to know when to stop.

Unlike many of life’s activities, there’s no definitive finishing line, after which you can say “tick, I’m done”. The possibility always remains that a little more work can yield an improvement to your model. With so many variables to tweak, it’s easy to end up obsessing over tenths of a percentage point, pouring huge amounts of effort into the details before looking up and wondering “Where did the time go?”.

Iterating your model, via feature engineering, model selection and hyper-parameter tuning is a key skill of any data scientist. But knowing when to stop is something that rarely gets addressed, and can vastly alter the cost of model development and the ROI of a Data Science project.

I’m not talking about over vs under fitting here. Over-fitting is where your model is too closely fit to your training data and can be detected by comparing the training set error with a validation set error. There are many great tutorials on Medium and elsewhere which explain all this in much more detail.

I’m referring to the time you spend working on the entire modelling pipeline, and how you quantify the rewards and justify the cost.


Some strategies that can help you decide when to wrap things up might be:

  • Set a deadline — Parkinson’s law states that “work expands so as to fill the time available for its completion”. Having an open ended time-frame invites you to procrastinate by spending time on things that ultimately don’t provide much value to the end result. Setting yourself a deadline is a good way of keeping costs low and predictable by forcing you to prioritise effectively. The down-side is of course that if you set your deadline too aggressively, you may deliver a model that is of poor quality.
  • Acceptable error rate — You could decide beforehand on an acceptable error rate and stop once you reach it. For example, a self-driving car might try to identify cyclists with a 99.99% level of accuracy. The difficulty of this approach is that before you start experimenting, it’s very hard to set expectations as to how accurate your model could be. Your desired accuracy rate might be impossible, given the level of irreducible error. On the other hand, you might stop prematurely whilst there is still room to easily improve your model.
  • Value gradient method — By plotting the real-world cost of error in your model, vs the effort required to enhance it, you gain an understanding of what the return on investment is for each incremental improvement. This allows you to keep developing your model, only stopping when the predicted value of additional tuning fall below the value of your time.

The law of diminishing returns

As you invest time into tweaking your model, you may find that your progress is fast in the beginning, but quickly plateaus. You’ll likely perform the most obvious improvements first, but as time goes by you’ll end up working harder and harder for smaller gains. Within the data itself, the balance between reducible and irreducible error puts an upper limit on the level of accuracy that your model can achieve.

In a learning exercise or a Kaggle competition, you can iterate to your heart’s content, chasing those incremental improvements further and further down the decimal places.

However, for a commercial project, the cost of tuning this model climbs linearly with respect to the amount of time you have invested. This means there comes a point where scraping out an extra 0.1% will not be worth the investment.

This varies from project to project. If you’re working with supermarket data, given the huge number of purchases on a daily basis, an additional hundredth of a percentage point of accuracy might be worth a lot of money. This puts a strong ROI on continuing efforts to improve your model. But for projects of more modest scale, you might have to draw the line a bit sooner.

The real-world cost of model error

When tuning a model, the values you’re likely to be paying attention to are statistical in nature. MSE, % accuracy, R² and AIC are defined by their mathematical formulae, and are indifferent to the real-world problem you’re attempting to solve.

Rather than solely considering statistical measures of accuracy and error, these should be converted into something that can be weighed against the time investment you’re making, i.e. money.

Let’s say we run an ice-cream kiosk, and we’re trying to predict how many ice-creams we’ll sell on a daily basis, using variables like the weather, day of week, time of year etc.

No model we create will be perfect, and for any given day it will usually either;

  • overestimate — meaning we buy more ingredients than we need for the number of ice-creams sold.
  • underestimate — meaning we run out of stock and lose out on potential business.

Both of these types of error introduce a monetary cost to the business. If we run out of stock at midday, we’ve lost the margin on half a day’s sales. And if we overestimate, we may end up spending money on ingredients that end up being thrown away.

We can introduce business rules on top of our model to help reduce some of this loss. The cost of losing 1 ice-cream’s worth of sales is likely higher than the cost of throwing away 1 ice-cream’s worth of out-of-date milk (given we’re hopefully making a profit). Therefore, we’ll want to be biased in favour of over-stocking, for example by holding 20% more ingredients than suggested by the model’s prediction. This will greatly reduce the frequency and cost of stock outages, at the expense of having to throw out a few bottles of out-of-date milk.

Optimising this 20% rule, falls under the umbrella of Prescriptive Analytics. Using the training set, we can tweak this rule, until the average estimated real-world cost of the error in the model is at its lowest.

The value gradient method

Now that we have an estimated real-world cost for the accuracy of the model, we gain an idea of what the time we’re investing in the model is worth. With each iteration, we subtract the real-world cost from that of the previous version, to work out the value added by our extra effort. From there, we can extrapolate to a window of ROI.

For example, your validation set may contain 1,000 rows and your latest model saved $40 vs the previous iteration. If you are expecting to collect 100,000 data-points per year, then you can multiply the added value by 100 to get an annual rate. Therefore, the work you put in to produce the latest version of the model gives a return of $4,000 per year.

Comparing this to the cost of our time gives us an expected return on investment. E.g. if the above enhancement required a day’s work for someone earning $400 per day, it pays for itself very quickly.

However, as the law of diminishing returns eats away at our rate of improvement, our margin will begin to fall. When it approaches zero, it’s time to take what we have and move on to the next stage in our project.

Of course, this is an inexact science. It assumes that improvements to our model will occur in a smooth and predictable way and that future gains will smaller than previous improvements. Whenever you call it a day, there will be the possibility that a significant breakthrough lies just around the corner.

But it’s always a good idea to keep a commercial eye on the time you’re investing in a model, allowing you to do more valuable work by keeping costs down and freeing up time to spend on the most important things.

Spread the word

This post was originally published by Richard Farnworth at Towards Data Science

Related posts