Scaling your Time Series Forecasting project

This post was originally published by Zion Badash at Towards Data Science

Our Design — Based on separate building blocks working together

  • Feature Store — Uses the data from the Data Sources to create time-series features. Given a list of requested features, it will return a time-series dataset.
  • Models — Included here for clarity; this block refers to the open-source time-series models we build on.
  • Models Library — Wrappers around the Models, with our choice of hyperparameters and compatible with datasets originating from the Feature Store.
  • Forecaster — Takes a dataset from the Feature Store and a model from the Models Library and creates a time-series forecaster for our required task.
  • Airflow DAG — The DAG plays several roles in our system; in this case (more will follow) it schedules the Forecaster’s runs (daily in our case). A sketch of how these blocks compose follows below.
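
To make the division of responsibilities concrete, here is a minimal sketch of how these blocks might compose. The class and method names (FeatureStore.get_dataset, Forecaster.fit/predict, run_daily_forecast) are illustrative stand-ins, not the actual internal API:

```python
import pandas as pd

class FeatureStore:
    """Turns a list of requested feature names into a time-series dataset
    built from the underlying data sources."""

    def get_dataset(self, target: str, features: list[str]) -> pd.DataFrame:
        # In the real system this queries the data sources and joins the
        # requested features on a shared date index; stubbed out here.
        raise NotImplementedError

class Forecaster:
    """Combines a Feature Store dataset with a wrapped model from the
    Models Library for a single forecasting task."""

    def __init__(self, dataset: pd.DataFrame, model):
        self.dataset = dataset
        self.model = model  # a Models Library wrapper around an open-source model

    def fit(self) -> "Forecaster":
        self.model.fit(self.dataset)
        return self

    def predict(self, horizon_days: int) -> pd.DataFrame:
        return self.model.predict(horizon_days)

def run_daily_forecast(target: str, features: list[str], model, horizon_days: int = 14) -> pd.DataFrame:
    """The unit of work the Airflow DAG schedules once a day.
    `model` comes from the Models Library, e.g. a Prophet wrapper with our hyperparameters."""
    dataset = FeatureStore().get_dataset(target=target, features=features)
    return Forecaster(dataset, model).fit().predict(horizon_days)
```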

The data is always the most important part of every data science project. When forecasting a time-series that originates from your business, you naturally want to add the ‘inside’ information available to you. When you add features to time-series forecasting, you have to make sure you know the future values of those features (or you’ll have to forecast them as well, which adds a lot of complexity). That being said, those features might change over time and might contain mistakes of different kinds that will drastically affect your results.

  • The Feature Store validates that the features meet our requirements for the dataset.
  • The Airflow DAG validates, before running the Forecasters, that the tables were updated and that the previous processes completed successfully (a sketch of this check follows the list).
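
Here is a hedged sketch of that second check as an Airflow DAG that gates the Forecaster behind a freshness validation; the table names and the table_last_updated helper are illustrative, not our actual metadata layer:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def table_last_updated(table_name: str) -> datetime:
    # Hypothetical helper: in practice this would query the warehouse's
    # metadata for the table's last refresh time.
    raise NotImplementedError

def check_tables_updated():
    """Fail the run early if a source table was not refreshed in the last day."""
    for table in ["sales_daily", "marketing_spend"]:  # illustrative table names
        if datetime.utcnow() - table_last_updated(table) > timedelta(hours=24):
            raise ValueError(f"{table} was not refreshed in the last 24 hours")

def run_forecaster():
    """Build the dataset from the Feature Store and run the daily forecast (omitted here)."""

with DAG(
    dag_id="daily_forecaster",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate_tables", python_callable=check_tables_updated)
    forecast = PythonOperator(task_id="run_forecaster", python_callable=run_forecaster)
    validate >> forecast  # the Forecaster only runs once the validation passes
```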

We can divide business changes into two kinds:

  • Internal Changes — Changes that originate inside the business itself.
  • External Changes — For example, a new competitor or even something as far-fetched as a worldwide pandemic.

Handling any such change involves two stages:

  1. Understanding the change — Is there a trend to the change? Can we quantify the impact? Where is it coming from? This is very different between internal and external changes, and it is always a challenging task. Sometimes even the conclusion that we don’t quite understand the change is a valid outcome.
  2. Helping the model grasp the change — This strongly depends on our conclusions from the previous stage. If we have a good understanding of the change and its future behaviour, we might, for example, introduce it to the model through a feature (see the sketch after this list). We might also decide to see whether the model picks up on the change by itself.
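
As a small, hypothetical example of introducing a change through a feature: a 0/1 indicator marking every date after the change took effect. The column name and date are illustrative:

```python
import pandas as pd

def add_change_indicator(dataset: pd.DataFrame, change_date: str) -> pd.DataFrame:
    """Add a 0/1 feature that marks every date from the change onward.
    Assumes the dataset is indexed by date."""
    out = dataset.copy()
    out["after_change"] = (out.index >= pd.Timestamp(change_date)).astype(int)
    return out

# Example: flag everything after a (hypothetical) competitor launch. The future
# values of this feature are known (always 1 from here on), so it satisfies the
# requirement that added features are known ahead of time.
# dataset = add_change_indicator(dataset, change_date="2020-03-15")
```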

If it ain’t broken, don’t fix it?

The Forecaster is working well, but you start thinking of a new feature that should be very relevant. Should you add it?

If it starts to squeak, then fix it?

The Forecaster’s error is slightly increasing, so you go back to the previous section and mark a few possible changes. But you’re not sure how these changes to the Forecaster would turn out. Should you add them?

When is change a good change?

Changing the Forecaster or adding a feature adds complexity and possible future technical debt (although our system is designed to reduce that). What is the impact of this new feature? Would this impact increase over time? Is the improvement significant enough to justify the cost of adding a new feature or changing something in the Forecaster?
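
One way to answer these questions, sketched below under assumptions (a MAPE metric, a "target" column, and a make_forecaster factory that builds a Forecaster on a training slice), is a small rolling backtest comparing the error with and without the candidate feature:

```python
import numpy as np
import pandas as pd

def mape(actual: pd.Series, predicted: pd.Series) -> float:
    """Mean absolute percentage error."""
    return float(np.mean(np.abs((actual - predicted) / actual)))

def backtest_error(dataset: pd.DataFrame, make_forecaster, cutoffs, horizon: int = 7) -> float:
    """Average MAPE over several cutoff dates, refitting at each cutoff."""
    errors = []
    for cutoff in cutoffs:
        train = dataset[dataset.index < cutoff]
        test = dataset[dataset.index >= cutoff].head(horizon)
        forecaster = make_forecaster(train)       # builds a Forecaster on data up to the cutoff
        predicted = forecaster.predict(horizon)   # assumed to return a Series aligned with test
        errors.append(mape(test["target"], predicted))
    return float(np.mean(errors))

# error_without = backtest_error(dataset.drop(columns=["new_feature"]), make_forecaster, cutoffs)
# error_with = backtest_error(dataset, make_forecaster, cutoffs)
```

If the gap between the two errors is small or unstable across cutoffs, the improvement probably doesn’t justify the added complexity.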

Following up on the change and rewriting history

Let’s say we’ve added the new change; now is the time to go back to monitoring the change and seeing its effect. But in a sense, we’re now comparing two different Forecasters (apples vs. oranges?). On the one hand, we want to compare our new Forecaster against the old one; on the other hand, we want to compare our new Forecaster against itself as if it had run, for example, a week ago (thereby apples vs. apples).

  • Updated History — the history as if our current model, with the current features and data, had been with us all along (we rerun all the previous dates for that; a backfill sketch follows).
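
Here is a sketch of how that updated history might be backfilled, under the same make_forecaster assumption as above: rerun the current Forecaster for each past run date using only the data that was available then, and keep its forecasts for comparison:

```python
import pandas as pd

def backfill_updated_history(dataset: pd.DataFrame, make_forecaster, run_dates, horizon: int = 7) -> pd.DataFrame:
    """Rebuild the current Forecaster's history as if it had run on every past date."""
    rows = []
    for run_date in run_dates:
        available = dataset[dataset.index < run_date]   # only data known at that run date
        forecaster = make_forecaster(available)
        forecast = forecaster.predict(horizon)          # assumed: a Series indexed by date
        for date, value in forecast.items():
            rows.append({"run_date": run_date, "date": date, "forecast": value})
    return pd.DataFrame(rows)

# Joining this table with the forecasts the old Forecaster actually produced on those
# dates gives the apples-to-apples comparison described above.
```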