The scourge of analytical variability in AI systems

This post was originally published by Deepak Karunakaran at Towards Data Science

In the ICT industry, engineers are increasingly building AI systems to add value for customers by solving existing problems and making processes more efficient. With the seemingly successful application of deep learning, experts are opining, with conviction, that the AI winter has finally come to an end.

But there are at least three major issues, also reported by various experts, that we must deal with when building AI systems.

Building over Blackbox Models

ML engineers work at various abstraction layers. More often than not, the underlying machine learning algorithm is a black box to the engineer who intends to integrate it into the overall system. The use of a particular model generally comes with a number of assumptions which are seldom verified.

ML engineers tend to use a lot of open-source packages. In the Python environment especially, a particular package typically comes with a slew of dependencies. If the intended outcome is achieved, fine; otherwise, another package with another slew of dependencies is tried out. There is no investment in understanding why something worked or why it did not. These packages rarely follow common standards, and more often than not there is no rigorous testing and evaluation of the models they provide. This kind of black-box approach is detrimental to the reliability of AI systems. Moreover, when correctness itself is in question, efficiency is forced to take a backseat.
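One inexpensive habit that counters this blind adoption is a sanity check before a model is integrated: compare it against a trivial baseline on a held-out split. The sketch below is a minimal, self-contained illustration using synthetic data and a hypothetical threshold "model" standing in for a packaged one; none of the names come from a real library.

```python
import random

random.seed(0)

# Synthetic binary task standing in for a real held-out split.
xs = [random.random() for _ in range(1000)]
# True label is 1 when x exceeds 0.5, with ~10% of positives flipped by noise.
ys = [1 if x > 0.5 and random.random() > 0.1 else 0 for x in xs]

split = len(xs) // 2

def majority_class(labels):
    # The trivial baseline: always predict the most common training label.
    return max(set(labels), key=labels.count)

baseline_pred = majority_class(ys[:split])
candidate = lambda x: 1 if x > 0.5 else 0   # stand-in for the packaged model

def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)

base_acc = accuracy(lambda x: baseline_pred, xs[split:], ys[split:])
cand_acc = accuracy(candidate, xs[split:], ys[split:])

# Refuse to adopt a model that cannot beat the trivial baseline.
assert cand_acc > base_acc, "candidate does not beat the baseline"
print(f"baseline={base_acc:.2f} candidate={cand_acc:.2f}")
```

The point is not the particular baseline but the discipline: an explicit, automated check replaces the "if the outcome looks right, ship it" reflex.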

But the problem is not just about the best practices to be followed by an engineer. Deep learning models are currently not transparent in their workings, and their underlying characteristics are still a mystery to researchers. In fact, empirical results often contradict existing theories from statistics and optimization. It has even been alleged that deep learning researchers behave like medieval alchemists, conjuring impressive results without understanding why they work, much as alchemists tried to make gold.

This lack of understanding also contributes, in part, to another problem: the lack of reproducibility.

Lack of Reproducibility

It is expected that an algorithm published by a reputed researcher would produce the same results when correctly reimplemented independently by someone else (either a human or a machine).

Owing to poor academic practices, partly arising from the hype around AI, many researchers have been taking shortcuts in developing algorithms. For instance, it was recently shown that many deep learning models which were expected to outperform the existing state of the art either failed to do so convincingly, or could be matched on the prescribed data set by a traditional ML algorithm with a simple heuristic applied. Another example of such malpractice is reporting only the best results from multiple runs of an algorithm while not disclosing the poorer ones.
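More honest reporting is straightforward to mechanize: run the same training several times with different seeds and publish the spread, not just the best run. The sketch below simulates this; `train_and_evaluate` is a hypothetical stand-in for a real, seed-dependent training run, and the accuracy figures are synthetic.

```python
import random
import statistics

def train_and_evaluate(seed):
    # Stand-in for a full training run, which varies with the seed
    # through initialisation, shuffling, dropout, etc.
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0, 0.02)  # simulated accuracy

scores = [train_and_evaluate(seed) for seed in range(10)]

# Honest reporting: the spread over all runs, not only the best one.
print(f"best accuracy : {max(scores):.3f}")
print(f"mean accuracy : {statistics.mean(scores):.3f} "
      f"+/- {statistics.stdev(scores):.3f} over {len(scores)} seeds")
```

Reporting only `max(scores)` is exactly the malpractice described above; the mean and standard deviation over all seeds are what a reimplementer should expect to reproduce.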

Compounding this is the lack of understanding of deep learning models explained above: researchers fail to explain which part of their algorithm the improvement in results should be attributed to. This makes it difficult for another researcher to analyze why the results vary on reimplementation.

The Hidden Technical Debt

When building AI systems, the machine learning component is minuscule, while the ‘plumbing’ around it consumes most of the effort. Sculley et al., from Google, presented work in 2015 highlighting risk factors in building ML systems that can lead to high maintenance costs in the future, such as undeclared data dependencies, entanglement, and software anti-patterns.

As an example, consider a scenario where data is extracted in the form of logs for a particular purpose. Another group builds an ML system (or multiple interdependent systems) on top of it, assuming that the data will maintain its consistency. If, at some point, the data capture methodology or the nature of the data itself is altered to suit the original purpose, the change will cascade as failures through the hierarchy of dependent systems.
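A common defence against such undeclared data dependencies is a schema check at the ingestion boundary, so the pipeline fails fast and loudly when the upstream log format drifts, rather than silently feeding bad data downstream. The sketch below is illustrative only; the field names and types are hypothetical, not from any real log format.

```python
# Expected log schema, declared explicitly instead of assumed implicitly.
EXPECTED_FIELDS = {"timestamp": str, "user_id": str, "latency_ms": float}

def validate_record(record: dict) -> dict:
    """Raise immediately if a log record no longer matches the schema."""
    missing = EXPECTED_FIELDS.keys() - record.keys()
    if missing:
        raise ValueError(f"upstream schema changed, missing: {sorted(missing)}")
    for field, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(record[field]).__name__}")
    return record

good = {"timestamp": "2015-01-01T00:00:00", "user_id": "u1", "latency_ms": 12.5}
validate_record(good)  # passes silently

bad = {"timestamp": "2015-01-01T00:00:00", "user_id": "u1"}  # field dropped upstream
try:
    validate_record(bad)
except ValueError as e:
    print("ingestion halted:", e)
```

The check turns an implicit, undeclared dependency into a declared contract: when the log producers change their format, the consumers find out at the boundary instead of deep inside a cascade of dependent models.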
