Why 87% of Machine Learning projects fail

This post was originally published by Prajeen Vijayan at Hacker Noon

This article will serve as a lesson on the shocking reasons behind your AI adoption disaster. We see news about machine learning everywhere, and there is indeed a lot of potential in it. Yet according to Gartner's predictions, "Through 2020, 80% of AI projects will remain alchemy, run by wizards whose talents will not scale in the organization," and at VentureBeat's Transform 2019 it was predicted that 87% of AI projects will never make it into production.

Why is that? Why do so many projects fail? Here are 10 reasons why.

1. Not Enough Expertise

One reason is that the technology is still new to a large audience. In addition, most organizations are still unfamiliar with the software tools and the hardware it requires.

It seems that today, anyone who has worked in data analytics or software development and has done a few sample data science projects labels themselves a data scientist after taking a short online course.

The fact is that experienced data scientists are needed to handle most machine learning and AI projects, especially when it comes to defining the success criteria, deploying the model and monitoring it continuously.

2. Disconnect Between Data Science and Traditional Software Development

The disconnect between data science and traditional software development is another major factor. Traditional software development tends to be more predictable and measurable.

Data science, however, is still part research and part engineering.

Data science research moves forward through repeated iteration and experimentation. Sometimes the whole project has to loop back from the deployment phase to the planning phase because the chosen metric turns out not to drive user behaviour.

The predictable, cycle-by-cycle deliveries of a traditional Agile project cannot always be expected from a data science project. This causes large-scale confusion for leaders who are used to clear deliverables at the end of each task cycle in conventional software development projects.

3. Volume and Quality of Data

It is widely accepted that, in general, the larger the dataset, the better the predictions from the AI system. Beyond the direct cost of handling higher volumes, however, a lot of new challenges arise as the size of the data grows.

In many such cases, you will have to merge data from multiple sources. Once you start doing so, you will realize that the sources are often not in sync, which causes a lot of confusion. Sometimes you will end up merging data that was never meant to be merged, leaving you with data points that share the same name but have different meanings.

Bad data at best will produce results that aren’t actionable or insightful. Bad data can also lead to misleading results.

Source: DZone
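To make the merging problem concrete, here is a minimal sketch in plain Python (standard library only). The two record sets, the `customer_id` and `revenue` fields, and the `merge_sources` helper are all hypothetical. The CRM stores gross revenue while billing stores net revenue under the same field name, and one ID is written in a different case. Instead of letting one `revenue` silently overwrite the other, the helper keeps both values under source-tagged names and reports keys that failed to match.

```python
# Hypothetical records from two systems. Both have a "revenue" field, but the
# CRM stores gross revenue while billing stores net revenue: same name,
# different meaning.
crm = [
    {"customer_id": "C001", "revenue": 1200.0},
    {"customer_id": "C002", "revenue": 800.0},
    {"customer_id": "c003", "revenue": 450.0},  # lower-case ID: sources out of sync
]
billing = [
    {"customer_id": "C001", "revenue": 1000.0},
    {"customer_id": "C002", "revenue": 700.0},
    {"customer_id": "C003", "revenue": 400.0},
]


def merge_sources(left, right, key, left_tag, right_tag):
    """Inner-join two lists of dicts on `key`.

    Fields present in both sources are kept under source-tagged names
    (e.g. revenue_crm, revenue_billing) instead of overwriting each other.
    Returns (merged_rows, unmatched_left_keys, unmatched_right_keys).
    """
    right_index = {row[key]: row for row in right}
    merged, unmatched_left = [], []
    for row in left:
        other = right_index.get(row[key])
        if other is None:
            unmatched_left.append(row[key])
            continue
        out = {key: row[key]}
        for field in sorted((set(row) | set(other)) - {key}):
            if field in row and field in other:
                out[f"{field}_{left_tag}"] = row[field]
                out[f"{field}_{right_tag}"] = other[field]
            else:
                out[field] = row[field] if field in row else other[field]
        merged.append(out)
    left_keys = {row[key] for row in left}
    unmatched_right = [k for k in right_index if k not in left_keys]
    return merged, unmatched_left, unmatched_right


merged, only_crm, only_billing = merge_sources(crm, billing, "customer_id", "crm", "billing")
print(len(merged))   # 2
print(only_crm)      # ['c003']
print(only_billing)  # ['C003']
print(merged[0])     # {'customer_id': 'C001', 'revenue_crm': 1200.0, 'revenue_billing': 1000.0}
```

The unmatched lists surface the out-of-sync IDs, and the tagged columns make the "same name, different meaning" conflict visible instead of hiding it. In practice a dataframe library would do the join, but the checks are the same idea.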

4. Labeling of Data

The unavailability of labeled data is another challenge that stalls many machine learning projects. According to MIT Sloan Management Review,

76% of the people combat this challenge by attempting to label and annotate training data on their own and 63% go so far as to try to build their own labeling and annotation automation technology.

This means that a large share of data scientists' expertise is spent on labeling rather than on modeling, which is a major obstacle to executing an AI project effectively.

This is why many companies outsource the labeling task to other companies. However, labeling is hard to outsource when it requires significant domain knowledge. Companies will have to invest in formal, standardized training of annotators if they want to maintain quality and consistency across datasets.
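One common way to check whether annotators are labeling consistently (a general practice, not something the article prescribes) is to have two annotators label the same items and compute Cohen's kappa, which measures their agreement corrected for the agreement expected by chance. The minimal sketch below assumes two equal-length lists of labels; the `cohens_kappa` function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items.

    1.0 means perfect agreement; 0.0 means no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own
    # label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)


annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog", "cat", "cat"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.33
```

Here the annotators agree on 4 of 6 items (67%), but because chance alone would give 50% agreement, kappa is only about 0.33. A low score like this suggests the labeling guidelines or annotator training need work before the data is used for training.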

Another option is to develop your own data labeling tool if the data to be labeled is complex. However, this often requires more engineering effort than the machine learning task itself.

5. Organizations are Siloed

Data is the most important element of a machine learning project. In most organizations, this data resides in different places, under different security constraints and in different formats: structured, unstructured, video files, audio files, text and images.

Having data in different places and formats is a challenge in itself. The challenge doubles, however, when the organization is siloed and the responsible individuals are not collaborating with each other.


6. Lack of Collaboration

Lack of collaboration between teams such as data scientists, data engineers, data stewards, BI specialists, DevOps and engineering is another major challenge. It matters most between engineering and data science, since the two teams differ considerably in how they work and in the technology they use to deliver a project.

It is the engineering team that implements the machine learning model and takes it to production, so there needs to be a shared understanding and strong collaboration between the two teams.

7. Technically Infeasible Projects

Since machine learning projects tend to be extremely expensive, most enterprises aim for a hyper-ambitious "moon-shot" project that will completely transform the company or the product and deliver an outsized return on investment.

Such projects take a very long time to complete and push the data science team to its limits.

Ultimately, business leaders lose confidence in the project and stop investing. It is always best to focus on a single, achievable project with a clear scope that targets a simple business challenge.

8. Alignment Problem Between Technical and Business Teams

ML projects are often started without the business and data science teams agreeing on the project's expectations, goals and success criteria.

Such projects stay in the research stage indefinitely: because the objective was never clear, the team can never tell whether it is making progress.

Here, the data science team focuses mainly on accuracy, whereas the business team is more interested in metrics such as financial benefits or business insights. In the end, the business team does not accept the outcome delivered by the data science team.

Source: Help Net Security
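A small worked example shows how accuracy and business value can point in opposite directions. The sketch below is entirely hypothetical. The fraud-detection scenario, the confusion-matrix counts and the per-error costs (`COST_MISSED_FRAUD`, `COST_FALSE_ALARM`) are made-up numbers chosen for illustration, not figures from the article.

```python
# Hypothetical confusion-matrix counts for two fraud models evaluated on
# 1,000 transactions, 50 of which are fraudulent.
models = {
    "model_a": {"tp": 0, "fn": 50, "fp": 0, "tn": 950},   # always predicts "not fraud"
    "model_b": {"tp": 40, "fn": 10, "fp": 60, "tn": 890},
}

# Assumed business costs (illustrative only):
COST_MISSED_FRAUD = 500.0  # loss per fraudulent transaction that is not flagged
COST_FALSE_ALARM = 20.0    # manual review cost per legitimate transaction flagged


def accuracy(c):
    """Fraction of all transactions classified correctly."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def business_cost(c):
    """Total cost of the model's mistakes under the assumed prices."""
    return c["fn"] * COST_MISSED_FRAUD + c["fp"] * COST_FALSE_ALARM


for name, counts in models.items():
    print(f"{name}: accuracy={accuracy(counts):.1%}, cost=${business_cost(counts):,.0f}")
# model_a: accuracy=95.0%, cost=$25,000
# model_b: accuracy=93.0%, cost=$6,200
```

By accuracy alone, `model_a` looks better. Under the assumed costs, however, it is about four times as expensive to the business. Agreeing on a metric like `business_cost` up front is one way to make the success criteria explicit for both teams.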

9. Lack of Data Strategy

According to MIT Sloan Management Review, only 50% of large enterprises (those with more than 100,000 employees) are likely to have a data strategy. Developing a solid data strategy before you start a machine learning project is critical.

As part of your data strategy, you need a clear understanding of:

  • The total data you have in the company
  • How much of that data is really required for the project
  • Who needs access to this data, and how easily they can get it
  • A specific strategy for bringing the data together from its different sources
  • How to clean and transform the data

10. Lack of Leadership Support

It is easy to think that you just need to throw some money and technology at the problem and the results will come automatically.

Too often, leadership does not provide the support needed to create the conditions for success. Sometimes business leaders do not have confidence in the models developed by the data scientists.

This can stem from a combination of the business leaders' limited understanding of AI and the data scientists' inability to communicate the model's business benefits to leadership.

Ultimately, leaders need to understand how Machine learning works and what AI really means for the organization.
