Machine learning/ AI for Research — use it or refuse it?

This post was originally published by Florian Huber at Medium [AI]

To start, try getting a clear idea of what a machine learning solution would have to achieve. Then go through the flowchart below.

Application of ML to problems

Machine learning models can fail in unexpected ways

The clearest answer I can give on when not to use machine learning is: when you come with a mission-critical task. That’s essentially anything that should never go wrong and where no final human inspection takes place. In those cases: forget about AI.

Sure, there is autonomous driving, clearly a mission-critical application. But AI systems failing in unexpected ways have drastically reset expectations in that area. So that only confirms my point: don’t use AI for it.

Simple flowchart to decide if machine learning could potentially be of interest. Even if the answer is yes, that of course still doesn’t mean it will really make sense for a particular problem/data combination. Hopefully the second flowchart below can then help you further.

When to consider AI/ ML solution

Flowchart by Florian Huber (a high-res PDF is linked from the original post).

Must the results remain the same at all times?

Think for instance of a distance measure used for clustering. Classical measures for this (cosine, Euclidean, etc.) will always return the same values, now and in a hundred years. In contrast, many machine learning solutions are moving targets because they involve training a model on a given dataset. Say your model outputs a distance value for two inputs, classifies an input, or clusters a set of inputs. In all such cases the outcome can change each time the training dataset changes.
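To make the contrast concrete, here is a minimal Python sketch. The plain Euclidean distance is a fixed formula. The "learned" distance is only a toy stand-in for a trained model (my assumption for illustration, not a method from this post): it rescales each feature by a standard deviation estimated from training data, so its output for the very same pair of inputs shifts as soon as the training set changes. All function names and numbers are made up.

```python
import math
import statistics


def euclidean(a, b):
    """Classical Euclidean distance: depends only on the two inputs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_scales(training_rows):
    """'Training' step of the toy model: one standard deviation per feature."""
    columns = list(zip(*training_rows))
    return [statistics.pstdev(column) for column in columns]


def scaled_distance(a, b, scales):
    """Euclidean distance after dividing each feature by its fitted scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


# The same two inputs are compared every time.
a, b = [1.0, 10.0], [2.0, 30.0]

# Two versions of a (hypothetical) training set; v2 has one extra row.
train_v1 = [[0.0, 0.0], [2.0, 20.0], [4.0, 40.0]]
train_v2 = train_v1 + [[6.0, 400.0]]

fixed = euclidean(a, b)
learned_v1 = scaled_distance(a, b, fit_scales(train_v1))
learned_v2 = scaled_distance(a, b, fit_scales(train_v2))

print("fixed distance:       ", fixed)
print("learned, training v1: ", learned_v1)
print("learned, training v2: ", learned_v2)
```

The first number stays the same no matter what data you have collected in the meantime. The other two differ, although the inputs did not change.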

Even worse, unless you take extreme care, the results will be slightly different each time you train a model on the same data (shocking, isn’t it!?). You can still make things more reproducible, for instance by properly versioning your trained models. But better not to think in terms of a hundred years.
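One common source of this run-to-run variation is randomness in the training procedure itself, such as the random starting weights and the random order in which samples are visited. The sketch below is a deliberately tiny example under my own assumptions (a one-parameter model `y ≈ w * x`, a hypothetical `train` function, invented data). Without a seed, two runs on identical data typically end at slightly different weights. With a fixed seed, they match exactly.

```python
import random


def train(data, seed=None, epochs=5, lr=0.01):
    """Fit y ~ w * x with stochastic gradient descent and return w."""
    rng = random.Random(seed)  # seed=None means a fresh random state per call
    w = rng.uniform(-1.0, 1.0)  # random initialisation
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)  # random sample order
        for x, y in rows:
            # gradient step on the squared error (w * x - y) ** 2
            w -= lr * 2 * (w * x - y) * x
    return w


data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# Same data, no seed: results are close to each other but usually not identical.
run_a = train(data)
run_b = train(data)

# Same data, same seed: results are identical.
seeded_a = train(data, seed=42)
seeded_b = train(data, seed=42)

print("unseeded runs:", run_a, run_b)
print("seeded runs:  ", seeded_a, seeded_b)
```

In real projects a fixed seed is often not sufficient on its own (library versions and hardware can also matter), which is one reason why storing and versioning the trained model itself is the safer habit.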

Must the results be easy to explain/ understand?

Many people interested in machine learning will have heard about the “black-box problem”, which usually refers to the fact that the model outcome cannot easily be explained due to the high model complexity. To get this right out of the way: Even the fanciest deep learning networks are not truly black boxes, because each mathematical operation done to get a certain outcome can in principle be accessed.

The key problem actually is that it’s extremely hard to translate this into an understandable pattern. So, if you want an explanation as simple as “A is close to B because the Euclidean distance is below x”, then deep learning is usually not the route to go.

Some classical machine learning techniques might be worth considering here, because several of them, when applied to a reasonable number of features, usually do provide very human-accessible explanations.
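As one illustration of what a human-accessible explanation can look like, here is a sketch of a decision stump, i.e. a decision tree with a single split. This is my own pick for the example and not necessarily a technique the author had in mind. The function name `fit_stump` and the toy data are hypothetical. The whole trained "model" is a single threshold that can be read out as a plain sentence.

```python
def fit_stump(values, labels):
    """Find the threshold on one numeric feature with the fewest errors.

    The stump predicts True when value > threshold.
    Assumes at least two distinct values. Returns (threshold, errors).
    """
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    candidates = [(low + high) / 2 for low, high in zip(ordered, ordered[1:])]
    best = None
    for threshold in candidates:
        errors = sum((v > threshold) != label for v, label in zip(values, labels))
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical toy data: did a sample react at a given temperature?
temperatures = [12.0, 15.0, 18.0, 21.0, 24.0, 27.0]
reacted = [False, False, False, True, True, True]

threshold, errors = fit_stump(temperatures, reacted)
rule = f"predict 'reacted' if temperature > {threshold}"
print(rule, f"({errors} training errors)")
```

The explanation is the rule itself. That stays manageable with a handful of features and shallow trees, and gets harder to read as models grow.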

Good enough if results (on average) outperform current measures?

Finally, if what you really want to achieve is mostly to outperform current techniques, things look different. Say you want to get better query results than before (but a human would inspect them in the end anyway), or you want to classify things with higher accuracy (failing occasionally is fine though), or you want better predictions than you currently have (like all predictions, those are allowed to fail occasionally, just maybe less often). In such cases, machine learning is probably worth considering.
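What "better on average" means in practice can be sketched as follows: score the current method and the machine learning candidate on the same held-out examples, then also check where the new method fails although the old one did not. The labels and predictions below are invented for illustration, and accuracy is just one possible metric.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Hypothetical held-out labels and the outputs of two methods.
truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
baseline_preds = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
model_preds = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

baseline_acc = accuracy(baseline_preds, truth)
model_acc = accuracy(model_preds, truth)

# Cases the old method got right but the new model gets wrong.
regressions = [
    i
    for i, (b, m, t) in enumerate(zip(baseline_preds, model_preds, truth))
    if b == t and m != t
]

print("baseline accuracy:", baseline_acc)
print("model accuracy:   ", model_acc)
print("new failures at indices:", regressions)
```

Here the candidate wins clearly on average, yet it still gets one example wrong that the baseline handled correctly. That is exactly the kind of occasional failure you need to be able to live with.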

It is important to notice that this point of “outperforming current measures” is broader than only achieving higher accuracy. Often machine learning can also provide faster, more scalable solutions. In big data times, that can make a huge difference.

Now we reach the difficult part: will machine learning actually work for your particular problem and data?

If you have never applied any machine learning yourself, this seems near-impossible to answer. The second-best thing you can do (the best thing, again, is to talk to someone experienced with machine learning!) is to search for related work where machine learning was applied.

This is tedious, and you might have to dig your way through mountains of jargon and unnecessarily complex and incomprehensible papers. Going through this pain, however, will hopefully give you a more realistic picture of what you can obtain with the type and amount of data you have. Otherwise, those incredibly shiny results that make the headlines can easily cause unrealistic expectations. As an orientation, have a look at the flowchart below:

Quick check if you are onto something with your machine learning idea. Don’t forget, it’s a flowchart, not an expert. If you come across any question where your answer is neither Yes nor No, but “I don’t know”, you unfortunately have to suffer a bit more and read another 5 related blog posts, tutorials, or papers (the latter if you like suffering). Or, maybe I repeat myself…, just talk to someone with more experience in machine learning.

Hint: People that have experience with machine learning tend to drink coffee. If they don’t, they drink tea. Asking someone with machine learning experience to discuss an idea over a cup of coffee/ tea will probably help to finish this flowchart.

If you are serious about using machine learning for your research (maybe you are applying for funding, deciding on future steps in your project, or looking for new collaborations), better spend some time gaining a basic intuition of how machine learning could help with your problem. Don’t worry, you don’t have to become a machine learning expert yourself, but I imagine you also don’t want your next proposals to sound like this:

“We have data (or hope to have it), and then we want to do magic (which we simply call deep learning).”

OK then. No magical solutions from deep learning. Still, I am convinced that there is plenty of opportunity to enhance scientific research through machine learning.
– Yes, it can be hard to do right.
– And no, it is no silver bullet.
– But it is a whole lot of fun to do! And even if it doesn’t achieve what was hoped for, it will give you new insights about your data and your research question.
– And sometimes… well… sometimes… there are those rare moments when you press enter and then start to see this magical glittering that seems to come from your screen.
