The promise and peril of AI

This post was originally published by Tapani Toivonen at Medium [AI]

Artificial intelligence (AI) has matured and evolved over the last decade. Advances such as GANs, convolutional networks, deep reinforcement learning and better hardware enable analysis and prediction so accurate and profound that AI models can seem superhuman to us, with capabilities beyond comprehension. AI models constantly analyze Internet traffic, recommend what we should buy next, detect fraud, surveil us (if you will), and beat us at board and video games.

State-of-the-art machine learning (ML) algorithms derive their power almost entirely from neural networks: models inspired by how the brain and nervous system operate. The term 'deep learning' is often used as if it were a synonym for AI. These deep learning models are usually remarkably deep neural networks (neurons after neurons after neurons…) that outperform us humans at diagnosing illnesses, mastering board games, trading stocks for profit, driving cars safely, and many other tasks that require highly sophisticated reasoning abilities.

There are, however, many other domains in AI. They are probably not as attractive as deep learning, but they are still significantly important: optimization, constraint satisfaction, swarm intelligence, evolutionary computing, planning and so forth. Despite being important topics in their own right, at least one thing ties these approaches to deep learning: they are all remarkably hard.

One might not realize how hard training a deep learning model is when one has grown used to near-perfect image recognition on one's mobile phone, near-perfect speech recognition after starting a conversation by just shouting 'Hey Siri' or 'Ok Google', or being spooked by the amazingly accurate recommendation systems on Netflix or Spotify.

First, deep learning requires an enormous amount of data: millions upon millions of examples to learn to generalize (and thus, for instance, to predict). Second, the standard method for training a deep learning model is an optimization technique called stochastic gradient descent (SGD), which is at its best very volatile. SGD tries to minimize an objective function. In the context of ML, the objective function is typically the error rate (how often the model makes mistakes, for instance), and SGD adjusts the model so that the value of the objective function becomes as small as possible.
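To make the idea concrete, here is a minimal sketch of SGD (not the article's exact setup): fitting a one-variable linear model by repeatedly nudging its parameters against the gradient of a squared-error objective, one randomly chosen sample at a time. The data, learning rate and step count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + 0.5 + rng.normal(0, 0.05, size=200)  # true w = 3.0, b = 0.5

w, b = 0.0, 0.0      # start from an uninformed model
lr = 0.1             # learning rate: how big each "fix" is
for step in range(2000):
    i = rng.integers(len(X))   # one random sample: the "stochastic" part
    err = (w * X[i] + b) - y[i]
    w -= lr * err * X[i]       # gradient step for the weight
    b -= lr * err              # gradient step for the bias
```

After the loop, `w` and `b` end up close to the true values 3.0 and 0.5, because here the objective happens to be convex; the article's point is that deep networks offer no such guarantee.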

The 'funny' thing about SGD is that only recently have researchers put serious effort into formal proofs of how good SGD really is. It has been known for a while that SGD usually does not 'fix' a deep learning model so that it achieves the smallest possible error rate; instead, SGD gets stuck somewhere between the model's initial state and its best possible state. The recent findings are stunning: SGD is only guaranteed to work if the input data follow some distribution or certain other constraints are satisfied. In general, SGD improves the model only a little, if even that.

What is more, the so-called vanilla neural network, the multilayer perceptron (one of the oldest deep learning architectures), was shown long ago to be very hard to train even up to a given accuracy level, even under very simple assumptions. Computer scientists generally believe that these hardness results will never be refuted (I have to admit that part of me disagrees with the majority). In recent years, these hardness indications have been extended to more sophisticated deep learning methods as well: it seems that deep learning is computationally extremely expensive and very time consuming.

So how do the big tech companies deal with this hardness and manage to train models with superhuman performance? Well… mostly with trial and error. They have enormous amounts of data, the resources to fail, and luck, of course. Dealing with the hardness of training a deep learning model usually begins with the assumption that the model is not required to be perfect. Instead, 'good enough' is good enough.

There have also been other recent approaches to training deep learning models besides traditional SGD. Provably efficient methods from convex optimization work to some extent. Probably the greatest and most promising finding is a result showing that if, instead of the smallest possible model, one uses a model whose size is greatly exaggerated, SGD will provably bring the model to a near-perfect state in reasonable time.
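The overparameterization idea can be sketched in a few lines of numpy. Below, a deliberately wide one-hidden-layer ReLU network (far more neurons than data points) is trained with plain gradient descent on a tiny dataset; the width, learning rate and step count are my own illustrative choices, not values from the results the article alludes to, and full-batch gradient descent stands in for SGD here.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 1))
y = X ** 2                                # a simple nonlinear target

width = 512                               # far more neurons than data points
W1 = rng.normal(0.0, 1.0, size=(1, width))
b1 = rng.normal(0.0, 0.5, size=width)
W2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))

lr = 0.003
for _ in range(8000):
    h = np.maximum(X @ W1 + b1, 0.0)      # hidden ReLU activations
    g_pred = 2.0 * (h @ W2 - y) / len(X)  # gradient of mean squared error
    g_h = (g_pred @ W2.T) * (h > 0)       # backprop through the ReLU
    W2 -= lr * (h.T @ g_pred)
    W1 -= lr * (X.T @ g_h)
    b1 -= lr * g_h.sum(axis=0)

mse = float(np.mean((np.maximum(X @ W1 + b1, 0.0) @ W2 - y) ** 2))
```

With this much excess capacity the training error is driven close to zero, which is the behavior the overparameterization results predict; a network with only a handful of hidden units typically gets stuck at a visibly worse error.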

So while deep learning has been empirically shown to be superior to traditional ML and AI to some extent, building such a predictive model (at least a small one) is extremely hard, perhaps beyond reasonable and efficient computation. Some hope has risen lately, but the future will eventually tell whether deep learning will answer the questions that humankind has sought to resolve over the last centuries.


