Solve real-world problems using Deep Learning & Artificial Intelligence

This post was originally published by Vaishnavi Dwivedi at Medium [AI]

Featured image by Aaron Boris on Unsplash

Not your usual object detection blog

Photo by Nik Shuliahin on Unsplash

Every new technology has had a purpose (almost always). Usually, it is a solution to a problem its creator identified while brainstorming. But when we talk about Artificial Intelligence, a largely unexplored yet constantly evolving field of computing, we often get distracted by spruced-up object detection projects.

We all know that correctly classifying handwritten digits or distinguishing between animals are not the applications that researchers like Geoffrey Hinton and Alexey Ivakhnenko had in mind while dedicating their lives to the field. AI has the potential to change the way we live our lives. The proof lies in your pockets. From your favorite video streaming platforms to your online cab services, basic principles of AI are being used everywhere.

But what about those who don't have the equivalent of a 20th-century supercomputer in their pockets, or anything in their pockets for that matter? What about those who go to sleep with raging hunger every night and wake up to the monstrosities of life?

We need to understand that Artificial Intelligence and its subsets are not just about math or algorithms that magically predict something. The data is the real deal. These algorithms without data are just like your phone without the internet. Absolutely useless. (Unless you have offline games installed)

But that's where the problem arises. If we want to use AI to diagnose or cure diseases, prevent the drastic impacts of climate change, or track incoming pandemics, we need real, private data. If only there were a way to train a deep learning model on data that engineers can never retrieve or see.

In recent years, there has been considerable research into preserving user privacy while solving real-world problems that require private user data. Companies like Google and Apple have been heavily investing in decentralized, privacy-preserving tools and approaches to extract the new "oil" without actually "extracting" it from the owner. We will explore a few of these approaches below.

In essence, federated learning is an approach to answering questions with data that resides with users across the world (on different devices). Secure Multiparty Computation (SMPC) is a cryptographic protocol that different workers (or data producers) use to train over decentralized datasets while protecting the privacy of the user. This protocol is used in the federated learning approach to make sure that private data residing with the user cannot be seen by the deep learning practitioners trying to derive insights and solve global issues.
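The core trick behind SMPC can be illustrated with additive secret sharing: a value is split into random shares held by different workers, and the workers can compute on those shares without any single one of them ever seeing the underlying value. Below is a minimal sketch (not the full protocol used by real frameworks; function names and the modulus are my own):

```python
import random

Q = 2**31 - 1  # public modulus; all arithmetic is done mod Q

def share(secret, n_workers=3):
    """Split a secret into n additive shares that sum to it mod Q.
    Any subset of fewer than n shares looks like pure random noise."""
    shares = [random.randrange(Q) for _ in range(n_workers - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Only by pooling ALL shares can the secret be recovered."""
    return sum(shares) % Q

def add_shared(a_shares, b_shares):
    """Each worker adds its own two shares locally; the result is a
    valid sharing of a + b, computed without revealing a or b."""
    return [(x + y) % Q for x, y in zip(a_shares, b_shares)]

a_shares = share(5)
b_shares = share(12)
print(reconstruct(add_shared(a_shares, b_shares)))  # 17
```

Each worker only ever holds meaningless random-looking numbers, yet the sum of the two secrets can still be computed and revealed at the end.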

However, this technique is prone to reverse engineering, as stated in the paper "A generic framework for privacy-preserving deep learning" by members of an organization called OpenMined. They have been trying to popularize the concept of privacy in AI by developing such frameworks on top of common tools used by researchers and practitioners on a daily basis.

The best example of federated learning is your Gboard (if you use an Android device). In simple terms, it takes in the way you type your sentences, analyzes what kind of words you use, and then tries to predict the next word you are likely to type. The fun fact is that the training happens on the device itself (an edge device) without revealing the data to the engineer who designed the algorithm. So complex yet snazzy, eh?
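The server-side half of this scheme is usually some variant of federated averaging: each device trains locally on its own data, and only the resulting model weights (never the data) are sent back and averaged. Here is a toy sketch of that loop using plain linear regression; the data, function names, and hyperparameters are illustrative, not Gboard's actual pipeline:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few SGD steps on its own private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server step: average the returned models, weighted by dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 2))  # each client's data never leaves this loop
    clients.append((X, X @ true_w))

global_w = np.zeros(2)
for _ in range(20):  # communication rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
print(global_w)  # converges close to [2, -1]
```

The server only ever sees weight vectors, yet the global model ends up fitting data it was never shown.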

Imagine you have a dataset with certain entries, perhaps something like whether or not a person had a genetic disadvantage while they suffered from COVID-19. Without actually looking at the information the data carries about each individual, you can try to understand what significance an individual entry has on the aggregate result over the entire dataset. Once we do that, we can gain insight into how that result came into being.

As stated in the book The Algorithmic Foundations of Differential Privacy, "Differential privacy addresses the paradox of learning nothing about an individual while learning useful information about a population."
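A standard way to achieve this guarantee is the Laplace mechanism: compute the true answer to an aggregate query, then add noise calibrated to how much any single individual could change that answer. A minimal sketch for a counting query (the records and function names are hypothetical):

```python
import numpy as np

def dp_count(data, predicate, epsilon=0.5):
    """Answer 'how many records satisfy predicate?' with epsilon-DP.
    A counting query has sensitivity 1 (adding or removing one person
    changes it by at most 1), so Laplace noise with scale 1/epsilon
    statistically hides any individual's presence in the dataset."""
    true_count = sum(predicate(row) for row in data)
    return true_count + np.random.laplace(scale=1.0 / epsilon)

# hypothetical toy records: (age, had_genetic_risk_factor)
records = [(34, True), (61, False), (45, True), (29, True), (70, False)]
noisy = dp_count(records, lambda r: r[1], epsilon=0.5)
print(round(noisy))  # randomized, but near the true count of 3
```

A smaller epsilon means more noise and stronger privacy, which is exactly the accuracy trade-off discussed next.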

However, differential privacy has its drawbacks, as it depends heavily on the nature of the dataset. If the individual entries are too diverse (which is very possible), the overall result may be biased towards certain entries. And if too much noise is added, the output may no longer yield useful insights.

Performing operations on encrypted data without decrypting it is called Homomorphic Encryption. This looks very convenient, as the data stays encrypted and the engineers never need to see what it contains. But it requires a lot of computational power and a lot of time to perform these computations. There are two types of Homomorphic Encryption, depending on the computations they support: Partially Homomorphic Encryption and Fully Homomorphic Encryption. The former supports only one kind of operation on ciphertexts (for example, only addition), whereas the latter can theoretically perform arbitrary operations, but with significant overhead.
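The Paillier cryptosystem is a classic example of partially homomorphic encryption: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. Below is a from-scratch toy implementation with tiny, completely insecure primes, purely to demonstrate the homomorphic property (real deployments use key sizes of thousands of bits):

```python
import math
import random

# Toy Paillier setup -- insecure demo primes, never use in practice.
p, q = 13, 17
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid here because gcd(lam, n) == 1

def encrypt(m):
    """Enc(m) = g^m * r^n mod n^2, with r random and coprime to n."""
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

a, b = encrypt(20), encrypt(22)
# Multiplying ciphertexts adds the plaintexts -- no decryption needed.
print(decrypt((a * b) % n2))  # 42
```

An untrusted server could multiply encrypted values it cannot read, and only the key holder learns the final sum, which is exactly the convenience (and, at real key sizes, the cost) described above.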
