Gesture Recognition: How do machines learn hand gestures?

AI reads gestures

Published by FirstAlign

For a long time, Human-Computer Interaction (HCI) has relied on traditional devices such as the mouse, keyboard, and joystick. As human interfaces evolve, alternative ways of interacting are emerging, and the hand gesture has become an intuitive and natural way to interact with a computer. How does gesture recognition work? Let’s find out.

A hand gesture is a physical movement of the hand, or of another body part, that can be interpreted by a motion sensor. It can be as simple as pointing a finger or as large as a high kick. Gesture-based interfaces let users control devices with hand movements or other body movements. Applications include augmented reality gaming, driving, and shopping.

How do machines recognize hand gestures?

Gesture recognition has three modules:

Detection: The first step is to capture movement. A camera records the movement of body parts through 3D space. Detection is a crucial step; the aim is to separate the moving hand from the background for the subsequent steps. Features such as skin color, shape, or motion are used to detect it.
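
A common classical way to implement the detection step is to isolate skin-colored regions in each frame and treat the largest blob as the hand. The sketch below is a minimal, illustrative Python/OpenCV example; the HSV thresholds and the largest-blob assumption are simplifications that would need tuning for real cameras and lighting.

```python
import cv2
import numpy as np

# Illustrative sketch: isolate skin-colored regions in a frame as a crude hand detector.
# The HSV bounds are assumed values and would need tuning per camera and lighting.
def detect_hand(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)     # assumed lower HSV bound for skin
    upper = np.array([25, 180, 255], dtype=np.uint8)  # assumed upper HSV bound for skin
    mask = cv2.inRange(hsv, lower, upper)
    # Remove small speckles so only larger skin regions remain
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # Assume the largest skin-colored blob is the hand and return its bounding box
    hand = max(contours, key=cv2.contourArea)
    return cv2.boundingRect(hand)  # (x, y, w, h)
```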

Tracking: Tracking follows the detected hand frame by frame, capturing each movement for analysis and interpretation. Tracking hands can be quite a task, as hands move quickly and their appearance can vary dramatically from frame to frame.
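
One simple way to keep the hand's identity from one frame to the next is to match each new detection to the last known position. The nearest-centroid tracker below is a hypothetical sketch that consumes the bounding boxes produced by the detection step; the max_jump threshold is an assumed value, and real systems often rely on optical flow or dedicated trackers instead.

```python
import math

# Hypothetical sketch: follow the hand frame by frame by matching each new detection
# to the last known position (nearest-centroid tracking).
class HandTracker:
    def __init__(self, max_jump=120):
        self.max_jump = max_jump  # assumed max pixel distance a hand moves between frames
        self.last_center = None
        self.trail = []           # one center per frame, used later for interpretation

    def update(self, bbox):
        """bbox is (x, y, w, h) from the detection step, or None if nothing was found."""
        if bbox is None:
            return self.last_center
        x, y, w, h = bbox
        center = (x + w / 2, y + h / 2)
        if self.last_center is not None and math.dist(center, self.last_center) > self.max_jump:
            # Probably a spurious detection; keep the previous estimate for this frame
            return self.last_center
        self.last_center = center
        self.trail.append(center)
        return center
```

Each frame, the detection output is fed to update(), and the accumulated trail of positions is what the recognition step analyzes.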

Recognition: Deep learning algorithms are trained to identify meaningful gestures by correlating them against a pre-built library, or catalog, of gestures. Each recognized gesture is mapped to a real-time action specific to the end-user application; once the gesture has been matched from the library, the system executes the desired action. Deep learning is the key to learning and interpreting movements.
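
As an illustration of the recognition step, the sketch below pairs a small convolutional classifier with a lookup table from gesture labels to actions. The gesture names, the network, and the actions are assumptions made for this example; a real system would first train the model on a labeled gesture dataset for the target application.

```python
import tensorflow as tf

# Assumed gesture "library" and the action each gesture triggers in this example
GESTURES = ["open_palm", "fist", "point", "thumbs_up"]
ACTIONS = {
    "open_palm": lambda: print("pause"),
    "fist": lambda: print("mute"),
    "point": lambda: print("select"),
    "thumbs_up": lambda: print("confirm"),
}

def build_model(input_shape=(64, 64, 3)):
    """A small CNN that maps a cropped hand image to one of the gestures above."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(len(GESTURES), activation="softmax"),
    ])

def recognize_and_act(model, hand_crop):
    """hand_crop: a (64, 64, 3) float array of the tracked hand, scaled to [0, 1]."""
    probs = model.predict(hand_crop[None, ...], verbose=0)[0]
    gesture = GESTURES[int(probs.argmax())]
    ACTIONS[gesture]()  # execute the action matched from the library
    return gesture
```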

Real-time Application: Sign Language to text conversion

Sign language is commonly used by people who are deaf or hard of hearing, but most hearing people do not understand it. This poses a challenge for communication between the two groups and puts deaf people at a further disadvantage, meaning missed opportunities for learning, work, and life. So how can we bridge the gap?

The Rochester Institute of Technology came up with a solution: a system that converts American Sign Language (ASL) to text. Using computer vision and a machine learning model, ASL can be converted into text that can be read on any device.

Google Research has freely published its hand-tracking technology. Google Research engineers Valentin Bazarevsky and Fan Zhang said the open release is intended to serve as “the basis for sign language understanding.” The approach provides high-fidelity hand and finger tracking, using machine learning (ML) to infer 21 3D key points of a hand from a single frame. The method achieves real-time performance on a mobile phone and can also scale to multiple hands.
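
Google ships this approach as MediaPipe Hands, and a few lines of Python are enough to try it on a webcam. The loop below is a minimal sketch assuming the mediapipe and opencv-python packages are installed; it prints the index fingertip position taken from the 21 key points inferred for each detected hand.

```python
import cv2
import mediapipe as mp

# Minimal sketch: run MediaPipe Hands on webcam frames and read the 21 3D key points.
hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV captures BGR
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # Each hand has 21 landmarks with normalized x, y and relative depth z
            tip = hand_landmarks.landmark[mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP]
            print(f"index fingertip: ({tip.x:.2f}, {tip.y:.2f}, {tip.z:.2f})")
    cv2.imshow("hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to stop
        break

cap.release()
hands.close()
```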

Future Scope

The global gesture recognition market is projected to reach USD 30.6 billion by 2025. In the future, gesture recognition devices have the potential to be a big part of everyday life. Television manufacturers are already exploring motion sensor technology with the aim of improving customer experience and convenience. There will be no need to push buttons anymore; a hand gesture could mute your TV, for example. With this in mind, the electronics sector is leading the race to adopt gesture recognition interfaces. Don’t be surprised if your everyday life soon feels like something out of a Hollywood sci-fi movie.
