Machines that can understand human speech: The conversational pattern of AI

This post was originally published by Elisha Moskel at Medium [AI]

Early in the evolution of artificial intelligence, researchers realized the power and possibility of machines that can understand the meaning and nuances of human speech. Conversation and human language are particularly challenging areas for computers, since words and communication are not precise. Human language is filled with nuance, context, cultural and societal depth, and imprecision that can lead to a wide range of interpretations. If computers can understand what we mean when we talk, and then communicate back to us in a way we can understand, then clearly we’ve accomplished a goal of artificial intelligence.

Conversational interaction as a pattern of AI

This particular application of AI is so profound that it makes up one of the seven fundamental patterns of AI: the conversation and human interaction pattern. The fundamental goal of the conversational pattern is to enable machines to communicate with humans in natural human language, and to communicate back to humans in language they understand. Instead of requiring humans to conform to machine modes of interaction such as typing, swiping, clicking, or using computer programming languages, the power of the conversational pattern is that we can interact with machines the way we interact with each other: by speaking, writing, and communicating in the ways our brains are already wired to understand.

Many of today’s narrow applications of AI focus on human communication. If a computer can understand what a human means when they communicate, we can create all manner of applications of practical value: from chatbots and conversational agents, to systems that can read what we write in our documents and emails, to systems that can accurately translate from one human language to another without losing meaning and context.

Machine-to-human, machine-to-machine, and human-to-machine interactions are all examples of how AI communicates and understands human communication. Some real-life examples include voice assistants, content generation, chatbots, sentiment analysis, mood analysis, intent analysis, and machine-powered translation. The applications of the conversational pattern are so broad that entire market sectors are focused on AI-enabled conversational systems, from conversational finance to telemedicine and beyond. Beyond simply understanding written or spoken language, the power of the conversational pattern can be seen in a machine’s ability to understand sentiment, mood, and intent, or to take visual gestures and translate them into machine-understandable forms.
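As a concrete illustration of one of these applications, sentiment analysis can be sketched at its simplest as a lexicon lookup. This is a toy example: the word lists below are illustrative assumptions, not a real lexicon, and production systems learn sentiment from trained models rather than hand-written rules.

```python
# Toy lexicon-based sentiment analysis: count positive vs. negative words.
# The lexicons here are illustrative assumptions for demonstration only.
POSITIVE = {"great", "love", "helpful", "fast", "excellent"}
NEGATIVE = {"slow", "hate", "broken", "confusing", "terrible"}

def sentiment(text: str) -> str:
    """Label text positive, negative, or neutral by lexicon word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The assistant was fast and helpful"))   # positive
print(sentiment("The menu system is confusing and slow"))  # negative
```

Even this crude approach shows the shape of the task: mapping free-form human language onto a judgment a machine can act on.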

Natural Language Processing: evolving over the past few decades

Accurately processing and generating human language is particularly complicated, and the technology has evolved constantly over the past sixty years. One of the easier problems to solve is the conversion of audio waveforms into machine-readable text, known as Automatic Speech Recognition (ASR). While ASR is somewhat complicated to implement, it doesn’t generally need machine learning or AI capabilities, and fairly accurate speech-to-text technologies have been around for decades. Speech-to-text is not natural language understanding, however. While the computer is transcribing what the human is saying, it is merely converting waveforms it recognizes into words. It is not interpreting the data it is hearing.

The inverse capability, text-to-speech, also doesn’t require much in the way of machine learning or AI. Text-to-speech is simply the generation of waveforms by the computer to speak words that are already known. There is no understanding of the meaning of those words when simply using text-to-speech. The technology behind text-to-speech has been around for years; you can hear it in the movie WarGames (1983): “Shall we play a game?”

However, speech-to-text and text-to-speech aren’t where AI and machine learning are truly needed, even though machine learning has helped text-to-speech become more human-sounding and speech-to-text more accurate. Natural language processing (NLP) involves more than transcribing and generating audio waveforms. Just because you have text doesn’t mean that machines can understand it. To gain that understanding, machines need to identify parts of speech, extract and understand entities, determine the meanings of words, and use much more complicated processing to connect concepts, phrases, and grammar into the larger picture of intent and meaning.
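One of the steps named above, entity extraction, can be sketched with nothing more than regular expressions. This is a deliberately naive illustration, assuming simple patterns for emails, ISO dates, and capitalized names; real NLP pipelines use statistical or neural models precisely because such hand-written rules break down quickly.

```python
import re

# Toy entity extraction: pull structured entities out of free text
# with simplistic, assumed regex patterns (for illustration only).
def extract_entities(text: str) -> dict:
    return {
        "emails": re.findall(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", text),
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text),
        "proper_nouns": re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b", text),
    }

entities = extract_entities("please email Alice Smith at alice@example.com by 2024-06-01")
print(entities["emails"])        # ['alice@example.com']
print(entities["proper_nouns"])  # ['Alice Smith']
```

Note how fragile this is: a sentence starting with a capital letter would be misread as a proper noun, which is exactly why learned models replaced rules for this task.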

Natural language processing consists of two parts: natural language understanding and natural language generation. Natural language understanding is where a computer interprets human input, such as voice or text, and translates it into something the machine can act on as intended. It consists of many subdomains aimed at understanding intent, whether the text was generated from audio waveforms or typed by humans in text-mode interactions such as chatbots or messaging interfaces. Regardless of the approach used, most natural-language-understanding systems share some common components. AI is applied to lexical parsing to apply grammar rules and break sentences into structural components. Once those components are identified, each piece can be semantically analyzed to interpret words based on context and word order. Further logical analysis and deduction can then determine meaning, based on what the various parts refer to, using knowledge graphs and other methods.
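The intent-determination step at the heart of natural language understanding can be sketched as a rule-based classifier. The intents and keyword sets below are illustrative assumptions; production systems learn these mappings from labeled data rather than hard-coding them.

```python
# Toy intent classifier: map an utterance to the intent whose keyword
# set it overlaps most. Intents and keywords are assumed for illustration.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "account", "funds"},
    "book_flight": {"flight", "fly", "ticket"},
    "get_weather": {"weather", "forecast", "rain"},
}

def classify_intent(utterance: str) -> str:
    """Return the best-matching intent, or 'unknown' if nothing overlaps."""
    words = set(utterance.lower().replace("?", "").split())
    best = max(INTENT_KEYWORDS, key=lambda i: len(words & INTENT_KEYWORDS[i]))
    return best if words & INTENT_KEYWORDS[best] else "unknown"

print(classify_intent("What is my account balance?"))  # check_balance
print(classify_intent("Will it rain tomorrow?"))       # get_weather
```

A real system would add the semantic and logical analysis described above; this sketch shows only the skeleton of turning words into machine-usable intent.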

Natural language generation is the process by which the AI prepares communication for humans in a form that is natural and does not sound like it was made by a computer. For a computer process to be considered natural language generation, the computer actually has to interpret content and understand its meaning in order to communicate effectively. This involves reversing many of the steps of natural language understanding: taking concepts and generating human-understandable responses based on how the machine models the way humans communicate.
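In its simplest historical form, this reverse step can be sketched as template filling: the machine holds structured data and renders it as a human-readable sentence. The intents, templates, and slot names below are illustrative assumptions; modern systems generate language with learned models rather than fixed templates.

```python
# Toy natural language generation: render structured machine data
# as a human-readable sentence via templates (assumed for illustration).
TEMPLATES = {
    "check_balance": "Your {account} account balance is {amount}.",
    "get_weather": "Expect {condition} in {city} tomorrow.",
}

def generate_reply(intent: str, slots: dict) -> str:
    """Fill the template for the given intent with slot values."""
    return TEMPLATES[intent].format(**slots)

print(generate_reply("get_weather", {"condition": "light rain", "city": "Oslo"}))
# Expect light rain in Oslo tomorrow.
```

Template-based generation predates modern neural approaches, but it captures the core idea: moving from a machine’s internal representation back to natural human language.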

Why is machine-facilitated conversation so important?

The pattern of human and computer communication receives so much focus because our interactions with systems can be very difficult at times. Typing or swiping can take time without properly communicating our needs, while reading static content such as an FAQ might not be helpful for most customers. People want to interact with machines efficiently and effectively. Yet many user interfaces are quite suboptimal for human interaction, relying on confusing menus, interactive voice response systems that are too simplistic, or rules-based chatbots that fail to satisfy user needs.

The development of more intelligent conversational systems goes back decades, with the ELIZA chatbot first developed in 1966 as an illustration of the possibilities of machine-mediated conversation. Nowadays, users are more familiar with voice assistants such as Alexa, Google Assistant, Apple Siri, and Microsoft Cortana, and with web-based chatbots. However, if you’ve interacted with any of them recently, they still lack understanding in many significant ways. There’s no doubt that much of the work of AI researchers is going into improving the ways that machines can understand and generate human language, and thus reinforcing the power of those applications that leverage the conversational pattern of AI.
