NLP made easy: Perform basic NLP operations with ease using TextBlob

Published by FirstAlign

NLP (Natural Language Processing) is the branch of Artificial Intelligence that enables humans and computers to interact using natural language. NLP is particularly important today, given the abundance of data on the internet and elsewhere that exists in the form of natural language. NLP is what allows us to understand and interpret that data. It has a wide variety of use cases, some of which are given below.

Spelling Correction: If you have ever used a word processor, you will have come across the spell check option: when you type a misspelled word, the word processor tells you it is wrong and suggests the correct spelling. This was one of the very first use cases of NLP.

Spam Detection: Spam detection is also an early use case of NLP. The computer decides whether a text, email, or message is spam or genuine. You will have experienced this with any email account.

Sentiment Analysis: With the rise of social media platforms where people use natural language to communicate, analyzing the sentiment of text can be very useful. For example, in disaster management we can use sentiment analysis to target a particular geographical region and review the sentiment of texts from that location. This helps us observe and understand the effects of the disaster.

Sentiment analysis is one of the most prominent use cases of NLP.

Text Generation: NLP is not only used to analyze a given text but can also generate text. One example of text generation is a chatbot. With advances in processing power and improvements in algorithms, text generation can even be used for tasks such as writing poetry, stories, or even code, as discussed in AI can write stories, poetry, and code.

These are just some of the use cases for NLP; the field has a vast array of them. If we look under the hood of any of these use cases, we will see that it is built from a collection of NLP operations. In this post we will explore these NLP operations and demonstrate how to perform them with TextBlob.

TextBlob

According to its website, “TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks. These include part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, to name but a few.”

Our reason for using TextBlob is the simplicity of its API. It allows NLP operations to be performed with just a few lines of code.

Getting Started with TextBlob

TextBlob is a Python library. To use it, we need to install it on our local machine by running the following commands:

$ pip install -U textblob
$ python -m textblob.download_corpora

After running these commands, TextBlob will be installed and ready to use. For a more extensive installation guide, see the TextBlob documentation.

Let’s dive into the actual work.

Tokenization (splitting text into words and sentences)

Tokenization is the process of splitting a sequence of text into meaningful units. Those units can be words, phrases, or sentences. It is a very basic NLP operation and is usually performed before any other operation, because most NLP operations depend on it. A simple example of tokenization is splitting a sentence into its individual words.

To learn more about tokenization, read one of our other blogs, Tokenization techniques in NLP: How a sequence of text can be split into meaningful units?

To perform tokenization, we need to create a TextBlob. This is an object that wraps the sequence of text on which we will perform an NLP operation.

We first create a TextBlob from a text sequence such as “TextBlob is a Python library.” Calling “text.words” splits the text sequence into words, and “text.sentences” splits it into sentences, i.e. the text is tokenized into sentences.

Tokenization with TextBlob

Parts of Speech tagging (POS tagging)

Parts of speech tagging is the process of classifying each word according to its function in speech. In this process, a sequence of text is tokenized into words, and then each word in that sequence is assigned a function, its ‘part of speech’. The figure below shows an example of POS tagging.

POS Tagging Example

To perform POS tagging using TextBlob, we first need a sequence of text. We then use “text.tags”, which returns a list of tuples, each consisting of a word and its part of speech.


POS Tagging using TextBlob

Lemmatization

Lemmatization is the process of removing the inflectional endings of words to convert them into their base form, i.e. the dictionary version of the word. This is done with the help of a vocabulary and morphological analysis of the word.

The diagram below shows an example of lemmatization.

Example of Lemmatization

To perform lemmatization, we need a word to operate on. We then apply the lemmatize function, which returns the root or base form of the word. Here we take the word “corpora”, which is converted to “corpus”, its dictionary version.

Lemmatization example

Spelling Correction

One of the earliest use cases for Natural Language Processing, employed by many text editors, is spelling correction. In this operation, the spelling of a word is checked against a vocabulary. If it is wrong, the correct spelling is suggested. The suggestion is based on a probability value for each possible correct spelling: the candidate with the highest probability is the most likely to be correct.

To perform spelling correction in TextBlob we need a word. For instance, we wrote an incorrect spelling of the word “spell” and then used the “spellcheck()” function. It returned the candidate corrections with their confidence values: “spell” with 90% confidence and “sped” with 10%.

Spelling correction example

If we have a complete sentence and want TextBlob to correct all the spelling in it, we just have to create a TextBlob from the text and call the “correct()” function. It returns the text with the spelling corrected.

Spelling correction in a sentence

These are a few of the most common uses of NLP; there are many others.

Final Thoughts

In this blog we have covered some very basic NLP operations and use cases to demonstrate the application of NLP in real terms. First, we discussed what NLP is. Following this, we looked at some basic NLP operations and how to perform them using TextBlob.

We chose TextBlob for its ease of use, and we also covered how to get started with it. The full code is available on GitHub.

Hope you enjoyed the article. Stay tuned, and Happy Coding ❤
