TextBlob spelling correction



This post was originally published by Nishanth N at Towards Data Science

What is TextBlob?

TextBlob is a Python library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.

Why TextBlob?

NLU (natural language understanding) is a subset of NLP in which unstructured data, such as a sentence, is converted into a structured form so that end-to-end interactions can be handled. Relation extraction, semantic parsing, sentiment analysis, and noun phrase extraction are a few examples of NLU tasks. TextBlob, which is built on top of NLTK, is very useful in these areas, where working with NLTK directly is less efficient.

Spelling Correction with TextBlob

Photo by Romain Vignes on Unsplash

STEP: 1 → Installing TextBlob

Tweets, reviews, and blog posts often contain typos, so we first need to correct them; otherwise the data holds several different spellings of the same word, all carrying the same meaning.
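To make the duplication concrete: in the input text used in this post, Toronto is spelled five different ways (Toronto, Torronto, Tornto, Toront, Torunto). The minimal sketch below counts those variants in a shortened excerpt adapted from that text. spelling_variants is a hypothetical helper, and grouping words by a shared prefix is only a crude illustration, not a real deduplication method.

```python
import re
from collections import Counter

def spelling_variants(text, prefix):
    """Count distinct lowercase word tokens that start with prefix.

    Hypothetical helper for illustration only: prefix matching is crude
    and would also catch unrelated words such as 'torontonians'.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tok for tok in tokens if tok.startswith(prefix))

# Shortened excerpt adapted from the input text, typos included
sample = ("After the broadly disputed Torronto Purchase, the town was renamed "
          "the city of Toronto. The diverse population of Tornto reflects its role. "
          "Toront is a prominent center. Torunto is known for its skyscrapers.")

print(spelling_variants(sample, "tor"))
```

Each variant is counted as a separate word, which is exactly the fragmentation that spelling correction is meant to remove before further analysis.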

Installing TextBlob on your computer is very simple: install it with pip (pip install textblob).

STEP: 2 → Load the Input for preprocessing

We have to feed the computer clean input from the start so that it can be trained well for natural language understanding and processing.

Next, load the input text (a .docx file) whose spelling you need to correct; the text we are about to handle is Immigrants in Toronto.
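The post does not show how the .docx file is read. As a minimal sketch, assuming only the standard library is available (a package such as python-docx, via docx.Document(path).paragraphs, is a common alternative), the hypothetical helper read_docx_text below relies on the fact that a .docx file is a ZIP archive whose body lives in word/document.xml, with paragraphs stored as w:p elements and text runs as w:t elements.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def read_docx_text(source):
    """Return the plain text of a .docx file, one line per paragraph.

    Hypothetical helper: source may be a file path or a binary file object.
    Tables, headers, and footnotes get no special handling.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text of every run inside this paragraph
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Usage would look like text = read_docx_text("input.docx"), where the file name is a placeholder.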

Next, clean the input text with a regular expression, since we don’t need any numeric characters.
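The exact pattern is not shown in the post. The cleaned text appears consistent with replacing every character other than letters, periods, and commas with a space: digits, parentheses, hyphens, and the apostrophe in "Canada's" all disappear, while commas and periods stay. A minimal sketch under that assumption (clean_text is a hypothetical name):

```python
import re

def clean_text(text):
    """Keep only letters, periods, and commas, then normalize spacing.

    Assumed pattern, inferred from the cleaned text rather than taken from
    the original code.
    """
    # Replace digits, brackets, hyphens, apostrophes, etc. with a space
    text = re.sub(r"[^A-Za-z.,]", " ", text)
    # Collapse the runs of spaces left behind into single spaces
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("for more than 10,000 years"))
print(clean_text("630.2 km2 (243.3 sq mi)."))
```

This explains fragments such as "for more than , years" and "area of . km . sq mi ." in the cleaned text: removing the numbers leaves their separators behind.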

Here is the cleaned input text:

People have travelled through and inhabited the Toronto area, located on a broad sloping plateau interspersed with rivers, deep ravines, and urban forest, for more than , years. After the broadly disputed Torronto Purchase, when the Mississauga surrendered the area to the British Crown, the British established the town of York in and later designeted it as the capital of Upper Canada. During the War of , the town was the site of the Battle of York and suffered heavy damage by American troops. York was renamed and incorporated in as the city of Toronto. It was designated as the capitel of the province of Ontario in during Canadian Confederation. The city proper has since expanded past its original borders through both annexation and amalgamation to its current area of . km . sq mi . The diverse population of Tornto reflects its current and historical role as an important destination for immigrants to Canada. More than percent of residants belong to a visible minority population group, and over distinct ethnic origins are represented among its inhabitats. While the majority of Torontonians speak English as their premary language, over languages are spoken in the city. Toront is a prominent center for music, theatre, motion picture production, and tilevision production, and is home to the headquarters of Canada s major notional broadcast networks and media outlets. Its varied caltural institutions, which include numerous museums and gelleries, festivals and public events, entertaiment districts, national historic sites, and sports actevities, attract over million touriets each year. Torunto is known for its many skysvrapers and high rise buildinds, in particalar the tallest free standind structure in the Western Hemisphere, the CN Tower.

STEP: 3 → Identify the misspelled tokens

First, identify the misspelled tokens so that their spelling can be corrected. For this, we can use the SpellChecker class from the spellchecker module (installed with pip install pyspellchecker).

Output:

{'skysvrapers', 'entertaiment', 'standind', 'residants', 'rivers,', 'forest,', 'torronto', 'torunto', 'actevities,', 'touriets', 'toront', 'particalar', 'gelleries,', 'production,', 'designeted', 'purchase,', 'hemisphere,', 'area,', 'capitel', 'ravines,', 'premary', 'inhabitats.', 'ecents,', 'tilevision', 'toronto.', 'tornto', 'buildinds,', 'confederation.', 'sites,', 'caltural', 'theatre,', 'institutons,', 'language,', 'music,', 'troups.', 'torontonians', 'mississauga'}
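The output above includes tokens with punctuation still attached, such as 'rivers,' and 'toronto.', which suggests the text was split on whitespace before the lookup, so words like "rivers" are flagged only because of the trailing comma. The minimal sketch below extracts letter-only tokens first. find_misspelled is a hypothetical helper, and a small hand-made vocabulary stands in for the dictionary so the example runs without extra packages; with pyspellchecker installed, SpellChecker().unknown(tokens) would play the role of the vocabulary lookup.

```python
import re

def find_misspelled(text, vocabulary):
    """Return the lowercase word tokens of text that are not in vocabulary.

    Tokens come from a letters-only regex, so trailing punctuation such as
    the comma in 'rivers,' never becomes part of a token. The vocabulary is
    an assumption standing in for a real spelling dictionary.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tok for tok in tokens if tok not in vocabulary}

vocab = {"the", "tallest", "free", "standing", "structure", "rivers", "and"}
print(find_misspelled("rivers, and the tallest free standind structure.", vocab))
```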

STEP: 4 → Error correction process

Store the cleaned text in a TextBlob object.

Use the correct() method to attempt spelling correction.

Output:

People have travelled through and inhabited the Toronto area, located on a broad sloping plateau interspersed with rivers, deep ravines, and urban forest, for more than , years. After the broadly disputed Torronto Purchase, when the Mississauga surrendered the area to the British Grown, the British established the town of Work in and later designed it as the capital of Upper Canada. During the War of , the town was the site of the Battle of Work and suffered heavy damage by American troops. Work was renamed and incorporated in as the city of Toronto. It was designate as the capital of the province of Ontario in during Canadian Confederation. The city proper has since expanded past its original borders through both annexation and amalgamation to its current area of . km . sq mi . The diverse population of Onto reflect its current and historical role as an important destination for immigrants to Canada. More than percent of residents belong to a visible minority population group, and over distinct ethnic origins are represented among its inhabitants. While the majority of Torontonians speak English as their primary language, over languages are spoken in the city. Front is a prominent center for music, theatre, motion picture production, and television production, and is home to the headquarters of Canada s major national broadcast network and media outlets. Its varied cultural institutions, which include numerous museums and galleries, festival and public events, entertainment districts, national historic sites, and sports activities, attract over million tories each year. Torunto is known for its many skysvrapers and high rise buildings, in particular the tables free standing structure in the Western Hemisphere, the of Power.
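The corrected text fixes many typos (residents, primary, television, cultural) but also introduces new errors: York became Work, Tornto became Onto, and Toront became Front. TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector", as implemented in the Pattern library, and reports it as about 70% accurate. The simplified Norvig-style sketch below is not TextBlob's code, and WORD_COUNTS is a hypothetical toy frequency table. It prefers a known word, then known words one edit away, then two edits away, and picks the most frequent candidate, so a proper noun missing from the word list is plausibly swapped for a nearby common word.

```python
from collections import Counter

# Hypothetical toy frequency table; a real corrector counts words in a large corpus.
# Note that proper nouns such as "york" and "toronto" are deliberately absent.
WORD_COUNTS = Counter({
    "work": 500, "front": 300, "onto": 200, "capital": 150,
    "tables": 100, "residents": 80, "tallest": 10,
})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(words):
    """Keep only the words that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}

def correct_word(word):
    """Pick the most frequent known candidate, preferring fewer edits."""
    w = word.lower()
    candidates = known([w]) or known(edits1(w)) or known(edits2(w)) or {w}
    best = max(candidates, key=lambda c: WORD_COUNTS[c])
    # Restore title case so "York" maps to "Work" rather than "work"
    return best.title() if word.istitle() else best

for typo in ["capitel", "residants", "York", "Toront", "Tornto"]:
    print(typo, "->", correct_word(typo))
```

With this toy table, correct_word("York") returns Work and correct_word("Tornto") returns Onto, mirroring the output above. One option for text full of place names would be to add such domain words to the word list, or to skip capitalized tokens, before correcting.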

The full code is available on GitHub.

I hope you now understand how to use TextBlob for spelling correction. Thanks for reading!


