Analyzing the chaotic Presidential Debate 2020 with text mining techniques


This post was originally published by Feng Lim at Towards Data Science

Thanks to the internet, the whole world now knows that the first Presidential Debate of 2020 went out of control. All of the major news stations reported on how the participants interrupted and sniped at one another.

I decided to put together an article that analyzes the words used in the event to see whether there are any hidden insights.

This article focuses on finding the most-used words for each speaker and on a sentiment analysis of the speeches.

The first 2020 Presidential Debate overview

Candidates:

– Incumbent President Donald Trump
– Former Vice President Joe Biden (Democratic nominee)

Moderator:

– Chris Wallace

Topics covered:

  1. The candidates’ political records
  2. The Supreme Court
  3. The coronavirus
  4. The economy
  5. Race and violence in cities
  6. The integrity of the election

Cleaning the dataset

In total, close to 20,000 words were used in the event. After removing names and common stop words, around 6,000 words were left for analysis.
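
The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for this article: the stop-word list, the name list, and the `clean_tokens` helper are all hypothetical, and a real analysis would use a much fuller stop-word list from an NLP library.

```python
import re

# Hypothetical, deliberately tiny lists for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "you", "i", "that", "it", "in"}
NAMES = {"trump", "biden", "wallace", "chris", "joe", "donald"}


def clean_tokens(text):
    """Lowercase the text, split it into words, and drop stop words and names."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and t not in NAMES]


# Made-up sentence, not a quote from the transcript.
sample = "Chris, the economy is the topic that I want to discuss."
print(clean_tokens(sample))
```

Applying a filter like this to every utterance is what shrinks the full transcript down to the words that carry content.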

The first Presidential Debate in short

The word correlation network graph illustrates how words are used either in the same sentence or next to each other in the debate. I grouped some of the words into networks that may be relevant to the topics covered in the debate:

  • The U.S. economy topic includes words such as ‘affordable’, ‘job’, ‘act’, etc.
  • The Supreme Court topic includes words such as ‘justice’, ‘reason’, ‘judge’, etc.
  • The race and violence in cities topic includes words such as ‘peaceful’ and ‘protest’
  • The election topic includes words such as ‘ballots’, ‘management’, ‘mail’, etc.
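
One simple way to build the edges of such a network is to count how often two words appear in the same sentence. The sketch below shows that idea only; the `cooccurrence_counts` helper, the sample sentences, and the threshold of 2 are assumptions made for illustration, and the graph in this article may have been built with a different correlation measure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair appears in the same sentence."""
    pair_counts = Counter()
    for tokens in sentences:
        # Use the set of words so a pair is counted once per sentence,
        # and sort it so ('mail', 'ballots') and ('ballots', 'mail') match.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical, already-cleaned sentences (not taken from the transcript).
sentences = [
    ["mail", "ballots", "fraud"],
    ["ballots", "mail", "count"],
    ["peaceful", "protest"],
]
counts = cooccurrence_counts(sentences)
# Pairs that co-occur at least twice would become edges in the network graph.
edges = [pair for pair, n in counts.items() if n >= 2]
print(edges)
```

Raising the threshold keeps only the strongest word pairs, which is what makes topic clusters such as the election words stand out.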

Classifying the debate with its corresponding part-of-speech

We can also tag each participant’s words with a part-of-speech category (noun, proper noun, adjective, adverb, etc.). This section looks specifically at the most frequent proper nouns, nouns, and adjectives used by Trump and Biden.

  • ‘China’ is the most used proper noun by Trump
  • ‘People’ is used by Trump and Biden more than 60 times each in the debate
  • Interestingly, Trump’s most-used adjective is ‘wrong’, while Biden’s is ‘true’

Images by Author: Part-of-speech bar charts
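
Once every word carries a part-of-speech tag, the bar charts come down to counting words per speaker and per tag. Here is a minimal sketch of that counting step, assuming the tagging has already been done by some POS tagger. The `tagged` triples and the `top_words` helper are hypothetical, and the tag names (PROPN, NOUN, ADJ) follow the Universal POS convention rather than anything stated in this article.

```python
from collections import Counter

# Hypothetical (speaker, word, tag) triples, as a POS tagger might emit them.
tagged = [
    ("Trump", "China", "PROPN"),
    ("Trump", "people", "NOUN"),
    ("Trump", "wrong", "ADJ"),
    ("Trump", "China", "PROPN"),
    ("Biden", "people", "NOUN"),
    ("Biden", "true", "ADJ"),
    ("Biden", "people", "NOUN"),
]


def top_words(tagged_tokens, speaker, tag, n=3):
    """Return the n most frequent words for one speaker and one POS tag."""
    counts = Counter(
        word.lower()
        for who, word, pos in tagged_tokens
        if who == speaker and pos == tag
    )
    return counts.most_common(n)


print(top_words(tagged, "Trump", "PROPN"))
print(top_words(tagged, "Biden", "NOUN"))
```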

The polarity of the debate

One of the basic tasks of sentiment analysis is determining the polarity of a given text: whether the opinions expressed in it are positive, negative, or neutral.

The first plot below shows the polarity of each speaker’s text over the course of the debate. Blue ticks mark negative opinions, and red ticks mark positive opinions. Interestingly, there is an empty stretch between sentences 1500 and 2500 for Trump.

The second plot below shows the number of negative, neutral, and positive words used by each debate participant.

Overall, it seems that Biden and Trump expressed nearly equal numbers of positive and negative opinions.
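
A common way to get counts like these is to look each word up in a polarity lexicon. The sketch below shows the idea under loud assumptions: the `POSITIVE` and `NEGATIVE` sets are a hypothetical mini lexicon (real lexicons hold thousands of entries), and `polarity_counts` is an illustrative helper, not the method used to produce the plots in this article.

```python
from collections import Counter

# Hypothetical mini lexicon for illustration only.
POSITIVE = {"won", "affordable"}
NEGATIVE = {"wrong", "disaster"}


def polarity_counts(tokens):
    """Count positive, negative, and neutral words in a list of tokens."""
    counts = Counter({"positive": 0, "negative": 0, "neutral": 0})
    for token in tokens:
        if token in POSITIVE:
            counts["positive"] += 1
        elif token in NEGATIVE:
            counts["negative"] += 1
        else:
            # Any word missing from the lexicon is treated as neutral.
            counts["neutral"] += 1
    return counts


# Made-up token list, not taken from the transcript.
tokens = ["won", "wrong", "people", "affordable", "job"]
counts = polarity_counts(tokens)
print(dict(counts))
```

Running the same function on each speaker’s tokens separately gives the per-speaker totals that a bar chart can display.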

Positivity of each participant in the Presidential Debate

The stacked bar chart below shows each speaker’s overall polarity by categorizing their speeches as either positive or negative.

Based on the graph, Biden seems to use more positive words than Trump in the debate. Next, we will look at each speaker’s exact words, colored by polarity.

Positivity word clouds by each spokesperson

In the word clouds, words with negative sentiment are colored blue and words with positive sentiment are colored red. The size of each word depends on its frequency.

Trump’s most-used positive word is ‘won’, while Biden’s is ‘affordable’. Both participants used the word ‘wrong’ often. The word ‘support’ also shows up in all of the word clouds.

Next, we will look at some sentiment analysis in the debate.

Sentiment analysis of the First Presidential Debate 2020

The bar chart below shows the count of words used in the debate associated with each emotion. Words associated with the positive emotion of “trust” occurred the most, whereas words associated with the negative emotion of “disgust” occurred the least in the debate.
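
Counts like these usually come from a lexicon that maps each word to one or more emotions. The sketch below illustrates the counting step only. The `EMOTION_LEXICON` entries are invented for this example and are not taken from any published lexicon, and `emotion_counts` is a hypothetical helper rather than the code behind the chart.

```python
from collections import Counter

# Hypothetical word-to-emotion lexicon; a word may map to several emotions.
EMOTION_LEXICON = {
    "law": ["trust"],
    "vote": ["trust", "anticipation"],
    "deal": ["anticipation", "trust"],
    "disaster": ["fear", "sadness"],
}


def emotion_counts(tokens, lexicon=EMOTION_LEXICON):
    """Count how many word occurrences are associated with each emotion."""
    counts = Counter()
    for token in tokens:
        # Words missing from the lexicon contribute to no emotion.
        for emotion in lexicon.get(token, []):
            counts[emotion] += 1
    return counts


# Made-up token list, not taken from the transcript.
tokens = ["law", "vote", "deal", "people", "disaster"]
counts = emotion_counts(tokens)
print(counts.most_common())
```

Because one word can belong to several emotions, the emotion totals can add up to more than the number of words analyzed.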

Next, we will look at the exact words related to each emotion, categorized by speaker:

Wallace’s most frequent word is ‘sir,’ which occurred more than 40 times in the debate.

Based on the chart below, words including ‘vote,’ ‘deal,’ and ‘tax’ are frequently mentioned by Biden.

Trump frequently mentions words including ‘military,’ ‘law,’ and ‘job.’


The debate caused such a stir on the internet that I thought it would be fun to analyze the event’s speeches. One particular word stood out to me from the analysis: ‘people.’ The word was mentioned frequently by both participants and seemed to be the only topic they were aligned on. ‘People’ is also linked to other words such as ‘job,’ ‘affordable,’ ‘court,’ etc., that are related to the major topics discussed.
