A practical guide on Pandas Sidetable

This post was originally published by Soner Yıldırım at Medium [AI]

The dataset contains 16,598 video games with their sales (in millions) and features such as genre, publisher, and year. Let’s see how many distinct genres exist in the dataset.

df.Genre.nunique()
12

We can see how many games each genre has using value_counts.

df.Genre.value_counts()

The Action and Sports genres have the most video games. We may want to see the share of each genre rather than the raw counts. The normalize parameter of value_counts does this:

df.Genre.value_counts(normalize=True)

Almost 20% of the games belong to the Action genre. I think it is much better to see both count and percentage at the same time. Sidetable provides this convenience, along with some other informative statistics.
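
For comparison, here is a minimal sketch of what getting count and percentage side by side looks like in plain pandas. The toy data is invented for illustration and is not the video game dataset; the column labels "count" and "percent" are simply names chosen for this sketch.

```python
import pandas as pd

# Toy data, invented for illustration; not the video game dataset itself.
df = pd.DataFrame(
    {"Genre": ["Action", "Action", "Action", "Sports", "Sports", "Puzzle"]}
)

counts = df["Genre"].value_counts()                # absolute counts per genre
shares = df["Genre"].value_counts(normalize=True)  # fractions that sum to 1

# Both Series are indexed by genre, so they align row by row.
summary = pd.DataFrame({"count": counts, "percent": shares * 100})
print(summary)
```

It works, but it takes two calls and a manual join for something we usually want in one step.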

Once sidetable has been imported, it is used as an accessor on dataframes through the stb keyword.

df.stb.freq(['Genre'])

As you can see, both count and percent values are displayed. In addition to these, cumulative values are also provided. For instance, the first 4 genres constitute 53% of the entire dataset.
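
To make the cumulative columns concrete, the following sketch rebuilds a similar table by hand in pandas. It is only an approximation of the idea, on invented toy data; the column labels are chosen for this example and the code is not taken from sidetable itself.

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame(
    {"Genre": ["Action"] * 4 + ["Sports"] * 3 + ["Puzzle"] * 2 + ["Racing"]}
)

# value_counts sorts from the most to the least frequent category.
freq = df["Genre"].value_counts().rename("count").to_frame()
freq["percent"] = freq["count"] / freq["count"].sum() * 100

# Running totals down the sorted table give the cumulative columns.
freq["cumulative_count"] = freq["count"].cumsum()
freq["cumulative_percent"] = freq["percent"].cumsum()
print(freq)
```

Reading down the cumulative percent column answers questions like "how many categories cover half of the data?" directly.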

In some cases there are so many categories that displaying all of them is impractical. The thresh parameter of sidetable limits the displayed values based on a threshold on the cumulative percent. For example, we can display the platforms that account for 70% of all video games.

df.stb.freq(['Platform'], thresh=70)

The top 8 platforms by number of video games account for 69.96% of the entire dataset. The remaining platforms are combined under the “others” label, which can be changed with the other_label parameter.

df.stb.freq(['Platform'], thresh=70, other_label="Other Platforms")
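
The thresholding idea can be sketched in plain pandas as well. The helper below, freq_with_threshold, is a hypothetical function written for this illustration. Its cutoff rule (keep the categories whose cumulative percent is at or below the threshold, lump the rest together) is an assumption inferred from the 69.96% result above, not sidetable's actual implementation, and the platform data is invented.

```python
import pandas as pd


def freq_with_threshold(series, thresh, other_label="others"):
    """Frequency table that lumps categories past a cumulative-percent cutoff.

    Hypothetical helper for illustration; the cutoff rule is an assumption.
    """
    counts = series.value_counts()  # sorted, most frequent first
    cum_pct = counts.cumsum() / counts.sum() * 100
    keep = cum_pct <= thresh        # categories inside the cutoff
    kept = counts[keep]
    rest = counts[~keep].sum()      # total count of everything else
    if rest > 0:
        kept = pd.concat([kept, pd.Series({other_label: rest})])
    table = kept.rename("count").to_frame()
    table["percent"] = table["count"] / table["count"].sum() * 100
    table["cumulative_percent"] = table["percent"].cumsum()
    return table


# Toy data, invented for illustration.
platforms = pd.Series(["DS"] * 5 + ["PS2"] * 3 + ["Wii"] + ["X360"])
table = freq_with_threshold(platforms, thresh=85, other_label="Other Platforms")
print(table)
```

With this toy data, the two largest platforms stay as separate rows and the two smallest are merged into a single "Other Platforms" row.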