My odyssey: finding the most popular Python function

We all love Python, but which of its mighty functionalities do we actually use, and how often? An article about my quest to figure it out.

The most mentioned Python functions in Python repositories, calculated via GitHub commits. Image by Author

The other day, while I was running zip() on some lists through a map(), I couldn’t help noticing how much my Python style has changed over the years.
We have all asked ourselves this question before: what do other people do with this beautiful language? What functions do they use?
As a data scientist, I aimed at something slightly more measurable: what is the most mentioned Python functionality in GitHub commits?
In the following article, I will

Discuss the limitations of such a question and in how many ways I failed to find the answer
Show how I collected the data from GitHub
And most importantly, teach you how to lure Medium readers to your article with cool racing bars

Initially, I started this project to figure out how often Python functions are called. I quickly noticed that on GitHub you can look this up in no time: just use the search function!

Number of print() calls on GitHub. Image by Author

Problem Solved!
Well, not quite…

The issue is that these results are volatile. Run the same search several times and you get a different number almost every time! For example, here is the same search run again:

Number of print() calls on GitHub when running the search again. Image by Author.
We get a very different result…
GitHub API
GitHub has a fantastic search API!

Problem Solved!
Well, not quite…

The issue here is that, for code, the API only returns roughly the first 34k results. After trying for quite some time to get something useful out of it, I had to accept that GitHub won’t let me do it this way, and sadly our question can’t be answered the easy way.
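For reference, a call to the search API might look roughly like the sketch below. It assumes GitHub’s REST endpoint `https://api.github.com/search/code`, a personal access token (code search there needs an authenticated request), and reads only the `total_count` field of the JSON response; the helper names are hypothetical. The reported total can be large, but only a limited slice of the matching items can actually be paged through, which is exactly the wall described above.

```python
import json
import urllib.parse
import urllib.request

# GitHub's REST code-search endpoint (assumed to require authentication).
API_ROOT = "https://api.github.com/search/code"


def build_code_search_url(term: str, language: str = "Python", per_page: int = 100) -> str:
    """Build a code-search URL for `term`, restricted to one language."""
    query = f"{term} language:{language}"
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    return f"{API_ROOT}?{params}"


def parse_total_count(body: str) -> int:
    """Read the reported number of matches from a JSON search response."""
    return int(json.loads(body)["total_count"])


def fetch_total_count(term: str, token: str) -> int:
    """Perform the request; `token` is assumed to be a personal access token."""
    request = urllib.request.Request(
        build_code_search_url(term),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_total_count(response.read().decode("utf-8"))


print(build_code_search_url("print("))
# → https://api.github.com/search/code?q=print%28+language%3APython&per_page=100
```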
GitHub Search function via Commits
After quite some time, I discovered that one can search commits in the Python language and filter them by date!

Problem Solved!
Well, not quite…

While this way of searching seems to be quite reliable, it produces a lot of false positives. For example, it also counts commits to repositories that contain only a little Python, and such a commit may mention the function name in some other sense.
While this is not ideal, I decided to take this route since it allows for a comparison over time. Also, I tried every other approach I could think of; if you find a better way, please let me know in the comments. Generally, this data has to be taken with a lot of skepticism, but I hope it teaches us some valuable lessons. Most certainly, it creates a killer plot 😉

We have our approximation of how to find the answer. Now, all we have to do is call the GitHub API!

Problem Solved!
Well, not quite…

The issue seems to be that this API is meant more for actual searches inside your own repositories. GitHub appears to have a hard limit on the number of links it returns: it seems to search for X seconds, then stop and return whatever it has found so far. This makes a lot of sense, since dealing with such vast amounts of data is very expensive. Sadly, it also makes our journey to an answer so much harder.
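This timeout behaviour lines up with what GitHub documents for its REST search endpoints: when a query runs out of time, the response contains the matches found so far and sets an `incomplete_results` flag. A minimal sketch, assuming a JSON body from an endpoint such as `/search/commits`, of refusing to trust such partial counts. The class and function names are made up for illustration, and the sample bodies in the usage lines are invented.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    total_count: int
    incomplete: bool  # True if GitHub stopped the search early


def parse_search_response(body: str) -> SearchResult:
    """Read the count and the timeout flag from a REST search response."""
    data = json.loads(body)
    return SearchResult(
        total_count=int(data["total_count"]),
        incomplete=bool(data.get("incomplete_results", False)),
    )


def trusted_count(body: str) -> Optional[int]:
    """Return the count only if the search ran to completion, else None."""
    result = parse_search_response(body)
    return None if result.incomplete else result.total_count


print(trusted_count('{"total_count": 7, "incomplete_results": false, "items": []}'))  # → 7
print(trusted_count('{"total_count": 3, "incomplete_results": true, "items": []}'))  # → None
```

Returning `None` rather than a number makes it impossible to accidentally plot a truncated count as if it were complete.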
Since we refuse to give up, we decided to call their website directly and parse the answer from the returned HTML! While this is neither elegant nor simple, we ain’t no quitters.
Let’s build our link. An example link might look like this: {function}%28+language%3A{language}+type%3Acommits+committer-date%3A%3C{before_year}-01-01&type=commits

Example link, Image by Author
As we can see, we basically look for three things:

function: What function do we want to know about? e.g. len()
language: What programming language? e.g. Python
before_year: Before what year? e.g. 2000
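The odd-looking characters in the link are just URL encoding: `%28` is `(`, `%3A` is `:`, `%3C` is `<`, and `+` is a space. The sketch below uses the standard library’s `urllib.parse.quote_plus` to rebuild the query part of the example link from the three parameters. The part of the link in front of `{function}` isn’t shown in the example, so this deliberately produces only the query portion; `commit_search_query` is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def commit_search_query(function: str, language: str, before_year: int) -> str:
    """Encode the commit search the same way as the example link.

    quote_plus turns '(' into %28, ':' into %3A, '<' into %3C and spaces into '+'.
    """
    raw = f"{function}( language:{language} type:commits committer-date:<{before_year}-01-01"
    return quote_plus(raw) + "&type=commits"


print(commit_search_query("len", "Python", 2000))
# → len%28+language%3APython+type%3Acommits+committer-date%3A%3C2000-01-01&type=commits
```

Letting `quote_plus` do the escaping means a function name containing other special characters gets encoded correctly too, instead of relying on a hand-escaped template.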

When we feed these parameters to GitHub, it tells us how many commits mentioning the function were made before that date!
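Asking that question once per year yields a cumulative time series for each function, which is the kind of data behind a racing bar chart. A minimal sketch of the sweep, where `fetch_count` is a hypothetical callable standing in for whatever performs one lookup (for example, requesting the link and reading the number from the HTML). The optional pause between requests is an assumption added to stay polite to GitHub’s servers, not something the article specifies.

```python
import time
from typing import Callable, Dict, Iterable


def cumulative_counts(
    functions: Iterable[str],
    years: Iterable[int],
    fetch_count: Callable[[str, int], int],
    pause_seconds: float = 0.0,
) -> Dict[str, Dict[int, int]]:
    """Return {function: {year: commits mentioning it before Jan 1 of that year}}.

    fetch_count(function, before_year) is a hypothetical stand-in for one lookup.
    """
    year_list = list(years)
    table: Dict[str, Dict[int, int]] = {}
    for function in functions:
        table[function] = {}
        for year in year_list:
            table[function][year] = fetch_count(function, year)
            time.sleep(pause_seconds)  # optional delay between requests
    return table


def fake_lookup(function: str, year: int) -> int:
    """Invented numbers, only to show the shape of the result."""
    return len(function) * (year - 2000)


print(cumulative_counts(["len", "print"], range(2001, 2004), fake_lookup))
# → {'len': {2001: 3, 2002: 6, 2003: 9}, 'print': {2001: 5, 2002: 10, 2003: 15}}
```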
Calling this link returns an HTML page that we can filter to get our answer. The code for doing so could look like this:

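import urllib.request

search_term = "len"  # the function to look up, e.g. len()
language = "Python"
before_year = 2000

# create the url using a search term, a language and a year
url_base = f"{search_term}%28+language%3A{language}+type%3Acommits+committer-date%3A%3C{before_year}-01-01&type=commits"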
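Once the page is downloaded, the number still has to be pulled out of the HTML. The sketch below assumes, hypothetically, that the results page states the total as text like `1,234,567 commits`; GitHub’s real markup isn’t shown in this article and changes over time, so inspect the actual HTML and adapt `COUNT_PATTERN` accordingly. `fetch_html` uses only `urllib.request` and expects a complete URL; the browser-like `User-Agent` header is likewise an assumption.

```python
import re
import urllib.request
from typing import Optional

# Hypothetical: assumes the results page states the total as "<number> commit(s)".
COUNT_PATTERN = re.compile(r"(\d[\d,]*)\s+commits?\b", re.IGNORECASE)


def fetch_html(url: str) -> str:
    """Download a page; the browser-like User-Agent is an assumption."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_commit_count(html: str) -> Optional[int]:
    """Return the first '<number> commit(s)' figure in the page, or None."""
    match = COUNT_PATTERN.search(html)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


print(parse_commit_count("<h3>We've found 1,234,567 commits</h3>"))  # → 1234567
print(parse_commit_count("<p>No results</p>"))  # → None
```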
