Kite expands its AI code completions from 2 to 13 programming languages

Kite, which suggests code snippets for developers in real time, today added support for 11 more programming languages, bringing its total to 13. In addition to Python and JavaScript, Kite’s AI-powered code completions now support TypeScript, Java, HTML, CSS, Go, C, C#, C++, Objective-C, Kotlin, and Scala.

The Bitcoin Mempool: Where Transactions Take Flight

One of Bitcoin’s strengths, and the thing that makes it unique in the finance world, is its radical transparency. Blockchain data is like a window: you can see right through it. But if blockchain data is a window, it often feels less like the one in your apartment that you look out of (solemnly during a pandemic, possibly), and more like this:

How to use Machine Learning Models to predict Loan eligibility

Build predictive models to automate the process of targeting the right applicants.
Loans are the core business of banks. The main profit comes directly from the loan’s interest. Loan companies grant a loan after an intensive process of verification and validation. However, they still have no assurance that the applicant will be able to repay the loan without difficulty.

My odyssey, finding the most popular Python function

We all love Python, but how often do we use each of its mighty functions? An article about my quest to figure it out.

The most mentioned Python functions inside Python repositories, calculated via GitHub commits. Image by Author

The other day, while I was running some lists through zip() and map(), I couldn’t help noticing how much my Python style has changed over the years.
We have all asked ourselves this question before: what do other people do with this beautiful language? What functions do they use?
As a data scientist, I aimed at something slightly more measurable: what is the most mentioned Python functionality in GitHub commits?
In this article, I will

Discuss the limitations of such a question and in how many ways I failed to find the answer
Show how I collected the data from GitHub
And most importantly, teach you how to lure Medium readers to your article with cool racing bars

Initially, I started this project to figure out how often Python functions are called. I quickly noticed that on GitHub, you can look this up in no time. Use the search function!

Number of print() functions on GitHub. Image by Author

Problem Solved!
Well not quite…

The issue is that these results are volatile. Calling this search several times can return a very different count each time. For example, calling it again gives:

Number of print() functions on GitHub when calling it again. Image by Author

We get a very different result…
GitHub API
GitHub has a fantastic search API!

Problem Solved!
Well not quite…

The issue here is that they only offer roughly the first 34k results for code. After trying for quite some time to get something useful out of it, I had to accept that they won’t allow me to do it this way. Sadly, our question can’t be answered the easy way.
GitHub Search Function via Commits
After quite some time, I discovered that one can search commits in the Python language by time!

Problem Solved!
Well not quite…

While this way of searching seems to be quite reliable, it produces a lot of false positives. For example, it will show commits to repositories that contain only a little bit of Python. The commit may then include the words or functions in some sense.
While this is not ideal, I decided to take this route since it allows for a comparison over time. I also tried every other way I could think of; if you find a better way, please let me know in the comments. Generally, this data has to be taken with a lot of skepticism, but I hope it teaches us some valuable lessons. Most certainly, it creates a killer plot 😉

We have our approximation of how to find the answer. Now, all we have to do is call the GitHub API!

Problem Solved!
Well not quite…

The issue seemed to be that this API is supposed to be more for actual searches inside your repositories. GitHub seems to have a hard limit on the number of links they return to you. They seem to look for X seconds and then stop, and return whatever they got so far. This makes a lot of sense since dealing with such vast amounts of data is very expensive. Sadly it also makes our journey to an answer so much harder.
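The early cut-off described above is visible in the API response itself: GitHub's REST search responses carry a total_count field and an incomplete_results flag that is set when a query ran out of time. Below is a minimal sketch of reading both, assuming the JSON body has already been downloaded. The function name and the sample body are made up for illustration and are not real GitHub data.

```python
import json


def summarize_search_response(response_text):
    """Return (total_count, incomplete_results) from a search response body."""
    payload = json.loads(response_text)
    return payload["total_count"], payload["incomplete_results"]


# Hand-written sample body for illustration only, not real GitHub data.
sample_body = '{"total_count": 1200, "incomplete_results": true, "items": []}'

total, incomplete = summarize_search_response(sample_body)
if incomplete:
    # The search stopped early, so the count is only a lower bound.
    print(f"at least {total} matches (search timed out)")
else:
    print(f"{total} matches")
```

When the flag is true, the count only reflects what was found before the time limit, which is one reason repeated calls can disagree.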
Since we refuse to give up, we decide to call their website and parse the answer from the returned HTML! While this is neither elegant nor simple, we ain’t no quitters.
Let’s build our link. An example link might look like this:

https://github.com/search?q={function}%28+language%3A{Language}+type%3Acommits+committer-date%3A%3C{before_year}-01-01&type=commits

Example link, Image by Author
As we can see, we are looking for basically three things:

function: What function do we want to know about? e.g. len()
language: What programming language? e.g. Python
before_year: Before what year? e.g. 2000
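The three parameters can be dropped into the link template with a small helper. This is only a sketch: the template is copied from the example link above, while the names SEARCH_TEMPLATE and build_search_url are made up for illustration. The function name is passed without parentheses because the template already appends the encoded opening bracket (%28).

```python
from urllib.parse import quote

# Template copied from the example link; the placeholders are filled in below.
SEARCH_TEMPLATE = (
    "https://github.com/search?q={function}%28+language%3A{language}"
    "+type%3Acommits+committer-date%3A%3C{before_year}-01-01&type=commits"
)


def build_search_url(function, language, before_year):
    """Build the commit-search link for one function, language, and year."""
    return SEARCH_TEMPLATE.format(
        function=quote(function, safe=""),   # percent-encode unusual characters
        language=quote(language, safe=""),
        before_year=int(before_year),
    )


print(build_search_url("len", "Python", 2000))
```

Looping over a list of function names and a range of years with this helper would give one link per data point.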

When we feed these parameters to GitHub, it tells us how many commits mentioning that function were made before that date!
Calling this link returns an HTML file that we can filter to get our answer. The code for doing this could look like this:

import urllib.request
language = 'Python'
before_year = 2000
# create the url using a year and a language
url_base = f"https://github.com/search?l=Python&q={search_term}%28+language%3A{language}+type%3Acommits+committer-date%3A
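Since the listing above is cut off, here is a rough sketch of the remaining step: downloading the page and filtering the count out of the HTML. It rests on a loud assumption, namely that the page contains text of the form "1,234 commit results". The real markup may differ and can change at any time. The helper names and the sample HTML are invented for illustration.

```python
import re
import urllib.request

# Assumption: the result page contains text like "1,234 commit results".
COUNT_PATTERN = re.compile(r"(\d[\d,]*)\s+commit results", re.IGNORECASE)


def fetch_html(url):
    """Download a page and return its body as text (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_commit_count(html):
    """Return the number of commit results found in the HTML, or None."""
    match = COUNT_PATTERN.search(html)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Invented sample markup, so the parser can be tried without calling GitHub.
sample_html = "<h3>1,234,567 commit results</h3>"
print(extract_commit_count(sample_html))
```

Returning None instead of raising keeps a long scraping run alive when a single page comes back in an unexpected shape.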

How to plot a Decision Boundary for Machine Learning Algorithms in Python

Classification algorithms learn how to assign class labels to examples (observations or data points), although their decisions can appear opaque. A popular diagnostic for understanding the decisions made by a classification algorithm is the decision surface.

Create reproducible Machine Learning experiments using Sacred

To give an example of how to use this powerful framework, I am going to use the dataset from a Kaggle competition, Real or Not? NLP with Disaster Tweets. This competition is a binary classification problem where you are supposed to decide whether a tweet is describing an actual disaster or not. Here are two examples…

Data Scientist: 12 steps from beginner to pro

Data science is not only neural networks, but also classical statistics and machine learning algorithms, and overall everything related to the analysis, processing, and presentation of information in digital form. It cannot yet be said that there is a clear division of labor in Data Science — this is a non-specialized profession.
