This post was originally published by Austin Poor at Towards Data Science
Find the parts of your algorithm that are ACTUALLY slowing you down
Data scientists often need to write a lot of complex, slow, CPU- and I/O-heavy code — whether you’re working with large matrices, millions of rows of data, reading in data files, or web-scraping.
Wouldn’t you hate to waste your time refactoring one section of your code, trying to wring out every last ounce of performance, when a few simple changes to another section could speed up your code tenfold?
If you’re looking for a way to speed up your code, a profiler can show you exactly which parts are taking the most time, allowing you to see which sections would benefit most from optimization.
A profiler measures the time or space complexity of a program. There’s certainly value in theorizing about the big O complexity of an algorithm but it can be equally valuable to examine the real complexity of an algorithm.
Where is the biggest slowdown in your code? Is your code I/O bound or CPU bound? Which specific lines are causing the slowdowns?
Once you’ve answered those questions you’ll A) have a better understanding of your code and B) know where to target your optimization efforts in order to get the biggest boon with the least effort.
Let’s dive into some quick examples using Python.
You might already be familiar with a few methods of timing your code. You could check the time before and after a line executes like this:
In : start_time = time.time() ...: a_function() # Function you want to measure ...: end_time = time.time() ...: time_to_complete = end_time - start_time ...: time_to_complete Out: 1.0110783576965332
Or, if you’re in a Jupyter Notebook, you could use the magic
%time command to time the execution of a statement, like this:
In : %time a_function() CPU times: user 14.2 ms, sys: 41 µs, total: 14.2 ms Wall time: 1.01 s
Or, you could use the other magic command
%timeit which gets a more accurate measurement by running the command multiple times, like this:
In : %timeit a_function() 1.01 s ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Alternatively, if you want to time your whole script, you can use the bash command
time, like so…
$ time python my_script.py
These techniques are great if you want to get a quick sense of how long a script or a section of code takes to run but they are less useful when you want a more comprehensive picture. It would be a nightmare if you had to wrap each line in
time.time() checks. In the next section, we’ll look at how to use Python’s built-in profiler.
When you’re trying to get a better understanding of how your code is running, the first place to start is cProfile, Python’s built-in profiler. cProfile will keep track of how often and for how long parts of your program were executed.
Just keep in mind that cProfile shouldn’t be used to benchmark your code. It’s written in C which makes it fast but it still introduces some overhead that could throw off your times.
There are multiple ways to use cProfile but one simple way is from the command line.
Before we demo cProfile, let’s start by looking at a basic sample program that will download some text files, count the words in each one, and then save the top 10 words from each to a file. Now that being said, it isn’t too important what the code does, just that we’ll be using it to show how the profiler works.
Demo code to test our profiler
Now, with the following command, we’ll profile our script.
$ python -m cProfile -o profile.stat script.py
-o flag specifies an output file for cProfile to save the profiling statistics.
Next, we can fire up python to examine the results using the pstats module (also part of the standard library).
In : import pstats ...: p = pstats.Stats("profile.stat") ...: p.sort_stats( ...: "cumulative" # sort by cumulative time spent ...: ).print_stats( ...: "script.py" # only show fn calls in script.py ...: )Fri Aug 07 08:12:06 2020 profile.stat46338 function calls (45576 primitive calls) in 6.548 secondsOrdered by: cumulative time List reduced from 793 to 6 due to restriction <'script.py'>ncalls tottime percall cumtime percall filename:lineno(function) 1 0.008 0.008 5.521 5.521 script.py:1(<module>) 1 0.012 0.012 5.468 5.468 script.py:19(read_books) 5 0.000 0.000 4.848 0.970 script.py:5(get_book) 5 0.000 0.000 0.460 0.092 script.py:11(split_words) 5 0.000 0.000 0.112 0.022 script.py:15(count_words) 1 0.000 0.000 0.000 0.000 script.py:32(save_results)
Wow! Look at all that useful info!
For each function called, we’re seeing the following information:
ncalls: number of times the function was called
tottime: total time spent in the given function (excluding calls to sub-functions)
cumtime: total time spent in this function and all sub-functions
filename:lineo(function): the file name, line number, and function name
When reading this output, note the fact that we’re hiding a lot of data —in fact, we’re only seeing 6 out of 793 rows. Those hidden rows are all the sub-functions being called from within functions like
re.split. Also, note that the
<module> row corresponds to the code in
script.py that isn’t inside a function.
Now let’s look back at the results, sorted by cumulative duration.
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.008 0.008 5.521 5.521 script.py:1(<module>) 1 0.012 0.012 5.468 5.468 script.py:19(read_books) 5 0.000 0.000 4.848 0.970 script.py:5(get_book) 5 0.000 0.000 0.460 0.092 script.py:11(split_words) 5 0.000 0.000 0.112 0.022 script.py:15(count_words) 1 0.000 0.000 0.000 0.000 script.py:32(save_results)
Keep in mind the hierarchy of function calls. The top-level,
count_words. By comparing cumulative times, we see that most of
<module>’s time is spent in
read_books and most of
read_books’s time is spent in
get_book, where we make our HTTP request, making this script (unsurprisingly) I/O bound.
Next, let’s take a look at how we can be even more granular by profiling our code line-by-line.
Once we’ve used cProfile to get a sense of what function calls are taking the most time, we can examine those functions line-by-line to get an even clearer picture of where our time is being spent.
For this, we’ll need to install the
line-profiler library with the following command:
$ pip install line-profiler
Once installed, we just need to add the
@profile decorator to the function we want to profile. Here’s the updated snippet from our script:
Note the fact that we don’t need to import the
profile decorator function — it will be injected by
Now, to profile our function, we can run the following:
$ kernprof -l -v script-prof.py
kernprof is installed along with
-l flag tells
line-profiler to go line-by-line and the
-v flag tells it to print the result to the terminal rather than save it to a file.
The result for our script would look something like this:
The key column to focus on here is
% Time. As you can see, 89.5% of our time parsing each book is spent in the
get_book function — making the HTTP request — further validation that our program is I/O bound rather than CPU bound.
Now, with this new info in mind, if we wanted to speed up our code we wouldn’t want to waste our time trying to make our word counter more efficient. It only takes a fraction of the time compared to the HTTP request. Instead, we’d focus on speeding up our requests — possibly by making them asynchronously.
Here, the results are hardly surprising, but on a larger and more complicated program,
line-profiler is an invaluable tool in our programming tool belt, allowing us to peer under the hood of our program and find the computational bottlenecks.
In addition to profiling the time-complexity of our program, we can also profile its memory-complexity.
In order to do line-by-line memory profiling, we’ll need to install the
memory-profiler library which also uses the same
@profile decorator to determine which function to profile.
$ pip install memory-profiler$ python -m memory_profiler script.py
The result of running
memory-profiler on our same script should look something like the following:
There are currently some issues with the accuracy of the “Increment” so just focus on the “Mem usage” column for now.
Our script had peak memory usage on line 28 when we split the books up into words.
Hopefully, now you’ll have a few new tools in your programming tool belt to help you write more efficient code and quickly determine how to best spend your optimization-time.
This post was originally published by Austin Poor at Towards Data Science