How to do performance micro benchmarks in Python

24 June 2017   4 comments   Python

Powered by Fusion×

Suppose that you have a function and you wonder, "Can I make this faster?" Well, you might already have thought that and you might already have a theory. Or two. Or three. Your theory might be sound and likely to be right, but before you go anywhere with it you need to benchmark it first. Here are some tips and scaffolding for doing Python function benchmark comparisons.

Tenets

  1. Internally, Python will warm up and it's likely that your function depends on other things such as databases or IO. So it's important that you don't test function1 first and then function2 immediately after because function2 might benefit from a warm up painfully paid for by function1. So mix up the order of them or cycle through them enough that they all pay for or gain from warm ups.

  2. Look at the median first. The mean (aka. average) is often tainted by spikes and these spikes of slow-down can be caused by your local Spotify client deciding to reindex itself or something some such. Sometimes those spikes matter. For example, garbage collection is inevitable and will have an effect that matters.

  3. Run your functions many times. So many times that the whole benchmark takes a while. Like tens of seconds or more. Also, if you run it significantly long it's likely that all candidates gets punished by the same environmental effects such as garbage collection or CPU being reassinged to something else intensive on your computer.

  4. Try to take your benchmark into different, and possibly more realistic environments. For example, don't rely on reading a file like /Users/peterbe/only/on/my/macbook when, likely, the end destination for your code is an Ubuntu server in AWS. Write your code so that it's easy to copy and paste around, like into a vi/jed editor in an ssh session somewhere.

  5. Sanity check each function before benchmarking them. No need for pytest or anything fancy but just make sure that you test them in some basic way. But the assertion testing is likely to add to the total execution time so don't do it when running the functions.

  6. Avoid "prints" inside the time measured code. A print() is I/O and an "external resource" that can become very unfair to compare CPU bound performance.

  7. Don't fear testing many different functions. If you have multiple ideas of doing a function differently, it's cheap to pile them on. But be careful how you "report" because if there are many different ways of doing something you might accidentally compare different fruit without noticing.

  8. Make sure your functions take at least one parameter. I'm no Python core developer or C hacker but I know there are "murks" within a compiler and interpreter that might do what a regular memoizer might done. Also, the performance difference can be reversed on tiny inputs compared to really large ones.

  9. Be humble with the fact that 0.01 milliseconds difference when doing 10,000 iterations is probably not worth writing a more complex and harder-to-debug function.

The Boilerplate

Let's demonstrate with an example:

# The functions to compare
import math


def f1(degrees):
    return math.cos(degrees)


def f2(degrees):
    e = 2.718281828459045
    return (
        (e**(degrees * 1j) + e**-(degrees * 1j)) / 2
    ).real


# Assertions
assert f1(100) == f2(100) == 0.862318872287684
assert f1(1) == f2(1) == 0.5403023058681398


# Reporting
import time
import random
import statistics

functions = f1, f2
times = {f.__name__: [] for f in functions}

for i in range(100000):  # adjust accordingly so whole thing takes a few sec
    func = random.choice(functions)
    t0 = time.time()
    func(i)
    t1 = time.time()
    times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))

Let's break that down a bit.

You run that and get something like this:

FUNCTION: f1 Used 49990 times
    MEDIAN 0.0
    MEAN   0.00045161219591330375
    STDEV  0.0011268475946446341
FUNCTION: f2 Used 50010 times
    MEDIAN 0.00095367431640625
    MEAN   0.0009188626294516487
    STDEV  0.000642871632138125

More Examples

The example above already broke one of the tenets in that these functions were simply too fast. Doing rather basic mathematics is just too fast to compare with such a trivial benchmark. Here are some other examples:

Remove duplicates from list without losing order

# The functions to compare


def f1(seq):
    checked = []
    for e in seq:
        if e not in checked:
            checked.append(e)
    return checked


def f2(seq):
    checked = []
    seen = set()
    for e in seq:
        if e not in seen:
            checked.append(e)
            seen.add(e)
    return checked


def f3(seq):
    checked = []
    [checked.append(i) for i in seq if not checked.count(i)]
    return checked


def f4(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]


def f5(seq):
    def generator():
        seen = set()
        for x in seq:
            if x not in seen:
                seen.add(x)
                yield x

    return list(generator())


# Assertion
import random

def _random_seq(length):
    seq = []
    for _ in range(length):
        seq.append(random.choice(
            'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
        ))
    return seq


L = list('abca')
assert f1(L) == f2(L) == f3(L) == f4(L) == f5(L) == list('abc')
L = _random_seq(10)
assert f1(L) == f2(L) == f3(L) == f4(L) == f5(L)

# Reporting
import time
import statistics

functions = f1, f2, f3, f4, f5
times = {f.__name__: [] for f in functions}

for i in range(3000):
    seq = _random_seq(i)
    for _ in range(len(functions)):
        func = random.choice(functions)
        t0 = time.time()
        func(seq)
        t1 = time.time()
        times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))

Results:

FUNCTION: f1 Used 3029 times
    MEDIAN 0.6871223449707031
    MEAN   0.6917867380307822
    STDEV  0.42611748137761174
FUNCTION: f2 Used 2912 times
    MEDIAN 0.054955482482910156
    MEAN   0.05610262627130026
    STDEV  0.03000829926668248
FUNCTION: f3 Used 2985 times
    MEDIAN 1.4472007751464844
    MEAN   1.4371055654145566
    STDEV  0.888658217522005
FUNCTION: f4 Used 2965 times
    MEDIAN 0.051975250244140625
    MEAN   0.05343245816673035
    STDEV  0.02957275548477728
FUNCTION: f5 Used 3109 times
    MEDIAN 0.05507469177246094
    MEAN   0.05678296204202234
    STDEV  0.031521596461048934

Winner:

def f4(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

Fastest way to count the number of lines in a file

# The functions to compare
import codecs
import subprocess


def f1(filename):
    count = 0
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        for line in f:
            count += 1
    return count


def f2(filename):
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        return len(f.read().splitlines())


def f3(filename):
    return int(subprocess.check_output(['wc', '-l', filename]).split()[0])


# Assertion
filename = 'big.csv'
assert f1(filename) == f2(filename) == f3(filename) == 9999


# Reporting
import time
import statistics
import random

functions = f1, f2, f3
times = {f.__name__: [] for f in functions}

filenames = 'dummy.py', 'hacker_news_data.txt', 'yarn.lock', 'big.csv'
for _ in range(200):
    for fn in filenames:
        for func in functions:
            t0 = time.time()
            func(fn)
            t1 = time.time()
            times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))

Results:

FUNCTION: f1 Used 800 times
    MEDIAN 5.852460861206055
    MEAN   25.403797328472137
    STDEV  37.09347378640582
FUNCTION: f2 Used 800 times
    MEDIAN 0.45299530029296875
    MEAN   2.4077045917510986
    STDEV  3.717931526478758
FUNCTION: f3 Used 800 times
    MEDIAN 2.8804540634155273
    MEAN   3.4988239407539368
    STDEV  1.3336427480808102

Winner:

def f2(filename):
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        return len(f.read().splitlines())

Conclusion

No conclusion really. Just wanted to point out that this is just a hint of a decent start when doing performance benchmarking of functions.

There is also the timeit built-in for "provides a simple way to time small bits of Python code" but it has the disadvantage that your functions are not allowed to be as complex. Also, it's harder to generate multiple different fixtures to feed your functions without that fixture generation effecting the times.

There's a lot of things that this boilerplate can improve such as sorting by winner, showing percentages comparisons against the fastests, ASCII graphs, memory allocation differences, etc. That's up to you.

Comments

Boris Feld
Nice article! I would recommend using the perf module for micro benchmarks: https://perf.readthedocs.io/en/latest/examples.html

The author made great work trying to have stable benchmarks: https://haypo.github.io/journey-to-stable-benchmark-average.html
Peter Bengtsson
I'm so glad I blogged. So I would get a comment like that. Thanks!
Kamil
Really useful, well written and comprehensive article. Thank you for you effort!
Shay Elmualem
This is great, easy to understand and useful! Thanks!
Thank you for posting a comment

Your email will never ever be published


Related posts

Previous:
Fastest way to find out if a file exists in S3 (with boto3) 16 June 2017
Next:
Best YouTube Chefs of 2017 14 July 2017
Related by Keyword:
Ultrafast loading of CSS 01 September 2017
Fastest *local* cache backend possible for Django 04 August 2017
Why I'm ditching AdBuff on SongSear.ch 20 July 2017
More CSS load performance experiments (of a React app) 25 March 2017
Optimization of QuerySet.get() with or without select_related 03 November 2016
Related by Text:
Really simple Django view function timer decorator 08 December 2017
Don't forget your sets in Python! 10 March 2017
Optimization of QuerySet.get() with or without select_related 03 November 2016
Fastest way to match a filename's extension in Python 31 August 2017
Decorated Concurrency - Python multiprocessing made really really easy 13 May 2016