Here are two perfectly good ways to turn 123456789 into "123,456,789":

import locale

def f1(n):
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
    return locale.format('%d', n, True)

def f2(n):
    r = []
    for i, c in enumerate(reversed(str(n))):
        if i and (not (i % 3)):
            r.insert(0, ',')
        r.insert(0, c)
    return ''.join(r)

assert f1(123456789) == '123,456,789'
assert f2(123456789) == '123,456,789'    

Which one do you think is faster?

Easy, write a benchmark:

from time import time

for f in (f1, f2):
    t0 = time()
    for i in range(1000000):
        f(i)  # call the function under test
    t1 = time()
    print f.func_name, t1 - t0, 'seconds'

And, drumroll, the results are:

peterbe@mpb:~$ python
f1 19.4571149349 seconds
f2 6.30253100395 seconds

The f2 one looks very plain and a good candidate for PyPy:

peterbe@mpb:~$ pypy
f1 14.367814064 seconds
f2 0.77246594429 seconds

...which is roughly an 8x speed boost, which is cute. It's also kinda ridiculous that each iteration of f2 takes about 0.0000008 seconds. What's that!?
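For per-call figures like that one, the standard library's timeit module is a handy alternative to a hand-rolled loop. The following is only a sketch: seconds_per_call is a made-up helper name, the repeat count is arbitrary, and the lambda wrapper adds a little overhead of its own. It is written to run on both Python 2.7 and Python 3.

```python
import timeit

def f2(n):
    # Same algorithm as f2 in the post: walk the digits right to left.
    r = []
    for i, c in enumerate(reversed(str(n))):
        if i and not i % 3:
            r.insert(0, ',')
        r.insert(0, c)
    return ''.join(r)

def seconds_per_call(func, arg, number=100000):
    # Hypothetical helper: timeit.timeit returns the total time
    # for `number` calls, so divide to get the average per call.
    total = timeit.timeit(lambda: func(arg), number=number)
    return total / number

per_call = seconds_per_call(f2, 123456789)
print('%s: %.9f seconds per call' % (f2.__name__, per_call))
```

The number it prints depends entirely on the machine and the interpreter, so treat it as a rough guide only.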

An obvious albeit somewhat risky optimization on f1 is this:

import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
def f1(n):
    return locale.format('%d', n, True)

...and now we get:

peterbe@mpb:~$ python
f1 16.3811080456 seconds
f2 6.14097189903 seconds

Before you say it, yes, I'm aware the locale module can do much more, but I was just curious and scratched that itch.


Dave points out the built-in function format (added in Python 2.6, although the ',' format spec needs 2.7). So let's add it and kick ass!

def f3(i):
    return format(i, ',')

And we run the tests again:

peterbe@mpb:~$ python
f1 16.4227910042
f2 6.13625884056
f3 0.892002105713
peterbe@mpb:~$ pypy
f1 4.61941003799
f2 0.720993041992
f3 0.26224398613

There's your winner!
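For reference, the same ',' format spec is also reachable through str.format, and it leaves the sign and any fractional part alone. A small sketch, assuming Python 2.7 or later (f3 is repeated so the block runs on its own):

```python
def f3(i):
    return format(i, ',')

# str.format accepts the same format spec after the colon.
assert f3(123456789) == '{:,}'.format(123456789) == '123,456,789'

# The minus sign and the fractional part are not touched.
assert f3(-1234) == '-1,234'
assert f3(1234567.5) == '1,234,567.5'
```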

Adam Forsyth - 13 October 2012
If we're talking about performance, I wouldn't call f2 a good method for this. While it's fine for small numbers, list.insert is O(n) for each insert, making f2 an O(n^2) operation, while the operation can easily be done in O(n) time since it requires only one pass over the data.

Two simple ways to rewrite f2 for O(n) operation are:

1. Use len() to calculate the offset from the left of the first comma so you can work left to right and use list.append instead of list.insert.

2. Use a collections.deque instead of a list to allow O(1) inserts at the beginning of the deque.

There are also a bunch of good solutions to this problem, including the beautifully simple "'{:,}'.format(value)". I'd be interested to know how various methods perform on small vs. large numbers.
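The two O(n) rewrites described in this comment could look roughly like the sketch below. The names f2_append and f2_deque are invented for illustration, this is one possible reading of the suggestion rather than Adam's own code, and both functions assume a non-negative integer.

```python
from collections import deque

def f2_append(n):
    # Option 1: work left to right and only ever append.
    s = str(n)
    head = len(s) % 3 or 3  # number of digits before the first comma
    parts = [s[:head]]
    for i in range(head, len(s), 3):
        parts.append(s[i:i + 3])
    return ','.join(parts)

def f2_deque(n):
    # Option 2: keep f2's right-to-left walk, but use a deque
    # so that inserting on the left is O(1).
    r = deque()
    for i, c in enumerate(reversed(str(n))):
        if i and i % 3 == 0:
            r.appendleft(',')
        r.appendleft(c)
    return ''.join(r)

assert f2_append(123456789) == f2_deque(123456789) == '123,456,789'
```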
Dave - 13 October 2012
What about format(n, ',')?
Peter Bengtsson - 14 October 2012
It's new in 2.7, right? I guess it warrants being included.
Peter Bengtsson - 14 October 2012
I actually didn't know about this one. I'm glad you mentioned it!
Senyai - 13 October 2012
+20% speed

    def f1(n):
        return ''.join(reversed([
                c + ','
                if i != 0 and i % 3 == 0 else
                c
                for i, c in enumerate(reversed(str(n)))
        ]))
Peter Bengtsson - 14 October 2012
That looks like f2() but laid out slightly differently.
Adam Skutt - 14 October 2012
There's nothing risky about that "optimization", as it is the right thing to do. The locale settings managed by locale.setlocale() are global. You shouldn't ever be setting them in a locale-using routine like that.
Peter Bengtsson - 14 October 2012
Bruno Renié adds his version which improves on f2() by almost 100% in the CPython benchmark.
Robert Helmer - 15 October 2012
I would've used the 2.7 format() for socorro-crashstats but we only have 2.6 on the Mozilla servers (RHEL 6) :( although honestly I care more about readability than perf unless it's shown to be a bottleneck for a use case we care about. However, this is still a very interesting discussion and I don't wish to discourage it :)

I've actually been playing with pypy a bit, even though we're super i/o-bound on Socorro. I think it'd be interesting for certain types of analysis that Java is used for now, and also for better alternatives to our approach to threading (e.g. something safer/saner like STM) without having to write in some brutally different style like twisted/tornado/etc.

Looking at there's a lot going on there:

This is related to the obvious downside to optimizing this - you are precluding localizing the app so it displays the appropriate digit group separator (note that "thousands" is not always the appropriate group, see
Peter Bengtsson - 15 October 2012
Performance would only matter if we did a spreadsheet app or something. We don't.

Right, in Swedish for example the price of a chewing gum is: SEK 0,5
Whereas here in the US it's USD 0.5
Neil Rashbrook - 15 October 2012
locale.setlocale(locale.LC_ALL, '')
def f5(n):
    return format('n', n)

Works in 2.6 (',' needs 2.7).
Peter Bengtsson - 15 October 2012
See the UPDATE to the post above.
Neil Rashbrook - 16 October 2012
Sorry, but I don't see any reference to the 'n' format in the post, its update, or any of the other comments.
Peter Bengtsson - 16 October 2012
Python 2.6.6 (r266:84292, Dec 5 2011, 09:38:23)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> format('n', 100000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: format expects arg 2 to be string or unicode, not int
Neil Rashbrook - 17 October 2012
My apologies, I was working on two different computers and retyped the code snippet incorrectly. The correct code should of course be:

import locale
locale.setlocale(locale.LC_ALL, '')
def f5(n):
    return format(n, 'n')
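
What f5 returns depends on the locale the process ends up with, so the separator may be a comma, a period, a space, or nothing at all (the plain "C" locale does no grouping). A hedged sketch of how that could be guarded and checked follows. The try/except fallback and the digits_only check are additions for illustration and are not part of Neil's snippet.

```python
import locale

try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    # Assumption: fall back to the minimal "C" locale if the
    # environment names a locale this machine does not have.
    locale.setlocale(locale.LC_ALL, 'C')

def f5(n):
    return format(n, 'n')

# Whatever separator the locale uses, stripping everything that
# is not a digit should give the original digits back.
digits_only = ''.join(c for c in f5(123456789) if c.isdigit())
assert digits_only == '123456789'
```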
