String length truncation optimization difference in Python

19 March 2012   8 comments   Python


We have a piece of code that is going to be run A LOT on a server infrastructure that needs to be fast. I know that I/O matters much more, but since I had the time I wanted to figure out which of these two is faster:

def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

...or...

def b(s, m):
    return s[:m]


I wrote a simple benchmark that bombarded these two string manipulation functions with strings that had a 50% chance of being longer than the max length. In other words, half of the strings sent to these two functions were so short they didn't need to be truncated.
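The benchmark harness isn't shown in the post, but a minimal reconstruction with timeit might look like this (MAX_LEN, the string count, and the exact 50/50 length distribution are my assumptions, not the original harness):

```python
import random
import timeit

def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

def b(s, m):
    return s[:m]

# Half the strings end up shorter than MAX_LEN, half longer,
# matching the 50/50 split described above.
MAX_LEN = 50
strings = ["x" * random.randint(1, MAX_LEN * 2) for _ in range(1000)]

for func in (a, b):
    t = timeit.timeit(
        lambda: [func(s, MAX_LEN) for s in strings],
        number=1000,
    )
    print(func.__name__, t)
```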

Turns out, there is absolutely no significant difference! I'm not even going to post the timings.

I could go on and repeat the iterations until my CPU starts to smoke, but by then I'm sure the benchmark would become invalid, need to be re-engineered, and lose touch with reality. Now, carry on with your life of writing real code.

Comments

Jeff Epler
It looks like Python (2.7 here) is optimizing this case by returning the original string:
>>> s = "abc"
>>> s[:50] is s
True

And here's the C code that makes it happen:
static PyObject *
string_slice(register PyStringObject *a, register Py_ssize_t i,
             register Py_ssize_t j)
...
    if (j > Py_SIZE(a))
        j = Py_SIZE(a);
    if (i == 0 && j == Py_SIZE(a) && PyString_CheckExact(a)) {
        /* It's the same as a */
        Py_INCREF(a);
        return (PyObject *)a;
    }

That said, I'd predict a measurable difference in performance, due to the extra bytecodes and the len() function call in a(), at least if you're using traditional CPython.
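The extra work Jeff describes is visible with the dis module; a quick sketch (the function bodies are copied from the post):

```python
import dis

def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

def b(s, m):
    return s[:m]

# a() compiles to a global lookup for len, a call, a comparison and a
# conditional jump before the slice; b() is just the slice and a return.
dis.dis(a)
dis.dis(b)
```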
Peter Bengtsson
Thanks for that!
Yeah, maybe you could see a measurable difference, but that would require a workload that vastly exaggerates the conditions of my application.
Chris AtLee
I wonder if b(s,m) is guaranteed to return a new reference, or if id(b(s,m)) == id(s) when len(s) < m.
Peter Bengtsson
See Jeff's interesting response above.

Also,
>>> s='peter'
>>> m=3
>>> id(b(s,m)) == id(s)
False
R. David Murray
In your tests, the Python function call overhead probably swamped any difference in the performance of the code inside the functions.
Peter Bengtsson
Swamped?
R. David Murray
python -m timeit -r 10 -s "def myfunc(x): return x[:10]" "myfunc('abcdefg')"
100000 loops, best of 10: 2.75 usec per loop

./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.714 usec per loop

0.714 is about 26% of 2.75. So "swamped" might appear to be an overstatement ("overwhelm with an excessive amount of something"). However, even with -r 10 I'm getting measurements for the smaller numbers that vary by as much as 50%, so I think swamped is probably appropriate. Better measurements could doubtless be obtained if one had access to a test machine that wasn't doing anything else (I'm measuring on my laptop).

Some additional measurements:
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.41 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.41 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.42 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.633 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.714 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.696 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.319 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.35 usec per loop

So, it looks like the slice is indeed faster on CPython when the slice bound is longer than the string (my measurements were on my current checkout of 3.3, by the way). That, however, is a (current) CPython implementation artifact, and not something on which to rely (possibly outside of tuning a particular application for a particular machine/environment).
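A quick way to see that implementation artifact in action (CPython-specific behavior, not a language guarantee; other implementations may copy):

```python
# On CPython, slicing a plain str from 0 past its end returns the
# original object rather than allocating a copy.
s = "abcdefg"
print(s[:50] is s)  # True on CPython
print(s[:3] is s)   # False: a real truncation creates a new string
```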
P Wildani
There is indeed a solid difference, but yeah, it's tiny.
$ python --version
Python 2.7.1

$ python -m timeit -r16 -s 'd=["a"*i for i in range(10000)]; l=len' -- 'for x in d:' ' if l(x) > 5000: x[:5000]'
100 loops, best of 16: 4.94 msec per loop

$ python -m timeit -r16 -s 'd=["a"*i for i in range(10000)]' -- 'for x in d:' ' x[:5000]'
100 loops, best of 16: 4.14 msec per loop

The numbers were pretty stable (around ±0.4 ms, so ±40 ns per actual call at 10000 calls per loop) for several runs. These were the two fastest measurements for each approach. Not worth worrying about, because if you've got an inner loop that tight, you need to either switch to PyPy or write it in C.

Overhead:
$ python -m timeit -r16 -s 'd=["a"*i for i in range(10000)]; l=len' -- 'for x in d: pass'
1000 loops, best of 16: 599 usec per loop