27 February 2013 Python
Remember mincss from last month? Well, despite its rather crazy version number, it has only really had one major release. And it's never really been optimized.
So I took some metrics and was able to find out where all the time is spent. It's basically in this:
    for body in bodies:
        for each in CSSSelector(selector)(body):
            return True
That, on its own, is very fast. Just a couple of milliseconds. But the problem is that it happens so god damn often!
So, in version 0.8 it now, by default, first makes a list (actually, a set) of every ID and every class name in every node of every HTML document. Then, using this, it gingerly tries to avoid having to use CSSSelector(selector) if the selector is quite simple. For example, if the selector is
    #container form td:last-child
and there is no node with id container, then why bother: the selector can't possibly match anything.
It applies the same logic to classes.
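To make the idea concrete, here's a rough sketch of that kind of pre-check. This is not the actual mincss code: the names IdClassCollector, collect_ids_and_classes and might_match are made up for illustration, and it uses the standard library's html.parser instead of lxml so that it runs without any dependencies. It only ever answers "definitely can't match" or "maybe", and anything it isn't sure about is left to the real selector engine.

```python
import re
from html.parser import HTMLParser


class IdClassCollector(HTMLParser):
    """Collect every id and class name seen in one HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "id":
                self.ids.add(value.strip())
            elif name == "class":
                self.classes.update(value.split())


def collect_ids_and_classes(bodies):
    """Return (ids, classes) as sets, gathered across all HTML strings."""
    ids, classes = set(), set()
    for html in bodies:
        collector = IdClassCollector()
        collector.feed(html)
        collector.close()
        ids |= collector.ids
        classes |= collector.classes
    return ids, classes


# Finds "#name" and ".name" tokens in a simple selector.
_ID_OR_CLASS = re.compile(r"([#.])(-?[_a-zA-Z][\w-]*)")


def might_match(selector, ids, classes):
    """Return False only if the selector certainly cannot match."""
    # Be conservative: attribute selectors, :not(...), selector groups
    # and escapes are "not simple", so leave them to the real engine.
    if any(ch in selector for ch in "[(,\\\"'"):
        return True
    for kind, name in _ID_OR_CLASS.findall(selector):
        if kind == "#" and name not in ids:
            return False
        if kind == "." and name not in classes:
            return False
    return True
```

With something like this in place, the expensive CSSSelector(selector) loop only needs to run when might_match(selector, ids, classes) returns True. For a page with no element whose id is container, the check for #container form td:last-child comes back False straight away.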
And now, what you've all been waiting for, the results:
On a big document (20 KB) like my home page...
BEFORE: 4.7 seconds
AFTER: 0.85 seconds
(I ran it a bunch of times and averaged the times which had very little deviation)
So in the first round of optimization it suddenly becomes about 5.5 times faster (4.7 / 0.85 ≈ 5.5). Pretty cool!
I've made it possible to switch this off, just because I haven't yet tested it on very many sites. All the unit tests pass, of course.