IssueTrackerMassContainer is a simple Zope product used as a container for a bunch of IssueTrackerProduct instances. It doesn't add much apart from a nice-looking dashboard that lists all recent issues and keeps itself updated automatically with an AJAX poll.

What it was doing was recursively collecting all issues across all issue trackers, sorting them, and then returning only the first 20. Fine, but once the numbers start to add up that becomes a vast sort operation.

In my local development copy of 814 issues, measuring with pympler and time(), I was able to go from 7.5 Mb and 2 seconds down to only 85 Kb and 0.05 seconds.
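For reference, the measurements can be reproduced with a small harness along these lines. This is a sketch, not the code actually used for the post: it uses the standard library's `sys.getsizeof` as a stand-in, whereas pympler's `asizeof.asizeof` (which the post's numbers come from) reports the deep, recursive size of the result.

```python
import sys
import time

def measure(func, *args, **kwargs):
    """Call func, returning (result, size_in_bytes, elapsed_seconds).

    sys.getsizeof is shallow; with pympler installed you would use
    pympler.asizeof.asizeof(result) to get the deep size instead.
    """
    t0 = time.time()
    result = func(*args, **kwargs)
    elapsed = time.time() - t0
    size = sys.getsizeof(result)
    return result, size, elapsed

result, size, elapsed = measure(lambda: list(range(1000)))
```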

Here's the initial naive version of the code:

    def getRecentIssues(self, since=None, recursive=True, batch_size=20, batch_start=0):
        """ return a list of all the most recent issues """
        issues = self._getAllIssues(self.getRoot())
        if since is not None:
            ... # checking variable since
            issues = [x for x in issues
                      if float(x.getModifyDate()) > since]

        issues.sort(lambda x, y: cmp(y.getModifyDate(), x.getModifyDate()))
        return issues[int(batch_start):int(batch_start) + int(batch_size)]

So, instead of making a fat list of issue objects, just turn it into a list of the things we really need. The second version:

    def getRecentIssues(self, since=None, recursive=True, batch_size=20, batch_start=0):
        """ return a list of all the most recent issues """
        root = self.getRoot()
        issues = self._getAllIssues(root)
        issues = [(x.getModifyDate(), '/'.join(x.getPhysicalPath()))
                  for x in issues]
        if since is not None:
            ... # checking variable since
            issues = [(t, i) for (t, i) in issues
                      if float(t) > since]

        issues.sort()
        issues.reverse()
        issue_paths = [x[1] for x in issues[int(batch_start):int(batch_start) + int(batch_size)]]
        return [root.unrestrictedTraverse(p) for p in issue_paths]
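Stripped of the Zope specifics, the trick in version 2 is the classic decorate-sort-undecorate pattern. A minimal sketch with plain objects (`Issue`, `modified` and `path` here are hypothetical stand-ins for the real issue objects and their accessors):

```python
class Issue(object):
    """Hypothetical stand-in for a heavyweight issue object."""
    def __init__(self, modified, path):
        self.modified = modified
        self.path = path

issues = [Issue(3.0, '/a/1'), Issue(1.0, '/b/2'), Issue(2.0, '/a/3')]

# Decorate: reduce each fat object to the small tuple we actually sort on
decorated = [(i.modified, i.path) for i in issues]
decorated.sort()
decorated.reverse()

# Undecorate: keep only the paths of the newest items
paths = [p for (t, p) in decorated[:2]]
# paths == ['/a/1', '/a/3']
```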

The issue method getModifyDate() returns a Zope DateTime instance, which is a ridiculously nifty datetime implementation but sucks in memory use and performance. See this blog about how it compares to mxDateTime and the standard library's datetime. So, this time, turn it into a float before sorting. Final version:

    def getRecentIssues(self, since=None, recursive=True, batch_size=20, batch_start=0):
        """ return a list of all the most recent issues """
        root = self.getRoot()
        issues = self._getAllIssues(root)
        issues = [(float(x.getModifyDate()), '/'.join(x.getPhysicalPath()))
                  for x in issues]
        if since is not None:
            ... # checking variable since
            issues = [(t, i) for (t, i) in issues
                      if t > since]

        issues.sort()
        issues.reverse()
        issue_paths = [x[1] for x in issues[int(batch_start):int(batch_start) + int(batch_size)]]
        return [root.unrestrictedTraverse(p) for p in issue_paths]
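The win from sorting plain floats instead of rich date objects is easy to reproduce outside Zope. Here is a rough, self-contained micro-benchmark using the standard library's datetime as a stand-in (Zope's DateTime is heavier still, so the gap there is bigger):

```python
import random
import timeit
from datetime import datetime, timedelta

# Fake "modify dates": the same instants represented two ways
base = datetime(2010, 1, 1)
offsets = [random.randrange(10 ** 6) for _ in range(10000)]
as_objects = [base + timedelta(seconds=s) for s in offsets]
as_floats = [float(s) for s in offsets]

# Sort each representation ten times and compare wall-clock cost
t_objects = timeit.timeit(lambda: sorted(as_objects), number=10)
t_floats = timeit.timeit(lambda: sorted(as_floats), number=10)
# Floats typically sort several times faster; exact ratios vary by machine.
```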

And the results for my local copy of 818 issues:

Version 1:
   7842736 bytes (7.5 Mb)
   2.1547999382 seconds

Version 2:
   834880 bytes (0.79 Mb)
   0.210245847702 seconds

Version 3:
   87448 bytes (85 Kb)
   0.0538010597229 seconds

Granted, Zope's garbage collector will eventually release this memory, but why let it get that big at all when there may be concurrent hits or when something gets stuck for longer than necessary. Python 2.4 can free memory internally but doesn't return it to the operating system for reuse until the process dies (this was fixed in Python 2.5).

That's a memory usage improvement of about 90 fold and a speed improvement of about 40 fold.

Fry-IT has huge issuetracker instances and at the time of writing keeps 6668 "active" issues across all projects.

Ross Patterson - 28 February 2010
It seems like you might be able to do a bit better with generators and the "key" argument to the sort functions:

    def key(item):
        return (item.getModifyDate(), '/'.join(item.getPhysicalPath()))

    return sorted(
        self._getAllIssues(root),
        key=key, reverse=True)[int(batch_start):int(batch_start) + int(batch_size)]
Matt Chaput - 28 February 2010
If you really wanted to be efficient, you could use a generator with heapq.nlargest to avoid building the full sorted list at all. But I guess a list of floats and strings probably isn't so bad.
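For the record, the heapq approach Matt mentions would look something like this (plain `(timestamp, path)` tuples, as in version 3 of the post, standing in for the real issues):

```python
import heapq

# (timestamp, path) tuples; timestamps here are made-up floats
issues = [(1267.0, '/a/1'), (1269.5, '/b/2'),
          (1268.2, '/a/3'), (1266.1, '/b/4')]

# nlargest keeps a small heap of size N while scanning the input,
# so it avoids sorting the whole list just to get the top N items
top2 = heapq.nlargest(2, issues)
# top2 == [(1269.5, '/b/2'), (1268.2, '/a/3')]
```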
Peter Bengtsson - 01 March 2010
Thanks for the tip. Sorting 6668 tuples only takes about 0.03 seconds on my laptop :)

My current problem is that turning the list of issue objects into a list of tuples takes more than 3 seconds. I'm going to solve it by remembering the oldest timestamp of those shown the first time and, as a cached value, using that to never look into containers older than that.
Simon Michael - 28 February 2010
Nice one. pympler looks awesome, thanks for the example.
Gael Le Mignot - 03 March 2010
Hello. We are using a quite heavily customized version of an older IssueTracker at Pilot Systems. We have a huge number of issues: around 250 trackers with more than 50,000 issues in total.

To be able to cope with that amount of data, we are using the Zope catalog as much as we can, for any dashboard-like feature we developed for our needs.

Did you try using a catalog with an index on the date and doing a catalog(sort_on='getModifyDate')[:20]? This should be even more efficient since it doesn't load the full Issue objects from the ZODB, and it's much simpler than having to fiddle with hand-optimizing every single sort.
Peter Bengtsson - 05 March 2010
Hi Gael,
First of all, I can't maintain two ZCatalog instances, since issues will only index themselves in the nearest ZCatalog instance, which is the one inside the tracker.

What I could do, however, is use each tracker's ZCatalog to pull out its 20 most recent issues, which means that with 10 trackers the total number of things to worry about is 20 x 10 brains. I then sort those in an aggregate, pick out the first 20, and turn only those into objects. I'll try that instead. When time allows.

