Fastest way to uniqify a list in Python

Monday, Aug 14, 2006

⬅︎ Back to Fastest way to uniqify a list in Python

Comment

Shane October 6, 2015

Hi All,

I added a new variant of f5 / f5b (see below) that simply swaps the seen items being a dict to use a set. The thinking amongst myself, and colleagues, is a set() would be faster ... running the tests (simply adding below into the downloadable benchmark py) showed the dict (both f5 and f5b) is faster by 10-20% across python versions 2.6 - 3.1 ... does anyone else find this counter intuitive? and have a reason why this is so? (oh, for py3 testing I did remove f7 and used print function too)

# BEGIN f5c
def f5c(seq, idfun=None): # Alex Martelli ******* order preserving
    if idfun is None:
        def idfun(x): return x
    seen = set()
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker not in seen:
            seen.add(marker)
            result.append(item)

    return result
# END f5c

TIA,