Comment

Ben Truscott

I'd just like to point out that the premise of the test is somewhat flawed, because the various methods have different asymptotic complexity. f5 is fast for short inputs with many duplicates, but it scales poorly, so it can't be called the universally best approach. I came across this page while thinking about how to identify (rather than eliminate) duplicates in numeric arrays using NumPy, though only after I'd already arrived at something several times faster for large inputs. Still, the presentation and comparison between the algorithms is interesting.
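For the curious, a minimal sketch of the identification problem using np.unique with return_counts. This is just the obvious NumPy baseline, not the faster approach I mentioned:

    import numpy as np

    a = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])

    # np.unique reports each distinct value and how often it occurs;
    # anything with a count above 1 is a duplicate.
    values, counts = np.unique(a, return_counts=True)
    print(values[counts > 1])  # [1 3 5]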

Replies

Eric Werner

That's probably also why Peter wrote "best". And have you seen the script he linked? There's a lot to play around with: item lengths and loop counts!
It would be nice to see each function tested against a reference result at least once per test-data batch, and then with different input lengths as well; something like the check sketched below. Far too much for a small blog post like this, though.
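A minimal sketch of that kind of check, assuming each candidate f* takes a sequence and returns its unique items (the reference and check helpers are hypothetical names, not from Peter's script):

    def reference(seq):
        # deliberately simple, obviously-correct baseline
        seen = []
        for item in seq:
            if item not in seen:
                seen.append(item)
        return seen

    def check(func, seq):
        # compare as sets so order-destroying variants still pass
        assert set(func(seq)) == set(reference(seq)), func.__name__

Run once per batch, a check like this would have flagged the f1 result below immediately.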

I just stumbled across this again and tried out the "new" entries.
I was amazed at how incredibly FAST f1 became in Python 3.
Then I looked a little closer:

    testdata: ['f', 'g', 'c', 'd', 'b', 'a', 'a']
    result: dict_keys([]) 😐
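The likely culprit, assuming f1 is still the classic map-based idiom from the original post: Python 3's map is lazy, so its side effects never run.

    def f1(seq):
        d = {}
        map(d.__setitem__, seq, [])  # Python 3: map is lazy, nothing executes
        return d.keys()              # ...so this is always dict_keys([])

    # Forcing the map would not help either: Python 3's map stops at the
    # shortest iterable ([] here) rather than padding with None as Python 2 did.
    # A working, order-preserving replacement (CPython 3.7+):
    def f1_fixed(seq):
        return list(dict.fromkeys(seq))

That would also explain the "speed": f1 never touches the data at all.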