20 March 2018 Python
tl;dr; The new 0.12.0 version of hashin is between 6 and 30 times faster.
Version 0.12.0 is exciting because it switches to using https://pypi.org/pypi/<package>/json, so it's using the new PyPI. I only found out last week that the JSON contains .digests.sha256 for each file, even though apparently it's been there for almost a year!
Prior to version 0.12.0, hashin would download every tarball and .whl file and run pip on it, in Python, to get the checksum hash. Now, if you use the default sha256, that checksum value is immediately available right there in the JSON, for every file per release. This is especially important for binary packages (lxml for example), where there are a lot of files to download.
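To make that concrete, here's a minimal sketch of reading the sha256 values straight out of the JSON. The "releases" mapping of version to a list of files, each with a "digests" dict, is the layout pypi.org returns for that URL. The function name sha256_hashes and the tiny sample payload are made up for this example, and this is not hashin's actual code.

```python
def sha256_hashes(package_json, version):
    """Return {filename: sha256 hex digest} for every file in one release."""
    files = package_json["releases"][version]
    return {f["filename"]: f["digests"]["sha256"] for f in files}


# Hand-written stand-in for the JSON payload.
# The digests are placeholders, not real checksums.
sample = {
    "releases": {
        "1.0": [
            {"filename": "example-1.0.tar.gz", "digests": {"sha256": "aaa111"}},
            {"filename": "example-1.0-py3-none-any.whl", "digests": {"sha256": "bbb222"}},
        ]
    }
}

for filename, digest in sorted(sha256_hashes(sample, "1.0").items()):
    print(f"--hash=sha256:{digest}  # {filename}")
```

No file is downloaded at any point; one JSON request covers every file in the release.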
To test this, I cleared my temp directory of any previously downloaded lxml-* files and used hashin 0.11.5 to fill a requirements.txt:
▶ hashin --version
0.11.5
▶ time hashin Django
hashin Django  0.48s user 0.14s system 12% cpu 5.123 total
▶ time hashin lxml
hashin lxml  1.61s user 0.59s system 8% cpu 25.361 total
In other words, 5.1 seconds to get the hashes for Django and 25.4 seconds for lxml.
Now, let's do it with the new 0.12.0:
▶ hashin --version
0.12.0
▶ mv requirements.txt 0.11.5-requirements.txt ; touch requirements.txt
▶ time hashin Django
hashin Django  0.34s user 0.06s system 46% cpu 0.860 total
▶ time hashin lxml
hashin lxml  0.35s user 0.06s system 44% cpu 0.909 total
So, Django went from 5.1 seconds to 0.9 seconds, and lxml went from 25.4 seconds to 0.9 seconds.
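For the record, here's the arithmetic behind the "times faster" claim, using the wall-clock totals from the two runs above. The variable names are just for this example.

```python
# (old seconds, new seconds) taken from the "total" column of the time output
timings = {
    "Django": (5.123, 0.860),
    "lxml": (25.361, 0.909),
}

speedups = {name: old / new for name, (old, new) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
# Django: 6.0x faster
# lxml: 27.9x faster
```

These are single runs over a home connection, so treat the factors as rough.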
Note, the old code that downloads and runs pip is still there. It kicks in when you request a digest checksum that is not included in the JSON. For example:
▶ hashin --version
0.12.0
▶ time hashin --algorithm sha512 lxml
hashin --algorithm sha512 lxml  1.56s user 0.64s system 5% cpu 38.171 total
(This took 38 seconds instead of the 25 in the run above because the speed of my home broadband is unpredictable.)
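The "use the JSON if you can, otherwise download" decision can be sketched roughly like this. It's an illustration only: get_digest and its download argument are hypothetical names, and it hashes with the standard library's hashlib rather than going through pip the way the old code path is described above.

```python
import hashlib


def get_digest(file_info, algorithm, download):
    """Return the hex digest for one release file.

    file_info is one file entry from the PyPI JSON; download is any
    callable that takes a URL and returns the file's bytes.
    """
    digests = file_info.get("digests", {})
    if algorithm in digests:
        # Fast path: the checksum is already in the JSON.
        return digests[algorithm]
    # Slow path: fetch the file and compute the checksum locally.
    hasher = hashlib.new(algorithm)
    hasher.update(download(file_info["url"]))
    return hasher.hexdigest()


# Stand-in data and downloader so the example runs without a network.
calls = []


def fake_download(url):
    calls.append(url)
    return b"pretend this is a tarball"


info = {
    "url": "https://example.invalid/example-1.0.tar.gz",
    "digests": {"sha256": "aaa111"},
}

print(get_digest(info, "sha256", fake_download))  # taken from the JSON
print(get_digest(info, "sha512", fake_download))  # computed after "downloading"
```

Only the sha512 request touches the downloader, which is why that run is as slow as the old version.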