If you have an example of using fastthreadpool that I can use instead of concurrent.futures.ThreadPoolExecutor or concurrent.futures.ProcessPoolExecutor then I'll try to run a benchmark with that too.
The bulk is indeed in unzipping. But if you've an archive with many small files the overhead of the pool can be 10% or more. And this is much for just handling a pool of threads. The alternative is to use map, where you have to prepare an iterable before calling map. Another alternative is to switch to a faster pool implementation.
The module zipfile is completely written in Python, which comes with a relatively big overhead at Python level, which in turn means the GIL is locked relatively long. The result is a low level of parallelisation. I'm currently writing an archiving tool which uses msgpack and zstd. Both libraries have a very thin Python layer and parallelisation with threads is very good. I get nearly 100% CPU load. The results currently are ~4 faster than zip and compression ratio between zip and lzma. When the tool is finished I'll release it for the public.
Comment
If you have an example of using fastthreadpool that I can use instead of concurrent.futures.ThreadPoolExecutor or concurrent.futures.ProcessPoolExecutor then I'll try to run a benchmark with that too.
Parent comment
The bulk is indeed in unzipping. But if you've an archive with many small files the overhead of the pool can be 10% or more. And this is much for just handling a pool of threads. The alternative is to use map, where you have to prepare an iterable before calling map. Another alternative is to switch to a faster pool implementation. The module zipfile is completely written in Python, which comes with a relatively big overhead at Python level, which in turn means the GIL is locked relatively long. The result is a low level of parallelisation. I'm currently writing an archiving tool which uses msgpack and zstd. Both libraries have a very thin Python layer and parallelisation with threads is very good. I get nearly 100% CPU load. The results currently are ~4 faster than zip and compression ratio between zip and lzma. When the tool is finished I'll release it for the public.