You should not use ThreadPoolExecutor: it has relatively high overhead and is slow, especially if you use submit. Either use a different thread pool implementation or, if you want to stick with ThreadPoolExecutor, use map. A much faster solution is the good old multiprocessing.pool.ThreadPool. You could also try my fastthreadpool module (https://github.com/brmmm3/fastthreadpool). It is faster than ThreadPool and much faster than ThreadPoolExecutor, and it has the additional advantage that you can use generator functions as workers, which is very useful in certain situations. For example code, please look at benchmark.py; the doc directory contains benchmark files which show the overhead difference between the three thread pool implementations. I'm also working on a fastprocesspool module, but it is still unfinished and buggy. First tests have shown that it is also faster than multiprocessing.Pool and much faster than ProcessPoolExecutor.
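To make the overhead claim concrete, here is a minimal stdlib-only sketch that times the three approaches on many near-empty tasks, so the measurement is dominated by pool bookkeeping rather than real work (the worker function, task count, and thread count are arbitrary choices for illustration; absolute numbers vary by machine and Python version):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool

N = 10_000

def tiny(x):
    # Near-zero work: timing is dominated by per-task pool overhead.
    return x + 1

def bench(label, fn):
    t0 = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - t0:.3f}s")
    return result

with ThreadPoolExecutor(4) as ex:
    # submit creates one Future per task plus per-call bookkeeping in
    # user code -> typically the slowest variant for many tiny tasks.
    r1 = bench("ThreadPoolExecutor.submit",
               lambda: [f.result() for f in [ex.submit(tiny, i) for i in range(N)]])
    # map still builds Futures internally, but avoids the per-task
    # submit/result handling in user code.
    r2 = bench("ThreadPoolExecutor.map",
               lambda: list(ex.map(tiny, range(N))))

with ThreadPool(4) as pool:
    # The "good old" multiprocessing thread pool, usually lighter still.
    r3 = bench("multiprocessing.pool.ThreadPool.map",
               lambda: pool.map(tiny, range(N)))

assert r1 == r2 == r3 == list(range(1, N + 1))
```

All three variants compute the same result; only the per-task overhead differs.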
What's the alternative to submit? And multiprocessing.Pool might be marginally faster, but isn't the bulk of the computation still in the actual unzipping?
The bulk is indeed in the unzipping. But if you have an archive with many small files, the overhead of the pool can be 10% or more, which is a lot just for managing a pool of threads. The alternative is to use map, for which you prepare an iterable before calling it. Another alternative is to switch to a faster pool implementation. The zipfile module is written entirely in Python, which brings a relatively big overhead at the Python level, which in turn means the GIL is held relatively long. The result is a low degree of parallelisation. I'm currently writing an archiving tool which uses msgpack and zstd. Both libraries have a very thin Python layer, and parallelisation with threads is very good: I get nearly 100% CPU load. The results are currently ~4x faster than zip, with a compression ratio between zip and lzma. When the tool is finished I'll release it to the public.
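The "prepare an iterable, then map" pattern for an archive with many small files might look like the sketch below. The archive is built in memory so the example is self-contained; the member names and sizes are made up. (Concurrent reads from one ZipFile are lock-protected in CPython 3.5+, but as the discussion notes, the Python-level zipfile bookkeeping still limits how much the threads overlap.)

```python
import io
import zipfile
from multiprocessing.pool import ThreadPool

# Build a small in-memory archive so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as writer:
    for i in range(100):
        writer.writestr(f"file{i}.txt", f"payload {i}" * 50)

zf = zipfile.ZipFile(buf)

def extract_one(name):
    # zlib decompression releases the GIL, but the surrounding
    # pure-Python zipfile code serializes, limiting parallelism.
    return name, zf.read(name)

# Prepare the iterable up front, then hand the whole batch to map
# in a single call instead of submitting tasks one by one.
names = zf.namelist()
with ThreadPool(4) as pool:
    results = dict(pool.map(extract_one, names))

assert len(results) == 100
assert results["file7.txt"].startswith(b"payload 7")
```

The same pattern works unchanged with ThreadPoolExecutor.map if you prefer to stay on concurrent.futures.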
If you have an example of using fastthreadpool that I can use instead of concurrent.futures.ThreadPoolExecutor or concurrent.futures.ProcessPoolExecutor then I'll try to run a benchmark with that too.