Fastest way to unzip a zip file in Python

Wednesday, Jan 31, 2018

Comment

Masklinn February 4, 2018

Peter, if you don't need the entire content of the file in memory (which I guess you don't, since you're apparently fine with reading it from disk despite in-memory decompression blowing up) *and* you don't need seekability, you may have missed the `open` method on zipfiles. You'll still be paying for the decompression work, and as noted previously you can only access data sequentially (no seeking backwards), but it *is* lazy: you ask for a 10k chunk of a file, you get 10k in memory (plus some overhead from Python).
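A minimal sketch of that lazy, chunked reading, for illustration only. The helper name `iter_member_chunks`, the member name `big.txt` and the throwaway in-memory archive are assumptions made up for the example; only `zipfile.ZipFile` and its `open` method come from the standard library.

```python
import io
import zipfile


def iter_member_chunks(zip_source, member_name, chunk_size=10 * 1024):
    """Yield successive chunks of one archive member, decompressing lazily.

    zip_source may be a path or a seekable file-like object.
    (Hypothetical helper, not part of the zipfile module.)
    """
    with zipfile.ZipFile(zip_source) as archive:
        # ZipFile.open returns a read-only, file-like object for the member.
        with archive.open(member_name) as member:
            while True:
                chunk = member.read(chunk_size)
                if not chunk:
                    break
                yield chunk


# Build a small throwaway archive in memory, purely for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("big.txt", b"x" * 25_000)
buffer.seek(0)

# Only one chunk of decompressed data is held at a time.
chunk_sizes = [len(chunk) for chunk in iter_member_chunks(buffer, "big.txt")]
total_bytes = sum(chunk_sizes)
```

Each iteration holds at most `chunk_size` bytes of decompressed data, so peak memory should stay roughly flat regardless of how large the member is.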

Replies

Peter Bengtsson February 4, 2018

Doing an open requires all things to be in-memory. It's a bit faster but would require me to have a much beefier server, which might be the best course of action actually.

Masklinn February 5, 2018

> Doing an open requires all things to be in-memory.

It only requires that the zip file be accessible, either in-memory or on-disk.

Your introduction says that you originally had the zip file *and* the decompressed files in memory; `ZipFile.open` avoids the latter. Furthermore, the small bit of code you post in the conclusion seems to hint that you still have the zip file in memory when you do the extraction.

And again, you can use `ZipFile.open` just fine if you have the zip file on-disk; it's still lazy.
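For the on-disk case, a rough sketch of what that could look like: each member is streamed from the archive straight to a destination file, so neither the zip nor the decompressed data needs to sit fully in memory. The function name `extract_streaming`, the buffer size and the demo file names are assumptions for illustration, and the sketch assumes a trusted archive since it does not sanitise member paths the way `ZipFile.extract` does.

```python
import os
import shutil
import tempfile
import zipfile


def extract_streaming(zip_path, dest_dir, buffer_size=64 * 1024):
    """Copy every file member of an on-disk zip into dest_dir piece by piece.

    Hypothetical helper; assumes the archive is trusted (no path sanitising).
    """
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = os.path.join(dest_dir, info.filename)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Decompress lazily and write out in buffer_size pieces.
            with archive.open(info) as source, open(target, "wb") as sink:
                shutil.copyfileobj(source, sink, buffer_size)
            written.append(target)
    return written


# Demonstration with a throwaway archive in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    zip_path = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("a.txt", b"hello")
        archive.writestr("sub/b.txt", b"y" * 200_000)

    out_dir = os.path.join(workdir, "out")
    os.makedirs(out_dir)
    paths = extract_streaming(zip_path, out_dir)
    sizes = {os.path.relpath(p, out_dir): os.path.getsize(p) for p in paths}
```

Whether this beats `extractall` on speed is a separate question; the point here is only that memory use is bounded by the buffer size rather than by the size of the files.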