How to rotate a video on OSX with ffmpeg
January 3, 2018
5 comments Linux, MacOSX
Every now and then, I take a video with my iPhone and even though I hold the camera in landscape mode, the video gets recorded in portrait mode. Probably because it somehow started in portrait and didn't notice that I rotated the phone.
So I'm stuck with a 90° video. Here's how I rotate it:
ffmpeg -i thatvideo.mov -vf "transpose=2" ~/Desktop/thatvideo.mov
then I check that ~/Desktop/thatvideo.mov looks like it should.
I can't remember where I got this command originally but I've been relying on my bash history for a looong time so it's best to write this down.
The "transpose=2"
means 90° counter clockwise. "transpose=1"
means 90° clockwise.
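If, like me, you can never remember which transpose value is which, a tiny Python wrapper helps. A minimal sketch (the function name and defaults are mine, not part of ffmpeg):

import subprocess

def rotate_video(source, destination, clockwise=False):
    # transpose=1 rotates 90° clockwise, transpose=2 rotates 90° counter-clockwise
    transpose = "1" if clockwise else "2"
    subprocess.run([
        "ffmpeg", "-i", source,
        "-vf", f"transpose={transpose}",
        destination,
    ], check=True)

rotate_video("thatvideo.mov", "thatvideo-rotated.mov")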
What is ffmpeg?
If you're here because you Googled it and you don't know what ffmpeg is: it's a command line program with which you can "programmatically" do almost anything to videos, such as converting between formats, overlaying text, and chopping and trimming videos. To install it, install Homebrew then type:
brew install ffmpeg
Unzip benchmark on AWS EC2 c3.large vs c4.large
November 29, 2017
18 comments Python, Linux, Mozilla, Go
This web app I'm working on gets a blob of bytes from an HTTP POST. The blob is a 100MB to 1,100MB zip file. What my app currently does is take this byte buffer and use Python's built-in zipfile to extract all its content to a temporary directory. A second function then loops over the files within this extracted tree and processes each file in multiple threads with concurrent.futures.ThreadPoolExecutor. Here's the core function itself:
@metrics.timer_decorator('upload_dump_and_extract')
def dump_and_extract(root_dir, file_buffer):
    zf = zipfile.ZipFile(file_buffer)
    zf.extractall(root_dir)
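The "second function" that processes the extracted tree isn't shown here, but it's roughly this shape. A minimal sketch (process_file stands in for the real per-file work):

import concurrent.futures
import os

def process_extracted_tree(root_dir, process_file):
    # Walk the extracted tree, then fan out the per-file work on a thread pool.
    paths = [
        os.path.join(base, name)
        for base, _dirs, files in os.walk(root_dir)
        for name in files
    ]
    with concurrent.futures.ThreadPoolExecutor() as executor:
        return list(executor.map(process_file, paths))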
So far so good.
Speed Speed Speed
I quickly noticed that this amounts to quite a lot of time spent doing the unzip and writing to disk. What to do????
At first I thought I'd shell out to good old unzip. E.g. unzip -d /tmp/tempdirextract /tmp/input.zip, but that has two flaws:

1) I'd first have to dump the blob of bytes to disk and pay the overhead of shelling out (i.e. Python subprocess)

2) It's actually not faster. I did some experimenting and got the same results as Alex Martelli in this Stack Overflow post.
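For completeness, the shell-out version would look something like this (a sketch; flaw number 1 shows up as the NamedTemporaryFile dance):

import subprocess
import tempfile

def dump_and_extract_unzip(root_dir, file_buffer):
    # The blob of bytes has to be dumped to disk first, just so that
    # the `unzip` child process can read it.
    with tempfile.NamedTemporaryFile(suffix='.zip') as f:
        f.write(file_buffer.getvalue())
        f.flush()
        subprocess.run(['unzip', '-q', '-d', root_dir, f.name], check=True)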
What about disk speed? Yeah, this is likely to be a part of the total time. The servers that run the symbols.mozilla.org service run on AWS EC2 c4.large, which only has EBS (Elastic Block Storage). However, AWS EC2 c3.large looks interesting since it uses SSD disks. That's probably a lot faster. Right?
Note! For context, the kind of .zip files I'm dealing with contain many small files and often 1-2 really large ones.
Benchmarking the EC2s
I created two EC2 nodes to experiment on: one c3.large and one c4.large, both running Ubuntu 16.04.
Next, I have this little benchmarking script which loops over a directory full of .zip files between 200MB and 600MB large. Roughly 10 of them. It then loads each one, one at a time, into memory and calls dump_and_extract.
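I haven't included the actual fastest-dumper.py script here, but the gist of the harness is something like this (a sketch; the real script has more options and nicer output):

import os
import sys
import tempfile
import time
from io import BytesIO

def benchmark(zips_dir):
    speeds = []
    for filename in os.listdir(zips_dir):
        if not filename.endswith('.zip'):
            continue
        with open(os.path.join(zips_dir, filename), 'rb') as f:
            file_buffer = BytesIO(f.read())
        size = len(file_buffer.getvalue())
        with tempfile.TemporaryDirectory() as root_dir:
            t0 = time.time()
            dump_and_extract(root_dir, file_buffer)  # the function from above
            t1 = time.time()
        speed = size / (t1 - t0) / 1024 / 1024
        speeds.append(speed)
        print(f"{speed:.1f}MB/s {size / 1024 / 1024:.1f}MB {t1 - t0:.3f}s")
    print(f"Average speed: {sum(speeds) / len(speeds):.1f}MB/s")

benchmark(sys.argv[1])

Let's run it on each EC2 instance: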
On c4.large
c4.large$ python3 fastest-dumper.py /tmp/massive-symbol-zips
138.2MB/s 291.1MB 2.107s
146.8MB/s 314.5MB 2.142s
144.8MB/s 288.2MB 1.990s
84.5MB/s 532.4MB 6.302s
146.6MB/s 314.2MB 2.144s
136.5MB/s 270.7MB 1.984s
85.9MB/s 518.9MB 6.041s
145.2MB/s 306.8MB 2.113s
127.8MB/s 138.7MB 1.085s
107.3MB/s 454.8MB 4.239s
141.6MB/s 251.2MB 1.774s

Average speed: 127.7MB/s
Median speed: 138.2MB/s
Average files created: 165
Average directories created: 129
On c3.large
c3.large$ python3 fastest-dumper.py -t /mnt/extracthere /tmp/massive-symbol-zips
105.4MB/s 290.9MB 2.761s
98.1MB/s 518.5MB 5.287s
108.1MB/s 251.2MB 2.324s
112.5MB/s 294.3MB 2.615s
113.7MB/s 314.5MB 2.767s
106.3MB/s 291.5MB 2.742s
104.8MB/s 291.1MB 2.778s
114.6MB/s 248.3MB 2.166s
114.2MB/s 248.2MB 2.173s
105.6MB/s 298.1MB 2.823s
106.2MB/s 297.6MB 2.801s
98.6MB/s 521.4MB 5.289s

Average speed: 107.3MB/s
Median speed: 106.3MB/s
Average files created: 165
Average directories created: 127
What the heck!? The SSD-based instance is 23% slower!
I ran it a bunch of times and the average and median numbers are steady. c4.large is faster than c3.large at unzipping large blobs to disk. So much for that SSD!
Something Weird Is Going On
It's highly likely that the unzipping work is CPU bound, and that most of those, for example, 5 seconds are spent unzipping and only a small margin is the time it takes to write to disk.
If the unzipping CPU work is the dominant "time consumer", why is there a difference at all?!
Or is the "compute power" the real difference between c3 and c4, with the disk writes immaterial?
For the record, this test clearly demonstrates that the locally mounted SSD drive is roughly 600% faster than EBS:
c3.large$ dd if=/dev/zero of=/tmp/1gbtest bs=16k count=65536
65536+0 records in
65536+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 16.093 s, 66.7 MB/s

c3.large$ sudo dd if=/dev/zero of=/mnt/1gbtest bs=16k count=65536
65536+0 records in
65536+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.62728 s, 409 MB/s
Let's try again. But instead of using c4.large and c3.large, let's use the beefier c4.4xlarge and c3.4xlarge. Both have 16 vCPUs.
c4.4xlarge
c4.4xlarge$ python3 fastest-dumper.py /tmp/massive-symbol-zips
130.6MB/s 553.6MB 4.238s
149.2MB/s 297.0MB 1.991s
129.1MB/s 529.8MB 4.103s
116.8MB/s 407.1MB 3.486s
147.3MB/s 306.1MB 2.077s
151.9MB/s 248.2MB 1.634s
140.8MB/s 292.3MB 2.076s
146.8MB/s 288.0MB 1.961s
142.2MB/s 321.0MB 2.257s

Average speed: 139.4MB/s
Median speed: 142.2MB/s
Average files created: 148
Average directories created: 117
c3.4xlarge
c3.4xlarge$ python3 fastest-dumper.py -t /mnt/extracthere /tmp/massive-symbol-zips
95.1MB/s 502.4MB 5.285s
104.1MB/s 303.5MB 2.916s
115.5MB/s 313.9MB 2.718s
105.5MB/s 517.4MB 4.904s
114.1MB/s 288.1MB 2.526s
103.3MB/s 555.9MB 5.383s
114.0MB/s 288.0MB 2.526s
109.2MB/s 251.2MB 2.300s
108.0MB/s 291.0MB 2.693s

Average speed: 107.6MB/s
Median speed: 108.0MB/s
Average files created: 150
Average directories created: 119
What's going on!? The time it takes to unzip and write to disk is, on average, the same for c3.large as for c3.4xlarge!
Is Go Any Faster?
I need a break. As mentioned above, the unzip command line program is not any better than doing it in Python. But Go is faster, right? Right?
Please first accept that I'm not a Go programmer. I can use it to build stuff, but my experience level is quite shallow.
Here's the Go version. The critical function that does the unzipping and extraction to disk:
func DumpAndExtract(dest string, buffer []byte, name string) {
    size := int64(len(buffer))
    zipReader, err := zip.NewReader(bytes.NewReader(buffer), size)
    if err != nil {
        log.Fatal(err)
    }
    for _, f := range zipReader.File {
        rc, err := f.Open()
        if err != nil {
            log.Fatal(err)
        }
        defer rc.Close()
        fpath := filepath.Join(dest, f.Name)
        if f.FileInfo().IsDir() {
            os.MkdirAll(fpath, os.ModePerm)
        } else {
            // Make File
            var fdir string
            if lastIndex := strings.LastIndex(fpath, string(os.PathSeparator)); lastIndex > -1 {
                fdir = fpath[:lastIndex]
            }
            err = os.MkdirAll(fdir, os.ModePerm)
            if err != nil {
                log.Fatal(err)
            }
            f, err := os.OpenFile(
                fpath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()
            _, err = io.Copy(f, rc)
            if err != nil {
                log.Fatal(err)
            }
        }
    }
}
And the measurement is done like this:
size := int64(len(content))
t0 := time.Now()
DumpAndExtract(tmpdir, content, filename)
t1 := time.Now()
speed := float64(size) / t1.Sub(t0).Seconds()
It's not as sophisticated (since it's only able to use /tmp) but let's just run it and see how it compares to Python:
c4.4xlarge$ mkdir ~/GO
c4.4xlarge$ export GOPATH=~/GO
c4.4xlarge$ go get github.com/pyk/byten
c4.4xlarge$ go build unzips.go
c4.4xlarge$ ./unzips /tmp/massive-symbol-zips
56MB/s 407MB 7.27804954
74MB/s 321MB 4.311504933
75MB/s 288MB 3.856798853
75MB/s 292MB 3.90972474
81MB/s 248MB 3.052652168
58MB/s 530MB 9.065985117
59MB/s 554MB 9.35237202
75MB/s 297MB 3.943132388
74MB/s 306MB 4.147176578

Average speed: 70MB/s
Median speed: 81MB/s
So... Go is, on average, 40% slower than Python in this scenario. Did not expect that.
In Conclusion
No conclusion. Only confusion.
I thought this would be a lot clearer and more obvious. Yeah, I know it's crazy to measure two things at the same time (unzip and disk write), but the whole thing started with a very realistic problem that I'm trying to solve. The ultimate question was: will the performance benefit from us moving the web servers from AWS EC2 c4.large to c3.large? I think the answer is no.
UPDATE (Nov 30, 2017)
Here's a horrible hack that causes the extraction to always go to /dev/null:
class DevNullZipFile(zipfile.ZipFile):
    def _extract_member(self, member, targetpath, pwd):
        # member.is_dir() only works in Python 3.6
        if member.filename[-1] == '/':
            return targetpath
        dest = '/dev/null'
        with self.open(member, pwd=pwd) as source, open(dest, "wb") as target:
            shutil.copyfileobj(source, target)
        return targetpath


def dump_and_extract(root_dir, file_buffer, klass):
    zf = klass(file_buffer)
    zf.extractall(root_dir)
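Presumably the benchmark script then switches behavior simply by passing the class, something like:

dump_and_extract(tmpdir, file_buffer, DevNullZipFile)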
And here's the outcome of running that:
c4.4xlarge$ python3 fastest-dumper.py --dev-null /tmp/massive-symbol-zips
170.1MB/s 297.0MB 1.746s
168.6MB/s 306.1MB 1.815s
147.1MB/s 553.6MB 3.765s
132.1MB/s 407.1MB 3.083s
145.6MB/s 529.8MB 3.639s
175.4MB/s 248.2MB 1.415s
163.3MB/s 321.0MB 1.965s
162.1MB/s 292.3MB 1.803s
168.5MB/s 288.0MB 1.709s

Average speed: 159.2MB/s
Median speed: 163.3MB/s
Average files created: 0
Average directories created: 0
I ran it a few times to make sure the numbers are stable. They are. This is on the c4.4xlarge.
So, the improvement of writing to /dev/null instead of the EBS-mounted /tmp is 15%. Kinda goes to show how much of the total time is spent reading the ZipInfo file object.
For the record, the same comparison on the c3.4xlarge was a 30% improvement when using /dev/null.
Also for the record, if I replace that line shutil.copyfileobj(source, target) above with pass, the average speed goes from 159.2MB/s to 112.8GB/s, but that's not a real value of any kind.
UPDATE (Nov 30, 2017)
Here's the same benchmark using c5.4xlarge instead. So, still EBS but...
"3.0 GHz Intel Xeon Platinum processors with new Intel Advanced Vector Extension 512 (AVX-512) instruction set"
Let's run it on this supposedly faster CPU:
c5.4xlarge$ python3 fastest-dumper.py /tmp/massive-symbol-zips
165.6MB/s 314.6MB 1.900s
163.3MB/s 287.7MB 1.762s
155.2MB/s 278.6MB 1.795s
140.9MB/s 513.2MB 3.643s
137.4MB/s 556.9MB 4.052s
134.6MB/s 531.0MB 3.946s
165.7MB/s 314.2MB 1.897s
158.1MB/s 301.5MB 1.907s
151.6MB/s 253.8MB 1.674s
146.9MB/s 502.7MB 3.422s
163.7MB/s 288.0MB 1.759s

Average speed: 153.0MB/s
Median speed: 155.2MB/s
Average files created: 150
Average directories created: 119
So that is, on average, 10% faster than c4.4xlarge.
Is it 10% more expensive? For a 1-year reserved instance, it's $0.796 versus $0.68 respectively. I.e. 15% more expensive. In other words, in this context it's 15% more $$$ for 10% more processing power.
UPDATE (Jan 24, 2018)
I can almost not believe it!
Thank you Oliver, who discovered (see comment below) a glaring mistake in my last conclusion. For reserved instances (which is what we use on my Mozilla production servers), the c5.4xlarge is actually cheaper than the c4.4xlarge. What?!
In my previous update I compared c4.4xlarge and c5.4xlarge and concluded that c5.4xlarge is 10% faster but 15% more expensive. That actually made sense. Fancier servers, more $$$. But it's not like that in the real world. See for yourself:
How to create-react-app with Docker
November 17, 2017
31 comments Linux, Web development, JavaScript, React, Docker
Why would you want to use Docker to do React app work? Isn't Docker for server-side stuff like Python and Golang etc? No, all the benefits of Docker apply to JavaScript client-side work too.
So there are three main things you want to do with create-react-app: run the dev server, run tests, and create build artifacts. Let's look at all three, but using Docker.
Create-react-app first
If you haven't already, install create-react-app globally:
▶ yarn global add create-react-app
And, once installed, create a new project:
▶ create-react-app docker-create-react-app
...lots of output...

▶ cd docker-create-react-app
▶ ls
README.md  node_modules  package.json  public  src  yarn.lock
We won't need the node_modules here in the project directory. Instead, when building the image, we're going to let node_modules stay inside the image. So you can go ahead and... rm -fr node_modules.
Create the Dockerfile
Let's just dive in. This Dockerfile is the minimum:
FROM node:8

ADD yarn.lock /yarn.lock
ADD package.json /package.json

ENV NODE_PATH=/node_modules
ENV PATH=$PATH:/node_modules/.bin
RUN yarn

WORKDIR /app
ADD . /app

EXPOSE 3000
EXPOSE 35729

ENTRYPOINT ["/bin/bash", "/app/run.sh"]
CMD ["start"]
A couple of things to notice here.
First of all, we're basing this on the official Node v8 repository on Docker Hub. That gives you Node and Yarn by default.
Note how the NODE_PATH environment variable puts the node_modules in the root of the container. That's so that it doesn't get added in "here" (i.e. the current working directory). If you didn't do this, the node_modules directory would be part of the mounted volume, which not only slows down Docker (since there are so many files) but also isn't necessary; you don't need to see those files.
Note how the ENTRYPOINT points to run.sh. That's a file we need to create too, alongside the Dockerfile:
#!/usr/bin/env bash
set -eo pipefail

case $1 in
start)
  # The '| cat' is to trick Node that this is a non-TTY terminal;
  # then react-scripts won't clear the console.
  yarn start | cat
  ;;
build)
  yarn build
  ;;
test)
  yarn test $@
  ;;
*)
  exec "$@"
  ;;
esac
Lastly, as a point of convenience, note that the default CMD is "start". That's so that when you simply run the container, the default thing it does is run yarn start.
Build container
Now let's build it:
▶ docker image build -t react:app .
The -t react:app is up to you. It doesn't matter so much what it is unless you're going to upload your image to a registry. Then you probably want the repository to be something unique.
Let's check that the build is there:
▶ docker image ls react:app
REPOSITORY  TAG  IMAGE ID      CREATED         SIZE
react       app  3ee5c7596f57  13 minutes ago  996MB
996MB! The base Node image is about ~700MB and the node_modules directory (for a clean new create-react-app) is ~160MB (at the time of writing). What the remaining difference is, I'm not sure. But it's empty calories and easy to lose. When you blow away the built image (docker image rmi react:app) your hard drive gets all that back, and no actual code is lost.
Before we run it, let's go inside and see what was created:
▶ docker container run -it react:app bash
root@996e708a30c4:/app# ls
Dockerfile  README.md  package.json  public  run.sh  src  yarn.lock
root@996e708a30c4:/app# du -sh /node_modules/
148M    /node_modules/
root@996e708a30c4:/app# sw-precache
Total precache size is about 355 kB for 14 resources.
service-worker.js has been generated with the service worker contents.
The last command (sw-precache) was just to show that executables in /node_modules/.bin are indeed on the $PATH and can be run.
Run container
Now to run it:
▶ docker container run -it -p 3000:3000 react:app
yarn run v1.3.2
$ react-scripts start
Starting the development server...

Compiled successfully!

You can now view docker-create-react-app in the browser.

  Local:            http://localhost:3000/
  On Your Network:  http://172.17.0.2:3000/

Note that the development build is not optimized.
To create a production build, use yarn build.
Pretty good. Open http://localhost:3000 in your browser and you should see the default create-react-app app.
Next step: Warm reloading
create-react-app does not support hot reloading of components. But it does support web page reloading. As soon as a local file is changed, it sends a signal to the browser (using WebSockets) telling it to... document.location.reload().
To make this work, we need to do two things:
1) Mount the current working directory into the Docker container
2) Expose the WebSocket port
The WebSocket thing is set up by exposing port 35729 to the host (-p 35729:35729).
Below is an example running this with a volume mount and both ports exposed.
▶ docker container run -it -p 3000:3000 -p 35729:35729 -v $(pwd):/app react:app
yarn run v1.3.2
$ react-scripts start
Starting the development server...

Compiled successfully!

You can now view docker-create-react-app in the browser.

  Local:            http://localhost:3000/
  On Your Network:  http://172.17.0.2:3000/

Note that the development build is not optimized.
To create a production build, use yarn build.

Compiling...
Compiled successfully!

Compiling...
Compiled with warnings.

./src/App.js
  Line 7:  'neverused' is assigned a value but never used  no-unused-vars

Search for the keywords to learn more about each warning.
To ignore, add // eslint-disable-next-line to the line before.

Compiling...
Failed to compile.

./src/App.js
Module not found: Can't resolve './Apps.css' in '/app/src'
In the above example output, first I make a harmless save in the src/App.js file just to see that the dev server notices and that my browser reloads when I do that. That's where it says:
Compiling...
Compiled successfully!
Secondly, I make an edit that triggers a warning. That's where it says:
Compiling...
Compiled with warnings.

./src/App.js
  Line 7:  'neverused' is assigned a value but never used  no-unused-vars

Search for the keywords to learn more about each warning.
To ignore, add // eslint-disable-next-line to the line before.
And lastly I make an edit by messing with the import line:
Compiling...
Failed to compile.

./src/App.js
Module not found: Can't resolve './Apps.css' in '/app/src'
This is great! Isn't create-react-app wonderful?
Build build :)
There are many things you can do with the code you're building. Let's pretend that the intention is to build a single-page app and then take the static assets (including the index.html) and upload them to a public CDN or something. To do that we need to generate the build directory.
The trick here is to run this with a volume mount so that when it creates /app/build (from the perspective of the container), that directory effectively becomes visible on the host.
▶ docker container run -it -v $(pwd):/app react:app build
yarn run v1.3.2
$ react-scripts build
Creating an optimized production build...
Compiled successfully.

File sizes after gzip:

  35.59 KB  build/static/js/main.591fd843.js
  299 B     build/static/css/main.c17080f1.css

The project was built assuming it is hosted at the server root.
To override this, specify the homepage in your package.json.
For example, add this to build it for GitHub Pages:

  "homepage" : "http://myname.github.io/myapp",

The build folder is ready to be deployed.
You may serve it with a static server:

  yarn global add serve
  serve -s build

Done in 5.95s.
Now, on the host:
▶ tree build
build
├── asset-manifest.json
├── favicon.ico
├── index.html
├── manifest.json
├── service-worker.js
└── static
    ├── css
    │   ├── main.c17080f1.css
    │   └── main.c17080f1.css.map
    ├── js
    │   ├── main.591fd843.js
    │   └── main.591fd843.js.map
    └── media
        └── logo.5d5d9eef.svg

4 directories, 10 files
The contents of that directory you can now upload to a CDN or some public Nginx server that points to it as the root directory.
Running tests
This one is so easy and obvious now.
▶ docker container run -it -v $(pwd):/app react:app test
Note that we're setting up a volume mount here again. Since the test runner is interactive (it sits and waits for file changes and re-runs tests immediately), it's important to do the mount now.
All regular jest options work too. For example:
▶ docker container run -it -v $(pwd):/app react:app test --coverage
▶ docker container run -it -v $(pwd):/app react:app test --help
Debugging the node_modules
First of all, when I say "debugging the node_modules" in this context, I'm referring to messing with node_modules whilst running tests or running the dev server.
One way to debug the node_modules used is to enter a bash shell and literally mess with the files inside it. First, start the dev server (or the test runner) and give the container a name:
▶ docker container run -it -p 3000:3000 -p 35729:35729 -v $(pwd):/app --name mydebugging react:app
Now, in a separate terminal, start bash in the container:
▶ docker exec -it mydebugging bash
Once you're in you can install an editor and start editing files:
root@2bf8c877f788:/app# apt-get update && apt-get install jed
root@2bf8c877f788:/app# jed /node_modules/react/index.js
As soon as you make changes to any of the files, the dev server should notice and reload.
When you stop the container, all your changes will be reset. So if you had sprinkled the node_modules with console.log('WHAT THE HECK!') calls, all of those disappear when the container is stopped.
NodeJS shell
This'll come as no surprise by now. You basically run bash and you're there:
▶ docker container run -it -v $(pwd):/app react:app bash
root@2a21e8206a1f:/app# node
> [] + 1
'1'
Conclusion
When I look back at all the commands above, I can definitely see how it's pretty intimidating and daunting. So many things to remember, and it's got that nasty feeling where you feel like you're controlling your development environment through unwieldy levers rather than with your own hands.
But think of the fundamental advantages too! It's all encapsulated now. What you're working on will be based on the exact same version of everything as your teammate, your dev server and your production server are using.
Pros:
- All packaged up and all team members get the exact same versions of everything, including Node and Yarn.
- The node_modules directory gets out of your hair.
- Perhaps some React code is just a small part of a large project. E.g. the frontend is React, the backend is Django. Then with some docker-compose magic you can have it all running with one command without needing to run the frontend in a separate terminal.
Cons:
- Lack of color output in terminal.
- The initial (or infrequent) wait for building the docker image is brutal on a slow network.
- Lots of commands to remember. For example, how do you start a shell again?
In my (Mozilla Services) work, I actually use docker-compose for all the projects I work on. And I have a Makefile to help me remember all the various docker-compose commands (thanks Jannis & Will!). One definitely neat thing you can do with docker-compose is start multiple containers. Then you can, with one command, start a Django server and the create-react-app dev server. Perhaps a blog post for another day.
Concurrent Gzip in Python
October 13, 2017
11 comments Python, Linux, Docker
Suppose you have a bunch of files you need to gzip in Python; what's the optimal way to do that? In serial, to avoid contention on the GIL? In multiprocessing, to spread the load across CPU cores? Or with threads?
I needed to know this for symbols.mozilla.org since it does a lot of gzip'ing. In symbols.mozilla.org, clients upload a zip file full of files. A lot of them are plain text, and when uploaded to S3 it's best to store them gzipped. Basically it does this:
def upload_sym_file(s3_client, payload, bucket_name, key_name):
    file_buffer = BytesIO()
    with gzip.GzipFile(fileobj=file_buffer, mode='w') as f:
        f.write(payload)
    file_buffer.seek(0, os.SEEK_END)
    size = file_buffer.tell()
    file_buffer.seek(0)
    s3_client.put_object(
        Bucket=bucket_name,
        Key=key_name,
        Body=file_buffer
    )
    print(f"Uploaded {size}")
Another important thing to consider before jumping into the benchmark is to appreciate the context of this application; the bundles of files I need to gzip are often many but smallish. The average file size of the files that need to be gzip'ed is ~300KB. And each bundle is between 5 and 25 files.
The Benchmark
For the sake of the benchmark here, all it does is figure out the size of each gzipped buffer and report that as a list.
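The _get_size helper isn't shown here; presumably it gzips one payload into an in-memory buffer and returns the compressed size. Something like this (my sketch):

import gzip
from io import BytesIO

def _get_size(payload):
    # Gzip the payload into an in-memory buffer and return the compressed size.
    buf = BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='w') as f:
        f.write(payload)
    return len(buf.getvalue())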
f1 - Basic serial
def f1(payloads):
    sizes = []
    for payload in payloads:
        sizes.append(_get_size(payload))
    return sizes
f2 - Using multiprocessing.Pool
def f2(payloads):  # multiprocessing
    sizes = []
    with multiprocessing.Pool() as p:
        sizes = p.map(_get_size, payloads)
    return sizes
f3 - Using concurrent.futures.ThreadPoolExecutor
def f3(payloads):  # concurrent.futures.ThreadPoolExecutor
    sizes = []
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for payload in payloads:
            futures.append(
                executor.submit(
                    _get_size,
                    payload
                )
            )
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes
f4 - Using concurrent.futures.ProcessPoolExecutor
def f4(payloads):  # concurrent.futures.ProcessPoolExecutor
    sizes = []
    futures = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for payload in payloads:
            futures.append(
                executor.submit(
                    _get_size,
                    payload
                )
            )
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes
Note that when using asynchronous methods like this, the order of the items returned is not the same as the order they were submitted in. An easy remedy, if you need the results back in order, is to use a dictionary instead of a list. Then you can map each key (or index, if you like) to a value.
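For example, a sketch of an order-preserving variant of f3:

import concurrent.futures

def f3_ordered(payloads):
    # Map each future back to its submission index so the results
    # can be returned in the original order.
    sizes = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {
            executor.submit(_get_size, payload): index
            for index, payload in enumerate(payloads)
        }
        for future in concurrent.futures.as_completed(futures):
            sizes[futures[future]] = future.result()
    return [sizes[index] for index in range(len(payloads))]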
The Results
I ran this on three different .zip files of different sizes. To get some sanity into the benchmark I made it print out how many bytes it has to process and how many bytes gzip manages to shrink that down to.
# files 66
Total bytes to gzip 140.69MB
Total bytes gzipped 14.96MB
Total bytes shaved off by gzip 125.73MB

# files 103
Total bytes to gzip 331.57MB
Total bytes gzipped 66.90MB
Total bytes shaved off by gzip 264.67MB

# files 26
Total bytes to gzip 86.91MB
Total bytes gzipped 8.28MB
Total bytes shaved off by gzip 78.63MB
Sorry for being aesthetically handicapped when it comes to using Google Docs, but here goes...
This demonstrates the median times it takes each function to complete, for each of the three different files.
In all three files I tested, doing it serially (f1) is clearly the worst. Presumably because my laptop has more than one CPU core and the others are not being used. Another pertinent thing to notice is that when the work is really big (the middle 4 bars), the difference between serial and concurrent isn't as big.
That second zip file contained a single file that was 80MB. The largest in the other two files were 18MB and 22MB.
This is the mean across all medians grouped by function and each compared to the slowest.
I call this the "bestest graph". It's a combination across all the different sizes and basically concludes which one is the best, which clearly is function f3 (the one using concurrent.futures.ThreadPoolExecutor).
CPU Usage
This is probably the best way to explain how the CPU is used; I ran each function repeatedly, then opened gtop and took a screenshot of the list of processes sorted by CPU percentage.
f1 - Serially
No distractions but it takes 100% of one CPU to work.
f2 - multiprocessing.Pool
My laptop has 8 CPU cores, yet I see 9 Python processes here; presumably that's the 8 pool workers plus the parent process.
I don't know why each CPU isn't at 100%, but I guess there's some administrative overhead for Python to start processes.
f3 - concurrent.futures.ThreadPoolExecutor
One process, with roughly 5 x 8 = 40 threads swapping the GIL back and forth, but all in all it manages to keep itself very busy, since threads are lightweight and can share data cheaply.
f4 - concurrent.futures.ProcessPoolExecutor
This is actually kinda like multiprocessing.Pool but with a different (arguably easier) API.
Conclusion
By a small margin, concurrent.futures.ThreadPoolExecutor won. That's despite not being able to use all CPU cores. This, pseudo-scientifically, proves that the low overhead of starting threads (remember, the average number of files in each .zip is ~65) is worth more than being able to use all CPUs.
Discussion
There's an interesting twist to this! At least for my use case...
In the application I'm working on, there's actually a lot more that needs to be done other than just gzip'ing some blobs of files. For each file I need to do a HEAD query to AWS S3 and a PUT query to AWS S3 too. So what I actually need to do is create an instance of client = botocore.client.S3 that I use to call client.list_objects_v2 and client.put_object.
When you create an instance of botocore.client.S3, botocore will automatically instantiate itself with credentials from os.environ['AWS_ACCESS_KEY_ID'] etc. (or read them from some ~/.aws file). Once created, if you ask it to do many different network operations, internally it relies on urllib3.poolmanager.PoolManager, which is a pool of 10 HTTP connections that get reused.
So when you run the serial version, you can re-use the client instance for every file you process, but you can only use one HTTP connection in the pool. With the concurrent.futures.ThreadPoolExecutor it can not only re-use the same instance of botocore.client.S3, it can also cycle through all the HTTP connections in the pool.
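In code, that shared-client pattern looks something like this (a sketch using boto3, which wraps botocore; the bucket name, helper and payloads dict are made up):

import concurrent.futures
import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')  # one client, shared by all the threads

def upload_if_missing(key_name, payload):
    # Hypothetical helper: HEAD first, PUT only if the key doesn't already exist.
    try:
        s3_client.head_object(Bucket='my-symbols-bucket', Key=key_name)
    except ClientError:
        s3_client.put_object(Bucket='my-symbols-bucket', Key=key_name, Body=payload)

payloads = {'v1/firefox.sym': b'...', 'v1/wntdll.sym': b'...'}  # made-up example data
with concurrent.futures.ThreadPoolExecutor() as executor:
    for key_name, payload in payloads.items():
        executor.submit(upload_if_missing, key_name, payload)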
The process-based alternatives like multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor can not re-use the botocore.client.S3 instance, since it's not pickle'able, and they have to create a new HTTP connection for every single file.
So, the conclusion of the above rambling is that concurrent.futures.ThreadPoolExecutor is really awesome! Not only did it perform excellently in the gzip benchmark, it has the added bonus that it can share instance objects and HTTP connections.
A neat trick to zip a git repo with a version number
September 1, 2017
4 comments Linux, Web development
I have this WebExtension addon. It's not very important. Just a web extension that does some hacks to GitHub pages when I open them in Firefox. The web extension is a folder with a manifest.json, icons/icon-48.png, tricks.js, README.md etc. To upload it to addons.mozilla.org I first have to turn the whole thing into a .zip file that I can upload.
So I discovered a neat way to make that zip file. It looks like this:
#!/bin/bash
DESTINATION=build-`cat manifest.json | jq -r .version`.zip
git archive --format=zip master > $DESTINATION
echo "Created..."
ls -lh $DESTINATION
You run it and it creates a build-1.0.zip file containing all the files that are checked into the git repo. So it discards my local "junk" such as backup files or other things that are mentioned in .gitignore (and .git/info/exclude).
I bet someone's going to laugh and say "Duhh! Of course!" but I didn't know you could do that so easily. Hopefully posting this will help someone trying to do something similar.
Note: this depends on jq, which is an amazing little program.
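If you don't have jq, a Python equivalent of the same script is easy too (my translation, not the original):

#!/usr/bin/env python
import json
import subprocess

# Read the version out of manifest.json, then let `git archive` do the rest.
with open('manifest.json') as f:
    version = json.load(f)['version']

destination = f'build-{version}.zip'
with open(destination, 'wb') as f:
    subprocess.run(['git', 'archive', '--format=zip', 'master'], stdout=f, check=True)
print('Created...')
subprocess.run(['ls', '-lh', destination])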
Why didn't I know about machma?!
June 7, 2017
0 comments Linux, MacOSX, Go
"machma - Easy parallel execution of commands with live feedback"
This is so cool! https://github.com/fd0/machma
It's a command line program that makes it really easy to run any command line program in parallel. I.e. in separate processes with separate CPUs.
Something network bound
Suppose I have a file like this:
▶ wc -l urls.txt
      30 urls.txt

▶ cat urls.txt | head -n 3
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/wntdll.pdb/D74F79EB1F8D4A45ABCD2F476CCABACC2/wntdll.sym
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/firefox.pdb/448794C699914DB8A8F9B9F88B98D7412/firefox.sym
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/d2d1.pdb/CB8FADE9C48E44DA9A10B438A33114781/d2d1.sym
If I wanted to download all of these files with wget, the traditional way would be:
▶ time cat urls.txt | xargs wget -q -P ./downloaded/
cat urls.txt  0.00s user 0.00s system 53% cpu 0.005 total
xargs wget -q -P ./downloaded/  0.07s user 0.24s system 2% cpu 14.913 total

▶ ls downloaded | wc -l
      30

▶ du -sh downloaded
 21M    downloaded
So it took 15 seconds to download 30 files that total 21MB.
Now, let's do it with machma instead:
▶ time cat urls.txt | machma -- wget -q -P ./downloaded/ {}
cat urls.txt  0.00s user 0.00s system 55% cpu 0.004 total
machma -- wget -q -P ./downloaded/ {}  0.53s user 0.45s system 12% cpu 7.955 total
That uses 8 separate processes (because my laptop has 8 CPUs).
Because 30 / 8 ~= 4, it roughly does 4 rounds of downloads.
But note, it took 15 seconds to download the 30 files synchronously. That's an average of 0.5s per file. The reason the parallel version doesn't take 4 x 0.5 = 2 seconds (instead of the 8 seconds it took) is that it's at the mercy of bad luck, with some of those 30 downloads spiking a bit.
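By the way, if you don't have machma handy, the same fan-out is only a few lines of Python (a sketch, not machma's actual implementation):

import concurrent.futures
import subprocess

def download(url):
    # Each worker runs the equivalent of: wget -q -P ./downloaded/ <url>
    subprocess.run(['wget', '-q', '-P', './downloaded/', url], check=True)

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

# 8 workers, mirroring machma's one-worker-per-CPU default on this laptop.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    list(executor.map(download, urls))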
Something CPU bound
Now let's do something really CPU intensive: Guetzli compression.
▶ ls images | wc -l
       7

▶ time find images -iname '*.jpg' | xargs -I {} guetzli --quality 85 {} compressed/{}
find images -iname '*.jpg'  0.00s user 0.00s system 40% cpu 0.009 total
xargs -I {} guetzli --quality 85 {} compressed/{}  35.74s user 0.68s system 99% cpu 36.560 total
And now the same but with machma:
▶ time find images -iname '*.jpg' | machma -- guetzli --quality 85 {} compressed/{}
processed 7 items (0 failures) in 0:10
find images -iname '*.jpg'  0.00s user 0.00s system 51% cpu 0.005 total
machma -- guetzli --quality 85 {} compressed/{}  58.47s user 0.91s system 546% cpu 10.857 total
Basically, it took only 11 seconds. This time there were fewer images (7) than there were CPUs (8), so basically the poor computer is doing super intensive CPU (and memory) work across all CPUs at the same time. The average time for each of these files is ~5 seconds, so it's really interesting that even in parallel, instead of taking a total of ~5 seconds, it took almost double that.
In conclusion
Such a handy tool to have around for command line stuff. I haven't looked at its code much but it's almost a shame that the project only has 300+ GitHub stars. Perhaps because it's kinda complete and doesn't need much more work.
Also, if you attempt all the examples above you'll notice that when you use the ... | xargs ... approach, the stdout and stderr are a mess. For wget, that's why I used -q to silence it a bit. With machma you get a really pleasant color-coded live output that tells you the state of the queue, possible failures and an ETA.