Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page: currently only showing blog entries under the category Linux.

How to create-react-app with Docker

17 November 2017 0 comments   Docker, ReactJS, Javascript, Web development, Linux


Why would you want to use Docker to do React app work? Isn't Docker for server-side stuff like Python and Golang etc? No, all the benefits of Docker apply to JavaScript client-side work too.

So there are three main things you want to do with create-react-app: running the dev server, running tests and creating build artifacts. Let's look at all three, but using Docker.

Create-react-app first

If you haven't already, install create-react-app globally:

▶ yarn global add create-react-app

And, once installed, create a new project:

▶ create-react-app docker-create-react-app
...lots of output...

▶ cd docker-create-react-app
▶ ls
README.md    node_modules package.json public       src          yarn.lock

We won't need the node_modules here in the project directory. Instead, when building the image we're going to let node_modules stay inside the image. So you can go ahead and... rm -fr node_modules.

Create the Dockerfile

Let's just dive in. This Dockerfile is the minimum:

FROM node:8

ADD yarn.lock /yarn.lock
ADD package.json /package.json

ENV NODE_PATH=/node_modules
ENV PATH=$PATH:/node_modules/.bin
RUN yarn

WORKDIR /app
ADD . /app

EXPOSE 3000
EXPOSE 35729

ENTRYPOINT ["/bin/bash", "/app/run.sh"]
CMD ["start"]

A couple of things to notice here.
First of all, we're basing this on the official Node v8 image on Docker Hub. That gives you Node and Yarn by default.

Note how the NODE_PATH environment variable puts the node_modules in the root of the container. That's so that it doesn't get added in "here" (i.e. the current working directory). If you didn't do this, the node_modules directory would become part of the mounted volume, which not only slows down Docker (since there are so many files) but also clutters your working directory with files you don't need to see.

Note how the ENTRYPOINT points to run.sh. That's a file we need to create too, alongside the Dockerfile file.

#!/usr/bin/env bash
set -eo pipefail

case $1 in
  start)
    # The '| cat' is to trick Node into thinking this is a non-TTY terminal
    # so that react-scripts won't clear the console.
    yarn start | cat
    ;;
  build)
    yarn build
    ;;
  test)
    shift
    yarn test "$@"
    ;;
  *)
    exec "$@"
    ;;
esac

Lastly, as a point of convenience, note that the default CMD is "start". That's so that when you simply run the container the default thing it does is to run yarn start.

Build container

Now let's build it:

▶ docker image build -t react:app .

The -t react:app is up to you. It doesn't matter so much what it is unless you're going to upload your image to a registry. Then you probably want the repository name to be something unique.

Let's check that the build is there:

▶ docker image ls react:app
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
react               app                 3ee5c7596f57        13 minutes ago      996MB

996MB! The base Node image is about ~700MB and the node_modules directory (for a clean new create-react-app) is ~160MB (at the time of writing). What the remaining difference is, I'm not sure. But it's empty calories and easy to lose. When you blow away the built image (docker image rm react:app) your hard drive gets all that back and no actual code is lost.

Before we run it, let's go inside and see what was created:

▶ docker container run -it react:app bash
root@996e708a30c4:/app# ls
Dockerfile  README.md  package.json  public  run.sh  src  yarn.lock
root@996e708a30c4:/app# du -sh /node_modules/
148M    /node_modules/
root@996e708a30c4:/app# sw-precache
Total precache size is about 355 kB for 14 resources.
service-worker.js has been generated with the service worker contents.

The last command (sw-precache) was just to show that executables in /node_modules/.bin are indeed on the $PATH and can be run.

Run container

Now to run it:

▶ docker container run -it -p 3000:3000 react:app
yarn run v1.3.2
$ react-scripts start
Starting the development server...

Compiled successfully!

You can now view docker-create-react-app in the browser.

  Local:            http://localhost:3000/
  On Your Network:  http://172.17.0.2:3000/

Note that the development build is not optimized.
To create a production build, use yarn build.

Default app running

Pretty good. Open http://localhost:3000 in your browser and you should see the default create-react-app app.

Next step: Warm reloading

create-react-app does not support hot reloading of components. But it does support web page reloading. As soon as a local file is changed, it sends a signal to the browser (using WebSockets) to tell it to... document.location.reload().

To make this work, we need to do two things:
1) Mount the current working directory into the Docker container
2) Expose the WebSocket port

The WebSocket thing is set up by exposing port 35729 to the host (-p 35729:35729).

Below is an example running this with a volume mount and both ports exposed.

▶ docker container run -it -p 3000:3000 -p 35729:35729 -v $(pwd):/app react:app
yarn run v1.3.2
$ react-scripts start
Starting the development server...

Compiled successfully!

You can now view docker-create-react-app in the browser.

  Local:            http://localhost:3000/
  On Your Network:  http://172.17.0.2:3000/

Note that the development build is not optimized.
To create a production build, use yarn build.

Compiling...
Compiled successfully!
Compiling...
Compiled with warnings.

./src/App.js
  Line 7:  'neverused' is assigned a value but never used  no-unused-vars

Search for the keywords to learn more about each warning.
To ignore, add // eslint-disable-next-line to the line before.

Compiling...
Failed to compile.

./src/App.js
Module not found: Can't resolve './Apps.css' in '/app/src'

In the above example output, I first make a harmless save in the src/App.js file just to see that the dev server notices and that my browser reloads when I do. That's where it says:

Compiling...
Compiled successfully!

Secondly, I make an edit that triggers a warning. That's where it says:

Compiling...
Compiled with warnings.

./src/App.js
  Line 7:  'neverused' is assigned a value but never used  no-unused-vars

Search for the keywords to learn more about each warning.
To ignore, add // eslint-disable-next-line to the line before.

And lastly, I make an edit by messing with the import line:

Compiling...
Failed to compile.

./src/App.js
Module not found: Can't resolve './Apps.css' in '/app/src'

This is great! Isn't create-react-app wonderful?

Build build :)

There are many things you can do with the code you're building. Let's pretend that the intention is to build a single-page-app and then take the static assets (including the index.html) and upload them to a public CDN or something. To do that we need to generate the build directory.

The trick here is to run this with a volume mount so that when it creates /app/build (from the perspective of the container), that directory effectively becomes visible on the host.

▶ docker container run -it -v $(pwd):/app react:app build
yarn run v1.3.2
$ react-scripts build
Creating an optimized production build...
Compiled successfully.

File sizes after gzip:

  35.59 KB  build/static/js/main.591fd843.js
  299 B     build/static/css/main.c17080f1.css

The project was built assuming it is hosted at the server root.
To override this, specify the homepage in your package.json.
For example, add this to build it for GitHub Pages:

  "homepage" : "http://myname.github.io/myapp",

The build folder is ready to be deployed.
You may serve it with a static server:

  yarn global add serve
  serve -s build

Done in 5.95s.

Now, on the host:

▶ tree build
build
├── asset-manifest.json
├── favicon.ico
├── index.html
├── manifest.json
├── service-worker.js
└── static
    ├── css
    │   ├── main.c17080f1.css
    │   └── main.c17080f1.css.map
    ├── js
    │   ├── main.591fd843.js
    │   └── main.591fd843.js.map
    └── media
        └── logo.5d5d9eef.svg

4 directories, 10 files

The contents of that directory you can now upload to a CDN or some public Nginx server that points to it as the root directory.

Running tests

This one is so easy and obvious now.

▶ docker container run -it -v $(pwd):/app react:app test

Note that we're setting up a volume mount here again. Since the test runner is interactive (it sits and waits for file changes and re-runs tests immediately), it's important to do the mount now.

All regular Jest options work too. For example:

▶ docker container run -it -v $(pwd):/app react:app test --coverage
▶ docker container run -it -v $(pwd):/app react:app test --help

Debugging the node_modules

First of all, when I say "debugging the node_modules", in this context, I'm referring to messing with node_modules whilst running tests or running the dev server.

One way to debug the node_modules used is to enter a bash shell and literally mess with the files inside it. First, start the dev server (or start the test runner) and give the container a name:

▶ docker container run -it -p 3000:3000 -p 35729:35729 -v $(pwd):/app --name mydebugging react:app

Now, in a separate terminal start bash in the container:

▶ docker exec -it mydebugging bash

Once you're in you can install an editor and start editing files:

root@2bf8c877f788:/app# apt-get update && apt-get install jed
root@2bf8c877f788:/app# jed /node_modules/react/index.js

As soon as you make changes to any of the files, the dev server should notice and reload.

When you stop the container all your changes will be reset. So if you had to sprinkle the node_modules with console.log('WHAT THE HECK!') all of those disappear when the container is stopped.

NodeJS shell

This'll come as no surprise by now. You basically run bash and you're there:

▶ docker container run -it -v $(pwd):/app react:app bash
root@2a21e8206a1f:/app# node
> [] + 1
'1'

Conclusion

When I look back at all the commands above, I can definitely see how it's pretty intimidating and daunting. So many things to remember, and it's got that nasty feeling where you feel like you're controlling your development environment through unwieldy levers rather than with your own hands.

But think of the fundamental advantages too! It's all encapsulated now. What you're working on will be based on the exact same version of everything as your teammate, your dev server and your production server are using.

Pros:

Cons:

In my work at Mozilla Services, on the projects I work on, I actually use docker-compose for all things. And I have a Makefile to help me remember all the various docker-compose commands (thanks Jannis & Will!). One definitely neat thing you can do with docker-compose is start multiple containers. Then you can, with one command, start a Django server and the create-react-app dev server. Perhaps a blog post for another day.

Concurrent Gzip in Python

13 October 2017 6 comments   Docker, Linux, Python


Suppose you have a bunch of files you need to Gzip in Python; what's the optimal way to do that? In serial, to avoid saturating the GIL? In multiprocessing, to spread the load across CPU cores? Or with threads?

I needed to know this for symbols.mozilla.org since it does a lot of Gzip'ing. In symbols.mozilla.org clients upload a zip file full of files. A lot of them are plain text and when uploaded to S3 it's best to store them gzipped. Basically it does this:

import gzip
import os
from io import BytesIO


def upload_sym_file(s3_client, payload, bucket_name, key_name):
    file_buffer = BytesIO()
    with gzip.GzipFile(fileobj=file_buffer, mode='w') as f:
        f.write(payload)
    file_buffer.seek(0, os.SEEK_END)
    size = file_buffer.tell()
    file_buffer.seek(0)
    s3_client.put_object(
        Bucket=bucket_name,
        Key=key_name,
        Body=file_buffer
    )
    print(f"Uploaded {size}")

Another important thing to consider before jumping into the benchmark is the context of this application: the bundles of files I need to gzip are often many but smallish. The average size of the files that need to be gzip'ed is ~300KB, and each bundle contains between 5 and 25 files.

The Benchmark

For the sake of the benchmark here, all each function does is figure out the size of each gzipped buffer and report those sizes as a list.
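
The _get_size helper isn't shown in the snippets below, so here is a minimal sketch of what it might look like, assuming it mirrors upload_sym_file above: gzip the payload into an in-memory buffer and return the compressed size.

import gzip
import os
from io import BytesIO

def _get_size(payload):
    # Compress the payload into an in-memory buffer, like upload_sym_file does
    file_buffer = BytesIO()
    with gzip.GzipFile(fileobj=file_buffer, mode='w') as f:
        f.write(payload)
    # Measure how many bytes the gzipped result takes up
    file_buffer.seek(0, os.SEEK_END)
    return file_buffer.tell()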

f1 - Basic serial

def f1(payloads):
    sizes = []
    for payload in payloads:
        sizes.append(_get_size(payload))
    return sizes

f2 - Using multiprocessing.Pool

def f2(payloads):  # multiprocessing
    sizes = []
    with multiprocessing.Pool() as p:
        sizes = p.map(_get_size, payloads)
    return sizes

f3 - Using concurrent.futures.ThreadPoolExecutor

def f3(payloads):  # concurrent.futures.ThreadPoolExecutor
    sizes = []
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for payload in payloads:
            futures.append(
                executor.submit(
                    _get_size,
                    payload
                )
            )
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes

f4 - Using concurrent.futures.ProcessPoolExecutor

def f4(payloads):  # concurrent.futures.ProcessPoolExecutor
    sizes = []
    futures = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for payload in payloads:
            futures.append(
                executor.submit(
                    _get_size,
                    payload
                )
            )
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes

Note that when using asynchronous methods like this, the order of the returned items is not necessarily the same as the order in which they were submitted. An easy remedy, if you need the results back in order, is to use a dictionary instead of a list. Then you can map each key (or index if you like) to a value.
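
For example, a variation of f3 that preserves the input order might look like this (a sketch; f3_ordered is a made-up name and it assumes the same _get_size helper as above):

import concurrent.futures

def f3_ordered(payloads):
    sizes = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Remember which index each future belongs to
        future_to_index = {
            executor.submit(_get_size, payload): index
            for index, payload in enumerate(payloads)
        }
        for future in concurrent.futures.as_completed(future_to_index):
            sizes[future_to_index[future]] = future.result()
    # Return the sizes in the same order the payloads were submitted
    return [sizes[index] for index in range(len(payloads))]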

The Results

I ran this on three different .zip files of different sizes. To get some sanity into the benchmark I made it print out how many bytes it has to process and how many bytes gzip manages to shave off.

# files 66
Total bytes to gzip 140.69MB
Total bytes gzipped 14.96MB
Total bytes shaved off by gzip 125.73MB

# files 103
Total bytes to gzip 331.57MB
Total bytes gzipped 66.90MB
Total bytes shaved off by gzip 264.67MB

# files 26
Total bytes to gzip 86.91MB
Total bytes gzipped 8.28MB
Total bytes shaved off by gzip 78.63MB

Sorry for being aesthetically challenged when it comes to using Google Docs, but here goes...


This demonstrates the median time it takes each function to complete, for each of the three different files.

In all three files I tested, doing it serially (f1) is clearly the worst. Presumably because my laptop has more than one CPU core and the other cores sit unused. Another pertinent thing to notice is that when the work is really big (the middle four bars), the difference between doing things serially and concurrently isn't as big.

That second zip file contained a single file that was 80MB. The largest in the other two files were 18MB and 22MB.


This is the mean across all medians grouped by function and each compared to the slowest.

I call this the "bestest graph". It's a combination across all different sizes and basically concludes which one is the best, which clearly is function f3 (the one using concurrent.futures.ThreadPoolExecutor).

CPU Usage

This is probably the best way to explain how the CPU is used; I ran each function repeatedly, then opened gtop and took a screenshot of the list of processes sorted by CPU percentage.

f1 - Serially

f1
No distractions but it takes 100% of one CPU to work.

f2 - multiprocessing.Pool

f2
My laptop has 8 CPU cores, but I don't know why I see 9 Python processes here (perhaps the ninth is the parent process).
I don't know why each CPU isn't at 100%, but I guess there's some administrative overhead in Python starting the processes.

f3 - concurrent.futures.ThreadPoolExecutor

f3
One process, with roughly 5 x 8 = 40 threads swapping the GIL back and forth, but all in all it manages to keep itself very busy since threads are lightweight and can share data cheaply.

f4 - concurrent.futures.ProcessPoolExecutor

f4
This is actually kinda like multiprocessing.Pool but with a different (arguably easier) API.

Conclusion

By a small margin, concurrent.futures.ThreadPoolExecutor won. That's despite not being able to use all CPU cores. This, pseudo-scientifically, suggests that the low overhead of starting threads (remember, the average number of files in each .zip here is ~65) matters more than being able to use all CPUs.

Discussion

There's an interesting twist to this! At least for my use case...

In the application I'm working on, there's actually a lot more that needs to be done than just gzip'ing some blobs of files. For each file I need to do a HEAD query to AWS S3 and a PUT query to AWS S3 too. So what I actually need to do is create an instance of client = botocore.client.S3 that I use to call client.list_objects_v2 and client.put_object.

When you create an instance of botocore.client.S3, botocore automatically instantiates itself with credentials from os.environ['AWS_ACCESS_KEY_ID'] etc. (or reads them from a ~/.aws file). Once created, if you ask it to do many different network operations, internally it relies on urllib3.poolmanager.PoolManager, which maintains a pool of 10 HTTP connections that get reused.

So when you run the serial version you can re-use the client instance for every file you process, but you can only use one HTTP connection in the pool. With concurrent.futures.ThreadPoolExecutor it can not only re-use the same instance of botocore.client.S3, it can also cycle through all the HTTP connections in the pool.
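
A rough sketch of that pattern, sharing one client across a thread pool (the bucket name and keys here are made up, and it assumes AWS credentials and a region are configured in the environment):

import concurrent.futures
import botocore.session

# One client, created once, shared by every thread in the pool
session = botocore.session.get_session()
client = session.create_client('s3')

def check_key(key_name):
    # Each call can pick up a free HTTP connection from the client's internal pool
    response = client.list_objects_v2(Bucket='my-symbols-bucket', Prefix=key_name)
    return key_name, response.get('KeyCount', 0)

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(check_key, ['a.sym', 'b.sym', 'c.sym']))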

The process-based alternatives like multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor can not re-use the botocore.client.S3 instance since it's not picklable. And they have to create a new HTTP connection for every single file.

So, the conclusion of the above rambling is that concurrent.futures.ThreadPoolExecutor is really awesome! Not only did it perform excellently in the Gzip benchmark, it has the added bonus that it can share instance objects and HTTP connections.

A neat trick to zip a git repo with a version number

01 September 2017 4 comments   Web development, Linux


I have this WebExtension addon. It's not very important. Just a web extension that does some hacks to GitHub pages when I open them in Firefox. The web extension is a folder with a manifest.json, icons/icon-48.png, tricks.js, README.md etc. To upload it to addons.mozilla.org I first have to turn the whole thing into a .zip file that I can upload.

So I discovered a neat way to make that zip file. It looks like this:

#!/bin/bash

DESTINATION=build-`cat manifest.json | jq -r .version`.zip
git archive --format=zip master > $DESTINATION

echo "Created..."
ls -lh $DESTINATION

You run it and it creates a build-1.0.zip file containing all the files that are checked into the git repo. So it discards my local "junk" such as backup files or other things that are mentioned in .gitignore (and .git/info/exclude).

I bet someone's going to laugh and say "Duhh! Of course!" but I didn't know you could do that so easily. Hopefully posting this will help someone trying to do something similar.

Note: this depends on jq, which is an amazing little program.

Why didn't I know about machma?!

07 June 2017 0 comments   Go, MacOSX, Linux


"machma - Easy parallel execution of commands with live feedback"

This is so cool! https://github.com/fd0/machma

It's a command line program that makes it really easy to run any command line program in parallel. I.e. in separate processes with separate CPUs.

Something network bound

Suppose I have a file like this:

▶ wc -l urls.txt
      30 urls.txt

▶ cat urls.txt | head -n 3
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/wntdll.pdb/D74F79EB1F8D4A45ABCD2F476CCABACC2/wntdll.sym
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/firefox.pdb/448794C699914DB8A8F9B9F88B98D7412/firefox.sym
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/d2d1.pdb/CB8FADE9C48E44DA9A10B438A33114781/d2d1.sym

If I wanted to download all of these files with wget the traditional way would be:

▶ time cat urls.txt | xargs wget -q -P ./downloaded/
cat urls.txt  0.00s user 0.00s system 53% cpu 0.005 total
xargs wget -q -P ./downloaded/  0.07s user 0.24s system 2% cpu 14.913 total

▶ ls downloaded | wc -l
      30

▶ du -sh downloaded
 21M    downloaded

So it took 15 seconds to download 30 files that totals 21MB.

Now, let's do it with machma instead:

▶ time cat urls.txt | machma -- wget -q -P ./downloaded/ {}
cat urls.txt  0.00s user 0.00s system 55% cpu 0.004 total
machma -- wget -q -P ./downloaded/ {}  0.53s user 0.45s system 12% cpu 7.955 total

That uses 8 separate processes (because my laptop has 8 CPUs).
Since 30 / 8 ~= 4, it does roughly 4 rounds of downloads.

But note: it took 15 seconds to download the 30 files synchronously. That's an average of 0.5s per file. The reason the parallel run doesn't take 4 x 0.5 = 2 seconds, but closer to 8 seconds, is that it's at the mercy of bad luck, with some of those 30 downloads spiking a bit.

Something CPU bound

Now let's do something really CPU intensive; Guetzli compression.

▶ ls images | wc -l
  7

▶ time find images -iname '*.jpg' | xargs -I {} guetzli --quality 85 {} compressed/{}
find images -iname '*.jpg'  0.00s user 0.00s system 40% cpu 0.009 total
xargs -I {} guetzli --quality 85 {} compressed/{}  35.74s user 0.68s system 99% cpu 36.560 total

And now the same but with machma:

▶ time find images -iname '*.jpg' | machma -- guetzli --quality 85 {} compressed/{}

processed 7 items (0 failures) in 0:10
find images -iname '*.jpg'  0.00s user 0.00s system 51% cpu 0.005 total
machma -- guetzli --quality 85 {} compressed/{}  58.47s user 0.91s system 546% cpu 10.857 total

Basically, it took only 11 seconds. This time there were fewer images (7) than there were CPUs (8), so basically the poor computer is doing super intensive CPU (and memory) work across all CPUs at the same time. The average time for each of these files is ~5 seconds, so it's really interesting that even with parallel execution, instead of taking a total of ~5 seconds, it took almost double that.

In conclusion

Such a handy tool to have around for command line stuff. I haven't looked at its code much but it's almost a shame that the project only has 300+ GitHub stars. Perhaps because it's kinda complete and doesn't need much more work.

Also, if you attempt all the examples above, you'll notice that with the ... | xargs ... approach the stdout and stderr are a mess. For wget, that's why I used -q to silence it a bit. With machma you get a really pleasant color-coded live output that tells you the state of the queue, possible failures and an ETA.

Experimenting with Guetzli

24 May 2017 0 comments   MacOSX, Web development, Linux

https://codepen.io/peterbe/pen/rmPMpm


tl;dr; Guetzli, the new JPEG compression program from Google, can save quite a few bytes with little loss of quality.

Inspired by this blog post about Guetzli I thought I'd try it out with something that's relevant to my project, 300x300 JPGs that can be heavily compressed.

So I installed it (with Homebrew) on my MacBook Pro (late 2013) and picked 7 JPGs that I had and use in SongSearch. Which is interesting because these JPEGs have already been compressed once. They were created by converting much larger PNGs with PIL (Pillow) at a quality rating of 80%. In other words, this is Guetzli on top of PIL.
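
For reference, that PIL step looks roughly like this (a sketch; the file names are made up):

from PIL import Image

# Convert a large PNG into a JPEG at quality 80, roughly how the SongSearch JPGs were made
img = Image.open('cover.png')
img = img.convert('RGB')  # JPEG has no alpha channel
img.save('cover.jpg', 'JPEG', quality=80)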

I ran one iteration for every image for the following qualities: 85%, 90%, 95%, 99%, 100%.

The results on the size are as follows:

Quality      Average Size (bytes)    % Smaller
original     23497.0                 0
85%          16025.4                 32%
90%          18829.4                 20%
95%          21338.1                 9.2%
99%          22705.3                 3.4%
100%         22919.7                 2.5%

So, for example, if you choose the 90% quality you save, on average, 4,667B (4.6KB).

As you might already know, Guetzli is incredibly memory hungry and very, very slow. Each image compression took, on average, 4-6 seconds (higher quality, shorter times). Meaning, if you like Guetzli you probably need to build around it so that the compression happens in a build step or asynchronously somewhere, and ideally you don't want to run too many compressions in parallel as it might cause CPU and memory overloading.

Now, how does it look?

Go to https://codepen.io/peterbe/pen/rmPMpm and stare at the screen to see if you can A) see which one is more compressed and B) if the one that is more compressed is too low quality.

What do you think?

Is it worth it?

Is the quality drop too much to save 10% on image sizes?

Please share your thoughts. Perhaps we can re-do this experiment with some slightly larger JPGs.

Fastest Redis configuration for Django

11 May 2017 0 comments   Django, Web development, Linux, Python

https://github.com/peterbe/django-fastest-redis


I have an app that does a lot of Redis queries. It all runs in AWS with ElastiCache Redis. Due to the nature of the app, it stores really large hash tables in Redis. The application then depends on querying Redis for these. The question is; What is the best configuration possible for the fastest service possible?

Note! Last month I wrote Fastest cache backend possible for Django which looked at comparing Redis against Memcache. Might be an interesting read too if you're not sold on Redis.

Options

All options are variations on the compressor, serializer and parser, which are things you can override in django-redis. All have an effect on the performance. Even compression matters: if the number of bytes sent between Redis and the application is smaller, network throughput should be better.

Without further ado, here are the variations:

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config('REDIS_LOCATION', 'redis://127.0.0.1:6379') + '/0',
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        }
    },
    "json": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config('REDIS_LOCATION', 'redis://127.0.0.1:6379') + '/1',
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "SERIALIZER": "django_redis.serializers.json.JSONSerializer",
        }
    },
    "ujson": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config('REDIS_LOCATION', 'redis://127.0.0.1:6379') + '/2',
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "SERIALIZER": "fastestcache.ujson_serializer.UJSONSerializer",
        }
    },
    "msgpack": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config('REDIS_LOCATION', 'redis://127.0.0.1:6379') + '/3',
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "SERIALIZER": "django_redis.serializers.msgpack.MSGPackSerializer",
        }
    },
    "hires": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config('REDIS_LOCATION', 'redis://127.0.0.1:6379') + '/4',
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "PARSER_CLASS": "redis.connection.HiredisParser",
        }
    },
    "zlib": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config('REDIS_LOCATION', 'redis://127.0.0.1:6379') + '/5',
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "COMPRESSOR": "django_redis.compressors.zlib.ZlibCompressor",
        }
    },
    "lzma": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config('REDIS_LOCATION', 'redis://127.0.0.1:6379') + '/6',
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "COMPRESSOR": "django_redis.compressors.lzma.LzmaCompressor"
        }
    },
}

As you can see, they each have a variation on the OPTIONS.PARSER_CLASS, OPTIONS.SERIALIZER or OPTIONS.COMPRESSOR.

The default configuration is to use redis-py and to pickle the Python objects to a bytestring. Pickling in Python is pretty fast but it has the disadvantage that it's Python specific so you can't have a Ruby application reading the same Redis database.

The Experiment

Note how I have one LOCATION per configuration. That's crucial for the sake of testing. That way one database is all JSON and another is all gzip etc.

What the benchmark does is measure how long it takes to READ a specific key (called benchmarking). Once it's done that, it appends that time to the previous value (or [] if it was the first time). Lastly it writes that list back into the database. That way, towards the end you have 1 key whose value looks something like this: [0.013103008270263672, 0.003879070281982422, 0.009411096572875977, 0.0009970664978027344, 0.0002830028533935547, ..... MANY MORE ....].
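
In Python terms, each request does roughly this (a simplified sketch; the cache object stands in for whichever django-redis configuration was randomly picked):

import time

def measure_read(cache):
    t0 = time.time()
    times = cache.get('benchmarking', [])
    elapsed = time.time() - t0
    # Append this read's duration and write the (ever growing) list back
    times.append(elapsed)
    cache.set('benchmarking', times, 60)
    return elapsed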

Towards the end, each of these lists are pretty big. About 500 to 1,000 depending on the benchmark run.

In the experiment I used wrk to basically bombard the Django server on the URL /random (which makes a measurement with a random configuration). On the EC2 experiment node, it reaches around 1,300 requests per second, which is a decent number for an application that does a fair amount of writes.

The way I run the Django server is with uwsgi like this:

uwsgi --http :8000 --wsgi-file fastestcache/wsgi.py --master --processes 4 --threads 2

And the wrk command like this:

wrk -d30s  "http://127.0.0.1:8000/random"

(that, by default, runs 2 threads on 10 connections)

At the end of the benchmarking, I open http://localhost:8000/summary which spits out a table and some simple charts.

An Important Quirk

Time measurements over time
One thing I noticed when I started was that the final numbers' average was very different from the medians. That would indicate that there are spikes. The graph on the right shows the times put into that huge Python list for the default configuration for the first 200 measurements. Note that there are little spikes but generally quite flat over time once it gets past the beginning.

Sure enough, it turns out that in almost all configurations, the time it takes to make the query in the beginning is almost an order of magnitude slower than the times once the benchmark has been running for a while.

So in the test code you'll see that it chops off the first 10 times. Perhaps it should be more than 10. After all, if you don't like the spikes you can simply look at the median as the best source of conclusive truth.

The Code

The benchmarking code is here. Please be aware that this is quite rough. I'm sure there are many things that can be improved, but I'm not sure I'm going to keep this around.

The Equipment

The ElastiCache Redis I used was a cache.m3.xlarge (13 GiB, High network performance) with 0 shards and 1 node and no multi-zone enabled.

The EC2 node was a m4.xlarge Ubuntu 16.04 64-bit (4 vCPUs and 16 GiB RAM with High network performance).

Both the Redis and the EC2 were run in us-west-1c (North Virginia).

The Results

Here are the results! Sorry if it looks terrible on mobile devices.

root@ip-172-31-2-61:~# wrk -d30s  "http://127.0.0.1:8000/random" && curl "http://127.0.0.1:8000/summary"
Running 30s test @ http://127.0.0.1:8000/random
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.19ms    6.32ms  60.14ms   80.12%
    Req/Sec   583.94    205.60     1.34k    76.50%
  34902 requests in 30.03s, 2.59MB read
Requests/sec:   1162.12
Transfer/sec:     88.23KB
                         TIMES        AVERAGE         MEDIAN         STDDEV
json                      2629        2.596ms        2.159ms        1.969ms
msgpack                   3889        1.531ms        0.830ms        1.855ms
lzma                      1799        2.001ms        1.261ms        2.067ms
default                   3849        1.529ms        0.894ms        1.716ms
zlib                      3211        1.622ms        0.898ms        1.881ms
ujson                     3715        1.668ms        0.979ms        1.894ms
hires                     3791        1.531ms        0.879ms        1.800ms

Best Averages (shorter better)
###############################################################################
██████████████████████████████████████████████████████████████   2.596  json
█████████████████████████████████████                            1.531  msgpack
████████████████████████████████████████████████                 2.001  lzma
█████████████████████████████████████                            1.529  default
███████████████████████████████████████                          1.622  zlib
████████████████████████████████████████                         1.668  ujson
█████████████████████████████████████                            1.531  hires
Best Medians (shorter better)
###############################################################################
███████████████████████████████████████████████████████████████  2.159  json
████████████████████████                                         0.830  msgpack
████████████████████████████████████                             1.261  lzma
██████████████████████████                                       0.894  default
██████████████████████████                                       0.898  zlib
████████████████████████████                                     0.979  ujson
█████████████████████████                                        0.879  hires


Size of Data Saved (shorter better)
###############################################################################
█████████████████████████████████████████████████████████████████  60K  json
██████████████████████████████████████                             35K  msgpack
████                                                                4K  lzma
█████████████████████████████████████                              35K  default
█████████                                                           9K  zlib
████████████████████████████████████████████████████               48K  ujson
█████████████████████████████████████                              34K  hires

Discussion Points

Conclusion

This experiment has led me to the conclusion that the best serializer is msgpack and the best compressor is zlib. That is the best configuration for django-redis.

msgpack has implementation libraries for many other programming languages. Right now that doesn't matter for my application but if msgpack is both faster and more versatile (because it supports multiple languages) I conclude that to be the best serializer instead.

Fastest cache backend possible for Django

07 April 2017 5 comments   Web development, Linux, Python


tl;dr; Redis is twice as fast as memcached as a Django cache backend when installed using AWS ElastiCache. Only tested for reads.

Django has a wonderful caching framework. I think I say "wonderful" because it's so simple. Not because it has a hundred different bells or whistles. Each cache gets a name (e.g. "mymemcache" or "redis append only"). The only configuration you generally have to worry about is 1) what backend and 2) what location.

For example, to set up a memcached backend:

# this in settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'KEY_PREFIX': 'myapp',
        'LOCATION': config('MEMCACHED_LOCATION', '127.0.0.1:11211'),
    },
}

With that in play you can now do:

>>> from django.core.cache import caches
>>> caches['default'].set('key', 'value', 60)  # 60 seconds
>>> caches['default'].get('key')
'value'

Django comes with a built-in backend called django.core.cache.backends.locmem.LocMemCache, which is basically a simple Python object in memory with no persistence between Python processes. This one is of course super fast because it involves no network (local or remote) beyond the process itself. But it's not really useful if you care about performance (which you probably do if you're here because of the blog post title) because it can't be shared amongst processes.

Anyway, the most common backends to use are:

django.core.cache.backends.memcached.MemcachedCache (Memcached)
django_redis.cache.RedisCache (Redis)

These are semi-persistent and built for extremely fast key lookups. They can both be reached over TCP or via a socket.

What I wanted to see, is which one is fastest.

The Experiment

First of all, in this blog post I'm only measuring the read times of the various cache backends.

Here's the Django view function that is the experiment:

import random
import time

from django import http
from django.conf import settings
from django.core.cache import caches

def run(request, cache_name):
    if cache_name == 'random':
        cache_name = random.choice(settings.CACHE_NAMES)
    cache = caches[cache_name]
    t0 = time.time()
    data = cache.get('benchmarking', [])
    t1 = time.time()
    if random.random() < settings.WRITE_CHANCE:
        data.append(t1 - t0)
        cache.set('benchmarking', data, 60)
    if data:
        avg = 1000 * sum(data) / len(data)
    else:
        avg = 'notyet'
    # print(cache_name, '#', len(data), 'avg:', avg, ' size:', len(str(data)))
    return http.HttpResponse('{}\n'.format(avg))

It records the time it takes to make a cache.get read and, depending on settings.WRITE_CHANCE, it also does a write (but doesn't record that).
What it records is a list of floats. The content of that piece of data stored in the cache looks something like this:

  1. [0.0007331371307373047]
  2. [0.0007331371307373047, 0.0002570152282714844]
  3. [0.0007331371307373047, 0.0002570152282714844, 0.0002200603485107422]

So the data grows from being really small to something really large. If you run this 1,000 times with settings.WRITE_CHANCE of 1.0, the last run has to fetch a list of 999 floats out of the cache backend.

You can test it with one specific backend in mind and see how fast Django can do, say, 10,000 of these. Here's one such example:

$ wrk -t10 -c400 -d10s http://127.0.0.1:8000/default
Running 10s test @ http://127.0.0.1:8000/default
  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    76.28ms  155.26ms   1.41s    92.70%
    Req/Sec   349.92    193.36     1.51k    79.30%
  34107 requests in 10.10s, 2.56MB read
  Socket errors: connect 0, read 0, write 0, timeout 59
Requests/sec:   3378.26
Transfer/sec:    259.78KB

$ wrk -t10 -c400 -d10s http://127.0.0.1:8000/memcached
Running 10s test @ http://127.0.0.1:8000/memcached
  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    96.87ms  183.16ms   1.81s    95.10%
    Req/Sec   213.42     82.47     0.91k    76.08%
  21315 requests in 10.09s, 1.57MB read
  Socket errors: connect 0, read 0, write 0, timeout 32
Requests/sec:   2111.68
Transfer/sec:    159.27KB

$ wrk -t10 -c400 -d10s http://127.0.0.1:8000/redis
Running 10s test @ http://127.0.0.1:8000/redis
  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    84.93ms  148.62ms   1.66s    92.20%
    Req/Sec   262.96    138.72     1.10k    81.20%
  25271 requests in 10.09s, 1.87MB read
  Socket errors: connect 0, read 0, write 0, timeout 15
Requests/sec:   2503.55
Transfer/sec:    189.96KB

But an immediate disadvantage with this is that the "total final rate" (i.e. requests/sec) is likely to include many other factors. However, you can see that LocMemCache got 3378.26 req/s, MemcachedCache got 2111.68 req/s and RedisCache got 2503.55 req/s.

The code for the experiment is available here: https://github.com/peterbe/django-fastest-cache

The Infra Setup

I created an AWS m3.xlarge EC2 Ubuntu node and two nodes in AWS ElastiCache. One 2-node memcached cluster based on cache.m3.xlarge and one 2-node 1-replica Redis cluster also based on cache.m3.xlarge.

The Django server was run with uWSGI like this:

uwsgi --http :8000 --wsgi-file fastestcache/wsgi.py  --master --processes 6 --threads 10

The Results

Instead of hitting one backend repeatedly and reporting the "requests per second", I hit the "random" endpoint for 30 seconds and let it randomly select a cache backend each time. Once that's done, I read each cache and look at the final massive list of timings it took to make all the reads. I run it like this:

wrk -t10 -c400 -d30s http://127.0.0.1:8000/random && curl http://127.0.0.1:8000/summary
...wrk output redacted...

                         TIMES        AVERAGE         MEDIAN         STDDEV
memcached                 5738        7.523ms        4.828ms        8.195ms
default                   3362        0.305ms        0.187ms        1.204ms
redis                     4958        3.502ms        1.707ms        5.591ms

Best Averages (shorter better)
###############################################################################
█████████████████████████████████████████████████████████████  7.523  memcached
██                                                             0.305  default
████████████████████████████                                   3.502  redis

Things to note:

Other Things To Test

Perhaps pylibmc is faster than python-memcached.

                         TIMES        AVERAGE         MEDIAN         STDDEV
pylibmc                   2893        8.803ms        6.080ms        7.844ms
default                   3456        0.315ms        0.181ms        1.656ms
redis                     4754        3.697ms        1.786ms        5.784ms

Best Averages (shorter better)
###############################################################################
██████████████████████████████████████████████████████████████   8.803  pylibmc
██                                                               0.315  default
██████████████████████████                                       3.697  redis

Using pylibmc didn't make things much faster. What if we pit memcached against pylibmc?:

                         TIMES        AVERAGE         MEDIAN         STDDEV
pylibmc                   3005        8.653ms        5.734ms        8.339ms
memcached                 2868        8.465ms        5.367ms        9.065ms

Best Averages (shorter better)
###############################################################################
█████████████████████████████████████████████████████████████  8.653  pylibmc
███████████████████████████████████████████████████████████    8.465  memcached

What about that fancy hiredis Redis Python driver that's supposedly faster?

                         TIMES        AVERAGE         MEDIAN         STDDEV
redis                     4074        5.628ms        2.262ms        8.300ms
hiredis                   4057        5.566ms        2.296ms        8.471ms

Best Averages (shorter better)
###############################################################################
███████████████████████████████████████████████████████████████  5.628  redis
██████████████████████████████████████████████████████████████   5.566  hiredis

These last two results are both surprising and suspicious. Perhaps the whole setup is wrong. Why wouldn't the C-based libraries be faster? Is the difference so incredibly dwarfed by the network I/O between my EC2 node and the ElastiCache nodes?

In Conclusion

I personally like Redis. It's not as stable as memcached. On a personal server I've run for years the Redis server sometimes just dies due to corrupt memory and I've come to accept that. I don't think I've ever seen memcache do that.

But there are other benefits with Redis as a cache backend. With the django-redis library you have really easy access to the raw Redis connection and you can do much more advanced data structures. You can also cache certain things indefinitely. Redis also supports storing much larger strings than memcached (1MB for memcached and 512MB for Redis).
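
For example, getting at the raw connection looks something like this (a small sketch based on django-redis's get_redis_connection helper; check the django-redis docs for your version):

from django_redis import get_redis_connection

# The underlying redis-py client for the cache named "default"
con = get_redis_connection("default")

# Now Redis-specific data structures are available directly
con.hset("counters", "page:home", 123)
print(con.hget("counters", "page:home"))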

The conclusion is that Redis is faster than memcached by a factor of 2. Considering the other feature benefits you can get out of having a Redis server available, it's probably a good choice for your next Django project.

Bonus Feature

In big setups you most likely have a whole slew of web heads, which are servers that do nothing but handle web requests. And these are configured to talk to databases and caches over the nearby network. However, so many of us have cheap servers on DigitalOcean or Linode where we run web servers, relational databases and cache servers all on the same machine. (I do. This blog is one of those where Nginx, Redis, memcached and PostgreSQL all run on a 4GB DigitalOcean NYC SSD machine.)

So here's one last test where I installed a local Redis and a local memcached on the EC2 node itself:

$ cat .env | grep 127.0.0.1
MEMCACHED_LOCATION="127.0.0.1:11211"
REDIS_LOCATION="redis://127.0.0.1:6379/0"

Here are the results:

                         TIMES        AVERAGE         MEDIAN         STDDEV
memcached                 7366        3.456ms        1.380ms        5.678ms
default                   3716        0.263ms        0.189ms        1.002ms
redis                     5582        2.334ms        0.639ms        4.965ms

Best Averages (shorter better)
###############################################################################
█████████████████████████████████████████████████████████████  3.456  memcached
████                                                           0.263  default
█████████████████████████████████████████                      2.334  redis

The conclusion of that last benchmark is that Redis is still faster and it's roughly 1.8x faster to run these backends on the web head than to use ElastiCache. Perhaps that just goes to show how amazingly fast the AWS inter-datacenter fiber network is!

ElasticSearch 5 in Travis-CI

06 January 2017 0 comments   Web development, Linux, Python

https://github.com/peterbe/autocompeter/blob/66dcda64c15a2c4367104bdcb69190fa18a122e0/.travis.yml#L7-L12


tl;dr; Here's a working .travis.yml file that works with ElasticSearch 5.1.1

I had to jump through hoops to get Travis-CI to run with ElasticSearch 5.1.1 and I thought I'd share. If you just do:

services:
  - elasticsearch

This is from the Travis-CI documentation but this installs ElasticSearch 1.4. Not good enough. The instructions on the same page for using higher versions did not work for me.

To get a specific version you need to download it yourself and install it with dpkg -i but the problem is that if you want to use ElasticSearch version 5, you need to have Java 1.8. The short answer is that this is how you install Java 1.8:

addons:
  apt:
    packages:
      - oracle-java8-set-default

But now you need to sudo so you need to add sudo: true in your .travis.yml. Bummer, because it makes the build a bit slower. However, a necessary evil.

The critical line I use to install it is this:

curl -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.deb && \
sudo dpkg -i --force-confnew elasticsearch-5.1.1.deb && \
sudo service elasticsearch start

I thought I could "upgrade" the existing install, but that breaks things. In other words, you have to remove the services: - elasticsearch line or else it can't upgrade.

Now, during debugging I was not getting errors on the line:

sudo service elasticsearch start

So I add this to be sure the right version got installed:

#!/bin/bash
curl -v http://localhost:9200/

and then I can see that the right version was installed. It should look something like this:

* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1... connected
> GET / HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:9200
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 327
< 
{
  "name" : "m_acpqT",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b4_KnK6KQmSx64C9o-81Ug",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}
* Connection #0 to host localhost left intact
* Closing connection #0

Note the line that says "number" : "5.1.1",.

So, yay! Hopefully this will help someone else because it took me quite a while to get right.

Time to do concurrent CPU bound work

13 May 2016 3 comments   Python, Linux, MacOSX

https://docs.google.com/spreadsheets/d/1uCjMXKygM_SAxBBv8Wm-dBjGIs5YQR5wlYzOPavfArc/edit?usp=sharing


Did you see my blog post about Decorated Concurrency - Python multiprocessing made really really easy? If not, fear not. There, I demonstrate how I take the task of creating 100 thumbnails from a large JPG. First in serial, then concurrently, with a library called deco. The total time to get through the work is massively reduced when you do it concurrently. No surprise. But what's interesting is that each individual task takes a lot longer. Instead of 0.29 seconds per image it took 0.65 seconds per image (...inside each dedicated processor).

The simple explanation, even from a layman like myself, must be that when doing so much more, concurrently, the whole operating system struggles to keep up with other little subtle tasks.

With deco you can either let Python's multiprocessing just use as many CPUs as your computer has (8 in the case of my Macbook Pro) or you can manually set it. E.g. @concurrent(processes=5) would spread the work across a max of 5 CPUs.

So, I ran my little experiment again for every number from 1 to 8 and plotted the results:

Time elapsed vs. work time

What to take away...

The blue bars is the time it takes, in total, from starting the program till the program ends. The lower the better.

The red bars is the time it takes, in total, to complete each individual task.

Meaning, when the number of CPUs used is low you have to wait longer for all the work to finish, and when the number of CPUs used is high, each individual task needs more time to finish. This is an insight into over-use of operating system resources.

If the work is much, much more demanding than this experiment (the JPG is only 3.3MB and one thumbnail only takes 0.3 seconds to make), you might have a red bar on the far right that is too expensive for your server. Or worse, it might break things so that everything stops.

In conclusion...

Choose wisely. Be aware how "bound" the task is.

Also, remember that if the work of each individual task is too "light", the overhead of messing with multiprocessing might actually cost more than it's worth.
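
A quick toy way to see that overhead for yourself (not part of the original experiment, just a sketch):

import time
import multiprocessing

def trivial(x):
    # A task so light that the cost of shipping it to another process dominates
    return x * 2

if __name__ == '__main__':
    numbers = list(range(100000))

    t0 = time.time()
    serial_result = [trivial(n) for n in numbers]
    print('serial:         ', time.time() - t0)

    t0 = time.time()
    with multiprocessing.Pool() as pool:
        pool_result = pool.map(trivial, numbers)
    print('multiprocessing:', time.time() - t0)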

The code

Here's the messy code I used:

import time
from PIL import Image
from deco import concurrent, synchronized
import sys

processes = int(sys.argv[1])
assert processes >= 1
assert processes <= 8


@concurrent(processes=processes)
def slow(times, offset):
    t0 = time.time()
    path = '9745e8.jpg'
    img = Image.open(path)
    size = (100 + offset * 20, 100 + offset * 20)
    img.thumbnail(size, Image.ANTIALIAS)
    img.save('thumbnails/{}.jpg'.format(offset), 'JPEG')
    t1 = time.time()
    times[offset] = t1 - t0


@synchronized
def run(times):
    for index in range(100):
        slow(times, index)

t0 = time.time()
times = {}
run(times)
t1 = time.time()
print "TOOK", t1-t0
print "WOULD HAVE TAKEN", sum(times.values())

UPDATE

I just wanted to verify that the experiment is valid and really shows that CPU bound work hogs resources across CPUs in a way that affects their individual performance.

Let's try the similar but totally different workload of a network bound task. This time, instead of resizing JPEGs, it waits for HTTP GET requests to finish.

Network bound

So clearly it makes sense. The individual work within each process is not generally slowed down much. A tiny bit, but not much. Also, I like the smoothness of the curve of the blue bars going from left to right. You can clearly see that it's reverse logarithmic.

.git/info/exclude, .gitignore and ~/.gitignore_global

20 April 2016 2 comments   Linux, MacOSX


How did I not know about this until now?! .git/info/exclude is like .gitignore but yours to mess with. Thanks @willkg!

There are three ways to tell Git to ignore files.

.gitignore

A file you check in to the project. It's shared amongst the developers on the project. It's just a plain text file where you write one line per file pattern that Git should ignore, so it won't ask "Have you forgotten to check this in?"

Certain things that are good to put in there are...:

node_modules/
*.py[co]
.coverage

Ideally, this file should be as small as possible and every entry should confidently be something 100% of the developers on the team will want to ignore. If your particular editor has some convention for storing state or revision files, that does not belong in this file.

A reason to keep it short is that of purity and simplicity. Every edit of this file will require a git commit.

~/.gitignore_global

This is yours to keep and maintain. The file doesn't have to be in your home directory. (The ~/ is UNIX nomenclature for your OS user home directory). You can set it to be anything. Like:

$ git config --global core.excludesfile ~/projects/dotfiles/gitignore-global.txt

Here you put stuff you want to personally ignore in every Git project. New and old.

Good examples of things to put in it are...:

*~
.DS_Store
.env
settings/local.py
pip-log.txt

.git/info/exclude

This is kind of a mix between the two above-mentioned ignore files. It's for things only you want to ignore in a specific project; more or less "junk files" specific to that project. For example, if you, in your Git clone, have some test scripts or a specific log file.

Suppose you have a little hack script or some specific config that is only applicable to the project at hand; this is where you add it. For example...:

run_webapp_uwsgi.sh
analyze_correlation_json_dumps.py

I hope this helps someone else who, like me, didn't know about .git/info/exclude until 2016.