Peterbe.com

Peter Bengtsson's blog


Local jed settings

April 19, 2013
4 comments Linux, macOS

(If you're wondering what you're doing here: jed is a hardcore text-based editor for programmers.)

Thanks to fellow Jed user and hacker Ullrich Horlacher, I can now have local settings per directory.

I personally prefer 2 spaces of indentation in my JavaScript, and thankfully most projects I work on agree with that standard. However, there's one Mozilla project I work on that uses 4 spaces. So what I've had to get used to is editing my ~/.jedrc every time I switch to work on that particular project. I change variable C_INDENT = 2; to variable C_INDENT = 4; and then back again when switching to another project.

No more of that. Now I just add a file into the project root like this:

$ cd dev/airmozilla
$ cat .jed.sl
variable C_INDENT = 4;

And whenever I start jed from anywhere inside that tree, the local override setting is applied.

Here's how you can do that too:

First, put this code into your <your jed lib>/defaults.sl (on my OSX machine, the jed lib is /usr/local/Cellar/jed/0.99-19/jed/lib/):

% load .jed.sl from the current or a parent directory,
% but only if it is owned by the same user
define load_local_config() {
  variable dir = getcwd();
  variable uid = getuid();
  variable jsl, st;
  % walk up towards / one directory at a time
  while (dir != "/" and strlen(dir) > 1) {
    st = stat_file(dir);
    if (st == NULL) return;
    % stop at the first directory that belongs to someone else
    if (st.st_uid != uid) return;
    jsl = dir + "/.jed.sl";
    st = stat_file(jsl);
    if (st != NULL) {
      if (st.st_uid == uid) {
        % evaluate the file and discard evalfile's return value
        () = evalfile(jsl);
        return;
      }
    }
    dir = path_dirname(dir);
  }
}

Then add this to the bottom of your ~/.jedrc:

define startup_hook() {
  load_local_config(); % .jed.sl
}

Now, go into a directory where you want local settings, create a file called .jed.sl and fill it to your heart's content!
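If you want to check from a shell which .jed.sl the hook above would pick up, the same upward search can be sketched in Python. This is a hypothetical helper (find_local_config is my own name, not part of jed), assuming a Unix system where os.getuid() is available. It mirrors the S-Lang logic: the closest .jed.sl owned by you wins, and the search stops at the first directory owned by another user.

```python
import os


def find_local_config(start=None, name=".jed.sl"):
    """Hypothetical helper: return the .jed.sl path load_local_config() would load, or None."""
    uid = os.getuid()
    d = os.path.abspath(start or os.getcwd())
    while d != "/" and len(d) > 1:
        try:
            st = os.stat(d)
        except OSError:
            return None
        # like the S-Lang version, give up at a directory owned by someone else
        if st.st_uid != uid:
            return None
        candidate = os.path.join(d, name)
        try:
            if os.stat(candidate).st_uid == uid:
                return candidate
        except OSError:
            pass  # no config file here, keep walking up
        d = os.path.dirname(d)
    return None
```

Running python -c "import sys; sys.path.insert(0, '.')" is not needed; just paste the function into a scratch script and call find_local_config() from inside your project.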

hastebinit - quickly paste snippets into hastebin.com

October 11, 2012
9 comments Python, Linux

I'm quite fond of hastebin.com. It's fast. It's reliable. And it's got nice keyboard shortcuts that work for my taste.

So, I created a little program to quickly throw things into hastebin. You can have one too:

First create ~/bin/hastebinit and paste in:


#!/usr/bin/python
# hastebinit: paste files (or stdin) to hastebin.com and print the URLs (Python 2)

import urllib2
import os
import sys
import json

URL = 'http://hastebin.com/documents'

def run(*args):
    if args:
        content = [open(x).read() for x in args]
        # keep each file's extension so hastebin can pick syntax highlighting
        extensions = [os.path.splitext(x)[1] for x in args]
    else:
        content = [sys.stdin.read()]
        extensions = [None]

    for i, each in enumerate(content):
        # POST the raw text; the JSON reply contains the new document's key
        req = urllib2.Request(URL, each)
        response = urllib2.urlopen(req)
        the_page = response.read()
        key = json.loads(the_page)['key']
        url = "http://hastebin.com/%s" % key
        if extensions[i]:
            url += extensions[i]
        print url


if __name__ == '__main__':
    sys.exit(run(*sys.argv[1:]))

Then run: chmod +x ~/bin/hastebinit

Now you can do things like:

$ cat ~/myfile | hastebinit
$ hastebinit < ~/myfile
$ hastebinit ~/myfile myotherfile
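The script above is Python 2 (urllib2 and the print statement). A Python 3 port of the same idea might look like the sketch below. The endpoint and the {"key": ...} reply format are taken from the original script; hastebin may well have changed its API since 2012, so treat the URL as an assumption. The helper names are my own.

```python
#!/usr/bin/env python3
import json
import os
import sys
import urllib.request

# endpoint copied from the original script; it may no longer work as-is
URL = "http://hastebin.com/documents"


def key_from_response(body):
    """Extract the document key from hastebin's JSON reply (bytes or str)."""
    return json.loads(body)["key"]


def paste_url(key, extension=""):
    """Build the shareable URL, keeping the file extension for highlighting."""
    return "http://hastebin.com/%s%s" % (key, extension or "")


def upload(content):
    """POST raw bytes to the documents endpoint and return the new key."""
    req = urllib.request.Request(URL, data=content, method="POST")
    with urllib.request.urlopen(req) as response:
        return key_from_response(response.read())


def run(*paths):
    if paths:
        items = []
        for path in paths:
            with open(path, "rb") as f:
                items.append((f.read(), os.path.splitext(path)[1]))
    else:
        items = [(sys.stdin.buffer.read(), "")]
    for content, ext in items:
        print(paste_url(upload(content), ext))
```

To use it as ~/bin/hastebinit, append if __name__ == "__main__": run(*sys.argv[1:]) at the bottom and make it executable as before.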

Hopefully it'll one day help at least one more soul out there!

On the command line no one can hear you screen. Or can they?

May 3, 2012
2 comments Linux

This is how you check if a command (with or without any output) exited successfully or if it exited with something other than 0, in bash:

#!/bin/bash
./someprogram
WORKED=$?
if [ "$WORKED" != 0 ]; then
  echo "FAILED"
else
  echo "WORKED"
fi

But how do you inspect this on the command line? I actually didn't know, until it hit me. The simplest possible solution:

$ ./someprogram && echo worked || echo failed

What a great low-tech solution. It just works. If you're on OSX, you can nerd it up a bit more:

$ ./someprogram && say worked || say failed
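The same check from a Python script comes down to looking at the child's return code. A minimal sketch with subprocess.run (Python 3.5+); report() is a hypothetical helper name, and the two demo commands just run Python itself with a chosen exit status:

```python
import subprocess
import sys


def report(cmd):
    """Run cmd and return 'worked' for exit status 0, otherwise 'failed'."""
    result = subprocess.run(cmd)
    return "worked" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    print(report([sys.executable, "-c", "import sys; sys.exit(0)"]))
    print(report([sys.executable, "-c", "import sys; sys.exit(3)"]))
```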

I'm running pgFouine right now on my server

April 19, 2012
0 comments Linux

pgFouine is a PostgreSQL log analyzer. Basically, you configure your Postgres server to be very verbose about all statements. Then you simply run the pgfouine.php command against the log file and it spits out a page like this:

www.peterbe.com/pgfouine.html
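For reference, "very verbose" in postgresql.conf roughly means logging every statement together with its duration. A sketch, assuming logging to stderr; the exact log_line_prefix that pgFouine can parse is described in its own documentation, so double-check it against your version:

```
# postgresql.conf (sketch, assuming stderr logging)
log_destination = 'stderr'
log_min_duration_statement = 0        # 0 = log every statement with its duration
log_line_prefix = '%t [%p]: [%l-1] '  # timestamp, process ID, session line number
```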

Running all this verbose logging will obviously slow down the database server a bit, so I'm only going to run it temporarily. The overhead is actually pretty small, but it does pile up quite a few bytes in the log file.

So, at the time of writing, it's been running for about a day and has captured about 70,000 queries (by the time you look at the file it might have gone up significantly). I haven't started looking at it in detail yet, but it's clear that Postgres spends most of its time on some queries that use the LIKE operator.

You can configure pgFouine to filter on specific databases. I haven't done so because at the moment I'm just interested in what the whole database server is getting up to. Most of the guilty queries come from the Crosstips site. Maybe it's time to optimize the worst-performing queries there a bit.

UPDATE

After running for 24 hours, I did some low-hanging-fruit optimization of the biggest culprits and reset the logs. The report from the first 24 hours is still here: www.peterbe.com/pgfouine.1.html

UPDATE 2

I've stopped logging all queries now. The results are still there. I'm quite pleased with the results so far.

Secs sell! How frickin' fast this site is! (server side)

April 5, 2012
2 comments Linux, Web development, Django

This is part 2. Part 1 is here about how I managed to make this site fast.

The web framework powering this site is Django. In front of that is Nginx, which serves all the static content (only once, before the Amazon CloudFront CDN takes over), and all non-static traffic is passed on to a uWSGI daemon running 6 worker processes. The database that stores the content is PostgreSQL and all caching is done in Redis. Another Redis database is used for other things, such as maintaining a quick look-up index from keywords to primary keys so that I can quickly mesh together blog posts by keyword.

However, as we all know, the deciding factor in a web site's server-side speed is effectively the speed of the database or any other disk-bound I/O. To remedy this I've set up some practical caching strategies which I'm quite happy with.

So, how fast is it? Here's an ab stress test against the home page with 10,000 requests spread across 10 concurrent users:

Document Path:          /
Document Length:        73272 bytes

Concurrency Level:      10
Time taken for tests:   4.426 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      734250000 bytes
HTML transferred:       732720000 bytes
Requests per second:    2259.59 [#/sec] (mean)
Time per request:       4.426 [ms] (mean)
Time per request:       0.443 [ms] (mean, across all concurrent requests)
Transfer rate:          162022.11 [Kbytes/sec] received
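The ab numbers hang together in a simple way: requests per second is complete requests divided by total time, the first "Time per request" multiplies the per-request time by the concurrency level, the second one does not, and the transfer rate is total bytes in kilobytes (1024 bytes) per second. A small check (ab_summary is a hypothetical helper) reproduces the figures above to within rounding, since ab prints the elapsed time rounded to milliseconds:

```python
def ab_summary(complete, concurrency, seconds, total_bytes):
    """Recompute ab's summary figures from its raw totals."""
    return {
        "requests_per_sec": complete / seconds,
        # each of the `concurrency` clients waits this long per request
        "ms_per_request": concurrency * seconds * 1000 / complete,
        # wall-clock time per request across all clients
        "ms_per_request_all": seconds * 1000 / complete,
        "kbytes_per_sec": total_bytes / 1024 / seconds,
    }


summary = ab_summary(10000, 10, 4.426, 734250000)
for name, value in summary.items():
    print("%-20s %.3f" % (name, value))
```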

I could probably push that 2,300 requests/second up to 3,000 or 4,000 just by increasing the number of workers. However, that costs memory, and since I'm currently running 19 other uWSGI workers on this server, which together with these 6 (25 in total) take up a steady 1.4 GB, I don't feel like increasing that number much more. Besides, since this site doesn't really get any traffic, I'm not so concerned about massive throughput on concurrent benchmarks but more about serving each and every page as fast as possible the few times it's requested.

Every single page on this site is behind some sort of internal cache. The only time PostgreSQL is involved in rendering a page is when it's first requested after a comment has been entered or I've added (or edited) a post. Thing is, I don't want to be inconvenienced by a stupid cache that forces me to wait an hour every time I change something. Instead, lots of Django database model signals are put in place that fire off cache invalidation when certain pieces of data are changed. You can see the code for that here.

So, for the home page for example: for each request, a small piece of Python code checks Redis for the latest comment add-date and, based on that, tells the Django page_cache decorator either to render the page as normal or to serve the whole HTML payload from Redis. In other words, a successful cache "hit" actually needs two Redis look-ups. Even that could be improved by blindly skipping these look-ups and serving from each worker's allocated Python memory instead, but that would make things fragile and hard to unit test, and it would only make the benchmarks faster, which is not necessary.
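The two-look-up scheme can be sketched without Django. This is not the code behind this site: a plain dict stands in for Redis, and cached_home_page and on_comment_added are hypothetical names. The idea is that the cache key embeds the latest comment add-date, so bumping that date (as a model signal handler would) makes the old entry unreachable instead of having to delete it:

```python
# A dict stands in for Redis in this sketch (assumption); with a real
# Redis client the two store.get calls would be the two look-ups.
store = {}


def cached_home_page(render):
    """Serve the home page HTML from the store, rendering only on a miss."""
    stamp = store.get("latest-comment-add-date", "0")  # look-up 1
    key = "home-page:%s" % stamp
    html = store.get(key)                               # look-up 2
    if html is None:
        html = render()
        store[key] = html
    return html


def on_comment_added(timestamp):
    """What a post_save signal handler might do: bump the stamp."""
    store["latest-comment-add-date"] = str(timestamp)
```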

The most important thing to optimize on a web site is the static content. But there's little point in serving the static content fast if it takes 3 seconds to say what static content to serve. Also, a fast website is likely to be viewed more favorably by the Google bot, which effectively makes the site appear higher in Google searches.

In the next part, I'll try to share more in-depth technical bits and pieces of what I actually did. They're no secrets, but I think some of them are best practices that even senior web developers sometimes get wrong.

How much faster is Nginx+gunicorn than Apache+mod_wsgi?

March 22, 2012
9 comments Linux, Django

Short answer: about 5%

I had a few minutes and wanted to see if changing from Apache + mod_wsgi to Nginx + gunicorn would make the otherwise slow site any faster. It's not this site but another Django site for work (which, by the way, doesn't have to be fast). It's slow because it doesn't cache any of the SQL queries.

# with Apache + mod_wsgi
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    39 [#/sec] (mean)
...
# Uses about 110 Mb

That's after running it multiple times and roughly averaging the requests per second.

# with Nginx + gunicorn --workers=4
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    41 [#/sec] (mean)
...
# uses about 70 Mb

So, if you want to make a site fast, forget about how the code is being served until all the slow DB I/O has been taken care of properly.
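The --workers=4 flag can also live in a gunicorn config file, which is plain Python. A sketch; the file name and the bind address that Nginx proxies to are assumptions, not part of the benchmark setup described above:

```python
# gunicorn.conf.py (hypothetical file name)
bind = "127.0.0.1:8000"  # assumption: Nginx proxy_passes to this address
workers = 4              # same as --workers=4 in the benchmark above
```

Start it with gunicorn -c gunicorn.conf.py myproject.wsgi:application, where myproject is a placeholder for your Django project.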

Strange socket related error with supervisord

April 5, 2011
7 comments Linux

This took me a long time to figure out so I thought I'd share.

Basically, I'm a newbie supervisord administrator, and while setting up a new config I kept getting these errors:


# supervisord -n
2011-04-04 17:25:11,700 CRIT Set uid to user 1000
2011-04-04 17:25:11,700 WARN Included extra file "/etc/supervisor/conf.d/gkc.conf" during parsing
Error: Cannot open an HTTP server: socket.error reported errno.ENOENT (2)
For help, use /usr/local/bin/supervisord -h

The reason was that in my config I had the line:


[unix_http_server]
file=/var/lib/tornado/run/gkc.sock

but the directory /var/lib/tornado/run didn't exist. Creating that solved the problem.

The lesson learned from all this: when specifying the location of a .sock file, always make sure the directory exists and that the current user can write to it.
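A quick pre-flight check would have saved me the head-scratching. The sketch below is a hypothetical helper (not part of supervisor) that reads the [unix_http_server] file= setting with configparser and checks that its directory exists and is writable. It assumes the config parses as plain INI, which simple supervisord configs do:

```python
import configparser
import os


def check_unix_socket_dir(conf_path):
    """Return a problem description for [unix_http_server] file=..., or None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path)
    if not parser.has_option("unix_http_server", "file"):
        return None
    sock = parser.get("unix_http_server", "file")
    directory = os.path.dirname(sock) or "."
    if not os.path.isdir(directory):
        return "directory %s does not exist" % directory
    if not os.access(directory, os.W_OK):
        return "directory %s is not writable" % directory
    return None
```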

Bash tip of the day: ff

March 25, 2011
2 comments Linux

This is helping me sooo much that it would be a crime not to share it. It's actually nothing fancy, just a very convenient thing that I've grown used to. ff is an executable script I use to find files in a git repository. It goes like this:


$ ff list
templates/operations/network-packing-list.html
templates/sales/list_orders.html
$ ff venue
templates/venues/venues-by-special.html
templates/venues/venues.html
templatetags/venue_extras.py
templatetags/venues_by_network_extras.py
tests/test_venues.py

It makes it super quick to search for files that have been added to the repository, without having to use the slow find command, which would also find backup files and other junk that isn't checked in.

To install it, create a file called ~/bin/ff and make it executable:


$ chmod +x ~/bin/ff

Then type this code in:


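#!/usr/bin/python
# ff: list files tracked by git whose path matches PATTERN
# usage: ff [-i] [extra grep options] PATTERN
import subprocess
import sys

args = sys.argv[1:]
grep_cmd = ['grep']
if '-i' in args:
    # case-insensitive matching
    args.remove('-i')
    grep_cmd.append('-i')
if not args:
    sys.exit('usage: ff [-i] [grep options] PATTERN')
pattern = args[-1]
# any other arguments are passed straight through to grep
grep_cmd.extend(args[:-1])
grep_cmd.extend(['-e', pattern])
# pipe `git ls-files` into grep without going through a shell,
# so patterns containing quotes or spaces are passed intact
files = subprocess.Popen(['git', 'ls-files'], stdout=subprocess.PIPE)
status = subprocess.call(grep_cmd, stdin=files.stdout)
files.stdout.close()
files.wait()
sys.exit(status)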

ssl_session_cache in Nginx and the ab benchmark

December 31, 2010
2 comments DoneCal, Linux

A couple of days ago I wrote about how blazing fast the DoneCal API can be over HTTP (1,400 requests/second) and how much slower it becomes when doing the same benchmark over HTTPS. As Chris Adams pointed out, ab can be run with Keep-Alive on, and after some reading up it's clear that it's a good idea to switch on a shared ssl_session_cache so that Nginx can cache SSL sessions and skip repeating some of the expensive handshakes.
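In nginx.conf this is a single directive in the http or server block. A minimal sketch; the ssl_session_timeout line is an extra not mentioned in this post, included because it controls how long a cached session may be reused:

```
# inside the http { } or server { } block of nginx.conf
ssl_session_cache shared:SSL:10m;  # 10 MB session cache shared by all worker processes
ssl_session_timeout 10m;           # how long a cached session may be reused
```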

With ssl_session_cache shared:SSL:10m :


 Requests per second:    112.14 [#/sec] (mean)

Same cache size but with -k on the ab loadtest:


Requests per second:    906.44 [#/sec] (mean)

I'm fairly sure that most browsers will use Keep-Alive connections, so I guess it's realistic to use -k when running ab. But since this is a test of an API, it's perhaps more likely than not that clients (i.e. computer programs) don't use it. To be honest I'm not really sure, but it nevertheless feels right to be able to use ssl_session_cache to boost my benchmark by 40%.

It's also worth noticing that when doing an HTTP benchmark it's CPU bound on the Tornado (Python) processes (I use 4), but when doing HTTPS it's CPU bound on Nginx itself (I use 1 worker process).

Making output stay on stdout

May 18, 2010
0 comments Linux

This is fairly obvious stuff, I guess, but it has troubled me for a long time. Some programs on Linux don't just print their results to stdout. Instead they start a little pager program similar to less. So what is a console nerd to do?

Pipe it to cat! I don't know why I've never thought of this before:


$ psql -l | cat
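The reason piping to cat works is that programs like psql generally only start a pager when their output is a terminal; once stdout is a pipe they just print. A simplified Python model of that decision (show() is a hypothetical function, and real programs such as psql also compare the output length with the terminal height):

```python
import os
import subprocess
import sys


def show(text, stream=sys.stdout):
    """Page text through $PAGER (default 'less') only if stream is a terminal."""
    if stream.isatty():
        pager = os.environ.get("PAGER", "less")
        proc = subprocess.Popen(pager, shell=True, stdin=subprocess.PIPE)
        proc.communicate(text.encode())
        return "paged"
    # not a terminal (e.g. piped to cat): write straight through
    stream.write(text)
    return "printed"
```

For psql specifically, psql -P pager=off -l gets the same result without the pipe.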