You're viewing blogs from Linux only.
I've been a big fan of Yakuake for a long time. It's a terminal you keep open all the time in Linux that is shown and hidden, over any other windows, by simply hitting the F12 key.
But in more recent versions Yakuake has become really slow. It sometimes takes 2-3 seconds from pressing F12 until you can type in the terminal. So I uninstalled it and tried Yeahconsole, but I uninstalled that just as quickly when I realised it was broken and didn't work at all, despite being in the Xubuntu apt repositories.
In the end I settled on Guake, which not only works but is really, really fast. Screenshots here
Pagetest is a great web page performance testing tool that does what Firebug does, but outside your browser. Pagetest can repeat tests to iron out any outliers. An alternative is Pingdom Tools, which has some nifty sorting functions but is generally the same thing.
So I ran the homepage of my website on it and concluded that: Wow! Half the time is spent on DNS lookup!
The server it sits on is located here in London, UK and the Pagetest test was made from a server also here in the UK. Needless to say, I was disappointed. Is there anything I can do about that? I've spent so much time configuring Squid, Varnish and Nginx and yet the biggest chunk is DNS lookup.
In a pseudo-optimistic fashion I'm hoping it's because I've made the site so fast that this is what's left when you've done all you can do. I'm hoping to learn some more about this "dilemma" without having to read any lengthy manuals. Pointers welcomed.
I have a query that looks like this (simplified for the sake of brevity):
It basically finds other entries in a table (which has columns for latitude and longitude) but only returns those that are within a certain distance (from a known latitude/longitude point). Running this query on my small table takes about 7 milliseconds. (I used EXPLAIN ANALYZE)
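The query itself isn't shown, so here is a rough Python sketch of the same idea: keep only the rows within a given distance of a known point. It assumes that `miles_between_lat_long()` implements the haversine great-circle formula and that rows carry `latitude`/`longitude` keys; both are assumptions, and the names `miles_between` and `within_distance` are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # approximate mean Earth radius (assumption)


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_distance(rows, lat, lon, max_miles):
    """Return (row, distance) pairs for rows no further than max_miles away.

    The distance is computed once per row and reused for both the filter
    and the returned value.
    """
    results = []
    for row in rows:
        d = miles_between(lat, lon, row['latitude'], row['longitude'])
        if d <= max_miles:
            results.append((row, d))
    return results
```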
So I thought, how about if I wrap it in a sub-select so that the function miles_between_lat_long() is only used once per row. Surely that would make it a lot faster. I accept that it wouldn't be twice as fast because wrapping it in a sub-select would also add some extra computation. Here's the "improved" version:
To test it I wrote a little script that runs these two versions many, many times each (about 50 times) in random order and then compares the averages.
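The script isn't included, so this is only a sketch of that kind of comparison: each callable runs the same number of times in a shuffled order (so neither version always goes first) and the mean wall-clock time is returned. The callables you pass in, for example functions that execute each version of the query, are assumptions.

```python
import random
import time


def compare(funcs, runs=50):
    """Run each callable in `funcs` `runs` times in random order.

    Returns a dict mapping each name to its mean duration in seconds.
    """
    order = [name for name in funcs for _ in range(runs)]
    random.shuffle(order)  # interleave the versions to even out warm-up effects
    timings = {name: [] for name in funcs}
    for name in order:
        start = time.perf_counter()
        funcs[name]()
        timings[name].append(time.perf_counter() - start)
    return {name: sum(ts) / len(ts) for name, ts in timings.items()}
```

Usage would look like `compare({'original': run_original, 'subselect': run_subselect})`, where both functions are hypothetical wrappers that execute one version of the query.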
I've grown quite addicted to this and I'm finding that it saves me tonnes of milliseconds every day. First of all, I've made this little script and put it in my bin directory as '~/bin/gg':
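#!/usr/bin/python
# ~/bin/gg: a lazy shorthand for "git grep -n" (add -i to ignore case)
import subprocess
import sys

args = sys.argv[1:]
ignore_case = False
if '-i' in args:
    ignore_case = True
    args.remove('-i')

if not args:
    sys.exit("usage: gg [-i] [git grep options] PATTERN")

pattern = args[-1]
extra_args = args[:-1]

# -n prints line numbers; -in also makes the search case-insensitive
param = '-in' if ignore_case else '-n'

# Passing a list (no shell) means patterns containing quotes work too
sys.exit(subprocess.call(['git', 'grep', param] + extra_args + [pattern]))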
Basically, it's just a lazy shorthand for git grep ("Look for specified patterns in the working tree files"). Now I can do this:
peterbe@trillian:~/MoneyVillage2 $ gg getDIYPackURL
Homesite.py:526:    def getDIYPackURL(self):
zpt/homepage/index_html.zpt:78:   tal:attributes="href here/getDIYPackURL">Get your free trial here</
zpt/moneyconcerns/index_html.zpt:36:   tal:attributes="href here/getDIYPackURL">Get your free trial h
zpt/moneyconcerns/index_html.zpt:50:   <p><a tal:attributes="href here/getDIYPackURL" class="makea
(END)
It's not much faster than normal grep, but it automatically skips the junk that isn't tracked by git. It obviously doesn't help when you're searching files you haven't added yet.
This behavior bit me today and caused me some pain, so hopefully sharing it will help someone else avoid the same pitfall.
Basically, I use Zope to manage a PostgreSQL database, and since Zope is 100% transactional it rolls back queries when exceptions occur. That's great, but what I didn't know is that rolling back doesn't roll back the sequences. Makes sense in retrospect, I guess. Here's a proof of that:
In my application I often use the sequences to predict what the auto-generated new ID is going to be, for things such as redirecting or updating other tables. As I wasn't expecting this, it caused a bug in my web app.
I've now written my first Git hook. If you don't know what Git is, you've either lived under a rock for the past few years or you're not into computer programming at all.
The hook is a post-commit hook, and what it does is send the last commit message up to a Twitter account I called "friedcode". I guess it's not entirely useful, but for those of you who want to be loud about your work and the progress you make, I guess it can make sense. Or if you're a team and you want a brief overview of what your teammates are up to. For me, it was mostly an experiment to try Git hooks and pytwitter. Here's how I did it:
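The hook's code isn't shown here, so this is a hedged sketch of the general shape of a Python `.git/hooks/post-commit` script. It reads the latest commit's subject with `git log -1 --pretty=format:%s` and trims it to 140 characters, the Twitter limit at the time. `post_status` is a hypothetical placeholder; I haven't reproduced pytwitter's actual API, so swap in the real call yourself.

```python
#!/usr/bin/env python
# Sketch of .git/hooks/post-commit (the file must be executable).
import subprocess

TWEET_LIMIT = 140  # Twitter's status length limit at the time of writing


def last_commit_subject():
    """Return the subject line of the most recent commit."""
    out = subprocess.check_output(['git', 'log', '-1', '--pretty=format:%s'],
                                  stderr=subprocess.DEVNULL)
    return out.decode('utf-8', 'replace').strip()


def format_status(message, limit=TWEET_LIMIT):
    """Collapse whitespace and truncate to `limit` characters with '...'."""
    message = ' '.join(message.split())
    if len(message) <= limit:
        return message
    return message[:limit - 3].rstrip() + '...'


def post_status(text):
    # Hypothetical placeholder: replace with the real pytwitter call.
    print('Would post: %s' % text)


def main():
    try:
        subject = last_commit_subject()
    except (OSError, subprocess.CalledProcessError):
        return  # fail quietly instead of printing a traceback after a commit
    if subject:
        post_status(format_status(subject))


if __name__ == '__main__':
    main()
```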
We all know that Nginx is fast and very lightweight. We also know that Squid is very fast too. But which one is fastest?
In an insanely unscientific way I added some rewrite rules to my current Nginx -> Squid -> Zope stack so that for certain static content Nginx could go straight to the filesystem (where the Zope product keeps its static files), bypassing the proxy pass. Then I did a quick and simple benchmark with ab, comparing how fast each could serve a 700-byte GIF image:
squid: 2275.62 [#/sec] (mean)
nginx: 7059.45 [#/sec] (mean)
For the impatient: Jed and Wing IDE are programming editors I use for my Python, JavaScript, HTML and CSS editing. One is ultra-light, fast and simple. The other is very feature-rich, commercial and slow (in comparison to Jed).
I've been using Jed for several years on Linux. It's an "Emacs clone"[1] in that almost the same key bindings you have in Emacs work in Jed. A few weeks ago I started using Wing IDE 3.1 instead to see if I could learn to love it. I got a professional license as a gift for participating in the PyCon 2008 sprint by Wingware (the company behind Wing IDE). As of yesterday I've gone back to Jed, but I haven't uninstalled Wing yet. Here is what I've learned from using both quite a bit. Note: I'm not comparing things they both do equally well, such as macros, community support and block indentation.
I have a Thinkpad T60p which is working really well for me. On it I've run various flavors of Ubuntu and as of a couple of weeks ago I put on Ubuntu 8.10 which has been working very well too except that I didn't get any options in the Preferences menu to switch off the damn loud system beep. The beep comes through the speakers and at a much much louder volume than any other sound or music.
I tried changing things in the BIOS and tried installing various packages, hoping one of them would give me the option. Finally I found out how to disable it:
$ sudo modprobe -r pcspkr
This tip page showed me how to put it into /etc so it's applied all the time. Thanks Andy!
...over PostgreSQL?
I've just read through this document: MySQL vs PostgreSQL, and it's obvious paragraph after paragraph that PostgreSQL is the better database. Performance, features and community are all in PostgreSQL's favor. There is almost nothing in MySQL's favor apart from obscure things like a faster count(*) (without conditions) and built-in replication support. In the last two weeks I've also had the great fortune of playing with full-text indexing in both MySQL and PostgreSQL and again, MySQL sucks ass and PostgreSQL (8.3) is really impressive and fast. (I've used both databases quite extensively over the past 8 years as a web developer.)
I once heard that Google uses MySQL for its user database with a custom built transaction machine. And I read that Google engineers had donated some great code to the MySQL project. But why do they bother? What do they know that other engineers don't? And why is MySQL so popular with cheap stack-em-high LAMP hosting sites?
I do understand that PostgreSQL got off to a bad start 5 years ago(ish) when it didn't support Windows, which meant that newbies had to use MySQL, and that stigma is still lingering, but that was a very long time ago.
I guess it takes a lot of convincing to switch from one technology to another once you've set your mind on something. That's human nature. As proof: scroll down to the bottom of this page and there's a simple little survey, and despite it sitting under a long article with objective, convincing arguments that PostgreSQL is better, MySQL is doing quite well. Why?
I'm talking about Squid the web proxy cache server and Calamaris the Squid log file analyzer, not food. Calamaris was a breeze to set up and now it emails me a monthly report summarising what Squid has done for my site. It's brilliant because it includes a piece of information that is both really easy to understand and really useful no matter how technical you are:
Proxy statistics
--------------------------------------------------------------------
Average speed increase: % 44.57
Nota bene: I've had to work on this. It used to be a lot lower, but I've worked on setting no-refresh on a selected few resources and tweaked the Cache-Control headers here and there.
What people sometimes forget about Squid is that it can actually slow down your site too. On an individual object that you test with ab you can go from 50 requests/sec to 2000 requests/sec but for large sites with many many objects Squid has to do a lot of thinking to work out what to cache, not to cache, put in RAM, etc. Just installing Squid is not necessarily good enough. You have to massage and caress it till it starts to work for you. And to get to that Calamaris really helps.
Today I moved a bunch of sites over from Apache to Nginx while still keeping Squid in between as an HTTP accelerator (I hope to replace Squid with Varnish soon). I did a quick benchmark of an HTML page that is cached by Squid, 4 times via Apache and 4 times via Nginx. The results:
Apache2
********
Requests per second:    1601.34 [#/sec] (mean)
Time per request:       6.268 [ms] (mean)
Time per request:       0.627 [ms] (mean, across all concurrent requests)
Transfer rate:          13020.50 [Kbytes/sec] received

Nginx
********
Requests per second:    1810.02 [#/sec] (mean)
Time per request:       5.6435 [ms] (mean)
Time per request:       0.5645 [ms] (mean, across all concurrent requests)
Transfer rate:          14591.35 [Kbytes/sec] received
That's "only" 13% faster. I had hoped for a bigger difference, but the test is very simple and depends on how Squid feels. The other important test would be to see how much less CPU and memory Nginx uses during the stress test period, but that's for another day.
One note: This is Nginx 0.4.3 on Debian Etch. The current stable release is Nginx 0.6.13. I'll need to talk to my sys admins to remedy this. Perhaps it makes a difference on the benchmark, I don't know.
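For repeated runs it can help to pull the mean out of ab's output programmatically. This small sketch (the helper names are mine, not part of ab) extracts the "Requests per second" figure and computes the relative speed-up, which reproduces the 13% quoted above.

```python
import re

RPS_RE = re.compile(r'Requests per second:\s+([\d.]+)')


def requests_per_second(ab_output):
    """Extract the mean 'Requests per second' figure from ApacheBench output."""
    match = RPS_RE.search(ab_output)
    if match is None:
        raise ValueError('no "Requests per second" line found')
    return float(match.group(1))


def percent_faster(new, old):
    """How much faster `new` is than `old`, in percent."""
    return (new / old - 1) * 100


apache = requests_per_second('Requests per second:    1601.34 [#/sec] (mean)')
nginx = requests_per_second('Requests per second:    1810.02 [#/sec] (mean)')
print('%.1f%% faster' % percent_faster(nginx, apache))  # prints "13.0% faster"
```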
I often need to know the path to a file so that I can put that in an email for example. The only way I know is to copy and paste the output of pwd followed by a slash / followed by the name of the file. This is too much work so I wrote a quick bash script to combine this into one. Now I can do this:
$ cd bin
$ pwdf todo.sh
/home/peterbe/bin/todo.sh
I call it pwdf since it's pwd + file. Here's the code for the curious:
#!/bin/bash
# pwdf: print the full path of a file in the current directory
# (quoting keeps paths containing spaces intact)
echo "$(pwd)/$1"
Is there no easier way built into Linux already?
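On GNU systems `readlink -f todo.sh` prints a canonical absolute path (it also resolves symlinks, which pwdf doesn't). Staying in Python, a rough equivalent of pwdf is `os.path.abspath`; note that, unlike the bash version, it also normalises things like `..` in the argument.

```python
#!/usr/bin/env python
import os
import sys


def pwdf(filename):
    """Join the current working directory and `filename` into an absolute path."""
    return os.path.abspath(filename)


if __name__ == '__main__':
    for name in sys.argv[1:]:
        print(pwdf(name))
```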
First of all, I understand that the problem cron solves is a hard one but come on, it's been many years now without much progress. At least not in the usability field of cron jobs. Secondly, I don't know of an operating system that does this better. Perhaps there is one. All I'm saying here is that this aspect of Linux sucks. The issues I have with cron are:
Beef number 1
Is it root, user1 or user2 running a crontab job? I have to su into each suspected user and run crontab -l. Granted, some jobs require root access and others don't, but it nevertheless makes it hard to find the configured jobs when maintaining someone else's server.
Beef number 2
Even though they do such a similar thing, it feels like /etc/cron.* is a different battlefield from crontab. Why can't this all be in one coherent place?
Beef number 3
The crontab syntax. How difficult would it be to have an interface that accepts input like "every 10 minutes" or "01.30 every day"?
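To show how small that translation layer could be for the two examples above, here is a hypothetical helper (not an existing cron feature) that turns them into crontab time fields. It only understands those two phrasings; anything else raises an error.

```python
import re


def to_crontab(phrase):
    """Translate two plain-English schedules into crontab time fields.

    Supported (hypothetical) forms:
      "every N minutes" -> "*/N * * * *"
      "HH.MM every day" -> "MM HH * * *"
    """
    phrase = phrase.strip().lower()
    m = re.fullmatch(r'every (\d+) minutes?', phrase)
    if m:
        n = int(m.group(1))
        if not 1 <= n <= 59:
            raise ValueError('minute step must be between 1 and 59')
        return '*/%d * * * *' % n
    m = re.fullmatch(r'(\d{1,2})[.:](\d{2}) every day', phrase)
    if m:
        hour, minute = int(m.group(1)), int(m.group(2))
        if hour > 23 or minute > 59:
            raise ValueError('invalid time of day')
        return '%d %d * * *' % (minute, hour)
    raise ValueError('unsupported schedule: %r' % phrase)
```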
Beef number 4
With there being 12 different ways (sarcasm) to write cron job scripts, there's no coherent place to collect all the logs and errors that cron jobs produce. Couldn't it by default always write to /var/log/cron/access.log, and append every execution that writes to stderr to /var/log/cron/error.log?
I don't think Anacron would make me any happier since the problem Anacron solves was not one of the problems I listed above. And lastly, I wouldn't be surprised if there's a semi-abandoned Open Source project on SourceForge that is user friendly but what I'm after is something to get into stock Linux. Kind of like apt/aptitude/dselect is for dpkg maybe?
My colleague Jan showed me how to do this, so I'm blogging about it so I don't forget, and perhaps other people will be able to search for and find the solution too. I installed nginx because I wanted to play with it as an alternative to Apache on my laptop. Now I've played enough and I want to remove it. My first attempt didn't work:
peterbe@trillian:~ $ sudo apt-get --purge remove nginx
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  nginx*
0 upgraded, 0 newly installed, 1 to remove and 116 not upgraded.
1 not fully installed or removed.
Need to get 0B of archives.
After unpacking 528kB disk space will be freed.
Do you want to continue [Y/n]?
(Reading database ... 242827 files and directories currently installed.)
Removing nginx ...
Stopping nginx: invoke-rc.d: initscript nginx, action "stop" failed.
dpkg: error processing nginx (--purge):
 subprocess pre-removal script returned error exit status 1
Starting nginx: invoke-rc.d: initscript nginx, action "start" failed.
dpkg: error while cleaning up:
 subprocess post-installation script returned error exit status 1
Errors were encountered while processing:
 nginx
E: Sub-process /usr/bin/dpkg returned an error code (1)
I tried this both before and after stopping and starting nginx. Nothing worked. The trick is to edit the init script /etc/init.d/nginx and insert an exit 0 at the top so that it starts like this:
#!/bin/sh
exit 0
Once that's saved, apt-get --purge remove nginx will work. It might warn you that /var/log/nginx isn't removed because it's not empty, but you can safely remove it manually unless you want to keep the logs.
Because I always forget, here's how to check if a file exists before attempting to delete it in bash:
[ -f foobar.html ] && rm foobar.html
If you don't do it this way and the file doesn't exist you get this:
rm: cannot remove `foobar.html': No such file or directory
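(For what it's worth, `rm -f foobar.html` also stays quiet when the file is missing.) If you need the same thing from Python, a common approach is to just try the delete and ignore the "not found" error, which also avoids a race between the check and the removal. This is a sketch; `remove_if_exists` is my own name.

```python
import os


def remove_if_exists(path):
    """Delete the file at `path`, ignoring it if missing; return True if removed."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True
```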
I've never really needed it but I've been looking for a tool that is super easy to use that quickly looks up the location of an IP address. The super easy tool is called hostip.info and this is what happens when you do a lookup on www.peterbe.com's IP at http://api.hostip.info/get_html.php?ip=80.68.212.7 :
Country: UNITED KINGDOM (UK)
City: (Unknown city)
Cool. I'll remember that until the next time I really need it. I discovered hostip.info by reading this cool blog by Corey Goldberg.
I've installed a lot of Zope instances on my laptop since version 2.7.3 and out of curiosity and desperate need for more hard drive space I thought I'd log rotate them all with the standard Linux logrotate program.
Before the log rotation, the total size of all my event.log files came to about 290MB! After running logrotate (twice, of course, to go from event.log.1 to event.log.2.gz) the total size became 20MB. Not hugely significant in the world of gigabyte hard drives, but at least something.
There are lots of fancy programs for Linux to find out where your gigabytes are sitting and filling your hard drive; the simplest of them is du (disk usage). The trick is to use the --max-depth=1 option so that you get a view of how much each folder weighs. Try this:
peterbe@trillian:~/tmp $ du -h --max-depth=1
900K    ./Example-Receipts
4.0K    ./Foredettinghelgen
44K     ./IssueTrackerBlogInterface
1.9M    ./IssueTrackerProduct
12K     ./fried-mugshots
2.1M    ./ies4linux-2.0.5
4.8M    ./pyexcelerator
52K     ./levenstein
4.0K    ./newitpdesign
4.7M    ./photoresizing
69M     ./databases
4.5M    ./i18nextract-sa
532M    .
Pretty nifty! That way you can quickly see which folder contains the most junk so that you can free up some hard drive space.
To sort the output, here's the command (I don't know how to make it print human-readable values while sorting):
peterbe@trillian:~/tmp $ du --max-depth=1 | sort -n
4       ./Foredettinghelgen
4       ./newitpdesign
12      ./fried-mugshots
44      ./IssueTrackerBlogInterface
52      ./levenstein
900     ./Example-Receipts
1856    ./IssueTrackerProduct
2140    ./ies4linux-2.0.5
4528    ./i18nextract-sa
4796    ./photoresizing
4872    ./pyexcelerator
70392   ./databases
544608  .
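Newer GNU coreutils have `sort -h`, so `du -h --max-depth=1 | sort -h` may be all you need. Failing that, here is a small Python sketch that sorts `du -k` style output (size in kilobytes, whitespace, path) numerically and then formats the sizes for humans. The one-decimal formatting is my own choice and won't match `du -h` exactly, since du -h rounds up.

```python
def human(kb):
    """Format a size given in kilobytes (as printed by `du -k`) for humans."""
    size = float(kb)
    if size < 1024:
        return '%dK' % kb
    for unit in ('M', 'G', 'T'):
        size /= 1024
        if size < 1024 or unit == 'T':
            return '%.1f%s' % (size, unit)


def sort_du(lines):
    """Parse `du -k` lines ("SIZE<whitespace>PATH") and return them sorted by size."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        size, path = line.split(None, 1)  # path may itself contain spaces
        entries.append((int(size), path))
    entries.sort()
    return ['%8s  %s' % (human(size), path) for size, path in entries]
```

Feeding it the output of `du -k --max-depth=1` (for example via `sys.stdin`) prints the smallest folders first, just like `sort -n`.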
I booked an appointment for a computer repair today and was given a reference number and summary that I was told to print two copies of: one to go with the computer and one to keep myself as a reference. The page was one page long, so in Firefox I chose to print two copies. Our printer here at work is one of those that prints on both sides of the paper, and that's where the problem lies. It printed the two copies on either side of one sheet. So only one physical piece of paper. Totally stupid.
I don't know what or who to blame for this. Is it my Firefox? My Dell printer? My Linux printer drivers? Surely someone working on this at some point had the chance to think it through and chose not to.
This is silly but fun. With one command on the command line I can update my Facebook status. It doesn't use the Facebook Developer API but a PHP script I copied from some other blog I can't find right now. Here's how I use it:
peterbe@trillian:~ $ FacebookStatusUpdater
Peter is happily blogged about his latest facebook status updater
Updating Facebook...
It's an interactive prompt that starts with "Peter is ", and then I type until I hit Return and it gets uploaded and saved. See attached screenshot.
Long story short: if you need to compare floating point numbers against columns defined as REAL, you first need to cast them to NUMERIC in PostgreSQL. And to compare equality between two numbers with different numbers of significant figures, you have to use ROUND().
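Why plain equality fails: PostgreSQL's REAL is a 4-byte single-precision float, so a value like 1.1 is stored only approximately and no longer equals the same literal at double precision. This Python analogy (not PostgreSQL itself) uses `struct` to mimic storing a value in a 4-byte float, then shows that comparing after rounding, or via Decimal, behaves as expected.

```python
import struct
from decimal import Decimal


def as_real(x):
    """Round-trip a Python float through a 4-byte IEEE float, like a REAL column."""
    return struct.unpack('f', struct.pack('f', x))[0]


stored = as_real(1.1)
print(stored)                    # the single-precision value widened to a double
print(stored == 1.1)             # False: plain equality fails
print(round(stored, 2) == 1.1)   # True: compare after rounding
print(Decimal(stored).quantize(Decimal('0.01')) == Decimal('1.10'))  # True
```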
The Ubuntu Linux on my work laptop works great, but since I've strayed far from the default options (own kernel, own window manager etc.), some things don't always work as expected. Flash 9, for example. The problem I had was that one package in there was broken for some reason: libswfdecmozilla.so
Here's what I did:
# cd /usr/lib/mozilla/plugins
# rm libswfdecmozilla.so
# wget http://download.macromedia.com/pub/labs/flashplayer9_update/FP9_plugin_beta_101806.tar.gz
# aunpack FP9_plugin_beta_101806.tar.gz
# mv flash-player-plugin-9.0.21.55/libflashplayer.so .
# chmod +x libflashplayer.so
Start Firefox, go to about:plugins, and this is what you should see:
That worked for me. Hopefully this will help somebody.
I got help from this page, which might also help people with a broken Java in Firefox, but it doesn't say that you should delete the libswfdecmozilla.so plugin.
UPDATE
There's a slightly more recent beta now. The November 2006 beta
UPDATE 2
Now there's a final release on adobe.com
Here's a nifty little command I used today to find where my hard drive was being most used:
du -k /home/peterbe/Documents/ | sort -n | tail -10
I'm sure there are even fancier methods and programs but this works pretty damn well. Here's what the output can look like:
root@trillian:~ # du -k /home/peterbe/Documents/ | sort -n | tail -10
4240    /home/peterbe/Documents/Kalle
4852    /home/peterbe/Documents/ChartDirector/lib
7756    /home/peterbe/Documents/ChartDirector/doc/cdpydoc
7764    /home/peterbe/Documents/ChartDirector/doc
13044   /home/peterbe/Documents/*** FONT _ ***/- Font Applications -
14704   /home/peterbe/Documents/ChartDirector
547940  /home/peterbe/Documents/*** FONT _ ***
2171000 /home/peterbe/Documents/MacOSXSoftware/Adobe Creative Suite 2 Premium
3262580 /home/peterbe/Documents/MacOSXSoftware
5694808 /home/peterbe/Documents/
I have an application where I need to resize huge digital camera pictures down to 800x600 pixels. To do this I used ImageMagick's convert program which I feel gives much better quality than Python PIL. To reduce the file size I make sure I use the -strip option to convert but the really interesting question was what quality option should I use?
Goal: the image should be as small (in bytes) as possible without losing too much picture quality.
To get the optimal picture quality of course the right option is -quality 100 and to get the smallest file size I should use -quality 10. To find out what quality setting to use I converted an original image with the following command 10 times:
convert vase.jpg -strip -quality <X> -resize 800x600 vase.quality-<X>.jpg
where <X> is the value varying from 10 to 100.
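Running the ten conversions by hand gets tedious, so here is a sketch that builds one `convert` command per quality setting and runs them. It assumes steps of 10 (10, 20, ..., 100) and output names of the form `vase.quality-<X>.jpg`; `convert_commands` and `run_all` are my own names.

```python
import os
import subprocess


def convert_commands(src, qualities=range(10, 101, 10), size='800x600'):
    """Build one ImageMagick `convert` command per quality setting."""
    stem = src.rsplit('.', 1)[0]
    return [
        ['convert', src, '-strip', '-quality', str(q), '-resize', size,
         '%s.quality-%d.jpg' % (stem, q)]
        for q in qualities
    ]


def run_all(src):
    """Run every conversion and report the resulting file sizes in bytes."""
    for cmd in convert_commands(src):
        subprocess.check_call(cmd)
        print('%s: %d bytes' % (cmd[-1], os.path.getsize(cmd[-1])))
```

Calling `run_all('vase.jpg')` requires ImageMagick's `convert` on the PATH.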
My Firefox froze in one of the tabs while in another tab I had a long Fry-IT intranet blog post half finished. To avoid having to rewrite the whole text, Jan showed me how to dump the RAM onto disk, which I could then look through with standard tools. For this to work you need plenty of disk space, since the dump file is about 1GB:
$ sudo su -
# df -h
# cat /proc/kcore > /usr/kcore.dump
# strings /usr/kcore.dump > /usr/kcore.strings
# ls -lh /usr | grep kcore
-rw-r--r-- 1 root root 1016M 2006-10-30 10:18 kcore.dump
-rw-r--r-- 1 root root  74M 2006-10-30 10:19 kcore.strings
# grep 'Bla bla bla' /usr/kcore.strings
Was this the most boring blog item I've written in a long time? Maybe, but it's good to have it noted the next time Firefox crashes.
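If you'd rather skip the intermediate kcore.strings file, here is a rough Python take on `strings | grep` that runs against the dump file (not /proc/kcore itself). It treats runs of at least four printable ASCII bytes as strings, which is only an approximation of what `strings` does, and uses `mmap` so the regex can scan the 1GB dump without reading it all into memory.

```python
import mmap
import re


def _strings_pattern(min_len):
    # Runs of printable ASCII (space through tilde), roughly like `strings`
    return re.compile(b'[\x20-\x7e]{' + str(min_len).encode('ascii') + b',}')


def extract_strings(data, min_len=4):
    """Return printable ASCII runs from a bytes-like object."""
    return [m.group().decode('ascii')
            for m in _strings_pattern(min_len).finditer(data)]


def grep_dump(path, needle, min_len=4):
    """Return printable runs in the (non-empty) file at `path` containing `needle`."""
    pattern = _strings_pattern(min_len)
    hits = []
    with open(path, 'rb') as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for m in pattern.finditer(mm):
            text = m.group().decode('ascii')
            if needle in text:
                hits.append(text)
    return hits
```

Something like `grep_dump('/usr/kcore.dump', 'Bla bla bla')` would then print the matching fragments.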
I'm not a bash expert. Now I need some help with some bash syntax.
I copied a function called get_key which reads a single character from the terminal (via stty) and assigns it to a variable. It's nifty because I can prompt something like this:
Here's a wacky idea: in a shell script I want to convert every line of stderr into a . on the same line. I'm still a shell scripting newbie, but I know that bash can be quite powerful if you know what you're doing.
To begin with I've written a little wrapper on cvs commit called ~/bin/cvs_commit which does the following:
#!/bin/sh
cvs commit -m "$1" | grep -v '\.bak' | grep -v '\.pyc'
Because cvs prints every folder it recursively traverses, a big tree means all of that ugly output is printed to the screen via stderr. To get rid of it, I've changed my command to:
#!/bin/sh
cvs commit -m "$1" 2>/dev/null \
    | grep -v '\.bak' | grep -v '\.pyc'
Let's improve even more...
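That's the bash route. For comparison, here is a Python sketch of the same "one dot per stderr line" idea, not a bash answer: the command's stdout passes straight through, while stderr is piped and each line becomes a `.` on the same line. The helper names are my own.

```python
import subprocess
import sys


def dots_for_lines(lines, write, flush=None):
    """Write one '.' per line in `lines` (plus a final newline); return the count."""
    count = 0
    for _ in lines:
        write('.')
        if flush is not None:
            flush()  # show progress immediately
        count += 1
    if count:
        write('\n')
    return count


def run_with_dotted_stderr(cmd):
    """Run `cmd`; stdout passes straight through, each stderr line becomes a dot."""
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    dots_for_lines(proc.stderr, sys.stdout.write, sys.stdout.flush)
    return proc.wait()
```

For the cvs wrapper that would be something like `run_with_dotted_stderr(['cvs', 'commit', '-m', message])`.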
Just figured out how to call my slim web service via XML-RPC using Ruby. It's as easy as in Python.
Here's the code:
And when you run this on the command line this is what you get: