31 October 2015 · PostgreSQL
We have lots of tables that weigh a lot. Some of the tables are partitions so they're called "mytable_20150901" and "mytable_20151001" etc.
To find out how much each table weighs you can use this query:
select
    table_name,
    pg_relation_size(table_name),
    pg_size_pretty(pg_relation_size(table_name))
from information_schema.tables
where table_schema = 'public'
order by 2 desc
limit 10;
It'll give you an output like this:
        table_name        | pg_relation_size | pg_size_pretty
--------------------------+------------------+----------------
 raw_adi_logs             |      14724538368 | 14 GB
 raw_adi                  |      14691426304 | 14 GB
 tcbs                     |       7173865472 | 6842 MB
 exploitability_reports   |       6512738304 | 6211 MB
 reports_duplicates       |       4428742656 | 4224 MB
 addresses                |       4120412160 | 3930 MB
 missing_symbols_20150601 |       3264897024 | 3114 MB
 missing_symbols_20150608 |       3170762752 | 3024 MB
 missing_symbols_20150622 |       3039731712 | 2899 MB
 missing_symbols_20150615 |       2967281664 | 2830 MB
(10 rows)
But as you can see in this example, it might be interesting to know what the sum is of all the missing_symbols_* partitions combined.
Without further ado, here's how you do that:
select
    table_name,
    total,
    pg_size_pretty(total)
from (
    select
        trim(trailing '_0123456789' from table_name) as table_name,
        sum(pg_relation_size(table_name)) as total
    from information_schema.tables
    where table_schema = 'public'
    group by 1
) as agg
order by 2 desc
limit 10;
Then you'll get possibly very different results:
        table_name        |    total     | pg_size_pretty
--------------------------+--------------+----------------
 reports_user_info        | 157111115776 | 146 GB
 reports_clean            | 106995695616 | 100 GB
 reports                  | 100983242752 | 94 GB
 missing_symbols          |  42231529472 | 39 GB
 raw_adi_logs             |  14724538368 | 14 GB
 raw_adi                  |  14691426304 | 14 GB
 extensions               |  12237242368 | 11 GB
 tcbs                     |   7173865472 | 6842 MB
 exploitability_reports   |   6512738304 | 6211 MB
 signature_summary_uptime |   6027468800 | 5748 MB
(10 rows)
You can read more about the trim() function here.
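To see what that trim() is doing, here's what it looks like on one of the partition names (column alias added for clarity):

peterbe=# select trim(trailing '_0123456789' from 'missing_symbols_20150615') as base_name;
    base_name
-----------------
 missing_symbols
(1 row)

One caveat: it strips any trailing run of digits and underscores, so a table whose real name happens to end in a digit (e.g. a hypothetical raw_adi2) would get grouped under raw_adi too.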
When you're using PostgreSQL for local development it's sometimes important to get an insight into ALL SQL that happens on the PostgreSQL server. Especially if you're trying to debug all the transaction action too.
To do this on OSX where you have PostgreSQL installed with Homebrew you have to do the following:
1. Locate the right postgresql.conf file. On my computer this is in /opt/boxen/homebrew/var/postgres/ but that might vary depending on how you set up Homebrew. An even easier way is to just start psql and ask there:
$ psql
psql (9.4.0)
Type "help" for help.

peterbe=# show config_file;
                   config_file
--------------------------------------------------
 /opt/boxen/homebrew/var/postgres/postgresql.conf
(1 row)

peterbe=#
Open that file in your favorite editor.
2. Look for a line that looks like this:
#log_statement = 'all' # none, ddl, mod, all
Uncomment that line.
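So that it now reads:

log_statement = 'all'                   # none, ddl, mod, all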
3. Now, if you can't remember how to restart PostgreSQL on your system, you can ask Homebrew:
$ brew info postgresql
Towards the end you'll see some commands that look something like this:
$ launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
$ launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
4. To locate where PostgreSQL dumps all the logging, you have to look at how Homebrew set it up. You can do that by opening ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist. You'll find something like this:
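The interesting bit is where the server's output goes. From memory of Homebrew's plist, the relevant keys look roughly like this (the exact layout may differ between versions):

<key>StandardErrorPath</key>
<string>/opt/boxen/homebrew/var/postgres/server.log</string>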
That's it. You can now see all SQL going on by running:
$ tail -f /opt/boxen/homebrew/var/postgres/server.log
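With log_statement = 'all' enabled, you'll see lines roughly like this (the queries here are made-up examples):

LOG:  statement: BEGIN
LOG:  statement: SELECT * FROM reports WHERE id = 123
LOG:  statement: COMMIT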
Remember to reverse this config change when you're done debugging, because that server.log file can quickly grow to an insane size since it's probably not log-rotated.
Happy SQL debuggin'!
Yesterday I had the good fortune to present Crontabber to The San Francisco Bay Area PostgreSQL Meetup Group, organized by my friend Josh Berkus.
To spare you having to watch the whole presentation I'm going to summarize some of it here.
My colleague Lars also has a blog post about Crontabber that goes into a bit more detail about the nitty-gritty of using PostgreSQL.
Crontabber is basically a script that gets started by UNIX crontab and runs every 5 minutes. Internally it keeps an index of all the apps it needs to run; it manages dependencies between jobs and is self-healing, meaning that if something goes wrong at run-time it just retries again and again until it works. Among the many juicy features it offers on top of regular "cron wrappers" is something we call "backfilling".
Backfill jobs are jobs that happen on a periodic basis and get given a date. If all is working, this date is the same as now(), but if something goes wrong it remembers all the dates that did not work and re-attempts with those exact dates. That means you can guarantee that the job gets run for every possible date, even if it needs to catch up on past dates.
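To make that concrete, here's a minimal sketch of the backfill bookkeeping in plain Python. This is not crontabber's actual code (crontabber persists this state in PostgreSQL, and the names here are made up for illustration):

from datetime import datetime

failed_dates = []  # crontabber keeps this state in PostgreSQL, not in memory


def run_backfill(job):
    """Run `job` for every previously failed date, then for the current date."""
    # Re-attempt the exact dates that failed before, oldest first.
    for date in list(failed_dates):
        try:
            job(date)
            failed_dates.remove(date)  # caught up on this date
        except Exception:
            pass  # still failing; keep the date and retry on the next run
    # Then run the job for "now".
    now = datetime.utcnow()
    try:
        job(now)
    except Exception:
        failed_dates.append(now)  # remember it so we re-attempt later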
There's plenty of documentation on how to install and create jobs, but it all starts with a simple pip install crontabber.
Is it mature?

Yes! It all started in early 2012 as a part of the Socorro code base, and after some hard months of it stabilizing and maturing we decided to extract it out of Socorro and into its own project on GitHub under the Mozilla Public License 2.0. Now it stands on its own legs, no longer has anything to do with Socorro, and can be used by anyone who has a lot of complicated cron jobs that need to run in Python with a PostgreSQL connection. In Socorro we use it primarily for executing stored procedures, but that's just one type of app. You can also make it call out to anything on the command line with the "@with_subprocess" decorator.
Is it finished?

No. It works really well for us, but there's a decent list of features we want to add. The hope is that by open sourcing it we can get other organizations to adopt it and not only find bugs but also contribute code to add more juicy features.
One of the soon-to-come features we have in mind is to "internalize locking". At the moment you have to wrap it in a bash script that prevents it from being run concurrently. Crontabber is single-threaded, and we don't have to worry about "dependent jobs" starting before "parent jobs" because the dependencies and their state are stored in the database. But you might need to worry about the same job (the one due next) being started concurrently. By internalizing the locking we can record, in the state database, that a particular job has been started, and thus not have to worry about starting the same job twice.
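One way to picture that internalized locking (purely a sketch; the table and column names here are made up) is an atomic claim against the state table:

UPDATE crontabber_state
   SET ongoing = now()
 WHERE app_name = 'my-job'
   AND (ongoing IS NULL OR ongoing < now() - interval '1 hour')
RETURNING app_name;

If the UPDATE returns no row, some other crontabber process has already claimed the job and you skip it.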
I really hope this project can grow and continue to support us in our needs.
30 April 2014 · PostgreSQL
If you run Homebrew on OSX and you use it to install postgres, you will have noted there's a new formula for Postgres 9.3(.4). Yay! (Actually this was done many months ago, but I'm slow to keep my brews upgraded.)
When you run the upgrade you'll notice that psql no longer works because the server can't start.
Bummer! But there's hope: this excellent blog post explains how to migrate your old data with pg_upgrade.
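I won't repeat the whole thing, but the gist is to initdb a fresh 9.3 data directory and run pg_upgrade against the old one, roughly like this (the paths and version numbers will vary with your install):

$ initdb /usr/local/var/postgres9.3 -E utf8
$ pg_upgrade \
    -d /usr/local/var/postgres \
    -D /usr/local/var/postgres9.3 \
    -b /usr/local/Cellar/postgresql/9.2.2/bin \
    -B /usr/local/Cellar/postgresql/9.3.4/bin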
That's all you need. ...unless you have some incompatible library extension installed. E.g. json_enhancements.
The problem is that you can't install json_enhancements into a Postgres 9.3 server (json_enhancements is a backport from 9.3 for desperate people still on 9.2). And you can't do the upgrade because the new server has one less installed library. You'll get this after a failing pg_upgrade run:
peterbe@mbp:~$ cat loadable_libraries.txt
Could not load library "$libdir/json_enhancements"
ERROR:  could not access file "$libdir/json_enhancements": No such file or directory
I left some more details about this on the pgsql-admin mailing list.
The trick that worked for me was to start the old 9.2 server like this:
/usr/local/Cellar/postgresql/9.2.2/bin/postgres -D /usr/local/var/postgres -p 9000
And now I can open psql against the old database and drop the extension:
$ psql -p 9000 mydatabase
psql (9.3.4, server 9.2.2)

mydatabase=# drop extension json_enhancements;
After I had done that I was able to successfully complete the pg_upgrade.
I hope this helps some other poor sucker stuck in the same chicken and egg situation.