Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under thecategory: Python. Clear filter

How to sort case insensitively with empty strings last in Django

Django, Python, PostgreSQL

Imagine you have something like this in Django:

class MyModel(models.Models):
    last_name = models.CharField(max_length=255, blank=True)
    ...

The most basic sorting is either: queryset.order_by('last_name') or queryset.order_by('-last_name'). But what if you want entries with a blank string last? And, you want it to be case insensitive. Here's how you do it:

from django.db.models.functions import Lower, NullIf
from django.db.models import Value


if reverse:
    order_by = Lower("last_name").desc()
else:
    order_by = Lower(NullIf("last_name", Value("")), nulls_last=True)


ALL = list(queryset.values_list("last_name", flat=True))
print("FIRST 5:", ALL[:5])
# Will print either...
#   FIRST 5: ['Zuniga', 'Zukauskas', 'Zuccala', 'Zoller', 'ZM']
# or 
#   FIRST 5: ['A', 'aaa', 'Abrams', 'Abro', 'Absher']
print("LAST 5:", ALL[-5:])
# Will print...
#   LAST 5: ['', '', '', '', '']

This is only tested with PostgreSQL but it works nicely.
If you're curious about what the SQL becomes, it's:

SELECT "main_contact"."last_name" FROM "main_contact" 
ORDER BY LOWER(NULLIF("main_contact"."last_name", '')) ASC

or

SELECT "main_contact"."last_name" FROM "main_contact" 
ORDER BY LOWER("main_contact"."last_name") DESC

Note that if your table columns is either a string, an empty string, or null, the reverse needs to be: Lower("last_name", nulls_last=True).desc().

Please post a comment if you have thoughts or questions.

How to close a HTTP GET request in Python before the end

Python

Does you server barf if your clients close the connection before it's fully downloaded? Well, there's an easy way to find out. You can use this Python script:

import sys
import requests

url = sys.argv[1]
assert '://' in url, url
r = requests.get(url, stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'
for chunk in r.iter_content(1024, decode_unicode=True):
    break

I use the xh CLI tool a lot. It's like curl but better in some things. By default, if you use --headers it will make a regular GET request but close the connection as soon as it has gotten all the headers. E.g.

▶ xh --headers https://www.peterbe.com
HTTP/2.0 200 OK
cache-control: public,max-age=3600
content-type: text/html; charset=utf-8
date: Wed, 30 Mar 2022 12:37:09 GMT
etag: "3f336-Rohm58s5+atf5Qvr04kmrx44iFs"
server: keycdn-engine
strict-transport-security: max-age=63072000; includeSubdomains; preload
vary: Accept-Encoding
x-cache: HIT
x-content-type-options: nosniff
x-edge-location: usat
x-frame-options: SAMEORIGIN
x-middleware-cache: hit
x-powered-by: Express
x-shield: active
x-xss-protection: 1; mode=block

That's not be confused with doing HEAD like curl -I ....

So either with xh or the Python script above, you can get that same effect. It's a useful trick when you want to make sure your (async) server doesn't attempt to do weird stuff with the "Response" object after the connection has closed.

Please post a comment if you have thoughts or questions.

How to string pad a string in Python with a variable

Python

I just have to write this down because that's the rule; if I find myself googling something basic like this more than once, it's worth blogging about.

Suppose you have a string and you want to pad with empty spaces. You have 2 options:

>>> s = "peter"
>>> s.ljust(10)
'peter     '
>>> f"{s:<10}"
'peter     '

The f-string notation is often more convenient because it can be combined with other formatting directives.
But, suppose the number 10 isn't hardcoded like that. Suppose it's a variable:

>>> s = "peter"
>>> width = 11
>>> s.ljust(width)
'peter      '
>>> f"{s:<width}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier

Well, the way you need to do it with f-string formatting, when it's a variable like that is this syntax:

>>> f"{s:<{width}}"
'peter      '

Please post a comment if you have thoughts or questions.

TypeScript function keyword arguments like Python

Python, JavaScript

To do this in Python:

def print_person(name="peter", dob=1979):
    print(f"name={name}\tdob={dob}")


print_person() 
# prints: name=peter   dob=1979

print_person(name="Tucker")
# prints: name=Tucker  dob=1979

print_person(dob=2013)
# prints: name=peter   dob=2013

print_person(sex="boy")
# TypeError: print_person() got an unexpected keyword argument 'sex'

...in TypeScript:

function printPerson({
  name = "peter",
  dob = 1979
}: { name?: string; dob?: number } = {}) {
  console.log(`name=${name}\tdob=${dob}`);
}

printPerson();
// prints: name=peter  dob=1979

printPerson({});
// prints: name=peter  dob=1979

printPerson({ name: "Tucker" });
// prints: name=Tucker dob=1979

printPerson({ dob: 2013 });
// prints: name=peter  dob=2013


printPerson({ gender: "boy" })
// Error: Object literal may only specify known properties, and 'gender' 

Here's a Playground copy of it.

It's not a perfect "transpose" across the two languages but it's sufficiently similar.
The trick is that last = {} at the end of the function signature in TypeScript which makes it possible to omit keys in the passed-in object.

By the way, the pure JavaScript version of this is:

function printPerson({ name = "peter", dob = 1979 } = {}) {
  console.log(`name=${name}\tdob=${dob}`);
}

But, unlike Python and TypeScript, you get no warnings or errors if you'd do printPerson({ gender: "boy" }); with the JavaScript version.

Please post a comment if you have thoughts or questions.

How to install Python Poetry in GitHub Actions in MUCH faster way

Python

We use Poetry in a GitHub project. There's a pyproject.toml file (and a poetry.lock file) which with the help of the executable poetry gets you a very reliable Python environment. The only problem is that adding the poetry executable is slow. Like 10+ seconds slow. It might seem silly but in the project I'm working on, that 10+s delay is the slowest part of a GitHub Action workflow which needs to be fast because it's trying to post a comment on a pull request as soon as it possibly can.

Installing poetry being the slowest partt
First I tried caching $(pip cache dir) so that the underlying python -v pip install virtualenv -t $tmp_dir that install-poetry.py does would get a boost from avoiding network. The difference was negligible. I also didn't want to get too weird by overriding how the install-poetry.py works or even make my own hacky copy. I like being able to just rely on the snok/install-poetry GitHub Action to do its thing (and its future thing).

The solution was to cache the whole $HOME/.local directory. It's as simple as this:

- name: Load cached $HOME/.local
  uses: actions/cache@v2.1.6
  with:
    path: ~/.local
    key: dotlocal-${{ runner.os }}-${{ hashFiles('.github/workflows/pr-deployer.yml') }}

The key is important. If you do copy-n-paste this block of YAML to speed up your GitHub Action, please remember to replace .github/workflows/pr-deployer.yml with the name of your .yml file that uses this. It's important because otherwise, the cache might be overzealously hot when you make a change like:

       - name: Install Python poetry
-        uses: snok/install-poetry@v1.1.6
+        uses: snok/install-poetry@v1.1.7
         with:

...for example.

Now, thankfully install-poetry.py (which is the recommended way to install poetry by the way) can notice that it's already been created and so it can omit a bunch of work. The result of this is as follows:

A fast install poetry

From 10+ seconds to 2 seconds. And what's neat is that the optimization is very "unintrusive" because it doesn't mess with how the snok/install-poetry workflow works.

But wait, there's more!

If you dig up our code where we use poetry you might find that it does a bunch of other caching too. In particular, it caches .venv it creates too. That's relevant but ultimately unrelated. It basically caches the generated virtualenv from the poetry install command. It works like this:

- name: Load cached venv
  id: cached-poetry-dependencies
  uses: actions/cache@v2.1.6
  with:
    path: deployer/.venv
    key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}-${{ hashFiles('.github/workflows/pr-deployer.yml') }}

...

- name: Install deployer
  run: |
    cd deployer
    poetry install
  if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'

In this example, deployer is just the name of the directory, in the repository root, where we have all the Python code and the pyproject.toml etc. If you have yours at the root of the project you can just do: run: poetry install and in the caching step change it to: path: .venv.

Now, you get a really powerful complete caching strategy. When the caches are hot (i.e. no changes to the .yml, poetry.lock, or pyproject.toml files) you get the executable (so you can do poetry run ...) and all its dependencies in roughly 2 seconds. That'll be hard to beat!

Please post a comment if you have thoughts or questions.

An effective and immutable way to turn two Python lists into one

Python

tl;dr; To make 2 lists into 1 without mutating them use list1 + list2.

I'm blogging about this because today I accidentally complicated my own code. From now on, let's just focus on the right way.

Suppose you have something like this:

winners = [123, 503, 1001]
losers = [45, 812, 332]

combined = winners + losers

that will create a brand new list. To prove that it's immutable:

>>> combined.insert(0, 100)
>>> combined
[100, 123, 503, 1001, 45, 812, 332]
>>> winners
[123, 503, 1001]
>>> losers
[45, 812, 332]

What I originally did was:

winners = [123, 503, 1001]
losers = [45, 812, 332]

combined = [*winners, *losers]

This works the same and that syntax feels very JavaScript'y. E.g.

> var winners = [123, 503, 1001]
[ 123, 503, 1001 ]
> var losers = [45, 812, 332]
[ 45, 812, 332 ]
> var combined = [...winners, ...losers]
[ 123, 503, 1001, 45, 812, 332 ]
> combined.pop()
332
> losers
[ 45, 812, 332 ]

By the way, if you want to filter out duplicates, do this:

>>> a = [1, 2, 3]
>>> b = [2, 3, 4]
>>> list(dict.fromkeys(a + b))
[1, 2, 3, 4]

It's the most performant way to do it if the order is important.

And if you don't care about the order you can use this:

>>> a = [1, 2, 3]
>>> b = [2, 3, 4]
>>> list(set(a + b))
[1, 2, 3, 4]
>>> list(set(b + a))
[1, 2, 3, 4]

Please post a comment if you have thoughts or questions.