How to avoid a count query in Django if you can

February 14, 2024
1 comment Django, Python

Suppose you have a complex Django QuerySet query that is somewhat costly (in other words slow). And suppose you want to return:

  1. The first N results
  2. A count of the total possible results

So your implementation might be something like this:


def get_results(queryset, fields, size):
    count = queryset.count()
    results = []
    for record in queryset.values(*fields)[:size]
        results.append(record)
    return {"count": count, "results": results}

That'll work. If there are 1,234 rows in your database table that match those specific filters, what you might get back from this is:


>>> results = get_results(my_queryset, ("name", "age"), 5)
>>> results["count"]
1234
>>> len(results["results"])
5

Or, if the filters would only match 3 rows in your database table:


>>> results = get_results(my_queryset, ("name", "age"), 5)
>>> results["count"]
3
>>> len(results["results"])
3

Between your Python application and your database you'll see:

query 1: SELECT COUNT(*) FROM my_database WHERE ...
query 2: SELECT name, age FROM my_database WHERE ... LIMIT 5

The problem with this is that, in the latter case, you had to send two database queries when all you needed was one.
If you knew it would only match a tiny amount of records, you could do this:


def get_results(queryset, fields, size):
-   count = queryset.count()
    results = []
    for record in queryset.values(*fields)[:size]:
        results.append(record)
+   count = len(results)
    return {"count": count, "results": results}

But that is wrong. The count would max out at whatever the size is.

The solution is to try to avoid the potentially unnecessary .count() query.


def get_results(queryset, fields, size):
    count = 0
    results = []
    for i, record in enumerate(queryset.values(*fields)[: size + 1]):
        if i == size:
            # Alas, there are more records than the pagination
            count = queryset.count()
            break
        count = i + 1
        results.append(record)
    return {"count": count, "results": results}

This way, you only incur one database query when there wasn't that much to find, but if there was more than what the pagination called for, you have to incur that extra database query.

Pip-Outdated.py with interactive upgrade

September 21, 2023
0 comments Python

Last year I wrote a nifty script called Pip-Outdated.py "Pip-Outdated.py - a script to compare requirements.in with the output of pip list --outdated". It basically runs pip list --outdated but filters based on the packages mentioned in your requirements.in. For people familiar with Node, it's like checking all installed packages in node_modules if they have upgrades, but filter it down by only those mentioned in your package.json.

I use this script often enough that I added a little interactive input to ask if it should edit requirements.in for you for each possible upgrade. Looks like this:


❯ Pip-Outdated.py
black               INSTALLED: 23.7.0    POSSIBLE: 23.9.1
click               INSTALLED: 8.1.6     POSSIBLE: 8.1.7
elasticsearch-dsl   INSTALLED: 7.4.1     POSSIBLE: 8.9.0
fastapi             INSTALLED: 0.101.0   POSSIBLE: 0.103.1
httpx               INSTALLED: 0.24.1    POSSIBLE: 0.25.0
pytest              INSTALLED: 7.4.0     POSSIBLE: 7.4.2

Update black from 23.7.0 to 23.9.1? [y/N/q] y
Update click from 8.1.6 to 8.1.7? [y/N/q] y
Update elasticsearch-dsl from 7.4.1 to 8.9.0? [y/N/q] n
Update fastapi from 0.101.0 to 0.103.1? [y/N/q] n
Update httpx from 0.24.1 to 0.25.0? [y/N/q] n
Update pytest from 7.4.0 to 7.4.2? [y/N/q] y

and then,


❯ git diff requirements.in | cat
diff --git a/requirements.in b/requirements.in
index b7a246e..0e996e5 100644
--- a/requirements.in
+++ b/requirements.in
@@ -9,7 +9,7 @@ python-decouple==3.8
 fastapi==0.101.0
 uvicorn[standard]==0.23.2
 selectolax==0.3.16
-click==8.1.6
+click==8.1.7
 python-dateutil==2.8.2
 gunicorn==21.2.0
 # I don't think this needs `[secure]` because it's only used by
@@ -18,7 +18,7 @@ requests==2.31.0
 cachetools==5.3.1

 # Dev things
-black==23.7.0
+black==23.9.1
 flake8==6.1.0
-pytest==7.4.0
+pytest==7.4.2
 httpx==0.24.1

That's it. Then if you want to actually make these upgrades you run:


❯ pip-compile --generate-hashes requirements.in && pip install -r requirements.txt

To install it, download the script from: https://gist.github.com/peterbe/a2b158c39f1f835c0977c82befd94cdf
and put it in your ~/bin and make it executable.
Now go into a directory that has a requirements.in and run Pip-Outdated.py

Pip-Outdated.py - a script to compare requirements.in with the output of pip list --outdated

December 22, 2022
0 comments Python

Simply by posting this, there's a big chance you'll say "Hey! Didn't you know there's already a well-known script that does this? Better." Or you'll say "Hey! That'll save me hundreds of seconds per year!"

The problem

Suppose you have a requirements.in file that is used, by pip-compile to generate the requirements.txt that you actually install in your Dockerfile or whatever server deployment. The requirements.in is meant to be the human-readable file and the requirements.txt is for the computers. You manually edit the version numbers in the requirements.in and then run pip-compile --generate-hashes requirements.in to generate a new requirements.txt. But the "first-class" packages in the requirements.in aren't the only packages that get installed. For example:

▶ cat requirements.in | rg '==' | wc -l
      54

▶ cat requirements.txt | rg '==' | wc -l
     102

In other words, in this particular example, there are 76 "second-class" packages that get installed. There might actually be more stuff installed that you didn't describe. That's why pip list | wc -l can be even higher. For example, you might have locally and manually done pip install ipython for a nicer interactive prompt.

The solution

The command pip list --outdated will list packages based on the requirements.txt not the requirements.in. To mitigate that, I wrote a quick Python CLI script that combines the output of pip list --outdated with the packages mentioned in requirements.in:


#!/usr/bin/env python

import subprocess


def main(*args):
    if not args:
        requirements_in = "requirements.in"
    else:
        requirements_in = args[0]
    required = {}
    with open(requirements_in) as f:
        for line in f:
            if "==" in line:
                package, version = line.strip().split("==")
                package = package.split("[")[0]
                required[package] = version

    res = subprocess.run(["pip", "list", "--outdated"], capture_output=True)
    if res.returncode:
        raise Exception(res.stderr)

    lines = res.stdout.decode("utf-8").splitlines()
    relevant = [line for line in lines if line.split()[0] in required]

    longest_package_name = max([len(x.split()[0]) for x in relevant]) if relevant else 0

    for line in relevant:
        p, installed, possible, *_ = line.split()
        if p in required:
            print(
                p.ljust(longest_package_name + 2),
                "INSTALLED:",
                installed.ljust(9),
                "POSSIBLE:",
                possible,
            )


if __name__ == "__main__":
    import sys

    sys.exit(main(*sys.argv[1:]))

Installation

To install this, you can just download the script and run it in any directory that contains a requirements.in file.

Or you can install it like this:

curl -L https://gist.github.com/peterbe/099ad364657b70a04b1d65aa29087df7/raw/23fb1963b35a2559a8b24058a0a014893c4e7199/Pip-Outdated.py > ~/bin/Pip-Outdated.py
chmod +x ~/bin/Pip-Outdated.py

Pip-Outdated.py

Join a list with a bitwise or operator in Python

August 22, 2022
0 comments Python

The bitwise OR operator in Python is often convenient when you want to combine multiple things into one thing. For example, with the Django ORM you might do this:


from django.db.models import Q

filter_ = Q(first_name__icontains="peter") | Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

See how it hardcodes the filtering on strings peter and ashley.
But what if that was a bit more complicated:


from django.db.models import Q

filter_ = Q(first_name__icontains="peter")
if include("ashley"):
    filter_ | = Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

So far, same functionality.

But what if the business logic is more complicated? You can't do this:


filter_ = None
if include("peter"):
    filter_ | = Q(first_name__icontains="peter")  # WILL NOT WORK
if include("ashley"):
    filter_ | = Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

What if the list of things you want to filter on depends on a list? You'd need to do the |= stuff "dynamically". One way to solve that is with functools.reduce. Suppose the list of things you want to bitwise-OR together is a list:


from django.db.models import Q
from operator import or_
from functools import reduce


def include(_):
    import random
    return random.random() > 0.5

filters = []
if include("peter"):
    filters.append(Q(first_name__icontains="peter"))
if include("ashley"):
    filters.append(Q(first_name__icontains="ashley"))

assert len(filters), "must have at least one filter"
filter_ = reduce(or_, filters)  # THE MAGIC!

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

And finally, if it's a list already:


from django.db.models import Q
from operator import or_
from functools import reduce

names = ["peter", "ashley"]
qs = [Q(first_name__icontains=x) for x in names]
filter_ = reduce(or_, qs)

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

Side note

Django's django.db.models.Q is actually quite flexible with used with MyModel.objects.filter(...) because this actually works:


from django.db.models import Q

def include(_):
    import random
    return random.random() > 0.5

filter_ = Q()  # MAGIC SAUCE
if include("peter"):
    filter_ |= Q(first_name__icontains="peter")
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

How to sort case insensitively with empty strings last in Django

April 3, 2022
1 comment Django, Python, PostgreSQL

Imagine you have something like this in Django:


class MyModel(models.Models):
    last_name = models.CharField(max_length=255, blank=True)
    ...

The most basic sorting is either: queryset.order_by('last_name') or queryset.order_by('-last_name'). But what if you want entries with a blank string last? And, you want it to be case insensitive. Here's how you do it:


from django.db.models.functions import Lower, NullIf
from django.db.models import Value


if reverse:
    order_by = Lower("last_name").desc()
else:
    order_by = Lower(NullIf("last_name", Value("")), nulls_last=True)


ALL = list(queryset.values_list("last_name", flat=True))
print("FIRST 5:", ALL[:5])
# Will print either...
#   FIRST 5: ['Zuniga', 'Zukauskas', 'Zuccala', 'Zoller', 'ZM']
# or 
#   FIRST 5: ['A', 'aaa', 'Abrams', 'Abro', 'Absher']
print("LAST 5:", ALL[-5:])
# Will print...
#   LAST 5: ['', '', '', '', '']

This is only tested with PostgreSQL but it works nicely.
If you're curious about what the SQL becomes, it's:


SELECT "main_contact"."last_name" FROM "main_contact" 
ORDER BY LOWER(NULLIF("main_contact"."last_name", '')) ASC

or


SELECT "main_contact"."last_name" FROM "main_contact" 
ORDER BY LOWER("main_contact"."last_name") DESC

Note that if your table columns is either a string, an empty string, or null, the reverse needs to be: Lower("last_name", nulls_last=True).desc().

How to close a HTTP GET request in Python before the end

March 30, 2022
0 comments Python

Does you server barf if your clients close the connection before it's fully downloaded? Well, there's an easy way to find out. You can use this Python script:


import sys
import requests

url = sys.argv[1]
assert '://' in url, url
r = requests.get(url, stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'
for chunk in r.iter_content(1024, decode_unicode=True):
    break

I use the xh CLI tool a lot. It's like curl but better in some things. By default, if you use --headers it will make a regular GET request but close the connection as soon as it has gotten all the headers. E.g.

▶ xh --headers https://www.peterbe.com
HTTP/2.0 200 OK
cache-control: public,max-age=3600
content-type: text/html; charset=utf-8
date: Wed, 30 Mar 2022 12:37:09 GMT
etag: "3f336-Rohm58s5+atf5Qvr04kmrx44iFs"
server: keycdn-engine
strict-transport-security: max-age=63072000; includeSubdomains; preload
vary: Accept-Encoding
x-cache: HIT
x-content-type-options: nosniff
x-edge-location: usat
x-frame-options: SAMEORIGIN
x-middleware-cache: hit
x-powered-by: Express
x-shield: active
x-xss-protection: 1; mode=block

That's not be confused with doing HEAD like curl -I ....

So either with xh or the Python script above, you can get that same effect. It's a useful trick when you want to make sure your (async) server doesn't attempt to do weird stuff with the "Response" object after the connection has closed.

Previous page
Next page