Peterbe.com

Peter Bengtsson's blog

Filtered by Python

How do you thousands-comma AND whitespace format an f-string in Python

March 17, 2024
0 comments Python

For some reason, I always forget how to do this. Tired of that. Let's blog about it so it sticks.

To format a number with thousand-commas you do:


>>> n = 1234567
>>> f"{n:,}"
'1,234,567'

To pad a string with trailing whitespace up to a fixed width (20 characters here) you do:


>>> name="peter"
>>> f"{name:<20}"
'peter               '

To combine these in one expression, you do:


>>> n = 1234567
>>> f"{n:<15,}"
'1,234,567      '
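As a sketch of where this comes in handy (the rows and the column widths below are made up for illustration), the same format spec can line up a small report. `<` left-aligns, `>` right-aligns, and the comma goes after the width:

```python
# Hypothetical data, only here to show the alignment
rows = [("python", 1234567), ("ruby", 98765), ("bun", 432)]

lines = []
for name, count in rows:
    # name: left-aligned in 10 characters
    # count: right-aligned in 12 characters, with thousands-commas
    lines.append(f"{name:<10}{count:>12,}")

print("\n".join(lines))
```

Every line comes out 22 characters wide, so the numbers end in the same column.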

Leibniz formula for π in Python, JavaScript, and Ruby

March 14, 2024
0 comments Python, JavaScript

Officially, I'm one day behind, but here's how you can calculate the value of π using the Leibniz formula.

Python


import math

sum = 0
estimate = 0
i = 0
epsilon = 0.0001
while abs(estimate - math.pi) > epsilon:
    sum += (-1) ** i / (2 * i + 1)
    estimate = sum * 4
    i += 1
print(
    f"After {i} iterations, the estimate is {estimate} and the real pi is {math.pi} "
    f"(difference of {abs(estimate - math.pi)})"
)

Outputs:

After 10000 iterations, the estimate is 3.1414926535900345 and the real pi is 3.141592653589793 (difference of 9.99999997586265e-05)

JavaScript


let sum = 0;
let estimate = 0;
let i = 0;
const epsilon = 0.0001;

while (Math.abs(estimate - Math.PI) > epsilon) {
  sum += (-1) ** i / (2 * i + 1);
  estimate = sum * 4;
  i += 1;
}
console.log(
  `After ${i} iterations, the estimate is ${estimate} and the real pi is ${Math.PI} ` +
    `(difference of ${Math.abs(estimate - Math.PI)})`
);

Outputs

After 10000 iterations, the estimate is 3.1414926535900345 and the real pi is 3.141592653589793 (difference of 0.0000999999997586265)

Ruby


sum = 0
estimate = 0
i = 0
epsilon = 0.0001
while (estimate - Math::PI).abs > epsilon
    sum += ((-1) ** i / (2.0 * i + 1))
    estimate = sum * 4
    i += 1
end
print(
    "After #{i} iterations, the estimate is #{estimate} and the real pi is #{Math::PI} "+
    "(difference of #{(estimate - Math::PI).abs})"
)

Outputs

After 10000 iterations, the estimate is 3.1414926535900345 and the real pi is 3.141592653589793 (difference of 9.99999997586265e-05)

Backwards

Technically, these little snippets only check that the formula works, since each language already has access to a value of π as a standard library constant.

If you don't have that, you can decide on a number of iterations, for example 1,000, and use that.

Python


sum = 0
for i in range(1000):
    sum += (-1) ** i / (2 * i + 1)
print(sum * 4)

JavaScript


let sum = 0;
for (const i of [...Array(1000).keys()]) {
  sum += (-1) ** i / (2 * i + 1);
}
console.log(sum * 4);

Ruby


sum = 0
for i in 0...1000
    sum += ((-1) ** i / (2.0 * i + 1))
end
puts sum * 4
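For reuse, the fixed-iteration version can be wrapped in a function. This is a minimal Python sketch; the name `leibniz_pi` and the use of `math.fsum` (to limit floating-point rounding drift) are my own additions, not part of the snippets above. Because the series alternates, the error after n terms is bounded by the first omitted term, 4 / (2n + 1).

```python
import math


def leibniz_pi(terms):
    """Estimate pi from the first `terms` terms of the Leibniz series."""
    # fsum tracks the partial sums precisely and rounds only once at the end
    return 4 * math.fsum((-1) ** i / (2 * i + 1) for i in range(terms))


for n in (10, 1_000, 100_000):
    estimate = leibniz_pi(n)
    print(f"{n:>7,} terms: {estimate:.10f} (off by {abs(estimate - math.pi):.2e})")
```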

Performance test

Perhaps a bit silly but also a fun thing to play with. Pull out hyperfine and compare Python 3.12, Node 20.11, Ruby 3.2, and Bun 1.0.30:


❯ hyperfine --warmup 10 "python3.12 ~/pi.py" "node ~/pi.js" "ruby ~/pi.rb" "bun run ~/pi.js"
Benchmark 1: python3.12 ~/pi.py
  Time (mean ± σ):      53.4 ms ±   7.5 ms    [User: 31.9 ms, System: 12.3 ms]
  Range (min … max):    41.5 ms …  64.8 ms    44 runs

Benchmark 2: node ~/pi.js
  Time (mean ± σ):      57.5 ms ±  10.6 ms    [User: 43.3 ms, System: 11.0 ms]
  Range (min … max):    46.2 ms …  82.6 ms    35 runs

Benchmark 3: ruby ~/pi.rb
  Time (mean ± σ):     242.1 ms ±  11.6 ms    [User: 68.4 ms, System: 37.2 ms]
  Range (min … max):   227.3 ms … 265.3 ms    11 runs

Benchmark 4: bun run ~/pi.js
  Time (mean ± σ):      32.9 ms ±   6.3 ms    [User: 14.1 ms, System: 10.0 ms]
  Range (min … max):    17.1 ms …  41.9 ms    60 runs

Summary
  bun run ~/pi.js ran
    1.62 ± 0.39 times faster than python3.12 ~/pi.py
    1.75 ± 0.46 times faster than node ~/pi.js
    7.35 ± 1.45 times faster than ruby ~/pi.rb

Comparing Pythons

Just because I have a couple of these installed:


❯ hyperfine --warmup 10 "python3.8 ~/pi.py" "python3.9 ~/pi.py" "python3.10 ~/pi.py" "python3.11 ~/pi.py" "python3.12 ~/pi.py"
Benchmark 1: python3.8 ~/pi.py
  Time (mean ± σ):      54.6 ms ±   8.1 ms    [User: 33.0 ms, System: 11.4 ms]
  Range (min … max):    40.0 ms …  69.7 ms    56 runs

Benchmark 2: python3.9 ~/pi.py
  Time (mean ± σ):      54.9 ms ±   8.0 ms    [User: 32.2 ms, System: 12.3 ms]
  Range (min … max):    42.3 ms …  70.1 ms    38 runs

Benchmark 3: python3.10 ~/pi.py
  Time (mean ± σ):      54.7 ms ±   7.5 ms    [User: 33.0 ms, System: 11.8 ms]
  Range (min … max):    42.3 ms …  78.1 ms    44 runs

Benchmark 4: python3.11 ~/pi.py
  Time (mean ± σ):      53.8 ms ±   6.0 ms    [User: 32.7 ms, System: 13.0 ms]
  Range (min … max):    44.8 ms …  70.3 ms    42 runs

Benchmark 5: python3.12 ~/pi.py
  Time (mean ± σ):      53.0 ms ±   6.4 ms    [User: 31.8 ms, System: 12.3 ms]
  Range (min … max):    43.8 ms …  63.5 ms    42 runs

Summary
  python3.12 ~/pi.py ran
    1.02 ± 0.17 times faster than python3.11 ~/pi.py
    1.03 ± 0.20 times faster than python3.8 ~/pi.py
    1.03 ± 0.19 times faster than python3.10 ~/pi.py
    1.04 ± 0.20 times faster than python3.9 ~/pi.py
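hyperfine times the whole process, so interpreter startup is included in every number above. To look at the loop on its own, one option is the standard library's `timeit`. A rough sketch (the function name `leibniz` and the repeat counts are arbitrary choices of mine):

```python
import timeit


def leibniz(terms=1000):
    total = 0
    for i in range(terms):
        total += (-1) ** i / (2 * i + 1)
    return total * 4


# Call the function 100 times per measurement, keep the best of 5 measurements
best = min(timeit.repeat(leibniz, number=100, repeat=5))
print(f"{best / 100 * 1_000_000:.1f} µs per call")
```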

How to avoid a count query in Django if you can

February 14, 2024
1 comment Django, Python

Suppose you have a complex Django QuerySet query that is somewhat costly (in other words slow). And suppose you want to return:

The first N results
A count of the total possible results

So your implementation might be something like this:


def get_results(queryset, fields, size):
    count = queryset.count()
    results = []
    for record in queryset.values(*fields)[:size]:
        results.append(record)
    return {"count": count, "results": results}

That'll work. If there are 1,234 rows in your database table that match those specific filters, what you might get back from this is:


>>> results = get_results(my_queryset, ("name", "age"), 5)
>>> results["count"]
1234
>>> len(results["results"])
5

Or, if the filters would only match 3 rows in your database table:


>>> results = get_results(my_queryset, ("name", "age"), 5)
>>> results["count"]
3
>>> len(results["results"])
3

Between your Python application and your database you'll see:

query 1: SELECT COUNT(*) FROM my_database WHERE ...
query 2: SELECT name, age FROM my_database WHERE ... LIMIT 5

The problem with this is that, in the latter case, you had to send two database queries when all you needed was one.
If you knew it would only match a small number of records, you could do this:


def get_results(queryset, fields, size):
-   count = queryset.count()
    results = []
    for record in queryset.values(*fields)[:size]:
        results.append(record)
+   count = len(results)
    return {"count": count, "results": results}

But that is wrong. The count would max out at whatever the size is.

The solution is to try to avoid the potentially unnecessary .count() query.


def get_results(queryset, fields, size):
    count = 0
    results = []
    for i, record in enumerate(queryset.values(*fields)[: size + 1]):
        if i == size:
            # Alas, there are more records than the pagination
            count = queryset.count()
            break
        count = i + 1
        results.append(record)
    return {"count": count, "results": results}

This way, you only incur one database query when there wasn't that much to find, but if there was more than what the pagination called for, you have to incur that extra database query.
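To see the query savings without a database, here is a sketch with a hypothetical `FakeQuerySet` class. It mimics only the three QuerySet features used above (`values()`, slicing, and `count()`) and logs each simulated query. It is not Django code, just a stand-in for illustration.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet; logs simulated queries."""

    def __init__(self, rows, log=None):
        self.rows = rows
        self.log = [] if log is None else log

    def values(self, *fields):
        picked = [{f: row[f] for f in fields} for row in self.rows]
        return FakeQuerySet(picked, self.log)

    def __getitem__(self, item):
        # Slicing stays lazy here, as in Django: no query until iteration
        return FakeQuerySet(self.rows[item], self.log)

    def __iter__(self):
        self.log.append("SELECT")
        return iter(self.rows)

    def count(self):
        self.log.append("COUNT")
        return len(self.rows)


def get_results(queryset, fields, size):
    count = 0
    results = []
    for i, record in enumerate(queryset.values(*fields)[: size + 1]):
        if i == size:
            # More rows than the page size, so the real count is needed
            count = queryset.count()
            break
        count = i + 1
        results.append(record)
    return {"count": count, "results": results}


few = FakeQuerySet([{"name": f"n{i}", "age": i} for i in range(3)])
many = FakeQuerySet([{"name": f"n{i}", "age": i} for i in range(1234)])

print(get_results(few, ("name", "age"), 5)["count"], few.log)
print(get_results(many, ("name", "age"), 5)["count"], many.log)
```

The small result set logs a single SELECT; the large one logs a SELECT followed by a COUNT.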

Pip-Outdated.py with interactive upgrade

September 21, 2023
0 comments Python

Last year I wrote a nifty script called Pip-Outdated.py (see "Pip-Outdated.py - a script to compare requirements.in with the output of pip list --outdated"). It basically runs pip list --outdated but filters the output down to the packages mentioned in your requirements.in. For people familiar with Node, it's like checking every installed package in node_modules for upgrades, but filtering down to only those mentioned in your package.json.

I use this script often enough that I added a little interactive input to ask if it should edit requirements.in for you for each possible upgrade. Looks like this:


❯ Pip-Outdated.py
black               INSTALLED: 23.7.0    POSSIBLE: 23.9.1
click               INSTALLED: 8.1.6     POSSIBLE: 8.1.7
elasticsearch-dsl   INSTALLED: 7.4.1     POSSIBLE: 8.9.0
fastapi             INSTALLED: 0.101.0   POSSIBLE: 0.103.1
httpx               INSTALLED: 0.24.1    POSSIBLE: 0.25.0
pytest              INSTALLED: 7.4.0     POSSIBLE: 7.4.2

Update black from 23.7.0 to 23.9.1? [y/N/q] y
Update click from 8.1.6 to 8.1.7? [y/N/q] y
Update elasticsearch-dsl from 7.4.1 to 8.9.0? [y/N/q] n
Update fastapi from 0.101.0 to 0.103.1? [y/N/q] n
Update httpx from 0.24.1 to 0.25.0? [y/N/q] n
Update pytest from 7.4.0 to 7.4.2? [y/N/q] y

and then,


❯ git diff requirements.in | cat
diff --git a/requirements.in b/requirements.in
index b7a246e..0e996e5 100644
--- a/requirements.in
+++ b/requirements.in
@@ -9,7 +9,7 @@ python-decouple==3.8
 fastapi==0.101.0
 uvicorn[standard]==0.23.2
 selectolax==0.3.16
-click==8.1.6
+click==8.1.7
 python-dateutil==2.8.2
 gunicorn==21.2.0
 # I don't think this needs `[secure]` because it's only used by
@@ -18,7 +18,7 @@ requests==2.31.0
 cachetools==5.3.1

 # Dev things
-black==23.7.0
+black==23.9.1
 flake8==6.1.0
-pytest==7.4.0
+pytest==7.4.2
 httpx==0.24.1

That's it. Then if you want to actually make these upgrades you run:


❯ pip-compile --generate-hashes requirements.in && pip install -r requirements.txt

To install it, download the script from: https://gist.github.com/peterbe/a2b158c39f1f835c0977c82befd94cdf
and put it in your ~/bin and make it executable.
Now go into a directory that has a requirements.in and run Pip-Outdated.py
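The gist has the real implementation. As a rough idea of what the edit step could look like, here is a sketch (not the script's actual code; `bump_pin` is a hypothetical helper) that rewrites one `==` pin in the text of a requirements.in while keeping any extras such as `[standard]`:

```python
import re


def bump_pin(requirements_text, package, new_version):
    """Replace the pinned version of `package`, leaving other lines alone."""
    pattern = re.compile(
        # package name, optional [extras], then "==" at the start of a line
        rf"^({re.escape(package)}(?:\[[^\]]*\])?==)\S+",
        re.MULTILINE,
    )
    # A function replacement avoids "\1" + "8.1.7" being read as group 18
    return pattern.sub(lambda m: m.group(1) + new_version, requirements_text)


text = "fastapi==0.101.0\nuvicorn[standard]==0.23.2\nclick==8.1.6\n"
print(bump_pin(text, "click", "8.1.7"))
```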

Pip-Outdated.py - a script to compare requirements.in with the output of pip list --outdated

December 22, 2022
0 comments Python

Simply by posting this, there's a big chance you'll say "Hey! Didn't you know there's already a well-known script that does this? Better." Or you'll say "Hey! That'll save me hundreds of seconds per year!"

The problem

Suppose you have a requirements.in file that is used by pip-compile to generate the requirements.txt that you actually install in your Dockerfile or whatever server deployment. The requirements.in is meant to be the human-readable file and the requirements.txt is for the computers. You manually edit the version numbers in the requirements.in and then run pip-compile --generate-hashes requirements.in to generate a new requirements.txt. But the "first-class" packages in the requirements.in aren't the only packages that get installed. For example:

▶ cat requirements.in | rg '==' | wc -l
      54

▶ cat requirements.txt | rg '==' | wc -l
     102

In other words, in this particular example, there are 48 "second-class" packages that get installed (102 minus 54). There might actually be more stuff installed that you didn't describe. That's why pip list | wc -l can be even higher. For example, you might have locally and manually done pip install ipython for a nicer interactive prompt.

The solution

The command pip list --outdated will list packages based on the requirements.txt not the requirements.in. To mitigate that, I wrote a quick Python CLI script that combines the output of pip list --outdated with the packages mentioned in requirements.in:


#!/usr/bin/env python

import subprocess


def main(*args):
    if not args:
        requirements_in = "requirements.in"
    else:
        requirements_in = args[0]
    required = {}
    with open(requirements_in) as f:
        for line in f:
            if "==" in line:
                package, version = line.strip().split("==")
                package = package.split("[")[0]
                required[package] = version

    res = subprocess.run(["pip", "list", "--outdated"], capture_output=True)
    if res.returncode:
        raise Exception(res.stderr)

    lines = res.stdout.decode("utf-8").splitlines()
    relevant = [line for line in lines if line.split()[0] in required]

    longest_package_name = max([len(x.split()[0]) for x in relevant]) if relevant else 0

    for line in relevant:
        p, installed, possible, *_ = line.split()
        if p in required:
            print(
                p.ljust(longest_package_name + 2),
                "INSTALLED:",
                installed.ljust(9),
                "POSSIBLE:",
                possible,
            )


if __name__ == "__main__":
    import sys

    sys.exit(main(*sys.argv[1:]))
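A variation, not what the script above does: pip can also emit the outdated list as JSON with `pip list --outdated --format=json`, which avoids splitting columns on whitespace. The sketch below parses a canned sample instead of calling pip. The function name `relevant_upgrades` and the sample data are made up, and the `name`, `version`, and `latest_version` keys are what I understand pip to emit.

```python
import json


def relevant_upgrades(pip_json, required):
    """Keep only the outdated packages that are pinned in requirements.in."""
    return [
        (pkg["name"], pkg["version"], pkg["latest_version"])
        for pkg in json.loads(pip_json)
        if pkg["name"] in required
    ]


# Canned sample standing in for real `pip list --outdated --format=json` output
sample = json.dumps(
    [
        {"name": "black", "version": "23.7.0", "latest_version": "23.9.1"},
        {"name": "idna", "version": "3.3", "latest_version": "3.4"},
    ]
)
required = {"black": "23.7.0", "click": "8.1.7"}

for name, installed, possible in relevant_upgrades(sample, required):
    print(f"{name:<8} INSTALLED: {installed:<9} POSSIBLE: {possible}")
```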

Installation

To install this, you can just download the script and run it in any directory that contains a requirements.in file.

Or you can install it like this:

curl -L https://gist.github.com/peterbe/099ad364657b70a04b1d65aa29087df7/raw/23fb1963b35a2559a8b24058a0a014893c4e7199/Pip-Outdated.py > ~/bin/Pip-Outdated.py
chmod +x ~/bin/Pip-Outdated.py

Pip-Outdated.py

Join a list with a bitwise or operator in Python

August 22, 2022
0 comments Python

The bitwise OR operator in Python is often convenient when you want to combine multiple things into one thing. For example, with the Django ORM you might do this:


from django.db.models import Q

filter_ = Q(first_name__icontains="peter") | Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

See how it hardcodes the filtering on strings peter and ashley.
But what if that was a bit more complicated:


from django.db.models import Q

filter_ = Q(first_name__icontains="peter")
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

So far, same functionality.

But what if the business logic is more complicated? You can't do this:


filter_ = None
if include("peter"):
    filter_ |= Q(first_name__icontains="peter")  # WILL NOT WORK
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

What if the things you want to filter on come from a list? You'd need to do the |= stuff "dynamically". One way to solve that is with functools.reduce. Suppose you collect the things you want to bitwise-OR together in a list:


from django.db.models import Q
from operator import or_
from functools import reduce


def include(_):
    import random
    return random.random() > 0.5

filters = []
if include("peter"):
    filters.append(Q(first_name__icontains="peter"))
if include("ashley"):
    filters.append(Q(first_name__icontains="ashley"))

assert len(filters), "must have at least one filter"
filter_ = reduce(or_, filters)  # THE MAGIC!

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

And finally, if it's a list already:


from django.db.models import Q
from operator import or_
from functools import reduce

names = ["peter", "ashley"]
qs = [Q(first_name__icontains=x) for x in names]
filter_ = reduce(or_, qs)

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))
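The `reduce(or_, ...)` trick is not specific to Django; it works for anything that implements `|`. A small Django-free sketch with the standard library's `re` flags (the choice of flags and the sample text are arbitrary):

```python
import re
from functools import reduce
from operator import or_

flags = [re.IGNORECASE, re.MULTILINE, re.DOTALL]

# Same as writing re.IGNORECASE | re.MULTILINE | re.DOTALL by hand
combined = reduce(or_, flags)

pattern = re.compile(r"^peter.ashley$", combined)
print(bool(pattern.search("intro\nPETER\nAshley\noutro")))
```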

Side note

Django's django.db.models.Q is actually quite flexible when used with MyModel.objects.filter(...), because this actually works:


from django.db.models import Q

def include(_):
    import random
    return random.random() > 0.5

filter_ = Q()  # MAGIC SAUCE
if include("peter"):
    filter_ |= Q(first_name__icontains="peter")
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))
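An empty starting value like that also fits `functools.reduce`, which accepts an initializer as its third argument; with it, an empty list no longer raises a `TypeError`. Here is a Django-free sketch using sets, where `set()` plays the role of `Q()`. With Django, the analogous call should be `reduce(or_, filters, Q())`, though I'm only showing the set version here. The helper name `union_all` is made up.

```python
from functools import reduce
from operator import or_


def union_all(groups):
    # set() is the "empty" starting value, like Q() in the example above
    return reduce(or_, groups, set())


print(union_all([{"peter"}, {"ashley"}, {"peter", "sam"}]))
print(union_all([]))  # an empty list is fine thanks to the initializer
```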
