Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under the category: Python. Clear filter

"ld: library not found for -lssl" trying to install mysqlclient in Python on macOS

05 February 2020 1 comment   Python, MacOSX


I don't know how many times I've encountered this but by blogging about it, hopefully, next time it'll help me, and you!, find this sooner.

If you get this:

clang -bundle -undefined dynamic_lookup -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/Users/peterbe/.pyenv/versions/3.8.0/lib -L/opt/boxen/homebrew/lib -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/Users/peterbe/.pyenv/versions/3.8.0/lib -L/opt/boxen/homebrew/lib -L/opt/boxen/homebrew/lib -I/opt/boxen/homebrew/include build/temp.macosx-10.14-x86_64-3.8/MySQLdb/_mysql.o -L/usr/local/Cellar/mysql/8.0.18_1/lib -lmysqlclient -lssl -lcrypto -o build/lib.macosx-10.14-x86_64-3.8/MySQLdb/_mysql.cpython-38-darwin.so
    ld: library not found for -lssl
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command 'clang' failed with exit status 1

(The most important line is the ld: library not found for -lssl)

On most macOS systems, when trying to install a Python package that requires a binary compile step based on the system openssl (which I think comes from the OS), you'll get this.

The solution is simple, run this first:

export LDFLAGS="-L/usr/local/opt/openssl/lib"
export CPPFLAGS="-I/usr/local/opt/openssl/include"

Depending on your install of things, you might need to adjust this accordingly. For me, I have:

▶ ls -l /usr/local/opt/openssl/
total 1272
-rw-r--r--   1 peterbe  staff     717 Sep 10 09:13 AUTHORS
-rw-r--r--   1 peterbe  staff  582924 Dec 19 11:32 CHANGES
-rw-r--r--   1 peterbe  staff     743 Dec 19 11:32 INSTALL_RECEIPT.json
-rw-r--r--   1 peterbe  staff    6121 Sep 10 09:13 LICENSE
-rw-r--r--   1 peterbe  staff   42183 Sep 10 09:13 NEWS
-rw-r--r--   1 peterbe  staff    3158 Sep 10 09:13 README
drwxr-xr-x   4 peterbe  staff     128 Dec 19 11:32 bin
drwxr-xr-x   3 peterbe  staff      96 Sep 10 09:13 include
drwxr-xr-x  10 peterbe  staff     320 Sep 10 09:13 lib
drwxr-xr-x   4 peterbe  staff     128 Sep 10 09:13 share

Now, with those things set you should hopefully be able to do things like:

pip install mysqlclient

How to pad/fill a string by a variable in Python using f-strings

24 January 2020 8 comments   Python


I often find myself Googling for this. Always a little bit embarrassed that I can't remember the incantation (syntax).

Suppose you have a string mystr that you want to fill with with spaces so it's 10 characters wide:

>>> mystr = 'peter'
>>> mystr.ljust(10)
'peter     '
>>> mystr.rjust(10)
'     peter'

Now, with "f-strings" you do:

>>> mystr = 'peter'
>>> f'{mystr:<10}'
'peter     '
>>> f'{mystr:>10}'
'     peter'

What also trips me up is, suppose that the number 10 is variable. I.e. it's not hardcoded into the f-string but a variable from somewhere else. Here's how you do it:

>>> width = 10
>>> f'{mystr:<{width}}'
'peter     '
>>> f'{mystr:>{width}}'
'     peter'

What I haven't figured out yet, is how you specify a different character than a simple single whitespace. I.e. does anybody know how to do this, but with f-strings:

>>> width = 10
>>> mystr.ljust(width, '*')
'peter*****'

UPDATE

First of all, I left two questions unanswered. One was how do you make the filler something other than ' '. The answer is:

>>> f'{"peter":*<10}'
'peter*****'

The question question was, what if you don't know what the filler character should be. In the above example, * was hardcoded inside the f-string. The solution is stunningly simple actually.

>>> width = 10
>>> filler = '*'
>>> f'{"peter":{filler}<{width}}'
'peter*****'

But note, it has to be a single length string. This is what happens if you try to make it a longer string:

>>> filler = 'xxx'
>>> f'{"peter":{filler}<{width}}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier

JavaScript destructuring like Python kwargs with defaults

18 January 2020 0 comments   Python, JavaScript


In Python

I'm sure it's been blogged about a buncha times before but, I couldn't find it, and I had to search too hard to find an example of this. Basically, what I'm trying to do is what Python does in this case, but in JavaScript:

def do_something(arg="notset", **kwargs):
    print(f"arg='{arg.upper()}'")

do_something(arg="peter")
do_something(something="else")
do_something()

In Python, the output of all this is:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

It could also have been implemented in a more verbose way:

def do_something(**kwargs):
    arg = kwargs.get("arg", "notset")
    print(f"arg='{arg.upper()}'")

This more verbose format has the disadvantage that you can't quickly skim it and see and what the default is. That thing (arg = kwargs.get("arg", "notset")) might happen far away deeper in the function, making it hard work to spot the default.

In JavaScript

Here's the equivalent in JavaScript (ES6?):

function doSomething({ arg = "notset", ...kwargs } = {}) {
  return `arg='${arg.toUpperCase()}'`;
}

console.log(doSomething({ arg: "peter" }));
console.log(doSomething({ something: "else" }));
console.log(doSomething());

Same output as in Python:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

Notes

I'm still not convinced I like this syntax. It feels a bit too "hip" and too one-liner'y. But it's also pretty useful.

Mind you, the examples here are contrived because they're so short in terms of the number of arguments used in the function.
A more realistic thing like be a function that lists, upfront, all the possible parameters and for some of them, it wants to point out some defaults. E.g.

function processFolder({
  source,
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

console.log(processFolder({ source: "/user", quiet: true }));

One could maybe argue that arguments that don't have a default are expected to always be supplied so they can be regular arguments like:

function processFolder(source, {
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

console.log(processFolder("/user", { quiet: true }));

But, I quite like keeping all arguments in an object. It makes it easier to write wrapper functions and I find this:

setProfile(
  "My biography here",
  false,
  193.5,
  230,
  ["anders", "bengt"],
  "South Carolina"
);

...harder to read than...

setProfile({
  bio: "My biography here",
  dead: false,
  height: 193.5,
  weight: 230,
  middlenames: ["anders", "bengt"],
  state: "South Carolina"
});

How to have default/initial values in a Django form that is bound and rendered

10 January 2020 6 comments   Web development, Django, Python


Django's Form framework is excellent. It's intuitive and versatile and, best of all, easy to use. However, one little thing that is not so intuitive is how do you render a bound form with default/initial values when the form is never rendered unbound.

If you do this in Django:

class MyForm(forms.Form):
    name = forms.CharField(required=False)

def view(request):
    form = MyForm(initial={'name': 'Peter'})
    return render(request, 'page.html', form=form)

# Imagine, in 'page.html' that it does this:
#  <label>Name:</label>
#  {{ form.name }}

...it will render out this:

<label>Name:</label>
<input type="text" name="name" value="Peter">

The whole initial trick is something you can set on the whole form or individual fields. But it's only used in UN-bound forms when rendered.

If you change your view function to this:

def view(request):
    form = MyForm(request.GET, initial={'name': 'Peter'}) # data passed!
    if form.is_valid():  # makes it bound!
        print(form.cleaned_data['name'])
    return render(request, 'page.html', form=form)

Now, the form is bound and the initial stuff is essentially ignored.
Because name is not present in request.GET. And if it was present, but an empty string, it wouldn't be able to benefit for the default value.

My solution

I tried many suggestions and tricks (based on rapid Stackoverflow searching) and nothing worked.

I knew one thing: Only the view should know the actual initial values.

Here's what works:

import copy


class MyForm(forms.Form):
    name = forms.CharField(required=False)

    def __init__(self, data, **kwargs):
        initial = kwargs.get('initial', {})
        data = {**initial, **data}
        super().__init__(data, **kwargs)

Now, suppose you don't have ?name=something in request.GET the line print(form.cleaned_data['name']) will print Peter and the rendered form will look like this:

<label>Name:</label>
<input type="text" name="name" value="Peter">

And, as expected, if you have ?name=Ashley in request.GET it will print Ashley and produce this rendered HTML too:

<label>Name:</label>
<input type="text" name="name" value="Ashley">

UPDATE June 2020

If data is a QueryDict object (e.g. <QueryDict: {'days': ['90']}>), and initial is a plain dict (e.g. {'days': 30}),
then you can merge these with {**data, **initial} because it produces a plain dict of value {'days': [90]} which Django's form stuff doesn't know is supposed to be "flattened".

The solution is to use:

from django.utils.datastructures import MultiValueDict

...

    def __init__(self, data, **kwargs):
        initial = kwargs.get("initial", {})
        data = MultiValueDict({**{k: [v] for k, v in initial.items()}, **data})
        super().__init__(data, **kwargs)

(To be honest; this might work in the app I'm currently working on but I don't feel confident that this is covering all cases)

A Python and Preact app deployed on Heroku

13 December 2019 2 comments   Web development, Django, Python, Docker, JavaScript


Heroku is great but it's sometimes painful when your app isn't just in one single language. What I have is a project where the backend is Python (Django) and the frontend is JavaScript (Preact). The folder structure looks like this:

/
  - README.md
  - manage.py
  - requirements.txt
  - my_django_app/
     - settings.py
     - asgi.py
     - api/
        - urls.py
        - views.py
  - frontend/
     - package.json
     - yarn.lock
     - preact.config.js
     - build/
        ...
     - src/
        ...

A bunch of things omitted for brevity but people familiar with Django and preact-cli/create-create-app should be familiar.
The point is that the root is a Python app and the front-end is exclusively inside a sub folder.

When you do local development, you start two servers:

The latter is what you open in your browser. That preact app will do things like:

const response = await fetch('/api/search');

and, in preact.config.js I have this:

export default (config, env, helpers) => {

  if (config.devServer) {
    config.devServer.proxy = [
      {
        path: "/api/**",
        target: "http://localhost:8000"
      }
    ];
  }

};

...which is hopefully self-explanatory. So, calls like GET http://localhost:3000/api/search actually goes to http://localhost:8000/api/search.

That's when doing development. The interesting thing is going into production.

Before we get into Heroku, let's first "merge" the two systems into one and the trick used is Whitenoise. Basically, Django's web server will be responsibly not only for things like /api/search but also static assets such as / --> frontend/build/index.html and /bundle.17ae4.js --> frontend/build/bundle.17ae4.js.

This is basically all you need in settings.py to make that happen:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    ...
]

WHITENOISE_INDEX_FILE = True

STATIC_URL = "/"
STATIC_ROOT = BASE_DIR / "frontend" / "build"

However, this isn't quite enough because the preact app uses preact-router which uses pushState() and other code-splitting magic so you might have a URL, that users see, like this: https://myapp.example.com/that/thing/special and there's nothing about that in any of the Django urls.py files. Nor is there any file called frontend/build/that/thing/special/index.html or something like that.
So for URLs like that, we have to take a gamble on the Django side and basically hope that the preact-router config knows how to deal with it. So, to make that happen with Whitenoise we need to write a custom middleware that looks like this:

from whitenoise.middleware import WhiteNoiseMiddleware


class CustomWhiteNoiseMiddleware(WhiteNoiseMiddleware):
    def process_request(self, request):
        if self.autorefresh:
            static_file = self.find_file(request.path_info)
        else:
            static_file = self.files.get(request.path_info)

            # These two lines is the magic.
            # Basically, the URL didn't lead to a file (e.g. `/manifest.json`)
            # it's either a API path or it's a custom browser path that only
            # makes sense within preact-router. If that's the case, we just don't
            # know but we'll give the client-side preact-router code the benefit
            # of the doubt and let it through.
            if not static_file and not request.path_info.startswith("/api"):
                static_file = self.files.get("/")

        if static_file is not None:
            return self.serve(static_file, request)

And in settings.py this change:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
-   "whitenoise.middleware.WhiteNoiseMiddleware",
+   "my_django_app.middleware.CustomWhiteNoiseMiddleware",
    ...
]

Now, all traffic goes through Django. Regular Django view functions, static assets, and everything else fall back to frontend/build/index.html.

Heroku

Heroku tries to make everything so simple for you. You basically, create the app (via the cli or the Heroku web app) and when you're ready you just do git push heroku master. However that won't be enough because there's more to this than Python.

Unfortunately, I didn't take notes of my hair-pulling excruciating journey of trying to add buildpacks and hacks and Procfile s and custom buildpacks. Nothing seemed to work. Perhaps the answer was somewhere in this issue: "Support running an app from a subdirectory" but I just couldn't figure it out. I still find buildpacks confusing when it's beyond Hello World. Also, I didn't want to run Node as a service, I just wanted it as part of the "build process".

Docker to the rescue

Finally I get a chance to try "Deploying with Docker" in Heroku which is a relatively new feature. And the only thing that scared me was that now I need to write a heroku.yml file which was confusing because all I had was a Dockerfile. We'll get back to that in a minute!

So here's how I made a Dockerfile that mixes Python and Node:

FROM node:12 as frontend

COPY . /app
WORKDIR /app
RUN cd frontend && yarn install && yarn build


FROM python:3.8-slim

WORKDIR /app

RUN groupadd --gid 10001 app && useradd -g app --uid 10001 --shell /usr/sbin/nologin app
RUN chown app:app /tmp

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    gcc apt-transport-https python-dev

# Gotta try moving this to poetry instead!
COPY ./requirements.txt /app/requirements.txt
RUN pip install --upgrade --no-cache-dir -r requirements.txt

COPY . /app
COPY --from=frontend /app/frontend/build /app/frontend/build

USER app

ENV PORT=8000
EXPOSE $PORT

CMD uvicorn gitbusy.asgi:application --host 0.0.0.0 --port $PORT

If you're not familiar with it, the critical trick is on the first line where it builds some Node with as frontend. That gives me a thing I can then copy from into the Python image with COPY --from=frontend /app/frontend/build /app/frontend/build.

Now, at the very end, it starts a uvicorn server with all the static .js, index.html, and favicon.ico etc. available to uvicorn which ultimately runs whitenoise.

To run and build:

docker build . -t my_app
docker run -t -i --rm --env-file .env -p 8000:8000 my_app

Now, opening http://localhost:8000/ is a production grade app that mixes Python (runtime) and JavaScript (static).

Heroku + Docker

Heroku says to create a heroku.yml file and that makes sense but what didn't make sense is why I would add cmd line in there when it's already in the Dockerfile. The solution is simple: omit it. Here's what my final heroku.yml file looks like:

build:
  docker:
    web: Dockerfile

Check in the heroku.yml file and git push heroku master and voila, it works!

To see a complete demo of all of this check out https://github.com/peterbe/gitbusy and https://gitbusy.herokuapp.com/

Update to speed comparison for Redis vs PostgreSQL storing blobs of JSON

30 September 2019 2 comments   Redis, Nginx, Web Performance, Python, Django, PostgreSQL


Last week, I blogged about "How much faster is Redis at storing a blob of JSON compared to PostgreSQL?". Judging from a lot of comments, people misinterpreted this. (By the way, Redis is persistent). It's no surprise that Redis is faster.

However, it's a fact that I have do have a lot of blobs stored and need to present them via the web API as fast as possible. It's rare that I want to do relational or batch operations on the data. But Redis isn't a slam dunk for simple retrieval because I don't know if I trust its integrity with the 3GB worth of data that I both don't want to lose and don't want to load all into RAM.

But is it entirely wrong to look at WHICH database to get the best speed?

Reviewing this corner of Song Search helped me rethink this. PostgreSQL is, in my view, a better database for storing stuff. Redis is faster for individual lookups. But you know what's even faster? Nginx

Nginx??

The way the application works is that a React web app is requesting the Amazon product data for the sake of presenting an appropriate affiliate link. This is done by the browser essentially doing:

const response = await fetch('https://songsear.ch/api/song/5246889/amazon');

Internally, in the app, what it does is that it looks this up, by ID, on the AmazonAffiliateLookup ORM model. Suppose it wasn't there in the PostgreSQL, it uses the Amazon Affiliate Product Details API, to look it up and when the results come in it stores a copy of this in PostgreSQL so we can re-use this URL without hitting rate limits on the Product Details API. Lastly, in a piece of Django view code, it carefully scrubs and repackages this result so that only the fields used by the React rendering code is shipped between the server and the browser. That "scrubbed" piece of data is actually much smaller. Partly because it limits the results to the first/best match and it deletes a bunch of things that are never needed such as ProductTypeName, Studio, TrackSequence etc. The proportion is roughly 23x. I.e. of the 3GB of JSON blobs stored in PostgreSQL only 130MB is ever transported from the server to the users.

Again, Nginx?

Nginx has a built in reverse HTTP proxy cache which is easy to set up but a bit hard to do purges on. The biggest flaw, in my view, is that it's hard to get a handle of how much RAM this it's eating up. Well, if the total possible amount of data within the server is 130MB, then that is something I'm perfectly comfortable to let Nginx handle cache in RAM.

Good HTTP performance benchmarking is hard to do but here's a teaser from my local laptop version of Nginx:

▶ hey -n 10000 -c 10 https://songsearch.local/api/song/1810960/affiliate/amazon-itunes

Summary:
  Total:    0.9882 secs
  Slowest:  0.0279 secs
  Fastest:  0.0001 secs
  Average:  0.0010 secs
  Requests/sec: 10119.8265


Response time histogram:
  0.000 [1] |
  0.003 [9752]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.006 [108]   |
  0.008 [70]    |
  0.011 [32]    |
  0.014 [8] |
  0.017 [12]    |
  0.020 [11]    |
  0.022 [1] |
  0.025 [4] |
  0.028 [1] |


Latency distribution:
  10% in 0.0003 secs
  25% in 0.0006 secs
  50% in 0.0008 secs
  75% in 0.0010 secs
  90% in 0.0013 secs
  95% in 0.0016 secs
  99% in 0.0068 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0001 secs, 0.0279 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0026 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0011 secs
  resp wait:    0.0008 secs, 0.0001 secs, 0.0206 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0013 secs

Status code distribution:
  [200] 10000 responses

10,000 requests across 10 clients at rougly 10,000 requests per second. That includes doing all the HTTP parsing, WSGI stuff, forming of a SQL or Redis query, the deserialization, the Django JSON HTTP response serialization etc. The cache TTL is controlled by simply setting a Cache-Control HTTP header with something like max-age=86400.

Now, repeated fetches for this are cached at the Nginx level and it means it doesn't even matter how slow/fast the database is. As long as it's not taking seconds, with a long Cache-Control, Nginx can hold on to this in RAM for days or until the whole server is restarted (which is rare).

Conclusion

If you the total amount of data that can and will be cached is controlled, putting it in a HTTP reverse proxy cache is probably order of magnitude faster than messing with chosing which database to use.