Comment

Julian Berman

Hi! jsonschema author here :)

One minor point that worries me here -- I'm curious as to why you had to "crack open the validate function" to find the validator API -- if you have suggestions on how to improve the documentation they'd be very welcome. That API is very much not internal, and I'd have thought that the docs at https://python-jsonschema.readthedocs.io/en/stable/validate/ would have led you right to it, so if you have a suggestion on what you'd have needed to see there I'd love to hear it.

And as a "philosophical" rule, `jsonschema` does not prioritize its performance on CPython. If someone notices slowness on CPython and sends a patch that doesn't slow things down elsewhere I've been happy to merge it, but I personally always prioritize performance on PyPy (and it's the only thing I look at or compare). So I'm keen to re-run these there and see what the results look like.

Also -- would you mind confirming what the license is of your benchmark? I'm considering adding it to `jsonschema`'s benchmark suite if you tell me it's something permissive :)

Replies

Peter Bengtsson

Hi,
The code on https://github.com/Julian/jsonschema (the README) only shows the `jsonschema.validate` function, which forces the creation of a validator class instance every single time. There is no mention in the README of the trick of accessing the class, instantiating it once, and calling its `validate` method repeatedly.

Also, the docs on https://python-jsonschema.readthedocs.io/en/stable/validate/ demonstrate the same convenience function (the one that does the class instantiation on every single entry, even though the schema hasn't changed).

I think we could add a piece somewhere about the fact that "If you have multiple entries all with the same schema, consider this pattern..."
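For illustration, here is a minimal sketch of the two approaches being compared. The schema and entries are made up, and `Draft7Validator` is just one of the validator classes the library ships; pick whichever draft matches your schema. This is a sketch under those assumptions, not a benchmark.

```python
import jsonschema
from jsonschema import Draft7Validator

# Hypothetical schema and entries, for illustration only.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}
entries = [{"name": "Ada", "age": 36}, {"name": "Grace"}, {"age": 5}]


def count_invalid_with_function(entries, schema):
    """Use the convenience function, which handles the schema on every call."""
    invalid = 0
    for entry in entries:
        try:
            jsonschema.validate(entry, schema)
        except jsonschema.ValidationError:
            invalid += 1
    return invalid


def count_invalid_with_validator(entries, schema):
    """Check the schema and build the validator once, then reuse it."""
    Draft7Validator.check_schema(schema)  # raises SchemaError if the schema is bad
    validator = Draft7Validator(schema)
    return sum(1 for entry in entries if not validator.is_valid(entry))
```

Both functions should report the same number of invalid entries; the second one just avoids repeating the per-schema setup for each entry.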

Regarding the license for the benchmark, you have my written consent, right here and right now, to do whatever you want with it. It's not licensed, so you don't even have to attribute.

Keep up the good work!

Julian Berman

Thanks (on both!)

Let me know if https://github.com/Julian/jsonschema/commit/2e082b58e44356a4acd7832f46cbf91423373380 seems like what would have helped.

Peter Bengtsson

It helps, but I think it would still be a good idea to mention it in that first little code snippet in the README.

Julian Berman

The README is a README, not really documentation -- to be honest I'd remove all the code from there entirely if it wasn't that the README is what's used for PyPI and is what you see when you load the repo, so it's *something* for someone to see. But beyond "show me what this library does in one sentence" I'd really expect someone to read the documentation.

But will think about it.

Peter Bengtsson

You're not wrong, it's just that reality is like that. Whatever code snippets one sees in the README are usually all your eyes have time to scan.

Granted, if the project is your main at-work project and quality is super important then it might be a different story. So often, it's just one of many projects and the thing you're using a library for might not be a critical thing so you're looking for a quick fix and that's what the code snippets in the README are for.

If you think there are dangers with skimming a snippet like that, I would remove it and replace it with a link into the "meat of the documentation".