A blog and website by Peter Bengtsson
02 July 2020
JavaScript
This saved my bacon today and I quite like it so I hope that others might benefit from this little tip.
So you have two "URLs" and you want to know if they are "equal". I put those words in quotation marks because they might not be fully formed URLs, and what you consider equal might depend on the current business logic.
In my case, I wanted http://www.peterbe.com/path/to?a=b to be considered equal to /path/to#anchor, because in this case they both share the exact same pathname (/path/to). So here's how to do it:
function equalUrls(url1, url2) {
  return (
    new URL(url1, "http://example.com").pathname ===
    new URL(url2, "http://example.com").pathname
  );
}
If you're doing TypeScript, switch the arguments to (url1: string, url2: string).
That "http://example.com" is deliberate and not a placeholder. It's because:
>> new URL("/just/a/path", "http://example.com").pathname
"/just/a/path"
>> new URL("https://www.peterbe.com/a/path", "http://example.com").pathname
"/a/path"
In other words, with a base URL as the second argument, the first argument to the URL constructor can be either a full absolute URL or just a path.
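To make the behaviour concrete, here is the same function run against the example URLs from above. Because only pathname is compared, the host, the query string and the fragment are all ignored. The extra URLs in the calls below are made up for illustration.

```javascript
// Same function as above: compare only the pathname, resolving relative
// inputs against a throwaway base so bare paths and full URLs both parse.
function equalUrls(url1, url2) {
  return (
    new URL(url1, "http://example.com").pathname ===
    new URL(url2, "http://example.com").pathname
  );
}

// Query string (?a=b) and fragment (#anchor) don't take part in the comparison.
console.log(equalUrls("http://www.peterbe.com/path/to?a=b", "/path/to#anchor")); // true
// The host is ignored too; only the path matters.
console.log(equalUrls("https://example.org/path/to", "/path/to")); // true
// Different paths are not equal.
console.log(equalUrls("/path/to", "/path/from")); // false
```

Note that a trailing slash matters under this rule: /path/to/ and /path/to are not considered equal.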
Discussion
Be careful with junk. For example, new URL(null, 'http://example.com').pathname becomes /null. So you might want to extend the logic to use "falsiness" like this:
return (
+ url1 && url2 &&
new URL(url1, "http://example.com").pathname ===
new URL(url2, "http://example.com").pathname
);
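One caveat with the falsy check is that the expression returns the falsy value itself (e.g. "" or null) rather than false, and new URL() can still throw a TypeError for input it can't parse, such as "http://[" (an unclosed IPv6 bracket). Here's a sketch of a stricter variant, assuming that anything which isn't a non-empty string, or doesn't parse, should simply count as "not equal". The names equalUrlsSafe and pathnameOf are made up for this example.

```javascript
// Hypothetical stricter variant: always returns a boolean and treats
// junk input (non-strings, empty strings, unparseable URLs) as "not equal".
const BASE = "http://example.com"; // throwaway base so bare paths parse

function pathnameOf(url) {
  if (typeof url !== "string" || url === "") return null;
  try {
    return new URL(url, BASE).pathname;
  } catch {
    // new URL() throws a TypeError for input it can't parse
    return null;
  }
}

function equalUrlsSafe(url1, url2) {
  const p1 = pathnameOf(url1);
  const p2 = pathnameOf(url2);
  return p1 !== null && p1 === p2;
}

console.log(equalUrlsSafe("http://www.peterbe.com/path/to?a=b", "/path/to#anchor")); // true
console.log(equalUrlsSafe(null, "/null")); // false, instead of matching "/null"
console.log(equalUrlsSafe("http://[", "/")); // false, instead of throwing
```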
22 June 2020
Node,
JavaScript
I needed this function for open-editor, which is a Node program that can open your $EDITOR from Node and jump to a specific file, line, and column.
Here's the code:
function* findMatchesInText(needle, haystack, { inQuotes = false } = {}) {
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  let rex;
  if (inQuotes) {
    rex = new RegExp(`['"](${escaped})['"]`, "g");
  } else {
    rex = new RegExp(`(${escaped})`, "g");
  }
  for (const match of haystack.matchAll(rex)) {
    const left = haystack.slice(0, match.index);
    const line = (left.match(/\n/g) || []).length + 1;
    const lastIndexOf = left.lastIndexOf("\n") + 1;
    const column = match.index - lastIndexOf + 1;
    yield { line, column };
  }
}
And you use it like this:
const text = ` bravo
Abra
cadabra

bravo
`;
console.log(Array.from(findMatchesInText("bra", text)));
Which prints:
[
{ line: 1, column: 2 },
{ line: 2, column: 2 },
{ line: 3, column: 5 },
{ line: 5, column: 1 }
]
The inQuotes option is there because a lot of the time this function is going to be used for finding the href value in unstructured documents that contain HTML <a> tags.
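Here's a small sketch of that use case, on a made-up HTML snippet. The function is repeated (slightly condensed) so the example runs on its own. Without inQuotes, a bare mention of the path in the text also matches; with it, only the quoted attribute value does.

```javascript
// Copy of findMatchesInText from above so this example is self-contained.
function* findMatchesInText(needle, haystack, { inQuotes = false } = {}) {
  // Escape regex metacharacters so the needle is matched literally.
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const rex = inQuotes
    ? new RegExp(`['"](${escaped})['"]`, "g")
    : new RegExp(`(${escaped})`, "g");
  for (const match of haystack.matchAll(rex)) {
    const left = haystack.slice(0, match.index);
    const line = (left.match(/\n/g) || []).length + 1; // 1-based line
    const column = match.index - (left.lastIndexOf("\n") + 1) + 1; // 1-based column
    yield { line, column };
  }
}

const html = `<p>Read <a href="/docs/intro">the intro</a>.</p>
<p>Not a link: /docs/intro</p>`;

// Matches both the href value and the plain-text mention on line 2.
console.log(Array.from(findMatchesInText("/docs/intro", html)));
// [ { line: 1, column: 18 }, { line: 2, column: 16 } ]

// Only the quoted href value matches.
console.log(Array.from(findMatchesInText("/docs/intro", html, { inQuotes: true })));
// [ { line: 1, column: 17 } ]
```

Note that with inQuotes the reported column is that of the opening quote (17 here), one position before the value itself (18), because match.index points at the start of the whole match including the quote. If the editor should land on the value, adding 1 in the inQuotes case would do it.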
15 June 2020
Python
tl;dr hashin 0.15.0 makes package name comparison agnostic to underscores or hyphens.
See issue #116 for a fuller story. Basically, now it doesn't matter if you write...
hashin python_memcached
...or...
hashin python-memcached
And the same goes for the contents of your requirements.txt file. Suppose it already had something like this:
python_memcached==1.59 \
--hash=sha256:4dac64916871bd35502 \
--hash=sha256:a2e28637be13ee0bf1a8
and you type hashin python-memcached, it will do the version comparison regardless of underscore or hyphen.
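For illustration, here is the kind of name normalization that makes such a comparison separator-agnostic, sketched in JavaScript. It follows the PEP 503 rule (lowercase, then collapse runs of -, _ and . into a single -); whether hashin uses exactly this rule, including the lowercasing, is an assumption. The function names are hypothetical.

```javascript
// Illustrative only: PEP 503-style name normalization. Lowercase the name
// and collapse any run of "-", "_" or "." into a single "-".
function normalizePackageName(name) {
  return name.toLowerCase().replace(/[-_.]+/g, "-");
}

// Two names refer to the same package if their normalized forms match.
function samePackage(a, b) {
  return normalizePackageName(a) === normalizePackageName(b);
}

console.log(samePackage("python_memcached", "python-memcached")); // true
console.log(samePackage("Python_Memcached", "python-memcached")); // true
console.log(samePackage("python-memcached", "pymemcache")); // false
```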
Thanks to @caphrim007, who implemented this for the benefit of Renovate.
10 June 2020
Python,
Linux,
Bash
tl;dr; Here's a useful bash script to avoid starting something when it's already running as a ghost process.
Huey is a great little Python library for doing background tasks. It's like Celery but much lighter, faster, and easier to understand.
What cost me almost an hour of hair-tearing debugging today was that I didn't realize a huey daemon process had gotten stuck in the background, running code that wasn't updating as I made changes to the tasks.py file in my project. I just couldn't understand what was going on.
The way I start my project is with honcho, which is a Python Foreman clone. The Procfile looks something like this:
elasticsearch: cd /Users/peterbe/dev/PETERBECOM/elasticsearch-7.7.0 && ./bin/elasticsearch -q
web: ./bin/run.sh web
minimalcss: cd minimalcss && PORT=5000 yarn run start
huey: ./manage.py run_huey --flush-locks --huey-verbose
adminui: cd adminui && yarn start
pulse: cd pulse && yarn run dev
And you start that by simply typing:
honcho start
When you Ctrl-C, it kills all those processes but somehow somewhere it doesn't always kill everything. Restarting the computer isn't a fun alternative.
So, to prevent my sanity from draining I wrote this script:
#!/usr/bin/env bash
set -eo pipefail
# This is used to make sure that before you start huey,
# there isn't already one running in the background.
# It has happened that huey lingers, stuck as a
# ghost, and it's hard to notice it sitting there
# lurking and being weird.
bad() {
  echo "Huey is already running!"
  exit 1
}
good() {
  echo "Huey is NOT already running"
  exit 0
}
ps aux | rg huey | rg -v 'rg huey' | rg -v 'huey-isnt-running.sh' && bad || good
(If you're wondering what rg is, it's short for ripgrep.)
And I change my Procfile accordingly:
-huey: ./manage.py run_huey --flush-locks --huey-verbose
+huey: ./bin/huey-isnt-running.sh && ./manage.py run_huey --flush-locks --huey-verbose
There really isn't much rocket science or brain surgery in this blog post, but I hope it shows anyone who's been in similar trenches that a simple bash script can make all the difference.
22 May 2020
Python,
MDN
So recently, in MDN, we changed the setting WELCOME_EMAIL_FROM. Seems harmless, right? Wrong. It failed horribly at runtime and we didn't notice until it was in production. Here's the traceback:
SMTPSenderRefused: (552, b"5.1.7 The sender's address was syntactically invalid.\n5.1.7 see : http://support.socketlabs.com/kb/84 for more information.", '=?utf-8?q?Janet?=')
(8 additional frame(s) were not displayed)
...
File "newrelic/api/function_trace.py", line 151, in literal_wrapper
return wrapped(*args, **kwargs)
File "django/core/mail/message.py", line 291, in send
return self.get_connection(fail_silently).send_messages([self])
File "django/core/mail/backends/smtp.py", line 110, in send_messages
sent = self._send(message)
File "django/core/mail/backends/smtp.py", line 126, in _send
self.connection.sendmail(from_email, recipients, message.as_bytes(linesep='\r\n'))
File "python3.8/smtplib.py", line 871, in sendmail
raise SMTPSenderRefused(code, resp, from_addr)
SMTPSenderRefused: (552, b"5.1.7 The sender's address was syntactically invalid.\n5.1.7 see : http://support.socketlabs.com/kb/84 for more information.", '=?utf-8?q?Janet?=')
Yikes!
So, to prevent this from ever happening again, we're putting this check in:
from email.utils import parseaddr
WELCOME_EMAIL_FROM = config("WELCOME_EMAIL_FROM", ...)
# If this fails, SMTP will probably also fail.
assert parseaddr(WELCOME_EMAIL_FROM)[1].count('@') == 1, parseaddr(WELCOME_EMAIL_FROM)
You could go to town even more on this, perhaps by using the email validator within Django, but for now I'd call that overkill. This is just a decent check before anything gets a chance to go wrong.
19 May 2020
Node,
JavaScript
tl;dr; I wanted to see which is faster in Node, Highlight.js or Prism. The result: they're both plenty fast, but Prism is 9% faster.
The context is all the thousands of little snippets of CSS, HTML, and JavaScript code on MDN.
I first wrote a script that stored almost 9,000 snippets of code. 60% are JavaScript, 22% are CSS, and the rest are HTML.
The mean snippet size was 400 bytes and the median 300 bytes. All ASCII.
Then I wrote three functions:
f1 - opens the snippet, extracts the payload, and saves it in a different place. This measures the baseline: how long the disk I/O read and write take.
f2 - same as f1 but uses const html = Prism.highlight(payload, Prism.languages[name], name); before saving.
f3 - same as f1 but uses const html = hljs.highlight(name, payload).value; before saving.
The experiment
You can see the hacky benchmark code here: https://github.com/peterbe/syntax-highlight-node-benchmark/blob/master/index.js
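The linked script is the real benchmark. As a rough idea of the shape of such a harness, here is a minimal sketch that times any synchronous function over a number of runs and samples heap usage. The names benchmark and percentSlower are made up, and the way the "% slower" column is computed (relative to the fastest time) is an assumption about how the results below were derived.

```javascript
// Minimal timing/memory harness sketch, not the linked benchmark code.
// `work` is any synchronous function; in the post it would be f1, f2 or f3.
function benchmark(work, runs = 12) {
  const times = [];
  let peakHeap = 0;
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    work();
    const elapsedNs = process.hrtime.bigint() - start;
    times.push(Number(elapsedNs) / 1e9); // nanoseconds -> seconds
    // Sampled after each run, so this approximates the peak heap,
    // it is not a true high-water mark.
    peakHeap = Math.max(peakHeap, process.memoryUsage().heapUsed);
  }
  const mean = times.reduce((sum, t) => sum + t, 0) / runs;
  return { mean, peakHeapMB: peakHeap / 1024 / 1024 };
}

// How much slower `seconds` is than the fastest time, in percent.
function percentSlower(seconds, fastest) {
  return ((seconds - fastest) / fastest) * 100;
}

const result = benchmark(() => JSON.stringify(Array.from({ length: 1e5 }, (_, i) => i)), 3);
console.log(`${result.mean.toFixed(3)}s mean, ~${result.peakHeapMB.toFixed(1)}MB heap`);
```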
Results
The results are (after running each function 12 times):
f1 0.947s fastest
f2 1.361s 43.6% slower
f3 1.494s 57.7% slower
Memory
In terms of memory usage, Prism maxes out heap memory at 60MB (the f1 baseline was 18MB), and Highlight.js maxes out heap memory at 60MB too.
Disk space in HTML
Each library produces different HTML. Examples:
Prism
<span class="token selector">.item::after</span> <span class="token punctuation">{</span>
<span class="token property">content</span><span class="token punctuation">:</span> <span class="token string">"This is my content."</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
Highlight.js
<span class="hljs-selector-class">.item</span><span class="hljs-selector-pseudo">::after</span> {
<span class="hljs-attribute">content</span>: <span class="hljs-string">"This is my content."</span>;
}
Not only do they look different, they also take up different amounts of disk space when saved. That matters for web performance and also has an impact on build resources.
f1 - baseline "HTML" files amount to 11.9MB (across 3,025 files)
f2 - Prism: 17.6MB
f3 - Highlight.js: 13.6MB
Conclusion
Prism is plenty fast for Node. If you're already using Prism, don't worry about having to switch to Highlight.js for added performance.
Memory consumption is about the same.
The final HTML from Prism is 30% larger than that from Highlight.js, but when the rendered HTML is included in a full HTML page, it compresses very well because of all the repetition, so this is not a fair comparison. Or rather, it's not a lot to worry about.
Well, speed is just one dimension. The features differ too. MDN already uses Prism, but does so in the browser. The ultimate context for this blog post is the speed if we were to do all the syntax highlighting on the server as a build step.