Peterbe.com

A blog and website by Peter Bengtsson

Test if two URLs are "equal" in JavaScript

02 July 2020 0 comments   JavaScript


This saved my bacon today and I quite like it so I hope that others might benefit from this little tip.

So you have two "URLs" and you want to know if they are "equal". I write those words, in the last sentence, in quotation marks because they might not be fully formed URLs and what you consider equal might depend on the current business logic.

In my case, I wanted http://www.peterbe.com/path/to?a=b to be considered equal to/path/to#anchor. Because, in this case the both share the exact same pathname (/path/to ). So how to do it:

function equalUrls(url1, url2) {
  return (
    new URL(url1, "http://example.com").pathname ===
    new URL(url2, "http://example.com").pathname
  );
}

If you're doing TypeScript, switch the arguments to (url1: string, url2: string).

That "http://example.com" is deliberate and not a placeholder. It's because:

>> new URL("/just/a/path", "http://example.com").pathname
"/just/a/path"
>> new URL("https://www.peterbe.com/a/path", "http://example.com").pathname
"/a/path"

In other words, if you do it like that the first argument to the URL constructor can be with or without a full absolute URL.

Discussion

Be careful with junk. For example new URL(null, 'http://example.com').pathname becomes /null. So you might want to extend the logic to use "falsyness" like this:

  return (
+   url1 && url2 &&
    new URL(url1, "http://example.com").pathname ===
    new URL(url2, "http://example.com").pathname
  );

findMatchesInText - Find line and column of matches in a text, in JavaScript

22 June 2020 0 comments   Node, JavaScript

https://gist.github.com/peterbe/a210437cd282ef0963749520f6b09a1c


I need this function to relate to open-editor which is a Node program that can open your $EDITOR from Node and jump to a specific file, to a specific line, to a specific column.

Here's the code:

function* findMatchesInText(needle, haystack, { inQuotes = false } = {}) {
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  let rex;
  if (inQuotes) {
    rex = new RegExp(`['"](${escaped})['"]`, "g");
  } else {
    rex = new RegExp(`(${escaped})`, "g");
  }
  for (const match of haystack.matchAll(rex)) {
    const left = haystack.slice(0, match.index);
    const line = (left.match(/\n/g) || []).length + 1;
    const lastIndexOf = left.lastIndexOf("\n") + 1;
    const column = match.index - lastIndexOf + 1;
    yield { line, column };
  }
}

And you use it like this:

const text = ` bravo
Abra
cadabra

bravo
`;

console.log(Array.from(findMatchesInText("bra", text)));

Which prints:

[
  { line: 1, column: 2 },
  { line: 2, column: 2 },
  { line: 3, column: 5 },
  { line: 5, column: 1 }
]

The inQuotes option is because a lot of times this function is going to be used for finding the href value in unstructured documents that contain HTML <a> tags.

hashin 0.15.0 now copes nicely with under_scores

15 June 2020 0 comments   Python

https://github.com/peterbe/hashin/pull/119


tl;dr hashin 0.15.0 makes package comparison agnostic to underscore or hyphens

See issue #116 for a fuller story. Basically, now it doesn't matter if you write...

hashin python_memcached

...or...

hashin python-memcached

And the same can be said about the contents of your requirements.txt file. Suppose it already had something like this:

python_memcached==1.59 \
    --hash=sha256:4dac64916871bd35502 \
    --hash=sha256:a2e28637be13ee0bf1a8

and you type hashin python-memcached it will do the version comparison on these independent of the underscore or hyphen.

Thank @caphrim007 who implemented this for the benefit of Renovate.

./bin/huey-isnt-running.sh - A bash script to prevent lurking ghosts

10 June 2020 0 comments   Python, Linux, Bash


tl;dr; Here's a useful bash script to avoid starting something when its already running as a ghost process.

Huey is a great little Python library for doing background tasks. It's like Celery but much lighter, faster, and easier to understand.

What cost me almost an hour of hair-tearing debugging today was that I didn't realize that a huey daemon process had gotten stuck in the background with code that wasn't updating as I made changes to the tasks.py file in my project. I just couldn't understand what was going on.

The way I start my project is with honcho which is a Python Foreman clone. The Procfile looks something like this:

elasticsearch: cd /Users/peterbe/dev/PETERBECOM/elasticsearch-7.7.0 && ./bin/elasticsearch -q
web: ./bin/run.sh web
minimalcss: cd minimalcss && PORT=5000 yarn run start
huey: ./manage.py run_huey --flush-locks --huey-verbose
adminui: cd adminui && yarn start
pulse: cd pulse && yarn run dev

And you start that with simply typing:

honcho start

When you Ctrl-C, it kills all those processes but somehow somewhere it doesn't always kill everything. Restarting the computer isn't a fun alternative.

So, to prevent my sanity from draining I wrote this script:

#!/usr/bin/env bash
set -eo pipefail

# This is used to make sure that before you start huey, 
# there isn't already one running the background.
# It has happened that huey gets lingering stuck as a 
# ghost and it's hard to notice it sitting there 
# lurking and being weird.

bad() {
    echo "Huey is already running!"
    exit 1
}

good() {
    echo "Huey is NOT already running"
    exit 0
}

ps aux | rg huey | rg -v 'rg huey' | rg -v 'huey-isnt-running.sh' && bad || good

(If you're wondering what rg is; it's short for ripgrep)

And I change my Procfile accordingly:

-huey: ./manage.py run_huey --flush-locks --huey-verbose
+huey: ./bin/huey-isnt-running.sh && ./manage.py run_huey --flush-locks --huey-verbose

There really isn't much rocket science or brain surgery about this blog post but I hope it inspires someone who's been in similar trenches that a simple bash script can make all the difference.

Check your email addresses in Python, as a whole

22 May 2020 0 comments   Python, MDN


So recently, in MDN, we changed the setting WELCOME_EMAIL_FROM. Seems harmless right? Wrong, it failed horribly in runtime and we didn't notice until it was in production. Here's the traceback:

SMTPSenderRefused: (552, b"5.1.7 The sender's address was syntactically invalid.\n5.1.7 see : http://support.socketlabs.com/kb/84 for more information.", '=?utf-8?q?Janet?=')
(8 additional frame(s) were not displayed)
...
  File "newrelic/api/function_trace.py", line 151, in literal_wrapper
    return wrapped(*args, **kwargs)
  File "django/core/mail/message.py", line 291, in send
    return self.get_connection(fail_silently).send_messages([self])
  File "django/core/mail/backends/smtp.py", line 110, in send_messages
    sent = self._send(message)
  File "django/core/mail/backends/smtp.py", line 126, in _send
    self.connection.sendmail(from_email, recipients, message.as_bytes(linesep='\r\n'))
  File "python3.8/smtplib.py", line 871, in sendmail
    raise SMTPSenderRefused(code, resp, from_addr)

SMTPSenderRefused: (552, b"5.1.7 The sender's address was syntactically invalid.\n5.1.7 see : http://support.socketlabs.com/kb/84 for more information.", '=?utf-8?q?Janet?=')

Yikes!

So, to prevent this from happening every again we're putting this check in:

from email.utils import parseaddr

WELCOME_EMAIL_FROM = config("WELCOME_EMAIL_FROM", ...)

# If this fails, SMTP will probably also fail.
assert parseaddr(WELCOME_EMAIL_FROM)[1].count('@') == 1, parseaddr(WELCOME_EMAIL_FROM)

You could go to town even more on this. Perhaps use the email validator within django but for now I'd call that overkill. This is just a decent check before anything gets a chance to go wrong.

Benchmark compare Highlight.js vs. Prism

19 May 2020 0 comments   Node, JavaScript

https://github.com/peterbe/syntax-highlight-node-benchmark


tl;dr; I wanted to see which is fastest, in Node, Highlight.js or Prism. The result is; they're both plenty fast but Prism is 9% faster.

The context is all the thousands of little snippets of CSS, HTML, and JavaScript code on MDN.
I first wrote a script that stored almost 9,000 snippets of code. 60% is Javascript and 22% is CSS and rest is HTML.
The mean snippet size was 400 bytes and the median 300 bytes. All ASCII.

Then I wrote three functions:

  1. f1 - opens the snippet, extracts the payload, and saves it in a different place. This measures the baseline for how long the disk I/O read and the disk I/O write takes.
  2. f2 - same as f1 but uses const html = Prism.highlight(payload, Prism.languages[name], name); before saving.
  3. f3 - same as f1 but uses const html = hljs.highlight(name, payload).value; before saving.

The experiment

You can see the hacky benchmark code here: https://github.com/peterbe/syntax-highlight-node-benchmark/blob/master/index.js

Results

The results are (after running each 12 times each):

f1 0.947s   fastest
f2 1.361s   43.6% slower
f3 1.494s   57.7% slower

Memory

In terms of memory usage, Prism maxes heap memory at 60MB (the f1 baseline was 18MB), and Highlight.js maxes heap memory at 60MB too.

Disk space in HTML

Each library produces different HTML. Examples:

Prism

<span class="token selector">.item::after</span> <span class="token punctuation">{</span>
    <span class="token property">content</span><span class="token punctuation">:</span> <span class="token string">"This is my content."</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

Highlight.js

<span class="hljs-selector-class">.item</span><span class="hljs-selector-pseudo">::after</span> {
    <span class="hljs-attribute">content</span>: <span class="hljs-string">"This is my content."</span>;
}

Yes, not only does it mean they look different, they use up a different amount of disk space when saved. That matters for web performance and also has an impact on build resources.

Conclusion

Prism is plenty fast for Node. If you're already using Prism, don't worry about having to switch to Highlight.js for added performance.

RAM memory consumption is about the same.

Final HTML from Prism is 30% larger than Highlight.js but when the rendered HTML is included in a full HTML page, the HTML compresses very well because of all the repetition so this is not a good comparison. Or rather, not a lot to worry about.

Well, speed is just one dimension. The features differ too. MDN already uses Prism but does so in the browser. The ultimate context for this blog post is; the speed if we were to do all the syntax highlighting in the server as a build step.