Why is it important to escape & in href attributes in tags?

11 November 2014   13 comments   Web development

Here’s an example of unescaped & characters in the href attribute of an A tag:
http://jsfiddle.net/32zbogfw/ It’s working fine.

I know it might break XML and possibly XHTML but who uses that still?

Red. So what?
And I know an unescaped & in a href shows as red in the View Source color highlighting.

What can go wrong? Why is it important? Perhaps it mattered in 2009, but that's no longer the case.

This all started because I was reviewing some code that uses Python's urllib.urlencode(...) and inserts the result into a Django template with href="{{ result_of_that_urlencode }}". That would mean you get unescaped & characters. I tried to find out how and why that is bad but couldn't find any examples of it.
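To make the situation concrete, here is a minimal sketch of the two steps involved: URL-encoding the parameters, then HTML-escaping the result before it goes into the attribute. It assumes Python 3, where the function lives in urllib.parse, and uses html.escape from the standard library in place of whatever the template engine does. The parameter names and the update.php path are only illustrative.

```python
from html import escape
from urllib.parse import urlencode

# Illustrative parameters; urlencode joins the pairs with a bare "&".
params = {"action": "delete", "macro": "test"}
query = urlencode(params)

# HTML-escape the finished URL before putting it inside href="...".
href = escape("update.php?" + query, quote=True)
link = '<a href="{}">Delete</a>'.format(href)

print(query)  # action=delete&macro=test
print(link)   # <a href="update.php?action=delete&amp;macro=test">Delete</a>
```

The browser turns &amp; back into & when it parses the attribute, so the server still sees the same query string.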

Comments

Dan

If I make a blog post that has the following url: http://www.example.com/checkoutmyguitar&they'reawesome/

I NEED to escape the & as &amp; or else the & will get processed as the start of an entity.

Invalid entities will get ignored, which is what you're seeing. It's the edge cases that are the concern. I think.

Peter Bengtsson

But in that case, the & is in the pathname part of the URL. E.g. http://jsfiddle.net/c5b5L4w1/
So not a problem.

Wladimir Palant

The issue is that browsers will close whatever they consider incomplete entity references automatically. I don't know the specific algorithm but href="?foo=1&quot" still causes Firefox to add a quotation mark to the end of the URL - and that's what you get instead of a parameter named "quot". Now this doesn't happen for parameters that actually have a value but I wouldn't be so sure about browsers other than Firefox.

Boris Zbarsky

It really depends on what you have in that attribute.

If you have href="?something&=whatever" you run into a problem if you don't escape the '&'.

If you have href="?something&amp whatever" you also run into a problem.

Or if you have href="?something&amp,something" for that matter.

So if you know for a fact that the thing after your maybe-entity-name is an equals char, you're probably OK. Otherwise, likely not.

Peter Bengtsson

So as long as I always bundle the key and value with a = in between I'm safe.

Boris Zbarsky

Not if the unquoted thing is in the value. "?something=&amp&" behaves identically to "?something=&&".

Not to mention the fact that, of course, the unquoted '&' will terminate the key-value pair.

Simon

I don't think I've ever heard of such a thing... escaping ampersands in tag attributes. I mean, I see what you mean about view-source highlighting them as invalid, but I've never written them that way (unless using an XML-based generation tool), nor seen any framework (JSF, etc) that ever renders them that way...

Peter Bengtsson

For me it's the opposite. I've always been über careful turning & into &amp; in attributes' values. This is because we used to be so strict when XHTML was all the rage.

Now I stopped to think: is it still important at all?

Neil Rashbrook

There was a time when Gecko used to allow HTML entities without the trailing semicolon. (I don't know what the current parsing rules are here.) That meant that if you had a form parameter named e.g. "macroname" then tried to use it in a hardcoded link e.g. "update.php?action=delete&macro=test" the &macr would get interpreted as a ¯ character.
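The effect described here can be reproduced with Python's html.unescape, which decodes text the way HTML5 does outside of attributes, where legacy entity names such as macr are still accepted without the semicolon. This is only an illustration with the standard library, not a statement about how any particular Gecko version behaved.

```python
from html import unescape

# The hardcoded link from the comment, with its "&" left unescaped.
raw = "update.php?action=delete&macro=test"

# html.unescape applies the text (non-attribute) rules, so "&macr"
# is consumed as the macron character and only "o=test" is left over.
decoded = unescape(raw)

print(decoded)  # update.php?action=delete¯o=test
```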

Peter Bengtsson

"There was a time".

I'm guessing that goes way back. Even before people switched from HTML4 to XHTML doctypes.

Havvy

https://html.spec.whatwg.org/multipage/syntax.html#consume-a-character-reference

If the character reference is being consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. However, if this next character is in fact a U+003D EQUALS SIGN character (=), then this is a parse error, because some legacy user agents will misinterpret the markup in those cases.

--

Basically, your link throws a parse error only because of the equals sign that follows. I did some more testing ( http://jsfiddle.net/8b1h3bqw/ ) and noted that Firefox seems to ignore the rules about ampersands in attributes, showing a warning even in the valid case. But then, that's also only in view source, which for some reason, I cannot access via the developer tools. Chrome doesn't report any parse errors anywhere as far as I can tell.

Anyways, it's important because of those legacy user agents only, and then, only if your parameter has the same name as an HTML entity character reference. In all other cases, there's no problem, probably.
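The attribute rule quoted from the spec can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation of the parsing algorithm: decode_attribute_value is a hypothetical helper, it only handles named references (numeric ones like &#38; are left alone), and it takes the entity table from html.entities.html5.

```python
from html.entities import html5

# Longest entity name in the table, e.g. "CounterClockwiseContourIntegral;".
MAX_NAME = max(len(name) for name in html5)


def decode_attribute_value(value):
    """Decode named character references using the attribute-value rule."""
    out = []
    i = 0
    while i < len(value):
        if value[i] != "&":
            out.append(value[i])
            i += 1
            continue
        # Find the longest known entity name right after the "&".
        match = None
        for length in range(min(MAX_NAME, len(value) - i - 1), 0, -1):
            candidate = value[i + 1:i + 1 + length]
            if candidate in html5:
                match = candidate
                break
        if match is None:
            out.append("&")
            i += 1
            continue
        end = i + 1 + len(match)
        next_char = value[end:end + 1]
        followed_by_name = next_char == "=" or (
            next_char.isascii() and next_char.isalnum()
        )
        if not match.endswith(";") and followed_by_name:
            # Historical exception: leave the text untouched.
            out.append("&")
            i += 1
            continue
        out.append(html5[match])
        i = end
    return "".join(out)


print(decode_attribute_value("?action=delete&macro=test"))  # unchanged
print(decode_attribute_value("?foo=1&quot"))                # ?foo=1"
print(decode_attribute_value("?something&amp whatever"))    # ?something& whatever
print(decode_attribute_value("?something=&amp&"))           # ?something=&&
```

Under these assumptions the sketch agrees with the cases raised in the comments above: a maybe-entity followed by = or a letter is left alone, while one followed by a space, a comma, another & or the end of the value is decoded.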

Peter Bengtsson

Awesome. Just like others have mentioned in comments here; this means that as long as you follow with a = you're fine.

As your example points out; the really big risk is the example of `href="&amp"` where you might hope that the server is going to pick that up as a {'amp': ''} or something. It won't. Instead you'd get nothing from the query string.
It would if it was `href="?ampsomething"` then you could get {'ampsomething': ''}
(NB: different servers accept or simply reject CGI params without a =)

Giorgio Maone

You should not URL-encode URLs before inserting them into a href attribute: actually, if you URL-encode them they'll likely break.

But you must HTML-escape them, which is what & turned into &amp; is about. Django templates may be configured to do it automatically anyway, see https://docs.djangoproject.com/en/dev/ref/templates/builtins/

If you don't HTML-escape URLs and other variables before merging them into your HTML (especially if they ultimately come from user input) you risk making your website vulnerable to cross-site scripting (XSS).
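As a rough illustration of that risk, the sketch below builds a link by hand and escapes both the URL and the link text with html.escape. The render_link helper and the hostile value are made up for the example; a template engine with auto-escaping would normally do this step for you.

```python
from html import escape


def render_link(url, text):
    """Build an <a> tag, HTML-escaping both the URL and the link text."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


# A value that tries to close the attribute and add an event handler.
hostile = 'x" onmouseover="alert(1)'

print(render_link(hostile, "click"))
# <a href="x&quot; onmouseover=&quot;alert(1)">click</a>
```

Because the quotation marks arrive as &quot;, the value stays inside the href attribute instead of starting a new one.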

P.S.: why in the hell does this blog require JavaScript to be enabled, for extra 3rd party sources too, in order to protect your comment form against CSRF? :(
