This blog post is 15 years old! Most likely, its content is outdated. Especially if it's technical.
Don't make the silly mistake that I made today. I improved my code to better support Unicode by replacing all plain strings with unicode strings. In there I had code that looked like this:
if type_ is 'textarea':
    do something
This was changed to:
if type_ is u'textarea':
    do something
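And it no longer matched, since type_ was a normal ASCII string. The 'is' operator checks whether both sides are the very same object, not whether their values are equal, and a plain string and a unicode string are never the same object. The correct way to do these things is like this: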
if type_ == u'textarea':
    do something
elif type_ is None:
    do something else
Remember:
>>> "peter" is u"peter"
False
>>> "peter" == u"peter"
True
>>> None is None
True
>>> None == None
True
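For anyone reading this on Python 3, where the `u` prefix no longer creates a different type, here is a minimal sketch of the same rule. The function name `describe_field` and its return values are made up for illustration, and the string built at runtime stands in for a value that arrives from somewhere else, such as a form.

```python
def describe_field(type_):
    """Hypothetical helper: compare values with ==, the None singleton with is."""
    if type_ is None:
        return "no type given"
    elif type_ == u"textarea":
        return "multi-line input"
    return "something else"


# Build the string at runtime so it is not simply the literal reused.
literal = u"textarea"
built = "".join(["text", "area"])

print(built == literal)       # equal values
print(built is literal)       # identity: an implementation detail, do not rely on it
print(describe_field(built))
print(describe_field(None))
```

The only identity check left is the one against None, which is a singleton, so `is` is the right tool there.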
You should really only use 'is' to check for object identity, and for any kind of value comparison, == is the way to go. Also, as you yourself point out, the changes you made make no difference at all, unless there are non-ascii characters in the string you are comparing the variable to, so in the case of 'textarea', I would have just left it as is. ;)
That ("peter" is "peter") works at all is an implementation quirk that shouldn't be relied upon, even for 8-bit strings.
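To illustrate the quirk: the interpreter may reuse one object for identical string literals, but strings assembled at runtime are usually separate objects. A small sketch for Python 3, where `sys.intern` is the explicit way to ask for a shared object (in Python 2 it was the builtin `intern`). The variable names are only for illustration.

```python
import sys

# Two equal strings assembled at runtime.
a = "".join(["pe", "ter"])
b = "".join(["pet", "er"])

print(a == b)    # True: same characters
print(a is b)    # whether these share an object is up to the interpreter

# Interning explicitly maps equal strings to one shared object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```

Identity only becomes predictable once you intern explicitly, which is why `==` is the safer default for value comparisons.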
Hi Peter,
You're mixing up identity with equality...easily done :-)
While it's true that...
"peter" == "peter" and
"peter" is "peter"
you'll note that:
"peter" is "peter1"[:-1] is not true while
"peter" == "peter1"[:-1] is
Hope this sheds some light :-)
The interpreter is obviously using two references to the same string "peter" in the first case, but is creating a new string in the second example.
Kevin
Yes, what you were doing was dangerous even with plain 8-bit data. Check this:
>>> a ="p"+"eter"
>>> b = "peter"
>>> a is b
False
>>> a
'peter'
>>> b
'peter'
>>>
"peter" == u"peter" will raise a UnicodeDecodeError if you compare "müsli" instead of "peter"
so you should use e.g.
"müsli" == u"müsli".encode("utf-8")
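The same advice carries over to Python 3, where byte strings and text strings never compare equal, so one side has to be converted explicitly. A minimal sketch, assuming the bytes are UTF-8 encoded; `same_text` is a hypothetical helper, not something from the post.

```python
def same_text(left, right, encoding="utf-8"):
    """Compare two values as text, decoding any byte strings first.

    Hypothetical helper; assumes byte strings use the given encoding.
    """
    if isinstance(left, bytes):
        left = left.decode(encoding)
    if isinstance(right, bytes):
        right = right.decode(encoding)
    return left == right


text = u"m\u00fcsli"           # "müsli" as text
raw = text.encode("utf-8")     # the same word as UTF-8 bytes

print(raw == text)             # False on Python 3: bytes never equal str
print(same_text(raw, text))    # True once both sides are text
```

Decoding towards text rather than encoding towards bytes is a design choice here; either direction works as long as both sides end up the same type.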
Why would you ever use Unicode strings in the internal types?
You only need Unicode for strings displayed to the user.
This is the reason for loving python:
never ever try to do things more complicated than they need to be!
"peter" in u"peter" and viceversa will do also.
I don't know if this would be better (performance-wise) than using "==". I do know that "==" has a little extra overhead compared with "is". But in the case of "in", it would be good to see which one performs better.
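One caveat on the `in` suggestion: for strings, `in` is a substring test, so it also succeeds when one string merely contains the other. A quick sketch (the values are made up) showing where it diverges from `==`:

```python
type_ = u"textarea"

# Substring containment is looser than equality.
contains = u"text" in type_     # True: "text" occurs inside "textarea"
equal = u"text" == type_        # False: the values differ

print(contains, equal)

# Containment in both directions does imply equality, at the cost of two scans.
both_ways = (u"text" in type_) and (type_ in u"text")
print(both_ways)                # False
```

So `in` only stands in for `==` when it is checked in both directions, and at that point `==` says the same thing more directly.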