
You're viewing blogs from Python only.


15th of May

split_search() - A Python functional for advanced search applications

http://www.peterbe.com/plog...rch/split_search.py 

Inspired by Google's way of working, today I put together a little Python script for splitting a search query. The idea is that you can search by entering certain keywords followed by a colon, like this:

 Free Text name:Peter age: 28

And this will be converted into two parts:

 'Free Text'
 {'name': 'Peter', 'age': '28'}

You can configure which keywords are recognized. To keep things simple, you can set them to the columns your application offers for advanced search, for example (from_date, to_date).

Feel free to download and use it as much as you like. You might not agree completely with its purpose and design, so you're allowed to change it as you please.

Here's how to use it:

 $ wget http://www.peterbe.com/plog/split_search/split_search.py
 $ python
 >>> from split_search import split_search
 >>> free_text, parameters = split_search('Foo key1:bar', ('key1',))
 >>> free_text
 'Foo'
 >>> parameters
 {'key1': 'bar'}
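
The splitting described above can be sketched roughly as follows. This is a hypothetical Python 3 reimplementation for illustration, not the contents of split_search.py; the name split_search_sketch and the regex approach are assumptions.

```python
import re


def split_search_sketch(query, keywords):
    """Split `query` into (free_text, parameters) for the given keywords.

    Hypothetical illustration only; the real split_search.py may differ.
    """
    if not keywords:
        return query.strip(), {}
    # Match any configured keyword followed by a colon, e.g. "name:".
    pattern = re.compile(r"\b(%s):" % "|".join(re.escape(k) for k in keywords))
    matches = list(pattern.finditer(query))
    if not matches:
        return query.strip(), {}
    # Everything before the first keyword is the free text.
    free_text = query[:matches[0].start()].strip()
    parameters = {}
    for i, match in enumerate(matches):
        # A value runs until the next keyword (or the end of the query).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(query)
        parameters[match.group(1)] = query[match.end():end].strip()
    return free_text, parameters


print(split_search_sketch("Free Text name:Peter age: 28", ("name", "age")))
```

In this sketch, any text that follows the last keyword becomes part of that keyword's value, so free text is expected to come first.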

29th of April

Releasing IssueTrackerProduct 0.9

http://www.issuetrackerproduct.com/News/0.9.0 

Tonight I released an experimental version of the IssueTrackerProduct that is packed with new cool stuff. I call this an experimental release (but I run it on my production systems) because it's got so many new features.

While preparing this release and writing the news item, I deployed the latest version to real.issuetrackerproduct.com and immediately noticed two bugs to do with user names. I fixed those straight away and prepared a new release minutes later. I expect to release another, more stable version within a few weeks.

10th of March

See you at PyCon 2008

I'm going to Chicago on Wednesday for the PyCon 2008 conference. I'm going to stay at the Crowne Plaza (or whatever it was called) like many of the other people at the conference.

This is what I look like:


If you see this mug, go up to it and say hi. It speaks British, Swedish and some American, and loves food, beer and tea, which might be helpful to know if you feel like talking to it more. Its interests for this conference are: Grok, Zope, Django, Plone, buildout, automated testing, agile development and Javascript. Its main claim to fame is an Open Source bug/issue tracker program called IssueTrackerProduct, which it is more than delighted to talk about.

I've never been to Chicago before and I'm really excited about Tuesday night as I've bought tickets to a Chicago Bulls NBA game (basketball). All other nights I'm hoping to socialise, get drunk, get full and get down and dirty nerdy all week. See you there!

21st of February

CommandLineApp by Doug Hellmann

http://www.doughellmann.com...cts/CommandLineApp/ 

I just read the feature article "Command line programs are classes, too!" by Doug Hellmann in the January 2008 issue of Python Magazine about his program CommandLineApp, and I've tried it out on one of my old Python programs where I do the option parsing manually with getopt. The results are beautiful and quick. It's sprinkled with Doug-specific magic, but I quickly got over that when I saw how easy it was to work with. There are still a few things I didn't manage to work out, but that will unfortunately have to wait.

If anything, the worst thing about this library is that it's not part of the standard library so either you have to tell people to sudo easy_install CommandLineApp in the instructions or include it yourself in your packages if you prefer to ship things with a kitchen sink included.

If you want to check it out in action, either subscribe to the magazine (and support the effort) or just download csvcat.

22nd of December

String comparison function in Python (alpha)

I was working on a unit test which, when it failed, would say "this string != that string", and because some of these strings were very long (output of an HTML lib I wrote which spits out snippets of HTML code) it became hard to spot how they were different. So I decided to override the usual self.assertEqual(str1, str2) in Python's unittest class instance with this little baby:

 def assertEqualLongString(a, b):
     NOT, POINT = '-', '*'
     if a != b:
         print a
         o = ''
         for i, e in enumerate(a):
             try:
                 if e != b[i]:
                     o += POINT
                 else:
                     o += NOT
             except IndexError:
                 o += POINT

         o += NOT * (len(a) - len(o))
         if len(b) > len(a):
             o += POINT * (len(b) - len(a))

         print o
         print b

         raise AssertionError, '(see string comparison above)'

It's far from perfect and doesn't really work when you've got Unicode characters that the terminal you use can't print properly. It might not look great on strings that are really really long but I'm sure that's something that can be solved too. After all, this is just a quick hack that helped me spot that the difference between one snippet and another was that one produced <br/> and the other produced <br />. Below are some examples of this utility function in action.
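
For readers on Python 3, the same marker-line idea could be sketched as below. This is an illustrative rewrite, not the original function or its examples; the names diff_markers and assert_equal_long_string are hypothetical.

```python
from itertools import zip_longest


def diff_markers(a, b, same="-", differ="*"):
    """Return one marker per character position: `same` or `differ`."""
    # zip_longest pads the shorter string with None, which never equals
    # a real character, so the overhang is marked as different.
    return "".join(
        same if x == y else differ
        for x, y in zip_longest(a, b)
    )


def assert_equal_long_string(a, b):
    """Raise AssertionError showing both strings with a marker line between."""
    if a != b:
        raise AssertionError("\n".join([a, diff_markers(a, b), b]))


print("<br/>")
print(diff_markers("<br/>", "<br />"))
print("<br />")
```

Putting the comparison into the exception message instead of printing it keeps the output attached to the failing test in the test runner's report.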



17th of December

Calculator in Python for dummies

I need a mini calculator in my web app so that people can enter basic mathematical expressions instead of having to work them out themselves and then enter the result in the input box. I want them to be able to enter "3*2" or "110/3" without having to do the math first. I want this to work like a pocket calculator, such that 110/3 returns 36.6666666667 and not 36 like pure Python arithmetic would. Here's a solution that works, but with Python's integer division:

 import math

 def safe_eval(expr, symbols={}):
     return eval(expr, dict(__builtins__=None), symbols)

 def calc(expr):
     return safe_eval(expr, vars(math))

 assert calc('3*2')==6
 assert calc('12.12 + 3.75 - 10*0.5')==10.87
 assert calc('110/3')==36
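
On Python 3 the / operator already performs true division, so the same approach behaves like a pocket calculator there. Below is a minimal sketch assuming Python 3; the functions mirror the ones above, and restricting __builtins__ this way limits what an expression can reach but should not be treated as a full sandbox.

```python
import math


def safe_eval(expr, symbols=None):
    # Evaluate with no builtins and only the supplied names available.
    return eval(expr, {"__builtins__": None}, dict(symbols or {}))


def calc(expr):
    # Expose the math module's names (sqrt, pi, ...) to the expression.
    return safe_eval(expr, vars(math))


print(calc("3*2"))
print(calc("110/3"))   # true division on Python 3
print(calc("110//3"))  # floor division is still available explicitly
```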



13th of December

WSSE Authentication and Apache

I recently wrote a Grok application that implements a REST API for Atom Publishing so that I can post to a website of mine from my new Nokia phone, which has LifeBlog, an application that uses the Atom API to talk to the server.

Anyway, the authentication on Atom is WSSE (good introduction article) which basically works like this:

 PasswordDigest = Base64(SHA1(Nonce + CreationTimestamp + Password))

This is one of the pieces in a request header called Authorization which can look something like this:

 Authorization: WSSE profile="UsernameToken"
 X-WSSE: UsernameToken Username="bob", PasswordDigest="quR/EWLAV4xLf9Zqyw4pDmfV9OY=",
 Nonce="d36e316282959a9ed4c89851497a717f", Created="2003-12-15T14:43:07Z"

What I did was write a simple Python script to mimic what the Nokia does. The script creates a password digest using these Python modules: sha, binascii and base64, and then fires off a POST request. Here's the thing: if you generate this header with base64.encodestring(ascii_string) you get something like this:

 quR/EWLAV4xLf9Zqyw4pDmfV9OY=\n

Notice the extra newline character at the end of the base64 encoded string. This is perfectly valid and is decoded easily with base64.decodestring(base64_string) by the Grok app. Everything was working fine when I tried posting to http://localhost:8080/++rest++atompub/snapatom and my application successfully authenticated the dummy user. I was happy.

Then I set this up properly on atom.someotherdomain.com, which was managed by Apache, which internally rewrote the URL to a Grok on localhost:8080. The problem now was that the authentication header value was broken into two lines because of the newline character, and then the whole request was rejected by Apache because some header lines came without a : (colon).

The solution was to not use base64.encodestring() and base64.decodestring() but to instead use base64.urlsafe_b64encode() and base64.urlsafe_b64decode(). Let me show you:

 >>> import base64
 >>> x = 'Peter'
 >>> base64.encodestring(x)
 'UGV0ZXI=\n'
 >>> base64.urlsafe_b64encode(x)
 'UGV0ZXI='
 >>> base64.decodestring(base64.urlsafe_b64encode(x))
 'Peter'

If you're still reading, then hopefully you won't make the same mistake as I did and waste time trying to debug Apache. The lesson learned from this is to use the URL-safe base64 functions for header values and not the usual ones.
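
For Python 3, where the sha module and base64.encodestring() no longer exist, the digest and header could be built roughly like this. It is a sketch that follows the formula quoted above; the function names are hypothetical, and real WSSE clients may treat the nonce differently.

```python
import base64
import hashlib


def wsse_password_digest(nonce, created, password):
    """Base64(SHA1(nonce + created + password)) with no trailing newline."""
    raw = (nonce + created + password).encode("utf-8")
    sha1_bytes = hashlib.sha1(raw).digest()
    # b64encode() never appends a newline, unlike encodebytes().
    return base64.b64encode(sha1_bytes).decode("ascii")


def x_wsse_header(username, password, nonce, created):
    """Build a single-line X-WSSE header value."""
    digest = wsse_password_digest(nonce, created, password)
    return ('UsernameToken Username="%s", PasswordDigest="%s", '
            'Nonce="%s", Created="%s"' % (username, digest, nonce, created))


header = x_wsse_header("bob", "secret",
                       "d36e316282959a9ed4c89851497a717f",
                       "2003-12-15T14:43:07Z")
assert "\n" not in header
print(header)
```

base64.encodebytes() is the Python 3 counterpart of encodestring() and still appends the newline, so it has the same pitfall when used for header values.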

10th of December

geopy distance calculation pitfall

Geopy is a great little Python library for working with geocoding and distances using various online services such as Google's geocoder API.

Today I spent nearly half an hour trying to debug what was going on with my web application since I was getting this strange error:

 AttributeError: 'VincentyDistance' object has no attribute '_kilometers'



24th of September

Spellcorrector 0.2

Unlike previous incarnations, Spellcorrector now does not load the two huge language files for English and Swedish by default. Alternatively or additionally, you can load your own language file. The difference between loading a language file and training on your own words is that trained words are always assumed to be correct.

Another major change in this release is that a pickle file is created once the language file or your own training file has been parsed. This works like a cache: if the original text file changes, the pickle file is recreated. The outcome is that the first time you create a Spellcorrector instance it takes a few seconds if the language file is large, but the second time it takes virtually no time at all.
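
A pickle cache of this kind can be sketched as follows. This is an assumption about the general technique (comparing file modification times), not Spellcorrector's actual code; load_words_cached and its parse argument are hypothetical names.

```python
import os
import pickle


def load_words_cached(text_path, parse):
    """Return parse(text_path), cached in a pickle file next to the text file.

    The cache is reused only while it is at least as new as the text file.
    """
    cache_path = text_path + ".pickle"
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(text_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache missing or stale: parse the slow way and store the result.
    data = parse(text_path)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    return data
```

The first call pays the full parsing cost; later calls only unpickle, which is where the "virtually no time at all" comes from.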



10th of August

html2plaintext Python script to convert HTML emails to plain text

From the doc string:

 A very spartan attempt of a script that converts HTML to
 plaintext.

 The original use for this little script was when I send HTML emails out I also
 wanted to send a plaintext version of the HTML email as multipart. Instead of 
 having two methods for generating the text I decided to focus on the HTML part
 first and foremost (considering that a large majority of people don't have a 
 problem with HTML emails) and make the fallback (plaintext) created on the fly.

 This little script takes a chunk of HTML and strips out everything except the
 <body> (or an element ID) and inside that chunk it makes certain conversions 
 such as replacing all hyperlinks with footnotes where the URL is shown at the
 bottom of the text instead. <strong>words</strong> are converted to *words* 
 and it does a fair attempt of getting the linebreaks right.

 As a last resort, it strips away all other tags left that couldn't be gracefully
 replaced with a plaintext equivalent.
 Thanks to Fredrik Lundh's unescape() function, things like:
    'Terms &amp; Conditions' are converted to
    'Terms & Conditions'

 It's far from perfect but a good start. It works for me for now.

Version at the time of writing this: 0.1.

I wouldn't be surprised if I've reinvented the wheel here but I did plenty of searches and couldn't really find anything like this.

Let's run this for a while until I stumble across some bugs or other inconsistencies which I haven't quite done yet. The one thing I'm really unhappy about is the way I extract the body from the BeautifulSoup parse object. I really couldn't find another better way in the few minutes I had to spare on this.

Feel free to comment on things you think are pressing bugs.

You can download the script here: html2plaintext.py version 0.1
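
The conversions the doc string describes can be illustrated with a much smaller sketch. This is not html2plaintext.py: it uses regular expressions and the Python 3 standard library's html.unescape() instead of BeautifulSoup, it does not extract the <body>, and the function name is hypothetical.

```python
import html
import re


def html_to_plaintext_sketch(fragment):
    """Convert a small HTML fragment to plain text with link footnotes."""
    footnotes = []

    def link_to_footnote(match):
        # Remember the URL and leave a numbered marker after the link text.
        footnotes.append(match.group(1))
        return "%s [%d]" % (match.group(2), len(footnotes))

    text = re.sub(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>',
                  link_to_footnote, fragment, flags=re.I | re.S)
    text = re.sub(r"</?strong>", "*", text, flags=re.I)   # <strong>x</strong> -> *x*
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.I)   # line breaks
    text = re.sub(r"<[^>]+>", "", text)                   # last resort: drop other tags
    text = html.unescape(text).strip()                    # &amp; -> &
    if footnotes:
        text += "\n\n" + "\n".join(
            "[%d] %s" % (i, url) for i, url in enumerate(footnotes, 1))
    return text


print(html_to_plaintext_sketch(
    '<p>Read the <a href="http://example.com/">terms</a> &amp; '
    '<strong>conditions</strong><br/>now</p>'))
```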

UPDATE

I should take a second look at Aaron Swartz's html2text.py script the next time I work on this. His script seems a lot more mature and Aaron is a brilliant Python developer.

30th of April

I'm Prolog

Like many other fellow Python geeks on Planet Python, I too took the Which Programming Language Are You? quiz. Apparently I'm Prolog.

You are Prolog. You enjoy looking for different ways to solve a problem.  You take longer to solve them, but usually come up with more than one solution.

I've never used Prolog and I barely know how it works or what its syntax looks like. Well, I guess I'll just erase all my current projects and recode them in Prolog from now on. Impractical but necessary.

18th of April

Spellcorrector

I think a lot of Python people have seen Peter Norvig's beautiful article about How to Write a Spelling Corrector. So have I, and I couldn't wait to write my own little version of it to fit my needs.

The changes I added were:

  • Python 2.4 compatible
  • Uses a pickleable dict instead of a collection
  • Compiled a huge list of Swedish words
  • Skipped edit distance 2 for words longer than 10 characters
  • Added a function suggestions()
  • All Unicode instead
  • A class instead of a function
  • Ability to train on your own words and to save that training persistently
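
Several of the changes listed above can be sketched in a compact class. This is a hypothetical Python 3 illustration in the spirit of Norvig's corrector, not the actual Spellcorrector code; the class and method names other than suggestions() are assumptions, and the alphabet is limited to a-z for brevity.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


class SpellcorrectorSketch:
    def __init__(self, text=""):
        self.counts = Counter()   # word -> frequency; can be pickled
        self.trained = set()      # trained words are always assumed correct
        self.load(text)

    def load(self, text):
        self.counts.update(re.findall(r"\w+", text.lower()))

    def train(self, word):
        self.trained.add(word.lower())

    def _edits1(self, word):
        # All strings one delete, transpose, replace or insert away.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
        inserts = [a + c + b for a, b in splits for c in ALPHABET]
        return set(deletes + transposes + replaces + inserts)

    def _known(self, words):
        return {w for w in words if w in self.counts or w in self.trained}

    def suggestions(self, word):
        word = word.lower()
        if self._known([word]):
            return [word]
        edits1 = self._edits1(word)
        candidates = self._known(edits1)
        # Edit distance 2 is expensive, so skip it for long words.
        if not candidates and len(word) <= 10:
            candidates = self._known(
                e2 for e1 in edits1 for e2 in self._edits1(e1))
        return sorted(candidates, key=lambda w: (-self.counts[w], w))

    def correct(self, word):
        found = self.suggestions(word)
        return found[0] if found else word


sc = SpellcorrectorSketch("the quick brown fox jumps over the lazy dog")
print(sc.correct("teh"))
```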



1st of December

is is not the same as equal in Python

Don't make the silly mistake that I made today. I improved my code to better support Unicode by replacing all plain strings with unicode strings. In there I had code that looked like this:

 if type_ is 'textarea':
    do something

This was changed to:

 if type_ is u'textarea':
    do something

And it no longer matched, since type_ was a normal ASCII string. The correct way to do these things is like this:

 if type_ == u'textarea':
     do something
 elif type_ is None:
     do something else

Remember:

 >>> "peter" is u"peter"
 False
 >>> "peter" == u"peter"
 True
 >>> None is None
 True
 >>> None == None
 True
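
On Python 3 every str is Unicode, so this particular mismatch goes away, but the rule is unchanged: compare values with == and keep is for singletons such as None. A small sketch, with a hypothetical function name; whether two equal strings are also the same object is an implementation detail and should not be relied on.

```python
def describe_field(type_):
    if type_ == "textarea":   # compare string values with ==
        return "multi-line"
    elif type_ is None:       # `is` is the right test for the None singleton
        return "unspecified"
    return "single-line"


# A string built at runtime is equal to the literal, whatever its identity.
built_at_runtime = "".join(["text", "area"])
print(describe_field(built_at_runtime))
print(describe_field(None))
```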

14th of August

Fastest way to uniqify a list in Python

Suppose you have a list in Python that looks like this:

 ['a','b','a']
 # or like this:
 [1,2,2,2,3,4,5,6,6,6,6]

and you want to remove all duplicates so you get this result:

 ['a','b']
 # or
 [1,2,3,4,5,6]

How do you do that? ...the fastest way? I wrote a couple of alternative implementations and did a quick benchmark loop on the various implementations to find out which way was the fastest. (I haven't looked at memory usage). The slowest function was 78 times slower than the fastest function.
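
As an illustration of the problem (not necessarily one of the implementations benchmarked in the full post), here is a Python 3 sketch of an order-preserving version that remembers seen items in a set. The function name is hypothetical, and the items are assumed to be hashable.

```python
def uniqify(seq):
    """Return the items of `seq` without duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:   # set membership is a constant-time check on average
            seen.add(item)
            result.append(item)
    return result


print(uniqify(['a', 'b', 'a']))
print(uniqify([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6]))
```

On Python 3.7 and later, list(dict.fromkeys(seq)) gives the same result, because dictionaries preserve insertion order.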



8th of August

Unicode strings to ASCII ...nicely

http://effbot.org/librarybook/unicodedata.htm 

This has been a problem for me for a long time. Whenever someone enters a title in my CMS, the id of the document is derived from the title. Spaces are replaced with '-', '&' is replaced with 'and', and so on. The final thing I wanted to do was to make sure the id is ASCII encoded when it's saved. My original attempt looked like this:

 >>> title = u"Klüft skräms inför på fédéral électoral große"
 >>> print title.encode('ascii','ignore')
 Klft skrms infr p fdral lectoral groe

But as you can see, a lot of the characters are gone. I'd much rather that a word like "Klüft" is converted to "Kluft" which will be more human readable and still correct. My second attempt was to write a big table of unicode to ascii replacements.

It looked something like this:

 u'\xe4': u'a',
 u'\xc4': u'A',
 etc...
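
One alternative to maintaining a large table by hand is to let the unicodedata module decompose accented characters and then drop the combining marks. The Python 3 sketch below is an assumption about how that could look, not necessarily the approach the full post settles on; to_ascii and EXTRA_REPLACEMENTS are hypothetical names.

```python
import unicodedata

# Characters with no decomposition still need a small table.
EXTRA_REPLACEMENTS = {"\u00df": "ss"}  # German sharp s


def to_ascii(text):
    for char, replacement in EXTRA_REPLACEMENTS.items():
        text = text.replace(char, replacement)
    # NFKD splits e.g. u-umlaut into "u" plus a combining diaeresis;
    # encoding with "ignore" then drops the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Kl\u00fcft skr\u00e4ms inf\u00f6r p\u00e5"))
```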



 
