I wrote this little function because I needed to add some parameters to a URL that I was going to open with urllib2. The benefit of this script is that it can combine any URL with some structured parameters, even when the URL already contains a query string (aka CGI parameters). Here's how to use it, assuming it is placed in a file called 'urlfixer.py':


>>> from urlfixer import parametrize_url
>>> parametrize_url('https://www.peterbe.com?some=thing',
                    any='one', tv="b b c")
'https://www.peterbe.com?some=thing&tv=b+b+c&any=one'
>>> 

The function needed some extra attention (read: hack) if the starting URL was of the form http://foo.com?bar=xxx, which is non-standard. The standard way would be http://foo.com/?bar=xxx. You can download urlfixer.py or read it here:


from urlparse import urlparse, urlunparse
from urllib import urlencode

def parametrize_url(url, **params):
   """ don't just add the **params because the url
   itself might contain CGI variables embedded inside
   the string. """
   url_parsed = list(urlparse(url))

   encoded = urlencode(params)
   qs = url_parsed[4]
   if encoded:
       if qs:
           qs += '&'+encoded
       else:
           qs = encoded
   netloc = url_parsed[1]
   if netloc.find('?')>-1:
       url_parsed[1] = url_parsed[1][:netloc.find('?')]
       if qs:
           qs = netloc[netloc.find('?')+1:]+'&'+qs
       else:
           qs = netloc[netloc.find('?')+1:]

   url_parsed[4] = qs

   url = urlunparse(url_parsed)
   return url
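
The listing above targets Python 2 (urlparse and urllib as separate modules). For comparison, here is a minimal sketch of how the same idea might look on Python 3, assuming the urllib.parse module and a parser that already splits http://foo.com?bar=xxx correctly, so the netloc hack is left out. This is an illustration under those assumptions, not a drop-in replacement for urlfixer.py:

```python
from urllib.parse import urlsplit, urlunsplit, urlencode


def parametrize_url(url, **params):
    """Append params to url, keeping any query string it already has."""
    parts = urlsplit(url)
    encoded = urlencode(params)
    # Join the existing query and the new parameters, skipping empty pieces
    # so no stray '&' or '?' ends up in the result.
    query = "&".join(piece for piece in (parts.query, encoded) if piece)
    return urlunsplit(parts._replace(query=query))


print(parametrize_url("https://www.peterbe.com?some=thing",
                      any="one", tv="b b c"))
```

On Python 3.7 and later, keyword arguments keep their order, so the new parameters are appended in the order they were passed. In the Python 2 version the order depends on the dict implementation.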

Comments

chuy

Look, I have a problem: I need to create a unique parameter, like an id, for a URL, and that parameter has to get passed to a form. I don't know how to do it, so I need your help.

Anonymous

As '?' cannot be in url_parsed.netloc, 'netloc.find('?') > -1' is always false, so that block is useless.

Using '.find()' is discouraged; the Pythonic idiom is 'if "?" in netloc'.

Anonymous

I guess the hack was necessary precisely because of the non-standard 'http://foo.com?bar=xx' form. As of Python 2.5 this is parsed correctly:

>>> u = urlparse('http://myfoo.com?a')
>>> u.netloc
'myfoo.com'
>>> u.query
'a'
