17 December 2007 17 comments Python

I need a mini calculator in my web app so that people can enter basic mathematical expressions instead of having to work them out themselves and then enter the result in the input box. I want them to be able to enter "3*2" or "110/3" without having to do the math first. I want this to work like a pocket calculator, such that `110/3` returns `36.6666666667` and not `36` like pure Python integer arithmetic would. Here's a solution that works, but it divides the way Python does:

```
import math

def safe_eval(expr, symbols={}):
    # Evaluate with no builtins; only the names in `symbols` are visible
    return eval(expr, dict(__builtins__=None), symbols)

def calc(expr):
    return safe_eval(expr, vars(math))

assert calc('3*2') == 6
assert calc('12.12 + 3.75 - 10*0.5') == 10.87
assert calc('110/3') == 36
```

But to make it work like non-Python-geek users would expect it I ended up with the following solution which also adds a few more bells and whistles:

```
import math
import re

integers_regex = re.compile(r'\b[\d\.]+\b')

def calc(expr, advanced=False):
    def safe_eval(expr, symbols={}):
        return eval(expr, dict(__builtins__=None), symbols)

    def whole_number_to_float(match):
        group = match.group()
        if group.find('.') == -1:
            return group + '.0'
        return group

    expr = expr.replace('^','**')
    expr = integers_regex.sub(whole_number_to_float, expr)
    if advanced:
        return safe_eval(expr, vars(math))
    else:
        return safe_eval(expr)

def test():
    print calc("147.43 - 40") # 107.43
    print calc('110/3') # 36.6666666667
    print calc('110/3.0') # 36.6666666667
    print calc('(10-(3+5))^2') # 4.0
    print calc('sys.exit(100)') # None
    print calc('a+b') # None
    print calc('(3+10))') # None
    print calc('del expr') # None
    print calc('cos(2*pi)') # None
    print calc('pow(3,2)', advanced=True) # 9.0
    print calc('cos(2*pi)', advanced=True) # 1.0
```

What this does is turn whole numbers into floating-point-looking numbers before the expression is evaluated. It also replaces `^` with `**`, so that `^` works as an alias for exponentiation, because I think most non-Python people expect `10^2` to be 100.
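To make the rewriting step concrete, here is a small sketch that applies only the two substitutions and returns the resulting string instead of evaluating it. The helper name `preprocess` is made up for this illustration; the regex and the replacement function are the ones from the code above.

```python
import re

# Same pattern as in the post: a run of digits and dots between word boundaries.
numbers_regex = re.compile(r'\b[\d\.]+\b')

def whole_number_to_float(match):
    group = match.group()
    if '.' not in group:
        return group + '.0'
    return group

def preprocess(expr):
    # Hypothetical helper: the two rewriting steps of calc(), without the eval.
    expr = expr.replace('^', '**')
    return numbers_regex.sub(whole_number_to_float, expr)

for raw in ['110/3', '147.43 - 40', '(10-(3+5))^2', 'cos(2*pi)']:
    print('%-14s -> %s' % (raw, preprocess(raw)))
```

So `110/3` reaches `eval` as `110.0/3.0`, which is why the division no longer truncates.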

I haven't put this into production yet. I'm still playing around with it to get a feel for how it could work and what the implications might be. There is of course more work needed to wrap this with try-except statements so that dodgy attempts are captured correctly.
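One possible shape for that try-except wrapping is sketched below. The name `calc_or_none` is hypothetical, and the sketch leaves out the number rewriting to stay short. It only turns failures into `None`; it does not make `eval` any safer.

```python
import math

def calc_or_none(expr, advanced=False):
    # Hypothetical wrapper: returns None instead of raising on bad input.
    symbols = vars(math) if advanced else {}
    try:
        return eval(expr, dict(__builtins__=None), symbols)
    except Exception:
        # Covers SyntaxError, NameError, ZeroDivisionError and friends.
        return None
```

With this in place, inputs such as `'a+b'`, `'(3+10))'` and `'sys.exit(100)'` come back as `None` rather than raising.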

## Comments

If you use "from __future__ import division" then "1/2" returns 0.5.

Note that it's still possible to do evil things like `cos.__class__.__bases__[0].__subclasses__()` and get access to other types in the system, or create a list comprehension which grabs a huge amount of memory.
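A minimal sketch of that first point, using the same restricted globals as `safe_eval()` in the post. Attribute access needs no builtins, so the expression walks from `cos` up to `object` and lists the loaded classes. The variable names here are only for illustration.

```python
import math

# Same restricted environment as safe_eval() in the post.
restricted_globals = dict(__builtins__=None)

# From the `cos` function to its type, to `object`, to every subclass of object.
leaked = eval('cos.__class__.__bases__[0].__subclasses__()',
              restricted_globals, vars(math))

print(len(leaked) > 0)
```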

How does that work? If I do that import, won't it taint the rest of the module? (At the moment I have my function 'calc()' in a file called utils.py.)

About the dunder __, I'll just kick that out with a search for the string '__'

Yes, the __future__ import affects all eval and exec statements inside that module, and only that module. But does that change actually affect other code in the module? If so, move the eval code into its own module. That would isolate the problem.
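Another way to keep the change local, sketched here as an assumption rather than something proposed in this thread, is to pass the feature's compiler flag to the built-in `compile()` for just the one expression. The helper name is hypothetical. On Python 3 true division is already the default, so the flag makes no difference there.

```python
import __future__

def eval_with_true_division(expr):
    # Hypothetical helper: compile only this expression with the `division`
    # future feature switched on, then evaluate it without builtins.
    code = compile(expr, '<calc>', 'eval', __future__.division.compiler_flag)
    return eval(code, dict(__builtins__=None), {})

print(eval_with_true_division('1/2'))
```

The rest of the module keeps whatever division behaviour it had before.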

You can't search for "__" because someone can use "_"+"_" or even "_" "_" because of the implicit string concatenation by the parser.

Hi,

The following will check the input and make it safe to use. It lets the user use all functions in the `math` module as well as `natural` expressions.

```
import math
import re

whitelist = '|'.join(
    # operators, digits
    ['-', '\+', '/', '\\', '\*', '\^', '\*\*', '\(', '\)', '\d+']
    # functions of math module (excluding __xxx__)
    + [f for f in dir(math) if f[:2] != '__'])

valid = lambda exp: re.match(whitelist, exp)

>>> valid('23**2')
<_sre.SRE_Match object at 0xb78ac218>
>>> valid('del exp') == None
True
```

Thanks! Every little helps.

Instead of checking to see if the string contains a valid expression, it might be better to see if it is a valid expression:

```
whitelist = '^(' + '|'.join(
    # operators, digits
    ['-', r'\+', '/', r'\\', r'\*', r'\^', r'\*\*', r'\(', r'\)', r'\d+']
    # functions of math module (excluding __xxx__)
    + [f for f in dir(math) if f[:2] != '__']) + ')*$'
```

The little "r"s are just to make the strings work more correctly, the "^...$" forces it to check the whole string, and the "(...)*" matches an arbitrary sequence of allowable tokens. Now re.match(whitelist, expr) actually does what was expected above.
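The difference between the two patterns can be sketched as follows. The token list is the one from the comments above. The names `unanchored`, `anchored` and `evil` are made up for this illustration.

```python
import math
import re

# Operators, parentheses, digits, and the public names of the math module.
tokens = ['-', r'\+', '/', r'\\', r'\*', r'\^', r'\*\*', r'\(', r'\)', r'\d+']
tokens += [name for name in dir(math) if not name.startswith('__')]

# First version: only requires that the string *starts* with one allowed token.
unanchored = re.compile('|'.join(tokens))
# Second version: the whole string must be a sequence of allowed tokens.
anchored = re.compile('^(' + '|'.join(tokens) + ')*$')

evil = '2+__import__("os")'
print(bool(unanchored.match(evil)))  # accepted, because it starts with a digit
print(bool(anchored.match(evil)))    # rejected, because of the other characters
```

Note that this token list contains neither a decimal point nor whitespace, so the anchored pattern also rejects `110/3.0`. It still accepts expensive input such as a long chain of `**`.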

Cool! Thanks!

And of course, be careful with calc('9999999**99999999'), an easy denial-of-service possibility.

I could do a regex on something like '\d{6,99}' to filter out too big numbers.

What is 9**9**9**9**9**9**9**9?

It's very hard to make Python's eval safe. It's much easier to use something like PyParsing or PLY to parse the string yourself, and in doing that add the extra precautions you need, like checking for too large results before actually doing the computation.

If you can trust your users then don't worry about it.

I think arbitrary code execution is fairly easy with this system.

Another option would be to start another Python process in a chroot jail, and send expressions to that process and get the response back. You could place process limits on the executable to avoid some DoS problems.

I guess you did consider it, but why don't you simply use Javascript on the client side? Maybe the app must run without JS too, in which case this is of course no option, but on the other hand this feature seems like an add-on only? There are also not quite as many security issues as with a server-side solution (JS eval is not completely safe either, but the same issues on the server carry a much higher risk).

I'm only considering Javascript as a nice-if.

You can also use the compiler module to create an abstract syntax tree and then traverse it and do the computation yourself. That way you get complete control of what you allow in the computation. It should be safer than the other options, as the user code isn't actually run by the Python interpreter.

A nice example of how to do that is at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/364469.
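The `compiler` module no longer exists in Python 3, so the sketch below swaps in the standard `ast` module to show the same idea: parse the expression, walk the tree, and compute only the node types you explicitly allow. The function names and the `MAX_EXPONENT` cap are assumptions made for this illustration, not taken from the linked recipe.

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

MAX_EXPONENT = 1000  # arbitrary cap to refuse things like 9**9**9**9

def ast_calc(expr):
    # '^' is treated as exponentiation, as in the post.
    tree = ast.parse(expr.replace('^', '**'), mode='eval')
    return _evaluate(tree.body)

def _evaluate(node):
    # Plain numbers only (bool and str constants are refused).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left = _evaluate(node.left)
        right = _evaluate(node.right)
        # Check the exponent before computing the power.
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError('exponent too large')
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    # Names, calls, attribute access, comprehensions: all refused.
    raise ValueError('unsupported expression: %s' % type(node).__name__)

print(ast_calc('(10-(3+5))^2'))
```

Because names and calls are refused outright, supporting `cos` or `pi` would mean adding an explicit table of allowed names. Malformed input still raises `SyntaxError` from `ast.parse`, so a caller would want to catch that as well.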

just came across this article:

http://blog.dowski.com/2007/12/19/simpleparse-plug/

it might be helpful too. ;-)

```
import re, math

class calculator(object):

    def __init__(self):
        self.error = None
        self.intRegex = re.compile(r'\b[\d\.]+\b')

    def _return(self, value, error=None):
        if error:
            self.error = error
        return value

    def _safeEval(self, expr, symbols={}):
        return eval(expr, dict(__builtins__=None), symbols)

    def _toFloat(self, match):
        group = match.group()
        if group.find('.') == -1:
            return group + '.0'
        return group

    def calc(self, expr, advanced=False):
        self.error = None
        expr = expr.replace('^','**')
        expr = self.intRegex.sub(self._toFloat, expr)
        try:
            if advanced:
                return self._return(self._safeEval(expr, vars(math)))
            else:
                return self._return(self._safeEval(expr))
        except Exception, e:
            return self._return(None, error=e)

    def fancyCalc(self, expr, advanced=False):
        result = self.calc(expr, advanced=advanced)
        # Check the recorded error, not the result, so that 0.0 is a valid answer
        if self.error is not None:
            return "Error [{1}]: `{0}`".format(self.error, self.error.__class__.__name__)
        else:
            return result

calc = calculator()
for equation in ["2+2","test"]:
    print "Result for equation `{0}` is: {1}".format(equation, calc.fancyCalc(equation))
```

That's my little addition :3 - thanks!!

I suggest adding

expr = string.replace(expr,",",".")

at the beginning to handle users who type "," instead of ".", because it is common to write 3,14 instead of 3.14 in several European countries.
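A slightly narrower variant, sketched here as an assumption, replaces a comma only when it sits directly between two digits. The helper name is hypothetical. It still cannot tell a decimal comma from an argument separator, so `pow(3,2)` in advanced mode would be rewritten too.

```python
import re

def normalize_decimal_comma(expr):
    # Hypothetical helper: turn "3,14" into "3.14", leave other commas alone.
    return re.sub(r'(?<=\d),(?=\d)', '.', expr)

print(normalize_decimal_comma('3,14*2'))
```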