Embedding a Low Performance Scripting Language in Python
I have a web application. As part of this, I need users of the app to be able to write (or copy and paste) very simple scripts to run against their data.
The scripts really can be very simple, and performance is only a minor issue. An example of the sophistication of script I mean is something like:
    ratio = 1.2345678
    minimum = 10

    def convert(money)
        return money * ratio
    end

    if price < minimum
        cost = convert(minimum)
    else
        cost = convert(price)
    end
where price and cost are global variables (something I can feed into the environment and access after the computation).
I do, however, need to guarantee some stuff.
Any scripts run cannot get access to the Python environment. They cannot import stuff, call methods I don't explicitly expose to them, read or write files, spawn threads, etc. I need total lockdown.
I need to be able to put a hard limit on the number of 'cycles' that a script runs for. Cycles is a general term here: it could be VM instructions if the language is byte-compiled, apply-calls for an eval/apply loop, or just iterations through some central processing loop that runs the script. The details aren't as important as my ability to stop something running after a short time and send an email to the owner saying, "Your script seems to be doing more than adding a few numbers together - sort it out."
It must run on vanilla unpatched CPython.
So far I've been writing my own DSL for this task. I can do that. But I wondered if I could build on the shoulders of giants. Is there a mini-language available for Python that would do this?
There are plenty of hacky Lisp variants (even one I wrote on GitHub), but I'd prefer something with a more non-specialist syntax (more C or Pascal, say), and since I'm considering this as an alternative to coding one myself, I'd like something a bit more mature.
Any ideas?
Here is my take on this problem. Requiring that the user scripts run inside vanilla CPython means you either need to write an interpreter for your mini-language, or compile it to Python bytecode (or use Python as your source language) and then "sanitize" the bytecode before executing it.
I've gone with a quick example based on the assumption that users can write their scripts in Python, and that the source and bytecode can be sufficiently sanitized through some combination of filtering unsafe syntax from the parse tree and/or removing unsafe opcodes from the bytecode.
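To make the "filter unsafe syntax from the parse tree" idea concrete, here is a minimal sketch using only the standard `ast` module. It is written for Python 3.8+ (the answer's own code targets Python 2), and the names `ALLOWED_NODES`, `check_source` and `run_script` are hypothetical, as is the particular whitelist. It is an illustration of the approach, not a vetted sandbox.

```python
import ast

# Hypothetical whitelist: only the node types a simple arithmetic script needs.
# Anything else (imports, attribute access, loops, try/except, ...) is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.AugAssign, ast.If,
    ast.FunctionDef, ast.Return, ast.arguments, ast.arg,
    ast.Call, ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.UnaryOp, ast.Compare, ast.BoolOp,
    ast.operator, ast.unaryop, ast.cmpop, ast.boolop,
)


def check_source(source):
    """Parse source and raise ValueError on any node outside the whitelist."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
        # Dunder names such as __import__ are a common escape route.
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError("disallowed name: %s" % node.id)
    return tree


def run_script(source, price):
    """Run a checked script with 'price' fed in and 'cost' read back out."""
    tree = check_source(source)
    # An empty __builtins__ means the script only sees what we expose.
    env = {"__builtins__": {}, "price": price}
    exec(compile(tree, "<user>", "exec"), env)
    return env.get("cost")
```

With this whitelist, a Python rendering of the question's example script passes the check, while `import os` or any attribute access is rejected before anything is compiled. A whitelist is easier to reason about than a blacklist, but it does nothing about runaway computation, which is what the watchdog part addresses.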
The second part of the solution requires that the user script bytecode be periodically interrupted by a watchdog task, which will ensure that the user script does not exceed some opcode limit, and for all of this to run on vanilla CPython.
Summary of my attempt, which mostly focuses on the 2nd part of the problem.
- User scripts are written in Python.
- Use byteplay to filter and modify the bytecode.
- Instrument the user's bytecode to insert an opcode counter and calls to a function which context switches to the watchdog task.
- Use greenlet to execute the user's bytecode, with yields switching between the user's script and the watchdog coroutine.
- The watchdog enforces a preset limit on the number of opcodes which can be executed before raising an error.
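For readers without byteplay and greenlet installed, the budget idea in the list above can be approximated with the standard library alone. The sketch below swaps in a different technique: it uses `sys.settrace` to count executed source lines instead of instrumenting bytecode, and it is written for Python 3. The names `BudgetExceeded` and `run_with_budget` are hypothetical, and the unit counted is line events rather than opcodes.

```python
import sys


class BudgetExceeded(RuntimeError):
    """Raised when the traced script uses more line events than allowed."""


def run_with_budget(source, limit, env=None):
    """Execute source, aborting once more than `limit` lines have run.

    Returns the number of line events counted. Assumption: counting
    source lines is a good enough stand-in for counting opcodes.
    """
    env = {} if env is None else env
    code = compile(source, "<user>", "exec")
    count = [0]

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the user's script.
        if frame.f_code.co_filename != "<user>":
            return None
        if event == "line":
            count[0] += 1
            if count[0] > limit:
                raise BudgetExceeded("script used too many resources")
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, env)
    finally:
        # Always restore whatever trace function was active before.
        sys.settrace(previous)
    return count[0]
```

One caveat with this variant: the exception is raised inside the user's frame, so a script containing `try`/`except` could swallow it, and CPython unsets the trace function once it has raised. It therefore only makes sense alongside a syntax filter that rejects exception handlers. The bytecode instrumentation in the answer takes a different route, switching out to a separate watchdog greenlet that raises the error.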
Hopefully this at least goes in the right direction. I'm interested to hear more about your solution when you arrive at it.
Source code for lowperf.py:
    # std
    import ast
    import dis
    import sys
    from pprint import pprint

    # vendor
    import byteplay
    import greenlet

    # bytecode snippet to increment our global opcode counter
    INCREMENT = [
        (byteplay.LOAD_GLOBAL, '__op_counter'),
        (byteplay.LOAD_CONST, 1),
        (byteplay.INPLACE_ADD, None),
        (byteplay.STORE_GLOBAL, '__op_counter')
        ]

    # bytecode snippet to perform a yield to our watchdog tasklet.
    YIELD = [
        (byteplay.LOAD_GLOBAL, '__yield'),
        (byteplay.LOAD_GLOBAL, '__op_counter'),
        (byteplay.CALL_FUNCTION, 1),
        (byteplay.POP_TOP, None)
        ]

    def instrument(orig):
        """
        Instrument bytecode.  We place a call to our yield function before
        jumps and returns.  You could choose alternate places depending on
        your use case.
        """
        line_count = 0
        res = []
        for op, arg in orig.code:
            line_count += 1

            # NOTE: you could put an advanced bytecode filter here.

            # whenever a code block is loaded we must instrument it
            if op == byteplay.LOAD_CONST and isinstance(arg, byteplay.Code):
                code = instrument(arg)
                res.append((op, code))
                continue

            # 'SetLineno' opcode is a safe place to increment our global
            # opcode counter.
            if op == byteplay.SetLineno:
                res += INCREMENT
                line_count += 1

            # append the opcode and its argument
            res.append((op, arg))

            # if we're at a jump or return, or we've processed 10 lines of
            # source code, insert a call to our yield function.  you could
            # choose other places to yield more appropriate to your app.
            if op in (byteplay.JUMP_ABSOLUTE, byteplay.RETURN_VALUE) \
                    or line_count > 10:
                res += YIELD
                line_count = 0

        # finally, build and return new code object
        return byteplay.Code(res, orig.freevars, orig.args, orig.varargs,
            orig.varkwargs, orig.newlocals, orig.name, orig.filename,
            orig.firstlineno, orig.docstring)

    def transform(path):
        """
        Transform the Python source into a form safe to execute and return
        the bytecode.
        """
        # NOTE: you could call ast.parse(data, path) here to get an
        # abstract syntax tree, then filter that tree down before compiling
        # it into bytecode.  i've skipped that step as it is pretty verbose.
        data = open(path, 'rb').read()
        suite = compile(data, path, 'exec')
        orig = byteplay.Code.from_code(suite)
        return instrument(orig)

    def execute(path, limit = 40):
        """
        This transforms the user's source code into bytecode, instrumenting
        it, then kicks off the watchdog and user script tasklets.
        """
        code = transform(path)
        target = greenlet.greenlet(run_task)

        def watcher_task(op_count):
            """
            Task which is yielded to by the user script, making sure it doesn't
            use too many resources.
            """
            while 1:
                if op_count > limit:
                    raise RuntimeError("script used too many resources")
                op_count = target.switch()

        watcher = greenlet.greenlet(watcher_task)
        target.switch(code, watcher.switch)

    def run_task(code, yield_func):
        "This is the greenlet task which runs our user's script."
        globals_ = {'__yield': yield_func, '__op_counter': 0}
        eval(code.to_code(), globals_, globals_)

    execute(sys.argv[1])
Here is a sample user script, user.py:
    def otherfunc(b):
        return b * 7

    def myfunc(a):
        for i in range(0, 20):
            print i, otherfunc(i + a + 3)

    myfunc(2)
Here is a sample run:

    % python lowperf.py user.py
    0 35
    1 42
    2 49
    3 56
    4 63
    5 70
    6 77
    7 84
    8 91
    9 98
    10 105
    11 112
    Traceback (most recent call last):
      File "lowperf.py", line 114, in <module>
        execute(sys.argv[1])
      File "lowperf.py", line 105, in execute
        target.switch(code, watcher.switch)
      File "lowperf.py", line 101, in watcher_task
        raise RuntimeError("script used too many resources")
    RuntimeError: script used too many resources