Wednesday, October 7, 2009

From now on mainly blogging on Atbrox.com's blog

Atbrox is my new startup company (I left Google 1 month ago after 4 exciting years).

Sunday, March 8, 2009

Snakes on a Cloud

.. or Mobile Agents with Python

What are mobile agents?
"A mobile agent is a process that can transport its state from one environment to another, with its data intact, and be capable of performing appropriately in the new environment", source: Wikipedia.

Why Mobile Agents .. or how to deal with Data Gravity
The phrase "with its data" most likely refers to the agent's state data (and not all types of data), since one of the nicest properties of mobile agents is that they can move to where potentially huge amounts of data is. By plotting the curve "data gravity" - i.e. a rough estimate of how long time it takes to empty/fill/process a hard drive over a network connection - hard disk size divided by (typical) ethernet network speed over the last 20 years - the motivation for moving code to data (and not vice versa) is clearly increasing, making mobile agents a potentially interesting approach.

Mobile Agent Runtime Environment
A basic requirement for a mobile agent runtime environment is the ability to receive and run the agent's code, e.g. typically support one/several of the following (with Python-related examples):
i) receive and run binary code
python example: receive python compiled to binary with Shedskin and g++
ii) receive and compile source code and run it
python example: receive c source code and compile/integrate it with Python using Cinpy, or use Shedskin/g++ on received Python code
iii) receive and run interpreter on source code
python example: receive python code and interpret using the eval() method.

Bandwidth
If the mobile agents move around a bit, you probably want their representation as compact as possible to reduce bandwidth requirements, i.e. prefer agents represented in (small amounts of) source code - alternative ii) or iii) - over (larger amounts of) binary code - alternative i).

Processing
compiled code (or just-in-time compiled code) is usually more efficient than interpreted code (interpreted code can perhaps be seen as analog to the "gas guzzling cars" of computing wrt resource utilization, fortunately there are tools to deal with that), so alternative ii) is probably preferred over iii), and with the cinpy case compilation overhead is negligible (a few milliseconds to compile and make a short C method ready to be called from Python), which matter if you have a large amount of distributed mobile agents. Pareto principle also matters for mobile agents, so a mix (80-99% of code) in interpreted Python and the things that really need to perform in C (quickly compiled with cinpy) might be a common mix.

Example of Cinpy-wrapped C function in python
fibc=cinpy.defc(
"fib",
ctypes.CFUNCTYPE(ctypes.c_int,ctypes.c_int),
"""
int fib(int x) {
if (x<=1) return 1; return fib(x-1)+fib(x-2); } """)


Security
In case mobile agents move around on the cloud it is nice to know that the agent you receive is from a known source (yourself), this can e.g. be done using ezPyCrypto's signString() to sign the agent source and then use verifyString() methods on the signed agent source code together with its signature to check the origin of the agent (assuming the signer's public key is available on the receiving end).

disclaimer: this posting (and all others on this blog) only represents my personal views.

Saturday, December 27, 2008

Ajax with Python - Combining PyJS and Appengine with json-rpc

I recently (re)discovered pyjs  - also called Pyjamas - which  is a tool to support development of (client-side) Ajax applications with Python, it does that by compiling Python code to Javascript (Pyjamas is inspired by GWT  - which supports writing Ajax applications in Java).

pyjs' kitchensink comes with a JSON-RPC example, this posting shows how to use Appengine to serve as a JSON-RPC server for the pyjs json-rpc example.

(Rough) Steps:
1) download appengine SDK and create an appengine application (in the dashboard)
2) download pyjs 
3) Replace pyjs example/jsonrpc/JSONRPCExample.py with this code

from ui import RootPanel, TextArea, Label, Button, HTML, VerticalPanel, HorizontalPanel, ListBox


from JSONService import JSONProxy


class JSONRPCExample:
    def onModuleLoad(self):
        self.TEXT_WAITING = "Waiting for response..."
        self.TEXT_ERROR = "Server Error"
        self.remote_py = UpperServicePython()
        self.status=Label()
        self.text_area = TextArea()
        self.text_area.setText(r"Please uppercase this string")
        self.text_area.setCharacterWidth(80)
        self.text_area.setVisibleLines(8)
        self.button_py = Button("Send to Python Service", self)
        buttons = HorizontalPanel()
        buttons.add(self.button_py)
        buttons.setSpacing(8)
        info = r'This example demonstrates the calling of appengine upper(case) method with JSON-RPC from javascript (i.e. Python code compiled with pyjs to javascript).'
        panel = VerticalPanel()
        panel.add(HTML(info))
        panel.add(self.text_area)
        panel.add(buttons)
        panel.add(self.status)
        RootPanel().add(panel)


    def onClick(self, sender):
        self.status.setText(self.TEXT_WAITING)
        text = self.text_area.getText()
        if self.remote_py.upper(self.text_area.getText(), self) < 0:
            self.status.setText(self.TEXT_ERROR)


    def onRemoteResponse(self, response, request_info):
        self.status.setText(response)


    def onRemoteError(self, code, message, request_info):
        self.status.setText("Server Error or Invalid Response: ERROR " + code + " - " + message)


class UpperServicePython(JSONProxy):
    def __init__(self):
        JSONProxy.__init__(self, "/json", ["upper"])



4) Use the following code for the appengine app
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import logging
from django.utils import simplejson

class JSONHandler(webapp.RequestHandler):
  def json_upper(self,args):
    return [args[0].upper()]

  def post(self):
    args = simplejson.loads(self.request.body)
    json_func = getattr(self, 'json_%s' % args[u"method"])
    json_params = args[u"params"]
    json_method_id = args[u"id"]
    result = json_func(json_params)
    # reuse args to send result back
    args.pop(u"method")
    args["result"] = result[0]
    args["error"] = None # IMPORTANT!!
    self.response.headers['Content-Type'] = 'application/json'
    self.response.set_status(200)
    self.response.out.write(simplejson.dumps(args))

application = webapp.WSGIApplication(
                                     [('/json', JSONHandler)],
                                     debug=True)

def main():
  run_wsgi_app(application)

if __name__ == "__main__":
  main()
5) compile pyjs code in 3) and create static dir in appengine app to store compiled code (i.e in app.yaml)
6) use dev_appserver.py to test locally or appcfg.py to deploy on appengine

Facebook application
By using this recipe it was easy to create a facebook app of the example in this posting - check it out at http://apps.facebook.com/testulf/


Alternative backend - using webpy


This shows how to use webpy as backend, just put the javascript/html resulting from pyjs compile into the static/ directory. Very similar as the previous approach, with the code in blue being the main diff (i.e. webpy specific)
import web
import json
urls = (
'/json', 'jsonhandler'
)
app = web.application(urls, globals())

class jsonhandler:
  def json_upper(self,args):
    return [args[0].upper()]

  def json_markdown(self,args):
    return [args[0].lower()]

  def POST(self):
    args = json.loads(web.data())
    json_func = getattr(self, 'json_%s' % args[u"method"])
    json_params = args[u"params"]
    json_method_id = args[u"id"]
    result = json_func(json_params)
    # reuse args to send result back
    args.pop(u"method")
    args["result"] = result[0]
    args["error"] = None # IMPORTANT!!
    web.header("Content-Type","text/html; charset=utf-8")
    return json.dumps(args)

if __name__ == "__main__":
  app.run()

Wednesday, December 17, 2008

Simulating Mamma Mia under the xmas tree (with Python)

Norway has a population of ~4.7 million, and "Mamma Mia!" has during a few weeks been sold in ~600 thousand DVD/Blueray copies (> 12% of the population, i.e. breaking every sales record, perhaps with the exception of Sissel Kyrkjebø's Christmas Carol album which has sold a total of ~900 thousand copies in the same population, but that was over a period of 21 years).  In the UK it has sold more than 5 million (in a population of 59 million, i.e. >8% of the population).


Well, to the point. Guesstimating that there will be ~2 million xmas trees in Norway, one can assume that many of the trees will have (much) more than one "Mamma Mia!" dvd/blueray underneath it*,  the question is how many?

Simulating Mamma Mia under the xmas tree
.. Before you start recapping probability theory, combinatorics, birthday paradox, multiplication of improbabilities and whatnot, how about finding an alternative solution using simulation

A simple simulation model could be to assume that for every Mamma Mia dvd/blueray copy there is a lottery where all trees participate,  and then finally (or incrementally) count how many copies each tree won.

A friend wrote a nice Python script to simulate this:

import random

NUM_XMAS_TREES = 2000000
NUM_MAMMA_MIAS = 600000

tree_supplies = {}
for mamma_mia in range(NUM_MAMMA_MIAS):
    winner_tree = random.randint(0, NUM_XMAS_TREES)
    tree_supplies[winner_tree] = tree_supplies.get(winner_tree,0) + 1

tree_stats = {}
for tree in tree_supplies:
    tree_stats[tree_supplies[tree]] = tree_stats.get(tree_supplies[tree], 0) + 1

print tree_stats

Results:

$ for k in `seq 1 10`; do echo -n "$k " ; python mammamia.py; done
1 {1: 443564, 2: 67181, 3: 6618, 4: 510, 5: 36}
2 {1: 444497, 2: 66811, 3: 6543, 4: 520, 5: 32, 6: 2}
3 {1: 444796, 2: 66376, 3: 6738, 4: 510, 5: 36, 6: 3}
4 {1: 444499, 2: 66750, 3: 6652, 4: 469, 5: 30, 6: 2, 7: 1}
5 {1: 444347, 2: 66717, 3: 6697, 4: 494, 5: 28, 6: 2}
6 {1: 444511, 2: 66389, 3: 6763, 4: 551, 5: 40, 6: 3}
7 {1: 443914, 2: 66755, 3: 6785, 4: 511, 5: 33, 6: 2}
8 {1: 444747, 2: 66558, 3: 6667, 4: 484, 5: 40}
9 {1: 444553, 2: 66703, 3: 6631, 4: 497, 5: 32}
10 {1: 443903, 2: 66853, 3: 6774, 4: 487, 5: 23, 6: 1}

Conclusion

So we see that in run 4 there was one xmas tree that according to the simulation model got 7(!) Mamma Mia DVD/Bluerays underneath it, but the overall simulation shows that 5 or 6 (at most) is probably more likely (assuming the model is right).

Regarding the simulation model, it is probably way too simplistic, i.e. not taking into account people buying mamma mia for themselves (or not as xmas gifts), a likely skewness in terms of number of gifts per christmas tree, interaction between buyers, etc. But it can with relatively simple manners be extended with code to make it more realistic. Check out Markov Chain Monte Carlo simulation for more info on how to create more realistic simulation models.

Wednesday, December 3, 2008

cinpy - or C in Python

cinpy is a tool where you can write C code in your Python code (with the help of ctypes - included in modern Python versions). When you execute your python program the C code is compiled on the fly using Tiny C Compiler (TCC).

In this posting I will describe:
1) installation and testing cinpy
2) a simple benchmark (c-in-py vs python)
3) compare performance with gcc (c-in-py vs gcc)
4) measure cinpy (on-the-fly) compilation time
5) how to dynamically change cinpy methods

1. How to install and try cinpy (note: also found in cinpy/tcc README files)
1) download, uncompress and compile TCC
./configure
make
make install
gcc -shared -Wl,-soname,libtcc.so -o libtcc.so libtcc.o

2) download, uncompress and try cinpy
cp ../tcc*/*.so .
python cinpy_test.py  # you may have to comment out or install psyco 

2. Sample performance results (on a x86 linux box):

python cinpy_test.py
Calculating fib(30)...
fibc : 1346269 time: 0.03495 s
fibpy: 1346269 time: 2.27871 s
Calculating for(1000000)...
forc : 1000000 time: 0.00342 s
forpy: 1000000 time: 0.32119 s

Using cinpy for fibc (Fibonacci) method was ~65 times faster than fibpy, and and cinpy for forc (loop) was ~93 times faster than forpy, not bad.

3. How does cinpy (compiled with tcc) compare to gcc performance?
Copying the C fib() method and calling it with main program
$ time fibgcc


fib(30) = 1346269

real    0m0.016s
user    0m0.020s
sys     0m0.000s


GCC gives roughly twice as fast code as cinpy/tcc (0.034/0.016). 




4. How long time does it take for tcc to on-the-fly compile cinpy methods?


#!/usr/bin/env python
# cinpy_compile_performance.py
import ctypes
import cinpy
import time
t=time.time
t0 = t()
fibc=cinpy.defc(
     "fib",
     ctypes.CFUNCTYPE(ctypes.c_int,ctypes.c_int),
     """
     int fib(int x) {
       if (x<=1) return 1;
       return fib(x-1)+fib(x-2);
     }
     """)
t1 = t()
print "Calculating fib(30)..."
sc,rv_fibc,ec=t(),fibc(30),t()
print "fibc :",rv_fibc,"time: %6.5f s" % (ec-sc)
print "compilation time = %6.5f s" % (t1-t0)

python cinpy_compile_performance.py

Calculating fib(30)...
fibc: 1346269 time: 0.03346 s
compilation time: 0.00333 s


So compilation (and linking) time is about 3.3ms, which is reasonably good, not a lot of overhead! (note: TCC is benchmarked to compile, assemble and link 67MB of code in 2.27s)


5. "Hot-swap" replacement of cinpy code?
Let us assume you have a system with a "pareto" situation, i.e. 80-99% of the code doesn't need be fast (written in Python), but 1-20% need to be really high performance (and written in C using cinpy), and that you need to frequently change the high performing code, can that be done? Examples of such a system could be mobile (software) agents.


Sure, all you need to do is wrap your cinpy definition as a string and run exec on it, like this:

origmethod="""
fibc=cinpy.defc("fib",ctypes.CFUNCTYPE(ctypes.c_int,ctypes.c_int),
          '''
          int fib(int x) {
              if (x<=1) return 1;
              return fib(x-1)+fib(x-2);
          }
          ''')
"""
# add an offset to the Fibonacci method
alternatemethod = origmethod.replace("+", "+998+")
print alternatemethod
# alternatemethod has replaces origmethod with exec(alternatemethod)
print fibc(2) # = 1000
Conclusion:
cinpy ain't bad.

Remark: depending on the problem you are solving (e.g. if it is primarily network IO bound and not CPU bound) , becoming 1-2 orders of magnitude faster (cinpy vs pure python) is probably fast enough (CPU wise), the doubling from GCC may not matter (since network IO wise Python performs quite alright, e.g. with Twisted ). 

Saturday, November 29, 2008

Tools for Accelerating Python

If you need to speed up your Python program there are several possible approaches, with varying degree of effort needed, here is (probably not complete) overview:

  1. Rewrite your Python code by parallelizing or optimizing/replacing/tuning algorithm(s), e.g. using: 
  2. Use a tool that can speed up your code more or less unchanged
    • Psyco
      • Just in time compiler, note: this is probably the easiest approach to try
    • Pyrex
      • Write and compile Python with a flavor of C data structures
    • Cython 
    • PyJs 
      • Compile (large subset of) Python to Javascript, note: probably more interesting for client development (ajax) than server side.
    • Rpython
      • Compile (large subset of) Python to native code, note: part of PyPy project
    • PyStream
    • GPULib
    • Shedskin 
  3. Replace (parts of) your Python code with another language

Saturday, October 18, 2008

Example of Microsimulation with Stackless Python


Stackless Python is roughly Python with support for microthreads (called tasklets). Tasklets communicate with each other by sending messages through channels , i.e. similar to Erlang, but of course with Python syntax (guess which I prefer :-).  Stackless has a sibling called Greenlet Python , which is a library that can be used with traditional Python to get decent mictrothread support.

Scale
Tasklets have low overhead (compared to traditional threads) so you can have hundreds of thousands of them simultaneously, and when it gets overly crowded and busy on one machine, you can easily scale up by pickling some tasklets and send them to another machine (better known as mobile agents in computer science). Since tasklets also live Python's GIL regime, it might be idea to use the processing API to spawn several processes to better utilize multicore machines (and have many tasklets within each process), or use some of these libraries .

API Alignment Challenge
In Python 2.6 the process and thread APIs are aligned, but a missing piece would be to align those apis with stackless/tasklet apis. From my perspective a natural order would be process > thread > tasklet, and perhaps adding your favorite piece of the cloud would make sense for the API as well?

Example - simulation with 100k robots
The robots are playing a variant of musical chairs on a rectangular arena, it differs from regular musical chairs since if a chair has been sat on once, it can't be sat on again. Finding out who won is left as an exercise (hint: it can be found more than 1 place)

from random import randint
import stackless
import sys
import time

class Arena:
    def __init__(self, xdim, ydim):
        self.arena = [[None]*ydim for x in range(xdim)]
        (self.xdim, self.ydim) = (xdim, ydim)

    def find_unused_place_for_robot(self, robotname):
        (x, y) = randint(0,self.xdim-1), randint(0,self.ydim-1)
        if not self.arena[x][y]:
            self.arena[x][y] = robotname
        return self.arena[x][y] == robotname

class Robot:
    def __init__(self, name=0, arena=None, maxrounds=0):
        self.name = name
        self.arena = arena
        self.points = 0
        self.maxrounds = maxrounds

        # bind Robot's live method to it's tasklet
        self.tasklet = stackless.tasklet(self.live)()

    def live(self):
        rounds = 0
        while rounds < self.maxrounds:
            self.play()
            rounds += 1
        self.tasklet.kill()

    def play(self):
        # entire play method is atomically executed
        atomic = self.tasklet.set_atomic(True)
        if self.arena.find_unused_place_for_robot(self.name):
            self.points += 1
        self.tasklet.set_atomic(atomic)

class RobotArenaSimulator:
    def __init__(self, num_robots, xdim, ydim, maxrounds):
        self.maxrounds = maxrounds
        self.arena = Arena(xdim, ydim)
        self.robots = [Robot(id, self.arena, maxrounds)
                       for id in range(1,num_robots+1)]
    def run_preemptive(self, nopreempt_steps=1):
        tstart = time.clock()
        while stackless.getruncount() != 1:
            # run() takes out the tasklet after nopreempt_steps
            t = stackless.run(nopreempt_steps)
            # so need to add the tasklet to scheduler again
            if t: t.insert()
        return (time.clock()-tstart)

if __name__ == "__main__":
    tstart = time.clock()
    simulator = RobotArenaSimulator(num_robots = 100000,
                                    xdim = 10000, ydim = 10000,
                                    maxrounds = 10)
    simulationtime = simulator.run_preemptive()
    print "simulation time was %.2f seconds" % (simulationtime)
    print "total running time was %.2f seconds" % (time.clock()-tstart)