Supervision trees

Sep 21, 2011 at 2:48 PM

I want to implement supervision trees similar in spirit to Erlang's. My goal is to do exception handling on workers' behalf, stopping and restarting them as necessary. It seems I need to do some sort of method interception on workers, but I could be wrong. How would you go about implementing supervision trees?

Coordinator
Sep 22, 2011 at 3:29 PM

Hi mitri,

I'm not really familiar with this principle (at least not by this name). From what I saw in the Erlang documentation, here are some things that came to mind:

- Workers would probably be very similar to what they are in the appspace itself: each one handles a specific type of work item, so a general contract could look something like this:
class WorkerContract<T> : Port<T> where T : WorkItem
In the WorkItem class you could define some general properties about work items, e.g. that all of them have a response port and a progress port.

- Supervisors could either be appspace workers themselves or standalone applications that connect to workers (this depends on how they should receive the tasks they need to supervise).
  * To monitor whether a worker is still running, a work item could contain an "alive" port on which the worker sends messages to the supervisor, signalling that it is still working on a certain work item.
  * To catch a worker's errors, the supervisor can use a causality. I don't think you need any special interception mechanism here: if the worker throws an exception, the supervisor can handle it on the causality's exception port and then just give the task to the next available worker. You could say that this is the "stop and restart" mechanism: you start the task by posting it to some worker, so there is no need to stop a worker manually.
  * Over the work item's response port, the supervisor can see when the worker has finished the task.
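
To make the idea a bit more concrete, here is a minimal sketch of the "give the task to the next available worker" logic. It deliberately uses only plain .NET types rather than CCR ports or appspace workers: the Response and Alive callbacks stand in for the response and "alive" ports, and the try/catch stands in for the causality's exception port. All names in it (WorkItem's members, IWorker, Supervisor, FlakyWorker, EchoWorker) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical work item: the callbacks stand in for the response and "alive" ports.
public class WorkItem
{
    public string Payload { get; set; }
    public Action<string> Response { get; set; } = _ => { };
    public Action<string> Alive { get; set; } = _ => { };
}

// Hypothetical worker contract: a worker processes one work item at a time.
public interface IWorker
{
    string Name { get; }
    void Process(WorkItem item);
}

public class Supervisor
{
    private readonly IReadOnlyList<IWorker> workers;

    // Errors reported by failed workers, in the order they occurred.
    public List<string> Failures { get; } = new List<string>();

    public Supervisor(IReadOnlyList<IWorker> workers)
    {
        this.workers = workers;
    }

    // Hands the item to each worker in turn until one completes it.
    // Returns the name of the worker that finished, or null if all of them failed.
    public string Run(WorkItem item)
    {
        foreach (var worker in workers)
        {
            try
            {
                worker.Process(item);
                return worker.Name;
            }
            catch (Exception ex)
            {
                // Plays the role of the exception port: record the error,
                // then "restart" the task by moving on to the next worker.
                Failures.Add(worker.Name + ": " + ex.Message);
            }
        }
        return null;
    }
}

// A worker that always fails, to exercise the supervisor's error path.
public class FlakyWorker : IWorker
{
    public string Name { get { return "flaky"; } }

    public void Process(WorkItem item)
    {
        throw new InvalidOperationException("worker crashed");
    }
}

// A worker that reports it is alive and then answers with the upper-cased payload.
public class EchoWorker : IWorker
{
    public string Name { get { return "echo"; } }

    public void Process(WorkItem item)
    {
        item.Alive(Name);
        item.Response(item.Payload.ToUpperInvariant());
    }
}
```

If you create a Supervisor with a FlakyWorker followed by an EchoWorker and call Run on a work item, the first worker's exception ends up in Failures and the second worker delivers the result through Response. In a real setup the loop would be asynchronous, with the supervisor posting the item to a worker port and reacting to messages on the exception and response ports, but the decision logic would stay about the same.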

Best regards
Thomas