Dynamic Scalability and Failover

Aug 5, 2010 at 2:41 PM
Hi, I was wondering how to tackle 2 scenarios: 1) Dynamic scalability: With this I mean, the ability for a sysadmin (or a proces which monitors activity) to add certain components to the system to dynamically scale up and down the capacity when the system needs this. I saw the 'simpleloadbalancer' example on one of the wiki pages. Would this be the preferred way of doing something like this? 2) Automatic failover: When one of the machines (or components) fails, it should be so that another component on another machine takes over the workload. This way we can have highly available systems. (Also, it gives a way of updating components without ever having downtime). Is this functionality already present or is this also a layer which needs to be made on top of your framework? If so, do you have suggestions on what would be a good way to create this? thanks
Coordinator
Aug 6, 2010 at 11:48 AM

Both scenarios require an indirection, I´d say. A client does not send messages to the server, but to an intermediary.

1) Load balancing: This is easy if available resources (servers) query a central message store for work. Clients put their work into some queue and resources read from that queue whenever it suites them. This way you can add/remove resources at your will. If the central message store is empty, the intermediary can notify idle resources upon arrival of new work packets.

Compared to for example SQS the AppSpace has the benefit of being able to notify resources. (Or the hosting application of the intermediary AppSpace.)

2) Failover can be solved the same way. If one of the resources crashes, others will naturally take over. Not actively, but passively by processing messages left in the central store which usually would have been processed by the failing resource.

Aug 6, 2010 at 12:56 PM
I was thinking that load balancing could both be done at the sender and through an intermediary. The intermediary solution would give great benefits like you say (smart routing, failover,...), but would also potentially be a bottleneck (with high load). Sending to different components directly would remove the bottleneck, but adds knowledge of the topology at the client which also would not be very nice. Maybe something were a client component queries a central routing server for the current available components could be a nice middle ground. Your discovery example would lend itself for this I guess where based on a name it could give back a list of available workers (instead of 1 as it is now). The only single point of failure would then be the discovery server, but I guess creating a failover for this would also not be too hard.
Coordinator
Aug 9, 2010 at 6:53 AM

How does querying a "topology server" differ from sending messages via a load balancing server? Since topology can change any time (especially when server crashes need to be handled) the "topologoy server" needs to be queried for every message to be sent to servers.

In addition: load balancing at the client is not efficient because the many clients simply don´t know about the total load of the servers. How should they decide to which server to send messages to? This could only be done in a round robin fashion - which is the most primitive variant of load balancing.