Developing distributed component based software

Sep 26, 2012 at 12:09 AM
Edited Sep 26, 2012 at 5:54 AM

Dear Xcoordination Application Space team,

We came across your framework and found it to be extremely interesting and powerful. We are working on a software framework for laboratory automation systems and we are trying to see if we can use your framework to solve some of the problems we are facing in our framework. I am sorry if my post is very long.

Our framework is based on components that could be distributed over the network and communicate with each other by passing messages. I think the problem that we are trying to solve is a bit different from yours. We are mainly focused on how best to design individual  components in a thread-safe, deadlock free and responsive manner. We also want the interactions between components to appear the same irrespective of whether components communicate in the same processes or across different process spaces or even across different computers but however we are not really worried about how the communication happens internally as long as it is efficient.

Our components are CCR components and they are designed like the design pattern that you would typically use for DSS services. Every component has a state data that defines the "condition" the component is in along with the data that explains its internal state, and a PortSet that defines the set of operations the component supports. A component has a messages queue  where messages arrive and also has a dispatcher that has the threads to process these messages. Every component should implement a state machine that defines how the component should behave when it receives the set of messages it supports. The state machine is run by a state machine engine that executes the state machine logic in a thread-safe, deadlock free manner and also brings in determinism and completeness.

Every message that arrives at the messages queue is processed exclusively by the state machine and the component designer can decide whether you want to execute the message synchronously (blocking other messages from being processed usually done for messages that take very less time to process) or you can have the component go to a doing state where he processes the message asynchronously. This aspect of the design makes the components responsive and provides the ability to respond to messages when it is doing another operation. A doing state operation can change state only by passing messages to the messages queue thereby ensuring thread safety.

As per our framework, every component has the concept of a basic state data that every components should have, a basic set of messages every component must process and a basic state machine every component must follow. This commonality is captured by a base component that we have developed and the base component also has the state machine design logic that the components can use to build their state machine. The base component also has the logic for the state machine engine that the components must use to process messages. We want to be able to design components in such a way that the state data and operations could be extended by components that derive from them and this is highly critical for our design. Also, we want the components to be completely isolated from one another and every message that is exchanged must be cloned and sent. 

We are trying to model our components as DSS services. Although we are able to derive from the  DSS service service operations, we find it very difficult to derive from a DSS service state as custom serializing of message requests does not seem to be very easy (as DSS uses binary serialization). Also, we are not able to add some useful functions in the class that we are trying to serialize as they wont appear in the proxy because they are not [DataMember]. Also, there are lots of restrictions of what types could be serialized. Also, every DSS message can have only a single response but we want our messages to have two responses (one for Acknowledgement and another for completion). The cloning of message requests are handled by DSS and that is one of the aspects of DSS that we really like. Although, DSS satisfies some of our requirements, it does not seem to handle the scenario that we trying to use it.

I looked at your framework and found it very interesting. In particular, a message having multiple response port, a serialization mechanism that seems to be very simple to use. Can you please let me know whether your framework is designed to handle the scenario that we are trying to implement? Currently, we have designed our components as CCR components which are wrapped around DSS services. We make use of DSS as a transport mechanism and transfer all data as a binary serialized byte array and make use of reflection to post the message of the right type at the other end.

I once again apologize for my lengthy post. Please get back to me for any issues/clarifications.

Thanks,

Venkat

Oct 3, 2012 at 3:05 PM

Hello Venkat,

If I understand you correctly, you are looking for a way to distribute the CCR over multiple processes, and you are currently using DSS to do that. From your desription I would say yes, the appspace seems to be very suitable for what your are doing. When we started developing the appspace, we also reviewed the DSS, but found it too heavy-weight and hard to use. Many of the points you are having problems with now should hopefully be easier with the appspace:

  • You can define multiple response ports
  • If needed, you can define your own serialization mechanisms. By default, binary serialization is used (BinarySerializer class, not the DataContractSerializer from WCF), alternatively you can use JSON serialization which automatically serializes all public properties. From your description I don't know if you would really need to define your own serialization mechanism, because the appspace doesn't work with "proxies" - instead you define messages in a separate contract assembly and reference this assembly on the client and server side. This way no functions of the class itself will get lost on either side.
  • In contrast to DSS, where the port of a service is defined as a property, the appspace works with inheritance, meaning the worker itself is a port or portset. In our opinion this makes working with it more natural (more like the relation between service contract and implementation). You mentioned that you are also working with inheritance, so this may suit your needs better. E.g. if you have a worker with Messages types A, B and C, you could define the processing method for type A in the base class, and the methods for B and C in a sub class. What you can NOT do on the other hand, is extending your PortSet in a sub class, so you cannot derive a worker that supports additional message types. If you really need something like that, I see no other way than defining a Port<object> and casting the received messages. You could take a look at our concept of "worker extensions", which could let you implement and reuse such a casting logic for different workers.
  • The exclusive/concurrent message processing logic is also there, with the XcoConcurrent and XcoExclusive attributes.
  • For every worker you can of course define a state (though it's not a must).

For some points it is not exactly clear to me if there could be problems:

  • In contrast to your framework, not every worker has its own dispatcher and queue, but there is a single Dispatcher per appspace instance (normally = per process). But you if needed, you can overwrite which dispatcher queue a worker uses, so this shouldn't be a problem.
  • I'm not exactly sure what you mean when you say that every message needs to be cloned. Do you mean this also needs to be done when the message is processed locally? If yes, may I know what's the reason behind it? (Remotely it's no problem of course because the message automatically looses any relations, but locally the appspace doesn't do any kind of cloning. Except if you have your own appspace instance for every worker, so that remote communication would also be used to communicate on the same machine.)

Hope I could help your! If you have any further questions, feel free to ask.

Best regards
Thomas

Oct 3, 2012 at 8:44 PM
Edited Oct 3, 2012 at 9:11 PM

Hi Thomas,

Thank you so much for your time and response. We also felt that DSS was not designed for our kind of use and that is the reason why we are seeing if there are other alternatives.

1) Your understanding of our requirement is correct (that we want to distribute CCR over multiple processes) but with a small addition. Our level of separation is not processes but components. For us a component is an autonomous entity and there can be multiple components in the same process space. The only way to communicate to our component is by passing messages. Every message passed to the component has a body, an acknowledgement port and a completion port. The caller is notified that the message has been received at the ack port the result of the argument and other validation, and once the processing of the message is done, the caller is notified at the completion port the result of the processing and also the result (if any). So, when posting the message to the component, we want the message body to be cloned and posted. Also, we want the messages posted to the ACK and operation complete ports to also be cloned so that the components do not have any data dependency on one another. I hope that explains why we are cloning messages even when the components are in the same process space. DSS solves the cloning problem because every DSS service is hosted in a different app-domain but however DSS messages have only one response port. We are happy that appspace can have multiple response ports for a message.

2) As you said, we are not very intent in introducing new serializing mechanisms as long as one of the existing ones meets our performance and design requirements  and I believe more than one will.

3) I am familiar with the XcoConcurrent and XcoExclusive attributes and they are like the ServiceHandlerBehavior property of the DSS ServiceHandler attribute.

4) We use a dispatcher queue and dispatcher for every component because we would like the components not to share (memory, threads and other resources) with other components and want the components to be completely independent of one another.

5) From what you say about the appspace serialization and workers, and also from what I read from the documentation, I understand the following. Please correct me if I am wrong. There is a separate contract assembly that contains all the messages supported by the worker and also the contract (basically a PortSet) that the worker supports. You then have classes that implement these contracts (the implementations can be distributed across base classes and derived classes). The caller gets bound to one concrete implementation of the contract. All the messages that are sent by the caller are serialized using a binary serializer and transferred as byte arrays. 

Some of my questions may be because of the fact that I have not yet played around with appspace and I apologize for that.

Thanks,

Venkat

Oct 4, 2012 at 8:35 AM

Hello Venkat,

Yes it is correct what you understand about the serialization of workers and messages. The worker contract and messages are defined in a separate assembly, which is referenced by both client (caller) and server. The server provides an implementation for the worker, which doesn't need to be known by the client. So the client doesn't know which implementation the server uses, it just communicates by using the defined contract.

Concerning the cloning of messages, as I said with the appspace the easiest way to solve this would probably be to used an appspace instance per worker, and use remote communication even on the same machine. You will need to check if the performance is sufficient, but basically the appspace has good remote messaging performance (we measured up to 10000 messages per second for simple messages, but it strongly depends on the size of the message - there is an example in the usage demos for making performance tests).

Best regards
Thomas

Oct 4, 2012 at 9:25 AM
Edited Oct 4, 2012 at 11:09 AM

Hi Thomas,

Thank you once again for your time and reply. Regarding having a single dispatcher and queue per appspace instance, I think DSS also follows a similar design. They have a a single dispatcher and queue per DSS node and all the DSS services share it (although we can change the setting and make the dispatcher and the dispatcher queue created for individual services). From the CCR documentation and the MSRDS book by Dr. Trevor and Dr. Kyle Johns, I find that the dispatcher and queues are light weight and we can have a lot of them. From your experience of developing this framework, can you please tell me in your opinion whether dispatcher and queues are really light weight?

Regarding cloning of messages, as you said, I need to check whether the performance is sufficient for whatever we are doing. Do you think having the remote communication as a WCF named pipe for workers in the same process space fit the scenario and also give the best performance?

Also, we also need to look at whether having our components as appspace workers would fit in our design. One of the main things we need to look at is how we are going to support creation of new components by extending messages supported by existing components.

Thanks,

Venkat

Oct 4, 2012 at 1:26 PM

Hi Venkat,

That is an interesting question. In my opinion yes, the dispatcher and queue are light weight structures, so having multiple of them shouldn't be a problem. From our experience, there are situations where it is absolutely necessary to have multiple dispatchers, for example when you need to have a worker that is always responsive, regardless if other workers are currently blocking all threads of the default dispatcher. The appspace also uses a second internal dispatcher to send and receive messages for that reason.

Of course you have to keep in mind that the dispatcher is designed so that you have one processing thread per core for maximum efficiency. So if you create multiple dispatchers you somewhat work against this mechanism. A possibility in this case would of course be to lower the number of threads per dispatcher. So, I think you just need to find a good balance between number of dispatchers and threads per dispatcher. Perhaps you don't even need multiple dispatchers if you concentrate on having no long running tasks and preventing that a dispatcher thread goes to sleep (using CCR iterative tasks proved to be very good for that).

Concerning the CCR in general, I think that at some point we will need to switch to the newer mechanisms that are now provided directly in the .Net framework (TPL + DataFlow), because these are the ones that will be further optimized in the future. But until we know how we can replace all the cool CCR features we will surely be staying with the CCR for another while :-)

Concering WCF named pipes, from our experience they aren't really faster than our embedded tcp communication service. So it probably doesn't matter which of the two you use - but best try it out yourself.

Best regards
Thomas

Oct 4, 2012 at 8:26 PM

Hi Thomas,

Thank you once again for your time and reply. I agree with you that while it may be advantageous to have multiple dispatchers, it is always better to have a balance so that the total number of threads across all dispatchers is approximately equal to the number of cores in the machine.

I mistakenly assumed that since named pipes do not have the overhead of TCP and hence maybe much more faster than WCF TCP. Thank you for bringing to my notice this information.

I agree that we may have to switch to TPL + DF sometime. I have played around with TPL and found it to be very powerful and at the same time less verbose for doing some of the things that CCR does (I do not yet know whether this is good or bad :) ).  In case you have not seen this, there are posts by Emil Gustafsson (an engineer at the MSRDS team at Microsoft) comparing CCR and TPL data flows (http://blogs.msdn.com/b/cellfish/archive/2011/12/14/tpl-dataflow-and-async-await-vs-ccr-introduction.aspx). Just like you, we may also be staying with CCR for a another while :)

Thanks,

Venkat