Suggestion: Leverage more features of .NET and CCR

Apr 13, 2011 at 7:29 PM

Hi Thomass! I have been reviewing your code to understand where we will find bottlenecks and memory issues if this system were scaled to a large (enterprise wide) deployment. I noticed that it seems like you are not taking full advantage of the features baked into CCR (and .NET itself) throughout a significant portion of your code. Specifically, I have listed some aspects below:

 

  1. You are creating discreet System.Threading.Timer objects in several classes instead of leveraging the EnqueueTimer() feature of CCR.
  2. You are performing synchonous reads / writes on the NetworkStream object in the XcoTCPTransportService.
    Note: We moved to fully async TCP/IP communications last year and noticed a nearly 100x improvement in message throughput. 
  3. You are blocking code in several areas and using locks to prevent concurrent execution (which could be handled using CCR ports and iterators).
  4. You are making direct calls to the threadpool and creating discreet threads instead of leveraging the CCR backend.

 

Do you have any plans to review these implementation choices in the future? If so, is there an opportunity for me to get involved to help improve this system? Also, are you currently allowing developers to push patches to you via the CodePlex source control system?

Thanks,

Keith

Coordinator
Apr 15, 2011 at 8:54 AM

Hi Keith,

Thanks for your code review! It is always interesting for us to hear suggestions about where we could improve our code. Concerning the things you mentioned:

  • You are creating discreet System.Threading.Timer objects in several classes instead of leveraging the EnqueueTimer() feature of CCR.

    We haven't used the ccr timer features in our code until now - but this is an interesting idea and makes sense to me.

  • You are performing synchonous reads / writes on the NetworkStream object in the XcoTCPTransportService.
    Note: We moved to fully async TCP/IP communications last year and noticed a nearly 100x improvement in message throughput.

    That's definitely a thing on our todo list. The current solution creates an own thread for each incoming connection, which won't scale very well for large numbers of connections. From the experience you made it seems to make an even larger difference than I expected until now.

  • You are blocking code in several areas and using locks to prevent concurrent execution (which could be handled using CCR ports and iterators).

    In some places this is unfortunately inevitable, where users can call code that needs synchronization directly from the appspace API (e.g. Run/Connect/Resolve workers).  In other areas of the code a more efficient solution may be possible - we are definitely interested to hear where other mechanisms than locking could improve the code.

  • You are making direct calls to the threadpool and creating discreet threads instead of leveraging the CCR backend.

    Most of these calls are concentrated in the transport service code, which is - as you said above - currently built on sync read/write, so changing to async should probably also remove most of the calls to threadpool/thread. Which would definitely be a nice improvement.

    One reason why we didn't integrate the CCR deeper into our code until now is, that it unfortunately also has a problematic side for us, because it sort of conflicts with our plans to integrate the appspace for other platforms like Silverlight and Mono/MonoTouch/MonoDroid. Unfortunately there is currently no CCR Version for these platforms. Which also means that it becomes more difficult for us to port the appspace to these platforms the more we integrate the CCR into our code. Therefore we will have to carefully look for possibilities to make such features replaceable (e.g. for Timers it should be rather easy to hide the ccr functionality behind an interface; for ccr ports and iterators it will be more difficult).

    That aside, we are really happy if you want to get involved into improving our system! We will gladly accept patches submitted over codeplex and integrate them into our system. An of course we are always interested in further suggestions where improvements could be made. Since you seem to have experience with async socket communication, I would especially be interested to see parts of your solution, and how you would integrate this into the tcp transport service.

    Best Regards
    Thomas

  • Apr 15, 2011 at 5:48 PM

    Thanks for the quick reply Thomass. I have started working on some of the aforementioned changes but I have quickly run into a snag. I would like to be able to use some of the features of CCR within the XcoAppSpaces.Transport.Sockets project but I do not have a reference to the CCR dispatcher being used by the parent XcoAppSpace object. I started working backwards to acquire this reference and quickly realized that this is not an easy task. Am I missing something here? Would you be able to point me in the right direction for acquiring this reference? If obtaining a reference to the CCR dispatcher is an issue I will attempt to implement my changes without it but CCR would make it much more streamlined and efficient.

    Coordinator
    Apr 18, 2011 at 10:20 PM
    Hi keith,
    It's not really obvious how to get a reference to the ccr dispatcher,
    but fortunately quite easy:
    You may have noticed the Initialize() method that every service
    implements. This method is called when the appspace is starting up,
    and gets handed in a service registry object. From this you can get
    out an ICcrDispatcherService object, which holds references to all the
    important ccr objects you need (you will additionally need to add a
    reference to XcoAppSpaces.Contracts.CcrDispatching).
    Hope this helps!

    Best regards
    Thomas
    Apr 19, 2011 at 3:38 AM
    Edited Apr 19, 2011 at 3:45 AM

    Thanks thomass... that was too easy! I am back to making the performance enhancements and further attempting to understand your TCP transport implementation. It looks fairly straightforward. However, one area I am concerned about is the TCP read implementation (i.e. the Convert method in the TransferHelper.cs class). This method appears to make several expensive blocking TCP reads for each received XcoMessage. It seems as it would be a significant performance improvement to simply read the maximal TCP receive buffer length then perform all of the byte[] to XcoMessage conversion in memory instead of through individual TCP read calls. Is there any specific reasoning behind your current implementation?

     

    Also, one change that could drastically enhance TCP transport performance is to simply queue the messages to be sent (using CCR ports) and not actually block until they are sent. Is there anything in your current implementation that actually depends on having the messages sent prior to continuing? If not, I plan to change your send call to simply post to a CCR port and have an Interleave receiver registered on the port that is responsible for actually sending the messages in FIFO order using the transfer helper. This way, the sender is not blocked waiting for the previous message to be sent and the associated CCR receiver can also be torn down if the assigned TCP client is closed. Does this sound like an acceptable implementation or will this somehow negatively impact your overall design?

    Coordinator
    Apr 20, 2011 at 4:18 PM
    No, I don't think there is any special reason behind this. It probably
    just was the easiest way of implementing this at the time, but may not
    be an optimal solution. The code also went through several
    restructurings already, where not always the optimal solutions may
    have been found. So feel free to change any of this if you like!
    Apr 20, 2011 at 4:26 PM

    Thanks again for the quick reply... I have built a small test app that simply sends 100,000 messages (each with a string of 512 bytes in length) from a client XcoAppSpace to a remote server XcoAppSpace using TCP transport over the TCP loopback connection (i.e. 127.0.0.1 address). I have tested with some minor mods (i.e. eliminating timers, etc...) to the XcoAppSpaces.Transport.Sockets project files and am currently seeing a maximum sustained throughput of approximately 4300 messages per second (not too bad). I have acquired a second PC and will test again across a real gigabit IP network as well. I will also post the results after I have completed rework of the actual TCP send and receive methods.

    Coordinator
    Apr 20, 2011 at 4:36 PM
    Concerning the queuing of messages to be sent via ccr ports, there is
    one problem: you may have noticed that there are currently two
    possibilities for sending messages, which is either via port.Post() or
    port.Send(). The second variant should ensure that after the call
    returns the messages has definitely been sent (or, if it could not be
    sent, that an exception is thrown). It does that by calling the Send()
    mehtod of the transport service directly. So, if you make the sending
    asynchronous there is the problem that this won't work any more.
    Therefore it would be better to let the Send() method work
    synchronously.

    In any case, it would be interesting, how much performance improvement
    the async Send() variant would bring, if you happen to have time to
    test that. If there is a noticeable difference in performance, it
    would be worth to think about deeper changes in the port.Send()
    functionality.
    Apr 20, 2011 at 5:34 PM

    Your current implementation may require some larger changes in order to support a fully asynchronous messaging system. I the case of our system, we emulate an event notification system using CCR ports (see example below):

    Example for supporting asynchronous TCP sends (similar approach to support TCP receives).

    1. User sets up a receiver for TCP send completion notifications on a CCR port (TcpSendCompletedPort).
    2. User requests to send a TCP message by posting the message to a CCR port (TcpSendPort).
    3. The receiver registered to the TcpSendPort obtains the message from the port and sends it using async TCP methods.
    4. Upon completion of sending the TCP message (from within the TCP send callback) a message is posted to the TcpSendCompletedPort.
    5. The user code receives the send completed message and takes the appropriate action.

    This implementation has worked well for us as it eliminates all use of thread blocking mechanisms (i.e. locks, ManualResetEvents, etc...) as these mechanisms have been found to drastically impact the overall performance of CCR. This also helps eliminate the use of infinite while loops and allows everything to begin to execute as standalone functional blocks (which make unit testing and mocking much easier).

    Coordinator
    May 5, 2011 at 5:11 PM

    Hi Keith,

    I agree with you that a structure like this would probably make sense in terms of asynchronous communication. Unfortunately, as you say, this would require some larger changes in our system (changing all transport services, adapting the behavior of port.Send() ...). For a first step I think it would be best to allow asynchronous receiving of messages in the tcp transport service (which would already remove all the thread creation and infinite loops), and in a later step do the asynchronous send. The asynchronous receive will need no changes to other appspace components, so only the tcp transport service itself will need to be changed.

    In the mean time we will further look into the impact and needed changes for an asynchronous send mechanism.

    May 31, 2011 at 7:03 PM
    Edited May 31, 2011 at 7:06 PM

    @thomass - I have been working on a custom implementation of the TCP transport service. I think I have it close to working properly but I keep receiving an error message when my client space attempts to connect to my server space. The error I receive is shown below:

     

    "Worker at 10.100.93.144:9000 with type StringWorkerContract could not be resolved. Reason: NoResponse"

     

    I am concerned that the ConnectWorker() method is not properly parsing the worker address / port config string. I have verified that I am able to create a new AppSpace using my custom transport and the parameters are populated correctly (following your guide from this website). However, it seems like I should have to do something to tell the ConnectWorker() method how to parse the provided config string. See my example below using my custom XcoTcpTransportService class:

     

    SERVER SIDE CODE:

    using (XcoAppSpace serverSpace = XcoAppSpace.ConfigureWithConfigString("XcoTcp.Port=9000").UsingService<XcoTcpTransportService>().WithName("MyServer"))
    {
        serverSpace.RunWorker<StringWorkerContract, StringWorker>("StringService");
        Console.WriteLine(String.Format("Server started at address = {0}.", serverSpace.Address));
        Console.ReadLine();
    }
    

     

    CLIENT SIDE CODE:

    using (XcoAppSpace clientSpace = XcoAppSpace.ConfigureWithConfigString("XcoTcp.Port=9001").UsingService<XcoTcpTransportService>().WithName("MyClient"))
    {
        StringWorkerContract contract = clientSpace.ConnectWorker<StringWorkerContract>("10.100.93.144:9000/StringService");
        Console.WriteLine("Connected to server. Starting test....");
    }
    
    May 31, 2011 at 9:09 PM

    @thomass - One other quick question... Do you have any design documents and / or would you be able to provide some detail on what exactly you are doing in the TransferHelper.cs file that is part of the XcoAppSpaces.Transport.Sockets project? I understand that this class is used to serialize / deserialize XcoMessage objects but I am confused about the actual implementation of the two Convert() methods. Specifically, why do you perform so many "len = BitConverter.GetBytes(addr.Length); memStream.Write(len, 0, len.Length);" executions (and the opposite for the conversion in the other direction)? I am trying to clean up this implementation to make more efficient use of the underlying sockets but I want to make sure that I am serializing / deserializing the XcoMessage objects correctly.

    Coordinator
    Jun 1, 2011 at 9:33 AM
    Edited Jun 1, 2011 at 9:35 AM

    Hi Keith,

    Glad to hear about your progress!

    Concerning ConnectWorker:
    The ConnectWorker method actually doesn't do much interpretation of the connection string. All it does is splitting the string with the last "/" it finds - the first part is then used as address, the second part as worker name (which would work without problems in your case). So, the Send() method of your transport service will be called with the remoteAddress "10.100.93.144:9000" (this is also what the error message you are getting says). Does any message arrive at the server side? The "NoResponse" reason normally means that the message to resolve the worker has been correctly sent to the server space, but the server space didn't answer - so I assume that there could be some problem with returning the answer from the server to the client.
    As for the configuration you are using to instantiate both appspace instances, I see that your are using both a config string and the fluent interface - I'm not sure if this could lead to problems, because the config string will create a new instance of your transport service, but the fluent configuration following after that will to this as well (so you might end up having to transport services). It should be enough if your just use the config string (XcoAppSpace.ConfigureWithConfigString("XcoTcp.Port=9000")).

    Concerning the TransferHelper class:
    The actual goal is just to serialize the contents of an XcoMessage ( + the local address, which is needed by the other side to send answers) in a way so that it is independet of a certain serializer (so it can easily be deserialized also with other platforms, like Compact Framework or Micro Framework). The length values of the single parts of the message are just written to the stream so that during deserialization these parts can correctly be separated from each other again. Perhaps it would be more efficient to just put some kind of "separation byte" between the values (though I don't know which byte value would be suitable for something like that, since it would need to be guaranteed that it doesn't appear anywhere else in the stream).
    So, as long as the XcoMessage + address comes out correctly on the other side, feel free to change it to what you feel would be most effective. 

    Jun 1, 2011 at 1:30 PM

    Thanks thomass! That makes sense. I am seeing the client worker make a TCP connection to my server worker. I am also seeing the Send() method called several times. This could be what you were alluding to with the config string / fluent config combo. I will take a look at the serialization that I need to perform. This could also be causing the problem of the "no response" error. For further clarification, would you be able to show the message chunks for a serialized XcoMessage object like in the following example?

    Serialized Object (byte array):

    [lengthInfo][remoteAddress][contentInfo][causalityContext]

    Coordinator
    Jun 1, 2011 at 4:44 PM

    Sure - currently the data is serialized as follows:

    • the length values are 32 bit integers, meaning they take 4 bytes of space each
    • first come all the length values, then the actual data

    Structure:
    [overall msg length][address length][content info length][context length][content length][address][content info][context][content]

    • the overall message length is just used as a control value and could probably also just left away
    • "address" is the address of the sender, so that the receiver knows who the sender is
    • "content info" is a string information about the content, which can be used to find out something about the message content in the case when deserialization fails. it is optional, so if the content info string is null, 0 is sent as length
    • "context" contains the causality context, which is already serialized into a byte array. it is also optional (since the user doesn't necessarily need to use a causality), so if the byte array is empty or null, 0 is sent as length
    • "content" is the actual message that the user posted either to a worker or remote port, already serialized into a byte array
    • You may wonder why context and content are serialized seperately - this is because if the message content fails to deserialize (e.g. because the message that the user posted is not deserializable), the context can still be deserialized independently and used to post back an exception to the causality port

    Hope this helps!