This project is read-only.
1
Vote

Communication fails, but no exception thrown

description

Hello !

I am just on the way to find the best way to configure a 24x7 running app, where client and server may start independently and do auto-reconnect [or delayed connect on the client side, if the server is not running at the start moment].

I've started from: "08_ErrorHandlingWithCausality" and I have enabled logging.
I connect to the server, post a request and get a reply, all like to expect.

While the client is connected, I pull out the network cable [not really, I disabled the interface, it's a virtual machine].

I am creating a new request and post it. Astoundingly, this succeeds and no exception is thrown, but I receive a "transmitted" message from the ACK port !!!

The logs shows this:

Logging Debug: Sending XcoAppSpaces.Contracts.Messaging.WorkerMessage to "rx:9001", using CommService: mb-m500:49161
Computername: Logging Debug: Port registered with Guid: cc9a82d5-2e19-471f-8fcc-a97b4e8f47f6
Logging Debug: Sending Message to rx:9001
MMSClient(local).Ack [11.06.2013 22:40:57], Transfer ok at: 11 22:40:57.160 [my modified message handler]

Logging Error: Error converting stream to XcoMessage. Error:
Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond
after a period of time, or established connection failed because connected host has failed to respond.

[mb-m500, client, windows 7; rx, server, Server 2008 R2].

Would be great to get some hints!!

Thanks anyway and
best regards,

++mabra

comments

thomass wrote Jun 13, 2013 at 2:39 PM

Hello mabra,

Unfortunately I have never seen such a problem. In the log you posted I cannot see anywhere that the message has actually been sent (there should be a log message somewhere from the TCP transport service, saying exactly "Sending Message to rx:9001"). The error that is shown in your log is one from the receiving part of the connection. Are there maybe more messages in the log that you left out here?

The only thing I could imagine right now is that disabling the connection virtually leaves the client in a state where it cannot recognize that the connection has actually been closed. If that is the case, this should be reproducible with a TCP socket.

If there is an error with the ACK port: Have you reused the ACK port from the previous message, or have you created a new one? It should not matter, but you should try out separate ones just to be sure.

Also: Please tell me which appspace version you are using.

Best regards
Thomas

mabra wrote Jun 14, 2013 at 10:59 PM

Hi !

I am sorry, I was refering the "15_AsyncWaitForMessageSent" where this line:

var ackPort = clientSpace.Receive<Ack>(ack => Console.WriteLine("Message transferred successfully at " + ack.Date));

was modified by me to use a method instead. So this output in my original log:

MMSClient(local).Ack [11.06.2013 22:40:57], Transfer ok at: 11 22:40:57.160 [my modified message handler]

refers to the Ack handler in the original sample.

In deed, I reused the ack-port until now. But after changing the method to create
the ack and exception port for each request, the problem remains.

Just to be sure. I changed the loghandler, to output DateTime.Now and this
is the current result:

[14.06.2013 23:02:46]Logging Debug: Sending XcoAppSpaces.Contracts.Messaging.WorkerMessage to "rx:9001", using CommService: mb-m500:49206
[14.06.2013 23:02:46]Logging Debug: Port registered with Guid: 6d4c9f64-e19f-492b-9379-151580bdf4a3
[14.06.2013 23:02:46]Logging Debug: Port registered with Guid: 072c6ba1-2ba5-4682-99e0-0173a828d1b8
[14.06.2013 23:02:46]Logging Debug: Sending Message to rx:9001
=> Message transferred successfully at 14.06.2013 23:02:46
[14.06.2013 23:03:06]Logging Error: Error converting stream to XcoMessage. Error:
Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond
after a period of time, or established connection failed because connected host has failed to respond.

Note, the exception logged, is about ~20 s later. In the output above,
I just put "=>" in front of the samples message for the Ack port [so it is
like the original sample].

Even more strange:If I change from post to send, my call returns
immidiately and the message from IXcoLogger appears at about
the same delay as with using post!

I am using Xco 1.4.0 with the CCR 2.2.76

Best regards,
++mabra

thomass wrote Jun 17, 2013 at 2:44 PM

Hi mabra,

According to this log, the message seems to have been sent successfully. Unfortunately, the only thin I can think of is still that for some reason it is not recognized that the connection is not working any more. Maybe because you just plugged it "virtually". I have two ideas:
  • You could try communication with a simple TCP Socket connection and see whether you can reproduce the problem. If yes, this would confirm my theory and I probably wouldn't be able to do much about it in the AppSpace.
  • You could try this in a more "real-world" scenario with two non-virtual separate machines, where you can also physically severe the connection between the two, and see if the problem still occurs. I make some tests like this as well as soon as I find the time.
Best regards
Thomas

mabra wrote Jun 21, 2013 at 7:54 PM

Hi !

Logically, I agreed, it could be a problem with the behavior of the VM/their network
driver. Now, I found the next timeslot and modified the sample to use json serialization
and used a physical linux machine to try the same.

Sad and sorry to say, but the behavior is exactly the same. I am getting the
ACK about 1 ms after the post method returns. There is no exception.

As before, if I use send, the same thing happens !!!
I feel a little mad in the moment. There is the example, which show
to use send and catch the exception. But the execption is not there.

I changed my behavior a little bit:
I exchange [done with both:post/send] one message and then pull out the network cable. Then I send/post the next message [which does not fail] and within - I think
about 3-4 seconds, I connect the network cable. Then I see, the message will
be send and is really processed by the server.

The idea with the ack port was, to bundle post in a handler, which will continue
after ack or fail. Getting back to sync-send, was not the real option. But even
I use this, I do not get a proper error handling. Send, at all, would not be the
most worst situation, because there is no interactivity [also beeing async would
save a thread].

Any pointer to a working example would be great!

Thanks,
++mabra

thomass wrote Jun 24, 2013 at 7:44 AM

Hi mabra,
From what you describe it really seems to be a problem with the TCP layer. The ACK is returned as soon as the write operation to the NetworkStream returns successfully. When you say that plugging the cable back in really leads to the message being transferred, I assume that this really means that the message has been sent "successfully" and is waiting somewhere in the tcp stack to get through.
The ACK port (as well as the Send method, which simply waits for an ACK or Exception internally) is designed to just hand over the success/error information of the transport service, and if the transport service in this case is not able to correctly detect an error, this won't work as expected. Probably after being disconnected for some time, when either the socket closes automatically or when the message buffer is full, you will get an error as expected, though that may not be enough in your case.
What you could try is to alternatively use the WCF transport service (instead of "tcp.port=..." simply use "wcf.port=..."). Although WCF also uses TCP sockets by default, perhaps it has a better error recognition built in.
If that doesn't help, I fear that there may be no other possibility than to manually implement a "real" ack, by adding an ack port to your message and letting the receiver reply an ack as soon as the message has been received. At least that is simple with the AppSpace, since you can define as many response ports as you need in a single message.
One more thought: I am not exactly sure if pulling the network cable really reflects a reald world scenario. It may be better to try shutting down the server, or killing the application that is running on the server. Or do you really have a special use case where the connection between two machines can break like that?

mabra wrote Jun 24, 2013 at 10:01 AM

Hi Thomas!

Thanks for your reply.

The last thing first. What I am seeing mostly in (our companies) IT's infra is, that a part of the
network goes down. From top to down, I think, this is the order: WAN/LAN(Switch)/App/Server.

To not be able to send a message is bad, but not to avoid :-(
But to not get a failure is even more worse and I cannot really understand, why this can happen.
Sending an explicit response will have a big impact on the communication. TCP handles this
usually well with tcp windows. Could you probably point me to the right code in the AppSpace, which creates the tcp settings??

I made a tcp connection-checker some years ago and I'll spent time on it now, to see, how
this behaves in the mentioned situation. The tcp keepalive is usually only helpful, if there
are packages regularly exchanged [otherwise, it would nee about 1 hour].

I'll come back on this after some time.

br,
++mabra

thomass wrote Jun 27, 2013 at 2:35 PM

Hi mabra,
The code for the tcp socket communication can be found in the XcoAppSpaces.Transport.Sockets solution, see here: https://xcoappspace.codeplex.com/SourceControl/latest#trunk/source.implementation/XcoAppSpaces.Transport.Sockets/

Unfortunately I am rather busy right now, but if I find some time I will do some tests myself. Anyway, if you find improvements to the existing code, I will gladly integrate them.

Best regards
Thomas