There is a reason I haven't blogged over the last two days. Besides having to prepare my Delphi 2007 seminar (taking place tomorrow morning), I had to fix a very nasty bug that took me almost 4 days to chase. Here is the story. At a company I work for and am a partner in, we've built a complex web-based architecture in which multiple server-side applications share the workload and communicate over sockets. For this reason, we've created an XML-based communication architecture. A request is sent in the form of an XML string and a response is received in the form of XML data in a stream. This worked for years, whether running multiple server applications on the same server or distributing them transparently over a few boxes, as the inter-process communication is socket-based.

Now one of the programs built with this technology is my newsgroup front end, dev.newswhat.com. On this site, we experienced intermittent problems due to our choice of socket message terminator. We originally picked character #27, a symbol almost never used. It is not normally used by itself, but it can be the second character of an escaped UTF-8 sequence. So with some newsgroup messages containing odd characters, the socket transmission would end too soon and the remaining portion of the data would be sent after the next request, losing synchronization.

That's why we decided to make things safer by going for a two-character (or two-byte) separator, simply doubling the special character we were using. Tests were quite successful and the newswhat.com site didn't run into the same problem any more. Great. Last week, however, we deployed a multi-server system and moved a couple of large applications to a second server. Or we tried to, as they would quite soon stop working and simply keep listening for more data on the sockets.
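To make the difference between the two terminators concrete, here is a minimal console sketch. It is not our production code: the `FirstMessage` helper and the sample payload are made up for illustration, and the "socket" is just a string.

```pascal
program TerminatorDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  Esc = #27;              // the original one-byte terminator
  DoubleEsc = #27#27;     // the doubled, two-byte terminator

// Hypothetical helper: returns the part of Stream that precedes the
// first occurrence of Terminator (or the whole stream if none is found).
function FirstMessage(const Stream, Terminator: AnsiString): AnsiString;
var
  P: Integer;
begin
  P := Pos(Terminator, Stream);
  if P = 0 then
    Result := Stream
  else
    Result := Copy(Stream, 1, P - 1);
end;

var
  Payload: AnsiString;
begin
  // A 19-byte payload that happens to contain a lone #27 byte.
  Payload := '<msg>odd' + Esc + 'char</msg>';

  // One-byte terminator: the message is cut at the embedded #27.
  Writeln(Length(FirstMessage(Payload + Esc, Esc)));              // prints 8
  // Two-byte terminator: the whole payload comes through.
  Writeln(Length(FirstMessage(Payload + DoubleEsc, DoubleEsc)));  // prints 19
end.
```

Doubling the character only makes a false match less likely. A payload containing two consecutive #27 bytes would still end the message early.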

After a lot of debugging (on the actual Linux servers, as everything works locally) I found the problem. The Indy code responsible for the communication reads the incoming data in chunks and looks for the terminator characters in each buffer. However, if the two characters of the terminator are split across two buffers, it will keep reading and never get to the end of the stream. We were using Indy 9 (latest available version), so the problem might have been fixed in version 10. The method in question is:

function TIdTCPConnection.ReadLn(ATerminator: string = LF;
  const ATimeout: Integer = IdTimeoutDefault; AMaxLineLength: Integer = -1): string;

and this is the line that looks for the terminator in the current buffer (after the LSize position, already scanned in the past) and can be fooled by a multi-byte terminator:

LTermPos := MemoryPos(ATerminator, PChar(InputBuffer.Memory) + LSize, LInputBufferSize - LSize);
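The failure mode can be reproduced without any sockets. The sketch below is not Indy code and not necessarily how Indy 10 handles it. `FindFrom` and `ScanChunks` are hypothetical helpers that mimic a reader which appends each incoming chunk to a buffer and searches only from the position it has already scanned. With an overlap of zero the split terminator is missed. Backing up by `Length(Terminator) - 1` bytes before searching is one possible way to catch it.

```pascal
program SplitTerminatorDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  Term = #27#27;  // two-byte terminator

// Hypothetical helper, not Indy code: looks for Terminator in Buffer,
// starting at the 1-based index StartAt. Returns 0 when it is not found.
function FindFrom(const Terminator, Buffer: AnsiString;
  StartAt: Integer): Integer;
var
  P: Integer;
begin
  Result := 0;
  if StartAt < 1 then
    StartAt := 1;
  P := Pos(Terminator, Copy(Buffer, StartAt, MaxInt));
  if P > 0 then
    Result := P + StartAt - 1;
end;

// Feeds the chunks one at a time into a growing buffer and returns the
// position of the terminator, or 0 if the scan never sees it.
// Overlap = 0 searches only the newly arrived bytes;
// Overlap = Length(Terminator) - 1 re-checks the tail of the old data.
function ScanChunks(const Chunks: array of AnsiString;
  const Terminator: AnsiString; Overlap: Integer): Integer;
var
  Buffer: AnsiString;
  Scanned, I: Integer;
begin
  Result := 0;
  Buffer := '';
  Scanned := 0;  // number of bytes already searched
  for I := Low(Chunks) to High(Chunks) do
  begin
    Buffer := Buffer + Chunks[I];
    Result := FindFrom(Terminator, Buffer, Scanned - Overlap + 1);
    if Result > 0 then
      Exit;
    Scanned := Length(Buffer);
  end;
end;

var
  Chunks: array[0..1] of AnsiString;
begin
  // The terminator is split: one #27 ends the first chunk,
  // the other #27 is all there is in the second chunk.
  Chunks[0] := '<a/>' + #27;
  Chunks[1] := #27;

  Writeln(ScanChunks(Chunks, Term, 0));                 // prints 0 (missed)
  Writeln(ScanChunks(Chunks, Term, Length(Term) - 1));  // prints 5 (found)
end.
```

In the first call the reader would keep waiting for data that never comes, which matches the symptom we saw on the servers.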

Now the odd thing is that if the two programs are on the same computer, the buffer is filled very fast and a split terminator is unlikely. When the program runs on a remote server, the buffer read at each iteration is much smaller and (reading thousands of documents as we do) there is a much higher chance of an error.
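As a rough illustration of why smaller reads hurt, consider an idealized model in which the stream arrives in fixed-size chunks. Real socket reads are not that regular, and the chunk sizes and length range below are arbitrary assumptions, not measurements. Under that model the terminator is split whenever its first byte is the last byte of a chunk, so the smaller the chunk, the more payload lengths trigger the bug.

```pascal
program SplitOddsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Counts, for payload lengths 1..MaxLen, how many responses would have
// their two-byte terminator cut in half when the stream arrives in
// fixed-size chunks of ChunkSize bytes (an idealized assumption).
function CountSplits(MaxLen, ChunkSize: Integer): Integer;
var
  N: Integer;
begin
  Result := 0;
  for N := 1 to MaxLen do
    // The terminator bytes sit at offsets N+1 and N+2 (1-based); they
    // are split when byte N+1 is the last byte of a chunk.
    if (N + 1) mod ChunkSize = 0 then
      Inc(Result);
end;

begin
  Writeln(CountSplits(100000, 8192)); // larger reads: prints 12
  Writeln(CountSplits(100000, 512));  // smaller reads: prints 195
end.
```

In this model roughly one payload length in every `ChunkSize` is affected, which is rare enough to pass local tests and frequent enough to show up quickly when reading thousands of documents.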