Compressing HTTP

As you might remember from previous posts, this web site and other similar ones I've built are powered by XML and SOAP. The problem is that moving large XML documents around is far from efficient. For example, I realized that an XML document my clients need to download whenever the data is refreshed (about once a week) is a whopping 1.17 MB. The trouble is that I need several of these files. The good news is that it zips to 86 KB, a nice 93% reduction. How can you get the same savings over an HTTP connection?

Apache mod_deflate

It turns out that enabling HTTP compression in Apache is quite easy. With the mod_deflate module loaded, just add a SetOutputFilter DEFLATE directive to a Location section and you are done. If this web site has seemed faster for the last few minutes, that's why. Of course, to receive compressed content the browser has to ask for it, by sending an Accept-Encoding request header. Mozilla Firefox does this out of the box, and you can easily verify it with the Web Developer extension I have installed, using the Information - Response Headers command. I'm still not sure about Microsoft's Internet Explorer, but it should work as well. In any case, a client that doesn't ask for compressed content simply gets the plain page, so nothing breaks.
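For reference, a minimal httpd.conf fragment along these lines might look like the following. It is a sketch assuming Apache 2.x with mod_deflate available; the module path and the `/` location are assumptions to adapt to your own server, and the image-exclusion line follows the sample configuration in the mod_deflate documentation.

```apache
# Load mod_deflate (the module path is an assumption; it varies by installation)
LoadModule deflate_module modules/mod_deflate.so

# Compress responses for everything under "/" (narrow this to the XML/SOAP path if preferred)
<Location />
    SetOutputFilter DEFLATE
    # Images are already compressed, so skip them
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
</Location>
```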

Custom Clients

What about writing a custom client, like the one I'll need for the SOAP requests? I haven't managed that yet, but I've written an Indy client application in Delphi that demonstrates the steps over a plain HTTP connection (no SOAP involved). This is the relevant code:

  memstr := TMemoryStream.Create;
  try
    // ask the server for a compressed response
    idHttp1.Request.AcceptEncoding := 'gzip,deflate';
    idHttp1.Get('...URL...', memstr);

    // let's see what is returned
    ShowMessage (idHttp1.Response.ContentType);
    ShowMessage (idHttp1.Response.ContentEncoding);

    // save to a file; freeing the stream closes it so it can be reopened later
    filestr1 := TFileStream.Create ('d:\test.gz', fmCreate);
    try
      memstr.Position := 0; // reset
      filestr1.CopyFrom(memstr, memstr.Size);
    finally
      filestr1.Free;
    end;
  finally
    memstr.Free;
  end;

Now that the program has saved the compressed response to a local file, it can uncompress it. It seems that Delphi's own compression stream classes (in the ZLib unit) handle only the plain zlib format, not gzip, so I had to look for something else. To complete my demo I've used Abbrevia's TAbUnZipper component (available on SourceForge):

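  unzipper := TAbUnZipper.Create(nil);
  try
    // force gzip handling before assigning the file name
    unzipper.ForceType := True;
    unzipper.ArchiveType := atGzip;
    unzipper.FileName := 'd:\test.gz';
    unzipper.BaseDirectory := 'd:\';
    // extract the first (and only) item in the gzip file
    unzipper.ExtractAt (0, 'index2.html');
  finally
    unzipper.Free;
  end;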

Looking Forward

Of course, I'm looking for a solution that allows in-memory decompression, without temporary files (TAbUnZipper can uncompress to a memory stream, but it seems the source has to be a file). I'm also looking forward to integrating this with the SOAP request code (either the Indy version or the standard one)...
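One possible route to in-memory decompression, different from the Abbrevia approach and not what the demo above does, is to let Indy inflate the response itself. This is a sketch assuming Indy 10, where TIdHTTP exposes a Compressor property and the IdCompressorZLib unit provides TIdCompressorZLib; the helper names EnableHttpCompression and DownloadDecompressed are hypothetical.

```pascal
uses
  Classes, IdHTTP, IdCompressorZLib;

// Hypothetical helper: attach a zlib-based compressor to an Indy 10 HTTP client.
// With a compressor assigned, TIdHTTP advertises gzip/deflate in Accept-Encoding
// and inflates the response body before writing it to the destination stream.
procedure EnableHttpCompression(AHttp: TIdHTTP);
begin
  if AHttp.Compressor = nil then
    AHttp.Compressor := TIdCompressorZLib.Create(AHttp); // owned (and freed) by AHttp
end;

// Hypothetical helper: download a URL and return the uncompressed body,
// entirely in memory (no temporary .gz file). The caller frees the result.
function DownloadDecompressed(AHttp: TIdHTTP; const AUrl: string): TMemoryStream;
begin
  EnableHttpCompression(AHttp);
  Result := TMemoryStream.Create;
  try
    AHttp.Get(AUrl, Result);
    Result.Position := 0; // rewind so the caller can read it, e.g. with an XML parser
  except
    Result.Free; // avoid leaking the stream if the request fails
    raise;
  end;
end;
```

With the idHttp1 component from the demo, a call such as DownloadDecompressed(idHttp1, '...URL...') would return the plain XML in a memory stream. Whether the same compressor can be hooked into the SOAP transport is a separate question.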

I've already written the Delphi client code for the .NET Compact Framework, using #ZipLib, only to run into some interesting issues over GPRS connections. More on this another time.