September 14, 2005
RSS Blues
When I started this blog last week, I didn't pay much attention to the RSS output. Following suggestions from a few readers in the blog talkback, I ended up taking a second look at the format. The most relevant suggestion came from Stefano Falda: use the FeedValidator site to check my RSS feed.
No Namespaces
RSS (Really Simple Syndication) is quite an odd XML format. For one thing, it doesn't use a namespace. The reason mentioned in the standard documents is backward compatibility with previous versions. Still, an optional namespace could have helped the move towards a better format, while simply waiting doesn't seem a good solution to me. To be more precise, RSS 2.0 does use namespaces, but only for extensions. Odd.
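To make the namespace oddity concrete, here is a minimal Python sketch. The sample feed, the example.com URL, and the choice of the Dublin Core `dc:creator` extension are assumptions made for illustration, not taken from this site's feed. It shows that core RSS 2.0 elements are looked up by bare name, while an extension element needs its full namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the core RSS 2.0 elements carry no namespace,
# while the dc:creator extension element is namespace-qualified.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example blog</title>
    <link>http://example.com/</link>
    <description>Sample channel</description>
    <item>
      <title>RSS Blues</title>
      <dc:creator>Example Author</dc:creator>
    </item>
  </channel>
</rss>"""

DC_NS = "http://purl.org/dc/elements/1.1/"

root = ET.fromstring(SAMPLE_RSS)
item = root.find("channel/item")

# Core element: found by its plain, unqualified name.
core_title = item.find("title")

# Extension element: found only with the namespace URI in the lookup.
creator = item.find("{%s}creator" % DC_NS)

print(core_title.tag)
print(creator.tag)
```

The first tag prints as a bare name and the second with the namespace URI in braces, which is the mixed situation described above.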
Dates and Author Emails
Another odd feature is that it doesn't use the standard XML format for dates (to be precise, the format required by XML Schema). RSS 2.0 adopts a clumsy date format (10 Sep 2005 01:22:56 GMT, albeit a standard one) instead of the one defined by XML Schema (2005-09-10T01:22:56.978). Using the Date and Time Specification of RFC 822 makes sense for email and NNTP messages; I don't see the point of it in an XML format. It means that writing a schema for an RSS feed is much more complex than it should be, and so is processing an RSS file with XSLT. Luckily enough, I had to do the opposite (moving from the XML format to the semi-textual one), which isn't hard at all.
To fix the original RSS of my site I also had to add the author email, which is needed in order to include the author name (something I don't really like, although I get so much spam that I probably won't see the difference), and fix a couple of other minor issues.
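As an illustration of the two date formats, here is a minimal Python sketch of the conversion in both directions. It uses only the standard library; the function names are hypothetical, and treating a value without a zone offset as UTC is an assumption (the site itself does this conversion in XSLT, not Python).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def xsd_to_rfc822(xsd_value: str) -> str:
    """Convert an XML Schema dateTime string to the RFC 822 style used by RSS 2.0."""
    dt = datetime.fromisoformat(xsd_value)
    if dt.tzinfo is None:
        # Assumption: a value with no zone offset is taken to be UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # usegmt=True emits "GMT" instead of "+0000"; fractional seconds are dropped.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def rfc822_to_xsd(rfc822_value: str) -> str:
    """Convert an RFC 822 style date back to an ISO 8601 / XML Schema string."""
    return parsedate_to_datetime(rfc822_value).isoformat()


print(xsd_to_rfc822("2005-09-10T01:22:56.978"))
print(rfc822_to_xsd("Sat, 10 Sep 2005 01:22:56 GMT"))
```

The easy direction is the one mentioned above: going from the structured XML form to the textual one is mostly formatting, whereas the reverse requires parsing month names and zone abbreviations.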
The Actual Content
The last issue I handled (at a later time, as you can see from some of my own talkback posts) relates to the actual content of the blog entries. The RSS format indicates that the description of each item is "the item synopsis". However, the document also states that:
in case the item is complete, the description contains the text (entity-encoded HTML is allowed)
Now, after thinking about it for a while, I decided to omit the full text and go for a short description, but I've already received some complaints. I'm still puzzled by the fact that the RSS standard is so vague about one of its key elements. Anyway, if you would prefer to have the full text of the entries directly in the RSS feed (you still get the link to the full post...), I'll grab one of my XSL books and work out how to produce the entity-encoded HTML. Or I could possibly provide both, you choose...
Wrapping Up: ATOM feed coming...
After looking at the issues with RSS 2.0, I've looked again at the alternative ATOM format. One of these nights I'll add another page to the site and write the XSLT for the conversion. As it has a namespace and uses proper dates, it should be even easier to support.
Update (Sep 14): RSS with Escaped HTML
I've now added support for a second RSS feed with the full content in escaped HTML. You can find the URLs of the two alternative feeds in the site menu. I thought there was a simple way to obtain this effect with XSLT, but I was wrong. It is possible to add or remove some escaping, but apparently not to convert a node fragment into escaped XML. Luckily, I could modify the XML script driving the page to make the script escape the content, so that the XSLT only needs to output it as it is.
The reason XSLT is weak with escaped content is that escaped content should not be used in XML (see, for example, Escaped Markup Considered Harmful by Norman Walsh). I found that a similar discussion is taking place around the ATOM standard (which has the same problem with blog feed content) at a wiki on Escaped Html Discussion. I'll work on the ATOM feed later.
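The escaping step described in this update, turning a node fragment into escaped markup before the stylesheet sees it, can be sketched in Python as follows. This is only a stand-in for the site's own script, which is not shown here; the function name and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def fragment_to_escaped_description(fragment: ET.Element) -> str:
    """Serialize an XML/XHTML fragment and embed it as escaped text
    inside an RSS <description> element."""
    # Step 1: turn the node tree back into a markup string.
    markup = ET.tostring(fragment, encoding="unicode")
    # Step 2: store that string as plain text; the serializer then
    # escapes <, > and & when the element is written out.
    description = ET.Element("description")
    description.text = markup
    return ET.tostring(description, encoding="unicode")


# Hypothetical blog entry body.
body = ET.fromstring("<p>Hello <b>world</b></p>")
escaped = fragment_to_escaped_description(body)
print(escaped)

# A feed reader that parses the element gets the original markup back as text.
print(ET.fromstring(escaped).text)
```

Doing the serialization outside the stylesheet mirrors the approach taken above: the transformation only has to copy an already-escaped string to the output.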
4 Comments
ATOM feed
I've added the ATOM feed to the blog; see the link on the side. FeedValidator OKs it. I hope you don't find any problems. Regarding the format conversions, these are easy to do in Delphi, as you suggest. They are a little more complex in XSLT, which is what I have to use for this site.
Comment by Marco Cantù [http://www.marcocantu.com] on September 16, 16:55
RSS Blues
Like you, I use XSLT to produce the HTML. Our mechanisms may differ, though, so let me explain how I do it. I have a WebBroker module to get the data out of the tables and into XML format. The TIdTimeStamp component is on a TWebModule, so it is easy to have the timestamp in the format RSS expects. In my WebBroker module the XSLT processor, MSXML, is called via COM interop, as demonstrated by (I believe) Craig Murphy in a past issue of The Delphi Magazine.
Comment by Wilbert van Leijen on September 17, 16:48
RSS Blues - XSLT
On this server, data does not reside in database tables but in versioned XML files. An engine does the indexing and offers searching features. So the only processing I have from the document data to the HTML is either the internal script or the XSLT. There is no specific Delphi code (although the server is compiled with Delphi, its code is rather general). I could add a custom Delphi function to the script, but this is something we do only in extreme circumstances. We do have a function to compute the day of the week, which I use on the site, for example. I hope to have time soon to describe the entire architecture, which is quite original.
Comment by Marco Cantù [http://www.marcocantu.com] on September 20, 00:11

RSS Blues
Comment by Wilbert van Leijen on September 15, 17:00