When I started this blog last week, I didn't pay much attention to the output in RSS format. Following suggestions from a few readers in the blog talkback, I ended up giving the format a second look. The most relevant suggestion came from Stefano Falda: use the FeedValidator site to check my RSS feed.

No Namespaces

RSS (Really Simple Syndication) is quite an odd XML format. For one thing, it doesn't use a namespace. The reason mentioned in the standard documents is backward compatibility with previous versions. Still, an optional namespace could have helped the move towards a better format, while simply waiting doesn't seem a good solution to me. To be more precise, RSS 2.0 does use namespaces, but only for extensions. Odd.
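To make the oddity concrete, here is a minimal Python sketch of reading a feed where the core RSS elements have no namespace while an extension element does. The sample feed, the function name and the choice of the Dublin Core `dc:creator` extension are my own assumptions for illustration; they are not taken from this site's feed.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, used here only as an example of an RSS 2.0 extension.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample feed: core elements are unqualified, the extension is not.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <dc:creator>Example Author</dc:creator>
    </item>
  </channel>
</rss>"""


def read_titles_and_creators(feed_xml):
    """Return (title, creator) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.findall("channel/item"):
        # Core RSS element: looked up by its plain, unqualified name.
        title = item.findtext("title")
        # Extension element: needs the namespace mapping to be found.
        creator = item.findtext("dc:creator", namespaces={"dc": DC_NS})
        results.append((title, creator))
    return results
```

The same code has to treat the two kinds of element differently, which is exactly the inconsistency described above.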

Dates and Author Emails

Another odd feature is that it doesn't use the standard XML format for dates (to be precise, the format required by XML Schema). RSS 2.0 adopts a clumsy date format (10 Sep 2005 01:22:56 GMT, albeit a standard one) instead of the one defined by XML Schema (2005-09-10T01:22:56.978). Using the Date and Time Specification of RFC 822 makes sense for email and NNTP messages; I don't see the point of it in an XML format. It means that writing a schema for an RSS feed is much more complex than it should be, and so is processing an RSS file with XSLT. Luckily enough, I had to do the opposite (moving from the XML format to the semi-textual one), which isn't hard at all.
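As an illustration of the easy direction (XML Schema date to RFC 822 date), here is a small Python sketch. It is not the conversion used on this site, and the function name is made up; it also assumes that a timestamp without a time zone is already in GMT, which the example date above does not actually say.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def xsd_to_rfc822(xsd_datetime):
    """Convert an XML Schema dateTime string to an RFC 822 style date."""
    parsed = datetime.fromisoformat(xsd_datetime)
    if parsed.tzinfo is None:
        # Assumption: a value without an offset is treated as GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    else:
        parsed = parsed.astimezone(timezone.utc)
    # Fractional seconds are dropped, since the RFC 822 format has no place for them.
    return format_datetime(parsed, usegmt=True)


print(xsd_to_rfc822("2005-09-10T01:22:56.978"))
# prints: Sat, 10 Sep 2005 01:22:56 GMT
```

Going the other way means parsing day and month names, which is where a schema or an XSLT stylesheet starts to struggle.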

To fix the original RSS of my site I also had to add the author email, which is required in order to include the author name (something I don't really like, although I get so much spam that I probably won't see the difference), and to fix a couple of other minor issues.

The Actual Content

The last issue I've handled (at a later time, as you can see from some of my own talkback posts) relates to the actual content of the blog entries. The RSS specification says that the description of each item is "The item synopsis". However, the document also states that:

in case the item is complete, the description contains the text (entity-encoded HTML is allowed)

Now, after thinking about it for a while, I decided to omit the full text and go for a short description, but I've already received some complaints. I'm still puzzled by the fact that the RSS standard is so vague about one of its key elements. Anyway, if you prefer having the full text of the entries directly in the RSS feed (you still get the link to the full post...), I'll grab one of my XSL books and work out how to produce the entity-encoded HTML. Or I could possibly provide both, you choose...

Wrapping Up: ATOM feed coming...

After looking at the issues regarding RSS 2.0, I've looked again at the alternative ATOM format. One of these nights I'll add another page to the site and write the XSLT for the conversion. As it has a namespace and uses proper dates, it should be even easier to support.

Update (Sep 14): RSS with Escaped HTML

I've now added support for a second RSS feed with the full content in escaped HTML. You can find the URLs of the two alternative feeds in the site menu. I thought there was a simple way to obtain this effect with XSLT, but I was wrong. It is possible to add or remove some escaping, but apparently not to convert a node fragment into escaped XML. Luckily, I could modify the XML script driving the page so that the script itself escapes the content, and the XSLT only needs to output it as it is.
The reason XSLT is weak with escaped content is that escaped content should not be used in XML (see, for example, Escaped Markup Considered Harmful by Norman Walsh). I found that a similar discussion is taking place around the ATOM standard (which has the same problem with blog feed content) at a wiki page on Escaped Html Discussion. I'll work on the ATOM feed later.
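For readers wondering what "escaping before the XSLT" amounts to, here is a rough Python sketch of the idea. It is not the script that drives this site; the function name and the sample values are invented. It relies on the fact that ElementTree escapes markup characters when HTML is stored as the text of an element.

```python
import xml.etree.ElementTree as ET


def build_item(title, link, html_body):
    """Build an RSS item whose description carries entity-encoded HTML."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Assigning the HTML as plain text makes the serializer escape <, > and &.
    ET.SubElement(item, "description").text = html_body
    return ET.tostring(item, encoding="unicode")


# Hypothetical entry, for illustration only.
print(build_item("Hello", "http://example.com/1", "<p>Full <b>text</b></p>"))
```

A feed reader that parses the item gets the original HTML string back from the description, which is what the quoted rule about entity-encoded HTML asks for.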