RSS Blues

September 14, 2005


I've been trying to fix the problems with the RSS feed for my blog, only to discover a few oddities in this popular XML standard.

When I started this blog last week, I didn't pay much attention to the RSS output. Following suggestions by a few readers in the blog talkback, I ended up taking a second look at the format. The most relevant suggestion came from Stefano Falda and was to use the FeedValidator site to check my RSS feed.

No Namespaces

RSS (Really Simple Syndication) is quite an odd XML format. For one thing, it doesn't use a namespace. The reason mentioned in the standard documents is backward compatibility with previous versions. Still, an optional namespace could have helped the move towards a better format, while simply waiting doesn't seem a good solution to me. To be more precise, RSS 2.0 does use namespaces, but only for extensions. Odd.
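To make the namespace oddity concrete, here is a minimal Python sketch (not something used on this site). The sample feed, the describe_item function name, and the choice of the Dublin Core dc:creator extension are assumptions made only for illustration. It shows that a core element such as title is looked up by its bare name, while an extension element needs its full namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: core RSS 2.0 elements carry no namespace, while the
# dc:creator extension element lives in the Dublin Core namespace.
RSS_SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>RSS Blues</title>
      <dc:creator>Example Author</dc:creator>
    </item>
  </channel>
</rss>"""

DC_NS = "http://purl.org/dc/elements/1.1/"


def describe_item(rss_text):
    """Return the tag of a core element, plus the tag and text of an extension."""
    root = ET.fromstring(rss_text)
    item = root.find("channel/item")
    # Core element: plain name, no namespace to qualify it.
    title = item.find("title")
    # Extension element: must be addressed as {namespace-uri}local-name.
    creator = item.find("{%s}creator" % DC_NS)
    return title.tag, creator.tag, creator.text


if __name__ == "__main__":
    print(describe_item(RSS_SAMPLE))
```

A consumer therefore has to mix two lookup styles in the same document, which is part of what makes writing a schema for the format awkward.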

Dates and Author Emails

Another odd feature is that it doesn't use the standard XML format for dates (to be precise, the format required by XML Schema). RSS 2.0 adopts a clumsy date format (10 Sep 2005 01:22:56 GMT, albeit a standard one) instead of the one defined by XML Schema (2005-09-10T01:22:56.978). Using the Date and Time Specification of RFC 822 makes sense for email and NNTP messages; I don't see the point of it in an XML format. This means that writing a schema for RSS is much more complex than it should be, and so is processing an RSS file with XSLT. Luckily enough, I had to do the opposite (moving from the XML format to the semi-textual one), which isn't hard at all.
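As a rough illustration of the easy direction (XML Schema date to RFC 822 date), here is a small Python sketch using only the standard library. It is not the conversion used on this site, which is done in XSLT. The function name xsd_to_rfc822 is made up, and treating a value without a time zone offset as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def xsd_to_rfc822(value):
    """Convert an XML Schema dateTime such as 2005-09-10T01:22:56.978
    into the RFC 822 style date used by RSS 2.0."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a value without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Normalize to UTC and drop fractional seconds, which the
    # RFC 822 format cannot carry.
    parsed = parsed.astimezone(timezone.utc).replace(microsecond=0)
    return format_datetime(parsed, usegmt=True)


if __name__ == "__main__":
    print(xsd_to_rfc822("2005-09-10T01:22:56.978"))
    # Sat, 10 Sep 2005 01:22:56 GMT
```

Going the other way means parsing month and day names out of text, which is exactly the part that is painful in XSLT.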

To fix the original RSS of my site I also had to add the author email (something I don't really like, although I get so much spam that I probably won't see the difference), which is needed in order to include the author name, and to fix a couple of other minor issues.

The Actual Content

The last issue I've handled (at a later time, as you can see from some of my own talkback posts) relates to the actual content of the blog entries. The RSS specification says that the description of each item is "The item synopsis". However, the document also states that:

in case the item is complete, the description contains the text (entity-encoded HTML is allowed)

Now, after thinking about it for a while, I decided to omit the full text and go for a short description, but I've already received some complaints. I'm still puzzled by the fact that the RSS standard is so vague about one of its key elements. Anyway, if you prefer having the full text of the entries directly in the RSS feed (you still get the link to the full post...), I'll grab one of my XSL books and work out the entity-encoded HTML. Or I could possibly provide both, you choose...

Wrapping Up: ATOM feed coming...

After looking at the issues regarding RSS 2.0, I've looked again at the alternative ATOM format. One of these nights I'll add another page to the site and write the XSLT for the conversion. As it has a namespace and uses proper dates, it should even be easier to support.

Update (Sep 14): RSS with Escaped HTML

I've now added support for a second RSS feed with the full content in escaped HTML. You can find the URLs of the two alternative feeds in the site menu. I thought there was a simple way to obtain this effect with XSLT, but I was wrong. It is possible to add or remove some escaping, but apparently not to convert a node fragment into escaped XML. Luckily, I could modify the XML script driving the page to make the script escape the content, so that the XSLT only needs to output it as it is.
The reason XSLT is weak with escaped content is that escaped content should not be used in XML (see, for example, Escaped Markup Considered Harmful by Norman Walsh). I found that a similar discussion is taking place around the ATOM standard (which has the same problem with blog feed content) in a wiki page on Escaped Html Discussion. I'll work on the ATOM feed later.
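For readers wondering what "converting a node fragment into escaped XML" amounts to, here is a minimal Python sketch of the idea. It is not the script that drives this site. The build_item function name and the sample fragment are assumptions made for illustration. The fragment is first serialized back to markup text, then stored as plain text so that the serializer entity-encodes it.

```python
import xml.etree.ElementTree as ET


def build_item(title, body_fragment):
    """Return an RSS <item> whose description holds body_fragment as
    entity-encoded markup rather than as child elements."""
    # Parse the fragment into a node tree (it must be well-formed XML).
    body = ET.fromstring(body_fragment)
    # Step 1: serialize the node tree back to markup text.
    markup = ET.tostring(body, encoding="unicode")
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Step 2: store that markup as plain text; on output the serializer
    # escapes <, > and &, which yields the entity-encoded form.
    ET.SubElement(item, "description").text = markup
    return ET.tostring(item, encoding="unicode")


if __name__ == "__main__":
    print(build_item("RSS Blues", "<p>Odd <b>format</b> &amp; dates</p>"))
```

The first step, turning nodes back into text, is the one that seems to have no simple counterpart in XSLT, which is why moving the escaping into the script was the practical choice.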

posted by marcocantu @ 2:05AM | 4 Comments [0 Pending]

RSS Blues

The TIdTimeStamp component has a SetFromTDateTime 
property for loading a TDateTime value from a table 
and also a handy AsRFC822 property.

This is what I used in my own RSS aggregator.

Comment by Wilbert van Leijen on September 15, 17:00

ATOM feed

I've added the ATOM feed to the blog, see the link on
the side. FeedValidator OKs it. Hope you don't find
any problem.

Regarding the format conversions, these are easy to do
in Delphi, as you suggest. A little more complex in
XSLT, as I have to do for this site.

Comment by Marco Cantù [http://www.marcocantu.com] on September 16, 16:55

RSS Blues

Like you, I use XSLT to produce the HTML. Our 
mechanisms may differ, though.

So let me explain how I do it.

I have a WebBroker module to get the data out of the 
tables and into XML format. The TIdTimeStamp 
component is on a TWebModule, so it is easy to have 
the timestamp in the format RSS expects.

In my WebBroker module the XSLT processor, MSXML, is 
called via a COM interop, as demonstrated by - I 
believe - Craig Murphy in a past issue of The Delphi 
Magazine.

Comment by Wilbert van Leijen on September 17, 16:48

RSS Blues - XSLT

On this server data does not reside in database tables,
but in versioned XML files. An engine does indexing
and offers searching features. So the only processing
I have from the document data to the HTML is either
the internal script or the XSLT. No specific Delphi
code (although the server is compiled with Delphi, its
code is rather general). 
I could add a custom Delphi function to the script,
but this is something we do only in extreme circumstances.
We do have a function to compute the day of the week,
which I use on the site, for example. I hope to have
time soon to describe the entire architecture, which
is quite original.

Comment by Marco Cantù [http://www.marcocantu.com] on September 20, 00:11

Marco Tech Blog
