Marco Tech Blog

March 27, 2009

On Strings and Unicode in Delphi 2009

There have been a few posts about strings in Delphi 2009, and I have a couple of comments on them:

Breaking existing code (on The Doric Temple). The main point here is that CodeGear should have left string as AnsiString and PChar as PAnsiChar, to keep existing code behaving exactly as before, and introduced Unicode support alongside.
2009 and backwards compatibility (by Gurock Software). The main point here is that conversion wasn't that hard and CodeGear did a good job with the new warnings.
Misunderstood. Who me? (on The Doric Temple). Here the author counters some of the comments and adds that conversion is under way (and seems less worried).
...and probably many others I missed

Having delved into Unicode in Delphi 2009 and given sessions about it, I see that as a standard reaction and understand it. But I think CodeGear did the right thing in making string an alias of UnicodeString (and likewise Char an alias of WideChar and PChar of PWideChar). They had two options, which are clear if you look at code like:

var
  myString: string;
begin
  myString := 'some text';
  MessageBox ('title', PChar (myString), ...)

One option was to let those who wanted Unicode make an extra effort. Converting this code to Unicode would have meant changing the string type declaration (to UnicodeString), the API function call (to MessageBoxW), and the PChar cast (to PWideChar). The second option was to let those who want to stick with Ansi make an extra effort, which is what they did. Keeping this code in Ansi in Delphi 2009 means changing the string type declaration (to AnsiString), the API function call (to MessageBoxA), and the PChar cast (to PAnsiChar). But moving the code above to Unicode means... recompiling it!
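
To make the comparison concrete, here is a minimal sketch of both variants as they would look in Delphi 2009. The MessageBox arguments follow the Win32 order (window handle, text, caption, flags); the 0 handle, the MB_OK flag, and the procedure names are placeholders chosen for illustration, not part of the snippet above.

```pascal
program StringOptions;

uses
  Windows;

// Unicode in Delphi 2009: the original declarations, simply recompiled.
// string is UnicodeString, PChar is PWideChar, and the unadorned
// MessageBox call maps to MessageBoxW.
procedure ShowUnicode;
var
  myString: string;
begin
  myString := 'some text';
  MessageBox(0, PChar(myString), 'title', MB_OK);
end;

// Staying with Ansi in Delphi 2009: the three explicit changes
// (AnsiString, MessageBoxA, PAnsiChar).
procedure ShowAnsi;
var
  myString: AnsiString;
begin
  myString := 'some text';
  MessageBoxA(0, PAnsiChar(myString), 'title', MB_OK);
end;

begin
  ShowUnicode;
  ShowAnsi;
end.
```

The first procedure is also what the pre-2009 code looked like, which is the whole point: the same source text now means Unicode.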

I know there are several other issues, like the misuse of PChar pointers to refer to data other than string characters (although they came up with PByte and the TBytes array for that), or the Bookmark property madness. I've converted a lot of code and seen many problems, including performance issues...
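
As a rough illustration of the PByte and TBytes point, here is a small sketch of walking a raw buffer byte by byte without going through PChar. The SumBytes function and the sample buffer are hypothetical names and data invented for this example.

```pascal
program RawBytes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Sums the bytes of a buffer by walking it with PByte.
// Older code often used PChar for this, which was safe only while
// SizeOf(Char) was 1; in Delphi 2009 incrementing a PChar moves two bytes.
function SumBytes(const Data: TBytes): Cardinal;
var
  P: PByte;
  I: Integer;
begin
  Result := 0;
  if Length(Data) = 0 then
    Exit;
  P := @Data[0];
  for I := 0 to Length(Data) - 1 do
  begin
    Inc(Result, P^);
    Inc(P); // advances exactly one byte, whatever Char happens to be
  end;
end;

var
  Buffer: TBytes;
begin
  SetLength(Buffer, 3);
  Buffer[0] := 1;
  Buffer[1] := 2;
  Buffer[2] := 3;
  Writeln(SumBytes(Buffer));
end.
```

Code written this way does not change meaning when the size of Char does, so it should compile and behave the same before and after the move to Unicode.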

However, from a high-level perspective, I think it all boiled down to these two options. And I think asking an extra effort of those who want to stick with Ansi better serves the product's move to Unicode, even if the initial acceptance might be slower. That's from the point of view of a Western European (not too much need for Unicode around here, but some) with an accented letter in his last name (so I tend to put more emphasis on Unicode and character support than other people). But my "accented letter in last name" horror stories are for another blog post.

Now I need to pack to go to the US, for the training session with Cary Jensen. You can follow me on twitter, of course.

posted by marcocantu @ 7:22PM | 13 Comments [0 Pending]

13 Comments

On Strings and Unicode in Delphi 2009

I disagree.  I think the best idea would have been to 
add a compiler directive that forced $OldStrings so 
that developers could easily move their projects to 
Delphi 2009 and then phase in support for Unicode 
throughout their projects.  It is okay to default 
String to UnicodeString, though.

As of now, we are basically first changing everything 
to AnsiString, etc. in all source code to move to 
Delphi 2009 and then we are converting back to 
String.  Since we have hundreds of thousands of lines 
of code, this takes a bit of time.

Comment by Allen Drennan [http://www.nefsis.com] on March 28, 04:01

On Strings and Unicode in Delphi 2009

This is bizarre.  I cannot think of any other 
situation when anyone would sensibly suggest that 
"running to stand still" is something to aspire to.

By definition most people using ANSI Delphi didn't 
need Unicode (a generalisation remember).  They might 
have wanted it, as a nice to have, but if they 
absolutely *needed* it they could have done something 
about it already.

And it could have been made a switch.  Then those who 
wanted to stay ANSI wouldn't have had to do anything, 
and those that wanted to go Unicode wouldn't have had 
to do so much.


"Oh no it couldn't!  Then we'd need two parallel 
VCL's... blah blah blah"

Wrong - the "switch is impossible" argument rests 
entirely on an assumption that the way Unicode was 
implemented was the ONLY way it could have been done.  
And it wasn't.  The implementation even goes some way 
towards an approach that could have supported an 
ANSI/Unicode compiler switch, it just didn't go all 
the way necessary.

But it's all TWater under a TBridge now.  We have what 
we have.

I do find it interesting that the issue has reared 
its head again though.

Before the release of 2009 those of us who criticised 
the approach might fairly have been accused of jumping 
at shadows, but it seems there was something to be 
scared of in those shadows after all.

Comment by Jolyon Smith on March 29, 22:56

On Strings and Unicode in Delphi 2009

I've read most of the Unicode blog posts over the past
few months, but somehow I've missed the "Bookmark
property madness".  What's this referring to?  Is it
the bookmark feature in the IDE, or (searching the
help) the Indy Bookmark in TIdURI, or...?

Comment by David M on March 30, 02:45

On Strings and Unicode in Delphi 2009

"By definition most people using ANSI Delphi didn't 
need Unicode (a generalisation remember)."
This is just the usual assertion by English speaking 
people writing "simple" software for their "little" 
English speaking world. "By definition"? LOL! I am 
just sorry Delphi went Unicode too late, although 
they made it the right way.
"Then those who wanted to stay ANSI", stick with 
Delphi 2007 and Windows 95, can't they?
"assumption that the way Unicode was implemented was 
the ONLY way it could have been done."
Of course there could have been many worse 
implementations. Could someone suggest a better one?
Anyway, reading posts and comments like this, I am 
more and more persuaded that there are two big 
categories of Delphi developers:
1) "Ageing" developers who need everything to 
stay the same - who cares if any operating system is 
nowadays Unicode? Who cares about different 
languages? Outside USA, they're all bad people! :D - 
Just give them a new TDBGrid and a Firebird driver 
and they're happy
2) Part time developers who just need new cool 
features to play with just because their friends 
using Python/Ruby have. They dream Delphi will 
include any new cool feature from language X, Y and 
Z, even if they use totally different paradigms and 
can hardly fit in Delphi one (mixins? They would 
require multiple inheritance - it can be done with 
interfaces - but that's not so cooool...)
In the middle there are those developers who need a 
modern development tool that evolves incrementally 
but rapidly with the underlying environment, and to 
cope with newer challenges without preventing low-
level coding.
For example, the lack of pointer arithmetic and 
indexing led to the use of PChar for the same task. 
It would have helped many developers who need to 
work at a lower level than web frontend developers. 
Some still do.

Comment by Luigi D .Sandon on March 30, 16:30

On Strings and Unicode in Delphi 2009

"This is just the usual assertion by English speaking 
people writing "simple" software for their "little" 
English speaking world."


The only relevance of "English speaking" worlds to 
this is that NEED and WANT have two very different 
meanings in the English language.


Anyone that NEEDed unicode would either NOT be using 
Delphi or would already be using one of the Unicode 
solutions that work with Delphi.

It is "by definition" because if you NEED something 
that it is in your gift to obtain then you go and get 
it - IF you NEED it.  If you have the power to get 
something you say you need, but choose not to acquire 
it then you don't actually need it at all.  You simply 
want it.


The approach in Delphi 2009 places demands on people 
who need to migrate to Unicode AND on those who have 
no wish - or no need - to migrate to Unicode.

And we end up with such idiotic outcomes as an 
ANSIPos() function that takes a UnicodeString 
parameter.  Such "smells", are in my experience, a 
pretty reliable indication of bad design decisions.


We are told: "A compiler switch was impossible"

This is demonstrably FALSE, since there also exists 
advice for those who NEED to avoid Unicode (it's not 
just about code, but also databases and existing 
customers/users won't be happy about the disruption to 
their business caused by having to undertake 
potentially HUGE database migrations for ZERO GAIN (to 
them)).

The advice is to go through application code and 
change "String" to ANSIString, "Char" to ANSIChar, etc 
etc.

In other words, to do EXACTLY what a compiler switch 
could do for us!!!

It's a compiler switch by any other name!!

The "parallel RTL/VCL" argument is a straw man.

I see no problem with the RTL and VCL going Unicode.  
All that would be required is for CodeGear to include 
a declaration at the top of each RTL/VCL unit to 
override any per project setting for the compiler 
String switch:

  {$UNICODE ON}

(or whatever - I'm not precious about what the switch 
should be called)

With $UNICODE = ON, String = UnicodeString, Char = 
WideChar, etc etc and unadorned Windows API calls map 
to the "W" variants.

With $UNICODE = OFF, String = ANSIString, Char = 
ANSIChar, etc etc  and unadorned Windows API calls map 
to the "A" variants.

Impossible?

I don't see why.  Perhaps not exactly as simple as 
only briefly described above, but not much more 
complex either I think.

Comment by on April 1, 04:10

On Strings and Unicode in Delphi 2009

It's funny that people who don't use/need/want 
Unicode (their words, not mine) and thereby probably 
haven't a deep knowledge of Unicode because they 
don't use it, think they know how to implement 
Unicode support in Delphi in the best way.

Comment by Luigi D .Sandon on April 2, 11:43

On Strings and Unicode in Delphi 2009

  There should be a $UNICODE switch which is defaulted
to on.  Simply changing the string type to Unicode
does not magically make your application support
multiple languages.  Forcing everybody to change their
source code is foolish.   There might be a lot of
lines of delphi code in the VCL that would need to be
fixed to handle this, but it can't compare to the
amount of Delphi code that is out in the wild that
would need to be fixed.  For most people, 'upgrading'
to Delphi 2009 will be a huge productivity loss for
minimal gain.

Comment by anon on April 13, 23:40

On Strings and Unicode in Delphi 2009

some people need to be forcibly dragged into the 
future.

Comment by kotekzot [] on April 15, 14:36

On Strings and Unicode in Delphi 2009

 I don't mind new projects being created in the 
"future", but we all have large legacy projects that 
need to be updated, and Delphi 2009/2010 has made that 
needlessly complicated and potentially disastrous. I 
would have been very happy with a compiler directive, 
because that would have given me a one-stop option.

We've been using Unicode for many years where it was 
needed, but using the old string type where it was 
simple and appropriate - for example for network 
messaging.

The simplest solution in the short term for non-trivial 
projects is to keep a couple of computers running XP 
and D7 for maintenance and upgrades, and develop new 
applications on new computers with D2010. I don't like 
this, but until somebody produces a neat automated update 
tool, I don't like the alternatives either.

Comment by Jim Hawkins [http://www.melissi.co.uk] on February 9, 12:56

On Strings and Unicode in Delphi 2009

I Passionately agree on the compiler directive 
approach. I’ve been involved (on and off) on a 
relatively large Delphi project that we have upgraded 
over the years from Delphi 2 and up to Delphi 2007. 
13 years of hard work. This solution is used in-house 
by 2500+ users and is considered a great success, but 
Unicode will NEVER be relevant. I.e. the benefit is 
NIL, and the cost is very high. 

We will just wait until the compiler directive is 
introduced, or when the other features of Delphi 
outweigh the cost of upgrading to Unicode. We will 
upgrade with the Delphi 2046 release.

Comment by Havardo on June 18, 16:21

On Strings and Unicode in Delphi 2009

I do agree with having such a switch.
All my code, which compiled perfectly in d2006 is now
not working in d2010. Do you know how many days of
engineering time is spent on this stupid unicode
strings?!?

Example code:
   header : string[100];
   version : byte;
   ...
   Header := 'Data ' + EditComment.Text;
   Header[100] := CHR (version);    <-- fails here:
CHR command in d2010 returns AnsiChar!

And, besides of all, Help file was not updated! It
still states: "function Chr(X: Byte): Char;" but
compiler uses AnsiChar.

My 100-byte "Header" gets written into a binary file,
which format is fixed. It is impossible to fit unicode
shit there!

Stupid world.

Comment by Aly [] on July 21, 06:54

shifting to unicode

As we look at reality, not all code can be shifted to
Unicode, as there are interfaces to old systems that
do not allow change.

Did you ever consider that?

Comment by name on May 31, 12:10

On Strings and Unicode in Delphi 2009

Dear "name", in future versions of Delphi you'll be able 
to keep your projects as 32 projects or move them over, 
or write code that can compile on both. So not all code 
will be shifted, for sure.

Comment by Marco Cantu [http://www.marcocantu.com] on May 31, 13:08
