February 2, 2009

Delphi Super-Duper Strings

Follow my thoughts from strings optimizations to strings in Delphi: we are so used to have power at hand that we forget.

On Friday I was reading Jeff Atwood (of Coding Horror fame) post on micro-optimization, which is basically a post on using strings to create HTML. The core of the article revolves around the problem caused by the fact that "In most garbage collected languages, strings are immutable: when you add two strings, the contents of both are copied." This is certainly true for .NET. Jeff goes on covering Shlemiel the painter, showing how code can be improved, and realizing this his optimizations (for a limited set of string concatenations) made little difference. In the end, Jeff concludes "you should be more worried about the maintainability and readability of your code than its performance."

My first thought was, good luck I use Delphi, which is very fast on string concatenation, so I don't have to worry at all. And I can have code that is readable and fast at the same time. I partially disagree with the fact this optimizations are not relevant when, for example, you start cranking out XML (whcih I tend to do a lot). Think about 500 rows with 30 fields requiring each an open and closed node (so you have three strings for each node), and you are concatenating 50,000 strings. That might be for a single AJAX response...

In any case, what drew my attention was a comment: "Since the html/xml/etc explosion I've been waiting to see the emergence of a string-based language". In fact, there are other comments saying how the Java runtime will automatically optimize string concatenations, others claiming you don't worry about performance when using C or C++... and this made me realize, all of a sudden, we Delphi developer leave in a sort of string beauty world.

Strings in Delphi are a native and easy to use, unlike C or C++. Strings in Delphi have copy-on-write semantics. When passing strings as parameters you can pick standard reference counting and copy-on-write, opt for pass by reference, or choose pass-by-const to avoid the overhead of reference counting (and that's really a micro-optimization, but so nicely built into the language that it is worth using it). Strings concatenation in Delphi is so fast that the new optimized StringBuilder class in Delphi 2009 cannot beat it. Strings in Delphi 2009 are fully UTF-16 enabled, with a minimal overhead you can get rid of with a compiler setting. Strings in Delphi are really great.

What about other languages? Scripting langauge lack the speed of the compiled code for string manipulation (but have much larger libraries... when will we have a large string processing library in the core Delphi RTL?), garbage collected languages don't like concatenations and offer much less flexibility in terms of semantics, C and C++ require low-level pointer-based and far from safe code...

Is there any language out there that matches Delphi strings in terms of both native implementation, readability, flexibilitym and speed? I really doubt, but might be wrong, of course. We are so used to the power of the string type we have in Delphi that we often forget how great it is.

 





 

15 Comments

Delphi Super-Duper Strings 

I certainly don't know of easier and faster string
handling.  If someone does, I hope they back it up
with code.
Comment by Bruce McGee on February 2, 21:07

Delphi Super-Duper Strings 

Not counting the many ways a developer can burn 
himself because of the "backstage" work the compiler 
is doing :) the "out" variables are just one example 
of this.
Comment by alex [http://alex.ciobanu.org] on February 2, 22:01

Delphi Super-Duper Strings 

 Well, for a string library, there was HyperString.
There is a thread "hyperstring and DELPHI 2009" from
last December.
Comment by Tor Helland [http://helland.org/tor-en] on February 2, 22:42

Delphi Super-Duper Strings 

Ciao Marco,

I read that blog and curiosity got the better of me. 
So I coded up the tiny examples in D2007 and the
results where what I thought they would be.  A simple
concat() call on my 3.2Mhz P4 (a few years old) called
a routine to return "stuff" was about 42ms (I used the
inexact GetTickCount).  A TStringList call with a
.AsText took about 422ms.

I've used C++, C, C#, and am now on a Java WebSphere
team, but I still am able to do the bulk of my work in
Delphi since I'm nearly always able to get stuff done
correctly and way faster than anyone else at my
company simply because I use Delphi and have taken the
time to figure out what is going on under the covers.
Comment by Steve McForest on February 2, 22:54

Delphi Super-Duper Strings 

Hi Marco, I have to disagree;

Sure, plain concatenation of small strings might be 
faster than using a stringbuilder(ish) approach, but 
the larger a string becomes, the slower concatenation 
will be. This is, because an ever increasing amount of 
memory needs to be copied around every time the string's allocated storage space is exhausted.

As far as I know, Delphi strings are still allocated 
with a precise-fit. Perhaps the memory-manager below 
it (FastMM nowadays) could anticipate growth in-place, 
but that won't solve the slowdown either.

Just try to imagine what would happen to the memory-
manager if you would re-allocate a string millions of 
times, each time a few characters longer, each time 
undergoing a copy - to me that's just plain 
unacceptable! Especially when the application in 
question has to run 24/7, and/or has to handle huge 
volumes of data.

Note, that TStringBuilder is not a complete solution 
either, as someone called Eric already mentioned on 
your previous blog-entry on this subject 
(http://blog.marcocantu.com/blog/not_so_fast_tstringbu
ilder.html).

A better solution would be to use a block-based 
allocation scheme. The general idea behind this is, to 
prevent copying data altogether, by allocating 
additional storage in blocks of equal size. A nice 
side-effect of this is, that it helps preventing 
memory-fragmentation at the same time!

Also, for anyone looking into this subject, I would 
suggest to also consider a completely different way of 
representing strings, called 'Ropes'. Here's two good 
resources on them : http://www.cs.ubc.ca/local/reading/proceedings/spe91-
95/spe/vol25/issue12/spe986.pdf and http://www.ibm.com/developerworks/library/j-ropes/

Instead of calling Delphi a leading language, I feel 
quite the opposite : I fear we're actually falling 
more and more behind on other languages.
Some examples of techniques & libraries I would love 
to see in Delphi, but are almost impossible to realize 
right now : Reflection, Inversion of Control, Lazy 
Evaluation, Linq, NDepend, StructureMap, etc, etc.

Don't take this the wrong way, but we (Delphi 
developers) shouldn't pat ourselves on the back too 
much!
Comment by Patrick van Logchem [http://www.everyangle.com] on February 3, 00:34

Delphi Super-Duper Strings 

"the new optimized StringBuilder class in Delphi 2009
cannot beat it"

Errr... actually, it's not optimized at all. It's a
just a compatibility class leftover from the
Delphi.Net era.

Algorithmically, it does nothing that String doesn't
do on its own: it maintains a buffer that is grown by
large increments, rather than just by the bytes
needed, however, that's also what you get on regular
String when using FastMM (which is ON by default in
D2009).

But then yes, the point you make about immutable
strings is very valid: their immutability brings no
benefits *most of the time*, and it can bring
performance trouble even in harmless-looking code.
Comment by Eric on February 3, 01:01

Delphi Super-Duper Strings 

Don't worry Marco, I bet the next, new Delphi 
compiler will introduce immutable strings because 
they want "to modify the Delphi front end to enable 
new and modern language features"... just "to make 
the very cool changes that will keep Delphi on the 
cutting edge language technology".
Comment by Luig D. Sandon [http://www.sandon.it] on February 3, 01:17

Delphi Super-Duper Strings 

I don't know about Utf 16 strings (yet), but in ansi 
string land, if you want to optimize your string 
usage, you can take a page from TList's book.

Pre-allocate a string of the expected length and move 
memory buffers around.  If you need to copy more than 
is available in the buffer, SetLength the string even 
larger.

Once done, SetLength to the actual length.  If you 
can accurately predict the string's final length (not 
that hard to do in many cases), you can bring a large 
number of allocations down to 1 or 2, even if you 
miss that mark, if you allocate the string in large 
enough chunks, you can significantly reduce the 
reallocations.

That aside, yes, Delphi Strings Rock (at least they 
do since D2.  I always hated the pascal string limit 
of 255.  It was fine for a learning language, but it 
could seriously hamper working with real world 
situations) - but you forgot one of their greatest 
strengths.

Since Delphi developers leave it up to the efficient 
RTL to handle the heavy lifting instead of manually 
creating char arrays and hoping they are large 
enough, it is nearly impossible to stack smash a 
Delphi app.  (which is not to say you can not corrupt 
the stack or heap if you try hard enough, but you 
REALLY have to try - a more likely cause would be a 
rogue pointer)
Comment by Xepol on February 3, 08:14

Delphi Super-Duper Strings 

 @Luig: so true...
Comment by ajasja on February 3, 11:13

Delphi Super-Duper Strings 

Patrick,
  even at very large sizes string concatenation 
remains very fast because the memory is generally 
reallocated in place (no copy needed) and FastMM does 
an extra very nice job with this scenario.

I didn't claim Delphi is a leading language, but it 
has some great feature that make it more suitable for 
tasks (like producing HTML or XML)for which other 
languages are generally preferred.

And I don't think, as Luigi fears, we'll get immutable 
strings in the near future.

And Eric is right StringBuilder is not really 
optimized, bad choice of words on my side. It 
improveds on memory allocations, but marginally.
Comment by Marco Cantù [http://www.marcocantu.com] on February 3, 12:24

Delphi Super-Duper Strings 

For a lot of concatenations I just use a memory stream
and dump each part in there. It is way faster than
anything else. You can also estimate the final output
size and avoid to many realocations. I missed this
solution in the Atwood blog. Nobody mentioned it. I
suppose string builder does it in a similar way, but I
haven't looked at the code yet.
Comment by Iztok Kacin [http://www.cromis.net/blog] on February 3, 14:20

Delphi Super-Duper Strings 

I use C++ Builder 5 and I found on concatenating 30 
to 100 characters onto the end of a large AnsiString 
got slower once the string had climbed past 10 
megabytes in length. To alleviate the problem, I 
concatenate to a small AnsiString, and once this 
climbs past 100 kilobytes in length, I concatenate 
that to the end of the large AnsiString, and blank 
out the small string , and then continue. At the end 
of the loop I add whatever is in the small string to 
the large string to "flush it out", so to speak. It's 
a kind of buffered technique of concatenation and it 
is very fast indeed. The fastest you can get, though, 
is the C RTL function stpcpy, which returns a pointer 
to the end of the string, so repeated concatenations 
are easy to maintain.

@Luig, I totally agree : the new "advances" in 
programming techniques seem to be a waste of time, or 
seem to only help those who shouldn't be programming 
systems anyway! All this hand-holding gets on my 
nerves, but then, I love C stuff like char 
***myptr ... :-)
Comment by Mark Jacobs [http://jacobsm.com] on February 3, 15:38

Delphi Super-Duper Strings 

Marco,

  if testing string concatenation is the only thing an 
application does, then yes, FastMM should be quite 
capable to prevent copy-actions when reallocating 
memory. (Assuming that once a string occupies the last 
few allocated virtual memory pages, extending this 
with another page should just 'stretch up' the virtual 
memory limit - and as such no copy action for the 
existing part is needed.)

When however you're working in an environment where 
most of the virtual memory is allocated ("committed" 
is the actual term, I believe) with heavy 
fragmentation as a side-effect - Then you'll 
definitely see slowdowns when concatenating to 
already-large strings. Also, EOutOfMemory errors will 
arise, because virtual-memory fragmentation will 
hamper the memory-manager to arrange for ever 
increasing sizes of contiguous memory-ranges.

The best solution for this phenomenon is what I 
suggested above : Try to allocate all your memory in 
even-sized blocks (our app. uses 64Kb blocks for most 
data structures), or switch to a fundamentally 
different string-representation altogether.

Sure, these solutions do have their drawbacks. For 
example, both 'blocked' strings and 'Rope' strings 
need specific logic to enable reading character-
offsets; Ropes need manual reference-counting (unless 
you're really clever with interfaces, Variants or 
whatnot); Both need (implicit or explicit) conversion-
functions from-and-to the native Delphi string type 
(whenever a method needs to be called that needs a 
contiguous array of characters, which is already the 
case with a PChar argument for example).

Writing reliant, high-volume, high-performance, high-
availability and/or real-time software is not very 
common in Delphi, so I can understand you normally 
don't want to go to all this trouble.

But let me tell you : When you do need to fulfill 
these requirements, the very real speed improvements, 
and the fact the virtual memory won't fragment as much 
as using 'normal' string-concatenation, is a big 
enough gain to seriously consider these options.

Just my 5 cents ;-)
Comment by Patrick van Logchem [http://www.everyangle.com] on February 3, 23:07

Delphi Super-Duper Strings 

Keep in mind that there are downsides too. In
comp.lang.delphi.misc some posters made a little list
about the string types of Delphi:

(Possible) Delphi string types and types that auto
converted to them in some form or the other:

- 1 (ansi)char
- 1 wide char,   (both = char and widechar from now on)
- 2 (static) array of both
- 2 dynamic array of both. 
- 2 open array of both, 
- 2 pointer to both, 
- 2 pointer to array of both , 
- 3 short,ansi,widestring types.
- 1 open array shortstring.
----
  16

Note that I didn't test all this. It was just what I
could quickly think up.

Theoretic maximal number of conversions: 16*16 = 256

This is way too high of cousre:
- assumes every conversion is possible (some like open
arrays are possibly readonly), 
- assumes every conversion reversable.  
- also 16 direct assignments to same type are
included, which are less prone to automatic 
   conversion problems.

Some other remarks gathered from the various comments
(IRC and
c.l.p.delphi.misc)

- Note that this even does not include some of these
types in variants!!!!! 
  Variants probably count as one stringtype (in the
overloading sense)
- literals. Are string literals different ? 
	- Are there more types of literals? (e.g. unicode,
pchar typed etc)
	- are there conversions, not over the base string
types between
	literals and e.g. static array of char?


Note that this is all before the gigantic D2009 changes.
Comment by Marco van de Voort [http://www.freepascal.org] on February 6, 16:06

Delphi Super-Duper Strings 

Well, I read all the comments and tend to agree more
with Marco. 
Why ?
Because in 99% ot the cases, we work with small
strings (up to 64k). Large, multimegabyte strings are
a Special Case and should be solved as such. If I had
to generate such string to put it somewhere, I
wouldn`t concatenate at all, but would write
sequentally what needs to be written. We have
something, called TstringList, but one doesn`t have to
concatenate all strings to write `em.
    Ajax response, consisting of 50,000 lines IMHO is
a bad design indicator, because it violates the
purpose AJAX was designed for: Interactive Web
applications, faster than having to reload a web page...
Comment by Angel Genchev [http://lonebrain.net] on March 18, 22:59


Post Your Comment

Click here for posting your feedback to this blog.

There are currently 0 pending (unapproved) messages.