February 2, 2009
Follow my thoughts from strings optimizations to strings in Delphi: we are so used to have power at hand that we forget.
On Friday I was reading Jeff Atwood (of Coding Horror fame) post on micro-optimization, which is basically a post on using strings to create HTML. The core of the article revolves around the problem caused by the fact that "In most garbage collected languages, strings are immutable: when you add two strings, the contents of both are copied." This is certainly true for .NET. Jeff goes on covering Shlemiel the painter, showing how code can be improved, and realizing this his optimizations (for a limited set of string concatenations) made little difference. In the end, Jeff concludes "you should be more worried about the maintainability and readability of your code than its performance."
My first thought was, good luck I use Delphi, which is very fast on string concatenation, so I don't have to worry at all. And I can have code that is readable and fast at the same time. I partially disagree with the fact this optimizations are not relevant when, for example, you start cranking out XML (whcih I tend to do a lot). Think about 500 rows with 30 fields requiring each an open and closed node (so you have three strings for each node), and you are concatenating 50,000 strings. That might be for a single AJAX response...
In any case, what drew my attention was a comment: "Since the html/xml/etc explosion I've been waiting to see the emergence of a string-based language". In fact, there are other comments saying how the Java runtime will automatically optimize string concatenations, others claiming you don't worry about performance when using C or C++... and this made me realize, all of a sudden, we Delphi developer leave in a sort of string beauty world.
Strings in Delphi are a native and easy to use, unlike C or C++. Strings in Delphi have copy-on-write semantics. When passing strings as parameters you can pick standard reference counting and copy-on-write, opt for pass by reference, or choose pass-by-const to avoid the overhead of reference counting (and that's really a micro-optimization, but so nicely built into the language that it is worth using it). Strings concatenation in Delphi is so fast that the new optimized StringBuilder class in Delphi 2009 cannot beat it. Strings in Delphi 2009 are fully UTF-16 enabled, with a minimal overhead you can get rid of with a compiler setting. Strings in Delphi are really great.
What about other languages? Scripting langauge lack the speed of the compiled code for string manipulation (but have much larger libraries... when will we have a large string processing library in the core Delphi RTL?), garbage collected languages don't like concatenations and offer much less flexibility in terms of semantics, C and C++ require low-level pointer-based and far from safe code...
Is there any language out there that matches Delphi strings in terms of both native implementation, readability, flexibilitym and speed? I really doubt, but might be wrong, of course. We are so used to the power of the string type we have in Delphi that we often forget how great it is.
posted by
marcocantu @ 6:38PM | 16 Comments
[0 Pending]
16 Comments
Delphi Super-Duper Strings
I certainly don't know of easier and faster string
handling. If someone does, I hope they back it up
with code.
Comment by Bruce McGee on February 2, 21:07
Delphi Super-Duper Strings
Not counting the many ways a developer can burn
himself because of the "backstage" work the compiler
is doing :) the "out" variables are just one example
of this.
Comment by alex
[http://alex.ciobanu.org]
on February 2, 22:01
Delphi Super-Duper Strings
Well, for a string library, there was HyperString.
There is a thread "hyperstring and DELPHI 2009" from
last December.
Comment by Tor Helland
[http://helland.org/tor-en]
on February 2, 22:42
Delphi Super-Duper Strings
Ciao Marco,
I read that blog and curiosity got the better of me.
So I coded up the tiny examples in D2007 and the
results where what I thought they would be. A simple
concat() call on my 3.2Mhz P4 (a few years old) called
a routine to return "stuff" was about 42ms (I used the
inexact GetTickCount). A TStringList call with a
.AsText took about 422ms.
I've used C++, C, C#, and am now on a Java WebSphere
team, but I still am able to do the bulk of my work in
Delphi since I'm nearly always able to get stuff done
correctly and way faster than anyone else at my
company simply because I use Delphi and have taken the
time to figure out what is going on under the covers.
Comment by Steve McForest on February 2, 22:54
Delphi Super-Duper Strings
Hi Marco, I have to disagree;
Sure, plain concatenation of small strings might be
faster than using a stringbuilder(ish) approach, but
the larger a string becomes, the slower concatenation
will be. This is, because an ever increasing amount of
memory needs to be copied around every time the string's allocated storage space is exhausted.
As far as I know, Delphi strings are still allocated
with a precise-fit. Perhaps the memory-manager below
it (FastMM nowadays) could anticipate growth in-place,
but that won't solve the slowdown either.
Just try to imagine what would happen to the memory-
manager if you would re-allocate a string millions of
times, each time a few characters longer, each time
undergoing a copy - to me that's just plain
unacceptable! Especially when the application in
question has to run 24/7, and/or has to handle huge
volumes of data.
Note, that TStringBuilder is not a complete solution
either, as someone called Eric already mentioned on
your previous blog-entry on this subject
(http://blog.marcocantu.com/blog/not_so_fast_tstringbu
ilder.html).
A better solution would be to use a block-based
allocation scheme. The general idea behind this is, to
prevent copying data altogether, by allocating
additional storage in blocks of equal size. A nice
side-effect of this is, that it helps preventing
memory-fragmentation at the same time!
Also, for anyone looking into this subject, I would
suggest to also consider a completely different way of
representing strings, called 'Ropes'. Here's two good
resources on them : http://www.cs.ubc.ca/local/reading/proceedings/spe91-
95/spe/vol25/issue12/spe986.pdf and http://www.ibm.com/developerworks/library/j-ropes/
Instead of calling Delphi a leading language, I feel
quite the opposite : I fear we're actually falling
more and more behind on other languages.
Some examples of techniques & libraries I would love
to see in Delphi, but are almost impossible to realize
right now : Reflection, Inversion of Control, Lazy
Evaluation, Linq, NDepend, StructureMap, etc, etc.
Don't take this the wrong way, but we (Delphi
developers) shouldn't pat ourselves on the back too
much!
Comment by Patrick van Logchem
[http://www.everyangle.com]
on February 3, 00:34
Delphi Super-Duper Strings
"the new optimized StringBuilder class in Delphi 2009
cannot beat it"
Errr... actually, it's not optimized at all. It's a
just a compatibility class leftover from the
Delphi.Net era.
Algorithmically, it does nothing that String doesn't
do on its own: it maintains a buffer that is grown by
large increments, rather than just by the bytes
needed, however, that's also what you get on regular
String when using FastMM (which is ON by default in
D2009).
But then yes, the point you make about immutable
strings is very valid: their immutability brings no
benefits *most of the time*, and it can bring
performance trouble even in harmless-looking code.
Comment by Eric on February 3, 01:01
Delphi Super-Duper Strings
Don't worry Marco, I bet the next, new Delphi
compiler will introduce immutable strings because
they want "to modify the Delphi front end to enable
new and modern language features"... just "to make
the very cool changes that will keep Delphi on the
cutting edge language technology".
Comment by Luig D. Sandon
[http://www.sandon.it]
on February 3, 01:17
Delphi Super-Duper Strings
I don't know about Utf 16 strings (yet), but in ansi
string land, if you want to optimize your string
usage, you can take a page from TList's book.
Pre-allocate a string of the expected length and move
memory buffers around. If you need to copy more than
is available in the buffer, SetLength the string even
larger.
Once done, SetLength to the actual length. If you
can accurately predict the string's final length (not
that hard to do in many cases), you can bring a large
number of allocations down to 1 or 2, even if you
miss that mark, if you allocate the string in large
enough chunks, you can significantly reduce the
reallocations.
That aside, yes, Delphi Strings Rock (at least they
do since D2. I always hated the pascal string limit
of 255. It was fine for a learning language, but it
could seriously hamper working with real world
situations) - but you forgot one of their greatest
strengths.
Since Delphi developers leave it up to the efficient
RTL to handle the heavy lifting instead of manually
creating char arrays and hoping they are large
enough, it is nearly impossible to stack smash a
Delphi app. (which is not to say you can not corrupt
the stack or heap if you try hard enough, but you
REALLY have to try - a more likely cause would be a
rogue pointer)
Comment by Xepol on February 3, 08:14
Delphi Super-Duper Strings
@Luig: so true...
Comment by ajasja on February 3, 11:13
Delphi Super-Duper Strings
Patrick,
even at very large sizes string concatenation
remains very fast because the memory is generally
reallocated in place (no copy needed) and FastMM does
an extra very nice job with this scenario.
I didn't claim Delphi is a leading language, but it
has some great feature that make it more suitable for
tasks (like producing HTML or XML)for which other
languages are generally preferred.
And I don't think, as Luigi fears, we'll get immutable
strings in the near future.
And Eric is right StringBuilder is not really
optimized, bad choice of words on my side. It
improveds on memory allocations, but marginally.
Comment by Marco Cantù
[http://www.marcocantu.com]
on February 3, 12:24
Delphi Super-Duper Strings
For a lot of concatenations I just use a memory stream
and dump each part in there. It is way faster than
anything else. You can also estimate the final output
size and avoid to many realocations. I missed this
solution in the Atwood blog. Nobody mentioned it. I
suppose string builder does it in a similar way, but I
haven't looked at the code yet.
Comment by Iztok Kacin
[http://www.cromis.net/blog]
on February 3, 14:20
Delphi Super-Duper Strings
I use C++ Builder 5 and I found on concatenating 30
to 100 characters onto the end of a large AnsiString
got slower once the string had climbed past 10
megabytes in length. To alleviate the problem, I
concatenate to a small AnsiString, and once this
climbs past 100 kilobytes in length, I concatenate
that to the end of the large AnsiString, and blank
out the small string , and then continue. At the end
of the loop I add whatever is in the small string to
the large string to "flush it out", so to speak. It's
a kind of buffered technique of concatenation and it
is very fast indeed. The fastest you can get, though,
is the C RTL function stpcpy, which returns a pointer
to the end of the string, so repeated concatenations
are easy to maintain.
@Luig, I totally agree : the new "advances" in
programming techniques seem to be a waste of time, or
seem to only help those who shouldn't be programming
systems anyway! All this hand-holding gets on my
nerves, but then, I love C stuff like char
***myptr ... :-)
Comment by Mark Jacobs
[http://jacobsm.com]
on February 3, 15:38
Delphi Super-Duper Strings
Marco,
if testing string concatenation is the only thing an
application does, then yes, FastMM should be quite
capable to prevent copy-actions when reallocating
memory. (Assuming that once a string occupies the last
few allocated virtual memory pages, extending this
with another page should just 'stretch up' the virtual
memory limit - and as such no copy action for the
existing part is needed.)
When however you're working in an environment where
most of the virtual memory is allocated ("committed"
is the actual term, I believe) with heavy
fragmentation as a side-effect - Then you'll
definitely see slowdowns when concatenating to
already-large strings. Also, EOutOfMemory errors will
arise, because virtual-memory fragmentation will
hamper the memory-manager to arrange for ever
increasing sizes of contiguous memory-ranges.
The best solution for this phenomenon is what I
suggested above : Try to allocate all your memory in
even-sized blocks (our app. uses 64Kb blocks for most
data structures), or switch to a fundamentally
different string-representation altogether.
Sure, these solutions do have their drawbacks. For
example, both 'blocked' strings and 'Rope' strings
need specific logic to enable reading character-
offsets; Ropes need manual reference-counting (unless
you're really clever with interfaces, Variants or
whatnot); Both need (implicit or explicit) conversion-
functions from-and-to the native Delphi string type
(whenever a method needs to be called that needs a
contiguous array of characters, which is already the
case with a PChar argument for example).
Writing reliant, high-volume, high-performance, high-
availability and/or real-time software is not very
common in Delphi, so I can understand you normally
don't want to go to all this trouble.
But let me tell you : When you do need to fulfill
these requirements, the very real speed improvements,
and the fact the virtual memory won't fragment as much
as using 'normal' string-concatenation, is a big
enough gain to seriously consider these options.
Just my 5 cents ;-)
Comment by Patrick van Logchem
[http://www.everyangle.com]
on February 3, 23:07
Delphi Super-Duper Strings
Keep in mind that there are downsides too. In
comp.lang.delphi.misc some posters made a little list
about the string types of Delphi:
(Possible) Delphi string types and types that auto
converted to them in some form or the other:
- 1 (ansi)char
- 1 wide char, (both = char and widechar from now on)
- 2 (static) array of both
- 2 dynamic array of both.
- 2 open array of both,
- 2 pointer to both,
- 2 pointer to array of both ,
- 3 short,ansi,widestring types.
- 1 open array shortstring.
----
16
Note that I didn't test all this. It was just what I
could quickly think up.
Theoretic maximal number of conversions: 16*16 = 256
This is way too high of cousre:
- assumes every conversion is possible (some like open
arrays are possibly readonly),
- assumes every conversion reversable.
- also 16 direct assignments to same type are
included, which are less prone to automatic
conversion problems.
Some other remarks gathered from the various comments
(IRC and
c.l.p.delphi.misc)
- Note that this even does not include some of these
types in variants!!!!!
Variants probably count as one stringtype (in the
overloading sense)
- literals. Are string literals different ?
- Are there more types of literals? (e.g. unicode,
pchar typed etc)
- are there conversions, not over the base string
types between
literals and e.g. static array of char?
Note that this is all before the gigantic D2009 changes.
Comment by Marco van de Voort
[http://www.freepascal.org]
on February 6, 16:06
Delphi Super-Duper Strings
Well, I read all the comments and tend to agree more
with Marco.
Why ?
Because in 99% ot the cases, we work with small
strings (up to 64k). Large, multimegabyte strings are
a Special Case and should be solved as such. If I had
to generate such string to put it somewhere, I
wouldn`t concatenate at all, but would write
sequentally what needs to be written. We have
something, called TstringList, but one doesn`t have to
concatenate all strings to write `em.
Ajax response, consisting of 50,000 lines IMHO is
a bad design indicator, because it violates the
purpose AJAX was designed for: Interactive Web
applications, faster than having to reload a web page...
Comment by Angel Genchev
[http://lonebrain.net]
on March 18, 22:59
Delphi Super-Duper Strings
Hi Marco,
I just want to know if something like this will be added soon in
Delphi :
var longstring :=
' SELECT *
FROM CUSTOMERS C
JOIN SALES S ON C.CUST_ID = S.CUST_ID
WHERE C.STATE LIKE "%NY%"
';
That will make a lot of people happy. Imagine HTML, JSON,
XML and others using a quite simple way to concatenate large
strings.
Comment by Miguel Guzman on September 21, 03:26
Post Your Comment
Click
here for posting
your feedback to this blog.
There are currently 0 pending (unapproved) messages.