June 14, 2010
At a recent event, a developer mentioned he had problems using the debugger on a very large unit (as you can see in the questions).
Although I'm not sure why someone would ever want to have over 64,000 lines of Delphi code in a single unit (a single file, for the non-Delphi developers out there), I was puzzled by the request. So I wrote a very large unit like that. Well, that's not entirely true: I wrote a program that generated a large unit, with 20,000 functions for adding numbers (all with the same signature). Then I added the unit to the same program, called one of its functions, and placed a breakpoint. In fact, the developer asking the question had mentioned debugging problems. The debugger stopped in the large unit at line 81,732 (out of 120,011 lines).
I have not tried to compile and run the same application in Delphi 2007 (the version against which the problems were reported), but Delphi 2010 seems to handle it. Yes, opening the unit is not terribly fast, as the IDE has to parse it. As an experiment I tried to create and compile a unit with 200,000 similar functions, but the compiler started using system memory and CPU at a very high rate, and after a while the compiled line count was moving very slowly... so eventually I gave up on compiling a unit with over a million lines of source code.
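The generator program is not listed here. A minimal sketch of the idea, written in Python rather than Delphi, might look like the following. The unit name, the `AddN` function names and the exact layout (one declaration line in the interface, five lines per function body) are assumptions for illustration, not the code actually used for the experiment.

```python
def generate_unit(unit_name: str, count: int) -> str:
    """Return the source of a Delphi unit with `count` identical adder functions.

    Hypothetical layout: each function is declared on one interface line and
    implemented in five lines (header, begin, body, end;, blank line).
    """
    lines = [f"unit {unit_name};", "", "interface", ""]
    # Interface section: one declaration per function, all with the same signature.
    lines += [f"function Add{i}(A, B: Integer): Integer;" for i in range(1, count + 1)]
    lines += ["", "implementation", ""]
    # Implementation section: five lines per function body.
    for i in range(1, count + 1):
        lines += [
            f"function Add{i}(A, B: Integer): Integer;",
            "begin",
            "  Result := A + B;",
            "end;",
            "",
        ]
    lines.append("end.")
    return "\n".join(lines) + "\n"


def line_count(source: str) -> int:
    """Count source lines the way an editor's line numbers would."""
    return len(source.splitlines())
```

Writing the result to disk is a one-liner, for example `Path("BigUnit.pas").write_text(generate_unit("BigUnit", 20000))` after `from pathlib import Path`. With this particular layout a unit has `6 * count + 8` lines: 120,008 lines for 20,000 functions (close to the 120,011 lines of the unit above) and 1,200,008 lines for 200,000 functions, in line with the "over a million lines" the compiler struggled with.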
My original question remains, though. Why would a developer have such a large unit? What are the largest units (in terms of lines of source code) you have?
posted by marcocantu @ 10:12AM | 22 Comments
22 Comments
Hundred Thousand Lines in a Unit
Well, either you develop properly, or messily.
The latter is the case here.
The fact that Delphi encourages neither OO development
nor separation of concerns is another reason.
Comment by master on June 14, 10:47
My personal largest is 31k lines. I see that
Windows.pas in D6 also has 31k lines; in D2010 it has
34k lines. DirectShow9 and MSHTML are similar.
I've found a Project2000.pas with 46k lines - this is
an OCX import unit.
Comment by David Heffernan on June 14, 10:52
I just checked the library my company developed over
more than 10 years (since Turbo Pascal...) to control
packaging machines, and the largest unit we have is
about 16'000 lines long. It contains about 20 classes
that define widgets in our custom HMI. And I just
noticed the source code of Delphi 6 system.pas is
18'000 lines long.
When I started coding in Delphi coming from the C++
world, I was surprised to see such large units and
didn't understand at once why one would define more
than one class per unit. Now I think it's because of
the "uses" mechanism that isn't "transitive": if A
uses B and B uses C, then C isn't automatically
visible from A. Therefore, if you build a unit for
each class in a complex project, you'll need tons of
uses statements across units, while in C++ you'll need
only a few includes to get all the definitions you need.
Over the last 2 years I have noticed that I tend to add new
classes to existing units rather than define a new
unit, just because of the tedious job of guessing all
the required "uses". I once spent literally days
splitting a large unit into 4 smaller parts; I won't do
it again.
This tendency toward large units is of course limited by
Delphi's forms, but I think another way to encourage
smaller units would be to make "uses" transitive, at
least for "uses" located in the definition.
Comment by Dr. Goulu [http://drgoulu.com] on June 14, 11:00
We had something like that at my work; it was a data
access layer. There were seven classes per table and
about 500 tables. Each class had multiple functions.
(I didn't write it.)
But yeah, I'm assuming it came from some sort of code
generator. Like yours for testing! :)
Comment by Anthony Mills [http://amills.net/] on June 14, 11:32
Well, if you use a code generator, it's quite easy to
get a lot of code. I've created several code
generators myself to maintain database objects, read
configuration files and do a few other simple
tasks. But tens of thousands of lines? I've seen such
a unit once, which was basically a dumping ground
for every function that the previous developer couldn't
deal with correctly. So he kept them all in a single
unit, which included quite a few functions with very
similar functionality.
If the code was created by a code generator, tough...
Such units do stay difficult to debug. But if that much code
was written manually, the developer who wrote it
will definitely need to find another job that's not IT
related... It's really bad programming, since you'll
easily lose track of your code.
The biggest file I have now is over 12.000 lines and
it's a huge collection of related classes that wrap
around a database. If I had written it,
it would be maintained by a code generator.
Unfortunately, someone else wrote it manually and it
contains around 100 different classes, one for every
table used by the application...
Comment by W.A. ten Brink on June 14, 11:38
My largest unit has about 8k lines. It's quite old and
ugly code, though I already managed to put 15k lines
into other separate units.
I do have the 3rd-party libraries IBObjects and VirtualTrees,
which have units with 46k and 31k lines.
Comment by Noz on June 14, 11:53
I can't think of any reason for such an extreme unit size.
Maybe this unit is generated by code?
I hate large units; the largest unit in our codebase has
1938 lines.
Comment by Antonio on June 14, 11:57
I have a machine-generated file (a
database-table/record access class hierarchy I have
made myself) that is 48500 lines long.
The largest self-written unit (where I have written
all the code myself) is around 20.000 lines.
Comment by HeartWare [http://www.heartware.dk] on June 14, 13:01
I would expect that most of these gargantuan units are
machine-generated. For instance, the largest unit I've
got hanging around is a TLB-import unit for a third
party COM library (including component wrappers) which
has 101k LOC.
Comment by Oliver Giesen [http://ogware.wordpress.com] on June 14, 14:41
"but the compiler started using the system memory and
CPU at a very high rate and the compiled lines count
after a while was moving very slowly"
Sounds like the compiler relies on linear lookup
mechanisms. Perhaps it's time to switch to a
hash-based approach? :)
Comment by Moritz Beutel [http://audacia-software.de] on June 14, 14:51
The largest non-generated unit I've got on my disk is
VirtualTrees.pas from Mike Lischke's VirtualTreeView at
~36k LOC. The largest unit I wrote entirely by myself
weighs in at a modest ~3K LOC but it's still a mess that
I have no excuse for... o;)
Comment by Oliver Giesen [http://ogware.wordpress.com] on June 14, 14:51
There may be some issues with units containing >64K
lines, as some parts of the compiler store line
offsets in 2-byte words.
A bigger problem is the slowness you saw, which is
probably most evident in practice in Windows.pas.
Creating a unit with lots of symbols defined at the
same scope (global methods / vars / consts / types,
fields / methods / properties of a class, etc.)
exposes a design assumption in the compiler's
symbol table handling: the hash tables are limited to
256 buckets, and indeed the hash code for all
identifiers is only 1 byte. Field scopes, meanwhile,
have only 16 buckets. And these are simple chained
hash tables - each bucket points to a linked list.
The assumption is that most human-written units will
only have a few hundred top-level definitions, while
classes, records, enumerations etc. will only have a
few dozen fields, etc. Since the hash codes are only a
single byte (they are an undifferentiated unsigned
char C type throughout the compiler, not a simple
typedef to widen), the hash tables turn into simple
linked list searches at the extreme.
Comment by Barry Kelly [http://blog.barrkel.com/] on June 14, 15:47
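The chained-hash-table degradation Barry describes can be reproduced with a small model. The Python sketch below is a chained hash table with a fixed bucket count: 256 to mimic a one-byte hash code for top-level scopes, or 16 to mimic a field scope. The hash function is an assumption (CRC-32 reduced to the bucket count stands in for whatever the compiler really uses), and identifiers are lowercased because Delphi identifiers are case-insensitive. Lookups report how many chain entries they had to compare.

```python
import zlib


class ChainedSymbolTable:
    """Toy model of a chained hash table with a small, fixed bucket count."""

    def __init__(self, buckets: int = 256):
        self.buckets: list[list[str]] = [[] for _ in range(buckets)]

    def _bucket(self, name: str) -> list[str]:
        # Assumed hash: CRC-32 of the lowercased identifier, reduced to the bucket count.
        code = zlib.crc32(name.lower().encode("ascii")) % len(self.buckets)
        return self.buckets[code]

    def insert(self, name: str) -> None:
        self._bucket(name).append(name.lower())

    def comparisons(self, name: str) -> int:
        """Number of chain entries examined to find `name` (whole chain if absent)."""
        key = name.lower()
        chain = self._bucket(name)
        for steps, entry in enumerate(chain, start=1):
            if entry == key:
                return steps
        return len(chain)
```

With 20,000 top-level functions, as in the generated unit from the post, the 256 chains hold 78.125 entries each on average (20,000 / 256), so a successful lookup costs roughly 40 comparisons instead of one or two; with only 16 buckets, a scope of the same size would average more than 600. No hash function can avoid this, since the cost is driven by the bucket count; widening the hash code is what would keep chains short, and Barry notes that is not a simple typedef change in this compiler.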
The largest non-generated unit I created for my
project has 22265 lines.
Comment by Prakash Shirodkar on June 14, 17:44
At least older Delphi debuggers could not handle
breakpoints in units with more than 65535 lines, although
the compilers had no problem.
Also, circular type references in the OO model can lead
to such large units in very complex programs: just as
1000 lines might be a minimum in a well-designed
simple program, the demand scales up indefinitely. Even
a 1000-line unit can become ugly when split further;
the complexity of the underlying model is what sets the size.
Comment by MarkB on June 14, 18:00
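Barry's and MarkB's comments both point at a 16-bit ceiling: a 2-byte unsigned word holds values up to 65,535. The Python sketch below models what happens to a line number stored in such a field, under the assumption that the value simply wraps around. The thread doesn't say whether the real compiler or debugger wrapped, truncated or rejected larger values, so treat this as an illustration of the arithmetic only.

```python
LINE_FIELD_BITS = 16  # a "2-byte word"
LINE_FIELD_MASK = (1 << LINE_FIELD_BITS) - 1  # 0xFFFF == 65535


def store_line(line: int) -> int:
    """Model storing a line number in an unsigned 16-bit field that silently wraps."""
    return line & LINE_FIELD_MASK


def survives(line: int) -> bool:
    """True if the line number comes back unchanged after the 16-bit round trip."""
    return store_line(line) == line


if __name__ == "__main__":
    for line in (65_535, 65_536, 81_732):
        print(line, "->", store_line(line), "ok" if survives(line) else "WRAPPED")
```

Under this model, the breakpoint at line 81,732 from the post would be recorded as line 16,196 (81,732 - 65,536), which is exactly the kind of mismatch that would make breakpoints unusable past the limit.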
Ditto on code generator imports of DB structures.
Works great, but the units can get large very quickly.
Objects for rows, sets, stored procedures etc. - it's
great, you get compiler insight into DB structures.
That said, I don't debug into them - either they all
work fine OR they ALL break. That's the great part
about code generation. Once you get it working, you
can trust 100,000 lines of code as easily as 1000.
If anything goes wrong then, it is either how you use
it or the source it was generated from.
After that, I work hard at breaking my code into more
logical units, but I have had units that easily hit
20,000 lines before separating them, and then they
fork. If you never separated them, I can see a
100,000-line unit. It would be hell to debug and
maintain, though.
Comment by Xepol on June 14, 21:36
I'm surprised to hear that limit is STILL there?!?
Comment by Delfi Phan on June 15, 08:31
Hi Marco, guys,
I do NOT wish to turn this into a good/bad programming
technique debate, but having that many lines of code in a
single unit cannot be a good thing when it comes to
debugging or modifying something in it. I am more than
certain that it could easily have been broken into
10-20 units with far fewer lines of code each.
From my experience I've learned that large units tend
to grow larger and larger, therefore as a precaution I
try to limit units to a single class OR a few methods
depending on their scope.
A good example:
- for database table manipulation I have a generic
class from which I create other classes (1 for each
table); each of these units is ~1.000-2.000 lines MAX.
- for utility functions I try to split the methods in
several units depending on their scope.
Comment by Dorin Duminica [http://www.delphigeist.com] on June 15, 12:54
Adding to my earlier post (circular type references
in OO model interfaces), here are some options to keep
the original model:
a) support for larger units (not available)
b) support for circular unit references (not
available), with which splitting would be easy
c) quick "hack" to make the unit a "bit" larger:
put the unit interface into an inc file
d) scalable unit splitting: replace strong-type
references by base-type references in interfaces
(type-casting, type-checking at runtime; an act of
desperation, but a scalable solution)
Comment by MarkB on June 15, 13:34
I have a project where this behaviour is exhibited; this
unit is around 28k lines long. Unfortunately it is my
versioning engine and contains a lot of SQL scripts, so I
have been unable to cull it. This is under D7, but as
we are migrating to D2010 I am ready to kill it all
very soon :). I believe the issue in my case is that the
debugger gets confused by some invalid
characters, even though everything compiles OK. The
debugger is just out by a few lines, but that makes it
virtually impossible to debug with breakpoints.
cheers
Dave.
Comment by on June 17, 00:27
My Delphi 2007 app (non-Unicode) that I have going has
over 100 thread units that are for the most part under
2000 lines of code each... What I am worried about is
the fact that my main code (the unit calling the threads) has 64,000
LOC and climbing... I have found problems using the
{$I} include directive: during code design the
declarations don't seem to get added to the autocomplete
system, so I have been adding most of my code
to a single file. Eesh..... I have well over 8000
lines of procedure declarations and variable code.
Talk about a pile that just gets bigger and bigger.
I have noticed the super slow crawling of code
compiling, where it can take about 20 seconds per line
for the first 100 lines or so; after that it will run
through like normal.... Just the first 100 or so seem
to take sooo long. Weird.
Comment by Lenn Dolling [http://skyboardsoftware.com] on July 15, 03:54
Indy's largest unit, IdWinSock2.pas @ 326K, has 9018
LOC in it. Its 2nd largest unit, IdGlobal.pas @ 294K,
has 9186 LOC.
Comment by Remy Lebeau [http://www.lebeausoftware.org] on October 1, 18:20
The biggest unit I have here is a unit generated by
Delphi itself. It's an "XML binding" unit generated
from an XML file for the purpose of phone/XML
communication, and it has 197733 lines of code.
I couldn't make it smaller if I wanted to; we didn't
write the XML protocol that needs to be supported.
Comment by Thomas on October 5, 10:31