Archive for September, 2008

Released: Castalia 2008.3

Posted on September 12th, 2008 in Uncategorized

Castalia 2008.3 is now available.

Current customers can get it at

Everyone else can try a free trial at

What’s new in Castalia 2008.3:

* Support for Delphi 2009 IDE

* Parser Support for Unicode source files in Delphi 2009

* Bugfix: Inline variable declaration fires on Ctrl+Space

* Bugfix: Embedded text search doesn’t take selected text

* Bugfix: “Rename Parameter” refactoring doesn’t work for procedures and functions that are not members of a class: the parameter isn’t renamed in the method declaration.

* Keyboard hints in embedded text search can be turned off

* Bugfix: “Extract Method” and other refactorings incorrectly format code containing a begin/end/else/if/end structure.

* Improvements in member name recognition for the “Eliminate With” refactoring

* Improved identifier resolution for all refactorings

* Improved handling of compiler “define” directives in parser

* Support for parameterized type (aka “generics”) syntax

* Support for anonymous method syntax

Preparing for Delphi 2009: Part 4

Posted on September 9th, 2008 in Castalia, CodeGear, Delphi

Over the last few days, I’ve written about things to look for in your code to be prepared for Delphi 2009.  In today’s installment, I’m going to discuss a few Windows API calls that gave me a little trouble when porting Castalia to Delphi 2009.


The GetProcAddress API call is used to find the address of an exported function in a DLL.  If you’re using it, it probably looks something like this:

procAddr := GetProcAddress(Handle, PChar('SomeFunctionName'));

We run into trouble with that last parameter.  GetProcAddress expects an ASCII string.  That is, an array of bytes, not words.  In Delphi 2009, the above code will fail, because you’ll get a pointer to a UTF-16 string.  Here’s the way it should look now:

procAddr := GetProcAddress(Handle, PAnsiChar(AnsiString('SomeFunctionName')));

The typecast to AnsiString ensures that the string is a string of bytes, not words.  Then the PChar cast is changed to PAnsiChar for completeness.
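Put in context, the call usually sits inside a load-and-check sequence. Here’s a sketch; the DLL name and function name are placeholders, not real libraries:

```delphi
var
  LibHandle: HMODULE;
  procAddr: Pointer;
begin
  // 'some.dll' and 'SomeFunctionName' are hypothetical placeholders
  LibHandle := LoadLibrary('some.dll');
  if LibHandle <> 0 then
  try
    // GetProcAddress takes an ANSI (byte) string, even in a Unicode build
    procAddr := GetProcAddress(LibHandle, PAnsiChar(AnsiString('SomeFunctionName')));
    if Assigned(procAddr) then
      { cast procAddr to the appropriate procedural type and call it };
  finally
    FreeLibrary(LibHandle);
  end;
end;
```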


ToAscii is used to convert a keyboard state into an ASCII character.  It’s usually called from a KeyDown or KeyUp event handler.  Here’s a typical use:


I := ToAscii(KeyCode, 0, KS, @TempChar, 0);

Most of the parameters of ToAscii are beyond the scope of this post, but the point is that it takes the current keyboard state (represented by KS and KeyCode) and turns it into the ASCII character which should be displayed.

Of course, now that a Char is no longer limited to Ascii (and the size of the Char data type has changed), ToAscii isn’t really going to work any more.  I have to admit I was a bit surprised to discover the solution – the ToUnicode API call:

 I := ToUnicode(KeyCode, 0, KS, TempChar, 1, 0);

There are a few more parameters involved here (and note that TempChar is no longer referenced by pointer).  Again, most of them don’t matter, but the usage above will work for most cases.  If you’re using ToAscii in your code, you’re going to want to replace it with ToUnicode.
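For a fuller picture, here’s a sketch of a KeyDown handler built around ToUnicode. The form and handler names are made up; a zero-based WideChar array is used for the buffer, since extended syntax makes it compatible with the PWideChar parameter:

```delphi
procedure TMyForm.FormKeyDown(Sender: TObject; var Key: Word;
  Shift: TShiftState);
var
  KS: TKeyboardState;
  Buff: array[0..1] of WideChar;
begin
  GetKeyboardState(KS);  // capture the current keyboard state into KS
  // ToUnicode returns 1 when exactly one code unit was produced;
  // 0 means no translation, and negative values indicate dead keys
  if ToUnicode(Key, 0, KS, Buff, 2, 0) = 1 then
    Caption := 'You typed: ' + Buff[0];
end;
```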


The MultiByteToWideChar function maps a string (often an ASCII or UTF-8 string) to a WideString.  Now that the default string type is unicode, and all string types are compatible by assignment, calls to MultiByteToWideChar can be replaced by simple assignment:

//Old code

MultiByteToWideChar(0, 0, PChar(sourceString), Length(sourceString), PWideChar(targetString), Length(targetString));

//New code

targetString := sourceString;

Any API call that works with strings is probably worth examining for correctness, since most API calls that require a string also require the length of that string.  You should check whether it expects the length in bytes or in WCHARs, and which form of string it expects in the first place.

One note though: Delphi’s included Windows API units include overloaded versions of most of the API routines that involve strings, so you can call them with either a string or an AnsiString, and they’ll still work.  As I said before, 99.9% of your code is likely to simply compile and work without any changes.  That’s still true.

Delphi 2009 and Unicode

Posted on September 8th, 2008 in Castalia, CodeGear, Delphi

Alright, I keep getting questions about what the new 16-bit Char type (and its associated UnicodeString) really mean.  Let’s take a couple of minutes off from the “Preparing for Delphi 2009” series and discuss exactly how this works.

The core issue here is that really, a character according to the unicode standard isn’t 16 bits, it’s 32 (ok, actually it’s 21, but that’s not a normal data size, so we use 32).  Since you obviously can’t fit 32 bits (or even 21 bits) into a 16-bit data type, what is going on with this 16-bit Char type?

In the unicode world, there are two possibilities here.  The first is called UCS-2, which basically means that only a subset of the entire character set is allowed.  That is, you can only use the characters that will fit into 16 bits.

The other possibility is UTF-16, which uses a 16-bit data type, and has a mechanism for splitting a larger character into two of these 16-bit chunks.  When this happens, each chunk is called a Surrogate, and the two of them together is called a Surrogate Pair.

So, let’s settle this once and for all: Delphi 2009 uses UTF-16.  It allows the entire unicode character set, using surrogate pairs for the characters that take more than 16 bits to represent.

Before I get to some specifics, there are a couple of terms we should make sure we have straight:

In unicode, a character is called a Code Point.  The letter ‘A’, an exclamation mark, a space, a line feed, and any other “thing” that is represented as part of the unicode “character set” (called the Code Space) is a code point.

On the other hand, that “chunk” of data – 16-bits in UTF-16 – is called a Code Unit.  If you have a code point that won’t fit into 16 bits, you’ll need to combine two code units to form a surrogate pair.
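The arithmetic behind a surrogate pair is simple enough to sketch.  This helper is mine, not an RTL routine, but it shows exactly how a code point above $FFFF splits into two code units:

```delphi
// Splits a code point above $FFFF into a UTF-16 surrogate pair.
// Illustrative only - not part of the RTL.
procedure SplitToSurrogatePair(CodePoint: Cardinal; out Hi, Lo: Word);
begin
  Assert((CodePoint > $FFFF) and (CodePoint <= $10FFFF));
  CodePoint := CodePoint - $10000;      // leaves a 20-bit value
  Hi := $D800 or (CodePoint shr 10);    // top 10 bits -> high surrogate
  Lo := $DC00 or (CodePoint and $3FF);  // low 10 bits -> low surrogate
end;

// SplitToSurrogatePair($20086, Hi, Lo) yields Hi = $D840, Lo = $DC86
```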

In Delphi terms, the new 16-bit Char data type represents a code unit, not a code point.  So, when I wrote last week that Length(myString) returns the number of printable characters in myString, that could have been a little misleading.  Length(myString) returns the number of code units in the string.  If some of those code units are surrogates, the number of code points you see on screen may not be the same as the number of code units in the string.

Now for a couple of frequently asked (and anticipated) questions:

Q: Is it really safe to assume that the size of a string is Length(myString) * 2?

A: If by “size,” you mean “size in bytes,” yes.  That will give you the size of the string in bytes, because Length(myString) tells you the number of code units, which are always 2 bytes.  I would, however, suggest that you not use the magic number 2, but rather SizeOf(Char), because 1. it’s more readable, and 2. it’s not inconceivable that the size of the Char data type could change again in the future – use SizeOf(Char) and you’re already ready for it.

If by “size” you mean something else, keep reading…

Q:  How do I get these characters into my strings?

A: The easiest way is just to include them in your source code.  Since the compiler and code editor are fully unicode enabled, you can just put the character into your source code.  However, that isn’t always easy if you’re using a keyboard that isn’t really designed for the characters you’re using, and isn’t always the most readable solution either.  There is another way:

Delphi 2009 includes a new unit in the RTL called Character.pas.  It has a bunch of utility functions (and a utility class) to help with these conversions.  Let’s say you want a string that has the codepoint $20086 in it.  You could do the math to figure out the surrogate pair and do S := Chr($D840) + Chr($DC86); or you could use the ConvertFromUtf32 function from Character.pas: S := ConvertFromUTF32($20086);

Both will give you the same result, but ConvertFromUtf32 is certainly easier to use.

Note that if you do ShowMessage(S), you’ll see only one character on the screen, but that Length(S) will return 2, since there are two code units used to represent the one character.

Q: How do I determine the number of code points in a string with surrogate pairs, instead of the number of code units?

A: SysUtils.pas has some helper functions for things like this.  In this case, we could do I := ElementToCharLen(myString, Length(myString)); and I would contain the number of code points in the string.

Hopefully this will answer some of the questions that have come up.  If there are things that still aren’t clear, feel free to comment, and I’ll do my best to clear things up.

Preparing for Delphi 2009: Part 3

Posted on September 8th, 2008 in Castalia, CodeGear, Delphi

Last week, I blogged about the changes to the Delphi string type and how it might affect some of your memory management code.  Those were just warmups for today.

Today I’m writing about TStream and Delphi 2009.  Reading and writing streams was the biggest source of issues in porting my code to Delphi 2009.  I suspect that it will be the same for most of you.  Hopefully today’s post will help you prepare your code ahead of time.

Again, the root of the problem lies in the fact that most of the time when we write code to read or write streams, we assume that a Char is one byte, and a string’s length in Chars is the same as its size in bytes.  Since this isn’t true any more, our code that assumes that it is may be broken.

Note that the following is true for ALL stream classes, whether it’s TMemoryStream, TFileStream, or some other TStream implementation.


TStream.Write expects the number of BYTES to write to the stream, not the number of Chars. The following code is very common, but incorrect:

Stream.Write(Pointer(myString)^, Length(myString));

This code will compile without complaint in Delphi 2009, but it won’t do what you probably wanted it to, which is write the whole string to the stream.  Your first instinct might be to replace Length(myString) with SizeOf(myString), but that won’t work either, since SizeOf(someString) is always 4 (it’s just a pointer, remember?).  Generally, we should use the Length * SizeOf construct that I disliked a couple of days ago:

Stream.Write(Pointer(myString)^, Length(myString) * SizeOf(Char));

I’d like to point out that this code is 100% backwards compatible with prior versions of Delphi.  It behaves correctly in Delphi 5, when SizeOf(Char) was 1, and it behaves correctly in Delphi 2009 when SizeOf(Char) is 2.  This is particularly important in Castalia, which uses the same code base to compile for the last six versions of Delphi.

I said that this solution works generally, but there are instances when you may want to do something else… Specifically, if you want to write in some encoding other than UTF-16.  We’ll leave that for later though (hint: the answer involves a new class called TEncoding).


Of course, if you change your stream write code, you’ll probably need to change your stream read code. It all depends, of course, on how the stream is written. A typical pattern is to write the length of the string, then the string:

L := Length(myString);

Stream.Write(L, SizeOf(Integer));

Stream.Write(Pointer(myString)^, Length(myString) * SizeOf(Char));

Now, the old way to read this will look like this:

Stream.Read(L, SizeOf(Integer));

SetLength(myString, L);

Stream.Read(pointer(myString)^, L);

But this won’t work, as it will only read L bytes, which is going to be half the string when SizeOf(Char) is 2.  I’m sure by now you’re already a step ahead of me on the solution:

Stream.Read(L, SizeOf(Integer));

SetLength(myString, L);

Stream.Read(pointer(myString)^, L * SizeOf(Char));

The first two lines were fine – reading an Integer isn’t affected by the change in the size of a Char, and SetLength takes the number of Char elements in the string, just as it always has.  Reading the string from the stream, however, we need to make sure we’re telling the stream object how many BYTES to read, not how many CHARS.

Once again, the general rule holds: If the routine deals specifically with strings, it expects the number of CHARS to use.  If it works with general memory buffers, it expects the number of BYTES to use.  The trick here is just translating between CHARS and BYTES in appropriate places.
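The length-prefixed pattern above is tidy to wrap in a pair of helpers.  These routines are my own sketch, not part of the RTL, and they assume the stream format shown in this post:

```delphi
// Writes an Integer length prefix (in Chars), then the string data (in bytes).
procedure WriteStringToStream(Stream: TStream; const S: string);
var
  L: Integer;
begin
  L := Length(S);                               // length in Char elements
  Stream.Write(L, SizeOf(Integer));
  Stream.Write(Pointer(S)^, L * SizeOf(Char));  // size in BYTES
end;

// Reads back a string written by WriteStringToStream.
function ReadStringFromStream(Stream: TStream): string;
var
  L: Integer;
begin
  Stream.Read(L, SizeOf(Integer));
  SetLength(Result, L);                               // L is a Char count
  Stream.Read(Pointer(Result)^, L * SizeOf(Char));    // BYTES to read
end;
```

Because every Char count is multiplied by SizeOf(Char) at the stream boundary, the same two routines behave correctly whether SizeOf(Char) is 1 or 2.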

As I said at the beginning, I’m predicting that stream reading and writing is going to be the single most common cause of issues when porting code to Delphi 2009.  The simplest way to go about it is to do what I’ve noted in this post.  We’ll come back to a couple of other solutions in a few days, but tomorrow I’m going to talk about a couple of Windows API calls that you might be using that will need a bit of work.

Preparing for Delphi 2009: Part 2

Posted on September 6th, 2008 in Castalia, CodeGear, Delphi

Yesterday, I wrote about the new UnicodeString type in Delphi 2009, and the fact that the Char type is now a 2-byte Char instead of a 1-byte Char.  I wrote about how this can affect calls to SizeOf and Length, among other things.

While porting my code to Delphi 2009, I found a few instances where the changes introduced some memory management issues.  Of course, the root cause of the problems was that my code assumed that SizeOf(Char) = 1, which is no longer true.

When that change was made, some decisions had to be made about certain parts of the VCL (or, more correctly, the RTL) and what parameters they would expect.  Most of Delphi’s memory management routines expect parameters in numbers of Bytes, not Chars:


A common way to use FillChar has long been FillChar(memoryBuffer, Length(memoryBuffer), 0).  This should fill memoryBuffer with zeroes.  However, if memoryBuffer is an array of Char, this won’t work any more.  It will only fill half of the array, because Length() returns the number of elements in the array, not the size of the array in bytes.  The solution is to use SizeOf() instead of Length():

FillChar(memoryBuffer, SizeOf(memoryBuffer), 0);

Also, note that FillChar fills memoryBuffer with BYTES, not CHARS, even if the buffer is an array of Char.  If your code reads FillChar(memoryBuffer, SizeOf(memoryBuffer), #$36), your chars will be $3636, not $36 (AKA $0036) as you might have intended.  To get a UnicodeString full of #$36, you’ll need to use StringOfChar().  The analogous code to the above call of FillChar() is as follows:

StrPCopy(memoryBuffer, StringOfChar(#$36, Length(memoryBuffer) - 1));

(The - 1 leaves room for the null terminator that StrPCopy() appends.)

Note that StringOfChar() takes the number of Char elements, not the size in bytes (hint for remembering: if the routine is specifically for strings, it probably takes the number of Char elements.  If it’s a generic memory-management routine, it probably takes the number of bytes).


Move() can have the same problems as FillChar(). If you’re using Move() with Char arrays, Length() won’t work the way it used to:

Move(charArray1, charArray2, Length(charArray1));

Move is a generic memory management routine, so it expects the number of BYTES to move, not the number of Chars.  Here’s the right way:

Move(charArray1, charArray2, SizeOf(charArray1));

Alternatively, you could do Move(charArray1, charArray2, Length(charArray1) * SizeOf(Char)), but I think that’s unnecessary.  Of course, if you feel it’s more readable, it will work just as well.


Copy() is related to Move(), though it’s aimed specifically at strings and arrays.  This means that when using Copy(), you should pass in the INDICES of the elements (probably CHAR elements, if they’re potentially strings), and NOT the byte offset of the elements you want to copy.  This isn’t as likely to be an issue, but it did bite me once while porting Castalia to Delphi 2009.
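A minimal sketch of what that means in practice (the string contents are made up):

```delphi
var
  S, Sub: string;
begin
  S := 'Hello, world';
  // Copy takes a start INDEX and a COUNT, both in Char elements -
  // never a byte offset. This is the same in Delphi 2009 as before.
  Sub := Copy(S, 1, 5);   // Sub = 'Hello'
  Sub := Copy(S, 8, 5);   // Sub = 'world'
end;
```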

I hope that’s enough for one day.  On Monday, we’ll look at how TStream descendants might present some problems.

Preparing for Delphi 2009: Part 1

Posted on September 5th, 2008 in Castalia, CodeGear, Delphi

As Delphi 2009, with its big unicode changes, is soon to be upon us (it was announced and made available for sale on Aug. 25), I think it’s a good idea to talk about some of the issues that might come up.

My experience with Delphi 2009 (which I’ve been field testing) is that the vast majority of code will actually behave just the way it did before, with no changes.  The only trouble will come with code that assumes that the size of a Char is 1 byte, or that the length of a string is equal to its size in bytes.  Now that a Char is a 2-byte data type, and the length of a string is different from the number of bytes it takes up, we have to re-examine some old code and change some old habits.

Over the next few days, I’ll look at a few of the things that I encountered in porting my code, which I expect will be the same issues most people will face.  Here goes…

string, WideString and AnsiString

The under-the-hood changes to the string type are interesting, but aren’t really relevant here, except to say that WideString is more or less deprecated.  The new string type UnicodeString sort of replaces WideString, with a lot more capability (code that uses WideString will still compile and run just fine).

String is mapped to UnicodeString, so all of your code that uses the string type will automatically be unicode, where before it was ASCII (ok, it wasn’t necessarily ASCII, because you could use codepages and things, but it WAS restricted to 8-bit characters).  Char is mapped to WideChar, so a Char is now 16 bits by default, instead of 8.  99.9% of the time, this is what you’ll want.

However, for the rare times when you’ll specifically want 8-bit characters, the types AnsiChar and AnsiString still behave like “old” Delphi strings, with 8-bit characters.  AnsiChar and AnsiString are assignment-compatible with UnicodeString and WideChar, so you can do myUnicodeString := myAnsiString and it will just work.  Keep in mind though, if you assign a UnicodeString to an AnsiString, there is potential data loss as 16-bit characters are “compressed” into 8 bits.
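A quick sketch of that assignment compatibility, and the narrowing hazard:

```delphi
var
  A: AnsiString;
  U: string;  // string = UnicodeString in Delphi 2009
begin
  A := 'hello';
  U := A;  // widening: always safe, every 8-bit char fits in 16 bits
  A := U;  // narrowing: compiles, but characters outside the current
           // ANSI codepage are lost in the "compression" to 8 bits
end;
```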

We’re going to ignore AnsiChar and AnsiString for most of this discussion.  The times when you’ll need them will be fairly obvious, and most of the time you should use the standard string and Char types, which are unicode.

Length and SizeOf

Any call to Length(string) is a potential problem.  The Length of a string is no longer its size in bytes, but rather the number of printable characters in the string.  The size in bytes is best determined by the expression Length(myString) * SizeOf(Char).

With Char arrays (often encountered when using Windows API calls directly), Length(charArray) will return the number of Char elements in the array, which again is no longer its size in bytes.  If you want the size in bytes, call SizeOf(charArray).  When using API calls like FormatMessage, GetClassName, GetWindowText, etc… which take a Char buffer and its size, make sure you’re passing the right size – be it the size in bytes or the length.  MOST Windows API calls want the number of chars in the array, so you should pass Length(charArray), not SizeOf(charArray).

If you have a null-terminated Char array, you can get the number of printable characters in the array with StrLen(charArray).  For example, if you declare charArray: array[0..99] of Char, and assign charArray := ‘hello’, you’ll find that Length(charArray) returns 100, SizeOf(charArray) returns 200, and StrLen(charArray) returns 5.
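In code form, that example looks like this (the variable name is just for illustration):

```delphi
var
  charArray: array[0..99] of Char;
begin
  FillChar(charArray, SizeOf(charArray), 0);  // zero all 200 bytes
  StrPCopy(charArray, 'hello');               // copies chars plus a #0
  // Length(charArray)  = 100  (Char elements in the array)
  // SizeOf(charArray)  = 200  (bytes, since SizeOf(Char) = 2)
  // StrLen(charArray)  = 5    (printable chars before the #0)
end;
```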

Oh, and one more note… SizeOf with strings isn’t very helpful.  SizeOf(myString) is always going to be 4, because myString is just a pointer.

That’s enough for today.  Tomorrow, I’ll talk about some possible memory management issues you might encounter, and how to easily fix them.