Wow, it’s been a while since I’ve posted here.  I’ve been head-down working on several projects, the largest of which is probably updating Castalia for the next version of Delphi, codenamed Tiburon.

Tiburon is a huge change, possibly the biggest thing since Turbo Pascal turned into Delphi.  Recent versions of Delphi have brought smaller, incremental changes, but this time it’s a wholesale “change the way you think” change: it’s Unicode.

Since the beginning, a Delphi string has been a sequence of bytes, with each byte representing one character (ASCII, or more precisely whatever the current ANSI code page maps that byte to).  Then the world got bigger, and the characters available via ASCII weren’t enough.  Unicode came along and offered a few different encodings for a lot more characters.  The guys at Microsoft decided to use UTF-16 as the native encoding of Windows, meaning that most characters fit into a single two-byte unit.  Delphi’s byte-encoded strings were converted to two-byte strings whenever they crossed into the Windows API, but with Delphi’s default string type you were still limited to the single-byte character sets.

Now, with Tiburon, Delphi moves to a two-byte character as well.  So a string is no longer a string of bytes; it’s a string of words (a word being two bytes).  This has interesting implications:

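Length(SomeString) returns the number of characters in the string, NOT the size of the string (in bytes) in memory.  Things like SomeStream.Write(SomeString[1], Length(SomeString)) will need to become SomeStream.Write(SomeString[1], Length(SomeString) * 2).  (Note the SomeString[1]: Write takes an untyped buffer, so you pass the first character rather than the string variable itself.)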

(Further hint: just do Length(SomeString) * SizeOf(Char) and your code will be forwards and backwards compatible, no matter what might get changed in the future).
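
Here is a minimal sketch of that hint in a small console program.  The program name, the sample text, and the use of TMemoryStream are just my choices for illustration; any TStream descendant should behave the same way.

```delphi
program StringBytes;

{$APPTYPE CONSOLE}

uses
  Classes;

var
  S: string;
  Stream: TMemoryStream;
begin
  S := 'Tiburon';
  Stream := TMemoryStream.Create;
  try
    // S[1] is not valid for an empty string, so guard the write.
    if Length(S) > 0 then
      // Byte count = character count * bytes per character.
      Stream.Write(S[1], Length(S) * SizeOf(Char));

    Writeln('Characters:    ', Length(S));
    Writeln('Bytes written: ', Stream.Size);
  finally
    Stream.Free;
  end;
end.
```

Because SizeOf(Char) is 1 in Delphi 2007 and 2 in Tiburon, the same source should report 7 bytes written on the old compiler and 14 on the new one, without any edits.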

Further, SomeString[X] will return CHARACTER X in the string (strictly speaking, the 16-bit element at position X), not the byte that sits X bytes into SomeString’s memory.  The same is true with a PChar, since a Char is now 2 bytes.
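
A small sketch of what that looks like in practice.  The program name and sample text are made up for illustration, and this only shows the basic indexing behavior, not the full pointer arithmetic story.

```delphi
program CharIndexing;

{$APPTYPE CONSOLE}

var
  S: string;
  P: PChar;
begin
  S := 'ABC';
  P := PChar(S);

  // Strings are 1-based and PChar is 0-based, but both index by
  // whole characters, never by bytes.
  Writeln(S[2]);  // second character of the string
  Writeln(P[1]);  // the same character, reached through the pointer

  // Inc on a PChar advances by SizeOf(Char) bytes, not by one byte.
  Inc(P);
  Writeln(P^);    // again the second character

  Writeln('Bytes per character: ', SizeOf(Char));
end.
```

All three character lines print the same letter on both old and new compilers; only the last line changes, which is exactly why code that assumed one byte per character can break quietly.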

In fact, pointer arithmetic is a whole new world.  More on that in a later post…

One last thing: if you still need strings of 8-bit characters, there is the AnsiString type (which has already existed for quite some time).  AnsiString is the string type we all know and love from Delphi 2007 and prior.  The type known as String, however, has become UnicodeString, which is the new 16-bit string type.  Conversion between the two is built in, so you can do MyUnicodeString := MyAnsiString; without any trouble.  Beware of the inverse though: you could lose data squeezing a 16-bit character into 8 bits.
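
To see the data loss for yourself, here is a rough round-trip sketch.  It assumes Tiburon or later (the Char($4E2D) cast will not compile where Char is 8 bits), and the program and variable names are just mine.  I have deliberately not stated which branch runs, because that depends on the ANSI code page of the machine: on a system whose code page contains the character, the round trip can succeed.

```delphi
program AnsiRoundTrip;

{$APPTYPE CONSOLE}

var
  Wide: string;        // UnicodeString in Tiburon
  Narrow: AnsiString;  // 8-bit characters in the system ANSI code page
begin
  // A Latin letter followed by a CJK character (U+4E2D).
  Wide := 'A' + Char($4E2D);

  // Explicit cast down to 8-bit characters. Characters that the
  // active code page cannot represent are replaced during conversion.
  Narrow := AnsiString(Wide);

  // Widen again and compare with the original text.
  if string(Narrow) = Wide then
    Writeln('Round trip preserved the text')
  else
    Writeln('Round trip lost data');
end.
```

I used explicit casts here to make the narrowing step visible in the source; the plain assignment Narrow := Wide compiles too, which is what makes this kind of loss easy to miss.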

All in all, I’m excited about Tiburon, and excited to FINALLY have Castalia working well with it (Unicode has been a BIG deal).  In the near future, I’ll write about some more of the interesting things I’ve found in Tiburon (and Castalia)…