Friday, March 27, 2009

On Strings and Unicode in Delphi 2009

There have been a few posts about strings in Delphi 2009. Here are a couple of comments.



There have been a few posts about strings in Delphi 2009:




  • Breaking existing code (on The Doric Temple). The main point here is that CodeGear should have left string as AnsiString and PChar as PAnsiChar, so that existing code would behave exactly as before, and introduced Unicode support alongside.


  • 2009 and backwards compatibility (by Gurock Software). The main point here is that the conversion wasn't that hard and that CodeGear did a good job with the new warnings.


  • Misunderstood. Who me? (on The Doric Temple). Here the author counters some of the comments and adds that the conversion is under way (he seems less worried).

  • ...and probably many others I missed


Having delved into Unicode in Delphi 2009 and given sessions about it, I see that as a standard reaction and understand it. But I think CodeGear did the right thing by making string an alias of UnicodeString (and, likewise, Char an alias of WideChar and PChar an alias of PWideChar). They had two options, which become clear if you look at code like:



var
  myString: string;
begin
  myString := 'some text';
  MessageBox (0, PChar (myString), 'title', MB_OK);

One option was to let those who wanted Unicode make an extra effort. Converting this code to Unicode would have meant changing the string type declaration (to UnicodeString), the API function call (to MessageBoxW), and the PChar cast (to PWideChar). The second option was to let those who want to stick with Ansi make an extra effort, which is what they did. Keeping this code in Ansi in Delphi 2009 means changing the string type declaration (to AnsiString), the API function call (to MessageBoxA), and the PChar cast (to PAnsiChar). But moving the code above to Unicode means... recompiling it!
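For comparison, here is a minimal sketch of what the "stick with Ansi" route looks like in Delphi 2009, with the three changes listed above applied. The program name is made up for illustration, and the window handle and flags (0 and MB_OK) are just the simplest values that make the call complete:

```delphi
program AnsiMessageSketch; // hypothetical name, for illustration only

uses
  Windows;

var
  myString: AnsiString; // change 1: was "string"
begin
  myString := 'some text';
  // change 2: call the Ansi entry point explicitly (was MessageBox)
  // change 3: cast to PAnsiChar (was PChar)
  MessageBoxA (0, PAnsiChar (myString), 'title', MB_OK);
end.
```

The Unicode version needs none of this: the original declaration, cast, and call compile as they are, because in Delphi 2009 string, PChar, and MessageBox resolve to UnicodeString, PWideChar, and MessageBoxW.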


I know there are several other issues, like the misuse of PChar pointers to refer to data other than string characters (although they came up with PByte and the TBytes array for that), or the Bookmark property madness. I've converted a lot of code and seen many problems, including performance issues...
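To illustrate the PChar misuse point: code that walks a raw buffer through a PChar assumed one byte per step, which no longer holds now that PChar is PWideChar and Inc advances two bytes. Below is a small sketch of the same kind of loop written with TBytes and PByte instead. The program name, buffer contents, and variable names are made up for the example:

```delphi
program BytePointerSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils; // declares TBytes

var
  Buffer: TBytes;
  P: PByte;
  I, Sum: Integer;
begin
  // Fill a small buffer with sample binary data (not characters)
  SetLength (Buffer, 4);
  for I := 0 to High (Buffer) do
    Buffer [I] := I + 1;

  // Walk the buffer one byte at a time. With PByte, Inc always
  // advances by SizeOf (Byte) = 1, whatever PChar happens to mean.
  P := @Buffer [0];
  Sum := 0;
  for I := 0 to Length (Buffer) - 1 do
  begin
    Sum := Sum + P^;
    Inc (P);
  end;

  Writeln (Sum);
end.
```

The same loop written with a PChar would step over every other byte in Delphi 2009, which is exactly the kind of silent breakage these posts complain about.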


However, from a high-level perspective, I think it all boiled down to these two options. And I think that asking an extra effort of those who want to stick with Ansi better serves the product's move to Unicode, even if the initial acceptance might be slower. That's from the point of view of a Western European citizen (not too much need for Unicode around here, but some) with an accented letter in his last name (so I tend to put more emphasis on Unicode and character support than other people). But my "accented letter in last name" horror stories are for another blog post.


Now I need to pack for my trip to the US, for the training session with Cary Jensen. You can follow me on Twitter, of course.
