Monday, March 23, 2009

Delphi 2009 and UTF8

Today I came to a point where I had a UTF8 string in Delphi 2009 and I wanted to use StringReplace to escape some characters. No big deal.



var
  S: UTF8String;
begin
  S := UTF8Encode('Hello'#10'World!');
  S := StringReplace(S, #10, '\n', [rfReplaceAll]);
end;


The compiler accepted this piece of code without an error but emitted two warnings about implicit string conversions. OK, UTF8String is an “AnsiString”, so let’s add the AnsiStrings unit to the uses clause to get access to the AnsiString overload of StringReplace. But what’s that? The two warnings do not disappear. The compiler still prefers the Unicode version, so let’s change the call to explicitly call AnsiStrings.StringReplace. This doesn’t help either. On the contrary, there are now four warnings, one of them about potential data loss.


By looking at the generated code in the CPU view, I saw what the compiler had done to my code. It converts my UTF8String to a UnicodeString and then to an AnsiString(CP_ACP). It calls StringReplace, and the returned AnsiString(CP_ACP) is converted to a UnicodeString and back to my UTF8String. This doesn’t sound good: the detour through CP_ACP can lose characters that the system ANSI code page cannot represent, and as if StringReplace weren’t slow enough by itself, these string conversions slow the call down even more.
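Written out by hand, that conversion chain corresponds roughly to the following sketch. The program name and the intermediate variables U and A are made up for illustration, and the explicit casts only stand in for the RTL conversion helpers the compiler actually emits, so treat it as an approximation of the behaviour rather than the real generated code.

```delphi
program ConversionChain;

{$APPTYPE CONSOLE}

uses
  SysUtils, AnsiStrings;

var
  S: UTF8String;
  U: UnicodeString; // hypothetical temporary, stands in for the compiler's hidden one
  A: AnsiString;    // hypothetical temporary, AnsiString(CP_ACP)
begin
  S := UTF8Encode('Hello'#10'World!');

  U := UnicodeString(S); // UTF-8 -> UTF-16
  A := AnsiString(U);    // UTF-16 -> system ANSI code page (possible data loss)

  A := AnsiStrings.StringReplace(A, #10, '\n', [rfReplaceAll]);

  U := UnicodeString(A); // ANSI -> UTF-16
  S := UTF8String(U);    // UTF-16 -> UTF-8
end.
```

Four conversions surround a single replace, and two of them go through the ANSI code page, which would explain the data loss warning.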


As a result, this simple call to StringReplace now looks like this:



var
  S: UTF8String;
begin
  S := UTF8Encode('Hello'#10'World!');
  S := RawByteString(AnsiStrings.StringReplace(RawByteString(S), #10, '\n', [rfReplaceAll]));
end;
