Friday, March 27, 2009

Strings, PChars and reference counting

Hi folks, in today’s post I will continue with the specifics of reference counting in Delphi, this time for the particular case of strings.


We’ll start with a small example:



program StringRefCounting;

{$APPTYPE CONSOLE}

uses
  SysUtils; { needed for CharInSet }

function ExcludeDigits(const S: String): String;
var
  NewS : String;
  PNewS : PChar;
  I : Integer;
begin
  NewS := S;
  PNewS := PChar(NewS);

  for I := 0 to Length(S) - 1 do
    if CharInSet(PNewS[I], ['0' .. '9']) then
      PNewS[I] := '_';

  Result := NewS;
end;

var
  Initial, Result : String;

begin
  Initial := 'Hello 001 World';
  Result := ExcludeDigits(Initial);

  WriteLn('Initial: ', Initial);
  WriteLn('Result: ', Result);

  ReadLn;
end.

What will the output of this program be? The obvious result (the one we would expect) would be: “Initial: Hello 001 World” and “Result: Hello ___ World”. The actual result is: “Initial: Hello ___ World” and “Result: Hello ___ World”.


To understand why this happens we must dive deep into how reference counting works on strings. Let’s check out how things work by studying the ExcludeDigits function step by step:



  1. We assign S to NewS. And here starts Delphi’s magic! S is actually just a pointer to a heap object where the string is stored, so assigning it to NewS only copies the pointer and increases the reference count of that string (in the heap). S and NewS are now two pointers to the same location in the heap. S is a const parameter and holds no counted reference of its own, so the reference count of the string is now 2 (Initial and NewS).

  2. After assigning NewS (with a cast) to PNewS, PNewS points to the first element (character) of the string in the heap. Remember that pointer types do not count as references, so the reference count stays the same!

  3. Next, the FOR loop checks each character and, if it is a digit, replaces it with the _ character. The write goes through the raw pointer, so it happens directly in the shared heap object.

  4. Assigning NewS to Result does the same as in point 1: it copies the address and increases the reference count by 1, to 3.

  5. At the end of the function, the compiler adds a few calls of its own. These calls decrease the reference count of all local strings and dispose of any string whose count has reached 0. In our case only NewS is released, so the reference count drops back to 2 (Initial and Result) and the string stays alive.
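Steps 1 and 2 can be checked with a small sketch. SameBuffer is a helper name I made up for this illustration, and I am assuming StringRefCount is available in your System unit (Delphi 2009 and later). The string is built with Copy so that it is certain to live in the heap.

```pascal
program SharedBufferDemo;

{$APPTYPE CONSOLE}

{ Hypothetical helper: True when both strings point at the same heap buffer }
function SameBuffer(const A, B: string): Boolean;
begin
  Result := Pointer(A) = Pointer(B);
end;

var
  Original, Alias: string;
  P: PChar;

begin
  { Copy returns a freshly allocated heap string }
  Original := Copy('Hello 001 World', 1, MaxInt);

  { Only the pointer is copied here, not the characters }
  Alias := Original;
  WriteLn('Same buffer after assignment: ', SameBuffer(Original, Alias));

  { The PChar cast yields the address of the first character }
  P := PChar(Alias);
  WriteLn('PChar points into that buffer: ', Pointer(P) = Pointer(Original));

  { The exact count may include compiler temporaries, so just print it }
  WriteLn('Reference count: ', StringRefCount(Original));
end.
```

Both comparisons should print TRUE: nothing was copied, and the PChar is simply another, uncounted, way into the same memory.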


Now it should be pretty clear why this function also changed the original string. What you need to remember is that Delphi uses the copy-on-write technique for strings: until you change the copied string through the string variable itself, that copy occupies the same heap space as the original. This has the nice advantage of saving memory, but also a big disadvantage: writes through a raw pointer bypass the mechanism, so you must be careful when using pointers because you never know (well, actually you do know) what you may break.
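For contrast, here is a minimal sketch of copy-on-write doing its job when the write goes through the string variable instead of a PChar. WithCharAt is a name I invented for the example; it is not part of the RTL.

```pascal
program CopyOnWriteDemo;

{$APPTYPE CONSOLE}

{ Hypothetical helper: returns S with the character at Index replaced by C }
function WithCharAt(const S: string; Index: Integer; C: Char): string;
begin
  Result := S;          { shares the buffer of S for now }
  Result[Index] := C;   { indexed write: Result is made unique before the store }
end;

var
  Original, Changed: string;

begin
  Original := Copy('Hello 001 World', 1, MaxInt);
  Changed := WithCharAt(Original, 7, '_');

  WriteLn('Original: ', Original);
  WriteLn('Changed:  ', Changed);
  WriteLn('Still sharing a buffer: ', Pointer(Original) = Pointer(Changed));
end.
```

Here Original should keep its digits while Changed reads “Hello _01 World”, and the last line should report FALSE, because the indexed write gave the result a buffer of its own.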


Let’s revisit our example and add one small line to the code:



function ExcludeDigits(const S: String): String;
var
  NewS : String;
  PNewS : PChar;
  I : Integer;
begin
  NewS := S;

  { I have added the following line! }
  NewS[1] := NewS[1];

  PNewS := PChar(NewS);

  for I := 0 to Length(S) - 1 do
    if CharInSet(PNewS[I], ['0' .. '9']) then
      PNewS[I] := '_';

  Result := NewS;
end;

In the code above I have added a new line: “NewS[1] := NewS[1];”. I am modifying the copy through the string variable so that Delphi applies copy-on-write to it. This means that NewS will contain a new address after this assignment, and PNewS will then contain the address of the first element in the copy. (Note that indexing NewS[1] assumes the string is not empty.) This change ensures that the results are what we expected in the first place: “Initial: Hello 001 World” and “Result: Hello ___ World”.


If you use the CPU view while debugging this code, look at what happens for that assignment line: you will notice a call to something like UniqueStringU. This function actually does the “dirty work”. UniqueString checks whether the reference count of the given string is more than 1 and, if so, creates a copy of that string and makes the variable it received point to the copy (Note: UniqueStringU is the Unicode version of this overloaded function). Let’s try calling it directly:



function ExcludeDigits(const S: String): String;
var
  NewS : String;
  PNewS : PChar;
  I : Integer;
begin
  NewS := S;

  { I have added the following line! Now using UniqueString }
  UniqueString(NewS);

  PNewS := PChar(NewS);

  for I := 0 to Length(S) - 1 do
    if CharInSet(PNewS[I], ['0' .. '9']) then
      PNewS[I] := '_';

  Result := NewS;
end;

Run it: yes, the results are again what we expected in the first place!
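If you would rather see the effect of UniqueString without opening the CPU view, a rough sketch like the following compares buffer addresses. SharesBufferAfterUnique is a made-up helper name, used only for this illustration.

```pascal
program UniqueStringDemo;

{$APPTYPE CONSOLE}

{ Hypothetical helper: True when a UniqueString'd copy of S still shares the buffer of S }
function SharesBufferAfterUnique(const S: string): Boolean;
var
  Work: string;
begin
  Work := S;            { Work points at the buffer of S }
  UniqueString(Work);   { Work is moved to a private buffer }
  Result := Pointer(Work) = Pointer(S);
end;

var
  Original: string;

begin
  Original := Copy('Hello 001 World', 1, MaxInt);

  WriteLn('Still shared after UniqueString: ', SharesBufferAfterUnique(Original));
  WriteLn('Original: ', Original);
end.
```

For a non-empty string this should report FALSE: after the call, the working copy lives at a different address, so writing through a PChar taken from it can no longer reach the original.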




General conclusion: Be very careful when you pass strings to external DLLs as PChar pointers! You may think that you have passed a copy, but you have actually passed a pointer into the original string.
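One possible way to guard such a call is sketched below. ExternalMaskDigits is a local stand-in for a routine that would really be imported from a DLL; its name and signature are assumptions made purely for the example, as is the MaskDigits wrapper.

```pascal
program SafePCharDemo;

{$APPTYPE CONSOLE}

{ Hypothetical stand-in for a DLL routine that writes into the caller's buffer }
procedure ExternalMaskDigits(Buffer: PChar; Len: Integer);
var
  I: Integer;
begin
  for I := 0 to Len - 1 do
    if (Buffer[I] >= '0') and (Buffer[I] <= '9') then
      Buffer[I] := '_';
end;

{ Hypothetical wrapper: hands the external routine a private buffer, never the caller's }
function MaskDigits(const S: string): string;
begin
  Result := S;
  UniqueString(Result);   { detach from S before exposing a raw pointer }
  if Result <> '' then
    ExternalMaskDigits(PChar(Result), Length(Result));
end;

var
  Initial, Masked: string;

begin
  Initial := 'Hello 001 World';
  Masked := MaskDigits(Initial);

  WriteLn('Initial: ', Initial);
  WriteLn('Masked:  ', Masked);
end.
```

The idea is simply to call UniqueString on the string you are about to expose, so that whatever the external code writes lands in a buffer nobody else is looking at.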
