Monday, March 23, 2009

Jitter optimization for System.Initialize()

There is a discussion in the newsgroups about “Delphi compiler and CPU core usage”, and in one of its subthreads the idea of a jittered InitializeRecord, FinalizeRecord and CopyRecord was born.


So here are the first performance numbers for a first (pure Pascal) implementation of such a Jitter for Initialize/InitializeRecord/InitializeArray.



type
  TTest = record
    I1: Integer;
    SA: array[0..1] of string;
    I2: Integer;
    S: string;
    Intf: IInterface;
    A: array of Byte;
  end;

var
  m: array[0..1024] of TTest;


First call to Initialize

Original: 0.067819 ms

Jittered: 0.042238 ms (this includes the time for the Jitter itself)

// _InitializeArray(@m, TypeInfo(TTest), Length(m));


Second call to Initialize

Original: 0.057681 ms

Jittered: 0.010007 ms

// _InitializeArray(@m, TypeInfo(TTest), Length(m));


Execution in a tight loop: 10000x

Original: 419.089185 ms

Jittered: 39.533007 ms

// for I := 0 to 10000 - 1 do
//   _InitializeArray(@m, TypeInfo(TTest), Length(m));


It is interesting that the Jitter can generate and execute the code in less time than the RTTI version needs to execute the initialization. This is because the RTTI version must interpret the type information again for every array element, while the Jitter generates code for only one element and wraps it in a loop.
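A rough way to picture that difference, without generating any machine code, is to count how often a field table has to be consulted. The sketch below is not the Jitter itself: TFieldDesc, Fields, InitByWalking and InitByPlan are hypothetical names, the field table is a simplified stand-in for the real RTTI, and each element is modelled as seven 32-bit slots (the $1C bytes of TTest in a 32-bit build).

```pascal
program WalkVersusPlan;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical, simplified stand-in for the RTTI field table: each entry
  // names a run of managed (pointer-sized) slots inside one element.
  TFieldDesc = record
    FirstSlot: Integer; // index of the first managed slot
    SlotCount: Integer; // number of consecutive managed slots
  end;

const
  SlotsPerElem = 7;     // $1C bytes / 4 bytes per slot (assumed 32-bit layout)
  ElemCount    = 1025;  // array[0..1024]
  // Managed fields of TTest: SA (2 slots), S, Intf, A
  Fields: array[0..3] of TFieldDesc = (
    (FirstSlot: 1; SlotCount: 2),
    (FirstSlot: 4; SlotCount: 1),
    (FirstSlot: 5; SlotCount: 1),
    (FirstSlot: 6; SlotCount: 1));

var
  DescReads: Integer;   // how often the field table was consulted
  Buf: array[0..ElemCount * SlotsPerElem - 1] of LongWord;

// RTTI style: the field table is consulted again for every element.
procedure InitByWalking(Count: Integer);
var
  E, F, S: Integer;
begin
  for E := 0 to Count - 1 do
    for F := Low(Fields) to High(Fields) do
    begin
      Inc(DescReads);
      for S := 0 to Fields[F].SlotCount - 1 do
        Buf[E * SlotsPerElem + Fields[F].FirstSlot + S] := 0;
    end;
end;

// Jitter style: the field table is consulted once and flattened into a
// list of slots; the per-element loop only replays that list.
procedure InitByPlan(Count: Integer);
var
  Plan: array[0..SlotsPerElem - 1] of Integer;
  PlanLen, E, F, S: Integer;
begin
  PlanLen := 0;
  for F := Low(Fields) to High(Fields) do
  begin
    Inc(DescReads);
    for S := 0 to Fields[F].SlotCount - 1 do
    begin
      Plan[PlanLen] := Fields[F].FirstSlot + S;
      Inc(PlanLen);
    end;
  end;
  for E := 0 to Count - 1 do
    for S := 0 to PlanLen - 1 do
      Buf[E * SlotsPerElem + Plan[S]] := 0;
end;

begin
  DescReads := 0;
  InitByWalking(ElemCount);
  WriteLn('table reads when walking per element: ', DescReads);
  DescReads := 0;
  InitByPlan(ElemCount);
  WriteLn('table reads when planning once:       ', DescReads);
end.
```

With 1025 elements and four table entries, the walking version consults the table 4100 times and the planning version 4 times. The real Jitter goes one step further and turns the plan into native code, but the saving comes from the same place.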


Generated code:


00200000 xor edx,edx
; inner loop for "array[0..1] of string" (begin)
00200002 push eax
00200003 push ecx
00200004 mov ecx,$00000002
; array elements
00200009 mov [eax+$04],edx
; inner loop for "array[0..1] of string" (end)
0020000C add eax,$04
0020000F dec ecx
00200010 jnz $00200009
00200012 pop ecx
00200013 pop eax
; record fields
00200014 mov [eax+$10],edx
00200017 mov [eax+$14],edx
0020001A mov [eax+$18],edx
; outer loop for array variables
0020001D add eax,$1c
00200020 dec ecx
00200021 jnz $00200002
00200023 ret
; alignment for the next method
00200024 nop
00200025 nop
00200026 nop
00200027 nop
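
Read back into Pascal, the listing above does roughly the following. This is only a hand translation for illustration, not output of the Jitter: InitLikeGeneratedCode, Marker and the slot buffer are made-up names, and the buffer models each TTest element as seven 32-bit slots so that the demo does not depend on the real record layout of the compiler in use.

```pascal
program JitEquivalent;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  SlotsPerElem = 7;  // $1C bytes per element / 4 bytes per slot
  ElemCount    = 3;  // small demo count; the post uses 1025 elements
  Marker       = 1;  // stands for "uninitialized" memory

// Slots plays the role of EAX, Count of ECX, and the stored 0 of EDX.
// The listing tests the counter at the bottom of the loop; a while loop
// is used here so that Count = 0 is harmless.
procedure InitLikeGeneratedCode(var Slots: array of LongWord; Count: Integer);
var
  Base, I: Integer;
begin
  Base := 0;
  while Count > 0 do
  begin
    for I := 0 to 1 do            // inner loop: SA at $04 and $08
      Slots[Base + 1 + I] := 0;
    Slots[Base + 4] := 0;         // S at $10
    Slots[Base + 5] := 0;         // Intf at $14
    Slots[Base + 6] := 0;         // A at $18
    Inc(Base, SlotsPerElem);      // add eax,$1c
    Dec(Count);                   // dec ecx / jnz
  end;
end;

var
  Buf: array[0..ElemCount * SlotsPerElem - 1] of LongWord;
  E, S: Integer;
begin
  for S := Low(Buf) to High(Buf) do
    Buf[S] := Marker;
  InitLikeGeneratedCode(Buf, ElemCount);
  // Print one line per element: slots 0 and 3 (I1, I2) keep the marker,
  // all managed slots read 0.
  for E := 0 to ElemCount - 1 do
  begin
    for S := 0 to SlotsPerElem - 1 do
      Write(Buf[E * SlotsPerElem + S], ' ');
    WriteLn;
  end;
end.
```

The two Integer fields are never touched, which matches what Initialize does: only the managed fields (strings, interfaces, dynamic arrays) are set to nil.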

This is only the start. I don’t think that the CopyRecord and FinalizeRecord functions will show the same increase in performance, because the cleanup calls (LStrClr, …) are the real time eaters. But I can only be sure once I have tested it.


But there is also a downside. The Jitter uses a hash table to find an already jittered Initialize function for the TypeInfo. And if the type is a simple type, the original Initialize will outperform the Jitter because in the end both execute the same code but the original Initialize has less overhead. I’m sure this could be worked out by optimizing the hash table access and some other tricks.
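
The lookup described here can be pictured as a small hash table keyed by the TypeInfo pointer. The sketch below is an assumption about the general shape, not the actual implementation: TInitProc, GetInitProc, CompileFor, DummyInit and the bucket count are hypothetical, and CompileFor only counts calls instead of generating code.

```pascal
program JitCacheSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical signature of a jittered initializer.
  TInitProc = procedure(P: Pointer; Count: Integer);

  PCacheEntry = ^TCacheEntry;
  TCacheEntry = record
    Key: Pointer;        // the TypeInfo pointer
    Proc: TInitProc;     // routine generated for that type
    Next: PCacheEntry;   // chain for colliding keys
  end;

  TTest = record
    I: Integer;
    S: string;
  end;

  TOther = record
    A: array of Byte;
  end;

const
  BucketCount = 64;      // power of two, so "and" can replace "mod"

var
  Buckets: array[0..BucketCount - 1] of PCacheEntry;
  CompileCount: Integer;

procedure DummyInit(P: Pointer; Count: Integer);
begin
  // stand-in for generated machine code
end;

// Stand-in for the expensive step: a real Jitter would emit code here.
function CompileFor(TypeInfoPtr: Pointer): TInitProc;
begin
  Inc(CompileCount);
  Result := DummyInit;
end;

function HashOf(Key: Pointer): Integer;
begin
  // Drop the low bits, which carry little information for aligned pointers.
  Result := Integer((NativeUInt(Key) shr 4) and (BucketCount - 1));
end;

// Returns the cached routine for a type, compiling it on the first request.
// Entries are never freed in this sketch; they live as long as the program.
function GetInitProc(TypeInfoPtr: Pointer): TInitProc;
var
  Index: Integer;
  Entry: PCacheEntry;
begin
  Index := HashOf(TypeInfoPtr);
  Entry := Buckets[Index];
  while Entry <> nil do
  begin
    if Entry^.Key = TypeInfoPtr then
    begin
      Result := Entry^.Proc;   // hit: no compilation needed
      Exit;
    end;
    Entry := Entry^.Next;
  end;
  New(Entry);                  // miss: compile and remember
  Entry^.Key := TypeInfoPtr;
  Entry^.Proc := CompileFor(TypeInfoPtr);
  Entry^.Next := Buckets[Index];
  Buckets[Index] := Entry;
  Result := Entry^.Proc;
end;

begin
  CompileCount := 0;
  GetInitProc(TypeInfo(TTest))(nil, 0);   // compiles
  GetInitProc(TypeInfo(TTest))(nil, 0);   // served from the cache
  GetInitProc(TypeInfo(TOther))(nil, 0);  // compiles
  WriteLn('compiled routines: ', CompileCount);
end.
```

Even on a cache hit, every call pays for the hash computation and at least one pointer comparison before the generated routine runs. For a type whose initialization is a single store, that fixed cost is already more than the work itself, which is why the original Initialize wins in that case.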
