Over the past week or two I decided to add x64 compilation for the Windows operating system to Squeak. For the most part I have completed this transition, and the MouseSpeak compiler can now target either architecture. Prior to this, Squeak could only be compiled to x86/32-bit Windows. This is a list of things I encountered along the way:
- Byte widths gave me headaches.
- Windows x64 calling convention woes.
- 0d16 byte alignment for rsp gave me so many issues.
Byte Widths
On x86 it was possible to largely ignore all forms of padding and offsets for types, particularly when interacting with the Windows operating system. Since most things were 0d04 bytes in length, I did not need to calculate offsets or implement a strict typing system. Indeed, Squeak had no typing system and only cared about data in terms of being a 0d04 byte address.
When trying to compile to x64 it became immediately apparent that this wasn't sufficient, especially since pointers were now 0d08 bytes wide while most other fields were still 0d04 bytes wide. This means that structs supplied or required by the operating system now need each pointer placed at an 0d08 byte aligned offset, so padding is inserted after any lone 0d04 byte field, inflating the data size and wasting bytes. A painful discovery.
As an observation, if some of these structs had been laid out so that their DWORD members were grouped together before the pointer members, much of this padding would not have been needed, but changing the layout now would destroy backwards compatibility.
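To make the padding concrete, here is a minimal sketch that computes field offsets under natural alignment, assuming every field is aligned to its own size and the struct is padded to its largest field. The struct itself (flags, name, count, data) is made up for illustration and is not a real Windows structure; layout_struct and example_fields are hypothetical helper names.

```python
DWORD = 4  # bytes


def layout_struct(fields):
    """Return ({name: offset}, total_size) for a list of (name, size) fields.

    Assumption: each field is aligned to its own size, and the struct is
    padded at the end to a multiple of its largest field.
    """
    offsets, offset, max_align = {}, 0, 1
    for name, size in fields:
        offset = (offset + size - 1) // size * size  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


def example_fields(ptr_size, dwords_first=False):
    """A made-up struct that alternates DWORD and pointer members."""
    fields = [("flags", DWORD), ("name", ptr_size),
              ("count", DWORD), ("data", ptr_size)]
    if dwords_first:
        fields.sort(key=lambda f: f[1])  # stable sort: DWORDs ahead of pointers
    return fields


if __name__ == "__main__":
    print(layout_struct(example_fields(4)))                     # x86 layout
    print(layout_struct(example_fields(8)))                     # x64 layout
    print(layout_struct(example_fields(8, dwords_first=True)))  # reordered x64
```

With 0d04 byte pointers the four fields pack into 0d16 bytes. With 0d08 byte pointers the same order takes 0d32 bytes, of which 0d08 are padding, while the reordered version takes 0d24.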
Windows Calling Convention
I long for the x86 calling conventions, stdcall and cdecl. In these x86 style calling conventions, arguments are pushed to the stack right to left. The minor difference between the two is who cleans up. With stdcall, the called function has a fixed number of arguments and clears them off the stack itself as part of its return. With cdecl, which generally supports variadic functions, the called function does not know how many arguments to pop, so the caller must clean the stack after the function returns.
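As a rough illustration of that difference from the caller's side, the sketch below emits assembly-like text for an x86 call. It assumes every argument is 0d04 bytes wide; plan_call_x86 is a hypothetical helper and not part of Squeak.

```python
def plan_call_x86(target, args, convention):
    """Return assembly-like lines for an x86 call with 4-byte arguments."""
    if convention not in ("stdcall", "cdecl"):
        raise ValueError(f"unknown convention: {convention}")
    # Both conventions push the arguments right to left.
    lines = [f"push {arg}" for arg in reversed(args)]
    lines.append(f"call {target}")
    # Only cdecl leaves the cleanup to the caller; a stdcall callee
    # removes its own arguments when it returns.
    if convention == "cdecl" and args:
        lines.append(f"add esp, {len(args) * 4}")
    return lines


if __name__ == "__main__":
    print("\n".join(plan_call_x86("f", ["1", "2", "3"], "cdecl")))
```

The only difference in the caller's output is the trailing add esp, which is present for cdecl and absent for stdcall.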
In the x64 Windows calling convention, the first four arguments are passed via registers, similar to fastcall. Additional arguments are pushed to the stack, but since the stack must be 0d16 byte aligned, the caller may need to add padding by adjusting rsp. For instance, if a fifth argument 0d08 bytes long is required, the caller will need to subtract 0d08 bytes from rsp prior to pushing the stack arguments.
In addition to this, after pushing all the arguments required for the call, 0x20 bytes (0d32) need to be reserved for the called function. This is the shadow space (also called home space): the callee is allowed to use it to save the four register arguments, and it must be reserved even when the function takes fewer than four.
Here is the Squeak logic in the event it helps anyone else trying to figure this out:
- Loop through the arguments.
- Put the first four in the registers rcx, rdx, r8, r9.
- If the remaining arguments do not align rsp to a multiple of 0d16 (args * 8 % 16 != 0), subtract the appropriate offset from rsp to align (this will probably just be sub rsp, 8).
- Push the remaining arguments right to left.
- Call the function.
- Add the alignment and pushed addresses back to rsp.
Since this one convention covers both of the x86 calling conventions most used in Windows, on the x64 operating system stdcall and cdecl both collapse into the single Windows x64 calling convention.
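The steps listed above can be sketched as a small planner that emits assembly-like text. This is not the actual Squeak code. It assumes rsp is already 0d16 byte aligned before the sequence starts, that every argument is 0d08 bytes wide, and that each argument is given as an operand string that is valid for mov and push. It also includes the 0x20 byte shadow space reservation described earlier. plan_call_x64 is a hypothetical name.

```python
ARG_REGS = ["rcx", "rdx", "r8", "r9"]
SHADOW_SPACE = 0x20  # bytes reserved for the callee


def plan_call_x64(target, args):
    """Return assembly-like lines for a Windows x64 call.

    Assumptions: rsp is 16-byte aligned on entry and every argument
    is 8 bytes wide.
    """
    reg_args, stack_args = args[:4], args[4:]
    lines = [f"mov {reg}, {arg}" for reg, arg in zip(ARG_REGS, reg_args)]

    # An odd number of stack arguments would leave rsp misaligned by 8.
    pad = (len(stack_args) * 8) % 16
    if pad:
        lines.append(f"sub rsp, {pad}")

    # Remaining arguments go on the stack right to left.
    for arg in reversed(stack_args):
        lines.append(f"push {arg}")

    lines.append(f"sub rsp, {SHADOW_SPACE:#x}")
    lines.append(f"call {target}")

    # Undo the shadow space, the pushed arguments, and the padding.
    cleanup = SHADOW_SPACE + len(stack_args) * 8 + pad
    lines.append(f"add rsp, {cleanup:#x}")
    return lines


if __name__ == "__main__":
    print("\n".join(plan_call_x64("f", ["1", "2", "3", "4", "5"])))
```

For five arguments the total adjustment is 0x30 bytes (0d08 padding, 0d08 pushed, 0x20 shadow space), which keeps rsp on a 0d16 byte boundary at the call.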
0d16 Byte Alignment
In x64 architecture, the stack pointer register should be aligned to a 0d16 byte boundary. The processor does not enforce this for ordinary pushes and pops, but the Windows x64 calling convention expects it at every call, and some functions in system libraries such as ntdll.dll and kernel32.dll rely on rsp being aligned, for example when they use aligned SSE instructions on stack memory. If it has not been aligned correctly, your program will likely end up making an illegal memory access and crash with an access violation. In Squeak, certain things are pushed to the stack when required at runtime to facilitate first-byte executable code. In the case of x86, I could just push each 0d04 byte segment to complete the string. In the case of x64, however, I had to subtract a multiple of 0d16 from rsp to reserve the space, then use move instructions to fill the space. Part of this is a result of the x64 instruction set not containing a push imm64 instruction, at least, not in my assembler of choice hehe.
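A minimal sketch of that reserve-then-move approach is below. It is an illustration and not the Squeak implementation: it assumes the string should be NUL-terminated, that rax is free to use as a scratch register, and that rsp is 0d16 byte aligned beforehand. stack_string_x64 is a hypothetical name.

```python
def stack_string_x64(data: bytes):
    """Return assembly-like lines that place data at [rsp] on x64.

    Assumptions: the string is NUL-terminated, rax is a free scratch
    register, and rsp is 16-byte aligned before the sequence runs.
    """
    data += b"\x00"
    size = (len(data) + 15) // 16 * 16  # round up so rsp stays aligned
    data = data.ljust(size, b"\x00")

    lines = [f"sub rsp, {size:#x}"]
    for offset in range(0, size, 8):
        # x64 is little-endian, so the first byte of the chunk ends up
        # at the lowest address.
        chunk = int.from_bytes(data[offset:offset + 8], "little")
        lines.append(f"mov rax, {chunk:#018x}")
        lines.append(f"mov [rsp+{offset:#x}], rax")
    return lines


if __name__ == "__main__":
    print("\n".join(stack_string_x64(b"bcrypt.dll")))
```

Going through a register is what sidesteps the missing push imm64: mov can load a full 0d08 byte immediate into a register, which is then stored into the reserved space.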
Observations
Funnily enough, some things get flagged by my local antivirus solution in the x86 build but not in the x64 build on my machine. This is probably due to signature detections, or to behaviour such as calling into a library like bcrypt.dll that has not been indexed enough for x64. Strange.