Adventures in Squeak: x64 Retro

Over the past week or two I decided to add x64 compilation for the Windows operating system to Squeak. For the most part I have completed this transition, and the MouseSpeak compiler can now target either architecture. Prior to this, Squeak could only be compiled for 32-bit x86 Windows. This is a list of things I encountered along the way:

  1. Byte widths gave me headaches.
  2. Windows x64 calling convention woes.
  3. 0d16 byte alignment for rsp gave me so many issues.

Byte Widths

On x86 it was possible to largely ignore all forms of padding and offsets for types, particularly when interacting with the Windows operating system. Since most things were 0d04 bytes in length, I did not need to calculate offsets or implement a strict type system. Indeed, Squeak had no type system at all and only treated data as 0d04 byte addresses.

When trying to compile to x64 it became immediately apparent that this wasn’t sufficient, since pointers were now 0d08 bytes wide while most other fields were still 0d04 bytes wide. This means that structs supplied or required by the operating system now needed padding so that each 0d08 byte member sits on a 0d08 byte boundary, artificially inflating data sizes and wasting bytes. A painful discovery.

As an observation, if some of these structs had been reordered so that their DWORD members sat together in pairs ahead of the pointer members, the move to x64 wouldn’t have added any padding at all, but reordering them would destroy backwards compatibility.
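
To make the padding concrete, here is a minimal sketch of a layout calculator. It assumes every member is aligned to its own size, which holds for plain DWORD and pointer members. The struct and its field names are hypothetical and not taken from any real Windows header.

```python
def layout(fields):
    """Return ({name: offset}, total_size) using natural alignment.

    fields is a list of (name, size_in_bytes). Each member is aligned
    to its own size (an assumption that fits DWORDs and pointers).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        align = size
        # Round the running offset up to the member's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that arrays of the struct stay aligned.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Hypothetical struct: DWORD, pointer, DWORD, pointer (pointers are 8 bytes on x64).
interleaved = [("flags", 4), ("buffer", 8), ("count", 4), ("next", 8)]
# Same members with the two DWORDs placed side by side.
grouped = [("flags", 4), ("count", 4), ("buffer", 8), ("next", 8)]

print(layout(interleaved))  # total size 32: 8 bytes are padding
print(layout(grouped))      # total size 24: no padding
```

With 0d04 byte pointers both orderings come out at 0d16 bytes, which is why none of this mattered on x86.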

Windows Calling Convention

I long for the x86 stdcall and cdecl calling conventions. In both of these, arguments are pushed to the stack right to left. The minor difference between the two is that with stdcall the called function takes a fixed number of arguments and can clear the stack itself before returning. cdecl generally supports variadic functions, and since the called function does not know how many arguments to pop, the caller must clean the stack after the function returns.

In the x64 Windows calling convention, the first four arguments are passed via registers, similar to fastcall. Additional arguments are pushed to the stack, but since the stack must be 0d16 byte aligned, the caller may need to add padding by adjusting rsp. For instance, if a fifth argument 0d08 bytes long is required, the caller will need to subtract 0d08 bytes from rsp prior to pushing the stack arguments.

In addition to this, after pushing all the arguments required for the call, 0x20 bytes (0d32) need to be reserved on the stack for the called function. This is the shadow space (also called home space): room for the called function to spill the four register arguments (rcx, rdx, r8 and r9) back to the stack if it needs to.

Here is the Squeak logic in the event it helps anyone else trying to figure this out:

  1. Loop through the arguments.
  2. Put the first four in the registers rcx rdx r8 r9.
  3. If the remaining arguments do not align rsp to a multiple of 0d16 ( args * 8 % 16 != 0 ), subtract the appropriate offset from rsp to align it (this will probably just be sub rsp, 8).
  4. Push the remaining arguments right to left.
  5. Call the function.
  6. Add the alignment and pushed addresses back to rsp.

Since this can cater to both of the x86 calling conventions most used in Windows, on the x64 operating system stdcall and cdecl both collapse into the single Windows x64 calling convention.
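
The steps listed above can be sketched roughly as follows. This is not the actual Squeak code, just an illustration that emits assembly-like text. It assumes rsp is already 0d16 byte aligned before the call sequence starts and that every argument is an 0d08 byte integer or pointer. The function name plan_call, the argument placeholders and the target name are all hypothetical. The sketch also includes the 0x20 byte reservation, which the list leaves out.

```python
REGISTERS = ["rcx", "rdx", "r8", "r9"]


def plan_call(target, args, shadow_space=True):
    """Return assembly-like lines for a Windows x64 call.

    Assumes rsp is 16-byte aligned on entry and that every argument is
    an 8-byte integer or pointer (no floats, no structs by value).
    """
    lines = []
    stack_args = args[4:]

    # The first four arguments go in registers.
    for reg, arg in zip(REGISTERS, args[:4]):
        lines.append(f"mov {reg}, {arg}")

    # Pad so that rsp is 16-byte aligned again after the pushes.
    padding = (len(stack_args) * 8) % 16  # either 0 or 8
    if padding:
        lines.append(f"sub rsp, {padding}")

    # Remaining arguments are pushed right to left.
    for arg in reversed(stack_args):
        lines.append(f"push {arg}")

    # Reserve the 0x20 bytes the called function expects above its return address.
    shadow = 0x20 if shadow_space else 0
    if shadow:
        lines.append(f"sub rsp, {shadow}")

    lines.append(f"call {target}")

    # The caller restores rsp: padding + pushed arguments + reservation.
    cleanup = padding + len(stack_args) * 8 + shadow
    if cleanup:
        lines.append(f"add rsp, {cleanup}")
    return lines


for line in plan_call("SomeFunction", ["a1", "a2", "a3", "a4", "a5"]):
    print(line)
```

With five arguments this emits one sub rsp, 8 for alignment, one push, the 0d32 byte reservation, and a single add rsp, 48 after the call.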

0d16 Byte Alignment

In x64 architecture, the stack pointer register should be aligned to a 0d16 byte boundary. The CPU itself does not enforce this, but the Windows x64 calling convention expects it at every call, and some functions within kernel space and libraries such as ntdll.dll and kernel32.dll rely on rsp being aligned. If it has not been aligned correctly, your program will likely end up making an illegal memory access and die with an access violation.

In Squeak, certain things are pushed to the stack when required at runtime to facilitate first-byte executable code. In the case of x86, I could just push each 0d04 byte segment to complete the string. In the case of x64, however, I had to subtract a multiple of 0d16 from rsp to reserve the space, then use move instructions to fill the space. Part of this is a result of the x64 instruction set not containing a push imm64 instruction (push takes at most a 32-bit immediate, which is sign-extended), at least, not in my assembler of choice hehe.
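
As a rough sketch of the reserve-then-move approach (again not the actual Squeak code), the generator below rounds the reservation up to a multiple of 0d16 and fills it 0d08 bytes at a time through a scratch register. The function name stack_string, the choice of rax as scratch register and the example string are assumptions made for illustration.

```python
def stack_string(data: bytes):
    """Emit assembly-like lines that build `data` on the stack at [rsp].

    The reservation is rounded up to a multiple of 16 so rsp stays
    aligned, and the unused tail is filled with zero bytes.
    """
    reserve = (len(data) + 15) // 16 * 16
    padded = data.ljust(reserve, b"\x00")
    lines = [f"sub rsp, {reserve}"]
    for offset in range(0, reserve, 8):
        # x64 is little-endian, so the first byte of the chunk is the
        # least significant byte of the 64-bit immediate.
        chunk = int.from_bytes(padded[offset:offset + 8], "little")
        lines.append(f"mov rax, 0x{chunk:016x}")
        lines.append(f"mov [rsp+{offset}], rax")
    return lines


for line in stack_string(b"kernel32.dll\x00"):
    print(line)
```

The mov through rax is there because a 64-bit immediate can be loaded into a register but cannot be stored to memory or pushed in one instruction.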

Observations

Funnily enough, some things flag on x86 code vs x64 on my machine with my local antivirus solution. This is probably due to signature detections, or to behavioural detection of a call into a library such as bcrypt.dll that has not been indexed enough in x64. Strange.

