@noop_dev@TurquoiseDarren@TrisH0x2A Although I've found __builtin_call_with_static_chain to be buggy: GCC sometimes doesn't do what it should. I use macros with inline asm to read and write R10 explicitly instead.
@noop_dev@TurquoiseDarren@TrisH0x2A That comes at the cost of losing one register (RCX) for the vtable pointer, but it's a good approach. In C we would instead use the static chain pointer to achieve something similar - we can get one extra register arg (R10) using __builtin_call_with_static_chain.
@noop_dev@TurquoiseDarren@TrisH0x2A The register allocators in gcc and clang do an insanely good job without help, but sometimes giving specific hints can improve performance. Marking too much state with register storage can be detrimental to performance though.
@noop_dev@TurquoiseDarren@TrisH0x2A Computed goto is often used in conjunction with variables marked with register storage to keep the hot state in specific registers. Musttail kind of does that implicitly based on the calling convention.
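For anyone following along, here is a minimal sketch of the computed-goto pattern being described, with the hot state (pc and an accumulator) marked register. The opcodes and the name run_goto are made up for illustration, and the labels-as-values syntax is a GCC/Clang extension, not standard C. Note that register is only a hint to the compiler.

```c
#include <stdint.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs a tiny bytecode program using computed-goto dispatch. */
int64_t run_goto(const uint8_t *code)
{
    /* Table of label addresses, indexed by opcode (GCC/Clang extension). */
    static void *const dispatch[] = {
        [OP_INC]    = &&op_inc,
        [OP_DEC]    = &&op_dec,
        [OP_DOUBLE] = &&op_double,
        [OP_HALT]   = &&op_halt,
    };

    /* Hot interpreter state; 'register' is a hint, not a guarantee. */
    register const uint8_t *pc = code;
    register int64_t acc = 0;

/* Every handler ends with the same dispatch tail. */
#define NEXT() goto *dispatch[*pc++]

    NEXT();
op_inc:    acc += 1; NEXT();
op_dec:    acc -= 1; NEXT();
op_double: acc *= 2; NEXT();
op_halt:   return acc;

#undef NEXT
}
```

All the state lives in one function, so the compiler has to keep pc and acc live across every handler at once.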
@noop_dev@TurquoiseDarren@TrisH0x2A If you're optimizing for performance you should care though, because a function with computed goto still needs to move interpreter state around between registers - if there's more "hot" state than registers we end up with lots of spilling.
@noop_dev@TurquoiseDarren@TrisH0x2A But on x86-64, we have limited registers anyway and accessing R8 and above increases code size due to REX prefixes.
@noop_dev@TurquoiseDarren@TrisH0x2A Yes, you would basically add another argument to each function for a stack pointer.
It's most effective when our most frequently accessed state can fit into the 6 registers used for arguments (RDI,RSI,RDX,RCX,R8,R9), otherwise we need to read state from the stack.
@noop_dev@TurquoiseDarren@TrisH0x2A Sometimes switch is still preferable. The compiler can "over-optimize" computed gotos because their tails are all an equivalent macro. Sometimes GCC decides "Hey, let's do CSE and make every branch jump to this one bit of code instead of duplicating it."
@noop_dev@TurquoiseDarren@TrisH0x2A They're not completely equivalent, but they achieve the same result. I skipped the pc in the tailcall version to simplify it - but for a more similar comparison, here's a tailcall version which uses the pc.
https://t.co/CsljSh35W4
@noop_dev@TurquoiseDarren@TrisH0x2A I've used this approach in a far more complicated interpreter and it's basically on par with computed goto for performance - but IMO better for debugging and easier to maintain.
@noop_dev@TurquoiseDarren@TrisH0x2A IMO it's less convoluted - each instruction is an independent unit and the compiler can do a better job at register allocation. Context can be passed as arguments (up to 6 in registers on SYSV x86-64 - more on the stack).
@noop_dev@TurquoiseDarren@TrisH0x2A I wrote a small example to demonstrate this to someone a few weeks ago.
Computed Goto: https://t.co/dKycpi6VSh
Functions w/ musttail: https://t.co/uD2ywjv99L
@noop_dev@TurquoiseDarren@TrisH0x2A If you use __attribute__((musttail)) you can eliminate those costs, and can match or potentially even beat the performance of computed goto.
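A rough sketch of the function-per-instruction style with musttail, not the code from the linked examples. The opcodes, the handler names and run_tail are hypothetical. The MUSTTAIL macro is my own guard: it expands to the attribute only where the compiler reports support for it, and otherwise to nothing, in which case the tail call is not guaranteed and each dispatch may use stack.

```c
#include <stdint.h>

/* Use musttail only where the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* fallback: tail call not guaranteed */
#endif

/* Hypothetical opcodes, for illustration only. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Every handler shares one signature, so the interpreter state
   travels in argument registers from one handler to the next. */
typedef int64_t op_fn(const uint8_t *pc, int64_t acc);

static op_fn op_inc, op_dec, op_double, op_halt;

static op_fn *const handlers[] = {
    [OP_INC]    = op_inc,
    [OP_DEC]    = op_dec,
    [OP_DOUBLE] = op_double,
    [OP_HALT]   = op_halt,
};

/* Dispatch: tail-call the handler for the opcode at pc. */
#define NEXT(pc, acc) MUSTTAIL return handlers[*(pc)]((pc) + 1, (acc))

static int64_t op_inc(const uint8_t *pc, int64_t acc)    { NEXT(pc, acc + 1); }
static int64_t op_dec(const uint8_t *pc, int64_t acc)    { NEXT(pc, acc - 1); }
static int64_t op_double(const uint8_t *pc, int64_t acc) { NEXT(pc, acc * 2); }
static int64_t op_halt(const uint8_t *pc, int64_t acc)   { (void)pc; return acc; }

/* Entry point: start dispatching with an accumulator of zero. */
int64_t run_tail(const uint8_t *code)
{
    return handlers[*code](code + 1, 0);
}
```

Each handler is an independent function, so the compiler allocates registers per instruction rather than for the whole interpreter loop.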
@joseph_h_garvin@analytichegel f x y can just be partial application, which has the same effect as currying when applied multiple times
f : (a, b, c) -> d
curry f : a -> b -> c -> d
partial f : a -> (b, c) -> d
partial (partial f x) : b -> c -> d
@valigo@SamProgramiz I was sadder to learn that the SYSV convention for AARCH64 and RV64 doesn't do it for structs, even though the architectures provide an ABI reference which supports 2-register returns.
@valigo@SamProgramiz The MS x64 convention basically does "out parameters" for you. If you return a structure, the caller allocates the space and passes a pointer to it as a hidden argument, and the function returns that pointer.
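To illustrate the hidden argument, here is roughly the transformation written out by hand in C. struct big, make_big and make_big_lowered are hypothetical names, and this is a sketch of the idea rather than what any particular compiler emits.

```c
/* Hypothetical 32-byte type, too large to come back in one register. */
struct big {
    long v[4];
};

/* What the source code says: return the struct by value. */
struct big make_big(long x)
{
    struct big b = { { x, x + 1, x + 2, x + 3 } };
    return b;
}

/* Roughly what the hidden-argument scheme amounts to: the caller
   passes a pointer to storage it allocated, and the callee fills
   it in and returns that same pointer. */
struct big *make_big_lowered(struct big *ret, long x)
{
    ret->v[0] = x;
    ret->v[1] = x + 1;
    ret->v[2] = x + 2;
    ret->v[3] = x + 3;
    return ret;
}
```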
@SamProgramiz@valigo This is SYSV x86-64 only - although the ABIs for AARCH64 and RISC-V also support returning two values in registers, they don't do it for a 16-byte struct.
@SamProgramiz@valigo The "convention" for multiple returns is to use out parameters:
size_t bar(char **out);
Which requires using the stack to return the pointer.
But:
struct { size_t count; char *ptr; } bar();
Doesn't touch the stack - both count and ptr are returned in registers.
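A small compilable sketch of the two styles side by side. struct span, bar_out and bar_pair are names I made up so both versions can coexist in one file, and the bodies are just placeholders that return a string and its length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: two machine words, 16 bytes on x86-64. */
struct span {
    size_t count;
    char *ptr;
};

/* Out-parameter style: the pointer result is written through 'out',
   so the caller has to provide a memory location for it. */
size_t bar_out(char *s, char **out)
{
    *out = s;
    return strlen(s);
}

/* Struct-return style: on SYSV x86-64 a 16-byte struct of two
   integer-class fields is returned in RAX and RDX. */
struct span bar_pair(char *s)
{
    struct span r = { strlen(s), s };
    return r;
}
```

Comparing the generated assembly for the two (for example with -O2 -S) is the easiest way to check what your own compiler and target do.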