Skip to content

Conversation

@tmc
Copy link
Contributor

@tmc tmc commented Oct 28, 2025

Darwin ARM64 uses byte-aligned argument packing, not 8-byte slots.
The current code uses 8-byte slots for everything, causing two problems:

  1. Functions with >8 parameters fail when using int32, float32, etc.
  2. Struct arguments are packed incorrectly.

For example:

// C: int32_t sum(int32_t a1, ..., int32_t a11)
var fn func(int32, int32, int32, int32, int32, int32, int32, int32, int32, int32, int32) int32
purego.RegisterLibFunc(&fn, lib, "sum_11_int32")
result := fn(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
// fails: stack arguments misaligned

The fix implements proper Darwin ARM64 calling conventions:

  • Stack arguments use natural alignment (int32 = 4 bytes, not 8).
  • Multiple small values pack into single 8-byte slots.
  • Struct arguments use byte-level copying, not slot alignment.
  • Callbacks unpack arguments using Darwin-specific rules.
  • Register allocation tries registers before falling back to stack.

The code splits ARM64 struct handling into darwin and non-darwin versions
to preserve correct behavior on Linux and other ARM64 systems.

Fixes #352

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 3553dfa to 61448b8 Compare October 28, 2025 16:02
@tmc
Copy link
Contributor Author

tmc commented Oct 28, 2025

Rebased and fixed this up.

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 0e9a914 to 4aa156e Compare October 28, 2025 16:35
@tmc
Copy link
Contributor Author

tmc commented Oct 28, 2025

note: I may have a clean way to extend this abi change to callbacks, will push once I validate.

@tmc
Copy link
Contributor Author

tmc commented Oct 28, 2025

note: I may have a clean way to extend this abi change to callbacks, will push once I validate.

Success! I've included the c-packed arguments for callbacks as well on darwin/arm64.

@hajimehoshi hajimehoshi marked this pull request as draft October 28, 2025 17:33
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 2a73c27 to 0ee3fd6 Compare October 29, 2025 15:53
@tmc tmc changed the title Fix Darwin ARM64 stack parameter packing for sub-8-byte types purego: fix Darwin ARM64 calling convention for struct arguments, callbacks, and stack packing Oct 29, 2025
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch 2 times, most recently from 3d21862 to c39e130 Compare October 29, 2025 17:08
@tmc tmc marked this pull request as ready for review October 29, 2025 18:16
@hajimehoshi hajimehoshi requested a review from Copilot October 30, 2025 04:34
@hajimehoshi
Copy link
Member

Awesome! Would it be possible to split the PR?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements support for Darwin ARM64 calling convention for stack argument packing in purego. The key change is handling the difference between Darwin ARM64's byte-level packing (where arguments use their natural size) vs. the 8-byte slot approach used on other platforms.

  • Darwin ARM64 now uses C-style byte-level packing for stack arguments with proper alignment
  • Callback handling split into platform-specific implementations for Darwin ARM64 vs. generic
  • Smart stack limit checking based on actual byte requirements rather than simple argument counts

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
func.go Added Darwin ARM64 stack bundling logic with C-style packing and calculateStackBytesNeeded helper
func_darwin_arm64.go Platform-specific helpers for determining when stack packing is needed and struct register fitting
func_notdarwin.go Stub implementations for non-Darwin ARM64 platforms
syscall_sysv.go Split callback handling into callbackWrapGeneric and callbackWrapDarwinARM64
struct_darwin_arm64.go Darwin ARM64-specific struct register placement with byte-level copying for non-HFA/HVA
struct_notdarwin_arm64.go Delegates to standard AArch64 logic for non-Darwin ARM64
struct_arm64.go Renamed placeRegisters to placeRegistersAArch64 for clarity
testdata/abitest/abi_test.c Added comprehensive test cases for struct passing and stack arguments
testdata/libcbtest/callback_packing_test.c New test file for callback argument packing
func_test.go Added extensive tests for argument passing and stack limits
callback_test.go Added callback packing tests for int32, mixed types, and small types

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from ee5135d to ed55e2d Compare October 30, 2025 14:37
@tmc
Copy link
Contributor Author

tmc commented Oct 30, 2025

Awesome! Would it be possible to split the PR?

Open to that, can you suggest what you'd like to see separated?

edit: I can pull the callback fixes out to a separate PR, sound good?

edit2: The changes to argument packing cause a failure in the callback tests which is making it difficult to separate this into two logical PRs.

@hajimehoshi
Copy link
Member

Open to that, can you suggest what you'd like to see separated?

I'll take a look tomorrow, but I'd like to know @TotallyGamerJet's opinion

@tmc
Copy link
Contributor Author

tmc commented Oct 30, 2025

Going to mark this as draft and see if I can split it up successfully.

@tmc tmc marked this pull request as draft October 30, 2025 19:30
Adds callback detection to avoid applying C-style tight packing to callback
functions, since callback unpacking still uses the traditional 8-byte slot
convention and would break with tightly-packed arguments.

Key changes:
- Add isCallbackFunction() to detect purego callback addresses
- Skip tight packing in RegisterFunc when cfn is a callback
- Add helper functions getCallbackStart() and getMaxCB() for address range checks
- Include TODO comments marking this as temporary until callback unpacking
  is updated to handle tight packing

This ensures callbacks continue working correctly while the tight packing
feature is being developed for Darwin ARM64.
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 3c78dff to f497229 Compare October 31, 2025 19:59
@tmc tmc changed the title purego: fix Darwin ARM64 calling convention for struct arguments, callbacks, and stack packing Fix Darwin ARM64 calling convention for struct arguments, callbacks, and stack packing Oct 31, 2025
@tmc tmc marked this pull request as ready for review October 31, 2025 20:06
@tmc
Copy link
Contributor Author

tmc commented Oct 31, 2025

OK I separated the callback portion of this to #364 -- this required adding some temporary shims to avoid this arg packing when a destination is a go callback.

func.go Outdated
}
// Round total to 8-byte boundary
if stackBytes > 0 && stackBytes%align8ByteSize != 0 {
stackBytes = (stackBytes + align8ByteMask) &^ align8ByteMask
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use roundUpTo8 here? Makes it clearer what's happening

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmc you marked this resolved but I don't see why you didn't use it

func_test.go Outdated
}
lib, err := load.OpenLibrary(libFileName)
if err != nil {
t.Fatalf("Dlopen(%q) failed: %v", libFileName, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't calling dlopen change to Failed to open library:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmc You resolved this but didn't update the string

func_test.go Outdated

tests := []struct {
name string
fn interface{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change all interface{} -> any

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmc you didn't update this

This commit addresses review feedback from @TotallyGamerJet:

1. Merge struct_others.go and struct_other.go into struct_notdarwin.go
   - Consolidates platform-specific stubs for non-Darwin platforms
   - Reduces confusion from having similarly-named files
   - Uses build tag !darwin to cover all non-Darwin platforms

2. Refactor shouldBundleStackArgs to avoid multi-level if statements
   - Restructures control flow with early returns
   - Improves readability by reducing nesting depth
   - Makes the logic flow more linear and easier to follow

3. Extract copyStruct8ByteChunks helper function
   - Deduplicates code for copying struct memory in 8-byte chunks
   - Used by both placeRegisters and bundleStackArgs
   - Follows DRY principle for Darwin ARM64 byte-level packing
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 93945f3 to 6563f9f Compare November 16, 2025 02:19
Improve build tag specificity and organization for struct handling files:

- Add explicit architecture build tags to struct_amd64.go and struct_arm64.go
- Add explicit darwin && arm64 tag to struct_darwin_arm64.go
- Refine struct_notdarwin.go to exclude arm64 and amd64 (!darwin && !arm64 && !amd64)
- Refine struct_notdarwin_arm64.go to be ARM64-specific (!darwin && arm64)
- Create struct_notdarwin_amd64.go for non-Darwin AMD64 stub functions
- Move architecture-specific stub functions from struct_notdarwin.go to arch-specific files
  (shouldBundleStackArgs, structFitsInRegisters, collectStackArgs, bundleStackArgs)
- Remove generic estimateStackBytes stub from struct_notdarwin.go

This ensures each file has precise build constraints matching its actual
platform coverage and places architecture-specific stubs in the appropriate files.

Fixes Linux ARM64 and AMD64 compilation which previously failed with function
redeclaration errors due to missing build tags.
@tmc
Copy link
Contributor Author

tmc commented Nov 17, 2025

I think i've addressed all comment.

Copy link
Collaborator

@TotallyGamerJet TotallyGamerJet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a lot of comments marked as resolved but didn't have comments or no changes

func.go Outdated
}
// Round total to 8-byte boundary
if stackBytes > 0 && stackBytes%align8ByteSize != 0 {
stackBytes = (stackBytes + align8ByteMask) &^ align8ByteMask
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmc you marked this resolved but I don't see why you didn't use it

func_test.go Outdated
}
lib, err := load.OpenLibrary(libFileName)
if err != nil {
t.Fatalf("Dlopen(%q) failed: %v", libFileName, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmc You resolved this but didn't update the string

func_test.go Outdated

tests := []struct {
name string
fn interface{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmc you didn't update this

ptr := unsafe.Pointer(v.Addr().Pointer())
size := v.Type().Size()

// Copy the struct memory in 8-byte chunks
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems unnecessary

@hajimehoshi
Copy link
Member

ping @tmc

@tmc
Copy link
Contributor Author

tmc commented Dec 2, 2025

Sorry missed these comments, will address.

tmc and others added 7 commits December 2, 2025 09:56
…ests

Reorganize platform-specific struct files with more explicit architecture-specific
build tags:

- Split struct_notdarwin.go into struct_386.go and struct_arm.go for clarity
- Consolidate loong64 stubs into struct_loong64.go, remove dedicated file
- Adjust build tags on struct_*_*.go files for accuracy
- Simplify syscall_other.go build tag to (android || linux) && (386 || arm)
The mask-based optimization for partial chunks reads 8 bytes even when
remaining < 8, which triggers Go's checkptr validation during race testing.

Revert to byte-by-byte reading for partial chunks to avoid reading
beyond the allocation boundary.
Copy link
Collaborator

@TotallyGamerJet TotallyGamerJet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some small changes so we don't have to wait on my approve again. LGTM! @hajimehoshi

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ptr := strings.CString(val.String())
keepAlive = append(keepAlive, ptr)
val = reflect.ValueOf(ptr)
args[startIdx+j] = val
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The args slice is being mutated on line 475 (args[startIdx+j] = val) to replace strings with their C pointer representation. However, this modification of the input slice is not documented in the function comment and could be confusing. Since stackArgs is returned and used instead of the modified args slice, this mutation appears unnecessary. Consider removing the mutation and only using val for the stackArgs append operation.

Suggested change
args[startIdx+j] = val

Copilot uses AI. Check for mistakes.
Comment on lines +506 to +530
func estimateStackBytes(ty reflect.Type) int {
numInts, numFloats := 0, 0
stackBytes := 0

for i := 0; i < ty.NumIn(); i++ {
arg := ty.In(i)
size := int(arg.Size())

// Check if this goes to register or stack
usesInt := arg.Kind() != reflect.Float32 && arg.Kind() != reflect.Float64
if usesInt && numInts < numOfIntegerRegisters() {
numInts++
} else if !usesInt && numFloats < numOfFloatRegisters {
numFloats++
} else {
// Goes to stack - accumulate total bytes
stackBytes += size
}
}
// Round total to 8-byte boundary
if stackBytes > 0 && stackBytes%align8ByteSize != 0 {
stackBytes = int(roundUpTo8(uintptr(stackBytes)))
}
return stackBytes
}
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The estimateStackBytes function doesn't account for struct arguments, which could lead to underestimation of stack space. When an argument is a struct, line 515's usesInt check will be true (since it's not float32/float64), but line 516-517 will only increment numInts by 1, not accounting for the actual slots needed for struct packing. This could cause the validation on line 217 to pass when it should fail.

Consider handling reflect.Struct explicitly to estimate the actual space needed, similar to how the validation logic in lines 178-194 handles structs.

Copilot uses AI. Check for mistakes.
func_test.go Outdated
defer func() {
if r := recover(); r != nil {
msg := fmt.Sprint(r)
if !strings.Contains(msg, "too many stack arguments") {
Copy link
Member

@hajimehoshi hajimehoshi Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like testing an error message is fragile. Just checking an error existence should be enough.

struct_386.go Outdated
}

func placeRegisters(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) {
panic("purego: not needed on 386")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is not needed? The function itself?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess changing it to not implemented is good

struct_amd64.go Outdated
}

func placeRegisters(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) {
panic("purego: not needed on amd64")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

struct_amd64.go Outdated
// structFitsInRegisters is not used on non-Darwin platforms.
// This stub exists for compilation but should never be called.
func structFitsInRegisters(val reflect.Value, tempNumInts, tempNumFloats int) (bool, int, int) {
panic("purego: structFitsInRegisters should not be called on non-Darwin platforms")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message seems odd. Isn't this function defined for both Darwin and non-Darwin?

Copy link
Collaborator

@TotallyGamerJet TotallyGamerJet Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it's just not needed on amd64 at all

struct_arm.go Outdated
}

func placeRegisters(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) {
panic("purego: not needed on arm")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

struct_arm64.go Outdated
// 8-byte chunks, which works correctly for both register and stack placement.
func placeRegistersDarwin(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) {
if runtime.GOOS != "darwin" {
panic("purego: this should only be called on darwin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/this/placeRegistersDarwin/

struct_arm64.go Outdated
// should go through normal register allocation or be bundled with stack args.
func structFitsInRegisters(val reflect.Value, tempNumInts, tempNumFloats int) (bool, int, int) {
if runtime.GOOS != "darwin" {
panic("purego: this should only be called on darwin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/this/structFitsInRegisters/

struct_arm64.go Outdated
// It creates a packed struct with proper alignment and copies it to the stack in 8-byte chunks.
func bundleStackArgs(stackArgs []reflect.Value, addStack func(uintptr)) {
if runtime.GOOS != "darwin" {
panic("purego: this should only be called on darwin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/this/bundleStackArgs/

@TotallyGamerJet
Copy link
Collaborator

@tmc ping

@hajimehoshi
Copy link
Member

Let me know when it's ready to review

@hajimehoshi
Copy link
Member

ping @tmc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Apple ARM64 ABI for stack arguments

3 participants