-
Notifications
You must be signed in to change notification settings - Fork 99
Fix Darwin ARM64 calling convention for struct arguments, callbacks, and stack packing #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3553dfa to
61448b8
Compare
|
Rebased and fixed this up. |
0e9a914 to
4aa156e
Compare
|
note: I may have a clean way to extend this abi change to callbacks, will push once I validate. |
Success! I've included the c-packed arguments for callbacks as well on darwin/arm64. |
2a73c27 to
0ee3fd6
Compare
3d21862 to
c39e130
Compare
|
Awesome! Would it be possible to split the PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements support for Darwin ARM64 calling convention for stack argument packing in purego. The key change is handling the difference between Darwin ARM64's byte-level packing (where arguments use their natural size) vs. the 8-byte slot approach used on other platforms.
- Darwin ARM64 now uses C-style byte-level packing for stack arguments with proper alignment
- Callback handling split into platform-specific implementations for Darwin ARM64 vs. generic
- Smart stack limit checking based on actual byte requirements rather than simple argument counts
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| func.go | Added Darwin ARM64 stack bundling logic with C-style packing and calculateStackBytesNeeded helper |
| func_darwin_arm64.go | Platform-specific helpers for determining when stack packing is needed and struct register fitting |
| func_notdarwin.go | Stub implementations for non-Darwin ARM64 platforms |
| syscall_sysv.go | Split callback handling into callbackWrapGeneric and callbackWrapDarwinARM64 |
| struct_darwin_arm64.go | Darwin ARM64-specific struct register placement with byte-level copying for non-HFA/HVA |
| struct_notdarwin_arm64.go | Delegates to standard AArch64 logic for non-Darwin ARM64 |
| struct_arm64.go | Renamed placeRegisters to placeRegistersAArch64 for clarity |
| testdata/abitest/abi_test.c | Added comprehensive test cases for struct passing and stack arguments |
| testdata/libcbtest/callback_packing_test.c | New test file for callback argument packing |
| func_test.go | Added extensive tests for argument passing and stack limits |
| callback_test.go | Added callback packing tests for int32, mixed types, and small types |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ee5135d to
ed55e2d
Compare
Open to that, can you suggest what you'd like to see separated? edit: I can pull the callback fixes out to a separate PR, sound good? edit2: The changes to argument packing cause a failure in the callback tests which is making it difficult to separate this into two logical PRs. |
I'll take a look tomorrow, but I'd like to know @TotallyGamerJet's opinion |
|
Going to mark this as draft and see if I can split it up successfully. |
Adds callback detection to avoid applying C-style tight packing to callback functions, since callback unpacking still uses the traditional 8-byte slot convention and would break with tightly-packed arguments. Key changes: - Add isCallbackFunction() to detect purego callback addresses - Skip tight packing in RegisterFunc when cfn is a callback - Add helper functions getCallbackStart() and getMaxCB() for address range checks - Include TODO comments marking this as temporary until callback unpacking is updated to handle tight packing This ensures callbacks continue working correctly while the tight packing feature is being developed for Darwin ARM64.
3c78dff to
f497229
Compare
|
OK I separated the callback portion of this to #364 -- this required adding some temporary shims to avoid this arg packing when a destination is a go callback. |
func.go
Outdated
| } | ||
| // Round total to 8-byte boundary | ||
| if stackBytes > 0 && stackBytes%align8ByteSize != 0 { | ||
| stackBytes = (stackBytes + align8ByteMask) &^ align8ByteMask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use roundUpTo8 here? Makes it clearer what's happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmc you marked this resolved but I don't see why you didn't use it
func_test.go
Outdated
| } | ||
| lib, err := load.OpenLibrary(libFileName) | ||
| if err != nil { | ||
| t.Fatalf("Dlopen(%q) failed: %v", libFileName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't calling dlopen change to Failed to open library:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmc You resolved this but didn't update the string
func_test.go
Outdated
|
|
||
| tests := []struct { | ||
| name string | ||
| fn interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change all interface{} -> any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmc you didn't update this
This commit addresses review feedback from @TotallyGamerJet: 1. Merge struct_others.go and struct_other.go into struct_notdarwin.go - Consolidates platform-specific stubs for non-Darwin platforms - Reduces confusion from having similarly-named files - Uses build tag !darwin to cover all non-Darwin platforms 2. Refactor shouldBundleStackArgs to avoid multi-level if statements - Restructures control flow with early returns - Improves readability by reducing nesting depth - Makes the logic flow more linear and easier to follow 3. Extract copyStruct8ByteChunks helper function - Deduplicates code for copying struct memory in 8-byte chunks - Used by both placeRegisters and bundleStackArgs - Follows DRY principle for Darwin ARM64 byte-level packing
93945f3 to
6563f9f
Compare
Improve build tag specificity and organization for struct handling files: - Add explicit architecture build tags to struct_amd64.go and struct_arm64.go - Add explicit darwin && arm64 tag to struct_darwin_arm64.go - Refine struct_notdarwin.go to exclude arm64 and amd64 (!darwin && !arm64 && !amd64) - Refine struct_notdarwin_arm64.go to be ARM64-specific (!darwin && arm64) - Create struct_notdarwin_amd64.go for non-Darwin AMD64 stub functions - Move architecture-specific stub functions from struct_notdarwin.go to arch-specific files (shouldBundleStackArgs, structFitsInRegisters, collectStackArgs, bundleStackArgs) - Remove generic estimateStackBytes stub from struct_notdarwin.go This ensures each file has precise build constraints matching its actual platform coverage and places architecture-specific stubs in the appropriate files. Fixes Linux ARM64 and AMD64 compilation which previously failed with function redeclaration errors due to missing build tags.
|
I think i've addressed all comment. |
TotallyGamerJet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were a lot of comments marked as resolved but didn't have comments or no changes
func.go
Outdated
| } | ||
| // Round total to 8-byte boundary | ||
| if stackBytes > 0 && stackBytes%align8ByteSize != 0 { | ||
| stackBytes = (stackBytes + align8ByteMask) &^ align8ByteMask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmc you marked this resolved but I don't see why you didn't use it
func_test.go
Outdated
| } | ||
| lib, err := load.OpenLibrary(libFileName) | ||
| if err != nil { | ||
| t.Fatalf("Dlopen(%q) failed: %v", libFileName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmc You resolved this but didn't update the string
func_test.go
Outdated
|
|
||
| tests := []struct { | ||
| name string | ||
| fn interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmc you didn't update this
struct_darwin_arm64.go
Outdated
| ptr := unsafe.Pointer(v.Addr().Pointer()) | ||
| size := v.Type().Size() | ||
|
|
||
| // Copy the struct memory in 8-byte chunks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems unnecessary
|
ping @tmc |
|
Sorry missed these comments, will address. |
…ests Reorganize platform-specific struct files with more explicit architecture-specific build tags: - Split struct_notdarwin.go into struct_386.go and struct_arm.go for clarity - Consolidate loong64 stubs into struct_loong64.go, remove dedicated file - Adjust build tags on struct_*_*.go files for accuracy - Simplify syscall_other.go build tag to (android || linux) && (386 || arm)
The mask-based optimization for partial chunks reads 8 bytes even when remaining < 8, which triggers Go's checkptr validation during race testing. Revert to byte-by-byte reading for partial chunks to avoid reading beyond the allocation boundary.
TotallyGamerJet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some small changes so we don't have to wait on my approve again. LGTM! @hajimehoshi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ptr := strings.CString(val.String()) | ||
| keepAlive = append(keepAlive, ptr) | ||
| val = reflect.ValueOf(ptr) | ||
| args[startIdx+j] = val |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The args slice is being mutated on line 475 (args[startIdx+j] = val) to replace strings with their C pointer representation. However, this modification of the input slice is not documented in the function comment and could be confusing. Since stackArgs is returned and used instead of the modified args slice, this mutation appears unnecessary. Consider removing the mutation and only using val for the stackArgs append operation.
| args[startIdx+j] = val |
| func estimateStackBytes(ty reflect.Type) int { | ||
| numInts, numFloats := 0, 0 | ||
| stackBytes := 0 | ||
|
|
||
| for i := 0; i < ty.NumIn(); i++ { | ||
| arg := ty.In(i) | ||
| size := int(arg.Size()) | ||
|
|
||
| // Check if this goes to register or stack | ||
| usesInt := arg.Kind() != reflect.Float32 && arg.Kind() != reflect.Float64 | ||
| if usesInt && numInts < numOfIntegerRegisters() { | ||
| numInts++ | ||
| } else if !usesInt && numFloats < numOfFloatRegisters { | ||
| numFloats++ | ||
| } else { | ||
| // Goes to stack - accumulate total bytes | ||
| stackBytes += size | ||
| } | ||
| } | ||
| // Round total to 8-byte boundary | ||
| if stackBytes > 0 && stackBytes%align8ByteSize != 0 { | ||
| stackBytes = int(roundUpTo8(uintptr(stackBytes))) | ||
| } | ||
| return stackBytes | ||
| } |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The estimateStackBytes function doesn't account for struct arguments, which could lead to underestimation of stack space. When an argument is a struct, line 515's usesInt check will be true (since it's not float32/float64), but line 516-517 will only increment numInts by 1, not accounting for the actual slots needed for struct packing. This could cause the validation on line 217 to pass when it should fail.
Consider handling reflect.Struct explicitly to estimate the actual space needed, similar to how the validation logic in lines 178-194 handles structs.
func_test.go
Outdated
| defer func() { | ||
| if r := recover(); r != nil { | ||
| msg := fmt.Sprint(r) | ||
| if !strings.Contains(msg, "too many stack arguments") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like testing an error message is fragile. Just checking an error existence should be enough.
struct_386.go
Outdated
| } | ||
|
|
||
| func placeRegisters(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) { | ||
| panic("purego: not needed on 386") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is not needed? The function itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess changing it to not implemented is good
struct_amd64.go
Outdated
| } | ||
|
|
||
| func placeRegisters(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) { | ||
| panic("purego: not needed on amd64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
struct_amd64.go
Outdated
| // structFitsInRegisters is not used on non-Darwin platforms. | ||
| // This stub exists for compilation but should never be called. | ||
| func structFitsInRegisters(val reflect.Value, tempNumInts, tempNumFloats int) (bool, int, int) { | ||
| panic("purego: structFitsInRegisters should not be called on non-Darwin platforms") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message seems odd. Isn't this function defined for both Darwin and non-Darwin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, it's just not needed on amd64 at all
struct_arm.go
Outdated
| } | ||
|
|
||
| func placeRegisters(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) { | ||
| panic("purego: not needed on arm") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
struct_arm64.go
Outdated
| // 8-byte chunks, which works correctly for both register and stack placement. | ||
| func placeRegistersDarwin(v reflect.Value, addFloat func(uintptr), addInt func(uintptr)) { | ||
| if runtime.GOOS != "darwin" { | ||
| panic("purego: this should only be called on darwin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/this/placeRegistersDarwin/
struct_arm64.go
Outdated
| // should go through normal register allocation or be bundled with stack args. | ||
| func structFitsInRegisters(val reflect.Value, tempNumInts, tempNumFloats int) (bool, int, int) { | ||
| if runtime.GOOS != "darwin" { | ||
| panic("purego: this should only be called on darwin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/this/structFitsInRegisters/
struct_arm64.go
Outdated
| // It creates a packed struct with proper alignment and copies it to the stack in 8-byte chunks. | ||
| func bundleStackArgs(stackArgs []reflect.Value, addStack func(uintptr)) { | ||
| if runtime.GOOS != "darwin" { | ||
| panic("purego: this should only be called on darwin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/this/bundleStackArgs/
|
@tmc ping |
|
Let me know when it's ready to review |
|
ping @tmc |
Darwin ARM64 uses byte-aligned argument packing, not 8-byte slots.
The current code uses 8-byte slots for everything, causing two problems:
For example:
The fix implements proper Darwin ARM64 calling conventions:
The code splits ARM64 struct handling into darwin and non-darwin versions
to preserve correct behavior on Linux and other ARM64 systems.
Fixes #352