@wmarti wmarti commented Dec 21, 2025

This is a work-in-progress port of Xenia to macOS, built on @Wunkolo’s ARM64 backend (#2259) and so far tested only on Apple Silicon. In theory it would also work on iOS devices, but only in regions where JIT compilation is available and apps can be distributed outside the App Store, such as the EU.

The Metal backend translates Xbox 360 shader microcode through multiple stages:

Xbox 360 Microcode (ucode)
    ↓ DxbcShaderTranslator (shared with D3D12)
DXBC (DirectX Bytecode)
    ↓ dxbc_to_dxil_converter (native dxbc2dxil)
DXIL (DirectX Intermediate Language)
    ↓ metal_shader_converter (Apple Metal Shader Converter)
Metal IR
    ↓ MTLDevice newLibraryWithData
MTLLibrary (GPU-executable)

The pipeline leverages:

  • DxbcShaderTranslator: Existing Xenia infrastructure for microcode → DXBC SM 5.1
  • dxbc2dxil: DirectXShaderCompiler tool (ported to macOS for use as a native binary) for DXBC → DXIL SM 6.0
  • Metal Shader Converter: Apple's metalirconverter library for DXIL → Metal IR
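
As a rough illustration of how the stages above chain together, here is a minimal C++ sketch. It assumes each stage can be modeled as a function from one byte blob to the next, and that a failure in any stage aborts the whole translation. `Blob`, `Stage`, and `RunPipeline` are hypothetical names for this sketch; they are not the actual Xenia, DirectXShaderCompiler, or Metal Shader Converter APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Blob = std::vector<uint8_t>;

// One translation stage (hypothetical): consumes one representation and
// returns the next, or std::nullopt if the conversion fails.
struct Stage {
  std::string name;
  std::function<std::optional<Blob>(const Blob&)> run;
};

// Runs the stages in order and stops at the first failure. On failure,
// *failed_stage (if provided) receives the name of the stage that failed.
inline std::optional<Blob> RunPipeline(const std::vector<Stage>& stages,
                                       Blob input,
                                       std::string* failed_stage = nullptr) {
  Blob current = std::move(input);
  for (const Stage& stage : stages) {
    assert(stage.run && "every stage needs a callable");
    std::optional<Blob> next = stage.run(current);
    if (!next) {
      if (failed_stage) *failed_stage = stage.name;
      return std::nullopt;
    }
    current = std::move(*next);
  }
  return current;
}
```

In a real backend the three stages would wrap the microcode → DXBC translator, the DXBC → DXIL converter, and the DXIL → Metal IR converter, with the final blob handed to Metal to build the library. Reporting which stage failed makes it easier to tell a translator bug from a converter bug.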

Eventually I may switch to the SPIR-V → MSL route, but this seemed the easiest for now, even though it carries a big performance penalty.

The entire thing has been essentially “vibecoded” over the last ~year, so there's probably a minefield of issues. There are also many merge conflicts and tons of bloat that isn't meant to be committed (sorry, I'm still learning how to use git), but I'll get those ironed out over time. I'm not expecting this to get merged anytime soon; I'm opening this PR just for tracking. The "app" builds but does not run games yet. I've got xenia-gpu-metal-trace-dump reproducing traces captured in the D3D12 backend from Gears of War mostly correctly. Other games are WIP, as you can see below.

Gears of War

4D5307D5_9639 4D5307D5_12436 4D5307D5_12994

Halo 3

4D5307E6_31051 4D5307E6_33934 4D5307E6_37345

GTA IV

545407F2_2784 545407F2_3320

Wunkolo and others added 30 commits May 4, 2024 15:47
Implements control sequences such as conditional branching, breaking, and trapping
Register was getting stomped over
On the x64 side, this is the same as the `reset()` function resetting the label-manager
Resolving the function puts it into X0 and should be called immediately after.

We were just calling ResolveFunction on ResolveFunction recursively
Things still get weird at the thunks, but this allows for callstacks between-to-guest calls
Also changes the register to X3 by default
Should be `GUEST_RET_ADDR` not `GUEST_CALL_RET_ADDR`.
Let the register type determine the reverse-size

REV32 was also the wrong instruction to use.
`W1` is a possible HIR register allocation and using W1 here was stomping over it. Don't use W1, use the provided "scratch" register.
Derive the reversal-size from the register-size.
REV32 is also the wrong one to be using here since it will reverse the bytes of upper and lower 32-bit words.
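
To make the REV32 point concrete, here is a small C++ model of the two behaviors. On a 64-bit register, `REV32 Xd, Xn` reverses the bytes inside each 32-bit word separately, while `REV Xd, Xn` reverses all eight bytes. The function names and the `ReverseForSize` helper are hypothetical illustrations, not code from the backend.

```cpp
#include <cassert>
#include <cstdint>

// Full byte reversal of a 32-bit value (models REV Wd, Wn).
constexpr uint32_t ByteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Full byte reversal of a 64-bit value (models REV Xd, Xn).
constexpr uint64_t ByteSwap64(uint64_t v) {
  return (uint64_t{ByteSwap32(static_cast<uint32_t>(v))} << 32) |
         ByteSwap32(static_cast<uint32_t>(v >> 32));
}

// Byte reversal within each 32-bit word, words stay in place
// (models REV32 Xd, Xn). This is NOT a 64-bit endian swap.
constexpr uint64_t ByteSwap32PerWord(uint64_t v) {
  return (uint64_t{ByteSwap32(static_cast<uint32_t>(v >> 32))} << 32) |
         ByteSwap32(static_cast<uint32_t>(v));
}

// Hypothetical helper: derive the reversal width from the operand size,
// in the spirit of "derive the reversal-size from the register-size".
inline uint64_t ReverseForSize(uint64_t v, unsigned size_bytes) {
  assert(size_bytes == 4 || size_bytes == 8);
  return size_bytes == 8 ? ByteSwap64(v)
                         : ByteSwap32(static_cast<uint32_t>(v));
}

static_assert(ByteSwap64(0x0102030405060708ull) == 0x0807060504030201ull,
              "REV on X register: all eight bytes reversed");
static_assert(ByteSwap32PerWord(0x0102030405060708ull) ==
                  0x0403020108070605ull,
              "REV32 on X register: each 32-bit word reversed in place");
```

The two `static_assert` lines show why REV32 gives the wrong result for a 64-bit endian swap: the upper and lower words are never exchanged.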
Share a somewhat similar calling convention as ARM64
16-bit word rather than 8-bit
These instructions need to use an extra register to generate their constants if they are too large
`x0` was loading the thunk rather than using `xip`

Fixes lots of init bugs!
Restart render encoders when the render target cache changes render pass attachments (dummy RT0 → real RTs, depth/stencil binding) to satisfy Metal validation. Bind stencil attachments based on the depth texture pixel format and size the dummy RT to avoid scissor/viewport assertions.
Derive viewport/scissor size from depth attachments for depth-only passes to satisfy Metal validation. Add Metal texture creation + GPU load shader coverage for k_16* UNORM/FLOAT formats plus DXT3A/DXT5A/CTX1 decoding to remove unsupported-format failures in Halo 3 traces.
Add Halo 3 and Gears of War trace filenames to Phase 2 so future Metal work can quickly validate render-target transfer and render-pass stability changes.
Metal validates pipeline attachment formats against the active render pass. For depth-only draws, attach a dummy color RT0 so pipelines that include color formats remain valid, and ensure viewport/scissor sizing prefers depth targets over the dummy when both exist.
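
The depth-only handling described above can be sketched as a small planning step in C++. This assumes a pass is described only by the sizes of its real color RT0 and depth target. `Extent`, `PassAttachments`, `PassPlan`, and `PlanPass` are hypothetical names, and taking the render area from the real color target when one is bound is an assumption of this sketch rather than something the commit states.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct Extent {
  uint32_t width = 0;
  uint32_t height = 0;
};

// Real attachments bound by the guest for this pass (hypothetical model).
struct PassAttachments {
  std::optional<Extent> color0;  // real color RT0, if any
  std::optional<Extent> depth;   // depth/stencil target, if any
};

struct PassPlan {
  bool needs_dummy_color = false;  // true for depth-only passes
  Extent dummy_color_size;         // meaningful only if needs_dummy_color
  Extent render_area;              // used to size viewport and scissor
};

inline PassPlan PlanPass(const PassAttachments& a) {
  assert((a.color0 || a.depth) && "a pass needs at least one attachment");
  PassPlan plan;
  if (!a.color0) {
    // Depth-only: attach a dummy color RT0 sized from the depth target and
    // size viewport/scissor from depth, never from the dummy.
    plan.needs_dummy_color = true;
    plan.dummy_color_size = *a.depth;
    plan.render_area = *a.depth;
  } else {
    // Assumption for this sketch: a real color target defines the area.
    plan.render_area = *a.color0;
  }
  return plan;
}
```

Sizing the dummy from the depth target means the dummy can never be smaller than the area the draw actually covers.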
Avoid invalid blit readbacks by capturing using the actual bound RT dimensions and skipping non-BGRA8 formats. Always create cubemap textures as cube arrays to match shader expectations, and size the dummy color attachment from the active depth target for depth-only passes.
Document the Resolve pipeline/key gap in MetalRenderTargetCache::Resolve and add a checklist to implement the missing compute resolve pipelines to avoid IssueCopy failures in real traces.
@wmarti wmarti force-pushed the metal-backend-clean-msc branch from d8a981d to 4b81238 Compare December 22, 2025 02:28

has207 commented Dec 22, 2025

very cool! it would probably be better if you targeted xenia-canary though, there is much more active development there

@wmarti wmarti force-pushed the metal-backend-clean-msc branch from 4b81238 to 015cd72 Compare December 22, 2025 03:43

wmarti commented Dec 22, 2025

> very cool! it would probably be better if you targeted xenia-canary though, there is much more active development there

Will look into it, thanks!

Will Martin and others added 5 commits December 25, 2025 11:46
- Build Metal XeSL bytecode for all shader stages, skipping only FXAA in buildshaders.

- Add per-binding swizzled texture views with signed pixel-format support for Metal bindings.

- Enable GPU texture load path by default and align Metal texture formats and BC decompression with D3D12.

- Wire missing Metal texture_load pipelines (DXT3A_AS_1111, R10G11B11/R11G11B10, UNORM/SNORM helpers) and update NEXT_STEPS tracking.
- Refresh Metal shader bytecode outputs and include Metal UI shaders.

- Update macOS DXBC->DXIL converter and Metal runtime plumbing changes.

- Record local docs/metal updates and submodule pointers.
@wmarti wmarti force-pushed the metal-backend-clean-msc branch from 954a62d to 5d51a8f Compare December 25, 2025 03:13
Probe norm16 UNORM/SNORM support at init and choose load shaders and pixel formats to match D3D12/Vulkan fallback behavior when the host lacks those formats, while keeping standard paths when available.

Log a single info line when any norm16 format falls back to float so trace validation can confirm behavior without noisy per-texture output.
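
A minimal C++ sketch of the probe-and-fallback idea, assuming the host capability query has already been reduced to two booleans. The enum values, `Norm16Caps`, and `Norm16FormatTable` are hypothetical names for illustration, and log lines are collected in a vector instead of going through Xenia's logging so the behavior is easy to check.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class GuestNorm16 { kUnorm, kSnorm };
enum class HostFormat { kR16Unorm, kR16Snorm, kR16Float };

// Result of probing the device once at init (hypothetical shape).
struct Norm16Caps {
  bool unorm = false;
  bool snorm = false;
};

class Norm16FormatTable {
 public:
  explicit Norm16FormatTable(Norm16Caps caps)
      : unorm_(caps.unorm ? HostFormat::kR16Unorm : HostFormat::kR16Float),
        snorm_(caps.snorm ? HostFormat::kR16Snorm : HostFormat::kR16Float) {
    // One summary line at init instead of a message per texture.
    if (!caps.unorm || !caps.snorm) {
      log_.push_back(std::string("norm16 fallback: unorm=") +
                     (caps.unorm ? "native" : "float") +
                     " snorm=" + (caps.snorm ? "native" : "float"));
    }
  }

  // Per-texture lookup; never logs.
  HostFormat Select(GuestNorm16 format) const {
    switch (format) {
      case GuestNorm16::kUnorm:
        return unorm_;
      case GuestNorm16::kSnorm:
        return snorm_;
    }
    assert(false && "unknown norm16 format");
    return HostFormat::kR16Float;
  }

  const std::vector<std::string>& log() const { return log_; }

 private:
  HostFormat unorm_;
  HostFormat snorm_;
  std::vector<std::string> log_;
};
```

Resolving the choice once at init keeps the per-texture path to a table lookup, and the single summary line is enough for a trace run to confirm which path was taken.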
@wmarti wmarti force-pushed the metal-backend-clean-msc branch from 5d51a8f to cae9331 Compare December 25, 2025 03:14
Remove the Metal CPU untile fallback so texture loading matches D3D12 and Vulkan behavior, keeping GPU texture_load shaders as the only path.

Document the next investigation steps for missing bindings, memexport warnings, and resolve/transfer parity in NEXT_STEPS.
Log decoded fetch data for missing bindings, emit one-time memexport shader hashes, and capture transfer format/pixel format pairs to support render target parity analysis.

Update NEXT_STEPS tracking and validate with a trace run.