MacOS Port of Xenia #2332

wmarti · 2025-12-21T03:39:29Z

This is a work-in-progress port of Xenia to macOS, currently tested only on Apple Silicon and built on top of @Wunkolo’s ARM64 backend (#2259). In theory it would also work on iOS devices, but only where JIT compilation is available and apps can be distributed outside the App Store, such as in the EU.

The Metal backend translates Xbox 360 shader microcode through multiple stages:

Xbox 360 Microcode (ucode)
    ↓ DxbcShaderTranslator (shared with D3D12)
DXBC (DirectX Bytecode)
    ↓ dxbc_to_dxil_converter (native dxbc2dxil)
DXIL (DirectX Intermediate Language)
    ↓ metal_shader_converter (Apple Metal Shader Converter)
Metal IR
    ↓ MTLDevice newLibraryWithData
MTLLibrary (GPU-executable)

The pipeline leverages:

- DxbcShaderTranslator: existing Xenia infrastructure for microcode → DXBC SM 5.1
- dxbc2dxil: DirectXShaderCompiler tool (ported to macOS for use as a native binary) for DXBC → DXIL SM 6.0
- Metal Shader Converter: Apple's metalirconverter library for DXIL → Metal IR
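The chaining of these stages can be sketched in C++ as below. This is only a minimal illustration of running converters in sequence and reporting which one failed. `Blob`, `Stage`, and `RunPipeline` are hypothetical names, not Xenia, DirectXShaderCompiler, or Metal Shader Converter APIs, and the real converters take more than a byte buffer as input.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one stage's bytecode (ucode, DXBC, DXIL, ...).
using Blob = std::vector<uint8_t>;

// Hypothetical description of one translation stage.
struct Stage {
  std::string name;
  // Returns the translated bytecode, or std::nullopt if translation failed.
  std::function<std::optional<Blob>(const Blob&)> run;
};

// Feeds the output of each stage into the next one. On failure, stores the
// name of the failing stage in *failed_stage (if non-null) and stops.
std::optional<Blob> RunPipeline(const std::vector<Stage>& stages, Blob input,
                                std::string* failed_stage) {
  for (const Stage& stage : stages) {
    std::optional<Blob> output = stage.run(input);
    if (!output) {
      if (failed_stage) {
        *failed_stage = stage.name;
      }
      return std::nullopt;
    }
    input = std::move(*output);
  }
  return input;
}
```

Knowing which stage rejected a shader matters in a chain this long, since a failure in the last step may be caused by output from an earlier one.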

Maybe eventually I'll go the SPIR-V -> MSL route, but this seemed the easiest for now (even though there's a big performance penalty).

The entire thing has been essentially “vibecoded” over the last ~year, so there's probably a minefield of issues. There are also many merge conflicts and tons of bloat that's not meant to be committed (sorry, I'm still learning how to use git), but I'll get those ironed out over time. I'm not expecting this to get merged anytime soon; I'm just opening this PR for tracking.

The "app" builds but does not run games yet. I've got xenia-gpu-metal-trace-dump reproducing traces from Gears of War, captured with the D3D12 backend, mostly correctly. Other games are WIP, as you can see below.

Gears of War

Halo 3

GTA IV


Restart render encoders when the render target cache changes render pass attachments (dummy RT0 → real RTs, depth/stencil binding) to satisfy Metal validation. Bind stencil attachments based on the depth texture pixel format and size the dummy RT to avoid scissor/viewport assertions.

Derive viewport/scissor size from depth attachments for depth-only passes to satisfy Metal validation. Add Metal texture creation + GPU load shader coverage for k_16* UNORM/FLOAT formats plus DXT3A/DXT5A/CTX1 decoding to remove unsupported-format failures in Halo 3 traces.

Add Halo 3 and Gears of War trace filenames to Phase 2 so future Metal work can quickly validate render-target transfer and render-pass stability changes.

Metal validates pipeline attachment formats against the active render pass. For depth-only draws, attach a dummy color RT0 so pipelines that include color formats remain valid, and ensure viewport/scissor sizing prefers depth targets over the dummy when both exist.
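The sizing rule described here can be sketched as follows. This is a minimal illustration, not the code from the branch: `Extent`, `PassAttachments`, and `ChooseRenderArea` are hypothetical names. Letting a real (non-dummy) RT0 take priority is an assumption, because the commit message only states that depth is preferred over the dummy.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical size of an attachment in pixels.
struct Extent {
  uint32_t width;
  uint32_t height;
};

// Hypothetical summary of what is bound for the current render pass.
struct PassAttachments {
  std::optional<Extent> color0;   // RT0, which may be a dummy attachment.
  bool color0_is_dummy = false;   // True if RT0 exists only for validation.
  std::optional<Extent> depth;    // Depth/stencil attachment, if any.
};

// Picks the size used for viewport/scissor clamping.
std::optional<Extent> ChooseRenderArea(const PassAttachments& pass) {
  // Assumption: a real color target defines the render area.
  if (pass.color0 && !pass.color0_is_dummy) {
    return pass.color0;
  }
  // Depth-only pass: prefer the depth target over the dummy RT0.
  if (pass.depth) {
    return pass.depth;
  }
  // Only the dummy (or nothing) is bound.
  return pass.color0;
}
```

For a depth-only pass with a 1x1 dummy RT0 and a 1280x720 depth target, this returns 1280x720, so the viewport and scissor are not clamped to the dummy.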

Avoid invalid blit readbacks by capturing using the actual bound RT dimensions and skipping non-BGRA8 formats. Always create cubemap textures as cube arrays to match shader expectations, and size the dummy color attachment from the active depth target for depth-only passes.

Document the Resolve pipeline/key gap in MetalRenderTargetCache::Resolve and add a checklist to implement the missing compute resolve pipelines to avoid IssueCopy failures in real traces.


has207 · 2025-12-22T03:32:37Z

very cool! it would probably be better if you targeted xenia-canary though, there is much more active development there

wmarti · 2025-12-22T04:11:05Z

> very cool! it would probably be better if you targeted xenia-canary though, there is much more active development there

Will look into it, thanks!

- Build Metal XeSL bytecode for all shader stages, skipping only FXAA in buildshaders.
- Add per-binding swizzled texture views with signed pixel-format support for Metal bindings.
- Enable GPU texture load path by default and align Metal texture formats and BC decompression with D3D12.
- Wire missing Metal texture_load pipelines (DXT3A_AS_1111, R10G11B11/R11G11B10, UNORM/SNORM helpers) and update NEXT_STEPS tracking.

- Refresh Metal shader bytecode outputs and include Metal UI shaders.
- Update macOS DXBC->DXIL converter and Metal runtime plumbing changes.
- Record local docs/metal updates and submodule pointers.

Probe norm16 UNORM/SNORM support at init and choose load shaders and pixel formats to match D3D12/Vulkan fallback behavior when the host lacks those formats, while keeping standard paths when available. Log a single info line when any norm16 format falls back to float so trace validation can confirm behavior without noisy per-texture output.
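The probe-then-fall-back selection can be sketched as below for a single-channel 16-bit format. This is a hedged illustration under stated assumptions: `HostFormat`, `Norm16Support`, `ChooseR16Format`, and `AnyNorm16Fallback` are hypothetical names, and the actual probing of the Metal device is left out.

```cpp
#include <cassert>

// Hypothetical host-side pixel formats for a 16-bit single-channel texture.
enum class HostFormat { kR16Unorm, kR16Snorm, kR16Float };

// Hypothetical result of probing the device once at initialization.
struct Norm16Support {
  bool unorm;
  bool snorm;
};

// Uses the native normalized format when the host supports it, and falls
// back to the float format otherwise.
constexpr HostFormat ChooseR16Format(bool is_signed,
                                     const Norm16Support& support) {
  const bool native = is_signed ? support.snorm : support.unorm;
  if (!native) {
    return HostFormat::kR16Float;
  }
  return is_signed ? HostFormat::kR16Snorm : HostFormat::kR16Unorm;
}

// True if at least one norm16 variant falls back to float. A caller can use
// this to emit a single info log line at init instead of one per texture.
constexpr bool AnyNorm16Fallback(const Norm16Support& support) {
  return !support.unorm || !support.snorm;
}

static_assert(ChooseR16Format(false, Norm16Support{true, true}) ==
              HostFormat::kR16Unorm);
static_assert(ChooseR16Format(true, Norm16Support{true, false}) ==
              HostFormat::kR16Float);
```

Because the choice depends only on the probe result, the load shader and the pixel format can be selected from the same decision and stay consistent with each other.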

Remove the Metal CPU untile fallback so texture loading matches D3D12 and Vulkan behavior, keeping GPU texture_load shaders as the only path. Document the next investigation steps for missing bindings, memexport warnings, and resolve/transfer parity in NEXT_STEPS.

Log decoded fetch data for missing bindings, emit one-time memexport shader hashes, and capture transfer format/pixel format pairs to support render target parity analysis. Update NEXT_STEPS tracking and validate with a trace run.

Wunkolo and others added 30 commits May 4, 2024 15:47

[a64] Implement control sequences

4538d1e

Implements control sequences such as conditional branching, breaking, and trapping

[a64] Fix ResolveFunction thunk

9641eea

Register was getting stomped over

[a64] Fix resetting of labels during Emplace

8030533

On the x64 side, this is the same as the `reset()` function resetting the label-manager

[a64] Fix ResolveFunctionThunk call

390e954

Resolving the function puts its address into X0, and that address should be called immediately afterwards. We were just calling ResolveFunction on ResolveFunction recursively.

[a64] Pad code cache with 0x00 bytes

68d078f

[Vulkan] Non-seamless cube map filtering

e5e0f34

[a64] Draft Windows-ARM64 stack unwinding data

b7eb56f

Things still get weird at the thunks, but this allows for callstacks between-to-guest calls

[a64] Use X4 for address-generation veneer

fbd936c

[a64] Optimize Volatile/NonVolatile push/pop

3c8718f

[a64] Refactor thunk prolog/epilog

47cd4c6

[a64] Update Membase and Context register

5011408

[a64] Fix emitted function prolog/epilog

d7b79cf

[a64] Refactor XSP to SP

2b8935d

[a64] Implement OPCODE_{LOAD,STORE}_MMIO

4e46863

[a64] Remove redundant zero-extension during address computation

50fae57

Also changes the register to X3 by default

[a64] Fix CallIndirect return address

f74ab8f

Should be `GUEST_RET_ADDR` not `GUEST_CALL_RET_ADDR`.

[a64] Refactor REV{32,64} to REV

99814fa

Let the register type determine the reverse-size. REV32 was also the wrong instruction to use.

[a64] Implement OPCODE_MEMSET

c2db7ac

[a64] Implement OPCODE_MEMORY_BARRIER

cd0f959

[a64] Implement OPCODE_{LOAD,STORE}_LOCAL

806f951

[a64] Implement OPCODE_ATOMIC_EXCHANGE

46bca32

[a64] Implement OPCODE_ATOMIC_COMPARE_EXCHANGE

fd1788f

[a64] Fix ComputeMemoryAddress{Offset} register stomp

2c83ef3

`W1` is a possible HIR register allocation and using W1 here was stomping over it. Don't use W1, use the provided "scratch" register.

[a64] Refactor REV{16,32} to REV

6a159c3

Derive the reversal-size from the register-size. REV32 is also the wrong one to be using here since it will reverse the bytes of upper and lower 32-bit words.
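The difference can be shown with a small software model of the instructions. This is an illustrative sketch, not the emitter code from the commit. The helper names are hypothetical, and each function models the AArch64 instruction named in its comment.

```cpp
#include <cassert>
#include <cstdint>

// Models `REV Wd, Wn`: reverse all four bytes of a 32-bit register.
constexpr uint32_t RevW(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Models `REV Xd, Xn`: reverse all eight bytes of a 64-bit register.
constexpr uint64_t RevX(uint64_t v) {
  return (uint64_t{RevW(static_cast<uint32_t>(v))} << 32) |
         RevW(static_cast<uint32_t>(v >> 32));
}

// Models `REV32 Xd, Xn`: reverse the bytes inside each 32-bit word of a
// 64-bit register; the upper and lower words keep their positions.
constexpr uint64_t Rev32X(uint64_t v) {
  return (uint64_t{RevW(static_cast<uint32_t>(v >> 32))} << 32) |
         RevW(static_cast<uint32_t>(v));
}

static_assert(RevW(0x11223344u) == 0x44332211u);
static_assert(RevX(0x1122334455667788ull) == 0x8877665544332211ull);
static_assert(Rev32X(0x1122334455667788ull) == 0x4433221188776655ull);
```

A 64-bit endian swap therefore needs `REV` on an X register. `REV32` on the same register leaves the two 32-bit words in place and swaps only the bytes within each.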

[a64] Reorganize guest register allocation

7939adc

Share a calling convention somewhat similar to ARM64's

[a64] Remove standard prolog/epilog from thunks

1d87097

Fixes callstacks!!!!

[a64] Fix EmitGetCurrentThreadId type

613309c

16-bit word rather than 8-bit

[a64] Fix immediates being too large

3d02f23

These instructions need to use an extra register to generate their constants if they are too large
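The commit does not say which instructions were affected, so the sketch below uses the AArch64 ADD/SUB (immediate) encoding as an assumed example: a 12-bit unsigned immediate, optionally shifted left by 12 bits. `FitsAddSubImmediate`, `ImmediateStrategy`, and `ChooseStrategy` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// True if `value` can be encoded directly in an AArch64 ADD/SUB (immediate):
// either it fits in bits [11:0], or it fits in bits [23:12] with the low
// 12 bits clear (the LSL #12 form).
constexpr bool FitsAddSubImmediate(uint64_t value) {
  return (value & ~uint64_t{0xFFF}) == 0 ||
         (value & ~(uint64_t{0xFFF} << 12)) == 0;
}

// Hypothetical decision an emitter could make for a constant operand.
enum class ImmediateStrategy {
  kEncodeInline,          // Use the immediate form of the instruction.
  kMaterializeInScratch,  // Load the constant into an extra register first.
};

constexpr ImmediateStrategy ChooseStrategy(uint64_t value) {
  return FitsAddSubImmediate(value) ? ImmediateStrategy::kEncodeInline
                                    : ImmediateStrategy::kMaterializeInScratch;
}

static_assert(FitsAddSubImmediate(0xFFF));
static_assert(FitsAddSubImmediate(0xFFF000));
static_assert(!FitsAddSubImmediate(0x1001));
static_assert(ChooseStrategy(0x1000000) ==
              ImmediateStrategy::kMaterializeInScratch);
```

Constants that fail the check have to be built in a register first and then used through the register form of the instruction.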

[a64] Increase function code size to 1MiB

8150be6

[a64] Fix external function call arguments

71e8305

`x0` was loading the thunk rather than using `xip`. Fixes lots of init bugs!

wmarti added 13 commits December 19, 2025 21:16

[Docs] Update Metal trace validation targets

aa160f9

Add Halo 3 and Gears of War trace filenames to Phase 2 so future Metal work can quickly validate render-target transfer and render-pass stability changes.

[Docs] Track missing Metal EDRAM resolve pipelines

a3b7e27

Document the Resolve pipeline/key gap in MetalRenderTargetCache::Resolve and add a checklist to implement the missing compute resolve pipelines to avoid IssueCopy failures in real traces.

[Metal] Fix texture swizzling, Reverse-Z viewport, and implement pres…

a768a10

…entation/UI pipelines

[Metal] Fix Green Screen regression (ImmediateDrawer format) and trac…

993f7c8

…e script

[Metal] Fix Green Screen regression via swizzle revert (111R) and Pre…

242afb7

…senter optimization

[Build] Automate native dxbc2dxil build on macOS

5cb04b6

[Build] Finalized native dxbc2dxil and directx-headers integration fo…

ec4ee88

…r macOS

[Build] Remove Vulkan from macOS premake

015cd72

[ThirdParty] Convert to submodules

4b81238

wmarti force-pushed the metal-backend-clean-msc branch from d8a981d to 4b81238 Compare December 22, 2025 02:28

wmarti added 2 commits December 22, 2025 12:34

[Build] Point DXC submodule to ARM64 macOS fork

c5c84d0

[Build] Update DXC submodule

a6d686a

wmarti force-pushed the metal-backend-clean-msc branch from 4b81238 to 015cd72 Compare December 22, 2025 03:43

[Build] Update DXC submodule after C++17 fix

9ad7e59

Will Martin and others added 5 commits December 25, 2025 11:46

[Build] Update DXC submodule after C++17 fix

930c411

[Build] Update disruptorplus submodule

28edd84

[Metal] Update macOS tooling and shader bytecode

3abd995

- Refresh Metal shader bytecode outputs and include Metal UI shaders.
- Update macOS DXBC->DXIL converter and Metal runtime plumbing changes.
- Record local docs/metal updates and submodule pointers.

[Build] Update FFmpeg submodule for Windows LTO fix

de8a06a

wmarti force-pushed the metal-backend-clean-msc branch from 954a62d to 5d51a8f Compare December 25, 2025 03:13

wmarti force-pushed the metal-backend-clean-msc branch from 5d51a8f to cae9331 Compare December 25, 2025 03:14

wmarti added 2 commits December 25, 2025 12:59


5 participants
