General aarch64 improvements & Apple Silicon support #1255

Open
wants to merge 53 commits into
base: main

exverge-0
Contributor

@exverge-0 exverge-0 commented Jul 11, 2024

The recompiler hasn't been implemented and only the interpreter works. (There's already a WIP aarch64 recompiler for Cemu Android, assuming that they plan on merging it into this repo.)

Closes #1490

@exverge-0 exverge-0 changed the title macOS: Apple Silicon (interpreter only) macOS: Apple Silicon support (interpreter only) Jul 11, 2024
@Exzap
Member

Exzap commented Jul 11, 2024

I'll be honest, it will probably be a while until we have a decent ARM recompiler that will outperform Rosetta so this seems a bit early. But I generally don't want to stand in the way of adding new target platforms. That said, the changes made here are detrimental for other platforms. Just on a quick glance some obvious issues:

  • Putting gx2WriteGatherPipe access behind a mutex is absolutely gonna wreck performance. Better to work with atomic pointers or manually insert memory barriers where needed.
  • mmuRange_HIGHMEM is supposed to match Wii U's memory map. I get that there is a 16KB page size restriction but other platforms don't have this so it should be conditional.
  • Lots of changes in the h264 decoder, would be nice if you could give an explanation?
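To illustrate the first point above, here is a minimal sketch of publishing a write pointer with release/acquire atomics instead of taking a mutex on every write. `GatherPipeSketch`, its fixed capacity, and the single-producer/single-consumer assumption are all hypothetical and are not taken from Cemu's code; the real gather pipe may have different ownership rules.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer command buffer (not Cemu's actual type).
struct GatherPipeSketch
{
    static constexpr std::size_t kCapacity = 1024;
    std::uint32_t buffer[kCapacity] = {};
    std::atomic<std::uint32_t*> writePtr{buffer}; // published by the producer
    std::uint32_t* readPtr = buffer;              // touched only by the consumer

    // Producer: store the word first, then publish the advanced pointer.
    bool push(std::uint32_t word)
    {
        std::uint32_t* p = writePtr.load(std::memory_order_relaxed);
        if (p == buffer + kCapacity)
            return false; // full; a real pipe would flush or wrap here
        *p = word;
        // Release: the word written above becomes visible before the new pointer does.
        writePtr.store(p + 1, std::memory_order_release);
        return true;
    }

    // Consumer: the acquire load pairs with the release store in push().
    bool pop(std::uint32_t& out)
    {
        std::uint32_t* end = writePtr.load(std::memory_order_acquire);
        if (readPtr == end)
            return false; // nothing published yet
        out = *readPtr++;
        return true;
    }
};
```

On x86-64 these orderings compile to plain loads and stores, while on aarch64 they typically become `ldar`/`stlr`, so the weaker memory model is handled without a lock on the hot path.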

@exverge-0
Contributor Author

exverge-0 commented Jul 11, 2024

Lots of changes in the h264 decoder, would be nice if you could give an explanation?

On OSX, C functions are compiled with a leading underscore and the header files define the ASM functions as a normal C function, resulting in functions being called with a leading underscore but defined without one.
Admittedly, I now see this could probably be done by just marking them with extern, which results in the same thing. I'll do that instead.
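For readers unfamiliar with the underscore issue described above, the sketch below shows one common way to express it: a macro that yields the linker-level name of a C symbol, with the leading underscore added only on Apple platforms. `ASM_SYMBOL_NAME` and `example_asm_filter` are hypothetical names invented for illustration; they are not taken from the h264 decoder or from this PR.

```cpp
#include <string>

// Mach-O prepends an underscore to C symbol names; ELF and PE/COFF x64 targets do not.
#if defined(__APPLE__)
#define ASM_SYMBOL_NAME(name) "_" #name
#else
#define ASM_SYMBOL_NAME(name) #name
#endif

// Returns the name an assembly file would have to define so that a C caller
// of example_asm_filter() (a hypothetical function) links against it.
inline std::string exampleLinkerName()
{
    return ASM_SYMBOL_NAME(example_asm_filter);
}
```

A hand-written `.s` file has to define the label returned here, which is why a routine defined without the underscore fails to link on macOS.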

Putting gx2WriteGatherPipe access behind a mutex is absolutely gonna wreck performance. Better to work with atomic pointers or manually insert memory barriers where needed.

Makes sense, I'll revert that and try something else

mmuRange_HIGHMEM is supposed to match Wii U's memory map. I get that there is a 16KB page size restriction but other platforms don't have this so it should be conditional.

Ah, I wasn't sure but that's what I somewhat assumed.
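A conditional along the lines requested could look like the sketch below: the host page size is selected per platform, and only a 16 KiB host rounds the range up. The constant names and the example base address are hypothetical and do not reflect the real `mmuRange_HIGHMEM` values; the 4 KiB default for other platforms is also an assumption.

```cpp
#include <cstdint>

#if defined(__APPLE__) && defined(__aarch64__)
constexpr std::uint32_t kHostPageSize = 0x4000; // 16 KiB pages on Apple Silicon
#else
constexpr std::uint32_t kHostPageSize = 0x1000; // 4 KiB pages assumed elsewhere
#endif

// Rounds addr up to the next multiple of pageSize (pageSize must be a power of two).
constexpr std::uint32_t alignUpToPage(std::uint32_t addr, std::uint32_t pageSize)
{
    return (addr + pageSize - 1) & ~(pageSize - 1);
}

// Hypothetical guest range base, 4 KiB-aligned but not 16 KiB-aligned.
constexpr std::uint32_t kExampleGuestBase = 0x00001000;
// Unchanged on 4 KiB hosts; moved to 0x4000 only where 16 KiB pages require it.
constexpr std::uint32_t kExampleHostBase = alignUpToPage(kExampleGuestBase, kHostPageSize);
```

With this shape, platforms with 4 KiB pages keep the layout that matches the Wii U memory map, and only the restricted platform deviates.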

@exverge-0 exverge-0 marked this pull request as draft July 11, 2024 17:50
@exverge-0 exverge-0 force-pushed the macos-arm64 branch 4 times, most recently from e9a00d3 to 0328901 Compare July 12, 2024 02:33
@exverge-0 exverge-0 force-pushed the macos-arm64 branch 6 times, most recently from 6998fdd to ebb249c Compare July 20, 2024 19:35
exverge-0 added 14 commits July 20, 2024 15:37
…m64 rather than aarch64

cmake: Fix compiling for Apple Silicon
I've changed the range to accommodate the Project Zero bug stated; however, I'm not sure whether it causes any other issues or whether this is used. It seems to work fine. Please correct me if it does.
On Apple Silicon, PPCTimer estimates a terribly inaccurate RSTSC frequency, which results in games (specifically tested Color Splash & MK8) running extremely fast, especially in the title screens, which unsurprisingly doesn't work that well.
The value hardcoded is the same frequency as on Rosetta.
Admittedly this probably isn't the best solution however it is accurate and it works.
"A MenuItem ID of Zero does not work under Mac"
Despite being disabled in InitBlendState, this still causes errors on MoltenVK, so just skip it altogether
Seemingly fixes cemu-project#396 (there's a multitude of errors there in the comments, specifically referring to the issue), however I don't own BOTW and can't confirm
Apple seemed to not have offsets for arguments on the stack
Either that or the offsets were just wrong, I'll test on a Linux VM and remove the conditional if this still happens
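As a rough illustration of the PPCTimer commit note above, the sketch below shows how a misestimated counter frequency skews elapsed time. Both frequencies are made-up numbers, not values measured on Apple Silicon or used in this PR, and `ticksToSeconds` is a hypothetical helper rather than a PPCTimer function.

```cpp
#include <cstdint>

// Converts a raw tick count into seconds for a given counter frequency in Hz.
constexpr double ticksToSeconds(std::uint64_t ticks, std::uint64_t frequencyHz)
{
    return static_cast<double>(ticks) / static_cast<double>(frequencyHz);
}

constexpr std::uint64_t kTrueFrequencyHz = 24000000; // hypothetical real counter rate
constexpr std::uint64_t kMisestimatedHz = 12000000;  // hypothetical bad estimate (half the real rate)

// The same tick count yields 1 s with the true rate but 2 s with the bad
// estimate, so emulated time would advance twice as fast as wall-clock time.
constexpr double kActualSeconds = ticksToSeconds(24000000, kTrueFrequencyHz);
constexpr double kPerceivedSeconds = ticksToSeconds(24000000, kMisestimatedHz);
```

Under these assumptions, an underestimated frequency would produce the "games run extremely fast" symptom, which is why pinning the frequency to a known-good value makes the timing correct.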
neebyA added a commit to neebyA/Cemu that referenced this pull request May 10, 2025
neebyA added a commit to neebyA/Cemu that referenced this pull request May 10, 2025
@neebyA neebyA mentioned this pull request May 12, 2025
@neebyA neebyA mentioned this pull request May 23, 2025
27 tasks
exverge-0 added 3 commits May 26, 2025 00:33
Co-authored-by: neebyA <[email protected]>

Revert "Update ih264_intra_pred_filters.h"

This reverts commit 0ac296d.

Revert "Update ih264_deblk_edge_filters.h"

This reverts commit 0e48f86.

Revert "fix CI on windows"

This reverts commit 2ccb5dd.

Revert "fix compiling on x64"

This reverts commit 99378f1.

Revert "update ih264d macros"

This reverts commit 0924e11.

Revert "ih264d: Modify to compile with AppleClang & for M1"

This reverts commit d2a9c31.
@exverge-0 exverge-0 changed the title macOS: Apple Silicon support (interpreter only) General aarch64 improvements & Apple Silicon support May 26, 2025
@neebyA

This comment was marked as resolved.

@Exzap
Member

Exzap commented May 26, 2025

@neebyA Try changing the start of the loop to this as a workaround:

		for (auto&& [_jumpStart, _jumpInfo] : jumps)
		{
			auto& jumpStart = _jumpStart;
			auto& jumpInfo = _jumpInfo;

I am not familiar with macOS development so I am not sure what the best strategy is regarding which Xcode version to target. But in general I am against bumping a compiler version when something can be worked around easily.
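For context, here is a self-contained sketch of the same workaround. It assumes the underlying problem was an older Clang refusing to capture structured bindings in a lambda; the original report is hidden, so that is a guess. `JumpInfoSketch` and `sumTargets` are hypothetical stand-ins, and only the loop shape mirrors the snippet above.

```cpp
#include <map>

// Hypothetical stand-in for the real jump table entry type.
struct JumpInfoSketch
{
    int target = 0;
};

inline int sumTargets(std::map<int, JumpInfoSketch>& jumps)
{
    int total = 0;
    for (auto&& [_jumpStart, _jumpInfo] : jumps)
    {
        // Rebind the structured bindings to ordinary references, which any
        // lambda can capture regardless of compiler version.
        auto& jumpStart = _jumpStart;
        auto& jumpInfo = _jumpInfo;

        auto visit = [&]() { total += jumpStart + jumpInfo.target; };
        visit();
    }
    return total;
}
```

The rebinding costs nothing at runtime, since `jumpStart` and `jumpInfo` are plain references to the map element.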

@exverge-0
Contributor Author

exverge-0 commented May 26, 2025

I used @Exzap's workaround to avoid said issue, and it seems to compile fine.

Otherwise, with the addition of the aarch64 recompiler, this PR is essentially done. I haven't done much testing, but from what I've tested it runs smoothly with no new issues. The only real difference I've noticed is that the arm64 build seemingly has fewer lag spikes, whereas under Rosetta it would stutter while compiling shaders; otherwise it's seemingly the same. (Rosetta already ran at 60/30 fps on the games I tested, so any performance improvement wouldn't be noticeable.)
I also reverted the change from #1255 (comment) so as to separate it from this PR to not have to include the odd workaround (which hadn't prevented any games from running/caused any visible issues in my testing, only console errors)

@exverge-0 exverge-0 marked this pull request as ready for review May 26, 2025 17:11
@exverge-0 exverge-0 requested a review from Exzap May 26, 2025 17:11
@hauntek

hauntek commented May 28, 2025

  build-macos:
    runs-on: macos-14
    steps:
    - name: "Checkout repo"
      uses: actions/checkout@v4
      with:
        submodules: "recursive"

    - name: "Select Xcode 16.2"
      run: sudo xcode-select -s /Applications/Xcode_16.2.app/Contents/Developer

    - name: "Verify Xcode Version"
      run: xcodebuild -version
        
    - name: Setup release mode parameters
      run: |
        echo "BUILD_MODE=release" >> $GITHUB_ENV
        echo "BUILD_FLAGS=" >> $GITHUB_ENV
        echo "Build mode is release"

    - name: Setup build flags for version
      if: ${{ inputs.next_version_major != '' }}
      run: |
        echo "[INFO] Version ${{ inputs.next_version_major }}.${{ inputs.next_version_minor }}"
        echo "BUILD_FLAGS=${{ env.BUILD_FLAGS }} -DEMULATOR_VERSION_MAJOR=${{ inputs.next_version_major }} -DEMULATOR_VERSION_MINOR=${{ inputs.next_version_minor }}" >> $GITHUB_ENV
        
    - name: "Install system dependencies"
      run: |
        brew update
        brew install ninja nasm automake libtool

    - name: "Install molten-vk"
      run: |
        curl -L -O https://github.com/KhronosGroup/MoltenVK/releases/download/v1.3.0/MoltenVK-macos.tar
        tar xf MoltenVK-macos.tar
        sudo mkdir -p /usr/local/lib
        sudo cp MoltenVK/MoltenVK/dynamic/dylib/macOS/libMoltenVK.dylib /usr/local/lib

    - name: "Setup cmake"
      uses: jwlawson/actions-setup-cmake@v2
      with:
        cmake-version: '3.29.0'

    - name: "Bootstrap vcpkg"
      run: |
        bash ./dependencies/vcpkg/bootstrap-vcpkg.sh
        
    - name: 'Setup NuGet Credentials for vcpkg'
      shell: 'bash'
      run: |
        mono `./dependencies/vcpkg/vcpkg fetch nuget | tail -n 1` \
        sources add \
        -source "https://nuget.pkg.github.com/${{ github.repository_owner }}/index.json" \
        -storepasswordincleartext \
        -name "GitHub" \
        -username "${{ github.repository_owner }}" \
        -password "${{ secrets.GITHUB_TOKEN }}"
        mono `./dependencies/vcpkg/vcpkg fetch nuget | tail -n 1` \
        setapikey "${{ secrets.GITHUB_TOKEN }}" \
        -source "https://nuget.pkg.github.com/${{ github.repository_owner }}/index.json"
        
    - name: "cmake x64"
      run: |
        mkdir build
        cd build
        cmake .. ${{ env.BUILD_FLAGS }} \
        -DCMAKE_BUILD_TYPE=${{ env.BUILD_MODE }} \
        -DCMAKE_OSX_ARCHITECTURES=x86_64 \
        -DMACOS_BUNDLE=ON \
        -G Ninja
        
    - name: "Build Cemu x64"
      run: |
        cmake --build build

    - name: "Move x64 artifact"
      run: |
        mkdir bin/x64
        mv bin/Cemu_release.app bin/x64/Cemu.app
        mv bin/x64/Cemu.app/Contents/MacOS/Cemu_release bin/x64/Cemu.app/Contents/MacOS/Cemu
        sed -i '' 's/Cemu_release/Cemu/g' bin/x64/Cemu.app/Contents/Info.plist
        chmod a+x bin/x64/Cemu.app/Contents/MacOS/{Cemu,update.sh}
        codesign --force --deep --sign - bin/x64/Cemu.app

    - name: "cmake arm64"
      run: |
        rm -rf build
        mkdir build
        cd build
        cmake .. ${{ env.BUILD_FLAGS }} \
        -DCMAKE_BUILD_TYPE=${{ env.BUILD_MODE }} \
        -DCMAKE_OSX_ARCHITECTURES=arm64 \
        -DMACOS_BUNDLE=ON \
        -G Ninja

    - name: "Build Cemu arm64"
      run: |
        cmake --build build

    - name: "Move arm64 artifact"
      run: |
        mkdir bin/arm64
        mkdir bin/universal
        mv bin/Cemu_release.app bin/arm64/Cemu.app
        cp -R bin/arm64/Cemu.app bin/universal/Cemu.app
        mv bin/arm64/Cemu.app/Contents/MacOS/Cemu_release bin/arm64/Cemu.app/Contents/MacOS/Cemu
        sed -i '' 's/Cemu_release/Cemu/g' bin/arm64/Cemu.app/Contents/Info.plist
        chmod a+x bin/arm64/Cemu.app/Contents/MacOS/{Cemu,update.sh}
        codesign --force --deep --sign - bin/arm64/Cemu.app

    - name: "Create Universal binary"
      run: |
        rm bin/universal/Cemu.app/Contents/MacOS/Cemu_release
        lipo -create -output bin/universal/Cemu.app/Contents/MacOS/Cemu \
        bin/x64/Cemu.app/Contents/MacOS/Cemu \
        bin/arm64/Cemu.app/Contents/MacOS/Cemu
        rm bin/universal/Cemu.app/Contents/Frameworks/libusb-1.0.0.dylib
        lipo -create -output bin/universal/Cemu.app/Contents/Frameworks/libusb-1.0.0.dylib \
        bin/x64/Cemu.app/Contents/Frameworks/libusb-1.0.0.dylib \
        bin/arm64/Cemu.app/Contents/Frameworks/libusb-1.0.0.dylib
        sed -i '' 's/Cemu_release/Cemu/g' bin/universal/Cemu.app/Contents/Info.plist
        chmod a+x bin/universal/Cemu.app/Contents/MacOS/{Cemu,update.sh}
        codesign --force --deep --sign - bin/universal/Cemu.app
        rm -rf bin/x64/Cemu.app bin/arm64/Cemu.app

    - name: Prepare artifacts
      run: |
        mkdir bin/Cemu_app_universal
        mv bin/universal/Cemu.app bin/Cemu_app_universal/Cemu.app
        ln -s /Applications bin/Cemu_app_universal/Applications
        hdiutil create ./bin/tmp.dmg -ov -volname "Cemu" -fs HFS+ -srcfolder "./bin/Cemu_app_universal"
        hdiutil convert ./bin/tmp.dmg -format UDZO -o bin/Cemu_universal.dmg
        rm bin/tmp.dmg
              
    - name: Upload universal artifact
      uses: actions/upload-artifact@v4
      with:
        name: cemu-bin-macos-universal
        path: ./bin/Cemu_universal.dmg

      # The steps below are indented differently and appear to belong to a separate release job
      - uses: actions/download-artifact@v4
        with:
          name: cemu-bin-macos-universal
          path: cemu-bin-macos-universal

      - name: Create release from macos-bin-universal
        run: cp cemu-bin-macos-universal/Cemu_universal.dmg upload/cemu-${{ env.CEMU_VERSION }}-macos-12-universal.dmg

I suggest that building a universal version would be better, since it avoids having to distinguish between x86_64 and arm64 versions. After all, the universal version can run directly on both x86_64 and arm64 platforms. Moreover, on arm64 platforms, the universal version can still be switched to run its x86_64 slice through the Rosetta 2 option.

Development

Successfully merging this pull request may close these issues.

Add native ARM version for Mac
4 participants