Skip to content

Releases: ConfettiFX/The-Forge

Release 1.63 - March 20th, 2025 - Advanced RTX Global Illumination Middleware | Quest Run-time switches to OpenXR | Triangle Visibility Buffer with Programmable MSAA | Ephemeris running on low end mobile devices | Particle System UT now runs on Adreno Devices with lower storage buffer limits

21 Mar 09:51
7443530

Choose a tag to compare

Release 1.63 - March 20th, 2025 - Advanced RTX Global Illumination Middleware | Quest Run-time switches to OpenXR | Triangle Visibility Buffer with Programmable MSAA | Ephemeris running on low end mobile devices | Particle System UT now runs on Adreno Devices with lower storage buffer limits

Advanced RTX Global Illumination Middleware

Over the last three years we developed a solution for Global Illumination based on the RTX / DXR interfaces. This development was fuelled by the games that will use this solution. We are adding it to our arsenal of graphics middleware, which is not publicly available. In case you want more information, please drop us a note.
Here is a feature list and screenshots:

  • New sky shading model for more accurate GI
  • Normal texture generation from depth
  • Single pass depth hierarchy from FSSR
  • Screen space ray marching for evaluation of screen space GI
  • Single bounce hybrid ray tracing for GI
  • Multi bounce GI with probe volume cascades
  • World and frustum GPU hash implementation for samples accumulation
  • Denoising with reprojection and a weighted blur
  • Batching rays for indirect dispatches as an optimization
  • Spatiotemporal reservoir sampling

Following the trajectory of RTX based Global Illumination approaches, this should be the most advanced and already proven system out there ...

Android Samsung S24 Xclipse 940
RTX GI Samsung S24

PS5
RTX GI PS5

Steam Deck
RTX GI Steamdeck

Xbox Series X
RTX GI Xbox Series X

iPhone 15 Pro
RTX GI iPhone 15 Pro

Quest Run-time switches to OpenXR

We made the internal switch to finally use OpenXR. A funfact is that we used and helped to develop OpenXR in projects since about 2016 but never implemented it into our own code base ... only customer code bases.

Triangle Visibility Buffer with Programmable MSAA

This is an oldie but goodie :-) ... I (Wolfgang) wrote several blog posts about it and talked on conferences about it and now we have it in our code base. I remember having been involved at Rockstar Games with developing a programmable MSAA approach on XBOX and PS in 2006 (?), we had examples running with it in our code base over the years. We finally found time to bring it also into our open-source code base inside the Triangle Visibility Buffer.

Android Samsung S21 Mali-G78
Triangle Visibility Buffer with Programmable MSAA on Android Samsung S21 Mali-G78

PS4 Pro
Triangle Visibility Buffer with Programmable MSAA on PS4

PS5
Triangle Visibility Buffer with Programmable MSAA on PS5

Xbox Series X
Triangle Visibility Buffer with Programmable MSAA on Xbox Series X

Quest
Triangle Visibility Buffer with Programmable MSAA on Quest

Debug screenshots
Godray samples 4xMSAA
Godray samples 4xMSAA

VB Shading samples 4xMSAA
VB Shading samples 4xMSAA

Stencil mask 4xMSAA
Stencil mask 4xMSAA

Ephemeris running on low end mobile devices

This was also one of the overdue things. Ephemeris is our skydome system that shipped in game engines before. Most of the time we had it running on PS4 class hardware but now it also supports
low-end mobile phones. It is in our non-public middleware folder. Drop me a note if you want to know more.

Particle System UT now runs on Adreno Devices that do not support bindless Textures

Now there is a fallback for devices with lower storage buffer limits. Also we further optimized the size of the particle data to accommodate more particles in the particle buffer.

Release 1.62 - February 27th, 2025 - C99 Vulkan/DirectX rewrite | Scene resolution using GPUCfg | SRT updates | In-Flight Motion Vector

28 Feb 07:30
02dcb27

Choose a tag to compare

Release 1.62 - February 27th, 2025 - C99 Vulkan/DirectX rewrite | Scene resolution using GPUCfg | SRT updates | In-Flight Motion Vector

C99 Vulkan/DirectX API rewrite

Our quest to move as much code to C99 as possible is motivated by the idea that small teams deal better with a C99 code base. We are targeting this framework at small teams that need to be agile and quick.
We finished a first pass on the Vulkan and DirectX run-time. There is more to come.

Scene Resolution using GPUCfg

On mobile phones and mobile devices, the scene resolution is quite often very different even for one category of devices like Android Phones. So we have a better system now in place to really define scene and screen resolution with the GPU config system.

FSL 2 improvements

After having shipped a Shader Resource Table based FSL language two weeks ago, we have done a bit more clean-up work and unified and simplified naming conventions.

Quest Support

We make on-going improvements for the Quest support.

Triangle Visibility Buffer 2.0

We found several issues with TVB 2.0 and fixed those. Next step is to make another pass on the architecture and see how much better we can make the memory access patterns to improve performance.

In-Flight Motion Vectors

Many people still store motion vectors in render targets. For the last 15+ years that didn't make much sense anymore because the memory access pattern to write and read those motion vectors was so costly that calculating them on the fly made more sense.
This approach is based on Ben Padget's article in one of the ShaderX books ... he will smile about the fact that after all this time we are still quoting his article ...

Release 1.61 - February 13th, 2025 - FSL 2 | Browserstack | Android / Vulkan | DirectX 11 | Quest | flecs

27 Feb 15:20

Choose a tag to compare

Release 1.61 - February 13th, 2025 - FSL 2 | Browserstack | Android / Vulkan | DirectX 11 | Quest | flecs

FSL 2

We are enforcing now a better memory access pattern for root signatures. We unified root signature usage, so that in best case only one or two need to be used for a game. To do this we added a unified shader resource table that is shared between FSL and C++.
We wrote a more thorough documentation here:

https://github.com/ConfettiFX/The-Forge/wiki/FSL-Programming-Guide

This is a good example how shader languages should evolve. Instead of mimicking the misguided efforts in writing a C++ shader language, a shader language should mimic the memory access patterns of a GPU and guide the user towards the "best and most performant" results.
From a practical standpoint, unreliable and non-functioning shader compilers are a bigger problem than any language syntax to please some twisted abstraction that has no performance benefit and happens for no good reason.

Browserstack

For testing mobile phones we integrate Browserstack more and more into our workflows.

Android / Vulkan

After having finished our more than four year stint on the Warzone Mobile project, making and keeping the game run on Android phones, this same team is now making sure our internal Android / Vulkan run-time lives up to the same or higher expectations. Browserstack is used to test a larger number of phones now. Higher-end phones will now support our Triangle Visibility Buffer unit tests.
It appears that a lot of our priorities in the game industry are shifting towards mobile and also to a lesser extend consoles, as the most important gaming platforms now.
We are trying to find ways to make sure mobile is the first class citizen.

DirectX 11

We removed DirectX 11 support with the retirement of Windows 10.

Quest Support

We help developing the Quest since 2016 now. We somehow missed to take care of our own Quest run-time :-) ... we are currently catching up on all the missed opportunities here and updating and upgrading it. Making it a better part of our test suite and adding more unit test support.

flecs

We improved our flecs integration and upgraded to latest.

Release 1.60 - October 11th, 2024 - GPU Work Graphs | Filesystem Refactor | Window System Refactor Phase 1

14 Feb 23:14

Choose a tag to compare

Release 1.60 - October 11th, 2024 - GPU Work Graphs | Filesystem Refactor | Window System Refactor Phase 1

GPU Work Graphs

We are testing GPU Work Graphs now for a while. We see opportunities to implement more complex compute driven interactions, which helps us to move towards GPU-driven rendering more and more. The current example runs the clear buffers and triangle culling in a GPU Work Graph if GPU Work Graphs are supported. Otherwise it takes the old path. So visually there is no change.

Filesystem Refactor

We data drive the everything in a game engine. For the file system, we finally managed to implement that too in the public repository. We also removed unnecessary copies and symlinks. Please note our file system consists of "two" file systems. One for the run-time and one for tools. You can't ship a tools file system in a game :-) I know that this is well known but people always want to check if a path is right, a directory exists, a file exists, before they load this one file ... yeaahh ... so our run-time file system prevents that from happening.

Window System Refactor Phase 1

With the on-going challenges in the area of upscaling and general window management on Operating systems that support windows, we decided to refactor our window system to bring it up to existing standards and allow us to integrate upscalers easier. Currently most upscalers do not consider the pixel center, they were written by people who never studied Bresenham's algorithm. We wanted to make sure our upscaler actually works without introducing Moire patterns or strong staircase effects like the ones currently promoted by hardware vendors.
We will add more functionality in Phase 2 ...

Steamdeck

We are pushing forward in making Steamdeck a first class citizen. It represents our Linux run-time efforts, so we can switch if its necessary to any Linux distro ...

Release 1.59 - September 6th, 2024 - STAR WARS™: Bounty Hunter™ | Replaced Gainput with our own Input library | Removed Vulkan from Windows Run-Time | Removed API Switch | Third-Party Integration

14 Feb 23:14

Choose a tag to compare

Release 1.59 - September 6th, 2024 - STAR WARS™: Bounty Hunter™ | Replaced Gainput with our own Input library | Removed Vulkan from Windows Run-Time | Removed API Switch | Third-Party Integration

STAR WARS™: Bounty Hunter™
Bounty Hunter was ported with the help of The Forge Framework to all the platforms mentioned in the screenshot:
STAR WARS™: Bounty Hunter™

Replaced Gainput with our own Input library

We wrote a new input library from scratch in C. Its design follows the architecture of the rendering API. So one high-level interface file IInput.h and then platform specific files for each of the target devices. It has less lines of code compared to gainput and is easier to maintain for a small team. We are still testing it as we speak. Let us know if you see any bugs.

Removed Vulkan from Windows Run-time

Over the last couple of years in many of our projects, it became apparent that the best option to ship a PC / Windows game is DirectX 12. The main reason is the reduced QA effort and reliability. The other reason is that we constantly get forced to upgrade on PC to a newer version, while mobile -which is the more important platform- stays far behind. So we are not officially supporting the Vulkan run-time on Windows anymore. We have an internal version that we test and obviously we support Vulkan on Android, Switch and Steamdeck (native support).

Removed API Switch from the Run-time

Concluding that Vulkan is not a good API for PC anymore, we removed the switching functionality we had in there before to switch between DirectX 12 and Vulkan on PC and OpenGL ES 2.0 and Vulkan on Android.
For Android we utilized the switching functionality to switch between OpenGL ES 2.0 and Vulkan for a business class application that we helped to build (Facebook Application framework). This was necessary to run on billions of mobile devices. Looking at the latest numbers it doesn't make sense for us to support OpenGL ES 2.0 and therefore we dropped support and also don't need switching anymore.

Removed Commercial Middleware from GitHub

We decided to remove our commercial middleware from GitHub. Development will now happen in our internal repositories only.

Commercial Console licenses

For the last seven years we offered TF for free to anyone asking. We are going to change this now for the console platforms XBOX, Playstation and Switch. You will require a commercial license to use those from here on.

Removed unit tests

  • 09a_HybridRaytracing

Third-Party Libraries

We are improving our Third-Party Library integration substantially by making the ones we use more integrated into The Forge Eco System. While doing so we removed the ones we do not use anymore. Here is a list:

  • soloud
  • rmem
  • cjson
  • MTuner
  • TinyXML

Release 1.58 - June 17th, 2024 Behemoth | Compute-Driven Mega Particle System | Triangle Visibility Buffer 2.0

05 Sep 21:36
c166569

Choose a tag to compare

Release 1.58 - June 17th, 2024 Behemoth | Compute-Driven Mega Particle System | Triangle Visibility Buffer 2.0 |

Announce trailer for Behemoth

We helped Skydance Interactive to optimize Behemoth last year. Click on the image below to see the announce trailer:

Behemoth Trailer from June 2024

Compute-Based Mega Particle System

This unit test was based on some of our research into software rasterization and GPU-driven rendering. A particle system completely running in very few compute shaders with one large buffer holding most of the data. Like with all things GPU-Driven, the trick is to execute one compute shader once on one buffer to reduce read / write memory bandwidth. Although this is not new wisdom, you will be surprised how many particle systems get this still wrong ... having compute shaders for each stage of the particle life time or even worse doing most of the particle work on the CPU.
This particle system was demoed last year in a few talks in September on a Samsung S22. Here are the slides:

http://www.conffx.com/WolfgangEngelParticleSystem.pptx

It is meant to be used to implement next-gen Mega Particle systems in which we simulate always 100000th or millions of particles at once instead of the few dozen ones contemporary systems simulate.

Android Samsung S22 1170x540 resolution

This screenshot shows 4 million firefly-like particles, with 10000 lights attached to them and a shadow for the directional light. Those numbers were thought to be not possible on mobile phones before.
Mega Particle System Android Samsung S22

Android Samsung S23 1170x540 resolution

Same setting as above but this time also with 8 Shadows from Point Lights additionally.
Mega Particle System Android Samsung S23

Android Samsung S24 1170x540 resolution

Same setting as above but this time also with 8 Shadows from Point Lights additionally.
Mega Particle System Android Samsung S24

PS5 running at 4K

Mega Particle System PS5

Windows with AMD RX 6400 at 1080p

Mega Particle System PC Windows

Triangle Visibility Buffer 2.0

we have the new compute based TVB 2.0 approach now running on all platforms (on Android only S22). You can download slides from the I3D talk from

http://www.conffx.com/I3D-VisibilityBuffer2.pptx

Release 1.57 - May 8th, 2024 Visibility Buffer 2.0 Prototype | Visibility Buffer 1.0 One Draw call

08 May 21:04

Choose a tag to compare

Release 1.57 - May 8th, 2024 Visibility Buffer 2.0 Prototype | Visibility Buffer 1.0 One Draw call

Visibility Buffer Research - I3D talk

We are giving a talk about our latest Visibility Buffer research on I3D. Here is a short primer what it is about:

The original idea of the Triangle Visibility Buffer is based on an article by [[burns2013]. [schied15] and [schied16] extended what was described in the original article. Christoph Schied implemented a modern version with an early version of OpenGL (supporting MultiDrawIndirect) into The Forge rendering framework in September 2015.
We ported this code to all platforms and simplified and extended it in the following years by adding a triangle filtering stage following [chajdas] and [wihlidal17] and a new way of shading.
Our on-going improvements simplified the approach incrementally and the architecture started to resemble what was described in the original article by [burns2013] again, leveraging the modern tools of the newer graphics APIs.
In contrast to [burns2013], the actual storage of triangles in our implementation of a Visibility Buffer happens due to the triangle removal and draw compaction step with an optimal “massaged” data set.
By having removed overdraw in the Visibility Buffer and Depth Buffer, we run a shading approach that shades everything with one regular draw call. We called the shading stage Forward++ due to its resemblance to forward shading and its usage of a tiled light list for applying many lights. It was a step up from Forward+ that requires numerous draw calls.
We described all this in several talks at game industry conferences, for example on GDCE 2016 [engel16] and during XFest 2018, showing considerable performance gains due to reduced memory bandwidth compared to traditional G-buffer based rendering architectures.
A blog post that was updated over the years for what we call now Triangle Visibility Buffer 1.0 (TVB 1.0) can be found here [engel18].

Over the last years we extended this original idea with a Order-Independent Transparency approach (it is more efficient to sort triangle IDs in a per-pixel linked list compared to storing layers of a G-Buffer), software VRS and then we developed a Visibility Buffer approach that doesn't require draw calls to fill the depth and Visibility Buffer and one that requires much less draw calls in parallel.
This release offers -what we call- an updated Triangle Visibility Buffer 1.0 (TVB 1.0) and a prototype for the Triangle Visibility Buffer 2.0 (TVB 2.0).

The changes to TVB 1.0 are evolutionary. We used to map each mesh to an indirect draw element. This reuqired the use of DrawID to map back to the per-mesh data. When working on a game engine with a very high amount of draw calls, it imposed a limitation on the number of "draws" we could do, due to having only a limited number of bits available in the VB.
Additionally, instancing was implemented using a separate instanced draw for each instanced mesh. We refactored the data flow between the draws and the shade pass.
There is now no reliance on DrawID and instances are handled transparently using the same unified draw. This both simplifies the flow of data and allows us to draw more "instanced" meshes.
Apart from being able to use a very high-number of draw calls, the performance didn't change.

The new TVB 2.0 approach is revolutionary in a sense that it doesn't use draw calls anymore to fill the depth and visibility buffer. There are two compute shader invocations that filter triangles and eventually fill the depth and visibility buffer.
Not using draw calls anymore, makes the whole code base more consistent and less convoluted -compared to TVB 1.0-.

You can find now the new Visibilty Buffer 2 approach in

The-Forge\Examples_3\Visibility_Buffer2

This is still in an early stage of development. We only support a limited number of platforms: Windows D3D12, PS4/5, XBOX, and macOS / iOS.

Sanitized initRenderer

we cleaned up the whole initRenderer code. Merged GPUConfig into GraphicsConfig and unified naming.

Metal run-time improvements

We improved the Metal Validation Support.

Art

Everything related to Art assets is now in the Art folder.

Bug fixes

Lots of fixes everywhere.

References:
[burns2013] Christopher A. Burns, Warren A. Hunt, "The Visibility Buffer: A Cache-Friendly Approach to Deferred Shading", 2013, Journal of Computer Graphics Techniques (JCGT) 2:2, Pages 55 - 69.

[schied2015] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation for Memory-Efficient Deferred Shading" , Kit Publication Website: http://cg.ivd.kit.edu/publications/2015/dais/DAIS.pdf

[schied16] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation Shading", 2016, GPU Pro 7, Pages

[chajdas] Matthaeus Chajdas, GeometryFX, 2016, AMD Developer Website http://gpuopen.com/gaming-product/geometryfx/

[wihlidal17] Graham Wihlidal, "Optimizing the Graphics Pipeline with Compute", 2017, GPU Zen 1, Pages 277--320

[engel16] Wolfgang Engel, "4K Rendering Breakthrough: The Filtered and Culled Visibility Buffer", 2016, GDC Vault: https://www.gdcvault.com/play/1023792/4K-Rendering-Breakthrough-The-Filtered

[engel18] Wolfgang Engel, "Triangle Visibility Buffer", 2018, Wolfgang Engel's Diary of a Graphics Programmer Blog http://diaryofagraphicsprogrammer.blogspot.com/2018/03/triangle-visibility-buffer.html

Release 1.56 - April 4th, 2024 I3D | Warzone Mobile | Visibility Buffer | Aura on macOS | Ephemeris on Switch | GPU breadcrumbs | Swappy in Android | Screen-space Shadows | Metal Debug Markers improved

04 Apr 18:52

Choose a tag to compare

Release 1.56 - April 4th, 2024 I3D | Warzone Mobile | Visibility Buffer | Aura on macOS | Ephemeris on Switch | GPU breadcrumbs | Swappy in Android | Screen-space Shadows | Metal Debug Markers improved

I3D

We are sponsoring I3D again. Come by and say hi! We also will be giving a talk on the new development around Triangle Visibility Buffer.

I3D Sponsorship

Warzone Mobile launched

We work on Warzone Mobile since August 2020. The game launched on March 21, 2024.

Warzone Mobile

Warzone Mobile

Visibility Buffer

We removed CPU cluster culling and simplified the animation data usage. Now traingle filtering only takes one dispatch each frame again.

Swappy frame pacer is now vailable in Android/Vulkan

We integrated the Swappy frame pacer into the Android / Vulkan eco system.

GPUCfg system improved with more ids and less string compares

we did another pass on the GPUCfg system and now we can generate the vendor Ids and model Ids with a python script to keep the *_gpu.data list easily up to date for each platform.
We removed most of the name comparisons and replaced them with the id comparisons which should speed up parsing time and is more specific.

Screen-Space Shadows in UT9

We added to the number of shadow approaches in that unit test screen-space shadows. These are complementary to regular shadow mapping and add more detail. We also fixed a number of inconsistencies with the other shadow map approaches.

PS5 - Screen-Space Shadows on
Screen-Space Shadows PS5

PS5 - Screen-Space Shadows off
Screen-Space Shadows PS5

Nintendo Switch
Screen-Space Shadows Switch

PS4
Screen-Space Shadows PS4

GPU breadcrumbs on all platforms

Now you can have GPU crash reports on all platforms. We skipped OpenGL ES and DX11 so ...

A simple example of a crash report is this:

2024-04-04 23:44:08 [MainThread ] 09a_HybridRaytracing.cp:1685 ERR| [Breadcrumb] Simulating a GPU crash situation (RAYTRACE SHADOWS)...
2024-04-04 23:44:10 [MainThread ] 09a_HybridRaytracing.cp:2428 INFO| Last rendering step (approx): Raytrace Shadows, crashed frame: 2

We will extend the reporting a bit more over time.

Ephemeris now also runs on Switch ...

Release 1.55 - March 1st, 2024 - Ephemeris | gpu.data | Many bug fixes and smaller improvements

01 Mar 19:29

Choose a tag to compare

Release 1.55 - March 1st, 2024 - Ephemeris | gpu.data | Many bug fixes and smaller improvements

Ephemeris 2.0 Update

We improved Ephemeris again and support it now on more platforms. Updating some of the algorithms used and adding more features.

Ephemeris 2.0 on February 28th, 2024

Now we are supporting PC, XBOX'es, PS4/5, Android, Steamdeck, iOS (requires iPhone 11 or higher (so far not Switch)

Ephemeris on XBOX Series X
Ephemeris 2.0 on February 28th, 2024

Ephemeris on Android
Ephemeris 2.0 on February 28th, 2024

Ephemeris on PS4
Ephemeris 2.0 on February 28th, 2024

Ephemeris on PS5
Ephemeris 2.0 on February 28th, 2024

IGraphics.h

We changed the graphics interface for cmdBindRenderTargets

// old
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, uint32_t renderTargetCount, RenderTarget** ppRenderTargets, RenderTarget* pDepthStencil, const LoadActionsDesc* loadActions, uint32_t* pColorArraySlices, uint32_t* pColorMipSlices, uint32_t depthArraySlice, uint32_t depthMipSlice)
// new
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, const BindRenderTargetsDesc* pDesc)

Instead of a long list of parameters we now provide a struct that gives us enough flexibility to pack more functionality in there.

Variable Rate Shading

We added Variable Rate Shading to the Visibility Buffer OIT example test 15a. This way we have a better looking test scene with St. Miguel.

VRS allows rendering parts of the render target at different resolution based on the auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true

The key idea behind the software-based approach is to render everything in 4xMS targets and use a stencil buffer as a VRS map. VRS map is automatically generated based on the local image gradients.
It could be used on a way wider range of platforms and devices than the hardware-based approach since the hardware VRS support is broken or not supported on many platforms. Because this software approach utilizes 2x2 tiles we could also achieve higher image quality compared to hardware-based VRS.

Shading rate view based on the color per 2x2 pixel quad:

  • White – 1 sample (top left, always shaded);
  • Blue – 2 horizontal samples;
  • Red – 2 vertical samples;
  • Green – all 4 samples;

PC
VRS

Debug Output with the original Image on PC
VRS

PC
VRS

Debug Output with the original Image on PC
VRS

Android
VRS

Debug Output with the original Image on Android
VRS

Android
VRS

Debug Output with the original Image on Android
VRS

UI description:

  • Toggle VRS – enable/disable VRS
  • Draw Cubes – enable/disable dynamic objects in the scene
  • Toggle Debug View – shows auto-generated VRS map if VRS is enabled
  • Blur kernel Size – change blur kernel size of the blur applied to the background image to highlight performance benefits of the solution by making fragment shader heavy enough.
    Limitations:
    Relies on programmable sample locations support – not widely supported on Android devices.

Supported platforms:
PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12), macOS/iOS.

gpu.data

You want to check out those files. They are now dedicated per supported platform. So it is easier for us to differ between different Playstations, XBOX'es, Switches, Android, iOS etc..

Unlinked Multi GPU

The Unlinked Multi GPU example was broken on AMD 7x GPUs with Vulkan. This looks like a bug. We had to disable DCC to make that work.

Vulkan

we track GPU memory now and will extend this to other platforms.

Vulkan mobile support

We support now the VK_EXT_ASTC_DECODE_MODE_EXTENSION_NAME extension

Remote UI

Various bug fixes to make this more stable. Still alpha ... will crash.

Retired:

  • 35 Variable Rate Shading ... this went into the Visibility Buffer OIT example 15a.
  • Basis Library - after not having found any practical usage case, we remove Basis again.

Release 1.54 - February 2nd, 2024 - Remote UI Control | Shader Server | Visibility Buffer | Asset Pipeline | GPU Config System | macOS/iOS | Lots more ...

05 Feb 20:55

Choose a tag to compare

Release 1.54 - February 2nd, 2024 - Remote UI Control | Shader Server | Visibility Buffer | Asset Pipeline | GPU Config System | macOS/iOS | Lots more ...

Our last release was in October 2022. We were so busy that we lost track of time. In March 2023 we planned to make the next release. We started testing and fixing and improving code up until today. The amount of improvements coming back from the -most of the time- 8 - 10 projects we are working on where so many, it was hard to integrate all this, test it and then maintain it. To a certain degree our business has higher priority than making GitHub releases but we realize that letting a lot of time pass makes it substantially harder for us to get the whole code base back in shape, even with a company size of nearly 40 graphics programmers. So we cut down functional or unit tests, so that we have less variables. We also restructured large parts of our code base so that it is easier to maintain. One of the constant maintenance challenges were the macOS / iOS run-time (More about that below).
We invested a lot in our testing environment. We have more consoles now for testing and we also have a much needed screenshot testing system. We outsource testing to external service providers more. We removed Linux as a stand-alone target but the native Steamdeck support should make up for this.
We tried to be conservative about increasing API versions because we know on many platforms our target group will use older OS or API implementations. Nevertheless we were more adventurous this year then before. So we bumped up with a larger step than in previous years.
Our next release is planned for in about four weeks time. We still have work to do to bring up a few source code parts but now the increments are much smaller.
In the meantime some of the games we worked on, or are still working on, shipped:

Forza Motorsport has launched in the meantime:

Forza Motorsport

Starfield has launched:

Starfield

No Man Sky has launched on macOS:

No Man's Sky
No Man's Sky

Internal automated testing setup on our internal GitLab server

  • Our automated testing setup that tests all the platforms now takes 38 minutes for one run. At some point it was more. We revamped this substantially since the last release adding now screenshot comparisons and a few extra steps for static code analysis.

Visibility Buffer

  • the Visibility Buffer went through a lot of upgrades since October 2022. I think the most notable ones are:
    • Refactored the whole code so that it is easier to re-use in all our examples, there is now a dedicated Visibility Buffer directory holding this code
    • Animation of characters is now integrated
    • Tangent and Bi-Tangent calculation is moved to the pixel shader and we removed the buffers

Software Variable Rate Shading

This Unit test represents software-based variable rate shading (VRS) technique that allows rendering parts of the render target at different resolution based on the auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true

PC Windows (2560x1080):
Variable Rate Shading on PC

Switch (1280x720):
Variable Rate Shading on Switch

XBOX One S (1080p):
Variable Rate Shading on XBOX One S

PS4 Pro (3840x2160):
Variable Rate Shading on XBOX One S

The key idea behind the software-based approach is to render everything in 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients.
The advantage of this approach is that it runs on a wider range of platforms and devices than the hardware-based approach since the hardware VRS support is broken or not supported on many platforms. Because this software approach utilizes 2x2 tiles we can also achieve higher image quality compared to hardware-based VRS.

Shading rate view based on the color per 2x2 pixel quad:

  • White – 1 sample (top left, always shaded);
  • Blue – 2 horizontal samples;
  • Red – 2 vertical samples;
  • Green – all 4 samples;

Variable Rate Shading Debug

UI description:

  • Toggle VRS – enable/disable VRS
  • Draw Cubes – enable/disable dynamic objects in the scene
  • Toggle Debug View – shows auto-generated VRS map if VRS is enabled
  • Blur kernel Size – change blur kernel size of the blur applied to the background image to highlight performance benefits of the solution by making fragment shader heavy enough.
    Limitations:
    Relies on programmable sample locations support – not widely supported on Android devices.

Supported platforms:

PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12).
Implemented on MacOS/IOS, but doesn’t give expected performance benefits due to the issue with stencil testing on that platform

Shader Server

To enable re-compilation of shaders during run-time we implemented a cross-platform shader server that allows to recompile shaders by pressing CTRL-S or a button in a dedicated menu.
You can find the documentation in the Wiki in the FSL section.

Remote UI Control

When working remotely, on mobile or console it can cumbersome to control the development UI.
We added a remote control application in Common_3\Tools\UIRemoteControl which allows control of all UI elements on all platforms.
It works as follows:

  • Build and Launch the Remote Control App located in Common_3/Tools/UIRemoteControl
  • When a unit test is started on the target application (i.e. consoles), it starts listening for connections on a part (8889 by default)
  • In the Remote Control App, enter the target ip address and click connect

Remote UI Control

This is alpha software so expect it to crash ...

VK_EXT_device_fault support

This extension allows developers to query for additional information on GPU faults which may have caused device loss, and to generate binary crash dumps.

Ray Queries in Ray Tracing

We switched to Ray Queries for the common Ray Tracing APIs on all the platforms we support. The current Ray Tracing APIs increase the amount of memory necessary substantially, decrease performance and can't add much visually because the whole game has to run with lower resolution, lower texture resolution and lower graphics quality (to make up for this, upscalers were introduced that add new issues to the final image).
Because Ray Tracing became a Marketing term valuable to GPU manufacturers, some game developers support now Ray Tracing to help increase hardware sales. So we are going with the flow here by offering those APIs.

macOS (1440x810)
Ray Queries on macOS

PS5 (3840x2160)
Ray Queries on PS5

Windows 10 (2560x1080)
Ray Queries on Windows 10

XBOX One Series X (1920x1080)
Ray Queries on XBOX One Series X

iPhone 11 (Model A2111) at resolution 896x414
Ray Queries on iOS

We do not have a denoiser for the Path Tracer.

GPU Configuration System

This is a cross-platform system that can track GPU capabilities on all platforms and switch on and off features of a game for different platforms. To read a lot more about this follow the link below.

GPU Configuration system

New macOS / iOS run-time

We think the Metal API is a well balanced Graphics API that walks the path between low-level and high-level very well. We ran into one general problem with the Metal API for both platforms. It is hard to maintain the code base. There is an architectural problem that was probably introduced due to lack in experience in shipping games.
In essence what Apple decided to do is have calls like this:

https://developer.apple.com/documentation/swift/marking-api-availability-in-objective-c

Anything a hardware vendor describes as available and working might not be working with the next upgrade of the operating system, hardware or just the API or XCode.
If you have a few hundred of those macros in your code, it becomes a lottery what works and what not on a variety of hardware. On some hardware one is broken, on the other hardware something else.
So there are two ways to deal with this: for every @available macro you start adding a #define to switch off or replace that code based on the underlying hardware and software platform. You would have to manually track if what the macro says is true on a wide range of platforms with different outcome.
So for example on macOS 10.13 running on a certain Macbook Pro (I make this up) with an Intel GPU it is broken but then a very similar Macbook Pro that has additionally a dedicated GPU actually runs it. Now you have to track what "class of Macbook Pro" we are talking about and if the Macbook Pro in question has an Intel or an AMD GPU.
We track all this data already so that is not a problem. We know exactly what piece of hardware we are looking at (see above GPU Config system).
The problem is that we have to guard every @available macro with som...

Read more