@FireworkSky
Contributor

This patch reduces the scope of MapBufferRange optimization:
- Only applies to NoOverwrite operations (not Discard)
- Uses ARB_map_buffer_range extension check for better portability
- Tested with good results on AMD Linux
- More conservative approach to avoid driver-specific issues
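
The gating described in the list above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: the enum names, `choose_upload_path`, and the idea that every other case simply keeps its previous path are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the identifiers used in the patch. */
typedef enum { OPT_NONE, OPT_DISCARD, OPT_NO_OVERWRITE } SetDataOption;
typedef enum { PATH_EXISTING, PATH_MAP_BUFFER_RANGE } UploadPath;

/* Choose how one buffer update is uploaded. */
static UploadPath choose_upload_path(SetDataOption option, bool has_arb_map_buffer_range)
{
    /* Reject values outside the enum. */
    assert(option == OPT_NONE || option == OPT_DISCARD || option == OPT_NO_OVERWRITE);

    /* Only NoOverwrite takes the mapped path, and only when the
     * ARB_map_buffer_range extension check succeeded. */
    if (option == OPT_NO_OVERWRITE && has_arb_map_buffer_range)
    {
        return PATH_MAP_BUFFER_RANGE;
    }

    /* Discard and None are left on whatever path they used before (assumed). */
    return PATH_EXISTING;
}
```

Keeping Discard off the mapped path means a driver that handles buffer orphaning in its own way sees no change in behavior from this patch.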
Member

@flibitijibibo left a comment

Looks good, we just need to make sure NVIDIA performance is still okay... @TheSpydog, will you have a chance to try this again soon?

@TheSpydog
Member

Sure thing, I’ll take a look tomorrow morning.

@taxifair95-tech

Hi, I'm not from China and I'm trying to get the link for rotatingartlauncher.
Can you send it to me, please?

@FireworkSky
Contributor Author

Hi, I'm not from China and I'm looking for the rotatingartlauncher link. Could you send it to me?

Um, please don't discuss this here.

@taxifair95-tech

So where can I discuss it?

@TheSpydog
Member

Tested the sprite batch stress test again on Nvidia Tegra this morning. The performance with Deferred sprite batches is the same before vs. after the patch, but the Immediate path is more interesting. Instead of a consistent 14ms per frame, I'm now seeing this strange pattern:

Batch took 10ms with Immediate
Batch took 10ms with Immediate
Batch took 10ms with Immediate
Batch took 10ms with Immediate
Batch took 10ms with Immediate
Batch took 107ms with Immediate
Batch took 10ms with Immediate
Batch took 10ms with Immediate
Batch took 10ms with Immediate
Batch took 11ms with Immediate
Batch took 10ms with Immediate
Batch took 107ms with Immediate

While the typical batch is more performant (14->10ms), every 6 batches, we pay some sort of >100ms penalty. I don't immediately have an idea of where that might be coming from. Maybe it's some sort of driver heuristic kicking in and forcing a sync point?
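
To put the spike in perspective, the twelve Immediate timings in the log can be averaged. The snippet below is only a quick arithmetic sketch; `mean_ms` and `immediate_ms` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of a list of per-batch timings, in milliseconds. */
static double mean_ms(const int *samples, size_t count)
{
    long total = 0;
    size_t i;

    assert(samples != NULL || count == 0);
    for (i = 0; i < count; i += 1)
    {
        total += samples[i];
    }
    return (count > 0) ? (double) total / (double) count : 0.0;
}

/* The twelve Immediate timings from the log above. */
static const int immediate_ms[] = {
    10, 10, 10, 10, 10, 107,
    10, 10, 10, 11, 10, 107
};
```

Calling `mean_ms(immediate_ms, 12)` gives 26.25 ms (315 ms over 12 batches). So for this sample, the periodic penalty more than cancels the 14->10ms gain on typical batches when compared with the earlier steady 14ms.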

@flibitijibibo
Member

Possibly - I also wonder if it's the number of discards happening. Changing it to smaller, weirder batch sizes may have an impact too?

@flibitijibibo merged commit ec54710 into FNA-XNA:master Dec 15, 2025
13 checks passed
@flibitijibibo
Member

Merged but with an added environment variable on top:

7a6dcb7

This gives all platforms the option, but apps/users need to ask for it explicitly first. Once we have a better grasp on how this really works we can see if it works as default behavior.
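
An explicit opt-in of this kind might look roughly like the sketch below. The variable name `EXAMPLE_USE_MAP_BUFFER_RANGE`, the `"1"` convention, and the helper names are all hypothetical; the real name and lookup mechanism are whatever commit 7a6dcb7 defines. Combining the flag with the extension check is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; the real one is defined in commit 7a6dcb7. */
#define EXAMPLE_OPT_IN_VAR "EXAMPLE_USE_MAP_BUFFER_RANGE"

/* Treat only the exact string "1" as an explicit request (assumed convention). */
static bool flag_value_is_on(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* The mapped path is used only if the user asked for it AND the
 * extension check passed; unset or any other value keeps the old behavior. */
static bool map_buffer_range_enabled(bool has_extension)
{
    return has_extension && flag_value_is_on(getenv(EXAMPLE_OPT_IN_VAR));
}
```

Defaulting to off keeps platforms with unexplained behavior, such as the Tegra spikes above, on the old path unless someone deliberately opts in.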

@FireworkSky
Contributor Author

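Merged, but with an added environment variable on top:

7a6dcb7

This gives all platforms the option, but apps/users need to ask for it explicitly first. Once we better understand how it actually works, we can decide whether it should be the default behavior.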

Ideally, Nvidia Tegra should avoid enabling supports_ARB_map_buffer_range.

4 participants