Skip to content

Commit 67af805

Browse files
cursoragenttimfox
andcommitted
docs: Forward+ pipeline audit + Tier A tile stride guard
Add FORWARD_PLUS_PIPELINE_AUDIT.md (data flow, ordering, sync, risks). Link from ARCHITECTURE, RENDERERS, RENDERER_2026_ARCHITECTURE_PASS. renderer_regression_check: assert gen_frag.tmpl tileId*N matches MAX_PER_TILE from forward_plus_tile_cull.comp. Manifest + RENDERER_CONFIDENCE list the audit doc and new check. Co-authored-by: Tim Fox <timfox@outlook.com>
1 parent 901c0d8 commit 67af805

7 files changed

Lines changed: 172 additions & 3 deletions

File tree

docs/ARCHITECTURE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -137,7 +137,7 @@ The shipping Vulkan renderer is **forward-only** with a layered HDR/post-process
137137

138138
`r_renderMode 1/2` remain **deferred / classic Forward+ placeholders** (no alternate full-frame path through those modes).
139139

140-
**Vulkan optional Forward+ scaffolding** (`r_forwardPlus 1`, default **0**, **latched**): GPU light record SSBO, **16×16 px** tile cull compute, and optional PBR tile debug / additive local-light shading. Packed lights are capped at **`MAX_DLIGHTS` (32)** so indices stay aligned with `tess.dlightBits`. Per-tile index count is **`r_forwardPlusMaxPerTile`** (**4–8**, latched, default **8**; SSBO stride is fixed at 8 slots). Tile buffers follow **`vk_get_render_target_width/height`** (internal resolution / `r_renderScale`) and **reallocate on resize** without `vid_restart`; toggling `r_forwardPlus` or `r_forwardPlusMaxPerTile` still needs **`vid_restart`**. Implementation: `src/renderers/vulkan/vk_forward_plus.c`, cvars in `src/renderers/vulkan/tr_init.c`.
140+
**Vulkan optional Forward+ scaffolding** (`r_forwardPlus 1`, default **0**, **latched**): GPU light record SSBO, **16×16 px** tile cull compute, and optional PBR tile debug / additive local-light shading. Packed lights are capped at **`MAX_DLIGHTS` (32)** so indices stay aligned with `tess.dlightBits`. Per-tile index count is **`r_forwardPlusMaxPerTile`** (**4–8**, latched, default **8**; SSBO stride is fixed at 8 slots). Tile buffers follow **`vk_get_render_target_width/height`** (internal resolution / `r_renderScale`) and **reallocate on resize** without `vid_restart`; toggling `r_forwardPlus` or `r_forwardPlusMaxPerTile` still needs **`vid_restart`**. Implementation: `src/renderers/vulkan/vk_forward_plus.c`, cvars in `src/renderers/vulkan/tr_init.c`. Full audit: [FORWARD_PLUS_PIPELINE_AUDIT.md](FORWARD_PLUS_PIPELINE_AUDIT.md).
141141

142142
For goals and longer notes, see [RENDERER_2026_ARCHITECTURE_PASS.md](RENDERER_2026_ARCHITECTURE_PASS.md) and [RENDERERS.md](RENDERERS.md#vulkan-forward-scaffolding).
143143

Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
# Forward+ render pipeline — audit (Vulkan, 2026)
2+
3+
This document is a **technical audit** of the current **Forward+ scaffolding** in this fork: what runs, what data flows where, synchronization, known limitations, and **risk items** for future work. It complements the narrative in [RENDERER_2026_ARCHITECTURE_PASS.md](RENDERER_2026_ARCHITECTURE_PASS.md).
4+
5+
**Scope:** `r_forwardPlus` (default **0**, **latched**), PBR-only descriptor integration, **dynamic lights** from `backEnd.refdef` (`dlight_t`), **no** replacement of the primary forward lighting path.
6+
7+
---
8+
9+
## 1. Feature summary
10+
11+
| Layer | Responsibility |
12+
|--------|----------------|
13+
| **C / `vk_forward_plus.c`** | Host-visible **light SSBO** packing, **tile SSBO** allocation, **param SSBO** (`clipFromWorld` + aux uvec4), compute **pipeline + dispatch**, graphics **descriptor set** (set **18**), tile grid from **`VK_FP_TILE_DIM`** (16 px) and **`vk_get_render_target_width/height`**. |
14+
| **Compute / `forward_plus_tile_cull.comp`** | Per-tile **light index lists** ( **`MAX_PER_TILE` = 8** ), sphere-in-screen projection cull, **`MAX_LIGHTS` = 32** aligned with **`MAX_DLIGHTS`**. |
15+
| **Fragment / `gen_frag.tmpl`** (PBR) | Optional **debug heatmap** (`r_forwardPlusDebug`), optional **additive experimental shade** (`r_forwardPlusShade` → specialization **`forward_plus_shade_strength`**). Uses **`fp_params.fp_clip_from_world`** and SSBO light + tile data. |
16+
| **Uniform bridge / `tr_shade.c`** | When Forward+ is on, **`pbrForwardPlus.y`** carries **`floatBitsToUint(tess.dlightBits)`** so the fragment path can **skip** culled lights that the surface already received via the classic packed path (first **32** indices). |
17+
18+
**Cvars** (see `tr_init.c`): `r_forwardPlus`, `r_forwardPlusMaxPerTile` (latched **4–8**), `r_forwardPlusDebug`, `r_forwardPlusShade` (pipeline invalidation on change in `vk_frame_submit.c`).
19+
20+
---
21+
22+
## 2. Frame / command ordering
23+
24+
Within **`RB_DrawSurfs`** (`tr_backend.c`), order is:
25+
26+
1. **`vk_prepare_frame_temporal_state()`**
27+
2. **`vk_forward_plus_ensure_render_resolution()`** — may resize **tile SSBO** if render target dimensions changed (matches FBO / `r_renderScale` via **`vk_get_render_target_*`**).
28+
3. **`vk_forward_plus_update_for_refdef()`** — CPU writes **light SSBO** header + records from **`backEnd.refdef.dlights`**. Clears **tail** of the buffer when light count drops (avoids stale records).
29+
4. **`RB_RenderSunShadowMap`** — can alter **`vk.renderWidth`**; light/tile packing already uses **`vk_get_render_target_*`**, not transient globals.
30+
5. **`RB_BeginDrawingView()`** — begins the **main** render pass.
31+
6. **`vk_forward_plus_dispatch_tile_cull()`****compute** inside the active render pass (see §5).
32+
33+
Then the world/entity draws run; PBR draws bind **descriptor set 18** when Forward+ resources are live (`vk_draw_state.c`).
34+
35+
---
36+
37+
## 3. Data layout (SSBOs)
38+
39+
### 3.1 Light buffer (`binding = 0`)
40+
41+
Packed as **`float`** array in **`vk_forward_plus_update_for_refdef`**:
42+
43+
| Offset (vec4 index) | Content |
44+
|---------------------|---------|
45+
| `data[0]` | **x** = packed light count **n**, **y** = refdef time (ms), **z** = **`max_per_tile`** (effective **4–8**), **w** = debug scale |
46+
| `data[1]` | **x,y** = **`tiles_x`, `tiles_y`**, **z,w** = viewport **width/height** (render target pixels) |
47+
| `data[2 + i*4 …]` | Four **`vec4`** per light **i** (origin+radius, color+linear flag, axis/cone pack, etc.) — mirrors **`dlight_t`** fields |
48+
49+
**Caps:** at most **`MAX_DLIGHTS` (32)** lights for index compatibility with **`tess.dlightBits`**. Packing may be further limited by **buffer capacity**; overflow is clamped with a **developer** log (rate-limited by last source count).
50+
51+
### 3.2 Tile buffer (`binding = 1`)
52+
53+
Linear array: **`total_tiles × MAX_PER_TILE`** **`uint32`** indices. Unused slots **`0xFFFFFFFF`**. Stride per tile is fixed at **8** slots in the SSBO layout ( **`VK_FP_MAX_PER_TILE`** ); **`r_forwardPlusMaxPerTile`** only limits how many indices the **compute** and **fragment** loops **consume**.
54+
55+
### 3.3 Param buffer (`binding = 2`)
56+
57+
- **`mat4 clipFromWorld`****`view × projection_vk`** (same Y-flip convention as MVP path).
58+
- **`uvec4 tiles_xy_viewport`** — redundant with light header in places; used by compute for push/debug consistency.
59+
60+
---
61+
62+
## 4. Compute shader behavior (`forward_plus_tile_cull.comp`)
63+
64+
- **Workgroup:** 64 threads; dispatch **`ceil(totalTiles / 64)`**.
65+
- **Per thread:** one **tileId**; clears **MAX_PER_TILE** slots, then iterates lights **0 … min(n, numLights, MAX_LIGHTS)-1**.
66+
- **Projection:** `clip = clipFromWorld * vec4(worldPos,1)`; NDC bounds check (with margin on XY); center in **pixels** via **`0.5*(1+ndc)*viewport`**; **screen-radius** heuristic from world radius and **`clip.w`**; **AABB tile overlap** via **`sphere_tile_overlap`** with **`tilePxX/Y = viewport / tileGrid`** (aligned with fragment mapping).
67+
68+
**Ordering bias:** lights are appended in **increasing index** order when a tile is under capacity—no distance or importance sort.
69+
70+
---
71+
72+
## 5. Synchronization and pass placement
73+
74+
**Barriers in `vk_forward_plus_dispatch_tile_cull`:**
75+
76+
1. **Before compute:** HOST_WRITE → SHADER_READ on **light** + **param** buffers; tile buffer **dst** SHADER_WRITE (from prior fragment/compute read—first frame **`srcAccessMask = 0`**).
77+
2. **After compute:** SHADER_WRITE → SHADER_READ on **tile** buffer for subsequent **VS/FS** (and compute if chained).
78+
79+
**Compute inside render pass:** The dispatch is issued while **`vk.inRenderPass`** is true (main pass). This is **legal in Vulkan 1.x** when the pass does not use **subpasses** that forbid side effects; the engine uses **load/store** attachments and does not declare **subpass dependencies** that would make this invalid. **Risk:** some layers or future **render-pass graph** refactors could want compute **between** passes instead—worth revisiting if subpasses or **fragment density** are introduced.
80+
81+
**Host coherence:** Light and param buffers are **host-visible**; barriers use **`VK_PIPELINE_STAGE_HOST_BIT`** for writes before compute. Typical pattern is correct for **CPU write → GPU read** in the same frame.
82+
83+
---
84+
85+
## 6. Fragment path (`gen_frag.tmpl`)
86+
87+
**Gates:**
88+
89+
- **`USE_FORWARD_PLUS_FRAG`** / **`USE_FORWARD_PLUS_WORLD_POS`** — experimental shade and overlays require **world position** in the fragment stage.
90+
- **`forward_plus_shade_strength`** — specialization constant; must stay in sync with **`vk_create_pipeline.c`** (Tier A check in **`renderer_regression_check.sh`**).
91+
92+
**Tile lookup:** Matches compute: **`tilePx`** from SSBO header, **`clip_from_world`** from **`fp_params`**, **`gl_FragCoord`**-style pixel mapping (same formula as compute). **`tbase = tileId * 8u`** — must stay equal to **`MAX_PER_TILE`** in **`forward_plus_tile_cull.comp`** (Tier A check).
93+
94+
**Energy / BRDF:** Additive pass uses **`CalcSpecular`** and a **renormalization** factor against **primary direct** (`fpRenorm`). This is explicitly **experimental**—not a second physically correct light transport path.
95+
96+
**`dlightBits` skip:** Prevents double-counting when the classic path already applied a dynamic light to this surface (first 32 bits only—documented limitation vs **`MAX_DLIGHTS`** if they ever diverge on other platforms).
97+
98+
---
99+
100+
## 7. Tier A regression coverage
101+
102+
`scripts/renderer_regression_check.sh` asserts:
103+
104+
- **`MAX_LIGHTS` == `MAX_DLIGHTS`**
105+
- **`MAX_PER_TILE` == `VK_FP_MAX_PER_TILE`**
106+
- **`VK_FP_MIN_PER_TILE``MAX_PER_TILE`**
107+
- **`r_forwardPlusMaxPerTile`** CheckRange uses **`vk_forward_plus_get_*_per_tile_cap`**
108+
- **`forward_plus_shade_strength`** `constant_id` matches **`ADD_FRAG_SPEC`**
109+
- Compute uses **dynamic** tile pixels (no hard-coded **`16u`** tile corners)
110+
- **PBR fragment tile stride** (`tileId * N`) matches **`MAX_PER_TILE`** from the compute shader
111+
- **`VK_FP_TILE_DIM`** consistency (host grid)
112+
113+
---
114+
115+
## 8. Findings and recommendations
116+
117+
### Strengths
118+
119+
- **Single source of truth** for render resolution in packing/cull/shade: **`vk_get_render_target_width/height`** (+ cached main-color extent when FBO active).
120+
- **Stale light** tail zeroing when counts drop.
121+
- **Clip matrix** matches view/projection convention used elsewhere.
122+
- **Dummy buffers** when Forward+ is off so set **18** stays valid for PBR pipelines.
123+
124+
### Risks / limitations (accepted for scaffolding)
125+
126+
| Item | Severity | Note |
127+
|------|-----------|------|
128+
| **No light sort** in tile lists | Medium (quality) | First-N lights win per tile; can miss visually dominant lights under overload. |
129+
| **Sphere screen approximation** | Low–Medium | Conservative enough for prototyping; not a tight spotlight frustum test. |
130+
| **`dlightBits` 32-bit** | Low | Matches **`MAX_DLIGHTS`** today; document if caps change. |
131+
| **Compute inside render pass** | Low (portability) | Valid now; revisit with subpass graphs or render graph. |
132+
| **Primary + Forward+ energy** | Medium (art) | Renormalization is heuristic; tune per title if shade is enabled. |
133+
134+
### Suggested next steps (roadmap)
135+
136+
1. **Depth-aware culling** (optional Hi-Z or linear depth rejection) before accepting a light for a tile.
137+
2. **Sort or priority** (distance / luminance) when filling **`maxPerTile`** slots.
138+
3. **Decouple** Forward+ light ceiling from **`MAX_DLIGHTS`** only if the **game protocol** and **`tess.dlightBits`** story are redesigned together.
139+
4. **Tier B** map with mixed point + spot lights to validate heatmap vs ground truth.
140+
141+
---
142+
143+
## 9. Primary references
144+
145+
| File | Role |
146+
|------|------|
147+
| `src/renderers/vulkan/vk_forward_plus.c` | Packing, buffers, dispatch, tile resize |
148+
| `src/renderers/vulkan/shaders/glsl/forward_plus_tile_cull.comp` | Tile list build |
149+
| `src/renderers/vulkan/shaders/glsl/gen_frag.tmpl` | Debug + experimental shade |
150+
| `src/renderers/vulkan/tr_shade.c` | `pbrForwardPlus` uniform |
151+
| `src/renderers/vulkan/tr_backend.c` | Scheduling |
152+
| `src/renderers/vulkan/vk_create_pipeline.c` | `forward_plus_shade_strength` spec |
153+
| `scripts/renderer_regression_check.sh` | Tier A drift guards |

docs/RENDERERS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,8 @@ Optional **GPU light packing + per-tile cull** on the existing forward path (not
3030

3131
Code: `src/renderers/vulkan/vk_forward_plus.c`, `VK_FP_*` constants; cvar registration `src/renderers/vulkan/tr_init.c`.
3232

33+
**Audit:** [FORWARD_PLUS_PIPELINE_AUDIT.md](FORWARD_PLUS_PIPELINE_AUDIT.md).
34+
3335
### Compute shaders, mesh shaders (NV), and upscaling (DLSS)
3436
- **Compute:** already central to Vulkan (volumetric fog stages, vegetation wind, fluid sim, terrain CBT, etc.).
3537
- **Mesh shaders (NVIDIA):** optional device extension **`VK_NV_mesh_shader`** when **`r_vk_meshShaderNV 1`** (default **0**, **latched**, `vid_restart`). Enables `meshShader` in `VkPhysicalDeviceMeshShaderFeaturesNV` for future pipelines; **no mesh-shader draw path** is wired yet - safe on `main`.

docs/RENDERER_2026_ARCHITECTURE_PASS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -158,7 +158,7 @@ Use **Vulkan as the primary renderer architecture**, freeze OpenGL as compatibil
158158

159159
## Phase 2: Lighting Scale
160160

161-
- Add Vulkan light records plus cluster/tile culling. **Incremental (engine):** `r_forwardPlus 1` (default 0) allocates light + tile SSBOs, packs **at most `MAX_DLIGHTS` (32)** lights from `backEnd.refdef` (indices align with `tess.dlightBits`; excess lights are omitted and a **developer** log notes when the source count exceeds the cap), and runs a **compute tile cull** (`forward_plus_tile_cull.comp`, **`VK_FP_TILE_DIM` (16)** px tiles via `ceil(viewport / dim)`, up to **8** index slots per tile; active count **4–8** via latched **`r_forwardPlusMaxPerTile`**, default **8**) after `RB_BeginDrawingView` inside the main render pass. **Compute + PBR fragment** derive **tile pixel size** from the packed **viewport ÷ tile grid** so cull, debug overlay, and experimental shade stay aligned if tile policy changes. Tile grid and NDC→pixel use **`vk_get_render_target_width/height`** (FBO / `r_renderScale`); the tile SSBO is **reallocated when that resolution changes** (no `vid_restart` for resize alone). **PBR fragment:** descriptor set 18 binds light + tile + **param** SSBOs (clip matrix for shading). `r_forwardPlusDebug` (0–1) = debug overlay; **`r_forwardPlusShade`** (0–4, default 0) adds **experimental diffuse + microfacet spec** (per-light `CalcSpecular`, scaled) from tile-culled **point and linear/spot** lights. When `r_forwardPlus` is on, **`pbrForwardPlus.y`** carries **`floatBitsToUint(tess.dlightBits)`** so the shader **skips** any packed light index whose bit is set in `tess.dlightBits` (first **32** indices only; matches typical `MAX_DLIGHTS` range). Primary direct is still **softly renormalized** vs Forward+ energy. Works with deluxe/lightmap; toggling shade **invalidates cached pipelines** next frame.
161+
- Add Vulkan light records plus cluster/tile culling. **Incremental (engine):** `r_forwardPlus 1` (default 0) allocates light + tile SSBOs, packs **at most `MAX_DLIGHTS` (32)** lights from `backEnd.refdef` (indices align with `tess.dlightBits`; excess lights are omitted and a **developer** log notes when the source count exceeds the cap), and runs a **compute tile cull** (`forward_plus_tile_cull.comp`, **`VK_FP_TILE_DIM` (16)** px tiles via `ceil(viewport / dim)`, up to **8** index slots per tile; active count **4–8** via latched **`r_forwardPlusMaxPerTile`**, default **8**) after `RB_BeginDrawingView` inside the main render pass. **Compute + PBR fragment** derive **tile pixel size** from the packed **viewport ÷ tile grid** so cull, debug overlay, and experimental shade stay aligned if tile policy changes. Tile grid and NDC→pixel use **`vk_get_render_target_width/height`** (FBO / `r_renderScale`); the tile SSBO is **reallocated when that resolution changes** (no `vid_restart` for resize alone). **PBR fragment:** descriptor set 18 binds light + tile + **param** SSBOs (clip matrix for shading). `r_forwardPlusDebug` (0–1) = debug overlay; **`r_forwardPlusShade`** (0–4, default 0) adds **experimental diffuse + microfacet spec** (per-light `CalcSpecular`, scaled) from tile-culled **point and linear/spot** lights. When `r_forwardPlus` is on, **`pbrForwardPlus.y`** carries **`floatBitsToUint(tess.dlightBits)`** so the shader **skips** any packed light index whose bit is set in `tess.dlightBits` (first **32** indices only; matches typical `MAX_DLIGHTS` range). Primary direct is still **softly renormalized** vs Forward+ energy. Works with deluxe/lightmap; toggling shade **invalidates cached pipelines** next frame. **Formal audit:** [FORWARD_PLUS_PIPELINE_AUDIT.md](FORWARD_PLUS_PIPELINE_AUDIT.md).
162162
- Introduce Forward+ shading for local lights.
163163
- Keep shadow budgets conservative and explicit.
164164

docs/RENDERER_CONFIDENCE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ On **`main`**, GitHub Actions **`.github/workflows/build.yml`** runs **`renderer
2020

2121
| Check | Command | What it proves |
2222
|--------|---------|----------------|
23-
| Renderer regression (repo) | `./scripts/renderer_regression_check.sh` | Regression docs present (manifest includes `docs/ARCHITECTURE.md`, `docs/ROADMAP.md`, `BUILD.md`, `src/renderers/vulkan/vk_temporal.h`, …); `shader_data.c` / `shader_binding.c` exist; recursive GLSL `glslang` pass; IQM/glTF morph **#define** parity; Forward+ **tile cull vs `tr_types.h` / `vk_forward_plus.c`** caps; **`r_forwardPlusMaxPerTile`** range wired to `vk_forward_plus_get_*_per_tile_cap`; PBR **`forward_plus_shade_strength`** `constant_id` matches `vk_create_pipeline.c`; **`vk_temporal`** reset bitmask count matches switch + `knownReasons[]` table. Optional: set `GAME_BASE` and uncomment BSP paths in `OPTIONAL_GAME_ASSETS.txt` to require packaged maps. |
23+
| Renderer regression (repo) | `./scripts/renderer_regression_check.sh` | Regression docs present (manifest includes `docs/ARCHITECTURE.md`, `docs/ROADMAP.md`, `BUILD.md`, `docs/FORWARD_PLUS_PIPELINE_AUDIT.md`, `src/renderers/vulkan/vk_temporal.h`, …); `shader_data.c` / `shader_binding.c` exist; recursive GLSL `glslang` pass; IQM/glTF morph **#define** parity; Forward+ **tile cull vs `tr_types.h` / `vk_forward_plus.c`** caps; **`gen_frag.tmpl`** tile stride `tileId * Nu` matches **`MAX_PER_TILE`**; **`r_forwardPlusMaxPerTile`** range wired to `vk_forward_plus_get_*_per_tile_cap`; PBR **`forward_plus_shade_strength`** `constant_id` matches `vk_create_pipeline.c`; **`vk_temporal`** reset bitmask count matches switch + `knownReasons[]` table. Optional: set `GAME_BASE` and uncomment BSP paths in `OPTIONAL_GAME_ASSETS.txt` to require packaged maps. |
2424
| Map load sanity (content) | `GAME_BASE=/abs/path/to/base ./scripts/renderer_regression_maps.sh` | Dedicated server runs `+map` for each `rtest_*` map; log scanned for `ERROR:`, `couldn't load`, `CM_LoadMap`, crashes. Requires full game base (VM + assets), not the regression pk3 alone. `RELEASE_DIR` optional (default `<repo>/release`). |
2525
| Full local CI parity | `./scripts/validate_ci_build.sh` | SPIR-V generation, Vulkan Release build, smoke test, renderer regression check. |
2626
| CMake smoke + artifacts | `ctest --output-on-failure` (from `build-vk-Release` or `build-gl-Release`) | Smoke, **renderer_regression_check**, artifacts, unit hooks, **demo_game pk3 layout** (`test_demo_game_pk3`). |

docs/samples/renderer_regression/REGRESSION_REPO_MANIFEST.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ BUILD.md
99
docs/ROADMAP.md
1010
docs/RENDERERS_FUTURE.md
1111
docs/RENDERER_2026_ARCHITECTURE_PASS.md
12+
docs/FORWARD_PLUS_PIPELINE_AUDIT.md
1213
docs/GLTF.md
1314
src/renderers/vulkan/tr_model_gltf.h
1415
src/renderers/opengl/tr_local.h

0 commit comments

Comments
 (0)