# Forward+ render pipeline: audit (Vulkan, 2026)

This document is a **technical audit** of the current **Forward+ scaffolding** in this fork: what runs, what data flows where, how it is synchronized, known limitations, and **risk items** for future work. It complements the narrative in [RENDERER_2026_ARCHITECTURE_PASS.md](RENDERER_2026_ARCHITECTURE_PASS.md).

**Scope:** `r_forwardPlus` (default **0**, **latched**), PBR-only descriptor integration, **dynamic lights** from `backEnd.refdef` (`dlight_t`), and **no** replacement of the primary forward lighting path.

---

## 1. Feature summary

| Layer | Responsibility |
|--------|----------------|
| **C / `vk_forward_plus.c`** | Host-visible **light SSBO** packing, **tile SSBO** allocation, **param SSBO** (`clipFromWorld` + aux uvec4), compute **pipeline + dispatch**, graphics **descriptor set** (set **18**), and the tile grid derived from **`VK_FP_TILE_DIM`** (16 px) and **`vk_get_render_target_width/height`**. |
| **Compute / `forward_plus_tile_cull.comp`** | Per-tile **light index lists** (**`MAX_PER_TILE` = 8**), screen-space sphere projection cull, **`MAX_LIGHTS` = 32** (matches **`MAX_DLIGHTS`**). |
| **Fragment / `gen_frag.tmpl`** (PBR) | Optional **debug heatmap** (`r_forwardPlusDebug`) and optional **additive experimental shade** (`r_forwardPlusShade` → specialization constant **`forward_plus_shade_strength`**). Uses **`fp_params.fp_clip_from_world`** and the light + tile SSBO data. |
| **Uniform bridge / `tr_shade.c`** | When Forward+ is on, **`pbrForwardPlus.y`** carries the bit pattern of **`tess.dlightBits`** (recovered in the shader with **`floatBitsToUint`**). The fragment path uses it to **skip** tile-list lights that the surface already received through the classic packed path (first **32** indices only). |
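
The bridge has to be a bit-cast rather than a numeric conversion. `(float)tess.dlightBits` cannot represent every mask above 2^24 exactly, so it would silently drop bits. Below is a minimal sketch of the round trip. The helper names are hypothetical, and the real `tr_shade.c` may use a union or an existing engine helper instead.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "bit-cast needs a 32-bit float");

/* Host side: reinterpret the 32-bit dlight mask as a float without changing
 * any bits (the C counterpart of GLSL uintBitsToFloat). */
static float fp_mask_to_float_bits(uint32_t mask) {
    float f;
    memcpy(&f, &mask, sizeof f);
    return f;
}

/* Shader side, emulated on the CPU: recover the mask (GLSL floatBitsToUint). */
static uint32_t fp_float_bits_to_mask(float f) {
    uint32_t mask;
    memcpy(&mask, &f, sizeof mask);
    return mask;
}
```

The value is only ever copied, never computed with. Masks whose bit patterns happen to be denormal or NaN floats (for example `0x1` or `0xFFFFFFFF`) therefore survive, provided nothing on the path does float arithmetic on that uniform lane.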

**Cvars** (see `tr_init.c`): `r_forwardPlus`, `r_forwardPlusMaxPerTile` (latched, range **4–8**), `r_forwardPlusDebug`, and `r_forwardPlusShade` (changing it invalidates pipelines in `vk_frame_submit.c`).

---

## 2. Frame / command ordering

Within **`RB_DrawSurfs`** (`tr_backend.c`), the order is:

1. **`vk_prepare_frame_temporal_state()`**
2. **`vk_forward_plus_ensure_render_resolution()`**: may resize the **tile SSBO** if the render target dimensions changed (tracks the FBO / `r_renderScale` through **`vk_get_render_target_*`**).
3. **`vk_forward_plus_update_for_refdef()`**: the CPU writes the **light SSBO** header and records from **`backEnd.refdef.dlights`**, and clears the **tail** of the buffer when the light count drops (avoids stale records).
4. **`RB_RenderSunShadowMap`**: can alter **`vk.renderWidth`**; light/tile packing already uses **`vk_get_render_target_*`**, not transient globals.
5. **`RB_BeginDrawingView()`**: begins the **main** render pass.
6. **`vk_forward_plus_dispatch_tile_cull()`**: records the tile-cull **compute** dispatch while the main render pass is active (see §5).

The world/entity draws run next; PBR draws bind **descriptor set 18** when Forward+ resources are live (`vk_draw_state.c`).
| 33 | +Then the world/entity draws run; PBR draws bind **descriptor set 18** when Forward+ resources are live (`vk_draw_state.c`). |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## 3. Data layout (SSBOs) |
| 38 | + |
| 39 | +### 3.1 Light buffer (`binding = 0`) |
| 40 | + |
| 41 | +Packed as **`float`** array in **`vk_forward_plus_update_for_refdef`**: |
| 42 | + |
| 43 | +| Offset (vec4 index) | Content | |
| 44 | +|---------------------|---------| |
| 45 | +| `data[0]` | **x** = packed light count **n**, **y** = refdef time (ms), **z** = **`max_per_tile`** (effective **4–8**), **w** = debug scale | |
| 46 | +| `data[1]` | **x,y** = **`tiles_x`, `tiles_y`**, **z,w** = viewport **width/height** (render target pixels) | |
| 47 | +| `data[2 + i*4 …]` | Four **`vec4`** per light **i** (origin+radius, color+linear flag, axis/cone pack, etc.) — mirrors **`dlight_t`** fields | |
| 48 | + |
| 49 | +**Caps:** at most **`MAX_DLIGHTS` (32)** lights for index compatibility with **`tess.dlightBits`**. Packing may be further limited by **buffer capacity**; overflow is clamped with a **developer** log (rate-limited by last source count). |
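
The offsets in the table, together with the clamp and the tail clear from §2 step 3, can be sketched in C as follows. This is a simplified, hypothetical version with several assumptions:

- `fp_light_src` stands in for `dlight_t`.
- Only the first two `vec4`s per light are written.
- The count is stored as a plain float.
- `capacity_lights` is whatever the allocated buffer holds.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_DLIGHTS 32
#define FP_HEADER_VEC4 2    /* data[0] and data[1] */
#define FP_VEC4_PER_LIGHT 4 /* four vec4 records per light */

/* Hypothetical subset of dlight_t: only the fields this sketch packs. */
typedef struct {
    float origin[3], radius;
    float color[3];
} fp_light_src;

/* Float index of component `comp` (0..3) of vec4 `k` (0..3) of light `i`. */
static size_t fp_light_float_index(size_t i, size_t k, size_t comp) {
    return (FP_HEADER_VEC4 + i * FP_VEC4_PER_LIGHT + k) * 4u + comp;
}

/* Packs min(n_src, MAX_DLIGHTS, capacity_lights) lights, zeroes records left
 * over from a larger previous frame (prev_packed), returns the packed count.
 * Header fields other than the count, and vec4s 2..3, are omitted here. */
static int fp_pack_lights(float *data, int capacity_lights,
                          const fp_light_src *src, int n_src, int prev_packed) {
    static int last_logged_src = -1; /* rate limit: log once per source count */
    int n = n_src < MAX_DLIGHTS ? n_src : MAX_DLIGHTS;
    if (n > capacity_lights) n = capacity_lights;
    if (n < n_src && n_src != last_logged_src) {
        fprintf(stderr, "forward+: clamped %d dlights to %d\n", n_src, n);
        last_logged_src = n_src;
    }
    data[0] = (float)n; /* data[0].x: packed count (assumed stored as float) */
    for (int i = 0; i < n; i++) {
        float *rec = &data[fp_light_float_index((size_t)i, 0, 0)];
        memcpy(rec, src[i].origin, sizeof src[i].origin); /* vec4 0: xyz */
        rec[3] = src[i].radius;                             /* vec4 0: w   */
        memcpy(rec + 4, src[i].color, sizeof src[i].color); /* vec4 1: rgb */
    }
    if (prev_packed > n) { /* tail clear: no stale records past the new count */
        size_t first = fp_light_float_index((size_t)n, 0, 0);
        size_t floats = (size_t)(prev_packed - n) * FP_VEC4_PER_LIGHT * 4u;
        memset(&data[first], 0, floats * sizeof(float));
    }
    return n;
}
```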

### 3.2 Tile buffer (`binding = 1`)

Linear array of **`total_tiles × MAX_PER_TILE`** **`uint32`** indices; unused slots hold **`0xFFFFFFFF`**. The per-tile stride is fixed at **8** slots in the SSBO layout (**`VK_FP_MAX_PER_TILE`**). **`r_forwardPlusMaxPerTile`** only limits how many of those slots the **compute** pass fills and the **fragment** loop reads.
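
Two small helpers make the split between the fixed stride and the consumed budget explicit. Both are hypothetical; the engine enforces the range through the cvar check listed in §7. `VK_FP_MIN_PER_TILE = 4` is assumed from the 4–8 cvar range.

```c
#include <stdint.h>

#define VK_FP_MIN_PER_TILE 4u  /* assumed lower bound of r_forwardPlusMaxPerTile */
#define VK_FP_MAX_PER_TILE 8u  /* SSBO stride; also the upper bound */

/* Effective per-tile light budget from the latched cvar value. */
static uint32_t fp_effective_per_tile(int cvar_value) {
    if (cvar_value < (int)VK_FP_MIN_PER_TILE) return VK_FP_MIN_PER_TILE;
    if (cvar_value > (int)VK_FP_MAX_PER_TILE) return VK_FP_MAX_PER_TILE;
    return (uint32_t)cvar_value;
}

/* Slot address depends only on the fixed stride, never on the budget. */
static uint32_t fp_slot_index(uint32_t tile_id, uint32_t slot) {
    return tile_id * VK_FP_MAX_PER_TILE + slot;
}
```

Keeping the address independent of the cvar means the buffer layout never depends on `r_forwardPlusMaxPerTile`. The cost is that eight slots per tile are always allocated.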

### 3.3 Param buffer (`binding = 2`)

- **`mat4 clipFromWorld`**: **`view × projection_vk`** (same Y-flip convention as the MVP path).
- **`uvec4 tiles_xy_viewport`**: partly redundant with the light header; the compute shader uses it for push/debug consistency.

---

## 4. Compute shader behavior (`forward_plus_tile_cull.comp`)

- **Workgroup:** 64 threads; dispatch **`ceil(totalTiles / 64)`** groups.
- **Per thread:** one **tileId**; clears **MAX_PER_TILE** slots, then iterates lights **0 … min(n, numLights, MAX_LIGHTS)-1**.
- **Projection:**
  - `clip = clipFromWorld * vec4(worldPos, 1)`, followed by an NDC bounds check (with a margin on XY).
  - Center in **pixels** via **`0.5 * (1 + ndc) * viewport`**.
  - **Screen-radius** heuristic from the world radius and **`clip.w`**.
  - **AABB tile overlap** via **`sphere_tile_overlap`** with **`tilePxX/Y = viewport / tileGrid`** (aligned with the fragment mapping).
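
The per-thread logic above can be emulated on the CPU to reason about edge cases. This is a sketch under explicit assumptions:

- The matrix is column-major, as in GLSL.
- `focal_px` and `ndc_margin` are hypothetical stand-ins for the shader's unspecified radius heuristic and NDC margin.
- `fp_sphere_tile_overlap` is an assumed AABB form of `sphere_tile_overlap`, not a copy of it.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PER_TILE 8u       /* SSBO stride, as in the shader */
#define FP_EMPTY 0xFFFFFFFFu  /* unused slot marker */

typedef struct { float m[4][4]; } mat4f;          /* column-major: m[col][row] */
typedef struct { float pos[3], radius; } fp_light;

/* Hypothetical cull inputs: focal_px turns radius / clip.w into pixels
 * (e.g. 0.5 * viewport_h * proj[1][1]); ndc_margin widens the XY test. */
typedef struct {
    mat4f clip_from_world;
    float viewport_w, viewport_h, focal_px, ndc_margin;
    uint32_t tiles_x, tiles_y;
} fp_cull_params;

/* Projects a light sphere to a pixel-space circle; false if rejected. */
static bool fp_project(const fp_cull_params *p, const fp_light *l,
                       float *cx, float *cy, float *r_px) {
    const mat4f *M = &p->clip_from_world;
    const float *v = l->pos;
    float x = M->m[0][0]*v[0] + M->m[1][0]*v[1] + M->m[2][0]*v[2] + M->m[3][0];
    float y = M->m[0][1]*v[0] + M->m[1][1]*v[1] + M->m[2][1]*v[2] + M->m[3][1];
    float w = M->m[0][3]*v[0] + M->m[1][3]*v[1] + M->m[2][3]*v[2] + M->m[3][3];
    if (w <= 1e-4f) return false;  /* centre at/behind the eye: rejected here */
    float nx = x / w, ny = y / w, lim = 1.0f + p->ndc_margin;
    if (nx < -lim || nx > lim || ny < -lim || ny > lim) return false;
    *cx = 0.5f * (1.0f + nx) * p->viewport_w;
    *cy = 0.5f * (1.0f + ny) * p->viewport_h;
    *r_px = l->radius * p->focal_px / w;
    return true;
}

/* Screen AABB of the circle against tile (tx, ty); tile size = viewport / grid. */
static bool fp_sphere_tile_overlap(const fp_cull_params *p, float cx, float cy,
                                   float r, uint32_t tx, uint32_t ty) {
    float tw = p->viewport_w / (float)p->tiles_x, th = p->viewport_h / (float)p->tiles_y;
    float x0 = (float)tx * tw, y0 = (float)ty * th;
    return cx + r >= x0 && cx - r <= x0 + tw && cy + r >= y0 && cy - r <= y0 + th;
}

/* One compute "thread": clear the tile's slots, then append overlapping
 * lights in increasing index order until max_per_tile is reached. */
static void fp_cull_tile(const fp_cull_params *p, uint32_t tile_id,
                         const fp_light *lights, uint32_t n,
                         uint32_t max_per_tile, uint32_t *tiles) {
    uint32_t *slots = tiles + tile_id * MAX_PER_TILE;
    for (uint32_t s = 0; s < MAX_PER_TILE; s++) slots[s] = FP_EMPTY;
    uint32_t tx = tile_id % p->tiles_x, ty = tile_id / p->tiles_x, count = 0;
    for (uint32_t i = 0; i < n && count < max_per_tile; i++) {
        float cx, cy, r;
        if (fp_project(p, &lights[i], &cx, &cy, &r) &&
            fp_sphere_tile_overlap(p, cx, cy, r, tx, ty))
            slots[count++] = i;
    }
}

/* Workgroups of 64 threads, one thread per tile. */
static uint32_t fp_dispatch_groups(uint32_t total_tiles) {
    return (total_tiles + 63u) / 64u;
}
```

This sketch rejects any light whose centre has `clip.w ≤ 0`. A light whose sphere encloses the camera while its centre lies behind it is therefore culled everywhere. Whether the shader handles that case differently is worth checking on the Tier B map.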
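
**Ordering bias:** lights are appended in **increasing index** order while a tile is under capacity. There is no distance or importance sort, so the first **`maxPerTile`** overlapping lights win.

---

## 5. Synchronization and pass placement

**Barriers in `vk_forward_plus_dispatch_tile_cull`:**

1. **Before compute:** HOST_WRITE → SHADER_READ on the **light** and **param** buffers. A second barrier on the **tile** buffer has **dst** SHADER_WRITE and orders the dispatch against the prior fragment/compute reads (on the first frame, **`srcAccessMask = 0`**). That hazard is write-after-read, which needs only an execution dependency on the reading stages. `srcAccessMask = 0` is therefore sufficient on every frame.
2. **After compute:** SHADER_WRITE → SHADER_READ on the **tile** buffer for the subsequent **VS/FS** stages (and for compute, if chained).

**Compute inside render pass:** The dispatch is issued while **`vk.inRenderPass`** is true (main pass). This is **not valid Vulkan**, for two reasons:

- `vkCmdDispatch` may only be recorded **outside** a render pass instance, and this includes dynamic rendering.
- A `vkCmdPipelineBarrier` recorded inside a `VkRenderPass` instance cannot contain buffer memory barriers and requires a subpass self-dependency.

Using load/store attachments and declaring no extra subpass dependencies does not lift either rule, and the validation layers should flag both.

**Fix:** record the dispatch and its two barrier groups outside any render pass instance, between the light upload (§2 step 3) and **`RB_BeginDrawingView()`**. Alternatively, end and restart the main pass around them.

**Host coherence:** Light and param buffers are **host-visible**; barriers use **`VK_PIPELINE_STAGE_HOST_BIT`** for the writes before compute. This covers **CPU write → GPU read** within a frame under three conditions:

- The writes complete before `vkQueueSubmit`, which already makes earlier host writes visible to the device.
- The memory is `HOST_COHERENT` or is flushed with `vkFlushMappedMemoryRanges`.
- No earlier frame still in flight reads the same buffer. Otherwise, per-frame copies or a fence wait are needed.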

---

## 6. Fragment path (`gen_frag.tmpl`)

**Gates:**

- **`USE_FORWARD_PLUS_FRAG`** / **`USE_FORWARD_PLUS_WORLD_POS`**: the experimental shade and overlays require the **world position** in the fragment stage.
- **`forward_plus_shade_strength`**: specialization constant; it must stay in sync with **`vk_create_pipeline.c`** (Tier A check in **`renderer_regression_check.sh`**).

**Tile lookup:** Matches compute. It takes **`tilePx`** from the SSBO header and **`clip_from_world`** from **`fp_params`**, and uses a **`gl_FragCoord`**-style pixel mapping (same formula as compute). In **`tbase = tileId * 8u`**, the literal must equal **`MAX_PER_TILE`** in **`forward_plus_tile_cull.comp`** (Tier A check).
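
The lookup can be emulated on the CPU as a hypothetical sketch. It assumes `gl_FragCoord.xy` values at pixel centres (`x + 0.5`) and clamps to the last tile. It also stops at the first `0xFFFFFFFF` slot, which is valid only because the compute pass fills slots contiguously from index 0.

```c
#include <stdint.h>

#define MAX_PER_TILE 8u         /* must match the compute shader's stride */
#define FP_EMPTY 0xFFFFFFFFu

/* Maps a fragment's pixel-centre coordinates to a tile id using
 * tilePx = viewport / tileGrid, clamped to the grid. */
static uint32_t fp_tile_for_frag(float frag_x, float frag_y,
                                 float viewport_w, float viewport_h,
                                 uint32_t tiles_x, uint32_t tiles_y) {
    float tpx = viewport_w / (float)tiles_x, tpy = viewport_h / (float)tiles_y;
    uint32_t tx = (uint32_t)(frag_x / tpx), ty = (uint32_t)(frag_y / tpy);
    if (tx >= tiles_x) tx = tiles_x - 1u;
    if (ty >= tiles_y) ty = tiles_y - 1u;
    return ty * tiles_x + tx;
}

/* Copies up to max_per_tile light indices of a tile into out[], stopping at
 * the first empty slot; returns how many were read. */
static uint32_t fp_read_tile(const uint32_t *tiles, uint32_t tile_id,
                             uint32_t max_per_tile, uint32_t *out) {
    uint32_t tbase = tile_id * MAX_PER_TILE;  /* the tileId * 8u stride */
    uint32_t n = 0;
    for (uint32_t s = 0; s < max_per_tile; s++) {
        uint32_t idx = tiles[tbase + s];
        if (idx == FP_EMPTY) break;
        out[n++] = idx;
    }
    return n;
}
```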

**Energy / BRDF:** The additive pass uses **`CalcSpecular`** and a **renormalization** factor against the **primary direct** term (`fpRenorm`). This is explicitly **experimental**, not a second physically correct light-transport path.

**`dlightBits` skip:** Prevents double-counting when the classic path already applied a dynamic light to this surface. Because the mask is 32 bits wide, only the first 32 light indices are covered. This is a documented limitation that would matter if **`MAX_DLIGHTS`** ever exceeded 32 (for example, on another platform).
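
The skip test itself is a single bit check. The `idx < 32` guard matters because shifting a 32-bit value by 32 or more is undefined in C, and GLSL leaves oversized shifts undefined as well. The helper names below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the classic packed path already lit this surface with light
 * `idx`. Indices >= 32 have no bit in the mask, so they are never skipped. */
static bool fp_already_lit(uint32_t dlight_bits, uint32_t idx) {
    return idx < 32u && ((dlight_bits >> idx) & 1u) != 0u;
}

/* Counts the tile-list lights the additive Forward+ term would still shade. */
static uint32_t fp_count_shaded(const uint32_t *tile_lights, uint32_t n,
                                uint32_t dlight_bits) {
    uint32_t shaded = 0;
    for (uint32_t i = 0; i < n; i++)
        if (!fp_already_lit(dlight_bits, tile_lights[i])) shaded++;
    return shaded;
}
```

Light indices of 32 and above always pass through. This is the documented limitation: if `MAX_DLIGHTS` grew, such lights could be applied twice when the classic path also lit them.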

---

## 7. Tier A regression coverage

`scripts/renderer_regression_check.sh` asserts:

- **`MAX_LIGHTS` == `MAX_DLIGHTS`**
- **`MAX_PER_TILE` == `VK_FP_MAX_PER_TILE`**
- **`VK_FP_MIN_PER_TILE` ≤ `MAX_PER_TILE`**
- The **`r_forwardPlusMaxPerTile`** CheckRange uses **`vk_forward_plus_get_*_per_tile_cap`**
- The **`forward_plus_shade_strength`** `constant_id` matches **`ADD_FRAG_SPEC`**
- The compute shader uses **dynamic** tile pixel sizes (no hard-coded **`16u`** tile corners)
- The **PBR fragment tile stride** (`tileId * N`) matches **`MAX_PER_TILE`** from the compute shader
- **`VK_FP_TILE_DIM`** is consistent with the host grid
| 111 | +- **`VK_FP_TILE_DIM`** consistency (host grid) |
| 112 | + |
| 113 | +--- |
| 114 | + |
| 115 | +## 8. Findings and recommendations |
| 116 | + |
| 117 | +### Strengths |
| 118 | + |
| 119 | +- **Single source of truth** for render resolution in packing/cull/shade: **`vk_get_render_target_width/height`** (+ cached main-color extent when FBO active). |
| 120 | +- **Stale light** tail zeroing when counts drop. |
| 121 | +- **Clip matrix** matches view/projection convention used elsewhere. |
| 122 | +- **Dummy buffers** when Forward+ is off so set **18** stays valid for PBR pipelines. |
| 123 | + |
| 124 | +### Risks / limitations (accepted for scaffolding) |
| 125 | + |
| 126 | +| Item | Severity | Note | |
| 127 | +|------|-----------|------| |
| 128 | +| **No light sort** in tile lists | Medium (quality) | First-N lights win per tile; can miss visually dominant lights under overload. | |
| 129 | +| **Sphere screen approximation** | Low–Medium | Conservative enough for prototyping; not a tight spotlight frustum test. | |
| 130 | +| **`dlightBits` 32-bit** | Low | Matches **`MAX_DLIGHTS`** today; document if caps change. | |
| 131 | +| **Compute inside render pass** | Low (portability) | Valid now; revisit with subpass graphs or render graph. | |
| 132 | +| **Primary + Forward+ energy** | Medium (art) | Renormalization is heuristic; tune per title if shade is enabled. | |
| 133 | + |
| 134 | +### Suggested next steps (roadmap) |
| 135 | + |
| 136 | +1. **Depth-aware culling** (optional Hi-Z or linear depth rejection) before accepting a light for a tile. |
| 137 | +2. **Sort or priority** (distance / luminance) when filling **`maxPerTile`** slots. |
| 138 | +3. **Decouple** Forward+ light ceiling from **`MAX_DLIGHTS`** only if the **game protocol** and **`tess.dlightBits`** story are redesigned together. |
| 139 | +4. **Tier B** map with mixed point + spot lights to validate heatmap vs ground truth. |
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +## 9. Primary references |
| 144 | + |
| 145 | +| File | Role | |
| 146 | +|------|------| |
| 147 | +| `src/renderers/vulkan/vk_forward_plus.c` | Packing, buffers, dispatch, tile resize | |
| 148 | +| `src/renderers/vulkan/shaders/glsl/forward_plus_tile_cull.comp` | Tile list build | |
| 149 | +| `src/renderers/vulkan/shaders/glsl/gen_frag.tmpl` | Debug + experimental shade | |
| 150 | +| `src/renderers/vulkan/tr_shade.c` | `pbrForwardPlus` uniform | |
| 151 | +| `src/renderers/vulkan/tr_backend.c` | Scheduling | |
| 152 | +| `src/renderers/vulkan/vk_create_pipeline.c` | `forward_plus_shade_strength` spec | |
| 153 | +| `scripts/renderer_regression_check.sh` | Tier A drift guards | |