Project: VulkanSceneRenderer Branch:
main
Date: 2025
Status: Living document — Phase 1 ✅ Phase 2 ✅ Phase 3 ✅ (+ cascade blend & debug view) Phase 4 ✅ Phase 5 ✅ Phase 6 ✅ Bloom ✅ Related:docs/refactoring-plan.md(architecture refactoring)
This document records the renderer feature plan and the rationale behind the current lighting pipeline. Some early audit rows are intentionally historical; when behavior differs, trust the implementation and the phase-completion notes later in the file.
The active frame-lighting path is:
Depth prepass
-> Hi-Z / occlusion cull
-> G-buffer
-> shadow cascades
-> tile light cull
-> GTAO
-> deferred lighting + transparent OIT
-> bloom
-> post-process
Coordinate, reverse-Z, viewport, and winding rules are defined in
coordinate-conventions.md. Shader comments should
point back to those rules instead of restating a conflicting local convention.
- Current State Audit
- Gap Analysis
- Phase 1 — Extract Shared BRDF Include
- Phase 2 — G-Buffer Optimization
- Phase 3 — Cascaded Shadow Maps
- Phase 4 — Clustered / Tiled Light Culling
- Phase 5 — Image-Based Lighting & SSAO
- Phase 6 — GPU-Driven Rendering
- Bloom Post-Processing
- Appendix A — File Inventory
- Appendix B — Reference Material
Depth Prepass → G-Buffer Fill (4 MRTs) → OIT Clear
→ Shadow Cascade 0..3 (depth-only, directional light CSM)
→ Tile Cull (compute, bins lights into 16×16 tiles)
→ GTAO (compute, half-res AO) → GTAO Blur (compute, bilateral denoise)
→ Directional Light (fullscreen quad, CSM shadows + IBL ambient + SSAO)
→ Point Lights (tiled deferred, stencil fallback) → OIT Transparent
→ Bloom (compute, dual-filter downsample/upsample mip chain)
→ Post-Process (OIT Resolve + Bloom Composite + Tone Map) → Debug Overlays → ImGui
| Attachment | Format | Content |
|---|---|---|
| RT0 — Albedo | R8G8B8A8_UNORM |
Base color + alpha |
| RT1 — Normal | R16G16B16A16_SFLOAT |
World-space normal (encoded xyz * 0.5 + 0.5) |
| RT2 — Material | R16G16B16A16_SFLOAT |
Metallic (R), Roughness (G), Occlusion (B) |
| RT3 — Emissive | R16G16B16A16_SFLOAT |
Emissive RGB |
| Depth/Stencil | D32_S8 | Reverse-Z depth + stencil for light volumes (sampled in lighting pass via depthSamplingView) |
Total G-Buffer bandwidth per pixel: ~40 bytes (1 × RGBA8 + 3 × RGBA16F + D32_S8).
Note: Position is reconstructed from depth via ReconstructWorldPosition() in brdf_common.slang (Phase 2).
| Component | Current Implementation |
|---|---|
| Directional light | 1 hardcoded sun; fullscreen triangle; Cook-Torrance GGX in deferred_directional.slang; shadow sampling via CSM (Phase 3, in progress) |
| Point lights | kMaxDeferredPointLights = 4; hardcoded positions in LightingManager::updateLightingData(); per-light stencil-mark cube + fullscreen-triangle shading |
| Light data transport | Single LightingData uniform buffer (~144 bytes); CPU-written via SceneController::writeToBuffer() each frame |
| Shadows | CSM — 4-cascade, 2048², D32_S8 array texture, 3×3 PCF, comparison sampler (Phase 3 ✅) |
| Ambient | IBL split-sum (irradiance cubemap + pre-filtered specular + BRDF LUT) × GTAO in deferred_directional.slang (Phase 5 ✅) |
| IBL / Probes | Placeholder white cubemaps (1×1); BRDF LUT generated at startup; framework ready for HDR environment loading (Phase 5 ✅) |
| SSAO | GTAO compute pass (half-res R8) + bilateral blur; dispatched per-frame between G-Buffer and Lighting (Phase 5 ✅) |
| Frustum culling | None (all objects drawn unconditionally) |
| Occlusion culling | None |
| PBR BRDF | Cook-Torrance GGX (D), Smith (G), Schlick (F) — shared via brdf_common.slang include (Phase 1) |
The point light loop now reached from
DeferredRasterLightingPassRecorder::record() performs per light:
vkCmdClearAttachments— stencil clearvkCmdBindPipeline— stencil volume pipelinevkCmdBindDescriptorSets— lighting descriptorsvkCmdPushConstants— light push constantsvkCmdDraw(36)— stencil cube (36 vertices)vkCmdBindPipeline— point light shading pipelinevkCmdBindDescriptorSets— lighting descriptors (redundant rebind)vkCmdPushConstants— light push constants (duplicate)vkCmdDraw(3)— fullscreen triangle
At 4 lights: 8 draw calls + 4 stencil clears = 12 GPU commands.
At 1000 lights: 2000 draw calls + 1000 stencil clears = 3000 GPU commands — completely command-buffer-bound.
| Gap | Severity | Impact |
|---|---|---|
Hard cap at 4 point lights (kMaxDeferredPointLights) |
Critical | Cannot exceed 4 lights without changing UBO layout, shader, and C++ struct |
| Per-light draw call loop (stencil mark + shade) | Critical | O(N) pipeline binds and draw calls; unusable beyond ~32 lights |
| No light culling | High | Every light is evaluated for every pixel inside its stencil volume |
| No frustum culling for geometry | High | All meshes drawn regardless of camera visibility |
| No indirect/multi-draw batching | Medium | One draw call per mesh per pass |
| Gap | Severity | Impact | Status |
|---|---|---|---|
| No shadows at all | Critical | Scene has no grounding, objects float, no depth cues | ✅ Phase 3 complete |
0.03) |
✅ Phase 5 complete | ||
| ✅ Phase 5 complete | |||
| ~~Wastes ~16 bytes/pixel bandwidth~~ | ✅ Resolved (Phase 2) |
| Gap | Severity | Impact | Status |
|---|---|---|---|
✅ Resolved (Phase 1, extended to forward_transparent.slang) |
|||
LightingData struct layout shared between C++ and Slang via convention only |
✅ Resolved (static_asserts in SceneData.h for CameraData, LightingData, PointLightData, ShadowCascadeData, ShadowData, ObjectData) |
Goal: Eliminate shader code duplication and establish the pattern for shared shader includes.
Risk: Low — shader-only change, no C++ or pipeline modifications.
Status: ✅ Complete
-
Created
shaders/brdf_common.slang— shared BRDF functions (PI,SafeNormalize,DistributionGGX,GeometrySchlickGGX,GeometrySmith,FresnelSchlick) plusReconstructWorldPosition()(moved fromgbuffer_composite.slang). -
Created
shaders/lighting_structs.slang— shared struct definitions (CameraBuffer,PointLightData,LightingBuffer,ShadowCascadeData,ShadowBuffer). -
Updated 4 consumer shaders to
#includeinstead of duplicating:deferred_directional.slangpoint_light.slangpoint_light_stencil_debug.slanggbuffer_composite.slang
-
Updated
cmake/Shaders.cmake— added exclude filters forbrdf_common.slangandlighting_structs.slang(include-only files, not compiled directly).
- No duplicated BRDF functions remain in individual shader files
- All shaders compile without errors
- Rendered output is pixel-identical before and after
- Validation checks pass
Goal: Remove the world-position render target and reconstruct position from depth.
Risk: Medium — changes G-Buffer attachment layout, framebuffer creation, and multiple shaders.
Status: ✅ Complete
Prerequisite: Phase 1 (shared includes) ✅
Removed RT4 (world-position) from the G-Buffer. All shaders now reconstruct position from depth via ReconstructWorldPosition() in brdf_common.slang.
Files modified (7 shaders):
gbuffer.slang— removedPSOutput.positionoutput (5 → 4 MRTs)deferred_directional.slang— replacedgPositionTexturewithTexture2D<float> gDepthTexture(binding 6), usesdepth == 0.0as sky testpoint_light.slang— same depth reconstruction changepoint_light_stencil_debug.slang— same depth reconstruction changeforward_transparent.slang— removed position readgbuffer_composite.slang— removed position samplingpost_process.slang— removed position sampling
Files modified (C++ infrastructure):
FrameResources.h— removedAttachmentImage position, addedVkImageView depthSamplingViewFrameResourceManager.h/.cpp— removed position format, addedVK_IMAGE_USAGE_SAMPLED_BITto depth, created depth-only image view for shader sampling, updated framebuffers from 6 → 5 attachments, descriptor writes use depth viewRenderPassManager.h/.cpp— removedpositionFormatparameter, 6 → 5 attachments, 5 → 4 color refs, lighting depth layout changed toDEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMALRendererFrontend.cpp— updated to match new render pass/framebuffer sizesFrameRecorder.cpp— clear values 6 → 5GraphicsPipelineBuilder.cpp— blend state 5 → 4 color attachments
- G-Buffer uses 4 color attachments + depth instead of 5 + depth
- Position reconstruction matches previous world-position output
- No visual regression in lit scenes
- Validation checks pass
| Attachment | Format | Content |
|---|---|---|
| RT0 — Albedo | R8G8B8A8_UNORM |
Base color + alpha |
| RT1 — Normal | R16G16B16A16_SFLOAT |
World-space normal (encoded) |
| RT2 — Material | R16G16B16A16_SFLOAT |
Metallic (R), Roughness (G), Occlusion (B) |
| RT3 — Emissive | R16G16B16A16_SFLOAT |
Emissive RGB |
| Depth/Stencil | D32_S8 | Reverse-Z depth (sampled via depthSamplingView) + stencil |
Goal: Add directional light shadows via Cascaded Shadow Maps (CSM).
Risk: High — new render passes, new shaders, new descriptor bindings, changes to lighting shaders.
Status: ✅ Complete — all core implementation done, build verified, all 82 tests pass
Prerequisite: Phase 2 (G-Buffer optimization) ✅
Cascaded Shadow Maps split the camera frustum into 4 depth slices, each rendered from the directional light's perspective into a shadow atlas (D32_S8 2D array texture, 2048×2048, 4 layers). The lighting pass samples the shadow atlas with a comparison sampler and 3×3 PCF kernel.
// SceneData.h — implemented
inline constexpr uint32_t kShadowCascadeCount = 4;
inline constexpr uint32_t kShadowMapResolution = 2048;
struct ShadowCascadeData {
alignas(16) glm::mat4 viewProj{1.0f};
alignas(4) float splitDepth{0.0f};
alignas(4) float padding[3]{};
};
struct ShadowData {
ShadowCascadeData cascades[kShadowCascadeCount];
};
struct ShadowPushConstants {
uint32_t objectIndex{0};
uint32_t cascadeIndex{0};
};Matching Slang structs added to lighting_structs.slang (SHADOW_CASCADE_COUNT, ShadowCascadeData, ShadowBuffer).
| Step | Status | Details |
|---|---|---|
SceneData.h data structures |
✅ | kShadowCascadeCount, kShadowMapResolution, ShadowCascadeData, ShadowData, ShadowPushConstants |
lighting_structs.slang structs |
✅ | SHADOW_CASCADE_COUNT, ShadowCascadeData, ShadowBuffer |
shadow_common.slang |
✅ | SelectCascade(), SampleCascadeShadow() helper, ComputeShadowFactor() with 3×3 PCF, normal offset bias, slope-scaled bias, cascade blend overlap zone (10% blend fraction) |
shadow_depth.slang |
✅ | Vertex-only: ObjectBuffer SSBO + ShadowBuffer UBO + push constants (objectIndex, cascadeIndex) |
deferred_directional.slang shadow integration |
✅ | Bindings 7–9 (shadow UBO, atlas Texture2DArray, comparison sampler), computes viewDepth, calls ComputeShadowFactor(), multiplies directLighting |
RenderPassManager shadow render pass |
✅ | D32_S8 depth-only, CLEAR→STORE, UNDEFINED→SHADER_READ_ONLY_OPTIMAL, dependencies for shader read |
ShadowManager class |
✅ | Owns shadow atlas image (2D array, 4 layers), per-cascade views, comparison sampler (VK_COMPARE_OP_GREATER), shadow UBO, descriptor set (shadow UBO binding 0), framebuffers; update() computes cascade splits (practical scheme λ=0.75) and cascade viewProj matrices |
PipelineTypes.h |
✅ | Added VkPipeline shadowDepth, VkPipelineLayout shadow, VkDescriptorSetLayout shadow, VkRenderPass shadow |
GraphicsPipelineBuilder.cpp |
✅ | Shadow depth pipeline: vertex-only, front-face culling, depth bias enabled (constant=4.0, slope=1.5), uses shadow render pass; shadow pipeline layout: {scene, shadow} descriptor sets + ShadowPushConstants |
FrameResourceManager descriptor expansion |
✅ | Lighting layout expanded from 7 → 10 bindings (7=shadow UBO, 8=shadow atlas SAMPLED_IMAGE, 9=shadow comparison SAMPLER); pool sizes updated; updateDescriptorSets() accepts shadow params |
FrameRecorder render graph integration |
✅ | 4 ShadowCascade0..3 passes added between OitClear and Lighting; recordShadowPass() method renders all opaque geometry per cascade |
cmake/Shaders.cmake exclude filter |
✅ | shadow_common.slang excluded from compilation (include-only) |
src/CMakeLists.txt |
✅ | renderer/shadow/ShadowManager.cpp added to VulkanSceneRenderer_renderer library |
RendererFrontend wiring |
✅ | ShadowManager in OwnedSubsystems; createResources() + createFramebuffers() in initialize(); per-frame update() with camera/aspect/lightDirection; shadow descriptor set + framebuffers passed to FrameRecordParams; shadow UBO/atlas/sampler passed to updateDescriptorSets() |
shadow_depth.slang binding fix |
✅ | Corrected descriptor bindings to (1,0) for ObjectBuffer SSBO (scene set 0, binding 1) and (0,1) for ShadowBuffer UBO (shadow set 1, binding 0) |
| Build verification | ✅ | C++ build successful |
| Test verification | ✅ | All 82 existing tests pass |
| Cascade debug view | ✅ | GBufferViewMode::ShadowCascades = 11 — color-coded cascade visualization (green/blue/yellow/red) with blend zone indicator in post_process.slang |
| GUI controls | ⬜ | Shadow on/off toggle, bias controls, PCF kernel size (deferred) |
Descriptor Set 0 (scene): binding 0 = camera UBO, binding 1 = objects SSBO, ...
Descriptor Set 1 (shadow): binding 0 = ShadowBuffer UBO
Push Constants: { objectIndex, cascadeIndex }
Rasterization: front-face culling, depth bias (constant=4.0, slope=1.5)
Depth/Stencil: depth test + write enabled, GREATER_OR_EQUAL (reverse-Z)
Color: no color attachments
Binding 7 (set 0): ConstantBuffer<ShadowBuffer> — cascade viewProj + split depths
Binding 8 (set 0): Texture2DArray<float> — shadow atlas (4 layers)
Binding 9 (set 0): SamplerComparisonState — VK_COMPARE_OP_GREATER, linear PCF filtering
DepthPrepass → GBuffer → OitClear → ShadowCascade0 → ShadowCascade1
→ ShadowCascade2 → ShadowCascade3 → Lighting → OitResolve → PostProcess
Practical split: splitDepth[i] = lerp(near * pow(far/near, i/N), near + (far-near) * i/N, lambda)
lambda = 0.75
- Normal offset:
texelSize * 2.0along surface normal (inshadow_common.slang) - Slope-scaled bias:
max(0.005 * (1.0 - NdotL), 0.001)(in shader) - Pipeline depth bias:
depthBiasConstantFactor = 4.0,depthBiasSlopeFactor = 1.5(inGraphicsPipelineBuilder)
All core CSM infrastructure is fully wired:
- ✅
shadow_common.slangexcluded from Shaders.cmake (include-only file) - ✅
renderer/shadow/ShadowManager.cppadded tosrc/CMakeLists.txt - ✅
shadow_depth.slangdescriptor bindings corrected (binding(1,0)for ObjectBuffer,binding(0,1)for ShadowBuffer) - ✅
ShadowManagerwired intoRendererFrontend:std::unique_ptr<ShadowManager> shadowManagerinOwnedSubsystems- Created in
initialize()afterLightingManager, resources + framebuffers created descriptorSetLayout()passed toPipelineDescriptorLayouts.shadowresources_.renderPasses.shadowpassed toPipelineRenderPasses.shadow- Per-frame
update(camera, aspect, lightDirection)called before command recording shadowDescriptorSet+shadowFramebuffersset inbuildFrameRecordParams()- Shadow UBO, atlas view, and sampler passed to
updateDescriptorSets() - Destroyed in
shutdown()alongside other subsystems
- ✅ Tests updated for new
RenderPassHandles.shadowandshadowDescriptorSetfields - ✅ Build verified, all 82 tests pass
To eliminate hard cuts between cascade levels, ComputeShadowFactor() in shadow_common.slang now detects when a fragment is near a cascade split boundary and blends between the current and next cascade:
- Blend fraction:
CASCADE_BLEND_FRACTION = 0.1(10% of each cascade's depth range) - Blend zone detection:
blendStart = splitDepths[cascadeIndex] - blendZonewhereblendZone = cascadeRange * CASCADE_BLEND_FRACTION - Smooth interpolation:
blendFactor = saturate((viewDepth - blendStart) / blendZone)— linearly interpolates betweenSampleCascadeShadow()results from the current and next cascade - Helper function:
SampleCascadeShadow()extracts single-cascade PCF sampling (normal offset bias + slope-scaled bias + 3×3 PCF) for reuse by the blend logic
A new G-Buffer debug view mode (GBufferViewMode::ShadowCascades = 11) provides visual verification of cascade coverage and blend zones:
- Depth linearization: Converts reverse-Z depth to linear view depth via
viewZ = (near * far) / (far - depth * (far - near)) - Cascade color coding: Green (cascade 0), Blue (cascade 1), Yellow (cascade 2), Red (cascade 3)
- Blend zone indicator: Pixels in the blend overlap zone are brightened to show transition regions
- Scene overlay: Color-coded cascade index composited over a dimmed scene (0.3× albedo)
- Push constants:
PostProcessPushConstantsextended withcameraNear,cameraFar, andcascadeSplits[4]to pass cascade split depths to the post-process shader
- GUI controls: shadow on/off toggle, bias parameter tuning, PCF kernel size
- 4 cascade shadow maps render correctly (verified with shadow cascade debug view)
- Shadow acne is eliminated with bias settings (triple bias: hardware depth bias + normal offset + slope-scaled)
- Peter panning is minimal (front-face culling + moderate bias values)
- Cascade transitions are smooth (10% blend overlap zone with linear interpolation)
- Performance: shadow pass adds < 2ms at 2048² per cascade on mid-range GPU (requires runtime profiling)
- All existing tests pass (84/84)
Goal: Replace the per-light stencil draw loop with a GPU-driven tiled/clustered culling system, enabling hundreds to thousands of dynamic point lights.
Risk: High — new compute pipeline, major changes to lighting data layout and shaders.
Estimated effort: Large (3–5 days)
Prerequisites: Phase 1 (shared includes) ✅, Phase 2 (depth reconstruction) ✅
The current per-light stencil loop generates 2N + N Vulkan commands for N lights (2 draw calls + 1 stencil clear each). This is command-processor-bound and fundamentally cannot scale.
Divide the screen into 16×16 pixel tiles. A compute shader tests each light sphere against each tile frustum. Each tile builds a light index list. The lighting fragment shader reads only its tile's list.
Extend tiles into the Z dimension (depth slices), creating 3D clusters. Better for scenes with wide depth range and many overlapping lights. More complex but similar shader structure.
// SceneData.h — updated
inline constexpr uint32_t kMaxClusteredLights = 4096;
inline constexpr uint32_t kTileSize = 16; // pixels
inline constexpr uint32_t kMaxLightsPerTile = 256;
struct PointLightData {
alignas(16) glm::vec4 positionRadius{0.0f, 0.0f, 0.0f, 1.0f};
alignas(16) glm::vec4 colorIntensity{1.0f, 1.0f, 1.0f, 1.0f};
};
// No change to struct, but stored in SSBO instead of UBO.
struct TileLightGrid {
uint32_t offset; // into global light index list
uint32_t count; // number of lights affecting this tile
};| Resource | Type | Description |
|---|---|---|
| Light SSBO | VkBuffer (storage) |
PointLightData[kMaxClusteredLights] — all scene lights |
| Tile light grid | VkBuffer (storage) |
TileLightGrid[tileCountX * tileCountY] |
| Light index list | VkBuffer (storage) |
uint32_t[] — packed light indices per tile |
| Tile culling compute pipeline | VkPipeline (compute) |
Bins lights into tiles |
| Shader | Description |
|---|---|
shaders/tile_light_cull.slang |
Compute shader: one workgroup per tile; loads tile frustum planes; tests all lights; writes light indices to tile grid |
shaders/tiled_lighting.slang |
Fragment shader: replaces point_light.slang; reads tile grid for pixel's tile; loops over tile's lights; accumulates Cook-Torrance |
-
✅ Added
kMaxClusteredLights,kTileSize,kMaxLightsPerTileconstants inSceneData.h. AddedTileLightGrid,TileCullPushConstants,TiledLightingPushConstantsstructs. Mirrored inlighting_structs.slang. -
✅ Updated
LightingManager:createTiledResources()allocates light SSBO, tile grid SSBO, light index list SSBOupdateLightingData()mirrors point lights topointLightsSsbo_vectordispatchTileCull()uploads SSBO, updates compute set0 descriptors, dispatches compute, inserts barrier- Compute pipeline + 3 descriptor set layouts (camera+depth, light SSBO, tile grid + index list)
- Fragment-side tiled descriptor set (light SSBO + tile grid + index list)
-
✅ Created
shaders/tile_light_cull.slangcompute shader:- One workgroup (16×16) per tile, shared memory for min/max depth + light indices
- Sphere-vs-AABB culling in world space using inverse view-projection
- Fixed-stride tile allocation (tileIndex × MAX_LIGHTS_PER_TILE)
-
✅ Created
shaders/tiled_lighting.slangfragment shader:- Fullscreen triangle, reads G-Buffer + tile grid, loops tile lights
- Cook-Torrance BRDF from
brdf_common.slang
-
✅ Updated
FrameRecorder::buildGraph():- Added
TileCullpass (compute dispatch) between shadow cascades and Lighting - Lighting pass auto-selects tiled path when ready, falls back to stencil loop
- Pipeline barrier compute→fragment on tile SSBOs
- Added
-
Stencil volume infrastructure kept as fallback — tiled path is primary, stencil is auto-selected when tiled resources are not available.
-
Light spawning for testing — deferred to future work (GUI controls).
- ✅
PipelineManager: addedcreateComputePipeline()method - ✅
cmake/Shaders.cmake: added compute shader entry point (computeMain) compilation - ✅
PipelineTypes.h: addedtiledLightinglayout,tiledPointLightpipeline,tileddescriptor layout - ✅
GraphicsPipelineBuilder: loads tiled_lighting vert/frag modules, creates tiled pipeline - ✅
FrameRecordParams: addedtiledDescriptorSet,cameraBuffer,cameraBufferSize,gBufferSampler - ✅
RendererFrontend: wirescreateTiledResources(), tiled descriptor set, camera buffer, sampler
| Metric | Target |
|---|---|
| 64 lights | < 0.5 ms total (cull + shade) |
| 256 lights | < 1.5 ms total |
| 1024 lights | < 4 ms total |
| 4096 lights | < 10 ms total (at 1080p on mid-range GPU) |
- Tile culling compute shader correctly bins lights (verified with tile light heat map debug view:
GBufferViewMode::TileLightHeatMap = 12, tile light count → heat map color) w- [ ] Visual output matches per-light stencil approach for ≤ 4 lights - Stable 60 FPS with 256+ lights at 1080p
- No light popping at tile boundaries (conservative 5% radius padding on sphere-vs-AABB test)
- All existing tests pass (84/84)
- C++ build successful
- All shaders compile (Slang → SPIR-V)
Goal: Replace the flat ambient term with physically-based environment lighting and screen-space ambient occlusion.
Risk: Medium — new textures and shader modifications, but no fundamental pipeline changes.
Status: ✅ Complete — all core implementation done and build verified
Prerequisites: Phase 2 (G-Buffer optimization) ✅, Phase 3 (shadows) ✅
Phase 5 adds two major lighting quality improvements:
- Part A — IBL: Split-sum ambient lighting using irradiance cubemap, pre-filtered specular cubemap, and BRDF integration LUT.
- Part B — GTAO: Screen-space ambient occlusion via compute shader (half-resolution) + bilateral blur.
-
✅ Created
EnvironmentManagerclass (include/Container/renderer/lighting/EnvironmentManager.h,src/renderer/lighting/EnvironmentManager.cpp):- Owns BRDF LUT (512×512 RG16F), placeholder irradiance/prefiltered cubemaps (1×1 white), all samplers
createResources()generates BRDF LUT via one-time compute dispatch at startupcreatePlaceholderCubemaps()creates white 1×1 RGBA16F cubemaps (6 layers) — ready for HDR environment loading- Compute pipeline for BRDF LUT generation with descriptor pool/set/layout
-
✅ Created preprocessing compute shaders (compiled to SPIR-V, ready for HDR environment pipeline):
shaders/brdf_lut.slang— Hammersley sampling, importance-sampled GGX, IBL geometry termshaders/equirect_to_cubemap.slang— equirectangular→cubemap face conversionshaders/irradiance_convolution.slang— hemisphere integral for diffuse irradianceshaders/prefilter_specular.slang— importance-sampled pre-filtered specular per mip level
-
✅ Updated
deferred_directional.slang— full split-sum IBL ambient:- Bindings 10-14: irradiance cubemap, prefiltered cubemap, BRDF LUT, env sampler, BRDF sampler
FresnelSchlickRoughness()for energy-conserving kS/kD split- Diffuse IBL from irradiance cubemap, specular IBL from prefiltered cubemap + BRDF LUT
ambientLighting = (kD * diffuseIBL + specularIBL) * occlusion * ssao
-
✅ Updated
FrameResourceManager— lighting descriptor layout expanded to 17 bindings:- Bindings 10-14: IBL textures and samplers
- Bindings 15-16: AO texture and sampler
- Descriptor pool sizes updated, fallback descriptors for uninitialized resources
-
✅ Updated
RendererFrontend—EnvironmentManagerfully wired:- Created in
initialize()afterShadowManager createResources()called with executable directory (shader path)createGtaoResources()called ininitializeScene()recreateGtaoTextures()called on resize- IBL + AO resources passed to
updateDescriptorSets()
- Created in
-
✅ Created
shaders/gtao.slangcompute shader:- Half-resolution R8 output, 8×8 workgroups
- Fibonacci spiral sampling in normal-oriented hemisphere
- Interleaved gradient noise for spatial jitter
- World-space depth reconstruction + range falloff
- Push constants: aoRadius, aoIntensity, sampleCount, fullWidth, fullHeight
-
✅ Created
shaders/gtao_blur.slangcompute shader:- 5×5 bilateral blur (Gaussian spatial × depth-aware edge preservation)
- Push constants: width, height, depthThreshold
-
✅ GTAO resources in
EnvironmentManager:- Two half-res R8 textures (raw AO + blurred result)
- Compute pipelines + descriptor sets for GTAO and blur
dispatchGtao()updates descriptors per-frame, dispatches compute, inserts barrierdispatchGtaoBlur()dispatches blur, inserts compute→fragment barrier- Configurable settings:
aoRadius(0.5),aoIntensity(1.5),aoSampleCount(16),aoEnabled(true)
-
✅ GTAO pass in render graph (
FrameRecorder::buildGraph()):GTAOpass added betweenTileCullandLighting- Dispatches GTAO compute + blur when
isGtaoReady()is true - Reads depth (via
depthSamplingView) and normals from G-Buffer
-
✅ AO bound in lighting pass — binding 15 (SAMPLED_IMAGE) + binding 16 (SAMPLER) in lighting descriptor set, sampled in
deferred_directional.slangasgAOTexture -
✅
cmake/Shaders.cmake— all Phase 5 shaders excluded from vert/frag glob, added as explicit compute entries -
✅
src/CMakeLists.txt—renderer/lighting/EnvironmentManager.cppadded toVulkanSceneRenderer_renderer
DepthPrepass → GBuffer → OitClear → ShadowCascade0..3
→ TileCull → GTAO (+ Blur) → Lighting → OitResolve → PostProcess
| Binding | Type | Content | Phase |
|---|---|---|---|
| 0 | UBO | CameraBuffer | 2 |
| 1 | Sampler | G-Buffer sampler | 2 |
| 2-5 | SAMPLED_IMAGE | Albedo, Normal, Material, Emissive | 2 |
| 6 | SAMPLED_IMAGE | Depth (via depthSamplingView) | 2 |
| 7 | UBO | ShadowBuffer | 3 |
| 8 | SAMPLED_IMAGE | Shadow atlas (Texture2DArray) | 3 |
| 9 | Sampler | Shadow comparison sampler | 3 |
| 10 | SAMPLED_IMAGE | Irradiance cubemap | 5 |
| 11 | SAMPLED_IMAGE | Pre-filtered specular cubemap | 5 |
| 12 | SAMPLED_IMAGE | BRDF LUT | 5 |
| 13 | Sampler | Environment cubemap sampler | 5 |
| 14 | Sampler | BRDF LUT sampler | 5 |
| 15 | SAMPLED_IMAGE | AO texture (blurred GTAO) | 5 |
| 16 | Sampler | AO sampler | 5 |
- HDR environment map loading (
.hdr/.exrequirectangular → cubemap → irradiance/prefiltered via compute) ✅ Implemented —EnvironmentManager::loadHdrEnvironment()loads.exrvia tinyexr, runsequirect_to_cubemap,irradiance_convolution, andprefilter_specularcompute shaders. Default HDR loaded at startup:hdr/citrus_orchard_road_puresky_4k.exr(seeAppConfig::kDefaultEnvironmentHdrRelativePath). - GUI controls: AO radius, intensity, sample count, on/off toggle
- GUI controls: IBL intensity, environment map selection
- Temporal GTAO for reduced noise
- Screen-space reflections (SSR) complement to IBL
- IBL produces visible environment reflections on metallic surfaces (requires HDR environment map)
- Rough dielectrics receive soft diffuse irradiance (requires HDR environment map)
- SSAO darkens crevices, contact edges, and concavities
- AO does not produce halo artifacts at object edges
- All existing tests pass (84/84)
- C++ build successful
- All shaders compile (Slang → SPIR-V)
Goal: Reduce CPU draw call overhead via frustum culling, Hi-Z occlusion culling, and multi-draw indirect.
Risk: High — requires compute pipelines, indirect draw buffers, and changes to scene data upload.
Estimated effort: Large (3–5 days)
Prerequisites: All prior phases
Status: ✅ Complete
Currently every mesh is drawn unconditionally in every pass (depth prepass, G-Buffer, shadow passes). For scenes with 1000+ objects, CPU-side loop and draw call overhead becomes the bottleneck.
ObjectData— replacedalignas(8) glm::vec2 paddingwithalignas(4) float padding0+alignas(16) glm::vec4 boundingSphere(xyz = world-space center, w = radius)- All 10 vertex shaders updated to match:
float padding0; float4 boundingSphere;inObjectBuffer - Added
GpuDrawIndexedIndirectCommand(matchesVkDrawIndexedIndirectCommand),CullPushConstants,HiZPushConstantsstructs
SceneController::syncObjectDataFromSceneGraph()— computes world-space bounding spheres from primitive vertex AABB → local center + radius → transform by worldTransform with max scale factor
- All 10 vertex shaders migrated from
pc.objectIndexpush constant toSV_InstanceID(maps togl_InstanceIndex) - All
vkCmdDrawIndexedcall sites updated to passdc.objectIndexas thefirstInstanceparameter - This enables both direct draws (CPU loop) and indirect draws (
VkDrawIndexedIndirectCommand.firstInstance) without shader branching - Shadow shader:
cascadeIndexremains in push constants;objectIndexviafirstInstance
- Header:
include/Container/renderer/culling/GpuCullManager.h - Implementation:
src/renderer/culling/GpuCullManager.cpp - Frustum cull pipeline: 5-binding descriptor set (camera UBO, object SSBO, input draws, output draws, draw count) + push constants for object count
- Occlusion cull pipeline: 8-binding descriptor set (camera UBO, object SSBO, input draws, output draws, occlusion count, Hi-Z sampled image, frustum draw count, Hi-Z sampler) + push constants
- Hi-Z generation pipeline: 3-binding descriptor set (source sampled image, sampler, destination storage image) + push constants for mip dimensions
- Buffer management: input (HOST_SEQUENTIAL_WRITE), output (DEVICE_LOCAL + INDIRECT), count (DEVICE_LOCAL + INDIRECT + TRANSFER_DST), occlusion output/count (same layout)
- Hi-Z image: R32_SFLOAT with full mip chain, per-mip storage views, nearest-clamp sampler
- Dispatch flow: FrustumCull → DepthPrepass → HiZGenerate (per-mip with barriers) → OcclusionCull → GBuffer
drawIndirect(): issuesvkCmdDrawIndexedIndirectCountwith frustum-culled resultsdrawIndirectOccluded(): issuesvkCmdDrawIndexedIndirectCountwith occlusion-culled results
| Shader | Purpose |
|---|---|
shaders/frustum_cull.slang |
Gribb-Hartmann frustum plane extraction, sphere-plane test, atomic draw count, 64 threads/group |
shaders/hiz_generate.slang |
Min-reduction depth mip pyramid (8×8, reverse-Z aware, per-mip compute dispatch) |
shaders/occlusion_cull.slang |
Full Hi-Z occlusion culling: sphere-to-screen projection, mip selection, conservative 4-corner sampling, reverse-Z comparison |
- Pass order: FrustumCull → DepthPrepass → HiZGenerate → OcclusionCull → CullStatsReadback → GBuffer → OitClear → ShadowCascades → ...
- FrustumCull uploads draw commands, dispatches compute, produces indirect buffer
- DepthPrepass uses
gpuCullManager_->drawIndirect(cmd)(frustum-culled) - HiZGenerate transitions depth to readable, dispatches per-mip Hi-Z generation, transitions depth back
- OcclusionCull dispatches occlusion cull against Hi-Z pyramid using frustum-culled input
- CullStatsReadback copies frustum + occlusion draw counts to HOST_VISIBLE staging buffer (1-frame latency readback, no GPU stall)
- GBuffer uses
gpuCullManager_->drawIndirectOccluded(cmd)(occlusion-culled) - ShadowCascades use
gpuCullManager_->drawIndirect(cmd)(frustum-culled, single indirect call per cascade) - Diagnostic cube always drawn via direct draw (separate geometry buffer)
CullStatsstruct:totalInputCount,frustumPassedCount,occlusionPassedCountscheduleStatsReadback(cmd): barrier (compute→transfer), 2×vkCmdCopyBufferfrom count buffers to HOST_VISIBLE staging, barrier (transfer→host)collectStats(): reads mapped staging pointer at frame start (after fence), populateslastStats_- Stats exposed in ImGui via
GuiManager::setCullStats()— shows input count, per-stage passed/culled counts
- Purpose: Verify culling correctness by freezing the culling camera and freely moving the rendering camera to inspect what was culled.
- F8 keybinding or ImGui checkbox toggles the freeze state.
- When frozen:
GpuCullManager::freezeCulling(cmd, ...)snapshots the live camera UBO into afrozenCameraBuffer_viavkCmdCopyBuffer. SubsequentdispatchFrustumCull/dispatchOcclusionCulluse the frozen buffer for descriptor binding 0 instead of the live camera. - When unfrozen:
unfreezeCulling()clears the flag; cull dispatches resume using the live camera. - ImGui shows orange "CULLING FROZEN" indicator when active.
- Bidirectional sync between F8 key toggle and ImGui checkbox via
presentSceneControls().
cmake/Shaders.cmake— exclude filters + explicit compute entries for 3 new shaderssrc/CMakeLists.txt— addedrenderer/culling/GpuCullManager.cpp
| File | Change |
|---|---|
include/Container/utility/SceneData.h |
ObjectData.boundingSphere, new GPU-driven structs |
src/renderer/scene/SceneController.cpp |
Bounding sphere computation |
include/Container/renderer/culling/GpuCullManager.h |
New — GPU cull manager class |
src/renderer/culling/GpuCullManager.cpp |
New — full implementation |
shaders/frustum_cull.slang |
New — frustum cull compute |
shaders/hiz_generate.slang |
New — Hi-Z mip generation |
shaders/occlusion_cull.slang |
New — full Hi-Z occlusion cull (sphere projection, mip selection, 4-corner sampling) |
shaders/depth_prepass.slang |
ObjectBuffer + SV_InstanceID |
shaders/gbuffer.slang |
ObjectBuffer + SV_InstanceID |
shaders/shadow_depth.slang |
ObjectBuffer + SV_InstanceID |
shaders/forward_transparent.slang |
ObjectBuffer + SV_InstanceID |
shaders/geometry_debug.slang |
ObjectBuffer + SV_InstanceID |
shaders/normal_validation.slang |
ObjectBuffer + SV_InstanceID |
shaders/object_normals.slang |
ObjectBuffer + SV_InstanceID |
shaders/surface_normals.slang |
ObjectBuffer + SV_InstanceID |
shaders/wireframe_debug.slang |
ObjectBuffer + SV_InstanceID |
shaders/wireframe_fallback.slang |
ObjectBuffer + SV_InstanceID |
src/renderer/debug/DebugOverlayRenderer.cpp |
firstInstance = dc.objectIndex in all draw calls |
src/renderer/core/FrameRecorder.cpp |
GpuCullManager wiring, FrustumCull pass, indirect draw paths |
include/Container/renderer/core/FrameRecorder.h |
GpuCullManager params |
include/Container/renderer/core/RendererFrontend.h |
GpuCullManager ownership |
src/renderer/core/RendererFrontend.cpp |
GpuCullManager creation + wiring, collectStats() at frame start, F8 freeze toggle |
include/Container/utility/GuiManager.h |
setCullStats(), setFreezeCulling(), freezeCullingRequested() methods |
src/utility/GuiManager.cpp |
Stats display, freeze checkbox + orange indicator in Scene Controls |
include/Container/renderer/debug/DebugRenderState.h |
freezeCulling + freezeCullingKeyDown fields |
include/Container/renderer/core/FrameRecorder.h |
debugFreezeCulling in FrameRecordParams |
cmake/Shaders.cmake |
3 new compute shader entries |
src/CMakeLists.txt |
GpuCullManager.cpp source |
| Scene | Current | Target |
|---|---|---|
| 100 objects | ~100 draw calls/pass | 1 indirect call/pass |
| 1000 objects | ~1000 draw calls/pass | 1 indirect call/pass, ~200 drawn after culling |
| 10,000 objects | Not viable | 1 indirect call/pass, ~500 drawn after culling |
- Frustum culling correctly hides off-screen objects (freeze-camera debug mode implemented: F8 toggle)
- Hi-Z occlusion culling correctly hides fully occluded objects
- Draw call count drops to 1–2 per pass regardless of object count
- No visual popping or missing objects
- All existing tests pass
- C++ build successful
- All shaders compile (Slang → SPIR-V)
| File | Role | Phase |
|---|---|---|
include/Container/utility/SceneData.h |
LightingData, PointLightData, ShadowData, ShadowCascadeData, ShadowPushConstants, TileLightGrid, TileCullPushConstants, constants |
1, 3, 4 |
include/Container/renderer/lighting/LightingManager.h |
LightingManager class, SceneLightingAnchor, LightPushConstants, tiled culling |
4 |
src/renderer/lighting/LightingManager.cpp |
Light data computation, descriptor setup, gizmo drawing, tiled culling dispatch | 4 |
include/Container/renderer/shadow/ShadowManager.h |
ShadowManager class — owns shadow atlas, UBO, sampler, framebuffers, cascade computation |
3 |
src/renderer/shadow/ShadowManager.cpp |
Shadow manager implementation — cascade splits, viewProj, VMA image/buffer allocation | 3 |
include/Container/renderer/lighting/EnvironmentManager.h |
EnvironmentManager class — BRDF LUT, placeholder cubemaps, GTAO compute dispatch, IBL accessors |
5 |
src/renderer/lighting/EnvironmentManager.cpp |
IBL resource creation (BRDF LUT compute, cubemaps, samplers), GTAO pipelines/textures, per-frame dispatch | 5 |
include/Container/renderer/core/FrameRecorder.h |
FrameRecordParams (shadow/tiled/camera/gpuCullManager params), recordShadowPass() |
3, 4, 6 |
src/renderer/core/FrameRecorder.cpp |
Render pass recording; shadow cascade + tile cull + GTAO + frustum cull + indirect draw + lighting passes in render graph | 3, 4, 5, 6 |
include/Container/renderer/resources/FrameResources.h |
Per-frame GPU resources (framebuffers, descriptor sets, depthSamplingView) |
2 |
include/Container/renderer/resources/FrameResourceManager.h |
Descriptor layout creation (17-binding lighting layout with shadow 7–9, IBL 10–14, AO 15–16) | 2, 3, 5 |
src/renderer/resources/FrameResourceManager.cpp |
Attachment/framebuffer/descriptor creation, updateDescriptorSets() with shadow/IBL/AO params |
2, 3, 5 |
include/Container/renderer/core/RenderPassManager.h |
RenderPasses struct (includes shadow render pass) |
3 |
src/renderer/core/RenderPassManager.cpp |
Render pass creation (depth prepass, G-Buffer, shadow, lighting, post-process) | 2, 3 |
include/Container/renderer/pipeline/PipelineTypes.h |
PipelineLayouts (shadow, tiledLighting), GraphicsPipelines (shadowDepth, tiledPointLight), descriptor layouts |
3, 4 |
include/Container/renderer/pipeline/GraphicsPipelineBuilder.h |
Pipeline builder interface | 3 |
src/renderer/pipeline/GraphicsPipelineBuilder.cpp |
Shadow depth + tiled lighting pipeline creation | 3, 4 |
include/Container/renderer/core/RendererFrontend.h |
OwnedSubsystems (shadowManager, environmentManager, gpuCullManager), buildFrameRecordParams() |
3, 5, 6 |
src/renderer/core/RendererFrontend.cpp |
ShadowManager + EnvironmentManager + GpuCullManager wiring, IBL/AO descriptor updates | 3, 5, 6 |
| File | Role | Phase |
|---|---|---|
shaders/brdf_common.slang |
Shared BRDF functions + FresnelSchlickRoughness() + ReconstructWorldPosition() (include-only) |
1, 5 |
shaders/lighting_structs.slang |
Shared struct definitions: CameraBuffer, LightingBuffer, ShadowBuffer, TileLightGrid (include-only) |
1, 4 |
shaders/shadow_common.slang |
SelectCascade(), SampleCascadeShadow(), ComputeShadowFactor() with 3×3 PCF + cascade blend (include-only) |
3 |
shaders/shadow_depth.slang |
Vertex-only shadow depth shader (ObjectBuffer SSBO + ShadowBuffer UBO + push constants) | 3 |
shaders/tile_light_cull.slang |
Compute shader: tiled light culling (16×16 tiles, sphere-vs-AABB) | 4 |
shaders/tiled_lighting.slang |
Fragment shader: tiled deferred point light shading | 4 |
shaders/brdf_lut.slang |
Compute shader: BRDF integration LUT (512×512, RG16F, split-sum) | 5 |
shaders/equirect_to_cubemap.slang |
Compute shader: equirectangular→cubemap face conversion | 5 |
shaders/irradiance_convolution.slang |
Compute shader: hemisphere integral for diffuse irradiance cubemap | 5 |
shaders/prefilter_specular.slang |
Compute shader: importance-sampled pre-filtered specular per mip | 5 |
shaders/gtao.slang |
Compute shader: GTAO half-res ambient occlusion | 5 |
shaders/gtao_blur.slang |
Compute shader: bilateral blur for GTAO denoising | 5 |
shaders/gbuffer.slang |
G-Buffer fill (4 MRT output) | 2 |
shaders/deferred_directional.slang |
Directional light fullscreen pass (Cook-Torrance + CSM shadows + IBL + SSAO, bindings 0–16) | 1, 2, 3, 5 |
shaders/point_light.slang |
Point light fullscreen pass (depth reconstruction, shared BRDF) | 1, 2 |
shaders/point_light_stencil_debug.slang |
Debug visualization of point light stencil volumes | 1 |
shaders/light_stencil.slang |
Stencil volume cube vertex shader | — |
shaders/light_gizmo.slang |
Light source gizmo visualization | — |
shaders/gbuffer_composite.slang |
Post-process composite (depth reconstruction, shared BRDF) | 1, 2 |
shaders/post_process.slang |
Tone mapping, final output, bloom composite, shadow cascade debug view (mode 11), tile light heat map (mode 12) | 2, Bloom, 3, 4 |
shaders/surface_normal_common.slang |
Shared normal utilities (include-only) | — |
shaders/frustum_cull.slang |
Compute shader: frustum culling (Gribb-Hartmann, sphere test, indirect buffer output) | 6 |
shaders/hiz_generate.slang |
Compute shader: Hi-Z depth mip pyramid (min reduction, reverse-Z) | 6 |
shaders/occlusion_cull.slang |
Compute shader: Hi-Z occlusion culling (sphere projection, mip selection, conservative sampling) | 6 |
shaders/bloom_downsample.slang |
Compute shader: bloom brightness extraction + 13-tap progressive downsample (Jimenez 2014) | Bloom |
shaders/bloom_upsample.slang |
Compute shader: bloom 9-tap tent filter upsample with additive blend | Bloom |
| Phase | File | Type | Status |
|---|---|---|---|
| 6 | shaders/frustum_cull.slang |
Compute shader | ✅ Created |
| 6 | shaders/hiz_generate.slang |
Compute shader | ✅ Created |
| 6 | shaders/occlusion_cull.slang |
Compute shader | ✅ Created (full Hi-Z implementation) |
| 6 | include/Container/renderer/culling/GpuCullManager.h |
C++ header | ✅ Created |
| 6 | src/renderer/culling/GpuCullManager.cpp |
C++ source | ✅ Created |
| Feature | File | Type | Status |
|---|---|---|---|
| Bloom | shaders/bloom_downsample.slang |
Compute shader | ✅ Created (13-tap downsample, Jimenez 2014, soft knee threshold, Karis 2014) |
| Bloom | shaders/bloom_upsample.slang |
Compute shader | ✅ Created (9-tap tent filter upsample, additive blend) |
| Bloom | include/Container/renderer/effects/BloomManager.h |
C++ header | ✅ Created |
| Bloom | src/renderer/effects/BloomManager.cpp |
C++ source | ✅ Created |
Status: ✅ Complete
Physically-motivated bloom using a dual-filter downsample/upsample mip chain (Jimenez, "Next Generation Post Processing in Call of Duty: Advanced Warfare", SIGGRAPH 2014). Bright regions of the HDR scene color are extracted with a soft knee threshold (Karis 2014) and progressively blurred through a 6-level mip pyramid, then composited back into the scene before tone mapping.
-
Brightness extraction + Downsample chain (compute shader,
bloom_downsample.slang):- Mip 0 applies soft-knee threshold:
max(0, brightness - threshold + knee)² / (4 * knee) - Each subsequent mip uses a 13-tap filter (Jimenez 2014) for high-quality progressive downsample
- 6 mip levels at successive half-resolutions (
R16G16B16A16_SFLOAT)
- Mip 0 applies soft-knee threshold:
-
Upsample chain (compute shader,
bloom_upsample.slang):- 9-tap tent filter (3×3 bilinear) for smooth upsampling
- Additive blend with destination mip at each level
- Final pass writes result to mip 0
-
Compositing (in
post_process.slang):- Bloom result (mip 0) sampled and added to HDR scene color before ACES tone mapping
- Controlled by
bloomEnabledandbloomIntensitypush constant fields
... → OIT Resolve → Bloom → Post-Process → ...
The Bloom pass sits between OIT Resolve and Post-Process. It transitions the
scene color attachment from COLOR_ATTACHMENT_OPTIMAL to SHADER_READ_ONLY_OPTIMAL,
runs the downsample/upsample compute dispatches, then the post-process shader
samples the bloom result via descriptor set bindings 7–8.
| Control | Range | Default | Description |
|---|---|---|---|
| Bloom Enabled | checkbox | ✅ on | Enable/disable the entire bloom pass |
| Bloom Threshold | 0.0 – 5.0 | 1.0 | HDR brightness threshold for extraction |
| Bloom Knee | 0.0 – 1.0 | 0.1 | Soft knee width (smooth threshold transition) |
| Bloom Intensity | 0.0 – 2.0 | 0.3 | Strength of bloom contribution to final image |
| Bloom Radius | 0.1 – 3.0 | 1.0 | Filter radius for upsample tent filter |
| File | Changes |
|---|---|
shaders/post_process.slang |
Added bloom texture/sampler bindings (7–8), bloom compositing before tone mapping, shadow cascade debug view (outputMode 11) |
include/Container/utility/SceneData.h |
Extended PostProcessPushConstants with bloomEnabled, bloomIntensity, cameraNear, cameraFar, cascadeSplits[4], tileCountX, totalLights |
src/renderer/core/FrameRecorder.cpp |
Added Bloom render graph pass, bloom push constant fields |
include/Container/renderer/core/FrameRecorder.h |
Added BloomManager* to FrameRecordParams and constructor |
src/renderer/core/RendererFrontend.cpp |
BloomManager lifecycle, descriptor updates, GUI bidirectional sync |
include/Container/renderer/core/RendererFrontend.h |
Added unique_ptr<BloomManager> to OwnedSubsystems |
src/renderer/resources/FrameResourceManager.cpp |
Extended post-process descriptor layout (7→9 bindings), pool sizes, descriptor writes |
include/Container/renderer/resources/FrameResourceManager.h |
Extended updateDescriptorSets() signature |
src/utility/GuiManager.cpp |
Added bloom GUI section (checkbox + sliders), setBloomSettings() |
include/Container/utility/GuiManager.h |
Added bloom settings members and accessors |
cmake/Shaders.cmake |
Added bloom shader compilation entries |
src/CMakeLists.txt |
Added renderer/effects/BloomManager.cpp |
- Olsson & Assarsson, "Tiled Shading" (2011) — foundational paper on tiled deferred
- Olsson, Billeter & Assarsson, "Clustered Deferred and Forward Shading" (2012)
- Epic Games / Unreal Engine 5 — Lumen uses screen-space probes + clustered forward for local lights
- id Software / DOOM (2016) — textbook clustered forward implementation
- Dimitrov & Wimmer, "Cascaded Shadow Maps" (ShaderX5, 2006)
- Microsoft DirectX 11 CSM sample — practical split scheme and stabilization
- Reverse-Z CSM — use
VK_COMPARE_OP_GREATERfor better depth precision
- Karis, "Real Shading in Unreal Engine 4" (SIGGRAPH 2013) — split-sum approximation
- Lagarde & de Rousiers, "Moving Frostbite to PBR" (SIGGRAPH 2014)
- Jimenez et al., "Practical Real-Time Strategies for Accurate Indirect Occlusion" (SIGGRAPH 2016) — GTAO
- Bavoil & Sainz, "Screen Space Ambient Occlusion" (NVIDIA, 2008) — original SSAO/HBAO
- Wihlidal, "Optimizing the Graphics Pipeline with Compute" (GDC 2016)
- Ubisoft, "GPU-Driven Rendering Pipelines" (SIGGRAPH 2015)
- Jimenez, "Next Generation Post Processing in Call of Duty: Advanced Warfare" (SIGGRAPH 2014) — dual-filter downsample/upsample
- Karis, "Real Shading in Unreal Engine 4" (SIGGRAPH 2013) — soft knee threshold for bloom extraction