
Advanced Investigation 11: GPU-Driven Pipeline Building Blocks

1. Lecture Focus

Concept: Compose culling, compaction, bucketing, and argument-like generation (records shaped like indirect draw or dispatch arguments).
Why this matters: These stages mirror how modern GPU-driven renderers build their own work lists on the GPU rather than on the CPU.
Central question: What bottlenecks appear when primitive stages are composed into a submission-ready pipeline?
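To make the composition concrete, here is a minimal CPU reference sketch of the stages chained together. It is not a GPU implementation: every function name, the depth-range visibility test, the per-object bucket key, and the argument record fields are assumptions chosen for illustration. A reference like this can also serve as the correctness oracle for GPU variants.

```python
from itertools import accumulate

def cull(depths, near, far):
    """Stage 1: emit one visibility flag per object (1 = keep)."""
    return [1 if near <= d <= far else 0 for d in depths]

def exclusive_scan(flags):
    """Stage 2: exclusive prefix sum; offsets[i] is the output slot of object i."""
    inclusive = list(accumulate(flags))
    return [0] + inclusive[:-1] if flags else []

def compact(flags, offsets):
    """Stage 3: scatter surviving object indices into a dense list."""
    out = [0] * sum(flags)
    for i, keep in enumerate(flags):
        if keep:
            out[offsets[i]] = i
    return out

def bucket(visible_ids, bucket_of, num_buckets):
    """Stage 4: group survivors by a per-object key (hypothetical material id)."""
    buckets = [[] for _ in range(num_buckets)]
    for i in visible_ids:
        buckets[bucket_of[i]].append(i)
    return buckets

def make_args(buckets):
    """Stage 5: one argument-like record per bucket (field names are illustrative)."""
    args, first = [], 0
    for ids in buckets:
        args.append({"instance_count": len(ids), "first_instance": first})
        first += len(ids)
    return args

# Tiny hand-checkable scene: six objects, two buckets.
depths = [0.5, 12.0, 3.0, 7.5, 40.0, 1.0]
bucket_of = [0, 1, 1, 0, 1, 1]
flags = cull(depths, near=1.0, far=10.0)
offsets = exclusive_scan(flags)
visible = compact(flags, offsets)
buckets = bucket(visible, bucket_of, num_buckets=2)
args = make_args(buckets)
```

Each stage consumes the previous stage's output, so on a GPU every arrow in this chain is a potential synchronization point and an extra round trip through memory. That is where the central question above starts to bite.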

2. Learning Objectives

By the end of this investigation, you should be able to:

justify why this systems-level problem matters in practical GPU pipelines
design a controlled benchmark matrix with clear independent variables
interpret results without confusing correlation and causation
extract design rules and limitations suitable for portfolio presentation

3. Theory Primer (Lecture Notes)

Start with a pipeline-level mental model, not just a kernel-level view.
Identify resource bottlenecks: memory traffic, synchronization, occupancy pressure, and control-flow efficiency.
Separate algorithmic cost from implementation artifacts.
Record assumptions and known unknowns before running the benchmarks.
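One way to separate algorithmic cost from implementation artifacts is to write down a memory-traffic floor before measuring. The sketch below counts bytes moved by an unfused cull, scan, and scatter sequence and divides by an assumed bandwidth. The traffic model (4-byte elements, no caching, no fusion) and the 500 GB/s figure are assumptions for illustration, not properties of any particular GPU.

```python
def compaction_traffic_bytes(n, visible_ratio, elem_bytes=4):
    """Bytes moved by an unfused flag -> scan -> scatter sequence (assumed model)."""
    visible = int(n * visible_ratio)
    flag_write = n * elem_bytes        # cull writes one flag per object
    scan = 2 * n * elem_bytes          # scan reads flags, writes offsets
    scatter_read = 2 * n * elem_bytes  # scatter reads flags and offsets
    scatter_write = visible * elem_bytes
    return flag_write + scan + scatter_read + scatter_write

def bandwidth_floor_ms(total_bytes, bandwidth_gb_s):
    """Lower bound on runtime if memory bandwidth were the only limit."""
    return total_bytes / (bandwidth_gb_s * 1e9) * 1e3

total_bytes = compaction_traffic_bytes(n=1_000_000, visible_ratio=0.25)
floor_ms = bandwidth_floor_ms(total_bytes, bandwidth_gb_s=500.0)  # assumed bandwidth
```

A measured runtime far above this floor points at synchronization, occupancy, or control flow rather than raw traffic. A runtime close to it suggests that only reducing bytes moved (for example by fusing stages) will help.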

4. Hypothesis

Scan/compaction stages and list construction dominate end-to-end runtime unless input ordering and intermediate buffering are tuned.

5. Experimental Design

Independent variables

Stage variants, visibility ratios, bucket strategy, ordering coherence.
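The independent variables can be written down as an explicit matrix so that every run is traceable to one point. The sketch below enumerates the full cross product and also a one-factor-at-a-time sweep around a baseline. All level names and values here are hypothetical placeholders; substitute the variants actually implemented.

```python
from itertools import product

# Hypothetical levels for each independent variable.
MATRIX = {
    "stage_variant": ["baseline", "fused_scan_compact"],
    "visibility_ratio": [0.1, 0.5, 0.9],
    "bucket_strategy": ["atomic_append", "count_then_scatter"],
    "ordering": ["coherent", "shuffled"],
}

def full_matrix(matrix):
    """Every combination of levels, as a list of dicts."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]

def one_factor_sweep(matrix, baseline):
    """Baseline point plus points that differ from it in exactly one variable."""
    points = [dict(baseline)]
    for key, values in matrix.items():
        for value in values:
            if value != baseline[key]:
                points.append({**baseline, key: value})
    return points

baseline = {key: values[0] for key, values in MATRIX.items()}
all_points = full_matrix(MATRIX)
sweep_points = one_factor_sweep(MATRIX, baseline)
```

The one-factor sweep is cheaper and makes each change attributable to a single variable. The full matrix is needed only when interactions between variables (for example ordering coherence and bucket strategy) are part of the question.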

Controlled variables

Fixed benchmark harness and timing method (GPU timestamp queries).
Fixed data generation seeds per scenario where reproducibility is needed.
Fixed correctness oracle per variant.
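A seeded generator and a trivially correct oracle might look like the sketch below. The function names are hypothetical. One detail worth deciding up front: a compaction variant that appends through atomics generally does not preserve input order, so the oracle needs an order-insensitive mode for those variants.

```python
import random

def make_flags(n, visibility_ratio, seed):
    """Reproducible visibility flags: the same seed always yields the same scenario."""
    rng = random.Random(seed)
    return [1 if rng.random() < visibility_ratio else 0 for _ in range(n)]

def oracle_compact(flags):
    """Reference result: indices of visible objects in input order."""
    return [i for i, keep in enumerate(flags) if keep]

def check_variant(variant_output, flags, order_preserving=True):
    """Compare a variant's output against the oracle."""
    expected = oracle_compact(flags)
    if order_preserving:
        return list(variant_output) == expected
    # Unordered variants must still produce exactly the same set of indices.
    return sorted(variant_output) == expected

flags_a = make_flags(n=64, visibility_ratio=0.5, seed=7)
flags_b = make_flags(n=64, visibility_ratio=0.5, seed=7)
```

Edge-case datasets worth keeping alongside the random ones include the empty input, all objects visible, and no objects visible.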

Metrics

Per-stage runtime, end-to-end throughput, bottleneck share (each stage's fraction of end-to-end time).
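These three metrics can be derived from the same per-stage timings. The sketch below assumes stages run back to back without overlap, so that per-stage times sum to the end-to-end time; if stages overlap, measure end-to-end time separately. The timings used are placeholder inputs chosen for easy arithmetic, not measurements.

```python
def bottleneck_share(stage_ms):
    """Fraction of summed stage time spent in each stage (assumes no overlap)."""
    total = sum(stage_ms.values())
    return {name: ms / total for name, ms in stage_ms.items()}

def throughput_objects_per_s(n_objects, stage_ms):
    """Objects processed per second across the whole pipeline."""
    return n_objects * 1e3 / sum(stage_ms.values())

def dominant_stage(stage_ms):
    """Name of the stage with the largest share."""
    return max(stage_ms, key=stage_ms.get)

# Placeholder timings in milliseconds (illustrative only, not measured).
stage_ms = {"cull": 0.25, "scan": 0.5, "compact": 1.0, "bucket": 0.25}
share = bottleneck_share(stage_ms)
rate = throughput_objects_per_s(1_000_000, stage_ms)
worst = dominant_stage(stage_ms)
```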

6. Implementation Plan

Implement a minimally correct baseline variant first.
Add one optimized variant at a time to preserve causal clarity.
Add deterministic correctness tests and edge-case datasets.
Run warmup plus repeated measured runs for each matrix point.
Export raw data and metadata to versioned result files.
Generate charts and write a short interpretation section with caveats.
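The warmup, repeated-run, and export steps above can be sketched as a small harness. This version times a CPU callable with `time.perf_counter_ns` purely as a stand-in; in the real harness the samples would come from GPU timestamp queries. The record layout, the `schema_version` field, and the metadata keys are assumptions for illustration.

```python
import json
import statistics
import time

def measure(fn, warmup=3, repeats=10):
    """Run fn a few times unmeasured, then collect one sample per measured run."""
    for _ in range(warmup):
        fn()
    samples_ns = []
    for _ in range(repeats):
        start = time.perf_counter_ns()  # stand-in for a GPU timestamp query pair
        fn()
        samples_ns.append(time.perf_counter_ns() - start)
    return samples_ns

def to_record(point, samples_ns, metadata):
    """Bundle raw samples with the matrix point and run metadata."""
    return {
        "schema_version": 1,  # hypothetical versioning field
        "point": point,
        "samples_ns": samples_ns,
        "median_ns": statistics.median(samples_ns),
        "metadata": metadata,
    }

def workload():
    """Placeholder workload; replace with the dispatch under test."""
    return sum(range(1000))

samples = measure(workload, warmup=3, repeats=10)
record = to_record(
    point={"stage_variant": "baseline", "visibility_ratio": 0.5},
    samples_ns=samples,
    metadata={"gpu": "unknown", "driver": "unknown"},  # fill in per machine
)
serialized = json.dumps(record, indent=2, sort_keys=True)
```

Keeping the raw samples, not only the median, lets later analysis recompute statistics or spot outliers without rerunning the benchmark.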

7. Analysis Prompts

Which stage or operation dominates total cost and why?
Which tuning parameter are the results most sensitive to?
Which findings are likely architecture-dependent?
What would change in a production rendering/compute pipeline?

8. Deliverables

Pipeline stage table, bottleneck map, architecture notes.

Minimum artifact set:

one core chart
one summary table
one short conclusions page with limitations

9. Portfolio Framing Notes

Frame conclusions as measured observations plus reasoned interpretation.
Avoid claiming universal behavior from one GPU unless cross-GPU validated.
Highlight tradeoffs and failure modes, not just best numbers.