Commit e84fcbf

Author: BiomeOS Developer
fix: Remove orphaned doc comments from normalization files
1 parent a77d4a5 commit e84fcbf

3 files changed

Lines changed: 3 additions & 33 deletions
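
For context on the commit title: an "orphaned" doc comment is a `///` comment with no item after it, for example one left at the end of an `impl` block. rustc rejects a doc comment that documents nothing, which is presumably why these comments were deleted (the commit itself does not state the reason). A minimal sketch follows; `Executor` and `answer` are hypothetical names used for illustration and do not come from the repository.

```rust
/// Hypothetical stand-in type; this is not the repository's `WgpuExecutor`.
struct Executor;

impl Executor {
    /// Attached doc comment: it documents the method directly below it.
    fn answer(&self) -> u32 {
        42
    }

    // A `///` comment placed here, followed only by the closing brace,
    // would be orphaned and fail to compile. A regular `//` comment,
    // like this one, is accepted because it is not a doc attribute.
}
```

Converting a leftover `///` block to `//`, or moving it above the item it describes, are the usual alternatives to deleting it.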

showcase/gpu-universal/ml-inference/src/wgpu/normalization/groupnorm.rs

Lines changed: 1 addition & 3 deletions
@@ -268,7 +268,5 @@ impl WgpuExecutor {
 self.read_buffer(&staging_buffer, total_size).await
 }

-/// Execute Instance Normalization
-///
-/// Normalizes each instance (batch sample) independently across spatial dimensions.
+}
 }
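
The removed comment describes instance normalization only in words. As a rough CPU reference for what it describes, the sketch below assumes a flattened NCHW layout and the common definition in which every (sample, channel) plane is normalized over its spatial elements. `instance_norm_reference` and its parameters are hypothetical and are not taken from the repository's GPU implementation.

```rust
/// CPU sketch of instance normalization (hypothetical helper).
/// Assumes `input` is laid out as [batch, channels, spatial] and that each
/// (sample, channel) plane is normalized independently.
fn instance_norm_reference(
    input: &[f32],
    batch: usize,
    channels: usize,
    spatial: usize,
    epsilon: f32,
) -> Vec<f32> {
    assert_eq!(input.len(), batch * channels * spatial, "shape mismatch");
    let mut output = vec![0.0f32; input.len()];
    if spatial == 0 {
        return output; // nothing to normalize
    }
    for (src, dst) in input
        .chunks_exact(spatial)
        .zip(output.chunks_exact_mut(spatial))
    {
        // Mean and population variance over the spatial elements of one plane.
        let mean = src.iter().sum::<f32>() / spatial as f32;
        let variance =
            src.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / spatial as f32;
        let inv_std = 1.0 / (variance + epsilon).sqrt();
        for (out, v) in dst.iter_mut().zip(src) {
            *out = (v - mean) * inv_std;
        }
    }
    output
}
```

Because each plane is normalized on its own, two planes that differ only by a constant scale factor produce the same output.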

showcase/gpu-universal/ml-inference/src/wgpu/normalization/instance_norm.rs

Lines changed: 1 addition & 9 deletions
@@ -167,13 +167,5 @@ impl WgpuExecutor {
 self.read_buffer(&staging_buffer, total_size).await
 }

-/// Execute RMS Normalization
-///
-/// Simpler alternative to LayerNorm used in modern transformers.
-/// RMSNorm(x) = x / sqrt(mean(x²) + epsilon) * gamma
-///
-/// No mean subtraction, only RMS scaling - faster and simpler than LayerNorm.
-/// Used in: LLaMA, GPT-NeoX, T5, modern large language models.
-///
-/// Deep Debt: Runtime dimensions, learnable scale parameters.
+}
 }
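
The formula in the removed comment, RMSNorm(x) = x / sqrt(mean(x²) + epsilon) * gamma, can be checked against a small CPU reference. The sketch below assumes a single vector with one `gamma` scale per element; `rms_norm_reference` is a hypothetical helper and not the repository's GPU code path.

```rust
/// CPU sketch of RMSNorm for one vector (hypothetical helper):
/// out[i] = x[i] / sqrt(mean(x^2) + epsilon) * gamma[i]
fn rms_norm_reference(x: &[f32], gamma: &[f32], epsilon: f32) -> Vec<f32> {
    assert_eq!(x.len(), gamma.len(), "gamma needs one scale per element");
    if x.is_empty() {
        return Vec::new();
    }
    // Mean of squares; unlike LayerNorm there is no mean subtraction.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + epsilon).sqrt();
    x.iter().zip(gamma).map(|(v, g)| v * inv_rms * g).collect()
}
```

With `gamma` set to all ones and `epsilon` near zero, the mean of the squared outputs is approximately 1, which is a quick sanity check for any implementation of the formula.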

showcase/gpu-universal/ml-inference/src/wgpu/normalization/rms_norm.rs

Lines changed: 1 addition & 21 deletions
@@ -143,26 +143,6 @@ impl WgpuExecutor {
 self.read_buffer(&staging_buffer, total_size).await
 }

-/// Execute Fused LayerNorm: SINGLE-PASS layer normalization
-///
-/// **BREAKTHROUGH OPTIMIZATION**: Combines all 3 passes into ONE kernel launch!
-///
-/// Previous (3-pass):
-/// - Pass 1: Compute partial stats → launch overhead + sync
-/// - Pass 2: Finalize stats → launch overhead + sync
-/// - Pass 3: Normalize → launch overhead + sync
-/// - Total: 3x launch overhead + 2x global sync
-///
-/// Fused (1-pass):
-/// - Single kernel launch with Welford's algorithm in shared memory
-/// - Immediate normalization (no intermediate global memory)
-/// - Grid-stride loop for large inputs
-/// - Total: 1x launch overhead + 0x global sync
-///
-/// **Expected Speedup**: 8-12x for LLaMA-scale (118ms → 10-15ms)
-///
-/// **Memory Pattern**: Streaming (one read, one write, no intermediate buffers)
-///
-/// Formula: output = (input - mean) / sqrt(variance + epsilon) * gamma + beta
+}
 }
 }
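
The removed comment names Welford's algorithm and gives the formula output = (input - mean) / sqrt(variance + epsilon) * gamma + beta. The sketch below shows the sequential form of that idea on the CPU: one pass to accumulate mean and variance, then the normalization. It is not the fused GPU kernel, which per the comment runs Welford in shared memory across threads. The function names are hypothetical, and population variance (divide by N) is an assumption.

```rust
/// Welford's single-pass mean and population variance (hypothetical helper).
fn welford_mean_variance(x: &[f32]) -> (f32, f32) {
    let mut count = 0.0f32;
    let mut mean = 0.0f32;
    let mut m2 = 0.0f32; // running sum of squared deviations from the mean
    for &v in x {
        count += 1.0;
        let delta = v - mean;
        mean += delta / count;
        m2 += delta * (v - mean); // uses the updated mean
    }
    if count == 0.0 {
        (0.0, 0.0)
    } else {
        (mean, m2 / count)
    }
}

/// CPU sketch of LayerNorm over one vector (hypothetical helper):
/// out[i] = (x[i] - mean) / sqrt(variance + epsilon) * gamma[i] + beta[i]
fn layer_norm_reference(x: &[f32], gamma: &[f32], beta: &[f32], epsilon: f32) -> Vec<f32> {
    assert_eq!(x.len(), gamma.len(), "gamma needs one scale per element");
    assert_eq!(x.len(), beta.len(), "beta needs one shift per element");
    let (mean, variance) = welford_mean_variance(x);
    let inv_std = 1.0 / (variance + epsilon).sqrt();
    x.iter()
        .zip(gamma.iter().zip(beta))
        .map(|(v, (g, b))| (v - mean) * inv_std * g + b)
        .collect()
}
```

Welford's update avoids a separate pass to compute the mean before the variance, which is the property the comment relies on to merge the statistics passes into one.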
