Tagline: ReLU vs Leaky ReLU vs GELU in a PointNet baseline — accuracy, stability, and latency under a fair, single-switch setup.
- Leaky ReLU and GELU tied for best accuracy (89.98%) on ModelNet10 in this setup.
- ReLU delivered the fastest inference (2.4× faster than GELU) and is the most deployment/INT8-friendly.
- GELU trained stably but was significantly slower on this backbone.
Quantify the impact of ReLU, Leaky ReLU (α = 0.01), and GELU (tanh approximation) on accuracy, training stability, and inference latency in a PointNet classifier trained on ModelNet10.
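For reference, a minimal sketch of how these three variants are typically instantiated in PyTorch, with the α and tanh-approximation settings matching the configurations above:

```python
import torch.nn as nn

# The three activations under comparison, configured as in this study.
ACTIVATIONS = {
    "relu": nn.ReLU(),
    "leaky": nn.LeakyReLU(negative_slope=0.01),  # α = 0.01
    "gelu": nn.GELU(approximate="tanh"),         # tanh approximation of erf GELU
}
```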
| Activation | Best Acc (%) | Latency (ms/batch) | Latency (ms/sample) | Speed vs ReLU |
|---|---|---|---|---|
| ReLU | 89.76 | 28.64 ± 1.99 | 0.224 | 1.00× |
| Leaky ReLU | 89.98 | 39.82 ± 2.17 | 0.311 | 0.72× |
| GELU | 89.98 | 69.86 ± 1.66 | 0.546 | 0.41× |
Key takeaways:
- Accuracy: Leaky ReLU and GELU tied at 89.98% (+0.22 pp vs ReLU).
- Speed winner: ReLU (2.4× faster than GELU per batch on a T4).
- Stability: All three converge smoothly; GELU shows slightly lower final gradient norms (0.67 vs 0.74-0.77).
- Model: Simplified PointNet with a single activation switch, applied identically in backbone and head (see the model sketch after this list).
- Dataset: ModelNet10 (~5k CAD meshes), 1,024 points per shape (surface sampling, normalized).
- Training: 10 epochs, Adam (lr = 0.001), batch size 128, StepLR at epoch 20 (γ = 0.5; the step falls outside the 10-epoch run, so the learning rate effectively stayed at 0.001).
- Initialization: He/Kaiming for all runs.
- Seed: 42 (single-seed; see Limitations).
- Hardware: Google Colab, Tesla T4.
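A minimal sketch of the single-switch setup, assuming an illustrative simplified PointNet; the layer sizes and module layout here are our assumptions, not the exact architecture used in the runs:

```python
import torch
import torch.nn as nn

def make_act(name: str) -> nn.Module:
    """Single switch: one activation choice, used identically everywhere."""
    return {
        "relu": nn.ReLU(),
        "leaky": nn.LeakyReLU(0.01),
        "gelu": nn.GELU(approximate="tanh"),
    }[name]

class PointNetLite(nn.Module):
    """Illustrative simplified PointNet; layer sizes are assumptions."""
    def __init__(self, act: str = "relu", num_classes: int = 10):
        super().__init__()
        a = lambda: make_act(act)
        self.backbone = nn.Sequential(              # shared per-point MLP
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), a(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), a(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), a(),
        )
        self.head = nn.Sequential(                  # classifier on pooled features
            nn.Linear(1024, 512), a(),
            nn.Linear(512, num_classes),
        )
        for m in self.modules():                    # He/Kaiming init, as above
            if isinstance(m, (nn.Conv1d, nn.Linear)):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, num_points) normalized point clouds
        feats = self.backbone(x).max(dim=2).values  # global max pool over points
        return self.head(feats)
```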
Fair-comparison guarantees:
- Same activation everywhere (no mixing).
- Same initialization, seed, optimizer, schedule, and augmentations.
- Latency measured with warmup, model.eval(), and CUDA synchronization; both per-batch and per-sample numbers reported (see the timing sketch below).
- Dead units tracked per channel (a channel counts as "dead" if it never activates on the evaluation set), not as element-wise sparsity.
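A sketch of a timing routine consistent with these guarantees (warmup, model.eval(), CUDA synchronization, per-batch and per-sample numbers); the helper name, warmup count, and iteration count are our assumptions:

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, batch, n_warmup=10, n_iters=100):
    """Mean ± std latency in ms/batch, plus ms/sample, with CUDA sync."""
    model.eval()
    for _ in range(n_warmup):                    # warm up kernels and caches
        model(batch)
    torch.cuda.synchronize()
    times = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        model(batch)
        torch.cuda.synchronize()                 # wait for async GPU work
        times.append((time.perf_counter() - t0) * 1e3)
    mean = sum(times) / len(times)
    std = (sum((t - mean) ** 2 for t in times) / (len(times) - 1)) ** 0.5
    return mean, std, mean / batch.shape[0]      # ms/batch mean, std, ms/sample
```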
Accuracy
Leaky ReLU and GELU both achieved 89.98%, tying for best accuracy. ReLU reached 89.76%, only 0.22 pp behind. The differences are within typical single-seed variance, suggesting all three activations perform comparably on this task.
Latency
ReLU is consistently fastest and fuses well in typical deployment stacks. GELU's tanh-approximation math makes it about 2.4× slower than ReLU per batch; Leaky ReLU adds roughly 39% latency over ReLU.
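The fusion point is concrete: PyTorch's eager-mode quantization tooling can fold Conv + BN + ReLU into a single fused module, while Leaky ReLU and GELU have little or no equivalent support. A sketch, assuming the PointNetLite layout above:

```python
from torch.ao.quantization import fuse_modules

model = PointNetLite(act="relu").eval()          # fusion requires eval mode
fused = fuse_modules(model, [                    # Conv1d + BN1d + ReLU triples
    ["backbone.0", "backbone.1", "backbone.2"],
    ["backbone.3", "backbone.4", "backbone.5"],
    ["backbone.6", "backbone.7", "backbone.8"],
])
```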
Training stability
All three converge smoothly. Final gradient norms: ReLU 0.74, Leaky ReLU 0.77, GELU 0.67 (lowest, consistent with smoother optimization).
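The gradient norms above are global L2 norms over all parameters; a minimal sketch of how such a value can be logged per step (the exact reduction used in the study may differ):

```python
import torch

def global_grad_norm(model) -> float:
    """L2 norm over all parameter gradients; call after loss.backward()."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# In the training loop, after loss.backward():
#   norm = global_grad_norm(model)  # log per step; compare final values
```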
Dead units
With proper initialization and normalization, dead ReLUs were a non-issue: 0.0% of channels were dead, i.e., every channel activated at least once on the eval set. Leaky ReLU's benefit is mainly insurance under tougher conditions (e.g., poorer initialization, stronger class imbalance).
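A sketch of the per-channel criterion, assuming activations shaped (batch, channels, points) and a forward hook on the activation module; the helper and hook are ours:

```python
import torch

@torch.no_grad()
def dead_channel_fraction(model, act_layer, loader, device="cuda"):
    """Fraction of channels in `act_layer` that never output > 0 on the eval set."""
    ever_active = None

    def hook(_module, _inputs, out):             # out: (batch, channels, points)
        nonlocal ever_active
        act = (out > 0).any(dim=0).any(dim=-1)   # per-channel "fired at least once"
        ever_active = act if ever_active is None else ever_active | act

    handle = act_layer.register_forward_hook(hook)
    model.eval()
    for points, _labels in loader:               # loader yields (B, 3, N) clouds
        model(points.to(device))
    handle.remove()
    return 1.0 - ever_active.float().mean().item()
```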
ReLU
- Choose for: tight latency budgets, embedded/edge deployment, and INT8 quantization.
- Why: fastest (28.64 ms/batch), widely optimized, quantization-friendly.
- This study: 89.76% accuracy, 0.224 ms/sample, 0.0% dead neurons.
Leaky ReLU (α = 0.01)
- Choose for: a slight robustness boost against dead units or mild class imbalance, at a modest latency cost.
- This study: 89.98% accuracy (tied best), 39.82 ms/batch, +39% latency vs ReLU.
GELU (tanh approximation)
- Choose for: attention/Transformer-style blocks, or tasks where smooth gating helps and latency is less critical.
- Caveat: typically less INT8-friendly without QAT or special kernels.
- This study: 89.98% accuracy (tied best), 69.86 ms/batch, +144% latency vs ReLU.
With only a 0.22 pp accuracy difference between ReLU (89.76%) and the tied leaders (89.98%), the practical choice depends on your constraints:
- Latency-critical (real-time)? → ReLU (2.4× faster than GELU, 0.0% dead neurons with proper init)
- Slight accuracy/robustness boost acceptable? → Leaky ReLU (39% slower but tied for best)
- Transformer/attention modules? → GELU (smooth gradients, but 2.4× slower)
- Single-notebook pipeline (Colab) with sections: setup → data → model (activation switch) → training → evaluation → profiling → plots.
- Save checkpoints and logs after each run.
- Warm up before timing; report both ms/batch and ms/sample with the same batch size across runs.
Limitations
- CAD vs LiDAR: ModelNet10 meshes are CAD; results generalize directionally, not absolutely, to LiDAR perception.
- Single seed: numbers can shift slightly; run three seeds for tighter confidence intervals (see the seeding sketch after this list).
- Short training (10 epochs): Longer training may increase absolute accuracy but is unlikely to change relative ranking.
- Batch size 128: Latency numbers scale with batch size; measure on target hardware for final decisions.
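A seeding sketch for repeatable single-seed runs and the multi-seed extension suggested above; the helper name is ours:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed all RNGs the pipeline touches (Python, NumPy, PyTorch CPU/GPU)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# e.g., rerun the whole pipeline for seeds (42, 43, 44) and report mean ± std.
```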
On this PointNet baseline and setup: Leaky ReLU and GELU tied for best accuracy (89.98%), ReLU delivered the best latency (28.64 ms/batch) and deployability, and all three trained stably.
For real-time perception stacks: Start with ReLU (fastest, 0.0% dead neurons with He init). Consider Leaky if you observe dead-unit issues or class imbalance in your specific dataset. Reserve GELU for attention-heavy modules where its smooth gradients provide tangible benefits and latency budgets allow.
Practical recommendation: given the marginal 0.22 pp accuracy difference and ReLU's 2.4× speed advantage, ReLU is the pragmatic choice for production PointNet-style models unless your application specifically needs the dead-unit robustness of Leaky ReLU or the smoothness of GELU.



