fix #575 use a flag to enable large-kernel algo

FindDefinition · FindDefinition · commit cd99e7a63be2 · 2023-03-24T00:30:45.000+08:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,7 +2,7 @@
 
 ## [2.3.5] - 2023-03-24
 ### Fixed 
-- pypi project reach size limit, so we need to assign a new version number.
+- Use a flag to enable the large-kernel algorithm (it needs time to compile at runtime).
 
 ## [2.3.4] - 2023-03-23
 ### Added 
diff --git a/docs/PERFORMANCE_GUIDE.md b/docs/PERFORMANCE_GUIDE.md
@@ -26,4 +26,6 @@
 * spconv 2.x in Windows 10 is 1.5x~2x slower than Linux. use Linux if possible.
 * If you train with float32 and ampere or later GPUs, you can set ```spconv.constants.SPCONV_ALLOW_TF32``` to enable faster fp32 training.
 See [benchmark](BENCHMARK.md) for more performance details of different algorithms.
-* Different CUDA version of spconv may have different performance. Use newest cuda version if possible. For example, spconv-cu117 is faster than spconv-cu114, spconv-cu114 is faster than spconv-cu111.
+* Different CUDA version of spconv may have different performance. Use newest cuda version if possible. For example, spconv-cu117 is faster than spconv-cu114, spconv-cu114 is faster than spconv-cu111.
+* If the kernel volume is larger than 32, spconv uses a slower algorithm that is also less accurate in fp16. To use a faster algorithm for large kernels, set ```large_kernel_fast_algo=True```; it needs time to compile at runtime.
+* Use ```SparseGlobalMaxPool``` instead of a large kernel size when you need global pooling.
diff --git a/spconv/pytorch/conv.py b/spconv/pytorch/conv.py