Commit cd99e7a

fix #575 use a flag to enable large-kernel algo
1 parent f101f97 commit cd99e7a

3 files changed, +260 -182 lines changed

CHANGELOG.md

+1 -1
@@ -2,7 +2,7 @@
 
 ## [2.3.5] - 2023-03-24
 ### Fixed
-- pypi project reach size limit, so we need to assign a new version number.
+- use a flag to enable large kernel algo (need time to compile at runtime)
 
 ## [2.3.4] - 2023-03-23
 ### Added

docs/PERFORMANCE_GUIDE.md

+3 -1
@@ -26,4 +26,6 @@
 * spconv 2.x in Windows 10 is 1.5x~2x slower than Linux. use Linux if possible.
 * If you train with float32 and ampere or later GPUs, you can set ```spconv.constants.SPCONV_ALLOW_TF32``` to enable faster fp32 training.
 See [benchmark](BENCHMARK.md) for more performance details of different algorithms.
-* Different CUDA version of spconv may have different performance. Use newest cuda version if possible. For example, spconv-cu117 is faster than spconv-cu114, spconv-cu114 is faster than spconv-cu111.
+* Different CUDA version of spconv may have different performance. Use newest cuda version if possible. For example, spconv-cu117 is faster than spconv-cu114, spconv-cu114 is faster than spconv-cu111.
+* if your kernel size volume larger than 32, spconv will use a slower (and more inaccurate in fp16) algorithm. to use a faster algo for large kernel size (need time to compile at runtime), use ```large_kernel_fast_algo=True```
+* use ```SparseGlobalMaxPool``` instead of use large kernel size when you need global pool.
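
A rough sketch of the kernel-volume rule added to the performance guide above. It assumes "kernel size volume" means the product of the per-axis kernel sizes and takes the threshold of 32 straight from the guide text. The helper names `kernel_volume` and `needs_large_kernel_flag` are hypothetical and not part of spconv. The commented-out layer construction assumes that `large_kernel_fast_algo` is accepted as a keyword argument by the convolution layer, which the guide implies but this diff does not show.

```python
from math import prod

# Threshold quoted in the performance guide: kernels whose volume is larger
# than 32 use the slower algorithm unless the fast large-kernel flag is set.
LARGE_KERNEL_VOLUME_THRESHOLD = 32


def kernel_volume(kernel_size, ndim=3):
    """Return the number of kernel offsets (hypothetical helper).

    An int kernel_size is broadcast to all `ndim` axes, e.g. 3 -> (3, 3, 3).
    """
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size,) * ndim
    return prod(kernel_size)


def needs_large_kernel_flag(kernel_size, ndim=3):
    """True when the kernel volume exceeds the threshold from the guide."""
    return kernel_volume(kernel_size, ndim) > LARGE_KERNEL_VOLUME_THRESHOLD


if __name__ == "__main__":
    for ks in (3, 5, (7, 7, 1)):
        print(ks, kernel_volume(ks), needs_large_kernel_flag(ks))

    # Assumed usage, not verified against this commit's code changes:
    # import spconv.pytorch as spconv
    # conv = spconv.SubMConv3d(64, 64, kernel_size=5,
    #                          large_kernel_fast_algo=True)
```

Under these assumptions a 3x3x3 kernel (volume 27) stays on the default path, while a 5x5x5 kernel (volume 125) is the case where the flag would matter, at the cost of the runtime compilation the guide mentions.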
