feat: [cuda] performance improvement reducers for axis=None and lazy parents allocation
#8451
| Job | Run time |
|---|---|
| | 1s |
| | 1s |