Commit f3b53ed
ex_*stencil_cpu: add legacy-transposed pass; BatchAccessor: document §8l
Added a tap-outer, voxel-inner variant of the Legacy path (same
leaf-only ReadAccessor, same probeLeaf + getValue mechanics, just the
nested loops swapped) as a new `legacy-transposed` benchmark pass in
both examples. Checksums match the voxel-outer `legacy` pass byte-
for-byte on both synthetic and narrow-band workloads.
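To make the loop swap concrete, here is a minimal sketch of the two traversal orders. It assumes a hypothetical 1-D grid with three taps and a clamped read standing in for the probeLeaf + getValue lookup; `kTaps`, `readClamped`, `gatherVoxelOuter` and `gatherTapOuter` are illustrative names, not code from the examples.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1-D stand-in for the stencil: taps at offsets -1, 0, +1.
constexpr std::array<int, 3> kTaps = {-1, 0, +1};

// Clamped read, standing in for the per-tap leaf lookup (an assumption of
// this sketch; the real pass goes through a ReadAccessor).
inline std::uint32_t readClamped(const std::vector<std::uint32_t>& grid,
                                 std::ptrdiff_t i) {
    const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(grid.size()) - 1;
    if (i < 0) i = 0;
    if (i > last) i = last;
    return grid[static_cast<std::size_t>(i)];
}

// Voxel-outer: for each voxel, visit every tap before moving on.
inline std::vector<std::uint32_t>
gatherVoxelOuter(const std::vector<std::uint32_t>& grid) {
    std::vector<std::uint32_t> out(grid.size(), 0);
    for (std::size_t v = 0; v < grid.size(); ++v)
        for (int t : kTaps)
            out[v] += readClamped(grid, static_cast<std::ptrdiff_t>(v) + t);
    return out;
}

// Tap-outer: for each tap, sweep all voxels. Same reads, loops swapped;
// the output array doubles as the per-voxel scratch accumulator.
inline std::vector<std::uint32_t>
gatherTapOuter(const std::vector<std::uint32_t>& grid) {
    std::vector<std::uint32_t> out(grid.size(), 0);
    for (int t : kTaps)
        for (std::size_t v = 0; v < grid.size(); ++v)
            out[v] += readClamped(grid, static_cast<std::ptrdiff_t>(v) + t);
    return out;
}
```

Each voxel accumulates its taps in the same order in both variants, so the two results are identical element by element, which is the property the checksum comparison relies on.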
During the experiment we hit a GCC inlining pitfall: a runtime-args
inner lambda `[&](int di, int dj, int dk)` invoked 18 times via a
parameter-pack fold did *not* get inlined (the lambda body contains a
128-iteration loop, so 18 inlined copies exceed GCC's inline budget).
Result: 18 explicit call instructions to a 542-byte processTap
function with a 6-register prologue/epilogue per call, plus tap
offsets becoming runtime register arguments (one spilled to the
stack), accounting for ~13 % of the observed slowdown vs. Legacy. The
fix is a templated lambda
`[&]<int DI, int DJ, int DK>() [[gnu::always_inline]]` dispatched via
`.template operator()<...>()`. The standalone processTap symbol
vanishes, and the transposed body grows from 4.4 KB to 9.8 KB,
comparable to Legacy's 10.5 KB.
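The templated-lambda dispatch can be illustrated with a small, hypothetical 1-D sketch (C++20). It is not the code from the examples: `sumTaps` and the single offset parameter `D` are made up for illustration, and the attribute is written in the GNU `__attribute__((always_inline))` spelling after the parameter list, which is an assumption about what GCC and Clang accept rather than the exact form used in the commit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: sum three compile-time taps (-1, 0, +1) over a
// clamped 1-D array.
inline long sumTaps(const std::vector<int>& data) {
    long total = 0;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());

    // The tap offset D is a template parameter, so every instantiation
    // sees it as a compile-time constant instead of a runtime argument.
    // The always_inline attribute asks the compiler to inline each
    // instantiation regardless of its usual size heuristics.
    auto processTap = [&]<int D>() __attribute__((always_inline)) {
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            std::ptrdiff_t j = i + D;
            if (j < 0) j = 0;          // clamp at the left edge
            if (j >= n) j = n - 1;     // clamp at the right edge
            total += data[static_cast<std::size_t>(j)];
        }
    };

    // The call operator is a member template with no deducible
    // arguments, so it has to be named explicitly at each call site.
    processTap.template operator()<-1>();
    processTap.template operator()<0>();
    processTap.template operator()<+1>();
    return total;
}
```

In the real pass the three explicit calls would be generated by the parameter-pack fold over all 18 taps; they are spelled out here only to keep the sketch short.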
Measured at ~32M active voxels on i9-285K (24 threads):
- Narrowband taperLER: Legacy 2.2 ns/vox vs Transposed 2.1 ns/vox
(marginal, within the ~10 % noise floor)
- Synthetic 64M/50%: Legacy 2.4 ns/vox vs Transposed 2.8 ns/vox
(+19 %, outside noise)
Implementation verdict: LegacyStencilAccessor's voxel-outer moveTo
stays the default. Tap-outer has no consistent perf advantage, and
voxel-outer wins on cleanliness (self-contained accessor, no scratch
arrays, no compiler-inlining fragility, 1:1 mapping to the stencil-
operator mental model). `legacy-transposed` kept as a benchmark pass
for future reference.
BatchAccessor.md §8l captures the experiment, the inlining-pitfall
lesson, the measurement matrix, and the implementation-quality
rationale behind keeping voxel-outer as the production default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Efty Sifakis <esifakis@nvidia.com>
1 parent 24c2de7 commit f3b53ed
3 files changed
Lines changed: 238 additions & 5 deletions
File tree
- nanovdb/nanovdb
  - examples
    - ex_narrowband_stencil_cpu
    - ex_stencil_gather_cpu
  - util
Lines changed: 82 additions & 3 deletions
Lines changed: 81 additions & 2 deletions