Bugfixes, benchmarks and improvements to FlatMap by kennyweiss · Pull Request #1882 · llnl/axom

kennyweiss · 2026-06-11T04:08:29Z

Summary

This PR adds bugfixes and performance improvements to axom::FlatMap.
It also adds an initial benchmark suite comparing FlatMap against std::unordered_map, google sparsehash and std::map.
Bugfixes:
- There were bugs in truncating hashes to 32 bits (when IndexType is 32 bits), in casting from float to int when hashing, in using operator[] on const maps, and in the copy-assignment operator.
Optimizations:
- Since the group count is always a power of two, we can use bitmasks rather than mod (%) when wrapping probe indices.
- Specialized batch insertion for the sequential exec policy, where we don't need to worry about synchronization.

Results/comparisons

Serial benchmark results using a RelWithDebInfo config (lower is better)

We now get roughly comparable or better results in serial compared to std::unordered_map, std::map and our vendored google sparsehash. FlatMap uses our default hash function; FlatMapFastHash uses a different hash function that appears to be somewhat faster.

Hashing 32K pairs ($2^{15}$)

Hashing 1M pairs ($2^{20}$)

Serial vs. OMP vs GPU

This branch shows modest speedups over develop (the plots compare serial and OMP runs on this branch against axom@develop).
Results are shown for SEQ and for OMP with {1,2,4,8,16,32,64} threads, run with:

OMP_NUM_THREADS=<n> OMP_PLACES=cores OMP_PROC_BIND=close

Hashing 32K pairs ($2^{15}$)

Hashing 1M pairs ($2^{20}$)

Hashing 32M pairs ($2^{25}$)


DeviceHashHelper returned axom::IndexType and integer keys were converted before the 64-bit mixer ran. With AXOM_USE_64BIT_INDEXTYPE=OFF every key wider than 32 bits is truncated first, so keys equal mod 2^32 produce identical final hashes. This was happening in the Morton codes in spin's SparseOctreeLevel and in numerics/quadrature.
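The collision pattern described above can be reproduced with a minimal sketch. The mixer here is the well-known MurmurHash3 `fmix64` finalizer, used only as a stand-in; the names `mix64`, `hashTruncatedFirst` and `hashFullWidth` are hypothetical and are not axom's API.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in 64-bit mixer (MurmurHash3 fmix64 constants). This is an
// assumption for illustration, not necessarily the mixer axom uses.
inline std::uint64_t mix64(std::uint64_t x)
{
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33;
  return x;
}

// Buggy pattern: the key is narrowed to 32 bits *before* mixing, so the
// upper 32 bits never influence the result.
inline std::uint64_t hashTruncatedFirst(std::uint64_t key)
{
  return mix64(static_cast<std::uint32_t>(key));
}

// Fixed pattern: the mixer sees all 64 bits of the key.
inline std::uint64_t hashFullWidth(std::uint64_t key)
{
  return mix64(key);
}
```

With this sketch, the keys `5` and `5 + 2^32` hash identically through `hashTruncatedFirst` but differently through `hashFullWidth`, since each step of the mixer is invertible on 64-bit values.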

The floating-point specialization returned the key converted to an integer. Every key sharing an integer part therefore collided -- e.g. every number strictly between -1 and 1 converted to the integer 0, so a FlatMap keyed on fractional floats degenerated into one probe chain with O(size) inserts and finds.
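As an illustration of the difference, the sketch below contrasts hashing a double by integer conversion with hashing its bit pattern. Hashing the bit pattern is one common remedy and is an assumption here; the PR text does not spell out the exact replacement. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Buggy pattern: conversion truncates toward zero, so every key in (-1, 1)
// maps to 0 and all of them collide.
inline std::uint64_t hashByConversion(double key)
{
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(key));
}

// Possible fix (assumed): use the key's bit pattern as the hash input.
// In practice the result would still be passed through a 64-bit mixer.
inline std::uint64_t hashByBits(double key)
{
  // -0.0 and +0.0 compare equal, so fold them onto one bit pattern.
  if(key == 0.0) { key = 0.0; }
  std::uint64_t bits;
  static_assert(sizeof(bits) == sizeof(key), "expects a 64-bit double");
  std::memcpy(&bits, &key, sizeof(bits));
  return bits;
}
```

For example, `0.25` and `-0.75` collide under `hashByConversion` but produce distinct values under `hashByBits`.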

The quadratic probe advance in probeIndex and probeEmptyIndex wrapped using a mod (%) operator. Since the group count is always a power of two, we can use a bitmask instead. Adds a cross-group probe stress test: a degenerate hash drives 600 keys through one initial group so inserts, lookups, misses, erases, and reinserts all walk and wrap the group sequence.
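The equivalence between the mask and the modulo can be sketched as follows. The function names and the `step` parameter are hypothetical; only the power-of-two property of the group count comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

inline bool isPowerOfTwo(std::uint64_t n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

// Advance a probe position by `step` groups and wrap around the table.
// For a power-of-two group count, `x & (numGroups - 1)` equals
// `x % numGroups` and avoids an integer division.
inline std::uint64_t nextGroup(std::uint64_t group,
                               std::uint64_t step,
                               std::uint64_t numGroups)
{
  assert(isPowerOfTwo(numGroups));
  return (group + step) & (numGroups - 1);
}
```

The mask form is only valid when the group count is a power of two, which is why the sketch asserts that precondition.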

BM_Find_Hit looks keys up in the order they were inserted. Since node-based maps walk the heap nearly sequentially, the hardware prefetcher hides their pointer-chasing latency. This commit adds find_hit_shuffled (same keys, independently shuffled lookup order) and find_hit_randkeys (distinct pseudorandom 64-bit keys, shuffled lookup order) to better exhibit expected lookup behavior.

When find_with_hash() is not inlined, every lookup is more expensive (extra registers, and a stack spill for the key) and requires loop-invariant setup that cannot be hoisted out of the caller's lookup loop. Forcing the probe path inline removed 20-40% of find_hit time and 15-35% of find_miss time for FlatMap<int64,int64> at n = 2^16 and 2^20.

`getEmplacePos()` computed `Hash{}(key)`, then called `find(key)`, which hashed the same key a second time. It then performed a floating-point division against MAX_LOAD_FACTOR on every insertion to decide whether to grow. Note: This change reduced instruction count, but the performance improvements were within run-to-run noise in our measurements.

FlatMap rounds its group count up to a power of two, so for a fixed element count the achievable load factors form a geometric ladder and a nominal target is quantized to the next rung at or below it. At n = 2^16 the 0.70 target and the default reserve(n) geometry coincide (actual load factor 0.533, which is why find_hit_lf0p70 reproduced find_hit to within noise), and the 0.50 target lands at 0.267 -- a table twice as large. That scenario was really measuring a larger working set, not a shorter probe sequence.
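The quantization can be checked with a short calculation. The sketch below assumes 15 slots per group (the figure quoted for the scalar scan elsewhere in this PR) and a simple "smallest power-of-two group count that meets the target" rule; the function names are hypothetical and the real sizing logic in FlatMap may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Smallest power of two that is >= n (for n >= 1).
inline std::uint64_t roundUpPow2(std::uint64_t n)
{
  std::uint64_t p = 1;
  while(p < n) { p <<= 1; }
  return p;
}

// Group count needed to hold numElems at or below targetLoadFactor,
// rounded up to a power of two. slotsPerGroup = 15 is an assumption.
inline std::uint64_t groupCountFor(std::uint64_t numElems,
                                   double targetLoadFactor,
                                   int slotsPerGroup = 15)
{
  assert(targetLoadFactor > 0.0 && targetLoadFactor <= 1.0);
  const double minGroups =
    std::ceil(static_cast<double>(numElems) / (targetLoadFactor * slotsPerGroup));
  return roundUpPow2(static_cast<std::uint64_t>(minGroups));
}

// Load factor actually achieved for a given group count.
inline double actualLoadFactor(std::uint64_t numElems,
                               std::uint64_t numGroups,
                               int slotsPerGroup = 15)
{
  return static_cast<double>(numElems) /
    (static_cast<double>(numGroups) * slotsPerGroup);
}
```

For n = 2^16, a 0.70 target yields 8192 groups (load factor 65536 / 122880, about 0.533), while a 0.50 target yields 16384 groups (about 0.267), matching the two rungs quoted above.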

The SSE2 path of GroupBucket::visitHashBucket() stops visiting as soon as the visitor returns false, but the scalar fallback (including GPU path) ignored the return value and kept scanning all 15 slots. In-tree visitors and the duplicate check in the batched insert path return false to mean 'stop', and extra visits load and compare a key which could incur a cache miss per probe group.
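A minimal sketch of a scalar scan that honors the visitor's return value is shown below. The `Group` layout and the `visitMatches` name are hypothetical simplifications, not the actual GroupBucket implementation; only the 15-slot group size and the "false means stop" convention come from the description above.

```cpp
#include <cassert>
#include <cstdint>

constexpr int SLOTS_PER_GROUP = 15;

// Hypothetical group: one byte of reduced-hash metadata per slot.
struct Group
{
  std::uint8_t meta[SLOTS_PER_GROUP];
};

// Calls visitor(slot) for every slot whose metadata matches `tag`.
// Stops as soon as the visitor returns false, mirroring the early-exit
// behavior of the SIMD path. Returns the number of visitor calls made.
template <typename Visitor>
int visitMatches(const Group& g, std::uint8_t tag, Visitor&& visitor)
{
  int visits = 0;
  for(int slot = 0; slot < SLOTS_PER_GROUP; ++slot)
  {
    if(g.meta[slot] != tag) { continue; }
    ++visits;
    if(!visitor(slot)) { break; }  // visitor asked us to stop
  }
  return visits;
}
```

With two matching slots in a group, a visitor that returns false on its first call is invoked once rather than twice, which saves the extra key load and comparison.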


kennyweiss added 10 commits June 10, 2026 18:37

Fix FlatMap copy assignment -- need to compare addresses, not values

6e2ab90

Adds typed tests covering assignment over a non-empty target, source preservation, and self-assignment.
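The self-assignment guard can be illustrated with a small stand-in container. `Table` is hypothetical and is not axom::FlatMap; the sketch only shows why the guard should compare object addresses, and it exercises the same three cases the typed tests cover.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in container used to illustrate the guard.
class Table
{
public:
  Table() = default;
  Table(const Table&) = default;

  Table& operator=(const Table& other)
  {
    // Compare addresses, not values: the guard must detect that `other`
    // is the same object. Without it, the clear() below would destroy
    // the data we are about to copy from during self-assignment.
    if(this != &other)
    {
      m_data.clear();
      m_data.insert(m_data.end(), other.m_data.begin(), other.m_data.end());
    }
    return *this;
  }

  void push(int v) { m_data.push_back(v); }
  std::size_t size() const { return m_data.size(); }

private:
  std::vector<int> m_data;
};
```

Assigning over a non-empty target replaces its contents, the source is left unchanged, and assigning an object to itself through a reference leaves it intact.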

Remove FlatMap's const operator[], which inserts for missing keys

b04e96d

Removing it cannot break callers since this would not have compiled. Const callers should use find()/at()/count()/contains(). at() throws std::out_of_range on a missing key.

Adds initial benchmark for flatmap vs map vs unordered_map vs sparsehash

59c191b

Improves performance of FlatMap batched insertion for SEQ policy

4d2b750

Adds FlatMap benchmarks for hits and misses of precached entities

1be08c9

Exploring faster hash functions

0d23442

Adds benchmark for flatmap load factor

1bbdc90

kennyweiss self-assigned this Jun 11, 2026

kennyweiss added the bug, Core and Performance labels Jun 11, 2026

kennyweiss added 16 commits June 11, 2026 00:07

Fixes hip build via missing AXOM_HOST_DEVICE

55e4a99

FlatMap: Fuse the find and empty-slot probes in getEmplacePos()

7f9cada

Emplacing a new key walked the probe sequence twice -- first to check whether the key was already present and then to find an empty slot for it. We now do both within a single call.
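The idea of a fused probe can be sketched on a much simpler table than FlatMap's grouped layout. The example below uses plain linear probing with no deletions, where the first empty slot both ends the search and gives the insertion position; `Slot`, `ProbeResult` and `findOrEmpty` are hypothetical names, and the real `getEmplacePos()` works on groups rather than single slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Slot
{
  bool used = false;
  std::uint64_t key = 0;
};

struct ProbeResult
{
  std::size_t index;  // slot holding the key, or the first empty slot
  bool found;         // true if the key was already present
};

// Single pass over the probe sequence. Assumes a power-of-two table size,
// at least one empty slot, and no tombstones (all simplifications).
inline ProbeResult findOrEmpty(const std::vector<Slot>& slots,
                               std::uint64_t key,
                               std::uint64_t hash)
{
  assert(!slots.empty() && (slots.size() & (slots.size() - 1)) == 0);
  const std::size_t mask = slots.size() - 1;
  std::size_t i = static_cast<std::size_t>(hash) & mask;
  for(;;)
  {
    if(!slots[i].used) { return {i, false}; }
    if(slots[i].key == key) { return {i, true}; }
    i = (i + 1) & mask;
  }
}
```

A caller can insert at `index` when `found` is false, so a miss costs one walk of the probe sequence instead of two.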

FlatMap: Keep move semantics during batch insertion

30fde63

Improves FlatMap benchmark

f7fff6c

* Disables sequential find_hit search by default since it is not representative.
* Guards several tests by the feature they are testing.

FlatMap: Device hash type must be 64 bits

d1dfef6

Also adds more device hashing tests

Moves AXOM_FORCE_INLINE to core's Macros.hpp

d73b677

Adds utility function for initializing initial probe group via bitshift and masking

c885eb2

FlatMap: Improves documentation and testing of find_with_hash

78f96b7

Also improves device hashing of floating point types (float and long double).

Adds benchmarks for device construction and lookup

59f602b

FlatMap: Generalizes the device benchmarks to other execution spaces, including omp

7af4c53

Add number of threads to omp benchmarks

8c3ff60

Updates RELEASE-NOTES

52e6658

kennyweiss requested review from Arlie-Capps, BradWhitlock, bmhan12, jcs15c, nselliott, publixsubfan, rhornung67 and white238 June 12, 2026 22:11

kennyweiss marked this pull request as ready for review June 12, 2026 22:11

kennyweiss mentioned this pull request Jun 13, 2026

Bugfix and optimization for 1D Array/ArrayView #1884

Open

kennyweiss added this to the FY26 August release milestone Jun 13, 2026

Bugfix for rzvector -- if constexpr needs an else

8ae7955

kennyweiss wants to merge 28 commits into develop from feature/kweiss/flatmap-improvements
