Description
tl;dr should the usage of hash maps in wasmparser
be entirely replaced with BTree{Map,Set}
-based equivalents?
Some history here looks like:
- Originally
wasmparser
was written with hash-based collections everywhere - When
no_std
support was added this required using thehashbrown
crate for these maps - Later btree-based collection support was added to handle the use case where
no_std
environments can't seed their maps with a randomness source meaning thatno_std
usage was theoretically susceptible to collision attacks, where btree-based collections don't have this weakness. - Later the way features in this space were updated
The current state we then now have today is that wasmparser
has its own set of collections but they are conditionally backed by hash maps or btree maps. This isn't great for wasmparser
as the wrapper collections are nontrivial to have and maintain. Ideally they wouldn't exist at all!
Hash maps are considered not suitable as "only support this" due to the risk of collisions in no_std
environments. The question this issue poses is could the btree collections be considered to be used 100% of the time with no options for hash maps? The main benefit of this would be that we could remove all the wrapper collections and simplify the features in the wasmparser
crate here by simply always using btree-based maps. This works in no_std
environments (at least those with an allocator) and others alike.
Personally I think the main decision here will come down to benchmarks. Put another way - is it expected for uses of wasmparser
and large-ish wasm modules to strongly benefit from hash-based collections instead of btree-based collections? Benchmarking is starting to indicate this but the measurements there don't necessarily capture the full picture, so I wanted to write down some more questions here:
- Is there a size of wasm module (probably in terms of export/import count) where hash maps perform measurably better than btree maps?
- Is there a similar limit/size for components? (components are heavier on maps than core wasm)
- Are there memory-size benefits to btree maps over hash maps?
- To what degree does this performance affect embedders such as Wasmtime? For example not only during validation but also general operation with the wasm module/component?
In the limit I suspect that many of these questions, if not all, are basically un-answerable. They're ones worth considering but I don't think there's necessarily going to be a definitive answer one way or the other. Nevertheless I wanted to open an issue for this at least as a place of discussion for this point.
cc @Robbepop
Activity