experiment(value): evaluate flat ObjectMap backends#1826

Draft
lukesteensen wants to merge 8 commits into
vectordotdev:mainfrom
lukesteensen:experiment/objectmap-backends
Summary

This is an experimental PR evaluating alternative ObjectMap storage layouts.

The goal is to improve memory locality and clone/allocation efficiency for VRL
object values. Today ObjectMap is effectively BTree-backed, which gives good
lookup behavior but poor locality and expensive structural clones. This branch
adds enum-backed ObjectMap variants so we can compare the current BTree layout
against flatter vector-backed layouts.

The enum is mostly experimental scaffolding. It lets us compare designs, but it
also adds cost through larger object size and extra branching. If we identify a
winning representation, implementing that representation directly should be
better than keeping enum dispatch in the hot path.
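
As a rough illustration of the scaffolding described above, here is a minimal sketch of an enum that dispatches between a tree-backed and a flat layout. The names (`ObjectMapSketch`, its variants and methods) are hypothetical and the keys are plain `String`s; the actual types on this branch may look quite different.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch only: not the ObjectMap type from this branch.
#[derive(Clone, Debug)]
pub enum ObjectMapSketch<V> {
    /// Sorted tree: logarithmic lookup, entries spread over several nodes.
    BTree(BTreeMap<String, V>),
    /// Contiguous key/value pairs: linear lookup, one buffer to copy on clone.
    Flat(Vec<(String, V)>),
}

impl<V> ObjectMapSketch<V> {
    pub fn get(&self, key: &str) -> Option<&V> {
        // Every call pays for this match; that is the dispatch cost in the hot path.
        match self {
            Self::BTree(map) => map.get(key),
            Self::Flat(entries) => entries
                .iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, v)| v),
        }
    }

    pub fn insert(&mut self, key: String, value: V) -> Option<V> {
        match self {
            Self::BTree(map) => map.insert(key, value),
            Self::Flat(entries) => {
                // Linear scan for an existing key before appending a new pair.
                match entries.iter().position(|(k, _)| *k == key) {
                    Some(i) => Some(std::mem::replace(&mut entries[i].1, value)),
                    None => {
                        entries.push((key, value));
                        None
                    }
                }
            }
        }
    }

    pub fn len(&self) -> usize {
        match self {
            Self::BTree(map) => map.len(),
            Self::Flat(entries) => entries.len(),
        }
    }
}
```

Both variants expose the same behavior to callers, which is what makes a side-by-side comparison possible. A direct implementation of the winning layout would drop the `match` in each method.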

What changed

  • Added enum-backed ObjectMap storage variants.
  • Added flat/vector-backed ObjectMap experiments.
  • Moved many call sites away from assuming ObjectMap is a BTreeMap.
  • Added targeted ObjectMap benchmarks:
    • objectmap
    • objectmap_cliff
    • objectmap_hybrid
  • Added independent KeyString construction cleanup and benchmarks.

Benchmark takeaways

The isolated benchmarks show the expected tradeoff:

  • Flat maps have a clear width cliff for isolated lookup/update operations.
  • Hit lookups are only competitive at very small widths.
  • Miss lookups cross over at around 128 fields.
  • Building wide flat maps from scratch is worse due to repeated linear scans.
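
To make the "repeated linear scans" point concrete, here is a small sketch that counts key comparisons while building a flat key list from scratch. `flat_insert` and `build_cost` are hypothetical helpers written for this illustration, not code from this branch, and the count says nothing about wall-clock time.

```rust
/// Inserts `key` into a flat key list unless it is already present.
/// Returns the number of key comparisons the scan performed.
fn flat_insert(keys: &mut Vec<String>, key: &str) -> usize {
    let mut comparisons = 0;
    for existing in keys.iter() {
        comparisons += 1;
        if existing.as_str() == key {
            return comparisons;
        }
    }
    // A miss scans every existing key before appending.
    keys.push(key.to_string());
    comparisons
}

/// Total comparisons needed to build a flat map of `width` distinct keys.
fn build_cost(width: usize) -> usize {
    let mut keys = Vec::with_capacity(width);
    (0..width)
        .map(|i| flat_insert(&mut keys, &format!("field_{i}")))
        .sum()
}
```

With distinct keys every insert is a miss, so the total is 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons: 120 at width 16 and 8128 at width 128. A sorted tree needs on the order of log n comparisons per insert instead.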

However, the more realistic cloned-event benchmarks are much more promising:

  • Flat storage benefits from cheaper clones and better memory locality.
  • objectmap_cliff/realistic_event favors flat maps from roughly width 16 onward.
  • objectmap_cliff/realistic_event_readonly strongly favors flat maps because
    clones remain cheap and no mutation forces extra work.

So the experiment suggests the core hypothesis is valid: improving memory
locality and clone efficiency can outweigh worse isolated lookup behavior in
realistic VRL/event workloads.

KeyString changes

This branch also includes independent KeyString cleanup.

Several call sites were constructing temporary Strings only to immediately
convert them into KeyString. With today’s String-backed KeyString, the
impact is small because LLVM can often optimize the extra work away. The cleanup
becomes more important if we later move KeyString to an SSO/refcounted backing
type, where direct construction from &str/Cow<str> can avoid heap allocation.
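
A minimal sketch of the two construction paths, assuming a String-backed key type. `KeySketch`, `key_before`, and `key_after` are hypothetical stand-ins; the real KeyString and its `From` impls may differ.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a String-backed KeyString.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KeySketch(String);

impl From<String> for KeySketch {
    fn from(s: String) -> Self {
        KeySketch(s)
    }
}

impl From<&str> for KeySketch {
    fn from(s: &str) -> Self {
        KeySketch(s.to_owned())
    }
}

impl<'a> From<Cow<'a, str>> for KeySketch {
    fn from(s: Cow<'a, str>) -> Self {
        // An owned Cow hands over its buffer; a borrowed one is copied once.
        KeySketch(s.into_owned())
    }
}

/// Old pattern: build a temporary String, then convert it.
fn key_before(field: &Cow<'_, str>) -> KeySketch {
    field.to_string().into()
}

/// New pattern: convert directly and let the key type decide how to store it.
fn key_after(field: Cow<'_, str>) -> KeySketch {
    field.into()
}
```

Both functions produce equal keys. The difference is that `key_after` never names `String`, so a later SSO or refcounted backing type could store short keys without the intermediate heap allocation.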

These changes are separable from the ObjectMap experiment, but came out of the
same performance investigation.

API compatibility

This is technically a breaking change.

ObjectMap / Value::Object have exposed BTreeMap-specific behavior and
construction patterns publicly. External code that constructs, destructures, or
uses Value::Object as a BTreeMap will need to move to ObjectMap APIs
instead.

This is part of the experiment: hiding the backing map type is necessary if we
want freedom to change the representation.
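
For illustration, hiding the backing type could look roughly like the sketch below. `OpaqueMap` and its methods are hypothetical; the actual ObjectMap API on this branch may differ.

```rust
use std::collections::BTreeMap;

// Hypothetical opaque wrapper: the field is private, so callers cannot name
// or depend on the backing collection.
#[derive(Debug, Clone, PartialEq)]
pub struct OpaqueMap<V> {
    // Could be swapped for a Vec-backed layout without touching call sites.
    inner: BTreeMap<String, V>,
}

impl<V> FromIterator<(String, V)> for OpaqueMap<V> {
    fn from_iter<I: IntoIterator<Item = (String, V)>>(iter: I) -> Self {
        OpaqueMap {
            inner: iter.into_iter().collect(),
        }
    }
}

impl<V> OpaqueMap<V> {
    pub fn get(&self, key: &str) -> Option<&V> {
        self.inner.get(key)
    }

    pub fn iter(&self) -> impl Iterator<Item = (&str, &V)> + '_ {
        self.inner.iter().map(|(k, v)| (k.as_str(), v))
    }
}
```

Call sites then construct the map through `collect()` or `from_iter` and read it through `get` and `iter`, rather than building or pattern-matching on a `BTreeMap` directly.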

Validation

Local checks run:

cargo fmt --check
cargo test --features 'default test'
cargo bench --features 'default test' --bench objectmap_cliff --no-run
cargo bench --features 'default test' --bench objectmap_hybrid --no-run
cargo bench --features 'default test' --bench objectmap_cliff -- --noplot

Open questions

  • Have we sufficiently captured workloads that are adversarial to this design,
    such that we would have confidence turning it on by default?
  • Do we need to maintain the enum-backed implementation for optionality / opt-in
    despite its cost, or should we switch to a single winning representation once
    the design is chosen?

lukesteensen and others added 8 commits June 17, 2026 15:44
Many call sites were creating a String allocation only to immediately
convert it to KeyString. This is wasteful because KeyString::from(&str)
and KeyString::from(Cow<str>) can construct directly without the
intermediate heap allocation.

The most impactful fix is in crud/insert.rs where field.to_string().into()
was called on every VRL path insertion — field is Cow<str> (usually
Borrowed), so this was allocating on every event for no reason.

Also fixes ~20 other sites across stdlib parsers (parse_syslog,
parse_grok, parse_key_value, flatten, unflatten, tally, etc.) and the
value! macro where literal strings went through String::from() first.

These changes are independently valuable with String-backed KeyString
(saves one allocation per conversion) and become even more important
with SSO string types where the &str path can inline short strings
with zero allocation.

Four benchmark groups for comparing KeyString backing types:
- keystring_micro: construction, clone, roundtrip costs
- path_ops: owned vs JIT path traversal
- vrl_programs: end-to-end VRL execution (remap, parse_syslog, flatten, object construction)
- json_deser: serde_json::Value intermediate vs direct deserialization

Rename from_long_str/clone_long to from_medium_str/clone_medium (22B, within
CompactString inline but spills EcoString). Add true from_long_str/clone_long
at 31B that spills both SSO types for a fair heap-allocated comparison.