Skip to content

fix(enrichment tables): preserve memory enrichment table state on reload#25547

Open
esensar wants to merge 19 commits into
vectordotdev:masterfrom
esensar:fix/memory-table-reload-state
Open

fix(enrichment tables): preserve memory enrichment table state on reload#25547
esensar wants to merge 19 commits into
vectordotdev:masterfrom
esensar:fix/memory-table-reload-state

Conversation

@esensar

@esensar esensar commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

While working on #25143, it was brought to my attention that reload was not handled for memory tables (#25143 (comment)) - that is, new components were generated, that were not attached to the tables that were queried. This PR resolves that by making these tables take over the state of previous components.

Vector configuration

enrichment_tables:
  memory_table:
    type: memory
    ttl: 60
    flush_interval: 5
    inputs: ["cache_generator"]


sources:
  demo_logs_test:
    type: "demo_logs"
    format: "json"

transforms:
  demo_logs_processor:
    type: "remap"
    inputs: ["demo_logs_test"]
    source: |
      . = parse_json!(.message)
      user_id = get!(., path: ["user-identifier"])

      existing, err = get_enrichment_table_record("memory_table", { "key": user_id })

      if err == null {
        . = existing.value
        .source = "cache"
      } else {
        .referer = parse_url!(.referer)
        .referer.host = encode_punycode!(.referer.host)
        .source = "transform"
      }      

  cache_generator:
    type: "remap"
    inputs: ["demo_logs_processor"]
    source: |
      existing, err = get_enrichment_table_record("memory_table", { "key": get!(., path: ["user-identifier"]) })
      if err != null {
        data = .
        . = set!(value: {}, path: [get!(data, path: ["user-identifier"])], data: data)
      } else {
        . = {}
      }      

sinks:
  console:
    inputs: ["demo_logs_processor"]
    target: "stdout"
    type: "console"
    encoding:
      codec: "json"

How did you test this PR?

Ran vector with the above configuration and the --watch-config flag. Changed TTL a couple of times and Vector properly reloaded and kept state, observed by seeing cached output data, instead of newly generated.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

Sponsored by Quad9

@esensar esensar requested a review from a team as a code owner June 1, 2026 12:51
@github-actions github-actions Bot added the domain: topology Anything related to Vector's topology code label Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6583fd70e4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/running.rs Outdated
Comment thread src/topology/builder.rs Outdated
@pront

pront commented Jun 1, 2026

Copy link
Copy Markdown
Member

Thanks @esensar! Per our new policy I will come back to this once codex comments are resolved.

@esensar esensar changed the title fix(enrichment): preserve memory enrichment table state on reload fix(enrichment tables): preserve memory enrichment table state on reload Jun 2, 2026
@esensar

esensar commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

While resolving this I found that there were quite a few typos and missed cases for reload in enrichment tables that have source/sink. I think I covered them all now.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0044cc7f03

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/builder.rs Outdated

@pront pront left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution. Can please add a couple of test cases, e.g. take_state_preserves_data and reload_preserves_state?

Comment thread lib/vector-vrl/enrichment/src/lib.rs Outdated
Comment thread lib/vector-vrl/enrichment/src/lib.rs Outdated
Comment thread lib/vector-vrl/enrichment/src/lib.rs Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ed0732c66

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/vector-vrl/enrichment/src/test_util.rs Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4cb301f9e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/builder.rs
@esensar

esensar commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

I am adding a topology reload test to confirm this, but I am having some issues with it. For some reason tables are missing on reload, that is TableRegistry::finish_load doesn't store tables properly during the test (it works fine in runtime)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 961e72fe3b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/enrichment_tables/memory/table.rs
Comment thread src/topology/running.rs
@esensar

esensar commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

I am adding a topology reload test to confirm this, but I am having some issues with it. For some reason tables are missing on reload, that is TableRegistry::finish_load doesn't store tables properly during the test (it works fine in runtime)

I have added the test - it works when run individually, but fails when run together with others - I assume because ENRICHMENT_TABLES is static and shared between tests. Not sure how to best resolve this...

@esensar esensar requested a review from pront June 8, 2026 15:27
@pront

pront commented Jun 12, 2026

Copy link
Copy Markdown
Member

I am adding a topology reload test to confirm this, but I am having some issues with it. For some reason tables are missing on reload, that is TableRegistry::finish_load doesn't store tables properly during the test (it works fine in runtime)

I have added the test - it works when run individually, but fails when run together with others - I assume because ENRICHMENT_TABLES is static and shared between tests. Not sure how to best resolve this...

@esensar no worries, I will take a closer look and make suggestions.

@pront

pront commented Jun 16, 2026

Copy link
Copy Markdown
Member

Updated this branch with a fix. Can you try the tests now?

@esensar

esensar commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Updated this branch with a fix. Can you try the tests now?

Thanks, that fixed it!

@pront pront left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit:

This makes memory table state preservation across reload unconditional when the table key remains the same. I can imagine users who treat reload as a cache invalidation point after changing the table’s producing logic or cached value schema.

This is currently breaking the existing reload behavior. Thoughts on adding an explicit reload behavior config option? Or preserving the current behavior and making this opt-in?

@esensar

esensar commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Edit:

This makes memory table state preservation across reload unconditional when the table key remains the same. I can imagine users who treat reload as a cache invalidation point after changing the table’s producing logic or cached value schema.

This is currently breaking the existing reload behavior. Thoughts on adding an explicit reload behavior config option? Or preserving the current behavior and making this opt-in?

Okay, that makes sense. Maybe a flag for state preservation on reload, or like you said reload behavior with just these 2 options for now so that we can expand it in the future if some other interesting behavior comes up.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12939a04e6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/running.rs
Comment on lines +472 to 476
.chain(to_change_table_sources.iter())
{
debug!(component_id = %key, "Changing source.");

self.remove_outputs(key);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reattach consumers of changed table sources

When a memory enrichment table source is changed, this now removes and rebuilds that source output, but unchanged downstream transforms/sinks are never reattached because reattach_severed_inputs only looks at diff.sources.to_change and diff.transforms.to_change. If only the memory table config changes while a sink still reads from the same source_key, the reload succeeds but the sink is left disconnected from the rebuilt fanout, so subsequent exported records are dropped before reaching it.

Useful? React with 👍 / 👎.

Comment thread src/topology/builder.rs
Comment on lines +201 to +206
let mut prev_state = None;
if !self.diff.enrichment_tables.is_added(name)
&& table_outer.inner.wants_previous_state()
{
prev_state = ENRICHMENT_TABLES.extract_state(&table_name);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep memory state when a reload is rolled back

Because load_enrichment_tables swaps ENRICHMENT_TABLES during the tentative build before healthchecks/build errors can roll the reload back, skipping extract_state for the default ClearState path can destroy the current memory table even when the new config is rejected. For example, a reload that changes a memory table and also has an invalid sink builds an empty table here, fails later, then restores the old config from that already-cleared registry, so accepted records disappear despite the failed reload.

Useful? React with 👍 / 👎.

Comment on lines 405 to +406
fn needs_reload(&self) -> bool {
false
true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Rebuild the memory sink when forcing reloads

With the default ClearState behavior, returning true here makes every config reload rebuild the registry's memory table even when the table itself is unchanged, but diff.enrichment_tables does not mark the table sink changed in that case. After an unrelated reload, transforms read from the new empty table while the still-running memory sink keeps writing to the old instance, so all subsequent inserts become invisible until the table is explicitly changed or Vector restarts.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: topology Anything related to Vector's topology code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants