@orestisfl
Contributor

@orestisfl orestisfl commented Oct 27, 2025

Proposal: Lazy Initialization of the Cache Processor's File Store

The Problem

The basic problem is that processors often use paths.Resolve to find directories like "data" or "logs". This function uses a global variable for the base path, which is fine when a Beat runs as a standalone process.

But when a Beat is embedded as a receiver (e.g., fbreceiver in the OTel Collector), this global causes problems. Each receiver needs its own isolated state directory, and a single global path prevents this.

The cache processor currently tries to set up its file-based store in its New function, which is too early. It only has access to the global path, not the receiver-specific path that gets configured later.

The Solution

My solution is to initialize the cache's file store lazily.

Instead of creating the store in cache.New, I've added a SetPaths(*paths.Path) method to the processor. This method creates the file store and is wrapped in a sync.Once to make sure it only runs once. The processor's internal store object stays nil until SetPaths is called during pipeline construction.

How it Works

The path info gets passed down when a client connects to the pipeline. Here's the flow:

  1. x-pack/filebeat/fbreceiver: createReceiver instantiates the processors (including cache with a nil store) and calls instance.NewBeatForReceiver.
  2. x-pack/libbeat/cmd/instance: NewBeatForReceiver creates the paths.Path object from the receiver's specific configuration.
  3. libbeat/publisher/pipeline: This paths.Path object is passed into the pipeline. When a client connects, the path is added to the beat.ProcessingConfig.
  4. libbeat/publisher/processing: The processing builder gets this config and calls group.SetPaths, which passes the path down to each processor.
  5. libbeat/processors/cache: SetPaths is finally called on the cache processor instance, and the sync.Once guard ensures the file store is created with the correct path.
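Step 4 relies on an optional interface. The sketch below shows the general shape of that dispatch, assuming hypothetical names (pathSetter, setPaths, dropEvent) and a simplified Processor interface; it is not the actual group.SetPaths code.

```go
package main

import "fmt"

// Path is a simplified stand-in for *paths.Path.
type Path struct{ Data string }

// Processor is a minimal stand-in for the processor interface.
type Processor interface{ String() string }

// pathSetter is the optional interface; only processors that need paths implement it.
type pathSetter interface{ SetPaths(*Path) }

// cache opts in by implementing SetPaths.
type cache struct{ dataDir string }

func (c *cache) String() string   { return "cache" }
func (c *cache) SetPaths(p *Path) { c.dataDir = p.Data }

// dropEvent does not need paths and therefore does not implement pathSetter.
type dropEvent struct{}

func (dropEvent) String() string { return "drop_event" }

// setPaths forwards p to every processor that opts in and reports how many did.
func setPaths(procs []Processor, p *Path) int {
	n := 0
	for _, proc := range procs {
		if ps, ok := proc.(pathSetter); ok {
			ps.SetPaths(p)
			n++
		}
	}
	return n
}

func main() {
	c := &cache{}
	procs := []Processor{c, dropEvent{}}
	n := setPaths(procs, &Path{Data: "/var/lib/receiver-a/data"})
	fmt.Println(n, c.dataDir)
}
```

Processors that do not implement the interface are skipped silently, which is the runtime-dependent behavior that the type assertion introduces.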

Diagram

graph TD
    subgraph "libbeat/processors/cache (init)"
        A["init()"]
    end
    subgraph "libbeat/processors"
        B["processors.RegisterPlugin"]
        C{"registry"}
    end
    A --> B;
    B -- "Save factory" --> C;

    subgraph "x-pack/filebeat/fbreceiver"
        D["createReceiver"]
    end

    subgraph "libbeat/processors"
         E["processors.New(config)"]
         C -. "Lookup 'cache'" .-> E;
    end
    D --> E;
    D --> I;
    E --> G;

    subgraph "libbeat/processors/cache"
        G["cache.New()"] -- store=nil --> H{"cache"};
    end

    subgraph "x-pack/libbeat/cmd/instance"
        I["instance.NewBeatForReceiver"];
        I --> J{"paths.Path object"};
    end

    subgraph "libbeat/publisher/pipeline"
        J --> K["pipeline.New"];
        K --> L["ConnectWith"];
    end

    subgraph "libbeat/publisher/processing"
        L -- "Config w/ paths" --> N["builder.Create"];
        N --> O["group.SetPaths"];
    end

    subgraph "libbeat/processors/cache"
        O --> P["cache.SetPaths"];
        P --> Q["sync.Once"];
        Q -- "initialize store" --> H;
    end

Pros and Cons of This Approach

  • Pros:
    • It's a minimal, targeted change that solves the immediate problem.
    • It avoids a large-scale, breaking refactoring of all processors.
    • It maintains backward compatibility for existing processors and downstream consumers of libbeat.
  • Cons:
    • Using a type assertion for the setPaths interface feels a bit like magic, since the behavior changes at runtime depending on whether a processor implements it.

Alternatives Considered

Option 1: Add a paths argument to all processor constructors

  • Pros:
    • Simple and direct.
  • Cons:
    • Requires a global refactoring of all processors.
    • Breaks external downstream libbeat importers like Cloudbeat.
    • The paths argument is not needed in many processors, so adding a rarely used option to the function signature is verbose.

Option 2: Refactor processors to introduce a "V2" interface

  • Pros:
    • Allows for a new, backwards-compatible signature (e.g., using a config struct).
    • This can still be done later.
    • We can support both V1 processors and gradually move processors to V2.
  • Cons:
    • Needs a significant refactoring effort.
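For comparison, one possible shape of Option 2, sketched under assumptions: ConstructorV1, ConstructorV2, Env, and AdaptV1 are hypothetical names, and the config is reduced to a plain map. It only illustrates how V1 constructors could keep working next to V2 ones.

```go
package main

import "fmt"

// Path is a simplified stand-in for *paths.Path.
type Path struct{ Data string }

// Processor is a minimal stand-in for the processor interface.
type Processor interface{ String() string }

// ConstructorV1 is the existing shape: configuration only.
type ConstructorV1 func(cfg map[string]any) (Processor, error)

// Env is a hypothetical struct bundling per-instance dependencies.
// New fields can be added later without changing constructor signatures.
type Env struct{ Paths *Path }

// ConstructorV2 receives Env in addition to the configuration.
type ConstructorV2 func(cfg map[string]any, env Env) (Processor, error)

// AdaptV1 lets an unmodified V1 constructor be registered as V2 by ignoring env.
func AdaptV1(c ConstructorV1) ConstructorV2 {
	return func(cfg map[string]any, _ Env) (Processor, error) {
		return c(cfg)
	}
}

// named is a trivial processor used only to make the sketch observable.
type named string

func (n named) String() string { return string(n) }

// newDropEvent is a V1-style constructor that never sees the paths.
func newDropEvent(map[string]any) (Processor, error) {
	return named("drop_event"), nil
}

// newCache is a V2-style constructor that reads its paths from env.
func newCache(_ map[string]any, env Env) (Processor, error) {
	return named("cache@" + env.Paths.Data), nil
}

func main() {
	registry := map[string]ConstructorV2{
		"drop_event": AdaptV1(newDropEvent),
		"cache":      newCache,
	}
	env := Env{Paths: &Path{Data: "/var/lib/receiver-a/data"}}
	for _, name := range []string{"drop_event", "cache"} {
		p, _ := registry[name](nil, env)
		fmt.Println(p)
	}
}
```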

Proposed commit message

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@orestisfl orestisfl self-assigned this Oct 27, 2025
@orestisfl orestisfl added the enhancement, backport-skip, Team:Elastic-Agent-Data-Plane, and Draft labels Oct 27, 2025
@botelastic botelastic bot added and removed the needs_team label Oct 27, 2025
@github-actions
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@leehinman
Contributor

One other idea I had was to stop registering the processors in the init function and move that to something that is done inside beat configure, after the paths are initialized. For most processors we would just add the existing constructor, but for the ones that need a path we could wrap them in a closure with the path set internally.

This has the advantage of getting rid of the calls to init, which slow down startup, but it would mean we need a registry of processors per beat. It is definitely more invasive, but it does make the beat more independent. If we come across a second or third thing that needs to be unique among processors, it would make adding those unique things easier.


// Run enriches the given event with the host metadata.
func (p *cache) Run(event *beat.Event) (*beat.Event, error) {
	p.SetPaths(paths.Paths) // set default if paths is not initialized

Do we have to do this? It means that if we get the order wrong, all the beats in the process default to the same paths, which is wrong.
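A small sketch of the ordering hazard, assuming SetPaths is guarded by a sync.Once as the description says. The types, the globalPaths variable, and the directory names are simplified stand-ins, not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// Path is a simplified stand-in for *paths.Path.
type Path struct{ Data string }

// globalPaths mimics the process-wide default used as a fallback.
var globalPaths = &Path{Data: "/var/lib/beat/data"}

type cache struct {
	initOnce sync.Once
	dir      string
}

// SetPaths only takes effect on the first call.
func (c *cache) SetPaths(p *Path) {
	c.initOnce.Do(func() { c.dir = p.Data })
}

// Run falls back to the global paths if nothing was set earlier.
func (c *cache) Run() {
	c.SetPaths(globalPaths)
}

func main() {
	c := &cache{}
	c.Run()                                             // an event arrives before the pipeline sets paths
	c.SetPaths(&Path{Data: "/var/lib/receiver-a/data"}) // too late: the Once has already fired
	fmt.Println(c.dir)
}
```

In this sketch the processor ends up on the global directory, and the later receiver-specific call is silently ignored.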


Labels

backport-skip, discuss, Draft, enhancement, skip-changelog, Team:Elastic-Agent-Data-Plane

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[beatreceiver] replace global paths in cache processor

3 participants