[cache-processor] WIP: SetPaths draft #47353
                
     Draft
            
            
          
Proposal: Lazy Initialization of the Cache Processor's File Store
The Problem
The basic problem is that processors often use paths.Resolve to find directories like "data" or "logs". This function uses a global variable for the base path, which is fine when a Beat runs as a standalone process.

But when a Beat is embedded as a receiver (e.g., fbreceiver in the OTel Collector), this global causes problems. Each receiver needs its own isolated state directory, and a single global path prevents this.

The cache processor currently tries to set up its file-based store in its New function, which is too early. It only has access to the global path, not the receiver-specific path that gets configured later.
The Solution
My solution is to initialize the cache's file store lazily.
Instead of creating the store in cache.New, I've added a SetPaths(*paths.Path) method to the processor. This method creates the file store and is wrapped in a sync.Once to make sure it only runs once. The processor's internal store object stays nil until SetPaths is called during pipeline construction.
How it Works
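As a rough illustration of the lazy initialization described above, here is a minimal, self-contained sketch. It is not the code in this PR: Paths, fileStore, storeDir, and the "cache_processor" directory name are hypothetical stand-ins, and the sketch only records the store directory instead of opening files.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"sync"
)

// Paths is a hypothetical stand-in for libbeat's paths.Path.
// Only the data directory matters for this sketch.
type Paths struct {
	Data string
}

// fileStore is a hypothetical stand-in for the cache's file-backed store.
// A real store would open files under dir; this one only remembers it.
type fileStore struct {
	dir string
}

// cache mimics the processor: store stays nil until SetPaths runs.
type cache struct {
	id       string
	initOnce sync.Once
	initErr  error
	store    *fileStore
}

// New builds the processor without resolving any path.
func New(id string) *cache {
	return &cache{id: id}
}

// storeDir derives the store location from the receiver-specific paths.
// The directory layout is an assumption made for illustration.
func storeDir(p *Paths, id string) string {
	return filepath.Join(p.Data, "cache_processor", id)
}

// SetPaths initializes the store on the first successful call. The
// sync.Once guard turns later calls into no-ops, even if they carry a
// different path.
func (c *cache) SetPaths(p *Paths) error {
	if p == nil {
		// Reject nil without consuming the Once, so a later valid call
		// can still initialize the store.
		return errors.New("cache: nil paths")
	}
	c.initOnce.Do(func() {
		c.store = &fileStore{dir: storeDir(p, c.id)}
	})
	return c.initErr
}

func main() {
	c := New("receiver-a")
	fmt.Println("store is nil after New:", c.store == nil)

	if err := c.SetPaths(&Paths{Data: "/var/lib/receiver-a"}); err != nil {
		fmt.Println("SetPaths failed:", err)
		return
	}
	fmt.Println("store dir:", c.store.dir)

	// A second call with another path does not move the store.
	_ = c.SetPaths(&Paths{Data: "/var/lib/other"})
	fmt.Println("store dir after second call:", c.store.dir)
}
```

One consequence of this design is that any code path touching the store has to tolerate a nil store until SetPaths has run.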
The path info gets passed down when a client connects to the pipeline. Here's the flow:
1. x-pack/filebeat/fbreceiver: createReceiver instantiates the processors (including cache with a nil store) and calls instance.NewBeatForReceiver.
2. x-pack/libbeat/cmd/instance: NewBeatForReceiver creates the paths.Path object from the receiver's specific configuration.
3. libbeat/publisher/pipeline: This paths.Path object is passed into the pipeline. When a client connects, the path is added to the beat.ProcessingConfig.
4. libbeat/publisher/processing: The processing builder gets this config and calls group.SetPaths, which passes the path down to each processor.
5. libbeat/processors/cache: SetPaths is finally called on the cache processor instance, and the sync.Once guard ensures the file store is created with the correct path.
Diagram
graph TD
  subgraph "libbeat/processors/cache (init)"
    A["init()"]
  end
  subgraph "libbeat/processors"
    B["processors.RegisterPlugin"]
    C{"registry"}
  end
  A --> B;
  B -- "Save factory" --> C;
  subgraph "x-pack/filebeat/fbreceiver"
    D["createReceiver"]
  end
  subgraph "libbeat/processors"
    E["processors.New(config)"]
    C -. "Lookup 'cache'" .-> E;
  end
  D --> E;
  D --> I;
  E --> G;
  subgraph "libbeat/processors/cache"
    G["cache.New()"] -- store=nil --> H{"cache"};
  end
  subgraph "x-pack/libbeat/cmd/instance"
    I["instance.NewBeatForReceiver"];
    I --> J{"paths.Path object"};
  end
  subgraph "libbeat/publisher/pipeline"
    J --> K["pipeline.New"];
    K --> L["ConnectWith"];
  end
  subgraph "libbeat/publisher/processing"
    L -- "Config w/ paths" --> N["builder.Create"];
    N --> O["group.SetPaths"];
  end
  subgraph "libbeat/processors/cache"
    O --> P["cache.SetPaths"];
    P --> Q["sync.Once"];
    Q -- "initialize store" --> H;
  end
Pros and Cons of This Approach
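The last two hops in the diagram (group.SetPaths to cache.SetPaths) rely on an optional interface that only some processors implement. The sketch below shows one way such a dispatch could look, using a type assertion. All names here (Processor, pathSetter, group, cacheProc, dropProc) are hypothetical and simplified; the real libbeat types and signatures may differ.

```go
package main

import "fmt"

// Paths is a hypothetical stand-in for libbeat's paths.Path.
type Paths struct {
	Data string
}

// Processor is a deliberately tiny stand-in for a processor interface.
type Processor interface {
	String() string
}

// pathSetter is the optional interface. Processors that need the
// receiver-specific paths implement it; all others are skipped.
type pathSetter interface {
	SetPaths(*Paths) error
}

// group holds the processors of one pipeline client.
type group struct {
	procs []Processor
}

// SetPaths forwards the paths to every processor that opts in.
func (g *group) SetPaths(p *Paths) error {
	for _, proc := range g.procs {
		ps, ok := proc.(pathSetter)
		if !ok {
			continue // processor does not care about paths
		}
		if err := ps.SetPaths(p); err != nil {
			return fmt.Errorf("setting paths on %s: %w", proc.String(), err)
		}
	}
	return nil
}

// cacheProc opts in by implementing SetPaths.
type cacheProc struct {
	dataDir string
}

func (c *cacheProc) String() string { return "cache" }

func (c *cacheProc) SetPaths(p *Paths) error {
	c.dataDir = p.Data
	return nil
}

// dropProc has no SetPaths method, so group.SetPaths ignores it.
type dropProc struct{}

func (dropProc) String() string { return "drop_event" }

func main() {
	c := &cacheProc{}
	g := &group{procs: []Processor{dropProc{}, c}}
	if err := g.SetPaths(&Paths{Data: "/var/lib/receiver-a"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cache data dir:", c.dataDir)
}
```

Because the check happens at runtime, a processor whose SetPaths signature drifts from the interface is silently skipped instead of failing to compile.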
libbeat.
The setPaths interface feels a bit like magic, since the behavior changes at runtime depending on whether a processor implements it.
Alternatives Considered
Option 1: Add a paths argument to all processor constructors
The paths argument is not needed in many processors, so adding a rarely used option to the function signature is verbose.
Option 2: Refactor processors to introduce a "V2" interface
Proposed commit message
Checklist
stresstest.sh script to run them under stress conditions and race detector to verify their stability.
./changelog/fragments using the changelog tool.
Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs