Skip to content

Adds configuration file for sidecar proxy#683

Open
DhritiShikhar wants to merge 1 commit intollm-d:mainfrom
DhritiShikhar:issue-634
Open

Adds configuration file for sidecar proxy#683
DhritiShikhar wants to merge 1 commit intollm-d:mainfrom
DhritiShikhar:issue-634

Conversation

@DhritiShikhar
Copy link
Copy Markdown

@DhritiShikhar DhritiShikhar commented Mar 6, 2026

Resolves

  1. Add configuration file for sidecar proxy #634

Changes

  1. Adds ability to configure sidecar proxy through YAML.

  2. YAML configuration can be provided via command line flags --configuration and/or --configuration-file.

  3. YAML configuration provided via --configuration is given higher priority.

  4. --configuration enables sidecar configuration YAML to be provided as inline specification.
    Example: --configuration={port: 8085, vllm-port: 8203, connector: sglang}.

  5. --configuration-file enables sidecar configuration YAML to be provided as file.
    Example: --configuration-file=/etc/config/sidecar-config.yaml.

port: 8083
vllm-port: 8201
connector: "sglang"
secure-proxy: false
  1. Configuration file is mounted as file at /etc/config/sidecar-config.yaml.

  2. Sidecar configuration priority (as specified here):

    • Command-line flags (highest precedence)
    • Environment variables
    • Inline specification
    • Configuration file
    • Built-in defaults (lowest precedence)
  3. Sidecar configuration key-value pairs are set from different sources - built-in, individual flags --port, file --config-file, inline YAML --config. If a key is missing from one source, it picks up from a different source (if it is provided and based on priority). So, the final config can be a mix of settings from different sources.

  4. Logs contain:

  • final sidecar configuration
  • where the configuration value is sourced from - default, environment variable, flag, inline specification, file

Tests

This PR tests following cases:

Case 1: No sidecar configuration provided by user i.e. default configuration is used.

Case 2: Configuration provided individually through flags (e.g --port, --vllm-port). Configuration from flags over-ride default configuration.

Case 3: Configuration provided through environment variables: INFERENCE_POOL, INFERENCE_POOL_NAMESPACE and INFERENCE_POOL_NAME. INFERENCE_POOL has higher priority over INFERENCE_POOL_NAMESPACE and INFERENCE_POOL_NAME`. Configuration from environment variables over-ride default configuration.

Case 4: YAML configuration provided through inline specification --configuration. Configuration from YAML inline specification over-ride default configuration.

Case 5: YAML configuration provided through file --configuration-file. Configuration from YAML file over-ride default configuration.

Case 6: Case 2 + Case 3: Configuration provided through individual flags + environment variables. Configuration from individual flags over-ride configuration from environment variable.

Case 7: Configuration provided through environment variables: INFERENCE_POOL, INFERENCE_POOL_NAMESPACE and INFERENCE_POOL_NAME. Environment variable INFERENCE_POOLhas higher priority over individual flags--inference-pool-namespaceand--inference-pool-name`.

Case 8: Case 2 + Case 4: Configuration provided through individual flags + YAML inline specification. Configuration from individual flags over-ride configuration from inline specification.

Case 9: Case 2 + Case 5: Configuration provided through individual flags + file. Configuration from individual flags over-ride configuration from file.

Case 10: Case 3 + Case 4: Configuration provided through environment variable + YAML inline specification. Configuration from environment variable over-rides configuration from inline specification. Rest of the configuration is picked up from YAML inline specification.

Case 11: Case 3 + case 4: Configuration provided through environment variable + file. Configuration from environment variable over-rides configuration from inline specification. Rest of the configuration is picked up from file.

Case 12: Case 4 + Case 5: Configuration provided through YAML inline specification + file. YAML inline specification over-rides values from file.

Case 13: Case 2 + Case 4 + Case 5: Configuration provided through individual flags + YAML inline specification + file. Individual flags have highest priority, then inline specification, then file.

Case 14: Case 3 + Case 4 + Case 5 i.e. configuration provided through environment variable + YAML inline specification + file. Configuration from environment variable over-rides configuration from inline specification. Configuration from inline specification over-rides configuration from file.

Case 15: Case 2 + Case 3 + Case 4 + Case 5 i.e. configuration provided through individual flags + environment variable + YAML inline specification + file. Configuration from Individual flags over-ride configuration from everything else.

Case 16: Case 2 + Case 3 + Case 4 + Case 5 i.e. configuration provided through individual flags + environment variable + YAML inline specification + file. Configuration is picked up from different sources based on priority.

Case 17: Invalid YAML configuration provided through inline specification --configuration. Error is expected.

Case 18: Invalid YAML configuration provided through file --configuration-file. Error is expected.

Case 19: Recursive YAML configuration through inline YAML configuration --configuration. Recursive YAML configuration through inline specification is not considered. No error. Default configuration is used.

Case 20: Recursive YAML configuration through file --configuration-file. Recursive YAML configuration through file is not considered. No error. Default configuration is used.

Case 21: Bogus key-value pair but valid YAML from inline specification. Bogus pair through inline specification is ignored. No error. Configuration should be picked up based on priority. In this case, default configuration is used.

Case 22: Bogus key-value pair but valid YAML from file. Bogus pair through file is ignored. No error. Configuration should be picked up based on priority. In this case, default configuration is used.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 6, 2026

🚨 Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@github-actions github-actions bot requested review from kfswain and nilig March 6, 2026 10:41
@DhritiShikhar DhritiShikhar force-pushed the issue-634 branch 2 times, most recently from e6af92b to 2529a82 Compare March 6, 2026 10:52
Copy link
Copy Markdown
Collaborator

@elevran elevran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @DhritiShikhar

A couple of high level comments in addition to feedback directly on code:

  • we allow deprecation of user facing API/conifguration for at least one release cycle. Thus the PR should be adding configuration support without removing command line flags. In fact, we may want to keep both as options for the future.
  • YAML files are missing newline at the end of file.

@DhritiShikhar DhritiShikhar force-pushed the issue-634 branch 17 times, most recently from 42807eb to 98654d0 Compare March 25, 2026 12:39
@elevran
Copy link
Copy Markdown
Collaborator

elevran commented Mar 26, 2026

@DhritiShikhar please ensure you have a signing key uploaded to github for verifcation (note that when adding the key, select the type as signing and not authentication).

  • Commits should be signed (-s: Signed-Off-By) and cryptographically verifiable (-S flag). You should ammend previous commits or force push. The make presubmit target should give you local hints before pushing for confirmation.
  • Resolve failing tests (typos and lint - please do not bypass linter with comments).
  • Please also resolve conflicts and rebase on the latest.

}
switch {
case configurationMap1 != nil && configurationMap2 != nil:
opts.updateSidecarConfiguration(mergeYAMLConfigurations(configurationMap2, configurationMap1))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three call sites ignore the returned error. updateSidecarConfiguration returns an error on type-assertion failures (e.g. port: "not-a-number"). Curently these errors are dropped and the process continues with partial/corrupt configuration.

Each call site must propagate the error up.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

return errors.New("Type assertion failed for decoder-use-tls: " + fmt.Sprintf("%v", configurationMap["decoder-use-tls"]))
}
}
if configurationMap["tls-insecure-skip-verify"] != nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag --tls-insecure-skip-verify is registered as a StringSlice (list of stage names). Here it is treated as bool?
Note that the Options/Configuration structures been cleaned up considerably on main and this PR shuold be rebased.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed. DONE

return errors.New("Type assertion failed for ec-connector: " + fmt.Sprintf("%v", configurationMap["ec-connector"]))
}
}
if configurationMap["enable-tls"] != nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other field check opts.isDefault(flagName) before applying the YAML value. enable-tls unconditionally appends to opts.EnableTLS. If a user sets --enable-tls=decoder on the CLI and also has enable-tls: [decoder, encoder] in YAML, both are applied.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed. DONE

If a user sets --enable-tls=decoder on the CLI and also has enable-tls: [decoder, encoder] in YAML, both are NOT applied.

Individual flag is given precedence, then inline YAML specification, then file YAML.

func YAMLConfigurationFromInlineSpecification(config string) (map[string]any, error) {
var temp map[string]any
if err := yaml.Unmarshal([]byte(config), &temp); err != nil {
return nil, errors.New("Failed to unmarshal sidecar configuration")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both YAMLConfigurationFromInlineSpecification (here) and YAMLConfigurationFromFile wrap errors with a fixed string, losing the underlying cause. Use fmt.Errorf with %w so the operator is aware of the underlying cause.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

// extractYAMLConfiguration extracts sidecar configuration (if provided)
// from `--configuration` and `--configuration-file` parameters
func (opts *Options) extractYAMLConfiguration(configuration string, configurationFile string) error {
var configurationMap1, configurationMap2 configurationMap
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-descriptive variable names make the merge logic hard to follow.
I think configurationMap1 = inline spec, configurationMap2 = file, so consider rename to inlineConfig / fileConfig to make code self-documenting.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

port: 8083
vllm-port: 8201
data-parallel-size: 5
kv-connector: "sglang"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate key data-parallel-size in test YAML fixture - behavior is undefined/misleading.
The sigs.k8s.io/yaml parsing converts to JSON first; last-key-wins semantics means this resolves to 6, yet fileDataParallelSize = 5 and case 4 asserts that value.

Remove the duplicate. If 6 was the intended value, update fileDataParallelSize to match.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

fileInferencePool := "pool3"
enableTLSDecode := []string{decodeStage}
enableTLSPrefill := []string{prefillStage}
configuration := "{port: 8200, vllm-port: 7200, kv-connector: sglang, enableTLS: 'prefiller,decoder'}"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enableTLS (camelCase) key in inline config is silently ignored. The TLS assertions for cases 3/5/7/8 are never actually verified since updateSidecarConfiguration looks for the key enable-tls (kebab-case). The key enableTLS does not match, so TLS is never set from the inline config.

Further, the test assertions for expectedUseTLSForPrefiller/Decoder/Encoder are inside if tt.inputFlags["enable-tls"] != nil, which is false for cases 3/5/7/8, so those expectations are never checked.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this. DONE.

},
expectedPort: port,
expectedVLLMPort: vllmPort,
expectedDataParallelSize: defaultDataParallelSize,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case 7 (inline + file) correctly expects fileDataParallelSize. Case 8 adds only port and vllm-port CLI flags on top and neither affects data-parallel-size. Yet case 8 expects defaultDataParallelSize (1) instead of fileDataParallelSize.
This looks like a copy-paste error?

Copy link
Copy Markdown
Author

@DhritiShikhar DhritiShikhar Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. Added a test case for this.
DONE.

"github.com/stretchr/testify/require"
)

var validSidecarfilePath, invalidSidecarfilePath string
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createSidecarConfigurationYAMLFiles writes package-level globals, so any two tests calling it concurrently would race. It would be safer to return the paths from the helper instead.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this. DONE.

case *string:
require.NoError(t, testFlagSet.Set(k, *v))
case *[]string:
require.NoError(t, testFlagSet.Set(k, strings.Join(*v, " ")))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*[]string flags joined with space instead of comma. The pflag package will treat the whole string as one value. Since pflag's StringSlice splits on commas, change to strings.Join(*v, ",").

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this. DONE.

@DhritiShikhar DhritiShikhar force-pushed the issue-634 branch 4 times, most recently from 49f5e26 to 6862b48 Compare April 7, 2026 14:07
elevran pushed a commit to elevran/llm-d-inference-scheduler that referenced this pull request Apr 8, 2026
* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* EPP Architecture proposal (llm-d#683)

* initial changes

* Adding to proposal to give a quick barebones definition to refactor

* feedback changes

* more feedback addressing

* removed unused Fake struct (llm-d#723)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* epp: return correct response for trailers (llm-d#726)

This looks like a copy paste error.

* Refactor scheduler to run plugins (llm-d#677)

* Refactor scheduler to run plugins

* Add scheduler plugin latency metric

* Address comments

* Address comments

* Complete the InferencePool documentation (llm-d#673)

* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

---------

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

* reduce log level in metrics logger not to trash the log (llm-d#708)

* reduce log level in metrics logger not to trash the log

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* rename flush metrics to refresh metrics

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* revert log level

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* few updates in datastore (llm-d#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* error phrasing

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* linter

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* scheduler refactoring (llm-d#730)

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* filter irrelevant pod in pod_reconciler (llm-d#696)

* EPP: Update GetRandomPod() to return nil if no pods exist (llm-d#731)

Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>

* Move filter and scorer plugins registration to a separate file (llm-d#729)

* Move filters and scorers registration to filter/scorer specific files

* Default scheduler config contains empty list of scorers

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Default plugin is not a scorer any more

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* fix scheduler test + lint comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

---------

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update issue templates (llm-d#738)

* Update issue templates

* Updates artifacts for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <kfswain@google.com>

* Updates bbr chart for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <kfswain@google.com>

* Updates artifacts for v0.3.0 release

Signed-off-by: Kellen Swain <kfswain@google.com>

* Adding blank issue template so that all issues start with  label

---------

Signed-off-by: Kellen Swain <kfswain@google.com>

* Add unit test coverage for pod APIs under datastore/pkg

* few updates in datastore (llm-d#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* error phrasing

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* linter

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* few updates in datastore (llm-d#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* error phrasing

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* linter

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* Add unit test coverage for pod APIs under datastore/pkg

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Kellen Swain <kfswain@google.com>
Co-authored-by: Kellen Swain <kfswain@google.com>
Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
Co-authored-by: John Howard <john.howard@solo.io>
Co-authored-by: Cong Liu <conliu@google.com>
Co-authored-by: Nicole Xin <nxin@google.com>
Co-authored-by: Rob Scott <rob.scott87@gmail.com>
Co-authored-by: nayihz <smartczy@outlook.com>
Co-authored-by: Daneyon Hansen <daneyon.hansen@solo.io>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
@DhritiShikhar DhritiShikhar force-pushed the issue-634 branch 3 times, most recently from 6a62543 to 82e108f Compare April 8, 2026 09:15
Signed-off-by: Dhriti Shikhar <dhriti.shikhar.rokz@gmail.com>
Copy link
Copy Markdown
Collaborator

@elevran elevran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DhritiShikhar!

Good progress addressing the previous round of comments. Most prior feedback is resolved, but a few items remain unaddressed, and the new implementation has grown considerably larger than initially expected. See inline comments for specifics.

I think it would be ok to simplfy by:

  1. drop provenance (tracking from where each value is set)
  2. allow only one of inline config or file (not both) and
  3. simplify so that CLI > config file/string > defaults (drop ENV, do not allow both direct string and file based YAML)

"github.com/spf13/pflag"
uberzap "go.uber.org/zap"
"go.uber.org/zap/zapcore"
"gopkg.in/yaml.v3"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use sigs.k8s.io/yaml (already a transitive dependency of the project).
Note that gopkg.in/yaml.v3 repository was archived by the owner on Apr 1, 2025. It is now read-only and no longer maintained.

loggingOptions zap.Options // loggingOptions holds the zap logging configuration
}
loggingOptions zap.Options // loggingOptions holds the zap logging configuration
FlagSet *pflag.FlagSet
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exporting FlagSet *pflag.FlagSet leaks implementation detail. Since FlagSet is only set to allow isParsed to work, the cleanest fix is to pass it through AddFlags:

func (opts *Options) AddFlags(fs *pflag.FlagSet) {
    opts.flagSet = fs   // store as an unexported reference
    // ... rest unchanged
}

Then isParsed can use opts.flagSet. The main.go assignment opts.FlagSet = pflag.CommandLine and the test assignment can both go away.

// "--configuration={port: 8085, vllm-port: 8203}"
func (opts *Options) YAMLConfigurationFromInlineSpecification() (map[string]any, error) {
var temp map[string]any
if err := yaml.Unmarshal([]byte(opts.configurationFromInlineSpecification), &temp); err != nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error from yaml.Unmarshal is discarded, losing the actual parse failure message.
You should return a wrapped error instead:

// wrap with %w so the cause propagates:
return nil, fmt.Errorf("failed to unmarshal inline sidecar configuration: %w", err)

YAMLConfigurationFromFile was correctly fixed with %w; this function was not.

if err := yaml.Unmarshal([]byte(opts.configurationFromInlineSpecification), &temp); err != nil {
return nil, errors.New("failed to unmarshal sidecar configuration")
}
for key := range temp {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This colon-splitting loop is dead code in practice and potentially harmful.

After yaml.Unmarshal, keys in the resulting map[string]any are plain strings without embedded colons, as the YAML parser handles all syntax. A key like "port: 8085" would only appear if the raw YAML string was not valid YAML and the library returned it as a literal string key, which would already have caused an unmarshal error above.
BTW, if the loop runs on a valid YAML key that legitimately contains a colon (e.g., "http://something"), it would silently corrupt the key.
Please remove this block entirely.

return err
}
}
switch {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three switch arms are nearly identical and can be collapsed to ~15 lines, eliminating mergeYAMLConfigurations entirely.

Suggested replacement for the entire extractYAMLConfiguration body:

func (opts *Options) extractYAMLConfiguration() error {
    var fileCfg, inlineCfg configurationMap

    if opts.configurationFromFile != "" {
        var err error
        fileCfg, err = opts.YAMLConfigurationFromFile()
        if err != nil {
            return err
        }
    }
    if opts.configurationFromInlineSpecification != "" {
        var err error
        inlineCfg, err = opts.YAMLConfigurationFromInlineSpecification()
        if err != nil {
            return err
        }
    }

    // Merge: file is the base; inline overrides with higher priority.
    merged := make(configurationMap)
    for k, v := range fileCfg {
        merged[k] = v
        appendItem(&opts.ConfigurationState.FromFile, k)
    }
    for k, v := range inlineCfg {
        merged[k] = v // inline wins
        appendItem(&opts.ConfigurationState.FromInline, k)
        removeItems(&opts.ConfigurationState.FromFile, []string{k})
    }

    if len(merged) == 0 {
        return nil
    }
    if err := opts.updateSidecarConfiguration(merged); err != nil {
        return err
    }

    // Higher-priority sources shadow lower ones in the state log.
    removeDuplicates(&opts.ConfigurationState.FromFile, opts.ConfigurationState.FromFlags)
    removeDuplicates(&opts.ConfigurationState.FromFile, opts.ConfigurationState.FromEnv)
    removeDuplicates(&opts.ConfigurationState.FromInline, opts.ConfigurationState.FromFlags)
    removeDuplicates(&opts.ConfigurationState.FromInline, opts.ConfigurationState.FromEnv)
    removeDuplicates(&opts.ConfigurationState.Defaults, opts.ConfigurationState.FromInline)
    removeDuplicates(&opts.ConfigurationState.Defaults, opts.ConfigurationState.FromFile)
    return nil
}

This also removes the need for mergeYAMLConfigurations (delete that function).

// 1. Default values
// 2. Environment variables
// 3. Individual flags
func (opts *Options) GetConfigurationState() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetConfigurationState and extractYAMLConfiguration share responsibility for populating ConfigurationState, but are called at different points in main.go, creating a fragile ordering contract.

GetConfigurationState (called before Complete in main.go) handles Defaults/Flags/Env. Then Complete calls extractYAMLConfiguration handles FromInline/FromFile and also mutates Defaults/Flags/Env via removeDuplicates.

Suggestion: fold GetConfigurationState into Complete, so there is one call that populates the full state correctly in sequence. This removes the ordering constraint from callers entirely:

func (opts *Options) Complete() error {
    // populate flags/env/defaults state first
    opts.getDefaults()
    opts.getEnvVars()
    opts.getFlags()

    // then process YAML (which can further narrow the lists)
    if err := opts.extractYAMLConfiguration(); err != nil {
        return err
    }
    // ... rest of Complete unchanged
}

Remove the exported GetConfigurationState method, and update main.go to remove the explicit call.

}

// getDefaults updates `ConfigurationState` with configuration keys which contain default values
func (opts *Options) getDefaults() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enablePrefillerSampling is missing from getDefaults()?

Every other non-env-var-only flag is listed here, but enablePrefillerSampling is absent. When the binary starts without ENABLE_PREFILLER_SAMPLING set and --enable-prefiller-sampling not passed, the flag uses a false default but won't appear in ConfigurationState.Defaults log output. Add it to the list for consistent logging.

}()
}
// GetConfigurationState calculates whether configuration is from default values, flags or environment variables
opts.GetConfigurationState()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetConfigurationState() must be called before Complete() but this constraint is invisible to callers and not checked.

The YAML configuration (inline/file) state is populated inside Complete() by calling extractYAMLConfiguration(), which then calls removeDuplicates on the Defaults/Flags/Env lists that GetConfigurationState populated. If the order is reversed (or GetConfigurationState is called twice), the state will be wrong with no error.

See the comment on options.go for the suggested fix: fold GetConfigurationState into Complete so there's a single, safe call sequence.

return
}

if len(opts.ConfigurationState.Defaults) != 0 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are five separate logger.Info calls, each checking a different len(...) != 0 condition, is noisy at the call site and generates multiple log lines for what is conceptually a single "here is the configuration provenance" event.

Consider a single structured log call after Complete() that logs all non-empty categories together:

logger.Info("Sidecar configuration provenance",
    "defaults", opts.ConfigurationState.Defaults,
    "from_flags", opts.ConfigurationState.FromFlags,
    "from_env", opts.ConfigurationState.FromEnv,
    "from_inline", opts.ConfigurationState.FromInline,
    "from_file", opts.ConfigurationState.FromFile,
)

This also removes the per-category if len(...) != 0 guards, an empty slice renders as [] which is clear and unambiguous in structured logs.
Side note: might be too much information. Might be sufficient to only log the final configuration, regadless of source.

// 1. new flag set for test
// 2. new Options struct initialized with default values
func newTestOptions(t *testing.T) (*Options, *pflag.FlagSet) {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ContinueOnError)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newTestOptions resets the global flag.CommandLine:

flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ContinueOnError)

This is a global mutation. If tests using newTestOptions ever run in parallel (or if any other test in the process relies on flag.CommandLine), this causes a data race and unpredictable failures. If t.Parallel() is ever added to this test or a sibling test, this will break silently.

Use flag.NewFlagSet only for the local test scope and don't assign it to flag.CommandLine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants