feature: Optimize the placeholder resolution logic in plugin configuration #3383

Open
AYue-94 wants to merge 1 commit into alibaba:main from AYue-94:feature/ingress_tpl

AYue-94 commented Jan 22, 2026

Ⅰ. Describe what this PR did

When a Kubernetes secret referenced by a placeholder cannot be found, keep that ${} placeholder unchanged instead of failing the whole template, and keep watching the secret for changes as usual.
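The behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in pkg/ingress/config/ingress_template.go: the regular expression, `lookupFunc`, and `resolveLenient` are hypothetical names, and the placeholder shape `${secret.<namespace>/<name>.<key>}` is inferred from the examples in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretRef matches ${secret.<namespace>/<name>.<key>}. The pattern is an
// assumption based on the placeholders shown in this thread.
var secretRef = regexp.MustCompile(`\$\{secret\.([^/}]+)/([^.}]+)\.([^}]+)\}`)

// lookupFunc is a hypothetical stand-in for reading one key of a Kubernetes secret.
type lookupFunc func(namespace, name, key string) (string, bool)

// resolveLenient replaces every placeholder that can be resolved and leaves
// the others unchanged. It also returns the unresolved placeholders so the
// caller can log them.
func resolveLenient(config string, lookup lookupFunc) (string, []string) {
	var missing []string
	out := secretRef.ReplaceAllStringFunc(config, func(ph string) string {
		m := secretRef.FindStringSubmatch(ph)
		if v, ok := lookup(m[1], m[2], m[3]); ok {
			return v
		}
		missing = append(missing, ph)
		return ph // keep the placeholder text as-is
	})
	return out, missing
}

func main() {
	// In-memory stand-in for the cluster's secrets (illustrative value only).
	secrets := map[string]string{"higress-system/llm.aliyun-token": "sk-demo"}
	lookup := func(ns, name, key string) (string, bool) {
		v, ok := secrets[ns+"/"+name+"."+key]
		return v, ok
	}
	cfg := `["${secret.higress-system/llm.aliyun-token}","${secret.higress-system/llm.aliyun-token-invalid}"]`
	out, missing := resolveLenient(cfg, lookup)
	fmt.Println(out)
	fmt.Println(missing)
}
```

With this shape, one missing key only affects the provider that references it, at the cost of sending a literal placeholder string to the data plane for that provider.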

Ⅱ. Does this pull request fix one issue?

#3382

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Run the unit tests.

Ⅴ. Special notes for reviews

Ⅵ. AI Coding Tool Usage Checklist (if applicable)

Please check all applicable items:

  • For new standalone features (e.g., new wasm plugin or golang-filter plugin):

    • I have created a design/ directory in the plugin folder
    • I have added the design document to the design/ directory
    • I have included the AI Coding summary below
  • For regular updates/changes (not new plugins):

    • I have provided the prompts/instructions I gave to the AI Coding tool below
    • I have included the AI Coding summary below

AI Coding Prompts (for regular updates)

AI Coding Summary

…ation(alibaba#3382)

When the kubernetes secret cannot be found, keep the placeholder unchanged

Signed-off-by: ayue <ericyu0421@163.com>

CLAassistant commented Jan 22, 2026

CLA assistant check
All committers have signed the CLA.


lingma-agents bot commented Jan 22, 2026

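🔍 Code review in progress

⏳ Reviewing

⏰️ Time remaining: a few minutes

🔄 Branch flow: feature/ingress_tpl → main

📦 Commits: reviewing the commits of the current PR from a2eb599c34daf7.

📒 File list (2 files)
📝 Changed: 2 files

📝 Changed files: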

  • pkg/ingress/config/ingress_template.go
  • pkg/ingress/config/ingress_template_test.go

johnlanni (Collaborator) left a comment

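I don't recommend this change. It turns a control-plane risk into a data-plane risk: for example, if a secret is deleted by mistake, an incorrect configuration gets pushed and requests start failing with 403.
I suggest handling this in the configuration delivery process instead, so that the control path verifies in advance that the referenced configuration items exist. Combine that with log monitoring so you can step in promptly when such errors appear.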
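A pre-delivery check along the lines suggested here might look like the sketch below. It is an assumption-laden illustration, not an existing Higress feature: `validateSecretRefs` and `existsFunc` are hypothetical names, and the placeholder pattern is inferred from the examples in this thread. A real check would query the cluster, for instance through a secret lister.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// secretRef matches ${secret.<namespace>/<name>.<key>} (assumed pattern).
var secretRef = regexp.MustCompile(`\$\{secret\.([^/}]+)/([^.}]+)\.([^}]+)\}`)

// existsFunc is a hypothetical existence check for one key of a secret.
type existsFunc func(namespace, name, key string) bool

// validateSecretRefs returns an error naming every placeholder whose secret
// key is absent, or nil when all referenced keys exist.
func validateSecretRefs(config string, exists existsFunc) error {
	var missing []string
	for _, m := range secretRef.FindAllStringSubmatch(config, -1) {
		if !exists(m[1], m[2], m[3]) {
			missing = append(missing, m[0])
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("unresolvable secret placeholders: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Pretend only the aliyun-token key exists in the cluster.
	exists := func(ns, name, key string) bool {
		return ns == "higress-system" && name == "llm" && key == "aliyun-token"
	}
	cfg := "${secret.higress-system/llm.aliyun-token} ${secret.higress-system/llm.aliyun-token-invalid}"
	if err := validateSecretRefs(cfg, exists); err != nil {
		fmt.Println("reject config:", err)
	}
}
```

Such a check catches a misconfigured reference before delivery, but it cannot help once a secret that passed validation is deleted later, which is the case raised in the replies.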

AYue-94 (Author) commented Jan 26, 2026

The problem is that a single misconfigured secret prevents every placeholder in the plugin from being replaced, so all ai-proxy or key-auth routes start failing at once. That blast radius is too large. Existence can be validated at configuration time, but as you said, an accidentally deleted secret would still leave all of the plugin's placeholders unresolved.

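AYue-94 (Author) commented Jan 26, 2026

@johnlanni Here is an example. In the case below, one placeholder refers to a secret key that does not exist, and as a result none of the placeholders in the ai-proxy plugin are replaced. The expectation is that the valid placeholders should still be replaced normally.

Likewise, if n placeholders were configured correctly and one of the secrets is later deleted by mistake, none of the plugin's placeholders are replaced either.

Step 1: configure ai-proxy correctly, using an existing secret, ${secret.higress-system/llm.aliyun-token}: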

$ kubectl get wasmplugin ai-proxy.internal -n higress-system -oyaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-proxy.internal
  namespace: higress-system
spec:
  defaultConfig:
    providers:
    - apiTokens:
      - ${secret.higress-system/llm.aliyun-token}
      failover:
        enabled: false
      id: aliyun
      openaiCustomUrl: https://dashscope.aliyuncs.com/compatible-mode/v1
      openaiExtraCustomUrls: []
      retryOnFailure:
        enabled: false
      type: openai
  defaultConfigDisable: false
  failStrategy: FAIL_OPEN
  imagePullPolicy: UNSPECIFIED_POLICY
  matchRules:
  - config:
      activeProviderId: aliyun
    configDisable: false
    service:
    - llm-aliyun.internal.dns
  phase: UNSPECIFIED_PHASE
  priority: 100
  url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ai-proxy:2.0.0

Fetch the Envoy configuration:

$ kubectl exec higress-gateway-865ffb6bc4-2qhxs -n higress-system -- curl -s http://127.0.0.1:15000/config_dump | yq eval -P - > envoy.yaml
{"_rules_":[{"_match_service_":["llm-aliyun.internal.dns"],"activeProviderId":"aliyun"}],"providers":[{"apiTokens":["<the correct secret value>"],"failover":{"enabled":false},"id":"aliyun","openaiCustomUrl":"https://dashscope.aliyuncs.com/compatible-mode/v1","openaiExtraCustomUrls":[],"retryOnFailure":{"enabled":false},"type":"openai"}]}

Step 2: add another AI provider whose token uses a placeholder for a secret key that does not exist, ${secret.higress-system/llm.aliyun-token-invalid}.

higress-core logs an error:

error    ingress    Failed to process template for config higress-system/ai-proxy.internal: failed to get secret value for higress-system/llm.aliyun-token-invalid: key aliyun-token-invalid not found in secret higress-system/llm

The WasmPlugin resource itself looks as configured:

$ kubectl get wasmplugin ai-proxy.internal -n higress-system -oyaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-proxy.internal
  namespace: higress-system
spec:
  defaultConfig:
    providers:
    - apiTokens:
      - ${secret.higress-system/llm.aliyun-token}
      failover:
        enabled: false
      id: aliyun
      openaiCustomUrl: https://dashscope.aliyuncs.com/compatible-mode/v1
      openaiExtraCustomUrls: []
      retryOnFailure:
        enabled: false
      type: openai
    - apiTokens:
      - ${secret.higress-system/llm.aliyun-token-invalid}
      failover:
        enabled: false
      id: aliyun-invalid
      openaiCustomUrl: https://dashscope.aliyuncs.com/compatible-mode/v1
      openaiExtraCustomUrls: []
      retryOnFailure:
        enabled: false
      type: openai
  defaultConfigDisable: false
  failStrategy: FAIL_OPEN
  imagePullPolicy: UNSPECIFIED_POLICY
  matchRules:
  - config:
      activeProviderId: aliyun-invalid
    configDisable: false
    service:
    - llm-aliyun-invalid.internal.dns
  - config:
      activeProviderId: aliyun
    configDisable: false
    service:
    - llm-aliyun.internal.dns
  phase: UNSPECIFIED_PHASE
  priority: 100
  url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ai-proxy:2.0.0

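The Envoy configuration now contains only unresolved placeholders. The expectation is that at least the correctly configured ${secret.higress-system/llm.aliyun-token} would be replaced: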

$ kubectl exec higress-gateway-865ffb6bc4-2qhxs -n higress-system -- curl -s http://127.0.0.1:15000/config_dump | yq eval -P - > envoy.yaml

{"_rules_":[{"_match_service_":["llm-aliyun-invalid.internal.dns"],"activeProviderId":"aliyun-invalid"},{"_match_service_":["llm-aliyun.internal.dns"],"activeProviderId":"aliyun"}],"providers":[{"apiTokens":["${secret.higress-system/llm.aliyun-token}"],"failover":{"enabled":false},"id":"aliyun","openaiCustomUrl":"https://dashscope.aliyuncs.com/compatible-mode/v1","openaiExtraCustomUrls":[],"retryOnFailure":{"enabled":false},"type":"openai"},{"apiTokens":["${secret.higress-system/llm.aliyun-token-invalid}"],"failover":{"enabled":false},"id":"aliyun-invalid","openaiCustomUrl":"https://dashscope.aliyuncs.com/compatible-mode/v1","openaiExtraCustomUrls":[],"retryOnFailure":{"enabled":false},"type":"openai"}]}
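The output above is consistent with an all-or-nothing resolution step: as soon as one placeholder fails, the whole config is passed through unresolved. The sketch below models that observed behavior only. It is not taken from the Higress source; `resolveStrict`, `lookupFunc`, the error text, and the placeholder pattern are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretRef matches ${secret.<namespace>/<name>.<key>} (assumed pattern).
var secretRef = regexp.MustCompile(`\$\{secret\.([^/}]+)/([^.}]+)\.([^}]+)\}`)

// lookupFunc is a hypothetical stand-in for reading one key of a secret.
type lookupFunc func(namespace, name, key string) (string, bool)

// resolveStrict substitutes all placeholders, but if any lookup fails it
// discards the partial result and returns the original config with an error.
func resolveStrict(config string, lookup lookupFunc) (string, error) {
	var firstErr error
	out := secretRef.ReplaceAllStringFunc(config, func(ph string) string {
		m := secretRef.FindStringSubmatch(ph)
		v, ok := lookup(m[1], m[2], m[3])
		if !ok && firstErr == nil {
			firstErr = fmt.Errorf("secret key not found: %s/%s.%s", m[1], m[2], m[3])
		}
		return v
	})
	if firstErr != nil {
		return config, firstErr // every placeholder stays unresolved
	}
	return out, nil
}

func main() {
	lookup := func(ns, name, key string) (string, bool) {
		if key == "aliyun-token" {
			return "sk-demo", true // illustrative value only
		}
		return "", false
	}
	cfg := `["${secret.higress-system/llm.aliyun-token}","${secret.higress-system/llm.aliyun-token-invalid}"]`
	out, err := resolveStrict(cfg, lookup)
	fmt.Println(out)
	fmt.Println(err)
}
```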

johnlanni (Collaborator) commented

In reply to AYue-94's point above that a single misconfigured secret leaves every placeholder in the plugin unresolved:

That is not the expected behavior. If secret substitution fails, the current configuration push should, in theory, be blocked.

AYue-94 (Author) commented Jan 28, 2026

@johnlanni If there is a restart and this configuration is not pushed either, that would also be a problem, wouldn't it?

johnlanni (Collaborator) commented Jan 28, 2026

@AYue-94 Right, we could consider adjusting the logic here so that a configuration parsing failure blocks startup.
It doesn't block at the moment, does it?

AYue-94 (Author) commented Jan 28, 2026

@johnlanni Correct, the controller starts up normally.

johnlanni (Collaborator) commented Feb 1, 2026

@AYue-94 The controller itself will certainly not block. What we need to check is whether the configuration push to the gateway is currently blocked. If nothing is pushed, the gateway will block when it restarts and will not serve traffic based on an incorrect configuration.
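One way to picture the blocking behavior discussed here is a gate that only records a config as pushed when resolution succeeds, so a failed update never replaces the last good version. This is a minimal sketch with hypothetical names (`gate`, `apply`, `errNotFound`); it does not describe how Higress actually pushes configuration to the gateway.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical resolution error used for illustration.
var errNotFound = errors.New("secret key not found")

// gate tracks the last successfully resolved config per plugin.
type gate struct {
	pushed map[string]string
}

// apply resolves raw and records it as pushed only on success. On failure
// the previously pushed version, if any, stays in effect.
func (g *gate) apply(name, raw string, resolve func(string) (string, error)) error {
	resolved, err := resolve(raw)
	if err != nil {
		return fmt.Errorf("push blocked for %s: %w", name, err)
	}
	g.pushed[name] = resolved
	return nil
}

func main() {
	g := &gate{pushed: map[string]string{}}
	ok := func(s string) (string, error) { return "resolved:" + s, nil }
	bad := func(string) (string, error) { return "", errNotFound }

	fmt.Println(g.apply("ai-proxy.internal", "v1", ok))
	fmt.Println(g.apply("ai-proxy.internal", "v2", bad))
	fmt.Println(g.pushed["ai-proxy.internal"]) // still the v1 result
}
```

In this model a freshly restarted gateway has no last good version to fall back on, which matches the concern that it would wait rather than serve an incorrect configuration.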
