Skip to content

feat(providers): add tls_ca_cert_path support for custom inference providers#5797

Merged
Audacity88 merged 14 commits into
zeroclaw-labs:masterfrom
andybab:feature/tls-ca-cert-support
Jun 14, 2026
Merged

feat(providers): add tls_ca_cert_path support for custom inference providers#5797
Audacity88 merged 14 commits into
zeroclaw-labs:masterfrom
andybab:feature/tls-ca-cert-support

Conversation

@andybab

@andybab andybab commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Base branch target: master
  • Problem: ZeroClaw cannot connect to custom inference endpoints that use a private/corporate PKI for TLS, resulting in "invalid peer certificate: UnknownIssuer" errors.
  • Why it matters: Corporate and self-hosted deployments frequently sit behind a proxy or PKI that issues certificates from an internal CA not in the system root store.
  • What changed: Added tls_ca_cert_path config option to ModelProviderConfig; wired the certificate through to the chat API HTTP client via apply_compat_options in factory.rs. The onboard model-discovery hunk (previously targeting onboard/mod.rs) was dropped — that file was replaced by quickstart/mod.rs in feat(integration): introduce zerocode TUI, RPC socket transport, DenyWithEdit approval, and beta-2 integration #6848; the runtime discovery path now goes through the regular OpenAiCompatibleModelProvider which already carries the cert.
  • What did not change: No changes to authentication, existing provider behavior, or any default TLS policy. All other providers are unaffected.

Label Snapshot (required)

  • Risk label: risk: high
  • Size label: auto-managed
  • Scope labels: config, provider, core
  • Module labels: none (no single provider targeted)
  • Contributor tier label: auto-managed

Change Metadata

  • Change type: feature
  • Primary scope: provider

Linked Issue

Supersede Attribution (required when Supersedes # is used)

N/A — this PR does not supersede another.

Validation Evidence (required)

Branch was refreshed onto current master again before merge; current head is 7ce4afb08818. The local validation below was run on the earlier reviewed head 1834b1ad6, and GitHub CI has now re-run successfully on refreshed head 7ce4afb08818.

$ cargo fmt --all -- --check
(no output — clean)

$ cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings
    Checking zeroclaw-config v0.8.0-beta-2
    Checking zeroclaw-memory v0.8.0-beta-2
    Checking zeroclaw-providers v0.8.0-beta-2
    Checking zeroclaw-tools v0.8.0-beta-2
    Checking zeroclaw-runtime v0.8.0-beta-2
    Checking zeroclaw-hardware v0.8.0-beta-2
    Checking zeroclaw-channels v0.8.0-beta-2
    Checking zeroclaw-gateway v0.8.0-beta-2
    Checking zeroclawlabs v0.8.0-beta-2
    Checking xtask v0.8.0-beta-2
    Checking fill-translations v0.8.0-beta-2
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 10s

$ cargo test -p zeroclaw-providers --lib tls_ca_cert
    Finished `test` profile [unoptimized + debuginfo] target(s) in 25.55s
     Running unittests src/lib.rs (target/debug/deps/zeroclaw_providers-ca4d112b70af4634)

running 3 tests
test compatible::tests::with_tls_ca_cert_path_missing_file_leaves_pem_none ... ok
test tests::provider_runtime_options_from_config_propagates_tls_ca_cert_path ... ok
test compatible::tests::with_tls_ca_cert_path_invalid_pem_stores_bytes_and_http_client_still_builds ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 901 filtered out; finished in 0.40s

$ cargo test -p zeroclaw-providers --lib ollama
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
     Running unittests src/lib.rs (target/debug/deps/zeroclaw_providers-ca4d112b70af4634)

running 68 tests
test ollama::tests::capabilities_disable_native_tools_and_enable_vision ... ok
test multimodal::tests::extract_ollama_image_payload_supports_data_uris ... ok
test ollama::tests::cloud_suffix_strips_model_name ... ok
test ollama::tests::cloud_suffix_preserved_for_private_remote_with_api_key ... ok
test ollama::tests::cloud_suffix_preserved_for_private_remote_without_api_key ... ok
test factory::tests::ollama_factory_normalizes_host_root_to_openai_compat_base ... ok
test factory::tests::ollama_factory_uses_no_credential_when_key_absent ... ok
test factory::tests::ollama_factory_preserves_typed_api_key_for_official_cloud ... ok
test factory::tests::ollama_factory_normalizes_legacy_api_path_to_openai_compat_base ... ok
test ollama::tests::custom_url_no_trailing_slash ... ok
test ollama::tests::custom_url_strips_api_chat_suffix ... ok
test ollama::tests::api_response_parses_without_eval_counts ... ok
test ollama::tests::api_response_parses_eval_counts ... ok
test ollama::tests::cloud_suffix_without_api_key_errors ... ok
test ollama::tests::cloud_suffix_with_unspecified_local_endpoint_errors ... ok
test ollama::tests::cloud_suffix_with_local_endpoint_errors ... ok
test ollama::tests::build_chat_request_with_think_emits_explicit_options ... ok
test ollama::tests::custom_url_strips_api_suffix ... ok
test ollama::tests::custom_url_trailing_slash ... ok
test ollama::tests::default_url ... ok
test ollama::tests::effective_content_falls_back_to_thinking_field ... ok
test ollama::tests::effective_content_returns_none_when_both_empty ... ok
test ollama::tests::effective_content_strips_think_and_returns_rest ... ok
test ollama::tests::effective_content_prefers_content_over_thinking ... ok
test ollama::tests::effective_content_uses_thinking_when_content_is_think_only ... ok
test ollama::tests::empty_url_uses_empty ... ok
test ollama::tests::extract_tool_name_handles_nested_tool_call ... ok
test ollama::tests::extract_tool_name_handles_normal_call ... ok
test ollama::tests::extract_tool_name_handles_prefixed_name ... ok
test ollama::tests::fallback_text_for_empty_content_without_thinking_is_generic ... ok
test ollama::tests::local_endpoint_auth_disabled_even_with_key ... ok
test ollama::tests::normalize_response_text_rejects_whitespace_only_content ... ok
test ollama::tests::normalize_response_text_rejects_think_only_content ... ok
test ollama::tests::normalize_response_text_strips_think_tags ... ok
test ollama::tests::qwen_think_with_tool_call_in_content_preserved ... ok
test ollama::tests::format_tool_calls_produces_valid_json ... ok
test ollama::tests::convert_messages_parses_native_assistant_tool_calls ... ok
test ollama::tests::convert_messages_extracts_images_from_user_marker ... ok
test ollama::tests::qwen_thinking_field_with_tool_call_xml_extracted ... ok
test ollama::tests::remote_endpoint_auth_enabled_when_key_present ... ok
test ollama::tests::convert_messages_maps_tool_result_call_id_to_tool_name ... ok
test ollama::tests::remote_endpoint_with_api_suffix_still_allows_cloud_models ... ok
test ollama::tests::request_includes_default_num_ctx_and_num_predict ... ok
test ollama::tests::request_includes_overridden_tuning ... ok
test ollama::tests::request_includes_think_when_reasoning_configured ... ok
test ollama::tests::request_omits_think_when_reasoning_not_configured ... ok
test ollama::tests::response_deserializes ... ok
test ollama::tests::response_with_empty_content ... ok
test ollama::tests::response_with_missing_content_defaults_to_empty ... ok
test ollama::tests::response_with_thinking_field_extracts_content ... ok
test ollama::tests::strip_think_tags_handles_unclosed_block ... ok
test ollama::tests::strip_think_tags_preserves_text_without_tags ... ok
test ollama::tests::retry_path_carries_options ... ok
test ollama::tests::strip_think_tags_removes_single_block ... ok
test ollama::tests::strip_think_tags_removes_multiple_blocks ... ok
test ollama::tests::strip_think_tags_returns_empty_for_think_only ... ok
test ollama::tests::temperature_override_replaces_per_call_temperature ... ok
test ollama::tests::response_with_tool_calls_parses_correctly ... ok
test ollama::tests::temperature_override_unset_passes_per_call_temperature ... ok
test tests::ollama_alias_tuning_defaults_leave_temperature_override_unset ... ok
test tests::ollama_alias_tuning_fields_populate_tuning_struct ... ok
test tests::ollama_uses_resolved_url_from_runtime_options ... ok
test tests::ollama_cloud_with_custom_url ... ok
test tests::ollama_with_custom_url ... ok
test tests::factory_ollama ... ok
test ollama::tests::chat_with_tool_specs_omits_native_tools_payload ... ok
test tests::ollama_private_remote_lists_models_without_auth ... ok
test tests::ollama_private_remote_cloud_request_omits_auth_and_preserves_model ... ok

test result: ok. 68 passed; 0 failed; 0 ignored; 0 measured; 836 filtered out; finished in 0.28s

CI on prior head (ab05beca): Quality Gate ✅, Validate PR title ✅, PR Path Labeler ✅
CI on reviewed head (1834b1ad6): Quality Gate ✅ (completed 2026-06-05), Validate PR title ✅, PR Path Labeler ✅
CI on refreshed head (7ce4afb08818): Quality Gate ✅, Validate PR title ✅, PR Path Labeler ✅ (completed 2026-06-14)

  • Evidence provided: manual testing against a corporate proxy requiring custom CA (70 models fetched, chat API succeeds, agent mode no panic)
  • If any command is intentionally skipped: none skipped

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No (only affects how existing TLS connections are configured)
  • Secrets/tokens handling changed? No
  • File system access scope changed? Yes — reads one user-specified file path at provider construction time
  • Description: tls_ca_cert_path is read from user-controlled config (~/.zeroclaw/config.toml). This is within the existing config trust model — if you control the config file, you control what the agent trusts. No path traversal or privilege escalation beyond the existing config trust boundary. Read errors and parse failures log a warning and fall back to system roots rather than crashing.

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: no personal data, real names, credentials, or private URLs in code, tests, or fixtures
  • Neutral wording confirmation: all test and example strings use project-scoped placeholders

Compatibility / Migration

  • Backward compatible? Yes — new optional field with #[serde(default)], existing configs unchanged
  • Config/env changes? Yes — new tls_ca_cert_path field under [model_providers.<name>]
  • Migration needed? No
  • Upgrade steps: none required; to use, add tls_ca_cert_path = "/path/to/ca.pem" to the relevant provider block

i18n Follow-Through (required when docs or user-facing wording changes)

  • i18n follow-through triggered? No — no user-facing wording or doc changes in this PR

Human Verification (required)

  • Verified scenarios: model refresh with custom CA (70 models fetched), chat API call with custom CA (SSL handshake succeeds), agent mode startup (no panic), missing cert file (warning logged, falls back to system roots)
  • Edge cases checked: cert file does not exist → warning logged; cert file is not valid PEM → warning logged; custom: URL with trailing slash → normalized correctly
  • What was not verified: cert rotation (hot-reload without restart), Windows cert store interaction

Side Effects / Blast Radius (required)

  • Affected subsystems: zeroclaw-config (schema), zeroclaw-providers (HTTP client construction)
  • Potential unintended effects: none — cert is loaded at provider construction; a bad cert path logs a warning and the provider falls back to system roots
  • Guardrails: warning logs on both read failure and parse failure make misconfiguration diagnosable

Agent Collaboration Notes (recommended)

Rollback Plan (required)

  • Fast rollback: git revert HEAD — the tls_ca_cert_path field is optional and defaults to None; reverting restores prior behavior with no config migration needed
  • Feature flags or config toggles: the field itself acts as the toggle — omitting it disables the behavior
  • Observable failure symptoms: "UnknownIssuer" TLS errors (same as before this PR) if rollback is needed

Risks and Mitigations

  • Risk: cert file is read at provider construction time — if the file is deleted after startup, a process restart will log a warning and fall back to system roots
    • Mitigation: warning is logged clearly; fallback to system roots is the safe degradation path
  • Risk: custom: URL matching is case-sensitive
    • Mitigation: consistent casing is the expected norm for URLs in config files

@github-actions github-actions Bot added core Auto scope: root src/*.rs files changed. config Auto scope: src/config/** changed. cron Auto scope: src/cron/** changed. memory Auto scope: src/memory/** changed. provider Auto scope: src/providers/** changed. labels Apr 16, 2026
@andybab

andybab commented Apr 16, 2026

Copy link
Copy Markdown
Contributor Author

Review: Needs Author Action

Comprehension Summary

What: Adds tls_ca_cert_path config field to ModelProviderConfig, threads it through OpenAiCompatibleProvider's HTTP client builder and the wizard model-refresh path. Also fixes custom: URL-based provider lookup in fallback_provider() and registers the CLI channel in agent mode.

Why: Custom/corporate PKI endpoints fail TLS with "UnknownIssuer" without a custom CA bundle.

Blast radius: zeroclaw-config (Beta — schema change), zeroclaw-providers (Beta — HTTP client construction), zeroclaw-runtime wizard (Experimental — model refresh), src/main.rs (CLI channel registration). Config change is backward-compatible (new optional field, #[serde(default)]).

Security/performance assessment: Cert path comes from user-controlled config — within existing trust model, no privilege escalation. http_client() is called on every API request (9 call sites in compatible.rs) and add_tls_cert does std::fs::read() each time. With tls_ca_cert_path set, this is a filesystem read per inference call.


Blockers

[blocking] CI: cargo fmt failing on 6 files

The PR reordered imports in a direction that doesn't match rustfmt's expected order. CI shows diffs in src/commands/update.rs, src/cron/mod.rs, src/main.rs (two hunks), src/memory/cli.rs, src/migration.rs, src/providers/traits.rs. These are incidental formatting changes mixed into a feature PR. Run cargo fmt --all and commit, or revert these files to their pre-PR state.

[blocking] Silent cert error swallowing — crates/zeroclaw-providers/src/compatible.rs:346-354

let add_tls_cert = |builder: ClientBuilder| -> ClientBuilder {
    if let Some(ref cert_path) = self.tls_ca_cert_path {
        if let Ok(cert_content) = std::fs::read(cert_path) {       // silent fail
            if let Ok(cert) = reqwest::Certificate::from_pem(&cert_content) {  // silent fail
                return builder.add_root_certificate(cert);
            }
        }
    }
    builder
};

Both fs::read and Certificate::from_pem failures are silently swallowed. The client is built without the cert and the original "UnknownIssuer" error returns — with no hint that the cert config was ignored. The wizard.rs version correctly uses with_context and propagates errors. Apply the same discipline: log tracing::warn! on each failure path with cert path + error.

[blocking] http_client() reads cert from disk on every API request

http_client() is invoked at 9 call sites, once per request. With tls_ca_cert_path set, each call does std::fs::read(cert_path) + PEM parse. Fix: read and parse the cert once during provider construction (in with_tls_ca_cert_path or via a lazy OnceLock) and store the reqwest::Certificate directly, not the path string.

[blocking] PR template not followed

PR body uses freeform layout instead of .github/pull_request_template.md. Missing required sections: Label Snapshot, Change Metadata, Validation Evidence (with actual command output), Security Impact (yes/no fields), Privacy and Data Hygiene, Compatibility/Migration, Human Verification, Side Effects/Blast Radius, Rollback Plan, Risks and Mitigations.


Suggestions

[suggestion] tracing::debug! in hot provider lookup path — crates/zeroclaw-config/src/providers.rs:33-44

fallback_provider() is called on every provider operation. The debug lines will be noisy. Remove or reduce. The diagnostic tracing::debug! in wizard.rs (one-off setup path) are fine.

[suggestion] setup_provider always passes None for tls_ca_cert_pathcrates/zeroclaw-runtime/src/onboard/wizard.rs:3029

During interactive onboarding, if tls_ca_cert_path is already configured, model fetch will still fail TLS. If intentional, add a comment. If not, thread the cert path through.

[question] Case-sensitive custom: URL matching

fallback_provider() compares base_url values case-sensitively. custom:https://My-Proxy.example.com/v1 vs https://my-proxy.example.com/v1 won't match. Acceptable?


Privacy note

PR body mentions "Seznam corporate proxy" — names your employer. Per pr-discipline.md, use neutral placeholders ("corporate proxy", "example.com") in PR artifacts before merge.


Feature is well-scoped and the config approach is correct. Address the four blockers above, run cargo fmt --all locally before pushing, and this will be in good shape.

@andybab andybab force-pushed the feature/tls-ca-cert-support branch from 5e8acaf to 15b95ed Compare April 16, 2026 08:08
@github-actions github-actions Bot added the tests Auto scope: tests/** changed. label Apr 16, 2026
@andybab andybab force-pushed the feature/tls-ca-cert-support branch from 15b95ed to 7b56680 Compare April 16, 2026 08:10
@github-actions github-actions Bot removed config Auto scope: src/config/** changed. cron Auto scope: src/cron/** changed. memory Auto scope: src/memory/** changed. provider Auto scope: src/providers/** changed. labels Apr 16, 2026
@andybab andybab force-pushed the feature/tls-ca-cert-support branch 3 times, most recently from 64da74a to 8ebf996 Compare April 16, 2026 08:29
@singlerider singlerider requested a review from Audacity88 April 29, 2026 08:17
@singlerider singlerider added enhancement New feature or request risk: high Auto risk: security/runtime/gateway/tools/workflows. size: M Auto size: 251-500 non-doc changed lines. provider Auto scope: src/providers/** changed. needs-author-action Author action required before merge labels Apr 29, 2026

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed current head e4dee80 against the live PR state, PR body, linked #1458 / #1676 context, prior top-level review comment, changed-file list, current origin/master, and a local read-only merge simulation. I did not do a full high-risk changed-line review because the PR is currently DIRTY and one touched runtime onboarding file no longer exists on master.

✅ Resolved — the earlier TLS-specific blockers look addressed on this head

The previous top-level review comment called out four main blockers: formatting, silent cert-error swallowing, per-request cert-file reads, and an incomplete PR template.

On current head, those look materially improved:

  • CI is green on e4dee80, including lint, test, security, and the required gate.
  • OpenAiCompatibleProvider::with_tls_ca_cert_path now logs a warning when the configured CA file cannot be read.
  • http_client() now logs a warning when PEM parsing fails instead of silently falling back to system roots.
  • The implementation no longer reads the certificate file from disk per request; it loads PEM bytes during provider construction and reuses them when building clients.
  • The PR body now follows the current template shape with validation, security/privacy, compatibility, blast radius, rollback, and risk notes.

🔴 Blocking for review — the branch still conflicts with the current onboarding architecture

GitHub reports mergeStateStatus: DIRTY, and local git merge-tree origin/master origin/pr/5797 confirms the conflict:

CONFLICT (modify/delete): crates/zeroclaw-runtime/src/onboard/wizard.rs deleted in origin/master and modified in origin/pr/5797.

That matters because the PR still wires model-refresh TLS support through crates/zeroclaw-runtime/src/onboard/wizard.rs, but current master has replaced that file with the newer onboarding implementation under crates/zeroclaw-runtime/src/onboard/mod.rs and crates/zeroclaw-runtime/src/onboard/ui/*. So the provider/config side may be close, but the onboarding/model-refresh portion needs to be rebased or reworked onto the current onboarding flow before this can be meaningfully reviewed or merged.

🟡 Follow-up validation after rebase

After the rebase/rework, please refresh the validation evidence on the new head. Because this touches provider TLS behavior and config/runtime plumbing, the useful evidence would be:

  • cargo fmt --all -- --check
  • workspace clippy / tests, or a clearly scoped equivalent if full workspace validation is too expensive
  • a focused check that the current onboarding model-refresh path still uses tls_ca_cert_path
  • a focused provider-path check that a bad cert path and a bad PEM still warn and fall back as described

🟢 What is worth keeping

The core feature is still valuable. Current master does not appear to expose tls_ca_cert_path, and #1458 is closed via the older #1676 thread even though #1676 itself is closed rather than merged. So this PR may still be the right recovery path for custom corporate/internal PKI support; it just needs to land on the current crate/onboarding architecture.

Verdict

COMMENT for now. I would not spend full high-risk review time or approve this branch while it is conflicted against deleted onboarding code, but the current TLS-provider changes look substantially closer than the older review comment.

@github-actions github-actions Bot removed provider Auto scope: src/providers/** changed. tests Auto scope: tests/** changed. labels May 7, 2026
@github-actions github-actions Bot added provider:gemini Auto module: provider/gemini changed. channel:discord Auto module: channel/discord changed. channel:matrix Auto module: channel/matrix changed. channel:whatsapp Auto module: channel/whatsapp changed. channel:lark Auto module: channel/lark changed. channel:email labels Jun 1, 2026
@singlerider

Copy link
Copy Markdown
Collaborator

@andybab This diff is wild. I don't think this is the branch you intend on showcasing for this change.

@WareWolf-MoonWall WareWolf-MoonWall left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at head 1dbeed2 — CI 13/13 green. The needs-author-action label has been removed; the rebase resolves the conflict @Audacity88 noted on the previous head.

✅ Resolved — wizard.rs conflict (from @Audacity88's review)

@Audacity88's prior comment was blocked on crates/zeroclaw-runtime/src/onboard/wizard.rs, which master had deleted. The rebased diff now touches crates/zeroclaw-runtime/src/onboard/mod.rs — the current onboarding implementation. TLS support for model-refresh discovery is correctly wired through discover_openai_compat_models(..., tls_ca_cert_pem.as_deref()) inside prompt_model. No wizard.rs in the diff.

🟢 What looks good — cert loaded once at construction, not per request

with_tls_ca_cert_path reads and stores PEM bytes during OpenAiCompatibleModelProvider construction. Subsequent http_client() calls parse the in-memory bytes into a reqwest::Certificate — no disk I/O on the hot path. This is the correct performance contract for a high-frequency HTTP client builder.

🟢 What looks good — graceful error handling at both failure boundaries

File-read failure: logs WARN with path and error fields, leaves tls_ca_cert_pem = None (falls back to system roots). PEM-parse failure in add_tls_cert: also logs WARN and continues with the unmodified builder. Neither path panics or bubbles an error to the caller. The right behaviour for a user-configured optional override.

🟢 What looks good — backward compatible by design

tls_ca_cert_path: Option<String> with #[serde(default, skip_serializing_if = "Option::is_none")]. Existing config.toml files require no changes. The feature activates only when the user explicitly adds the field.

🟡 Warning — risk: high label is correct; PR body self-assessment should be updated

The PR body's Label Snapshot says risk: medium, but the auto-labeler has correctly set risk: high — the PR touches crates/zeroclaw-runtime/src/onboard/mod.rs and crates/zeroclaw-providers/src/compatible.rs, both in the high-risk path per AGENTS.md. The live label stands. PR body can be updated to reflect the actual label snapshot before merge.

🟡 Warning — onboarding cert read has no explicit lifecycle comment

In onboard/mod.rs, std::fs::read(path).ok() runs inside prompt_model — a one-time onboarding call, so the per-invocation disk read is acceptable. If prompt_model is ever called in a retry loop (e.g. after a provider selection failure), the read fires each time. A brief // one-time read during onboarding setup comment at that line makes the intent explicit and prevents a future caller from assuming the bytes are cached.

🔵 Suggestion — document the tls_ca_cert_pem field lifecycle

tls_ca_cert_pem: Option<Vec<u8>> is a construction-time materialisation of the path-resolved cert bytes — same lifecycle as api_key, timeout_secs, and other config-derived fields on this struct. A one-line doc comment ("loaded from tls_ca_cert_path at construction; not refreshed after config reload") would prevent a future "why doesn't this hot-reload?" question and make the scope explicit.

#6848 sanity check ✓

The change is strictly additive to apply_compat_options in factory.rs — a new if let Some(ref cert_path) guard at the end of the function. No model-provider ordering, selection, or fallback logic is touched. Compatible.

@tidux tidux left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at head 1dbeed2 against current upstream/master (40be773). Cross-checked the diff against the local tree (this PR's branch is the worktree), ran a local git merge-tree upstream/master HEAD simulation, and read FND-005, AGENTS.md risk-tier table, and the prior reviews from @andybab, @Audacity88, and @WareWolf-MoonWall.

The provider-side work is in good shape and the earlier TLS-specific blockers (fmt, silent swallowing, per-request disk reads, PR template) all read as resolved on this head. The block here is purely a fresh master-side conflict: PR #6848 landed on master a few hours ago (commit 2c91478, merged 2026-06-02 23:54Z) and deleted the entire crates/zeroclaw-runtime/src/onboard/ module, replacing it with the new crates/zeroclaw-runtime/src/quickstart/mod.rs flow. The 5th hunk of this PR patches crates/zeroclaw-runtime/src/onboard/mod.rs, which no longer exists on master — so the branch is mergeable_state: dirty again, same shape of issue as the older wizard.rs conflict @Audacity88 caught on e4dee80, just one rebase deeper.

The four provider-side hunks (zeroclaw-config/src/schema.rs, zeroclaw-providers/src/compatible.rs, zeroclaw-providers/src/factory.rs, zeroclaw-providers/src/lib.rs) still auto-merge cleanly against current master — only the runtime hunk conflicts. That's the path I'd take this branch down.

✅ Resolved — earlier TLS blockers (from @andybab's first review)

The four blockers from @andybab's original review all land on this head:

  • cargo fmt --all -- --check is clean per the PR body's validation block; CI was 13/13 green at this commit.
  • OpenAiCompatibleProvider::with_tls_ca_cert_path now logs a WARN event when the configured CA file cannot be read (compatible.rs:312-318) instead of silently swallowing the error.
  • add_tls_cert inside http_client() now logs a WARN on PEM parse failure (compatible.rs:447-453) rather than silently dropping the cert.
  • The cert is read from disk once during provider construction and stored as tls_ca_cert_pem: Option<Vec<u8>>http_client() only does an in-memory Certificate::from_pem on the hot path. The 9-call-site concern is no longer relevant.
  • PR body now follows .github/pull_request_template.md.

✅ Resolved — wizard.rs conflict (from @Audacity88's prior review)

The older block on crates/zeroclaw-runtime/src/onboard/wizard.rs (deleted on master at the time, modified on the PR) was resolved by the rebase onto onboard/mod.rs. That fix was correct for master as it stood when @WareWolf-MoonWall reviewed.

🔴 Blocking — branch conflicts with current master: crates/zeroclaw-runtime/src/onboard/mod.rs is deleted

GitHub now reports mergeable_state: dirty. Local git merge-tree upstream/master HEAD confirms:

Auto-merging crates/zeroclaw-config/src/schema.rs
Auto-merging crates/zeroclaw-providers/src/compatible.rs
Auto-merging crates/zeroclaw-providers/src/factory.rs
Auto-merging crates/zeroclaw-providers/src/lib.rs
CONFLICT (modify/delete): crates/zeroclaw-runtime/src/onboard/mod.rs deleted in upstream/master and modified in HEAD.

PR #6848 (commit 2c91478, merged today) retired the entire onboard/ module in favor of a new quickstart/ flow under crates/zeroclaw-runtime/src/quickstart/mod.rs. The PR's runtime hunk — which threads tls_ca_cert_pem through discover_openai_compat_models inside prompt_model — is now patching a deleted file. The other four hunks still auto-merge cleanly.

What I'd recommend (in order, lightest first):

  1. Drop the runtime hunk entirely if it isn't needed. I grepped the new quickstart/mod.rs and api_quickstart.rs and could not find an equivalent of discover_openai_compat_models — the in-flow /v1/models probe from the old onboarding does not appear to have a direct replacement in the new quickstart. If model discovery now goes through the regular OpenAiCompatibleModelProvider (which already wears the cert via with_tls_ca_cert_path after this PR), the runtime hunk is redundant and the PR can ship as a pure provider/config change.
  2. If quickstart does have a discovery path that still bypasses the provider client, the TLS-aware discovery helper needs to be rebuilt there with the same tls_ca_cert_pem plumbing the PR currently puts in onboard/mod.rs. The shape of the patch from discover_openai_compat_models_with_timeout is reusable verbatim.

Either way, a rebase onto current master is required before this can merge.

🟡 Warning — PR body Label Snapshot still says risk: medium while live label is risk: high

@WareWolf-MoonWall already flagged this on the previous head and it still stands. The auto-labeler is correct — touching crates/zeroclaw-providers/src/compatible.rs and (in this PR) crates/zeroclaw-runtime/src/** puts it in the high-risk band per AGENTS.md §risk tiers. Please update the Label Snapshot in the PR body to match.

🟡 Warning — once rebased, refresh validation evidence

The current Validation Evidence block was captured on 1dbeed2, before #6848 landed. After rebasing onto the new quickstart/ layout, please re-run:

  • cargo fmt --all -- --check
  • cargo clippy --all-targets -- -D warnings
  • cargo test -p zeroclaw-providers (the four provider files that auto-merged still need to be exercised against any incidental API surface changes from #6848)
  • cargo test -p zeroclaw-runtime if the runtime hunk gets re-pointed at quickstart rather than dropped

A focused check that a bad cert path and a bad PEM still produce WARN events and fall back to system roots would also be helpful evidence after the rebase.

🔵 Suggestion — one-line doc comment on tls_ca_cert_pem

Carrying @WareWolf-MoonWall's earlier suggestion forward — a one-line comment on the tls_ca_cert_pem: Option<Vec<u8>> field along the lines of // loaded from tls_ca_cert_path at construction; not refreshed across config reloads would head off future "why doesn't this hot-reload?" questions. The field's lifecycle is currently inferable from with_tls_ca_cert_path, but stating it explicitly costs one line.

🟢 What looks good — load-once, fail-soft, backward-compatible

Independent of where the discovery path ends up, the provider-side design choices are the right ones:

  • PEM bytes are materialised at construction (with_tls_ca_cert_path), so http_client() stays disk-I/O-free on the hot path. The performance contract for a per-request client builder is preserved.
  • Both failure boundaries (file read in with_tls_ca_cert_path, PEM parse in add_tls_cert) WARN with structured attrs (path, error) and fall back to system roots rather than panicking or bubbling errors. That's the right shape for a user-configured optional override.
  • tls_ca_cert_path: Option<String> with #[serde(default, skip_serializing_if = "Option::is_none")] keeps existing config.toml files untouched. Per FND-005, the feature activates only when the operator opts in — and the trust model (user-owned ~/.zeroclaw/config.toml decides what the agent trusts) is unchanged.

Verdict

REQUEST_CHANGES on the master-side merge conflict. The fix path is small — most likely "drop the runtime hunk after rebasing" — but the branch can't merge as-is and the right shape after rebase needs to be decided rather than auto-merged.

Once the rebase lands and Validation Evidence is refreshed, I expect this to be a quick re-review.

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-reviewed current head 43aa60e against the live PR state, current master, linked #1458 / #1676 context, the prior reviews from @Audacity88, @WareWolf-MoonWall, and @tidux, the current four-file diff, and the PR body's refreshed validation/security notes.

✅ Resolved — the current head clears the stale onboarding conflict

The earlier block from @tidux looks resolved on this head. The runtime/onboarding hunk is gone, the live changed-file list is now limited to config and provider code, and a local read-only merge simulation against current master no longer reports a conflict. CI is also green on 43aa60e.

The previous body drift is mostly cleaned up too: the Label Snapshot now reflects risk: high, the implementation comment on tls_ca_cert_pem documents its construction-time lifecycle, and the feature is back to the provider/config scope that still matches #1458.

🟡 Warning — add a focused regression for the TLS CA path

The remaining gap I see is test coverage around the new config-to-provider path. The PR body has useful manual evidence, including a corporate-proxy smoke and bad-path / bad-PEM checks, but the diff does not add a focused regression that would fail if tls_ca_cert_path stopped flowing from ModelProviderConfig into ModelProviderRuntimeOptions and then through apply_compat_options into the OpenAI-compatible provider.

Because this is a high-risk provider/TLS config path, I would prefer at least one small automated check before merge. The most direct version would assert that a resolved ModelProviderConfig entry with tls_ca_cert_path = "/tmp/example-ca.pem" produces runtime options with that same path. If the provider-side client behavior is easy to test without a real network call, a second focused check for missing-path or invalid-PEM warn-and-fallback behavior would be even better. The important part is giving this feature a regression guard at the config/runtime boundary, not just relying on full CI plus manual smoke.

🔵 Suggestion — clarify literal path semantics or support ~

One small user-facing edge: with_tls_ca_cert_path passes the configured string directly to std::fs::read. That means ~/.zeroclaw/ca.crt will not expand, even though that is the kind of path users often reach for when configuring local CA files. This does not need to block the PR if absolute paths are the intended contract, but either documenting "absolute or relative path; no shell expansion" or expanding ~ before reading would prevent a common setup surprise.

🟢 What looks good — provider TLS behavior

The provider-side shape is much better than the older branch: the cert file is read once during provider construction, missing files and invalid PEMs warn instead of silently disappearing, default TLS behavior is unchanged when the field is unset, and the PR does not add an insecure TLS bypass. That is the right direction for custom corporate/internal PKI support.

Verdict

COMMENT from me for now. I do not want to approve over an active CHANGES_REQUESTED review, but the stale merge-conflict blocker appears resolved on the current head. The only item I would still want addressed or explicitly accepted by maintainers is the focused regression coverage above.

@tidux tidux left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed at head ab05bec against current upstream/master (746c616, post-#7231). Read the new five commits since my prior block, cross-checked the diff against the local tree, ran a local merge simulation (git merge upstream/master on the PR head), re-read FND-005 and the AGENTS.md risk-tier table, and looked at @Audacity88's fresh COMMENT review on the intermediate head 43aa60e.

Big picture: the previous master-side conflict is resolved correctly. The runtime hunk was dropped — that was the right call given #6848's quickstart refactor, and the four provider/config hunks land cleanly. Documentation, test coverage, and the PR body Label Snapshot have all moved in the right direction. Two of the three carried-forward items from the prior reviews are resolved on this head and one is still open.

The new wrinkle is in ollama.rs. The author rebased onto a1b641e01 — the broken-build commit that #7095 created and #7231 (746c616e4, merged ~9h ago) reverted. To make the rebase compile, this PR reintroduces the unwrap_or line and wraps Some(temperature) at the send_request callsite. Master has since taken the other side of that fence, so the merge silently reintroduces the regression #7231 just removed.

✅ Resolved — master-side runtime conflict from #6848

The five-file diff is now zeroclaw-config/src/schema.rs, zeroclaw-providers/src/{compatible.rs, factory.rs, lib.rs}, plus zeroclaw-providers/src/ollama.rs (the new rebase-fix hunk discussed below). The crates/zeroclaw-runtime/src/onboard/mod.rs hunk that I blocked on in my prior review is gone — author confirmed in the PR body that the runtime path now goes through the regular OpenAiCompatibleModelProvider which already wears the cert via with_tls_ca_cert_path. Local git merge upstream/master no longer reports a conflict for the runtime tree; the only conflict shown is the ollama.rs one called out below.

✅ Resolved — PR body Label Snapshot now says risk: high

The carried-forward 🟡 from @WareWolf-MoonWall about Label Snapshot drift is addressed — the PR body now reads Risk label: risk: high and matches the live label.

✅ Resolved — tls_ca_cert_pem field lifecycle comment

Commit e8e7def adds the doc comment Wolf and I both asked for: // Loaded from disk once at construction; not refreshed across config reloads. at compatible.rs:32-34. Exactly the right framing — the field's lifecycle is now stated rather than inferable.

✅ Resolved — automated regression coverage for the config-to-provider path

@Audacity88's 🟡 from earlier today on 43aa60e is addressed at 4ec7bb0 + the two compatible.rs tests:

  • provider_runtime_options_from_config_propagates_tls_ca_cert_pathlib.rs test that round-trips tls_ca_cert_path = "/tmp/example-ca.pem" through model_provider_runtime_options_from_model_provider_entry and asserts it surfaces on ModelProviderRuntimeOptions.
  • with_tls_ca_cert_path_missing_file_leaves_pem_nonecompatible.rs test that a nonexistent path keeps tls_ca_cert_pem = None.
  • with_tls_ca_cert_path_invalid_pem_stores_bytes_and_http_client_still_buildscompatible.rs test that a readable file with garbage bytes still populates tls_ca_cert_pem, and that http_client() builds cleanly (PEM parse fails internally, warns, falls back to system roots).

All three pass locally on ab05bec: cargo test -p zeroclaw-providers --lib tls_ca_cert → 3 passed, full cargo test -p zeroclaw-providers --lib ollama → 68 passed.

🔴 Blocking — ollama.rs rebase fix reintroduces the regression #7231 just reverted

crates/zeroclaw-providers/src/ollama.rs:1049,1058 (commit ab05bec, last commit on the branch) reintroduces this on OllamaModelProvider::chat:

let temperature = temperature.unwrap_or(self.default_temperature());
// ...
.send_request(api_messages, &normalized_model, Some(temperature), should_auth, None)

That is the exact pattern #7095 left in master, that #7231 reverted ~9h ago at 746c616. The PR's rebase-base is a1b641e01 — the parent of #7231 — so on that base the line was needed to keep the build compiling. On current master chat() takes temperature: Option<f64> and threads it straight through to send_request(temperature: Option<f64>) without unwrapping, which is the behaviour #7231 was deliberately restoring: when the caller passes None, Ollama gets no temperature field at all (Options::temperature is #[serde(skip_serializing_if = "Option::is_none")] at ollama.rs:142-143), so the Ollama model's own default applies.

After this PR merges, that path becomes "NoneSome(self.default_temperature())Some(TEMPERATURE_DEFAULT = 0.8)" — so every caller that passed None will start pinning Ollama's temperature to 0.8 unconditionally. That's the precise regression the v0.8.0 release train just took a fix PR for.

Mechanically, I reproduced the merge locally with git merge upstream/master on the PR head: git auto-resolves the conflict in a way that compiles but keeps the unwrap_or + Some(...) shape. So this won't surface as a CI failure on the PR (CI runs against the merge-base, not the merged state) and won't surface as a visible conflict marker either. It will silently regress behaviour at merge time.

The fix is trivial — rebase onto current master (746c616 or newer) and either delete the two lines my prior Some(temperature) and let temperature = …unwrap_or(…) adjusted, or, equivalently, take master's version of that block verbatim. Once the rebase is on top of #7231, chat() goes back to:

.send_request(api_messages, &normalized_model, temperature, should_auth, None)

and the let temperature = temperature.unwrap_or(self.default_temperature()); line goes away. The TLS hunks in this PR are independent of that block — there's no logic interaction.

I want to make sure this isn't read as a re-litigation of architecture; the TLS work itself is fine. This is a pure rebase-staleness fix: the master snapshot the author rebased onto is no longer current, and the line the author added to make the older snapshot compile is now actively wrong against the new snapshot. A git pull --rebase upstream master (or equivalent) on this branch makes this finding disappear in one shot.

🟡 Warning — refresh Validation Evidence after the rebase

The PR body's Validation Evidence block says "Branch rebased onto current master (2026-06-03, head 40be773)" — that snapshot is now ~36h stale and predates both #7231 (ollama temperature revert) and a handful of other v0.8.0/v0.8.1 train commits. Once the rebase onto current master is done, please re-run and paste the new evidence:

  • cargo fmt --all -- --check
  • cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings
  • cargo test -p zeroclaw-providers --lib tls_ca_cert (the three new tests added at 4ec7bb0 + compatible.rs — small, fast, regression-focused)
  • cargo test -p zeroclaw-providers --lib ollama (broad sanity that the rebase didn't disturb the ollama provider)

The CI gate will catch anything I missed, but having a refreshed Validation Evidence block paste in the PR body is the convention for risk: high per FND-005.

🔵 Suggestion — tls_ca_cert_path is missing a #[tab(...)] attribute

schema.rs:768-770 — 15 of the 17 fields on ModelProviderConfig carry a #[tab(Connection)] / #[tab(Model)] / #[tab(Advanced)] attribute that drives the dashboard / TUI schema grouping. The new tls_ca_cert_path field doesn't carry one (the only other field without it is kind, which is a discriminator). #[tab(Connection)] reads as the right home given the field's intent. Non-blocking — the field still deserialises and threads through — but it will likely show up unclassified in any generated config-editor surface until labelled.

🔵 Suggestion — ~-expansion or explicit literal-path docs

Seconding @Audacity88's 🔵 from earlier today: with_tls_ca_cert_path passes the configured string straight to std::fs::read, so tls_ca_cert_path = "~/.zeroclaw/ca.pem" will fail at startup with a warn-and-fallback, not a useful error. A one-line doc on the schema field along the lines of /// Path to a PEM-encoded CA certificate. Must be an absolute path; shell expansion (e.g. ~) is not performed. would head off the common setup mistake. Either that or an explicit shellexpand::tilde at the read site — operator preference.

🟢 What looks good — provider-side design is well-shaped

Recapping what I said in the prior review since the relevant code is unchanged at ab05bec:

  • PEM bytes materialise at construction in with_tls_ca_cert_path and http_client() only does an in-memory Certificate::from_pem on the hot path — no per-request disk I/O.
  • Both failure boundaries (std::fs::read at construction, Certificate::from_pem inside http_client()'s add_tls_cert closure) WARN with structured attrs (path, error) and fall back to system roots. The right behaviour for a user-configured optional override per FND-005's "graceful degradation over abrupt failure" norm.
  • tls_ca_cert_path: Option<String> with #[serde(default, skip_serializing_if = "Option::is_none")] keeps every existing config.toml deserialising byte-for-byte. The feature activates only when the operator opts in — the trust model (user-owned ~/.zeroclaw/config.toml decides what the agent trusts) is unchanged.

🟢 What looks good — test coverage matches the risk band

Three tests added at 4ec7bb0 + compatible.rs (named above) cover the three boundaries that matter for a risk: high TLS config path: config → runtime options, missing-path fallback, invalid-PEM fallback. The http_client() build-cleanly assertion in the invalid-PEM test is a particularly nice choice — it exercises the warn-and-fallback in add_tls_cert without having to mock a TLS handshake. All three pass locally in 0.01s.

Verdict

REQUEST_CHANGES again, but a much smaller block than the prior one — the entire delta from this review can ride on a single git rebase upstream/master once #7231's resolution is on the branch. Nothing about the TLS work itself blocks: provider design, error handling, test coverage, doc comment, and PR body all clear on this head. The block is the silent ollama.rs regression that the stale rebase introduces.

Once the rebase onto current master is done and Validation Evidence is refreshed, I expect this to be a quick approve.

Andrej Babolčai and others added 12 commits June 5, 2026 06:11
Add support for custom CA certificates (tls_ca_cert_path) for private/corporate PKI

inference providers. This enables ZeroClaw to connect to custom endpoints that use

a local PKI for TLS (e.g., corporate proxies).

Changes:

- Add tls_ca_cert_path field to ModelProviderConfig

- Fix provider lookup for custom: URLs to support TLS settings

- Use custom CA certificate in model refresh HTTP client

- Use custom CA certificate in chat API HTTP client

- Fix missing CLI channel registration in agent mode

- Convert eprintln debug messages to tracing::debug
…ms (zeroclaw-labs#5623)

- 8623156 feat(openrouter): add extra_body passthrough for generic request params
- d49721e fix(openrouter): resolve conflicts, wire provider_extra, fix clippy/fmt
- 7748e57 fix(openrouter): guard non-object extra_body, add early return for None
…ave (zeroclaw-labs#6099)

Two related defects in the providers resolution path produced silent
config-corruption symptoms:

1. Config CLI get/set divergence. `Config::apply_named_model_provider_profile`,
   invoked from `apply_env_overrides` during every `load_or_init`, mutated
   `self.providers.fallback` to a "canonical" key derived from the matched
   profile's `name` field, `wire_api`, or `base_url`. Because that mutation
   landed before any subsequent CLI surface, `zeroclaw config get
   providers.fallback` returned the rewritten in-memory value while the
   on-disk config kept the user's literal key. `zeroclaw config set
   providers.fallback <X>` would persist `<X>` on save, but the next load
   re-mutated it in memory, so a follow-up `get` reported a stale or
   unrelated value. The daemon then logged "providers.fallback references
   '<rewritten>' which does not exist in providers.models" and downstream
   resolution failed with WARN/ERROR routing through a non-existent
   provider key.

2. Silent vendor-default model fallback. When `fallback_provider().model`
   was unset (the routine end-state when (1) orphaned the entry), three
   sites in the runtime substituted a hardcoded vendor model identifier
   via `.unwrap_or("anthropic/claude-sonnet-4-20250514")`. Requests then
   went out as `provider="<broken-key>" model="<hardcoded-vendor>"`,
   producing 4xx responses and obscuring the real configuration error.

Fix:

- `apply_named_model_provider_profile` no longer rewrites
  `providers.fallback`. The user's literal key is preserved end-to-end so
  `config get`, `config set`, and `save()` all agree. Codex-app-server
  compatibility is preserved by mirroring the resolved entry under any
  canonical alias keys (`openai-codex` for `wire_api = "responses"`, the
  profile's own `name = "..."` field if it differs from the key,
  `custom:<base_url>` when a `base_url` is set), so any callsite that
  looks providers up by either the user's key or the canonical alias
  resolves to the same entry.

- Add `ProvidersConfig::resolve_default_model()`: returns the fallback
  provider's `model` if set, otherwise the first model from any
  configured `[providers.models.*]`, otherwise `None`.

- Replace `.unwrap_or("anthropic/claude-sonnet-4-20250514")` in
  `Agent::from_config` (`agent.rs`) and `process_message` (`loop_.rs`)
  with a three-step resolution: (a) fallback provider's model, (b)
  `resolve_default_model` with a WARN log naming the surfaced model and
  the config key to set to silence the warning, (c) hard-fail with a
  clear actionable error if no provider has any model configured. The
  telemetry path in `agent::run` and the `AgentBuilder` default also stop
  emitting a real vendor model identifier — they fall back to
  `<unresolved>` / `<unconfigured>` sentinels that surface as obvious 4xx
  routing failures rather than misrouting to a real provider.

Refs zeroclaw-labs#5989 (recommended exactly this drop), addresses
the still-open zeroclaw-labs#5815 llamacpp regression, and
extends commit 321e96f (which mirrored under the canonical key but did
not address the rewrite itself).

Tests:

- New: `apply_env_overrides_preserves_user_supplied_fallback_key` —
  asserts the literal fallback key survives `apply_env_overrides`.
- New: `fallback_round_trips_through_load_apply_serialize` — full
  TOML-deserialize → apply_env_overrides → serialize round trip.
- New: `set_prop_then_get_prop_round_trips_for_fallback` — direct CLI
  surface contract.
- New: `resolve_default_model_prefers_fallback_then_first_available` —
  helper-function semantics.
- Updated: `model_provider_profile_maps_to_custom_endpoint` and
  `model_provider_profile_responses_uses_openai_codex_and_openai_key`
  now assert that the literal fallback survives and the canonical alias
  is mirrored under `providers.models`, instead of asserting the rewrite.

`cargo fmt --all -- --check`, `cargo clippy -p zeroclaw-config -p
zeroclaw-runtime --all-targets -- -D warnings`, and `cargo test -p
zeroclaw-config --lib` all clean. `cargo test -p zeroclaw-runtime --lib`
shows the same two pre-existing failures as upstream master
(`cron::scheduler::tests::persist_job_result_delivery_stubbed_succeeds`,
`onboard::wizard::tests::backend_key_from_choice_maps_supported_backends`),
unrelated to this change.

Signed-off-by: Jason Perlow <jperlow@gmail.com>
…derConfig (zeroclaw-labs#6380)

Groq's blanket `.without_native_tools()` (zeroclaw-labs#5848) was added because
llama-family Groq models reject native tool calls with HTTP 400. The
disable applies to every Groq model, including ones that do support
native tool calling.

Add a `native_tools: Option<bool>` field to `ModelProviderConfig` that
operators can set per-profile to override the default. Plumb the value
through `ProviderRuntimeOptions` so the Groq factory branch sees it,
and skip the unconditional `.without_native_tools()` when the operator
opts in via `[providers.models.<alias>] native_tools = true`.

Default behavior is preserved exactly (text-fallback). The override is
tri-state — `None` honors the provider default, `Some(true)` forces
native on, `Some(false)` is explicit text-fallback. Currently only the
Groq factory consults the option; other providers default to enabled
and have no need for it yet.

Closes zeroclaw-labs#5932.
…law-labs#6357)

Adds optional `pricing: Option<ModelPricing>` to `ModelProviderConfig` so
operators can pin per-provider input/output rates when the same model is
served by different providers at different costs. Cost tracking previously
consulted only `[cost.prices.<key>]`, which keys on `<provider>/<model>`
strings and could not disambiguate two providers serving the same model
name at different rates.

`Config::combined_pricing()` returns a merged map: top-level entries take
precedence on exact-key conflict (operator overrides are never silently
shadowed); per-provider pricing fills `<provider_id>/<model>` slots
top-level did not already cover. Wired through both production cost-context
build sites (`crates/zeroclaw-runtime/src/agent/loop_.rs`,
`crates/zeroclaw-channels/src/orchestrator/mod.rs`).

Lookup-order fix in `record_tool_loop_cost_usage`: flipped from
`bare → qualified → suffix` to `qualified → bare → suffix` so per-provider
pricing actually wins when both keys are present. Bare-only and suffix-only
paths still resolve unchanged for operators who have not adopted
per-provider pricing.

Adds `PropKind::Object` (struct-shaped scalar, distinct from `String` and
`ObjectArray`) and classifies `ModelPricing` as `Object`. The dashboard,
JSON-patch, and CLI path round-trip
`providers.models.<id>.pricing = {input, output}` through `Config::set_prop`
into a typed `ModelPricing` instead of failing the serde round-trip on a
TOML string. `parse_prop_value` lands the JSON object as a TOML inline
table, `coerce_for_set_prop` rejects non-object payloads at the wire
boundary, and the onboard TUI gets a JSON-text fallback.

Schema is additive (`Option<T>` defaults to `None`,
`skip_serializing_if = "Option::is_none"`); existing serialised configs
are unchanged on round-trip.

Closes zeroclaw-labs#6251

Co-authored-by: Shane Engelman <contact@shane.gg>
…aster

- Remove stale ProvidersConfig impl (replaced by typed ModelProviders)
- Remove ensure_fallback_provider (old provider lookup architecture)
- Remove duplicate native_tools field in ModelProviderConfig
- Remove duplicate PropKind::Object match arm in helpers.rs
- Replace tracing::warn! with zeroclaw_log::record! (project convention)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tifact

Old extra_body tests using OpenRouterProvider (pre-rename) were
replayed during rebase on top of the current tests using
OpenRouterModelProvider, causing duplicate definition errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ross config reloads

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…to-provider flow

Three focused checks:
- provider_runtime_options_from_config_propagates_tls_ca_cert_path: asserts that
  tls_ca_cert_path on a ModelProviderConfig entry reaches ModelProviderRuntimeOptions
  via model_provider_runtime_options_from_model_provider_entry (config/runtime boundary)
- with_tls_ca_cert_path_missing_file_leaves_pem_none: non-existent path must not
  panic — tls_ca_cert_pem stays None (fall back to system roots)
- with_tls_ca_cert_path_invalid_pem_stores_bytes_and_http_client_still_builds: invalid
  PEM bytes are stored (read succeeds), http_client() builds without panic (warn path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cert_path

Address two suggestions from tidux and Audacity88:
- Add #[tab(Connection)] so the field is classified in generated config-editor surfaces
- Expand doc comment to note that ~ shell expansion is not performed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@Audacity88 Audacity88 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a re-review of the custom CA support after the branch refresh. I focused on whether the earlier review blockers are still live on current head 1834b1ad.

I checked the live PR state, current origin/master, the prior reviews from @WareWolf-MoonWall and @tidux, my earlier review comment, the current four-file diff, and the refreshed PR body / CI state.

✅ Resolved — stale branch and Ollama rebase blocker look cleared

The current PR diff is back to the intended provider/config scope:

  • crates/zeroclaw-config/src/schema.rs
  • crates/zeroclaw-providers/src/compatible.rs
  • crates/zeroclaw-providers/src/factory.rs
  • crates/zeroclaw-providers/src/lib.rs

I no longer see the old onboarding/runtime hunk or the temporary ollama.rs rebase fix in the current diff. GitHub no longer reports the earlier dirty/conflicting state; the PR is now blocked by review state. The latest visible CI is green.

✅ Resolved — TLS CA regression coverage was added

My earlier warning about coverage is addressed. The current diff includes a config-to-provider regression for tls_ca_cert_path propagation into ModelProviderRuntimeOptions, plus provider-side tests for missing certificate files and invalid PEM bytes. That covers the boundaries I wanted guarded for this high-risk TLS config path.

✅ Resolved — config-editor grouping and path semantics are now explicit

The tls_ca_cert_path schema field now has #[tab(Connection)], and the doc comment states that the path must be absolute and that shell expansion such as ~ is not performed. That resolves the two small user-facing/schema-surface suggestions I had carried forward.

🟢 What looks good — provider-side shape is still right

The main design still looks sound to me: the CA file is read once at provider construction, PEM parse failures warn and fall back to system roots, default TLS behavior is unchanged when the field is unset, and the feature does not add an insecure TLS-disable path. That is the right shape for custom internal/corporate PKI support.

Verdict

--comment from me. I do not want to approve over @tidux's active CHANGES_REQUESTED review, but from my side the items I had open now look addressed on the current head. I think this is ready for @tidux to re-review the current branch.

@tidux tidux left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior blockers are resolved on the current head; the two carried-forward 🔵s are now implemented in the diff.


✅ Resolved — prior REQUEST_CHANGES from me (ollama.rs regression + stale Validation Evidence)

The current diff (1834b1a) is back to the intended four-file provider/config scope:

  • crates/zeroclaw-config/src/schema.rs
  • crates/zeroclaw-providers/src/compatible.rs
  • crates/zeroclaw-providers/src/factory.rs
  • crates/zeroclaw-providers/src/lib.rs

No ollama.rs rebase-fix hunk, no runtime/onboard hunk. The branch is now a clean additive change on top of current master. GitHub reports mergeable_state: DIRTY only because of the prior review state, not because of code conflicts.

✅ Resolved — @Audacity88's 🟡 (regression coverage for config→provider path)

The three tests added at 4ec7bb0 + compatible.rs are present and pass locally:

  • provider_runtime_options_from_config_propagates_tls_ca_cert_path (lib.rs)
  • with_tls_ca_cert_path_missing_file_leaves_pem_none (compatible.rs)
  • with_tls_ca_cert_path_invalid_pem_stores_bytes_and_http_client_still_builds (compatible.rs)

All three exercise the exact boundaries that matter for a risk: high TLS config path.

✅ Resolved — both 🔵s I carried forward are now in the diff

  • tls_ca_cert_path schema field now carries #[tab(Connection)] (schema.rs:768)
  • Doc comment explicitly states: "Must be an absolute path — shell expansion (e.g. ~) is not performed." (schema.rs:9-11)

🟢 What looks good — provider-side design is well-shaped

Recapping the core invariants that remain unchanged from prior reviews:

  • PEM bytes are materialised at construction in with_tls_ca_cert_path (compatible.rs:56-68) — http_client() only does an in-memory Certificate::from_pem on the hot path. No per-request disk I/O.
  • Both failure boundaries (std::fs::read at construction, Certificate::from_pem inside http_client()) WARN with structured attrs (path, error) and fall back to system roots. The right behaviour per FND-005's "graceful degradation" norm.
  • tls_ca_cert_path: Option<String> with #[serde(default, skip_serializing_if = "Option::is_none")] keeps every existing config.toml byte-for-byte compatible. The feature activates only when the operator opts in.

🟢 What looks good — test coverage matches the risk band

Three focused regression tests + the broad cargo test -p zeroclaw-providers --lib ollama (68 passed) give the right guardrails for a high-risk TLS path without requiring a live corporate-proxy fixture in CI.

Template completeness

  • Summary ✓ | Validation evidence ✓ | Security & privacy ✓ | Compatibility ✓
  • Rollback ✓ (single revert of the four-file change)
  • risk: high label matches the live auto-label and the PR body Label Snapshot
  • feat: prefix — milestone alignment already handled in prior session (v0.8.1, #6970)

Bottom line: This is now a clean, well-tested, well-documented additive feature. All prior review feedback is addressed. Approving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config Auto scope: src/config/** changed. enhancement New feature or request provider:compatible Auto module: provider/compatible changed. provider Auto scope: src/providers/** changed. risk: high Auto risk: security/runtime/gateway/tools/workflows. size: M Auto size: 251-500 non-doc changed lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Add support for local CA certificates for custom inference provider

7 participants