fix(security): warn at startup when 3P provider + permissive mode skips classifier (#244) #1144
0xghost42 wants to merge 1 commit
…ps classifier
modelSupportsAutoMode() returns false for non-firstParty providers when USER_TYPE !== 'ant' (utils/betas.ts:166), so the AI-based tool-call classifier never runs in that configuration. acceptEdits, bypassPermissions, and --dangerously-skip-permissions all auto-allow tool calls, so combining them with a third-party provider means tool calls are gated only by static pattern checks, with no visible signal that the AI safety review layer is absent.
Print a visible startup warning when the relevant conditions are all true so users can make an informed call before pointing a small/local 3P model at an untrusted codebase.
Static pattern checks (bashSecurity, path constraints, dangerous-removal-paths, etc.) still run for all providers.
Tracks issue Gitlawb#244, finding 1.
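The check this commit message describes can be sketched as a pure function. This is a minimal illustration, not the code in src/setup.ts: the names StartupContext and classifierSkippedWarning, the shape of the inputs, and the warning text are all assumptions. Only the three conditions come from the description above.

```typescript
// Sketch only: the type and function names here are hypothetical.
// The three conditions mirror the ones listed in the commit message.
type PermissionMode = 'default' | 'acceptEdits' | 'bypassPermissions';

interface StartupContext {
  userType: string | undefined;        // value of USER_TYPE
  provider: string;                    // what getAPIProvider() reports, e.g. 'firstParty'
  permissionMode: PermissionMode;
  dangerouslySkipPermissions: boolean; // --dangerously-skip-permissions was passed
}

// Returns the warning text when the classifier is skipped and tool calls
// are auto-allowed; returns null when no warning is needed.
function classifierSkippedWarning(ctx: StartupContext): string | null {
  const externalUser = ctx.userType !== 'ant';
  const thirdPartyProvider = ctx.provider !== 'firstParty';
  const permissiveMode =
    ctx.permissionMode !== 'default' || ctx.dangerouslySkipPermissions;

  if (!(externalUser && thirdPartyProvider && permissiveMode)) {
    return null;
  }
  // Hypothetical wording; the real message lives in the PR diff.
  return (
    'Warning: third-party provider with a permissive permission mode. ' +
    'The AI tool-call classifier does not run; only static pattern checks apply.'
  );
}
```

Keeping the decision separate from the printing step means the startup code only has to pass a non-null result to console.warn, and the condition itself can be tested without touching the environment.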
This would have a major conflict with #1110.
Hi @jatmn, thanks for flagging. We can pause merging this and discuss first.
techbrewboss
left a comment
Review summary
I did not find code-level blockers in this PR. The change is narrowly scoped to a startup warning in src/setup.ts, and the condition matches the current external-provider auto-mode gate in modelSupportsAutoMode(): external users on non-firstParty providers cannot use the transcript classifier, while acceptEdits, bypassPermissions, and --dangerously-skip-permissions still allow permissive tool execution paths.
Given the existing maintainer comment that this has a major conflict with #1110, I would not treat this as independently merge-ready until the permission-mode direction in #1110 is settled. In particular, #1110 adds fullAccess and centralizes dangerous permission-mode transitions, so this warning likely needs to be reconciled with that newer permission-mode model rather than merged as a standalone startup check.
Findings
None from this PR diff.
Validation
Reviewed the PR metadata, src/setup.ts diff, provider detection in src/utils/model/providers.ts, the modelSupportsAutoMode() gate in src/utils/betas.ts, nearby permission-mode behavior, and the relevant #1110 context. CI is green for this PR (smoke-and-tests, web).
Recommendation: comment / needs maintainer decision. The implementation is low-risk on its own, but the overlap with #1110 should be resolved before merging.
Blockers: None.
Non-Blocking
Looks Good
Verdict: Approve. Clean security warning, no blockers.
Vasanthdev2004
left a comment
Clean security warning. No blockers.
gnanam1990
left a comment
Thanks for tackling Finding 1 of #244 — the threat model is spot on, and I really like that this is purely additive (classifier behavior, static checks, and provider routing are all untouched). Close to mergeable; one change needed:
The process.env.USER_TYPE !== 'ant' clause is a blocker for us. We're deliberately stripping Anthropic fingerprint checks out of the codebase, and this reintroduces one (just reversed). It's also redundant here — getAPIProvider() !== 'firstParty' already scopes the warning to exactly the users we want to reach. Could you drop the USER_TYPE condition so the check reads purely as "3P provider + permissive mode"?
Separately, per the thread with @jatmn this overlaps #1110 (which reworks permission modes / adds fullAccess). Let's get the #1110 direction settled and reconcile this warning against that model before merge. A small unit test over the condition matrix would also be a welcome addition. Happy to re-review quickly once the fingerprint clause is gone — thanks for the solid work here.
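For illustration, the reduced condition ("3P provider + permissive mode", no USER_TYPE clause) and a matrix over it might look like the sketch below. The names shouldWarn, conditionMatrix, and Mode are hypothetical, and 'thirdParty' is a placeholder for any provider value other than 'firstParty'. None of this is taken from the PR diff.

```typescript
// Sketch only: names are hypothetical; 'thirdParty' stands in for any
// provider value other than 'firstParty'.
type Mode = 'default' | 'acceptEdits' | 'bypassPermissions';

// The check reduced to "3P provider + permissive mode", with no USER_TYPE clause.
function shouldWarn(provider: string, mode: Mode, skipPermissionsFlag: boolean): boolean {
  const permissive = mode !== 'default' || skipPermissionsFlag;
  return provider !== 'firstParty' && permissive;
}

interface MatrixRow {
  provider: string;
  mode: Mode;
  skipPermissionsFlag: boolean;
  warn: boolean;
}

// Enumerate every combination so a test can assert on the full matrix.
function conditionMatrix(): MatrixRow[] {
  const providers = ['firstParty', 'thirdParty'];
  const modes: Mode[] = ['default', 'acceptEdits', 'bypassPermissions'];
  const rows: MatrixRow[] = [];
  for (const provider of providers) {
    for (const mode of modes) {
      for (const skipPermissionsFlag of [false, true]) {
        rows.push({
          provider,
          mode,
          skipPermissionsFlag,
          warn: shouldWarn(provider, mode, skipPermissionsFlag),
        });
      }
    }
  }
  return rows;
}
```

A unit test could then assert that no firstParty row warns and that every thirdParty row warns except default mode without the skip flag.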
Following up on my standing review. There are no new commits since, so the blocker still stands: the …
Summary
Addresses finding 1 of #244.
modelSupportsAutoMode() short-circuits to false for non-firstParty providers when USER_TYPE !== 'ant' (src/utils/betas.ts:166), so the AI-based tool-call classifier never runs in that configuration. acceptEdits, bypassPermissions, and --dangerously-skip-permissions all auto-allow tool calls, so combining them with a third-party provider means tool calls are gated only by static pattern checks, with no visible signal that the AI safety review layer is absent.
This is exactly the split-brain risk @auriti flagged in the issue: first-party Claude models are trained to resist prompt injection; a small/local 3P model has no such guarantee, so a crafted payload in a codebase can emit dangerous commands that pass the static pattern checks.
Change
In src/setup.ts, after the existing root/sudo bypass-permissions safety check, print a chalk.yellow warning to stderr when all of the following are true:
- process.env.USER_TYPE !== 'ant' (external user; same gate modelSupportsAutoMode uses)
- getAPIProvider() !== 'firstParty' (third-party provider)
- permissive mode: acceptEdits, bypassPermissions, or --dangerously-skip-permissions
What this does NOT change
- Static pattern checks (bashSecurity, path traversal, dangerous-removal-paths, sandbox enforcement): still run for all providers, unchanged.
- default permission mode: no warning, since tool calls still prompt.
- USER_TYPE === 'ant': no warning, since the classifier is available.
Test plan
- bash -n-style review of the conditional: no syntax issues.
- … logoV2Utils.js → model/providers.js → nativeInstaller/index.js).
- bun test not run locally (bun not installed on this machine); CI will run it.
The change is one self-contained branch with three short conditions and a console.warn call, no shared-state touches, no test fixtures to update.
Related
- … --dangerously-skip-permissions sandbox gate was employee-only) appears to have been refactored out of src/setup.ts; this PR intentionally leaves it alone.