Skip to content

fix(security): warn at startup when 3P provider + permissive mode skips classifier (#244)#1144

Open
0xghost42 wants to merge 1 commit into
Gitlawb:mainfrom
0xghost42:fix/244-3p-safety-classifier-warning
Open

fix(security): warn at startup when 3P provider + permissive mode skips classifier (#244)#1144
0xghost42 wants to merge 1 commit into
Gitlawb:mainfrom
0xghost42:fix/244-3p-safety-classifier-warning

Conversation

@0xghost42
Copy link
Copy Markdown
Contributor

Summary

Addresses finding 1 of #244.

modelSupportsAutoMode() short-circuits to false for non-firstParty providers when USER_TYPE !== 'ant' (src/utils/betas.ts:166), so the AI-based tool-call classifier never runs in that configuration. acceptEdits, bypassPermissions, and --dangerously-skip-permissions all auto-allow tool calls — combining them with a third-party provider means tool calls are gated only by static pattern checks, with no visible signal that the AI safety review layer is absent.

This is exactly the split-brain risk @auriti flagged in the issue: first-party Claude models are trained to resist prompt injection; a small/local 3P model has no such guarantee, so a crafted payload in a codebase can emit dangerous commands that pass the static pattern checks.

Change

In src/setup.ts, after the existing root/sudo bypass-permissions safety check, print a chalk.yellow warning to stderr when all of the following are true:

  • process.env.USER_TYPE !== 'ant' (external user — same gate modelSupportsAutoMode uses)
  • getAPIProvider() !== 'firstParty' (third-party provider)
  • permissive mode is active: acceptEdits, bypassPermissions, or --dangerously-skip-permissions

What this does NOT change

  • Static pattern checks (bashSecurity, path traversal, dangerous-removal-paths, sandbox enforcement) — still run for all providers, unchanged.
  • First-party / Bedrock / Vertex / Foundry behavior — unchanged.
  • default permission mode — no warning, since tool calls still prompt.
  • ANT internal builds (USER_TYPE === 'ant') — no warning, since the classifier is available.

Test plan

  • bash -n-style review of the conditional — no syntax issues.
  • Import order matches the surrounding alphabetical block (logoV2Utils.jsmodel/providers.jsnativeInstaller/index.js).
  • bun test not run locally (bun not installed on this machine); CI will run it.

The change is one self-contained branch with three short conditions and a console.warn call, no shared-state touches, no test fixtures to update.

Related

…ps classifier

modelSupportsAutoMode() returns false for non-firstParty providers when
USER_TYPE !== 'ant' (utils/betas.ts:166), so the AI-based tool-call
classifier never runs in that configuration. acceptEdits, bypassPermissions,
and --dangerously-skip-permissions all auto-allow tool calls — combining
them with a third-party provider means tool calls are gated only by static
pattern checks, with no visible signal that the AI safety review layer is
absent.

Print a visible startup warning when the relevant conditions are all true
so users can make an informed call before pointing a small/local 3P model
at an untrusted codebase. Static pattern checks (bashSecurity, path
constraints, dangerous-removal-paths, etc.) still run for all providers.

Tracks issue Gitlawb#244, finding 1.
@jatmn
Copy link
Copy Markdown
Collaborator

jatmn commented May 13, 2026

this would have major conflict with #1110
@kevincodex1 Neither of these can really be merged without further discussion

@kevincodex1
Copy link
Copy Markdown
Contributor

hi @jatmn thanks for flagging we can pause merging of this and discuss first

Copy link
Copy Markdown
Collaborator

@techbrewboss techbrewboss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

I did not find code-level blockers in this PR. The change is narrowly scoped to a startup warning in src/setup.ts, and the condition matches the current external-provider auto-mode gate in modelSupportsAutoMode(): external users on non-firstParty providers cannot use the transcript classifier, while acceptEdits, bypassPermissions, and --dangerously-skip-permissions still allow permissive tool execution paths.

Given the existing maintainer comment that this has a major conflict with #1110, I would not treat this as independently merge-ready until the permission-mode direction in #1110 is settled. In particular, #1110 adds fullAccess and centralizes dangerous permission-mode transitions, so this warning likely needs to be reconciled with that newer permission-mode model rather than merged as a standalone startup check.

Findings

None from this PR diff.

Validation

Reviewed the PR metadata, src/setup.ts diff, provider detection in src/utils/model/providers.ts, the modelSupportsAutoMode() gate in src/utils/betas.ts, nearby permission-mode behavior, and the relevant #1110 context. CI is green for this PR (smoke-and-tests, web).

Recommendation: comment / needs maintainer decision. The implementation is low-risk on its own, but the overlap with #1110 should be resolved before merging.

@Vasanthdev2004
Copy link
Copy Markdown
Collaborator

Blockers

None.

Non-Blocking

  • Author couldn't run tests locally (no bun). Acceptable for a simple conditional warning — CI will catch issues.

Looks Good

  • 25 lines, zero deletions — minimal, focused change
  • Correct conditional logic matching modelSupportsAutoMode() behavior
  • Clear inline comment explaining the rationale and tracking the issue
  • Proper lint suppression with explanation
  • Import in alphabetical order
  • Doesn't change any behavior — just makes the absent safety layer visible
  • Security enhancement that helps users understand risk when using 3P providers with permissive modes

Verdict: Approve — clean security warning, no blockers.

Copy link
Copy Markdown
Collaborator

@Vasanthdev2004 Vasanthdev2004 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean security warning. No blockers.

Copy link
Copy Markdown
Collaborator

@gnanam1990 gnanam1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling Finding 1 of #244 — the threat model is spot on, and I really like that this is purely additive (classifier behavior, static checks, and provider routing are all untouched). Close to mergeable; one change needed:

The process.env.USER_TYPE !== 'ant' clause is a blocker for us. We're deliberately stripping Anthropic fingerprint checks out of the codebase, and this reintroduces one (just reversed). It's also redundant here — getAPIProvider() !== 'firstParty' already scopes the warning to exactly the users we want to reach. Could you drop the USER_TYPE condition so the check reads purely as "3P provider + permissive mode"?

Separately, per the thread with @jatmn this overlaps #1110 (which reworks permission modes / adds fullAccess). Let's get the #1110 direction settled and reconcile this warning against that model before merge. A small unit test over the condition matrix would also be a welcome addition. Happy to re-review quickly once the fingerprint clause is gone — thanks for the solid work here.

@gnanam1990
Copy link
Copy Markdown
Collaborator

Following up on my standing review — no new commits since, so the blocker still stands: the process.env.USER_TYPE !== 'ant' clause reintroduces an Anthropic fingerprint check (just reversed), which we're deliberately stripping out of the codebase. It's also redundant — getAPIProvider() !== 'firstParty' already scopes the warning to exactly the third-party users you want to reach. Dropping the USER_TYPE condition resolves it. The threat model and the purely-additive approach are solid otherwise. Still worth reconciling against #1110 (permission-mode rework / fullAccess) before merge, and a small unit test over the condition matrix would be welcome. Happy to re-review quickly once the fingerprint clause is gone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants