fix(security): warn at startup when 3P provider + permissive mode skips classifier (#244) by 0xghost42 · Pull Request #1144 · Gitlawb/openclaude

0xghost42 · 2026-05-13T10:22:50Z

Summary

Addresses finding 1 of #244.

modelSupportsAutoMode() short-circuits to false for non-firstParty providers when USER_TYPE !== 'ant' (src/utils/betas.ts:166), so the AI-based tool-call classifier never runs in that configuration. acceptEdits, bypassPermissions, and --dangerously-skip-permissions all auto-allow tool calls — combining them with a third-party provider means tool calls are gated only by static pattern checks, with no visible signal that the AI safety review layer is absent.

This is exactly the split-brain risk @auriti flagged in the issue: first-party Claude models are trained to resist prompt injection; a small/local 3P model has no such guarantee, so a crafted payload in a codebase can emit dangerous commands that pass the static pattern checks.

Change

In src/setup.ts, after the existing root/sudo bypass-permissions safety check, print a chalk.yellow warning to stderr when all of the following are true:

process.env.USER_TYPE !== 'ant' (external user — same gate modelSupportsAutoMode uses)
getAPIProvider() !== 'firstParty' (third-party provider)
permissive mode is active: acceptEdits, bypassPermissions, or --dangerously-skip-permissions

What this does NOT change

Static pattern checks (bashSecurity, path traversal, dangerous-removal-paths, sandbox enforcement) — still run for all providers, unchanged.
First-party / Bedrock / Vertex / Foundry behavior — unchanged.
default permission mode — no warning, since tool calls still prompt.
ANT internal builds (USER_TYPE === 'ant') — no warning, since the classifier is available.

Test plan

bash -n-style review of the conditional — no syntax issues.
Import order matches the surrounding alphabetical block (logoV2Utils.js → model/providers.js → nativeInstaller/index.js).
bun test not run locally (bun not installed on this machine); CI will run it.

The change is one self-contained branch with three short conditions and a console.warn call, no shared-state touches, no test fixtures to update.

Finding 3 of security: 3P providers lack AI safety classifier and sandbox enforcement #244 already landed in fix: run dangerous path check before auto-allowing rm/rmdir in acceptEdits mode #246.
Finding 2 (--dangerously-skip-permissions sandbox gate was employee-only) appears to have been refactored out of src/setup.ts; this PR intentionally leaves it alone.

…ps classifier modelSupportsAutoMode() returns false for non-firstParty providers when USER_TYPE !== 'ant' (utils/betas.ts:166), so the AI-based tool-call classifier never runs in that configuration. acceptEdits, bypassPermissions, and --dangerously-skip-permissions all auto-allow tool calls — combining them with a third-party provider means tool calls are gated only by static pattern checks, with no visible signal that the AI safety review layer is absent. Print a visible startup warning when the relevant conditions are all true so users can make an informed call before pointing a small/local 3P model at an untrusted codebase. Static pattern checks (bashSecurity, path constraints, dangerous-removal-paths, etc.) still run for all providers. Tracks issue Gitlawb#244, finding 1.

jatmn · 2026-05-13T14:26:02Z

this would have major conflict with #1110
@kevincodex1 Neither of these can really be merged without further discussion

kevincodex1 · 2026-05-13T14:59:51Z

hi @jatmn thanks for flagging we can pause merging of this and discuss first

techbrewboss

Review summary

I did not find code-level blockers in this PR. The change is narrowly scoped to a startup warning in src/setup.ts, and the condition matches the current external-provider auto-mode gate in modelSupportsAutoMode(): external users on non-firstParty providers cannot use the transcript classifier, while acceptEdits, bypassPermissions, and --dangerously-skip-permissions still allow permissive tool execution paths.

Given the existing maintainer comment that this has a major conflict with #1110, I would not treat this as independently merge-ready until the permission-mode direction in #1110 is settled. In particular, #1110 adds fullAccess and centralizes dangerous permission-mode transitions, so this warning likely needs to be reconciled with that newer permission-mode model rather than merged as a standalone startup check.

Findings

None from this PR diff.

Validation

Reviewed the PR metadata, src/setup.ts diff, provider detection in src/utils/model/providers.ts, the modelSupportsAutoMode() gate in src/utils/betas.ts, nearby permission-mode behavior, and the relevant #1110 context. CI is green for this PR (smoke-and-tests, web).

Recommendation: comment / needs maintainer decision. The implementation is low-risk on its own, but the overlap with #1110 should be resolved before merging.

Vasanthdev2004 · 2026-05-16T09:59:46Z

Blockers

None.

Non-Blocking

Author couldn't run tests locally (no bun). Acceptable for a simple conditional warning — CI will catch issues.

Looks Good

25 lines, zero deletions — minimal, focused change
Correct conditional logic matching modelSupportsAutoMode() behavior
Clear inline comment explaining the rationale and tracking the issue
Proper lint suppression with explanation
Import in alphabetical order
Doesn't change any behavior — just makes the absent safety layer visible
Security enhancement that helps users understand risk when using 3P providers with permissive modes

Verdict: Approve — clean security warning, no blockers.

Vasanthdev2004

Clean security warning. No blockers.

gnanam1990

Thanks for tackling Finding 1 of #244 — the threat model is spot on, and I really like that this is purely additive (classifier behavior, static checks, and provider routing are all untouched). Close to mergeable; one change needed:

The process.env.USER_TYPE !== 'ant' clause is a blocker for us. We're deliberately stripping Anthropic fingerprint checks out of the codebase, and this reintroduces one (just reversed). It's also redundant here — getAPIProvider() !== 'firstParty' already scopes the warning to exactly the users we want to reach. Could you drop the USER_TYPE condition so the check reads purely as "3P provider + permissive mode"?

Separately, per the thread with @jatmn this overlaps #1110 (which reworks permission modes / adds fullAccess). Let's get the #1110 direction settled and reconcile this warning against that model before merge. A small unit test over the condition matrix would also be a welcome addition. Happy to re-review quickly once the fingerprint clause is gone — thanks for the solid work here.

gnanam1990 · 2026-05-17T07:32:12Z

Following up on my standing review — no new commits since, so the blocker still stands: the process.env.USER_TYPE !== 'ant' clause reintroduces an Anthropic fingerprint check (just reversed), which we're deliberately stripping out of the codebase. It's also redundant — getAPIProvider() !== 'firstParty' already scopes the warning to exactly the third-party users you want to reach. Dropping the USER_TYPE condition resolves it. The threat model and the purely-additive approach are solid otherwise. Still worth reconciling against #1110 (permission-mode rework / fullAccess) before merge, and a small unit test over the condition matrix would be welcome. Happy to re-review quickly once the fingerprint clause is gone.

kevincodex1 approved these changes May 13, 2026

View reviewed changes

kevincodex1 requested review from Vasanthdev2004, gnanam1990, jatmn and techbrewboss May 13, 2026 10:37

techbrewboss reviewed May 13, 2026

View reviewed changes

Vasanthdev2004 approved these changes May 16, 2026

View reviewed changes

gnanam1990 requested changes May 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(security): warn at startup when 3P provider + permissive mode skips classifier (#244)#1144

fix(security): warn at startup when 3P provider + permissive mode skips classifier (#244)#1144
0xghost42 wants to merge 1 commit into
Gitlawb:mainfrom
0xghost42:fix/244-3p-safety-classifier-warning

0xghost42 commented May 13, 2026

Uh oh!

jatmn commented May 13, 2026

Uh oh!

kevincodex1 commented May 13, 2026

Uh oh!

techbrewboss left a comment

Uh oh!

Vasanthdev2004 commented May 16, 2026

Uh oh!

Vasanthdev2004 left a comment

Uh oh!

gnanam1990 left a comment

Uh oh!

gnanam1990 commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

0xghost42 commented May 13, 2026

Summary

Change

What this does NOT change

Test plan

Related

Uh oh!

jatmn commented May 13, 2026

Uh oh!

kevincodex1 commented May 13, 2026

Uh oh!

techbrewboss left a comment

Choose a reason for hiding this comment

Review summary

Findings

Validation

Uh oh!

Vasanthdev2004 commented May 16, 2026

Blockers

Non-Blocking

Looks Good

Uh oh!

Vasanthdev2004 left a comment

Choose a reason for hiding this comment

Uh oh!

gnanam1990 left a comment

Choose a reason for hiding this comment

Uh oh!

gnanam1990 commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants