
feat(security): strict TLS mode #5603

Open
rhdedgar wants to merge 2 commits into ogx-ai:main from rhdedgar:tls_mode


@rhdedgar
Contributor

What does this PR do?

Provides an optional mode that enforces secure connections for llama-stack, using only FIPS-approved TLS cipher suites.

  • Add a security_mode setting (development / production) to ServerConfig; production mode enforces TLS with FIPS-approved cipher suites
  • Fix a JWT algorithm confusion vulnerability in the OAuth2 auth provider by replacing the untrusted header algorithm with a static FIPS-approved allowlist (RS256/PS256/ES256 families)
  • Fix the introspection endpoint defaulting to ssl_ctxt=False (no TLS verification); it now defaults to system CA verification (True)
  • Add HSTS header middleware for all HTTPS responses
  • Add --security-mode and --insecure CLI flags to override the config at startup
  • Add a production-mode guard that rejects verify_tls=False in the auth provider config
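A minimal, stdlib-only sketch of the idea behind the JWT fix: the set of accepted algorithms is fixed on the server, and the token's own `alg` header is only checked against it, never trusted to choose the verification algorithm. `ALLOWED_JWT_ALGORITHMS` and `check_token_algorithm` are hypothetical names, and expanding the RS256/PS256/ES256 families to their 256/384/512 variants is an assumption; the PR's actual constant may differ.

```python
import base64
import json

# Assumed allowlist: the RS256/PS256/ES256 families named in the PR description.
# The constant actually used in the PR may differ.
ALLOWED_JWT_ALGORITHMS = frozenset(
    {
        "RS256", "RS384", "RS512",
        "PS256", "PS384", "PS512",
        "ES256", "ES384", "ES512",
    }
)


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore padding before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def check_token_algorithm(token: str) -> str:
    """Hypothetical helper: reject tokens whose header alg is outside the static allowlist.

    The header is attacker-controlled, so it is only *checked* against the
    server's list; it never decides which algorithms the verifier accepts.
    """
    try:
        header_segment = token.split(".", 1)[0]
        header = json.loads(_b64url_decode(header_segment))
    except ValueError as exc:  # covers binascii.Error, JSONDecodeError, UnicodeDecodeError
        raise ValueError("malformed JWT header") from exc
    alg = header.get("alg") if isinstance(header, dict) else None
    # Check the type first: a non-string alg (e.g. a list) must be rejected, not hashed.
    if not isinstance(alg, str) or alg not in ALLOWED_JWT_ALGORITHMS:
        raise ValueError(f"JWT algorithm {alg!r} is not allowed")
    return alg
```

In a real verifier the same static list would also be handed to the JWT library; with PyJWT, for example, that is the `algorithms=` argument of `jwt.decode`, which should never be built from the unverified header.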

Closes RHAIENG-1865

If this is accepted, I expect to open a couple of follow-up PRs for things like backend URL validation, HTTP-to-HTTPS redirects, and some remaining config hardening. I split this effort into several segments so that each one is a manageable size to review.

Test Plan

  • Unit tests added for SecurityMode validation, HSTS middleware, FIPS cipher constants, and JWT algorithm allowlist
  • Existing OAuth2/JWKS tests updated to use RSA key pairs instead of HS256 symmetric keys
  • uv run pytest tests/unit/server/test_tls_config.py tests/unit/server/test_auth.py -x

Additional manual checks

  • Pre-commit checks looked good
  • The PR checks ran against a PR on my own fork without errors.
  • I tested against an OpenShift cluster with a custom build of llama-stack-k8s-operator that was configured specifically to use this new TLS-only mode, and it worked as expected.

@meta-cla bot added the CLA Signed label on Apr 21, 2026

@skamenan7 left a comment


Nice security features, Doug! Looks good apart from the comments below.

Comment thread src/ogx/cli/stack/run.py

# Apply CLI security mode overrides BEFORE model validation,
# so --insecure can downgrade production mode without triggering the validator.
if args.insecure or args.security_mode:

It seems the CLI overrides get lost along the way: _run_stack_run_cmd() validates the original file before this branch, and create_app() then re-reads LLAMA_STACK_CONFIG from disk anyway. So --insecure can't actually rescue a production config that is missing certs, and --security-mode production never reaches the verify_tls=False guard on the app side. Happy to walk through the call chain if that's helpful.
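One way to avoid the lost-override problem described above, sketched under assumptions: merge the CLI flags into the raw config before validation, validate exactly once, and hand the resulting object to the app instead of re-reading LLAMA_STACK_CONFIG. `ServerConfig` here is a simplified dataclass stand-in for the real model, and `resolve_server_config` is a hypothetical helper, not code from the PR.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ServerConfig:
    """Simplified stand-in for the real ServerConfig (hypothetical fields)."""

    security_mode: str = "development"
    tls_certfile: str | None = None
    tls_keyfile: str | None = None

    def __post_init__(self) -> None:
        if self.security_mode not in ("development", "production"):
            raise ValueError(f"unknown security_mode {self.security_mode!r}")
        if self.security_mode == "production" and not (self.tls_certfile and self.tls_keyfile):
            raise ValueError("Production security mode requires TLS")


def resolve_server_config(
    raw: dict[str, Any], *, security_mode: str | None = None, insecure: bool = False
) -> ServerConfig:
    """Hypothetical helper: merge CLI overrides into the raw config, then validate once."""
    if insecure and security_mode == "production":
        raise ValueError("--insecure conflicts with --security-mode production")
    merged = copy.deepcopy(raw)  # never mutate the caller's parsed file contents
    if security_mode is not None:
        merged["security_mode"] = security_mode
    if insecure:
        merged["security_mode"] = "development"
    # Validation runs only here, after the overrides, so --insecure can downgrade
    # a production config without certs, and --security-mode production is honored.
    return ServerConfig(**merged)
```

Treating `--insecure` together with `--security-mode production` as an error, rather than silently picking one, is a design choice of this sketch; the app factory would then receive the returned object directly.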

Comment thread src/ogx/core/datatypes.py
raise ValueError(
    "Production security mode requires TLS: set 'tls_certfile' and 'tls_keyfile' in server config, "
    "or use '--insecure' / security_mode='development' to run without TLS."
)

The current validator leaves a hole in the FIPS-only guarantee. We only auto-fill the allowlist when tls_config.ciphers is None, and _uvicorn_run() forwards any explicit list as-is. Would it make sense to validate that production-mode ciphers are a non-empty subset of FIPS_APPROVED_CIPHERS, or to reject custom overrides entirely in production?
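A sketch of the stricter validator suggested above: keep the auto-fill when `ciphers` is None, but otherwise require a non-empty subset of the allowlist. The cipher names are illustrative OpenSSL TLS 1.2 AES-GCM suites, not the PR's actual FIPS_APPROVED_CIPHERS contents, and `resolve_production_ciphers` is a hypothetical helper.

```python
from __future__ import annotations

# Illustrative allowlist (OpenSSL names). The PR's FIPS_APPROVED_CIPHERS may differ.
FIPS_APPROVED_CIPHERS: tuple[str, ...] = (
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
)


def resolve_production_ciphers(ciphers: list[str] | None) -> list[str]:
    """Hypothetical validator: default to the allowlist, or require a non-empty subset of it."""
    if ciphers is None:
        # No override: fall back to the full allowlist, as the current validator does.
        return list(FIPS_APPROVED_CIPHERS)
    if not ciphers:
        raise ValueError("production mode requires at least one cipher")
    disallowed = [c for c in ciphers if c not in FIPS_APPROVED_CIPHERS]
    if disallowed:
        raise ValueError(f"non-FIPS ciphers not allowed in production: {disallowed}")
    return list(ciphers)
```

The validated list could then be joined with ":" into an OpenSSL cipher string, which is the form uvicorn's `ssl_ciphers` option takes.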

Comment thread tests/unit/server/test_tls_config.py (outdated)


class TestProductionVerifyTlsFalse:
    def test_production_mode_rejects_verify_tls_false(self):

This test doesn't exercise the production startup guard. It builds an OAuth2TokenAuthConfig and asserts that verify_tls stayed False, but it never calls create_app(), so deleting the SystemExit branch in server.py would still leave it green. It's worth driving create_app() or extracting that guard into a testable helper so the test covers the real failure path.
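A sketch of the "testable helper" option: the production guard lives in a small function that create_app() would call, and the tests exercise that function directly, so removing the SystemExit branch makes them fail. `enforce_production_auth_guard` and `AuthConfigStub` (a stand-in for OAuth2TokenAuthConfig carrying only the `verify_tls` field) are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AuthConfigStub:
    """Hypothetical stand-in for OAuth2TokenAuthConfig: only the field the guard reads."""

    verify_tls: bool = True


def enforce_production_auth_guard(security_mode: str, auth_config: AuthConfigStub | None) -> None:
    """Hypothetical helper extracted from the startup path in server.py.

    Raises SystemExit when production mode is combined with verify_tls=False,
    so a unit test can hit the real failure path without building the app.
    """
    if security_mode == "production" and auth_config is not None and not auth_config.verify_tls:
        raise SystemExit("verify_tls=False is not allowed in production security mode")


def test_production_mode_rejects_verify_tls_false() -> None:
    try:
        enforce_production_auth_guard("production", AuthConfigStub(verify_tls=False))
    except SystemExit:
        return
    raise AssertionError("expected SystemExit")


def test_development_mode_allows_verify_tls_false() -> None:
    # Development mode must not trip the guard.
    enforce_production_auth_guard("development", AuthConfigStub(verify_tls=False))
```

Under pytest, the try/except in the first test would more idiomatically be written as `with pytest.raises(SystemExit):`.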

Signed-off-by: Doug Edgar <dedgar@redhat.com>
@rhdedgar
Contributor Author

OK, I've added a new commit that should address the comments on the original commit.
