
fix(security): sanitize secrets in user-provided logs before LLM injection #268

Open
Adar5 wants to merge 4 commits into jenkinsci:main from Adar5:fix/sanitize-log-secrets

Conversation


@Adar5 commented Mar 10, 2026

Description

Fixes #265

This PR addresses a security vulnerability where user-pasted Jenkins build logs were injected directly into the LLM prompt without sanitization. Although sanitize_logs() existed in the codebase, it was not being utilized in the production pipeline.

Changes

  • Imported sanitize_logs from api.tools.sanitizer into prompt_builder.py.
  • Updated build_prompt() to process log_context through the sanitizer before string interpolation.
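The wiring of these two changes can be sketched as below. This is a minimal, self-contained approximation: the real `sanitize_logs` in `api.tools.sanitizer` covers more patterns, and the actual `build_prompt()` signature may differ.

```python
import re

# Hypothetical stand-in for api.tools.sanitizer.sanitize_logs;
# the project's real implementation handles more secret patterns.
def sanitize_logs(text: str) -> str:
    # AWS access key IDs: "AKIA" followed by 16 uppercase alphanumerics.
    text = re.sub(r"AKIA[0-9A-Z]{16}", "[REDACTED_AWS_KEY]", text)
    # key=value / key: value style passwords, tokens, and secrets.
    text = re.sub(r"(?i)(password|token|secret)\s*[=:]\s*\S+",
                  r"\1=[REDACTED]", text)
    return text

def build_prompt(user_message: str, log_context: str) -> str:
    # Sanitize before string interpolation so secrets never reach the prompt.
    clean_logs = sanitize_logs(log_context)
    return f"Build log:\n{clean_logs}\n\nQuestion: {user_message}"
```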

Verification Results

I verified this fix locally by creating a test script that passed a "poisoned" log containing mock AWS keys and passwords.

Before fix: Secrets were visible in the generated prompt string.
After fix: All sensitive patterns were successfully replaced with [REDACTED] or [REDACTED_AWS_KEY].
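A reproduction of that local check might look like the following. The poisoned-log contents and the stand-in redaction patterns are assumptions; the real script imported `sanitize_logs` from `api.tools.sanitizer`.

```python
import re

# Stand-in for api.tools.sanitizer.sanitize_logs (patterns are assumptions).
REDACTIONS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),
    (re.compile(r"(?i)(password|secret|token)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
]

def sanitize_logs(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

# Poisoned log with mock credentials (AKIAIOSFODNN7EXAMPLE is AWS's
# documented example key ID, not a real secret).
poisoned_log = (
    "[INFO] Using AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
    "[INFO] db password=s3cr3t!\n"
)
sanitized = sanitize_logs(poisoned_log)
assert "AKIAIOSFODNN7EXAMPLE" not in sanitized  # before fix: visible in prompt
assert "s3cr3t!" not in sanitized               # after fix: redacted
```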

Checklist

  • Code follows the style guidelines of this project
  • Verified that sanitize_logs handles common secret patterns (AWS, Passwords, Tokens)
  • Local tests passed

@Adar5 Adar5 requested a review from a team as a code owner March 10, 2026 07:12

@berviantoleo left a comment


Don't mix with unrelated changes

@Adar5 force-pushed the fix/sanitize-log-secrets branch from 38a0464 to 6451d3d on March 10, 2026 07:34

Adar5 commented Mar 10, 2026

@berviantoleo Apologies for the confusion! A branch-management issue on my end pulled in some earlier experimental commits. I have just force-pushed a clean version of the branch; the PR now contains only the sanitize_logs implementation, as requested.

@sharma-sugurthi

Verified the fix logic: sanitize_logs() is correctly called before log_context is interpolated into the prompt string, so secrets are redacted before reaching the LLM.

One thing worth checking: the LOG_ANALYSIS_INSTRUCTION format in prompts.py also injects {log_data} separately. Is that path also sanitized, or does it only go through build_prompt()? Likewise, if _generate_search_query_from_logs() in chat_service.py calls the LLM with raw log text, that's a separate unsanitized path.

Otherwise the change looks clean.


berviantoleo commented Mar 10, 2026

I prefer the producer to sanitize it, rather than the consumer. In addition, this needs either a unit or an integration test.

@sharma-sugurthi

@Adar5
@berviantoleo's feedback makes sense architecturally: the sanitization should happen in chat_service.py right after LOG_ANALYSIS_PATTERN extracts the log text, before it is passed as log_context to build_prompt(). That way prompt_builder.py stays a clean, pure formatting layer.

As for the test, you could add a case in test_chat_service.py that verifies secrets in a user-pasted log are redacted before reaching the prompt string.
Happy to review the updated version.
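A test along those lines could look like this sketch. Everything apart from `sanitize_logs` itself is hypothetical: the real LOG_ANALYSIS_PATTERN, the extraction helper in chat_service.py, and the log markers will all differ.

```python
import re

# Hypothetical pattern marking the user-pasted log section of a message.
LOG_ANALYSIS_PATTERN = re.compile(r"\[LOG\](.*?)\[/LOG\]", re.DOTALL)

# Stand-in for api.tools.sanitizer.sanitize_logs.
def sanitize_logs(text: str) -> str:
    return re.sub(r"(?i)(password|token)\s*[=:]\s*\S+", r"\1=[REDACTED]", text)

def extract_log_context(user_message: str) -> str:
    # Producer-side sanitization: redact immediately after extraction,
    # before the text is passed to build_prompt() as log_context.
    match = LOG_ANALYSIS_PATTERN.search(user_message)
    raw = match.group(1) if match else ""
    return sanitize_logs(raw)

def test_secrets_redacted_before_prompt():
    message = "Why did this fail? [LOG]step 1 ok\npassword=hunter2[/LOG]"
    log_context = extract_log_context(message)
    assert "hunter2" not in log_context
    assert "[REDACTED]" in log_context
```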

@Yugansh5013

Agree with @berviantoleo on producer-side sanitization. To add some context: sanitize_logs() was introduced in PR #89, where the log-scraping flow was first set up. As designed, getConsoleLogContext() on the frontend scrapes the raw logs and passes them straight into the prompt, so ideally sanitize_logs() should be called right at that entry point, before the data goes anywhere else. That way build_prompt(), _generate_search_query_from_logs(), and LOG_ANALYSIS_INSTRUCTION all naturally receive clean data without each needing to handle it individually.
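The single-entry-point idea can be sketched as below. The consumer names come from the discussion, but their bodies, and the handle_user_log entry point itself, are purely illustrative.

```python
import re

# Stand-in for api.tools.sanitizer.sanitize_logs (one assumed pattern).
def sanitize_logs(text: str) -> str:
    return re.sub(r"AKIA[0-9A-Z]{16}", "[REDACTED_AWS_KEY]", text)

# Downstream consumers named in the discussion; bodies are illustrative.
def build_prompt(log_context: str) -> str:
    return f"Analyze this Jenkins log:\n{log_context}"

def generate_search_query_from_logs(log_context: str) -> str:
    return log_context.splitlines()[-1]

def handle_user_log(raw_log: str) -> tuple[str, str]:
    # Sanitize once where user data enters the backend, so build_prompt()
    # and the search-query path both receive clean data without each
    # needing to handle redaction individually.
    clean = sanitize_logs(raw_log)
    return build_prompt(clean), generate_search_query_from_logs(clean)
```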


Adar5 commented Mar 10, 2026

@berviantoleo @sharma-sugurthi @Yugansh5013 Thanks for the excellent architectural feedback! Moving the sanitisation to the producer side in chat_service.py makes complete sense, so we don't leak raw data into other paths like _generate_search_query_from_logs().

I'll revert the change in prompt_builder.py, implement the sanitiser at the entry point in chat_service.py, and add the requested unit test. I'll push the updated commits shortly!

@Adar5 force-pushed the fix/sanitize-log-secrets branch from 6451d3d to 24d1afd on March 10, 2026 12:04
@berviantoleo added the bug label (For changelog: Minor bug. Will be listed after features) on Mar 15, 2026



Development

Successfully merging this pull request may close these issues.

[Bug] sanitize_logs() is never called - user-pasted secrets in build logs go raw into the LLM prompt

4 participants