Redact secrets during the workflow file upload#702
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughExtracts secret redaction into Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Jobs as Jobs\n(src/utils/job/jobs.py)
participant Workflow as WorkflowObjects\n(src/service/core/workflow/objects.py)
participant Redact as RedactUtility\n(src/lib/utils/redact.py)
participant Storage as File/YAMLStorage
Jobs->>Redact: redact_pod_spec_env(pod_spec)
Redact-->>Jobs: redacted_pod_spec
Jobs->>Storage: yaml.dump(redacted_pod_spec) -> create .spec
Workflow->>Redact: redact_secrets(yaml_lines)
Redact-->>Workflow: redacted_yaml_lines
Workflow->>Storage: upload/queue redacted files
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
Reduces the risk of secrets being uploaded to object storage.
5cf0523 to
2fbdad6
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/service/core/workflow/objects.py`:
- Around line 906-908: The code incorrectly does list(workflow_spec) which
splits a string into characters before calling redact_secrets; change to pass
lines with preserved endings by using workflow_spec.splitlines(keepends=True)
and then join the output of redact_secrets back into a single string (e.g.,
workflow_spec =
''.join(redact_secrets(workflow_spec.splitlines(keepends=True)))), ensuring you
update the assignment where workflow_spec is set and keep references to the
redact_secrets function and the workflow_spec variable consistent with the test
helper behavior.
- Around line 914-915: The line calling redact_secrets wraps
original_templated_spec in list(), which splits the string into characters;
change the call so redact_secrets receives the actual lines instead of
characters — e.g., pass original_templated_spec.splitlines(keepends=True) (or
just original_templated_spec if redact_secrets accepts a whole string) and
assign original_templated_spec = next(redact_secrets(...)) accordingly; update
the call site referencing original_templated_spec and redact_secrets to remove
the incorrect list(...) wrapping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b769a5c0-7734-4b8c-b005-e62e1b89e507
📒 Files selected for processing (8)
src/lib/utils/BUILDsrc/lib/utils/redact.pysrc/lib/utils/tests/BUILDsrc/lib/utils/tests/test_redact_secrets.pysrc/service/core/workflow/BUILDsrc/service/core/workflow/objects.pysrc/service/core/workflow/tests/BUILDsrc/service/core/workflow/workflow_service.py
💤 Files with no reviewable changes (1)
- src/service/core/workflow/tests/BUILD
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/utils/redact.py`:
- Around line 59-73: The current redact_pod_spec_env function only masks env
values based on entropy, allowing low-entropy secret values to leak; update
redact_pod_spec_env to also inspect env_entry['name'] and mask when the name
matches a secret pattern (e.g., case-insensitive keywords like "secret",
"password", "passwd", "token", "key", "credential", "aws_secret", "aws_access",
etc., or specific names like AWS_SECRET_ACCESS_KEY) in addition to the existing
entropy check, while still skipping entries that use 'valueFrom'; modify the
masking condition in redact_pod_spec_env (and use or add a helper like
_is_secret_name if helpful) so that env_entry['value'] is set to '[MASKED]' when
either the entropy threshold is exceeded or the name matches the secret-name
pattern, and update/add unit tests for redact_pod_spec_env to cover low-entropy
secret names and ensure valueFrom entries remain untouched.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 993dbbda-00f4-4b1e-85fd-07b6083ce102
📒 Files selected for processing (5)
src/lib/utils/redact.pysrc/lib/utils/tests/test_redact_secrets.pysrc/service/core/workflow/objects.pysrc/utils/job/BUILDsrc/utils/job/jobs.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/service/core/workflow/objects.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/lib/utils/redact.py (1)
63-74:⚠️ Potential issue | 🟠 MajorMask secret-named env vars too, not just high-entropy values.
This still lets obvious secrets like
AWS_SECRET_ACCESS_KEY=changemeorTOKEN=abc123pass through unredacted because the predicate only looks at entropy. Please fold the env var name into the masking rule as well, while still leavingvalueFromentries alone, and add a regression test for a low-entropy secret value.🔧 Proposed fix
+_SECRET_ENV_NAME_RE = re.compile( + r'(?i)(?:access|auth|api|credential|creds|key|passw(?:or)?d|secret|token)' +) + def redact_pod_spec_env(pod_spec: Dict) -> Dict: @@ for container_list_key in ('containers', 'initContainers'): for container in pod_spec.get('spec', pod_spec).get(container_list_key, []): for env_entry in container.get('env', []): - if 'value' in env_entry and \ - _shannon_entropy(env_entry['value']) > _ENTROPY_THRESHOLD: + env_name = str(env_entry.get('name', '')) + env_value = env_entry.get('value') + if isinstance(env_value, str) and ( + _SECRET_ENV_NAME_RE.search(env_name) + or _shannon_entropy(env_value) > _ENTROPY_THRESHOLD + ): env_entry['value'] = '[MASKED]'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/utils/redact.py` around lines 63 - 74, The current loop in redact.py only masks env vars based on _shannon_entropy(env_entry['value']) > _ENTROPY_THRESHOLD and ignores obvious secret names; update the masking predicate in the env processing loop (the block iterating container_list_key over 'containers'/'initContainers' and env_entry in container.get('env', [])) to also mask when the env var name matches a secret-name pattern (e.g. case-insensitive regex matching words like 'secret', 'password', 'pass', 'token', 'key', 'aws', 'cred') while still skipping entries that use 'valueFrom'; then add a regression test that asserts a low-entropy value (e.g. "changeme" or "abc123") is masked when the env var name is secret-like but not masked when the name is non-secret and entropy is low.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/utils/redact.py`:
- Around line 93-97: The exception handler around the base64 decoding (the block
that builds padded from fragment and calls base64.b64decode) must also catch
binascii.Error so malformed base64 fragments are preserved instead of raising;
update the except clause that currently lists ValueError and UnicodeDecodeError
to include binascii.Error (or import binascii and reference binascii.Error) so
the try/except around padded, decoded, and base64.b64decode(...) handles all
relevant decode errors and returns fragment on failure.
---
Duplicate comments:
In `@src/lib/utils/redact.py`:
- Around line 63-74: The current loop in redact.py only masks env vars based on
_shannon_entropy(env_entry['value']) > _ENTROPY_THRESHOLD and ignores obvious
secret names; update the masking predicate in the env processing loop (the block
iterating container_list_key over 'containers'/'initContainers' and env_entry in
container.get('env', [])) to also mask when the env var name matches a
secret-name pattern (e.g. case-insensitive regex matching words like 'secret',
'password', 'pass', 'token', 'key', 'aws', 'cred') while still skipping entries
that use 'valueFrom'; then add a regression test that asserts a low-entropy
value (e.g. "changeme" or "abc123") is masked when the env var name is
secret-like but not masked when the name is non-secret and entropy is low.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4572b569-3bd9-41ee-a097-70a04d1a144a
📒 Files selected for processing (2)
src/lib/utils/redact.pysrc/lib/utils/tests/test_redact_secrets.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/utils/redact.py (1)
108-112:⚠️ Potential issue | 🔴 CriticalCatch
binascii.Errorin this decode fallback too.
base64.b64decode(..., validate=True)can raisebinascii.Errorfor malformed base64-looking fragments, so this handler can still abort redaction instead of preserving the original token.Suggested fix
+import binascii import base64 import collections import copy @@ - except (ValueError, UnicodeDecodeError): + except (binascii.Error, ValueError, UnicodeDecodeError): return fragment#!/bin/bash python - <<'PY' import base64 fragment = "A" * 17 padded = fragment + "=" * (-len(fragment) % 4) try: try: base64.b64decode(padded, validate=True) except (ValueError, UnicodeDecodeError): print("caught_by_current_handler") else: print("no_exception") except Exception as exc: print("escapes_current_handler", type(exc).__module__, type(exc).__name__) PY🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/utils/redact.py` around lines 108 - 112, The except block handling base64 decoding should also catch binascii.Error so malformed base64 fragments don't escape the fallback; update the except tuple for the base64.b64decode(...) call that processes padded/fragment to include binascii.Error and add an import for the binascii module if it's not already imported, leaving the rest of the logic (return fragment) unchanged.
🧹 Nitpick comments (1)
src/lib/utils/tests/test_redact_secrets.py (1)
104-116: Use non-special fixtures here so these cases actually hit the entropy path.Line 106 is currently masked by the sensitive-name rule, and Line 113 is preserved by
_NEVER_MASK_VALUES, so an entropy regression would still leave both tests green.Suggested update
def test_masks_high_entropy_secret(self): pod_spec = self._make_pod_spec( - {'name': 'app', 'env': [{'name': 'AWS_SECRET_ACCESS_KEY', 'value': _AWS_SECRET_KEY}]}, + {'name': 'app', 'env': [{'name': 'APP_CONFIG_BLOB', 'value': _AWS_SECRET_KEY}]}, ) redacted = redact_pod_spec_env(pod_spec) self.assertEqual(redacted['containers'][0]['env'][0]['value'], '[MASKED]') def test_preserves_low_entropy_value(self): pod_spec = self._make_pod_spec( - {'name': 'app', 'env': [{'name': 'ENABLE_FEATURE', 'value': 'true'}]}, + {'name': 'app', 'env': [{'name': 'APP_MODE', 'value': 'devdevdev'}]}, ) redacted = redact_pod_spec_env(pod_spec) - self.assertEqual(redacted['containers'][0]['env'][0]['value'], 'true') + self.assertEqual(redacted['containers'][0]['env'][0]['value'], 'devdevdev')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/utils/tests/test_redact_secrets.py` around lines 104 - 116, The tests are accidentally exercising name-based and NEVER_MASK_VALUES-based rules instead of the entropy path; update the fixtures in test_masks_high_entropy_secret and test_preserves_low_entropy_value so they use neutral env names and values that don't match sensitive-name patterns or _NEVER_MASK_VALUES (e.g., use env name 'FOO' and for the high-entropy case a long random-looking string, and for the low-entropy case a short benign string not in _NEVER_MASK_VALUES), then call redact_pod_spec_env as before and assert the high-entropy value is '[MASKED]' and the low-entropy value is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/lib/utils/redact.py`:
- Around line 108-112: The except block handling base64 decoding should also
catch binascii.Error so malformed base64 fragments don't escape the fallback;
update the except tuple for the base64.b64decode(...) call that processes
padded/fragment to include binascii.Error and add an import for the binascii
module if it's not already imported, leaving the rest of the logic (return
fragment) unchanged.
---
Nitpick comments:
In `@src/lib/utils/tests/test_redact_secrets.py`:
- Around line 104-116: The tests are accidentally exercising name-based and
NEVER_MASK_VALUES-based rules instead of the entropy path; update the fixtures
in test_masks_high_entropy_secret and test_preserves_low_entropy_value so they
use neutral env names and values that don't match sensitive-name patterns or
_NEVER_MASK_VALUES (e.g., use env name 'FOO' and for the high-entropy case a
long random-looking string, and for the low-entropy case a short benign string
not in _NEVER_MASK_VALUES), then call redact_pod_spec_env as before and assert
the high-entropy value is '[MASKED]' and the low-entropy value is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8b8cdcf2-9811-43d9-9bba-79e4ebf21bb6
📒 Files selected for processing (2)
src/lib/utils/redact.pysrc/lib/utils/tests/test_redact_secrets.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/utils/redact.py (1)
108-112:⚠️ Potential issue | 🔴 CriticalMissing
binascii.Errorin exception handler causes unhandled exceptions.
base64.b64decode(..., validate=True)raisesbinascii.Errorfor malformed base64 input (e.g., invalid characters), but the exception handler only catchesValueErrorandUnicodeDecodeError. This causes unhandled exceptions instead of preserving the fragment as-is.Proposed fix
Add the import at the top of the file:
+import binascii import base64 import collectionsUpdate the exception handler:
try: padded = fragment + '=' * (-len(fragment) % 4) decoded = base64.b64decode(padded, validate=True).decode('utf-8') - except (ValueError, UnicodeDecodeError): + except (binascii.Error, ValueError, UnicodeDecodeError): return fragment#!/bin/bash # Verify that binascii.Error is raised for invalid base64 input python3 - <<'PY' import base64 import binascii # Test with invalid base64 characters (e.g., '!' is not valid) fragment = "A!AAAAAAAAAAAAAAA" padded = fragment + "=" * (-len(fragment) % 4) try: base64.b64decode(padded, validate=True) print("ERROR: Expected exception was not raised") except binascii.Error as e: print(f"binascii.Error raised as expected: {e}") except Exception as e: print(f"Different exception raised: {type(e).__name__}: {e}") PY🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/utils/redact.py` around lines 108 - 112, The except block in the base64 decoding logic around base64.b64decode (in src/lib/utils/redact.py) misses binascii.Error, so malformed base64 can raise an unhandled exception; add "import binascii" at the top of the file and include binascii.Error in the exception tuple for the try/except that decodes the padded fragment (the block that currently catches ValueError and UnicodeDecodeError) so the function returns the original fragment on invalid base64 input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/lib/utils/redact.py`:
- Around line 108-112: The except block in the base64 decoding logic around
base64.b64decode (in src/lib/utils/redact.py) misses binascii.Error, so
malformed base64 can raise an unhandled exception; add "import binascii" at the
top of the file and include binascii.Error in the exception tuple for the
try/except that decodes the padded fragment (the block that currently catches
ValueError and UnicodeDecodeError) so the function returns the original fragment
on invalid base64 input.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 730a8f20-545e-4761-938d-a38b15a4ed35
📒 Files selected for processing (1)
src/lib/utils/redact.py
Description
Reduces the risk of secrets being uploaded to object storage.
Issue #None
Checklist
Summary by CodeRabbit