
feat: support DefaultDataCredential for workflow_log and workflow_data#751

Open
KeitaW wants to merge 2 commits into NVIDIA:main from KeitaW:feat/support-default-data-credential

Conversation


@KeitaW KeitaW commented Mar 29, 2026

Summary

Widens LogConfig, DataConfig, and related function signatures from StaticDataCredential to DataCredential so workflow log/data storage can use ambient credentials (IRSA, Pod Identity, instance metadata) instead of requiring static IAM access keys.

This picks up where #508 (Azure workload identity) left off — the DataCredential union, S3 backend, and storage.Client.create() already support DefaultDataCredential, but the workflow config types were still locked to StaticDataCredential.

Changes

Python — type annotations only (postgres.py, task.py):

  • LogConfig.credential: StaticDataCredential | None → DataCredential | None
  • DataConfig.credential: StaticDataCredential | None → DataCredential | None
  • get_all_data_creds return type: Dict[str, StaticDataCredential] → Dict[str, DataCredential]
  • create_config_dict / data_endpoints: widened to DataCredential

Go — sidecar credential guard (data.go):

  • Skip os.Setenv("AWS_ACCESS_KEY_ID", ...) when keys are empty, preventing ambient credential clobbering
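The guard can be sketched as the following Python analogue (illustrative only — the actual change lives in the Go sidecar's MountURL in data.go, and the function name here is hypothetical):

```python
import os


def export_static_creds(access_key_id: str, access_key: str, region: str) -> None:
    # Only export static keys when an access key ID is present; exporting
    # empty values would shadow the SDK's ambient credential chain
    # (IRSA, Pod Identity, instance metadata).
    if access_key_id:
        os.environ["AWS_ACCESS_KEY_ID"] = access_key_id
        os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
    # Region is exported independently so ambient credentials can still
    # resolve the correct S3 endpoint.
    if region:
        os.environ["AWS_REGION"] = region
```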

No changes needed

  • workflow_service.py — already delegates to Client.create() generically
  • client.py — already accepts DataCredential union
  • s3.py — already has DefaultDataCredential match arm
  • credentials.py — DataCredential union already defined

Fixes Issue #749

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Test plan

  • Configure workflow_log with endpoint + region only (no access keys): curl -X PATCH /api/configs/workflow -d '{"workflow_log": {"credential": {"endpoint": "s3://bucket", "region": "us-west-2"}}}'
  • Verify 200 OK (previously 422)
  • Submit workflow, verify logs upload to S3 using ambient credentials
  • Verify existing deployments with static keys still work (backwards compatible)
  • Verify Go sidecar mounts data correctly with both credential types

Summary by CodeRabbit

  • Bug Fixes

    • Improved credential handling so empty static values no longer override existing environment or ambient credential resolution.
  • Compatibility

    • Broadened credential types used in workflow/config handling and task orchestration to accept a wider range of credential formats, improving interoperability.

Widen LogConfig, DataConfig, and related function signatures from
StaticDataCredential to DataCredential (the union that includes
DefaultDataCredential). This allows workflow log/data storage to use
ambient credentials (IRSA, Pod Identity, instance metadata) instead
of requiring static IAM access keys.

Also guard the Go sidecar's env var override so empty static keys
don't clobber the SDK's ambient credential chain.

Python changes (postgres.py, task.py): type annotation only — all
downstream code already uses DataCredential-compatible interfaces
(to_decrypted_dict, storage.Client.create).

Go change (data.go): skip os.Setenv when AccessKeyId is empty.

Fixes NVIDIA#749
@KeitaW KeitaW requested a review from a team as a code owner March 29, 2026 00:09

coderabbitai bot commented Mar 29, 2026

📝 Walkthrough

Broadened credential handling so workflow log/data configs accept the DataCredential union (static or default) and the Go sidecar (MountURL) only sets AWS env vars when corresponding credential fields are non-empty, avoiding clobbering the SDK's ambient credential resolution.

Changes

Cohort / File(s) Summary
Python config & task signatures
src/utils/connectors/postgres.py, src/utils/job/task.py
Widened Pydantic/type annotations and function parameter types from StaticDataCredential → DataCredential so workflow LogConfig/DataConfig and job/pod config accept default (ambient) credentials as well as static keys.
Go sidecar env handling
src/runtime/pkg/data/data.go
Adjusted MountURL to set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION only when the corresponding dataCredential fields are non-empty, preventing empty strings from overwriting the SDK's default credential chain.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(135,206,250,0.5)
    Participant Client
    end
    rect rgba(144,238,144,0.5)
    Participant PythonService
    end
    rect rgba(255,182,193,0.5)
    Participant Sidecar(Go)
    end
    rect rgba(255,228,181,0.5)
    Participant AWS_SDK
    end

    Client->>PythonService: submit workflow config (DataCredential ⟶ endpoint/region or access keys)
    PythonService->>Sidecar: include credential payload in pod/task config
    Sidecar->>Sidecar: MountURL reads DataCredential
    alt AccessKeyId non-empty
        Sidecar->>Sidecar: set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION
    else AccessKeyId empty
        Sidecar->>Sidecar: do not set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
        Sidecar->>Sidecar: set AWS_REGION only if non-empty
    end
    Sidecar->>AWS_SDK: SDK uses env or ambient credential chain
    AWS_SDK->>AWS_SDK: authenticate using provided static keys or ambient/default chain

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through configs, neat and spry,
The union let default creds fly,
The Go sidecar now checks before it sets,
No empty keys causing credential debts.
A little hop, a safer sky—hooray! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title accurately describes the main objective: adding support for DefaultDataCredential in workflow_log and workflow_data configurations instead of only StaticDataCredential.
  • Linked Issues check: ✅ Passed. The PR implements all required changes from issue #749: Python type broadening from StaticDataCredential to DataCredential in LogConfig, DataConfig, and function signatures; Go sidecar conditional env var setting to preserve ambient credentials.
  • Out of Scope Changes check: ✅ Passed. All changes directly address issue #749 objectives: Python type annotations and Go sidecar credential handling. No unrelated modifications detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/job/task.py (1)

2834-2839: ⚠️ Potential issue | 🟡 Minor

Type annotation inconsistency with data_endpoints parameter.

The fetch_creds function still uses StaticDataCredential in its signature, but callers (lines 2437, 2449, 2458) now pass data_endpoints typed as Dict[str, DataCredential]. This creates a type mismatch that would be flagged by type checkers.

🔧 Proposed fix to align types
 def fetch_creds(
     user: str,
-    data_creds: dict[str, credentials.StaticDataCredential],
+    data_creds: dict[str, credentials.DataCredential],
     path: str,
     disabled_data: list[str] | None = None,
-) -> credentials.StaticDataCredential | None:
+) -> credentials.DataCredential | None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/job/task.py` around lines 2834 - 2839, The fetch_creds signature
uses credentials.StaticDataCredential but callers pass DataCredential-typed
objects; update the type annotation for the parameter data_creds (and the
function return) to use the correct DataCredential type (e.g., dict[str,
credentials.DataCredential] and -> credentials.DataCredential | None) and adjust
any imports if needed so fetch_creds, its callers (where data_endpoints is
passed), and return usage consistently reference credentials.DataCredential.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/runtime/pkg/data/data.go`:
- Around line 417-424: The current guard only checks dataCredential.AccessKeyId
before setting both AWS env vars, which can write an empty AWS_SECRET_ACCESS_KEY
and leave stale credentials between mounts; update the logic around
dataCredential.AccessKeyId and dataCredential.AccessKey so you only call
os.Setenv("AWS_ACCESS_KEY_ID", ...) and os.Setenv("AWS_SECRET_ACCESS_KEY", ...)
when both AccessKeyId and AccessKey are non-empty, and otherwise call
os.Unsetenv for both "AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY" to prevent
partial credentials and remove any previously set env vars; locate the code
using the dataCredential.AccessKeyId / dataCredential.AccessKey symbols to apply
this change.
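The stricter variant the review asks for can be sketched in Python (illustrative only; the real code is Go in data.go, and the function name is hypothetical):

```python
import os


def apply_static_creds(access_key_id: str, access_key: str) -> None:
    # Set both key variables only when both are non-empty; otherwise unset
    # both, so partial credentials or stale values from a previous mount
    # never shadow the ambient credential chain.
    if access_key_id and access_key:
        os.environ["AWS_ACCESS_KEY_ID"] = access_key_id
        os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
    else:
        os.environ.pop("AWS_ACCESS_KEY_ID", None)
        os.environ.pop("AWS_SECRET_ACCESS_KEY", None)
```

Unlike the guard in the PR (which checks only AccessKeyId and leaves existing env vars in place), this version also clears stale values between mounts.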


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 65a77438-6ab8-4e85-878c-f83f22ed354a

📥 Commits

Reviewing files that changed from the base of the PR and between 2e44c97 and c0bfbf8.

📒 Files selected for processing (3)
  • src/runtime/pkg/data/data.go
  • src/utils/connectors/postgres.py
  • src/utils/job/task.py

- Revert get_all_data_creds return type to StaticDataCredential (it
  constructs StaticDataCredential explicitly, not DataCredential)
- Set AWS_REGION env var from dataCredential.Region when non-empty,
  so ambient credentials (IRSA/Pod Identity) can locate the correct
  S3 endpoint