feat: support DefaultDataCredential for workflow_log and workflow_data #751
KeitaW wants to merge 2 commits into NVIDIA:main
Conversation
Widen `LogConfig`, `DataConfig`, and related function signatures from `StaticDataCredential` to `DataCredential` (the union that includes `DefaultDataCredential`). This allows workflow log/data storage to use ambient credentials (IRSA, Pod Identity, instance metadata) instead of requiring static IAM access keys. It also guards the Go sidecar's env var override so empty static keys don't clobber the SDK's ambient credential chain.

Python changes (postgres.py, task.py): type annotation only; all downstream code already uses `DataCredential`-compatible interfaces (`to_decrypted_dict`, `storage.Client.create`). Go change (data.go): skip `os.Setenv` when `AccessKeyId` is empty.

Fixes NVIDIA#749
📝 Walkthrough

Broadened credential handling so workflow log/data configs accept the `DataCredential` union.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant PythonService
    participant Sidecar as Sidecar(Go)
    participant AWS_SDK

    Client->>PythonService: submit workflow config (DataCredential ⟶ endpoint/region or access keys)
    PythonService->>Sidecar: include credential payload in pod/task config
    Sidecar->>Sidecar: MountURL reads DataCredential
    alt AccessKeyId non-empty
        Sidecar->>Sidecar: set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION
    else AccessKeyId empty
        Sidecar->>Sidecar: do not set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
        Sidecar->>Sidecar: set AWS_REGION only if non-empty
    end
    Sidecar->>AWS_SDK: SDK uses env or ambient credential chain
    AWS_SDK->>AWS_SDK: authenticate using provided static keys or ambient/default chain
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/job/task.py (1)
2834-2839: ⚠️ Potential issue | 🟡 Minor

Type annotation inconsistency with the `data_endpoints` parameter.

The `fetch_creds` function still uses `StaticDataCredential` in its signature, but callers (lines 2437, 2449, 2458) now pass `data_endpoints` typed as `Dict[str, DataCredential]`. This creates a type mismatch that would be flagged by type checkers.

🔧 Proposed fix to align types

```diff
 def fetch_creds(
     user: str,
-    data_creds: dict[str, credentials.StaticDataCredential],
+    data_creds: dict[str, credentials.DataCredential],
     path: str,
     disabled_data: list[str] | None = None,
-) -> credentials.StaticDataCredential | None:
+) -> credentials.DataCredential | None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/job/task.py` around lines 2834 - 2839, The fetch_creds signature uses credentials.StaticDataCredential but callers pass DataCredential-typed objects; update the type annotation for the parameter data_creds (and the function return) to use the correct DataCredential type (e.g., dict[str, credentials.DataCredential] and -> credentials.DataCredential | None) and adjust any imports if needed so fetch_creds, its callers (where data_endpoints is passed), and return usage consistently reference credentials.DataCredential.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/runtime/pkg/data/data.go`:
- Around line 417-424: The current guard only checks dataCredential.AccessKeyId
before setting both AWS env vars, which can write an empty AWS_SECRET_ACCESS_KEY
and leave stale credentials between mounts; update the logic around
dataCredential.AccessKeyId and dataCredential.AccessKey so you only call
os.Setenv("AWS_ACCESS_KEY_ID", ...) and os.Setenv("AWS_SECRET_ACCESS_KEY", ...)
when both AccessKeyId and AccessKey are non-empty, and otherwise call
os.Unsetenv for both "AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY" to prevent
partial credentials and remove any previously set env vars; locate the code
using the dataCredential.AccessKeyId / dataCredential.AccessKey symbols to apply
this change.
---
Outside diff comments:
In `@src/utils/job/task.py`:
- Around line 2834-2839: The fetch_creds signature uses
credentials.StaticDataCredential but callers pass DataCredential-typed objects;
update the type annotation for the parameter data_creds (and the function
return) to use the correct DataCredential type (e.g., dict[str,
credentials.DataCredential] and -> credentials.DataCredential | None) and adjust
any imports if needed so fetch_creds, its callers (where data_endpoints is
passed), and return usage consistently reference credentials.DataCredential.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 65a77438-6ab8-4e85-878c-f83f22ed354a
📒 Files selected for processing (3)
- src/runtime/pkg/data/data.go
- src/utils/connectors/postgres.py
- src/utils/job/task.py
- Revert `get_all_data_creds` return type to `StaticDataCredential` (it constructs `StaticDataCredential` explicitly, not `DataCredential`)
- Set `AWS_REGION` env var from `dataCredential.Region` when non-empty, so ambient credentials (IRSA/Pod Identity) can locate the correct S3 endpoint
Summary
Widens `LogConfig`, `DataConfig`, and related function signatures from `StaticDataCredential` to `DataCredential` so workflow log/data storage can use ambient credentials (IRSA, Pod Identity, instance metadata) instead of requiring static IAM access keys.

This picks up where #508 (Azure workload identity) left off: the `DataCredential` union, S3 backend, and `storage.Client.create()` already support `DefaultDataCredential`, but the workflow config types were still locked to `StaticDataCredential`.

Changes
Python (type annotations only; postgres.py, task.py):
- `LogConfig.credential`: `StaticDataCredential | None` → `DataCredential | None`
- `DataConfig.credential`: `StaticDataCredential | None` → `DataCredential | None`
- `get_all_data_creds` return type: `Dict[str, StaticDataCredential]` → `Dict[str, DataCredential]`
- `create_config_dict` / `data_endpoints`: widened to `DataCredential`

Go (sidecar credential guard, data.go):
- Skip `os.Setenv("AWS_ACCESS_KEY_ID", ...)` when keys are empty, preventing ambient credential clobbering

No changes needed
- workflow_service.py: already delegates to `Client.create()` generically
- client.py: already accepts the `DataCredential` union
- s3.py: already has a `DefaultDataCredential` match arm
- credentials.py: `DataCredential` union already defined

Fixes Issue #749
Checklist
Test plan
curl -X PATCH /api/configs/workflow -d '{"workflow_log": {"credential": {"endpoint": "s3://bucket", "region": "us-west-2"}}}'