Plugins that help to pass credentials for S3 and GCS to remote cluster workers #438

Open

Description

@dbalabka

I didn't find a simple way to pass credentials for services such as S3 and GCS to remote workers, even though both are widely used to store data frames.
In the scope of this ticket, I propose creating plugins that distribute the required keys to remote workers.

GCP credentials
The path to the GCP credentials file is stored in the GOOGLE_APPLICATION_CREDENTIALS environment variable. The plugin has to create the file on each remote worker and set the environment variable to the proper path.
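A minimal sketch of what such a plugin could look like, assuming the duck-typed Dask worker-plugin interface (any object with a `setup(worker)` method can be registered from the client); the class name and file name are hypothetical, not taken from the PR:

```python
import os

# Hypothetical sketch, not the actual PR implementation: read the key
# file on the client, ship it inside the pickled plugin, and recreate
# it on every worker.
class GCPCredentialsPlugin:
    def __init__(self, credentials_path=None):
        path = credentials_path or os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
        with open(path) as f:
            self._key_json = f.read()

    def setup(self, worker):
        # Write the key into the worker's scratch directory and point
        # GOOGLE_APPLICATION_CREDENTIALS at it so the Google SDKs find it.
        dest = os.path.join(worker.local_directory, "gcp-key.json")
        with open(dest, "w") as f:
            f.write(self._key_json)
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = dest
```

The client would register it once, e.g. with `client.register_plugin(GCPCredentialsPlugin())`, so that `setup` runs on every current and future worker.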

S3 credentials
As with GCP, we must upload the credential files and store them on each worker.
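One possible sketch for AWS (again with hypothetical names) is to snapshot the standard `AWS_*` environment variables on the client and re-export them on each worker, since boto3 and s3fs read them automatically:

```python
import os

# Hypothetical sketch, not the actual PR implementation: capture the
# client-side AWS credential environment variables at construction time
# and replay them on each worker during setup.
class S3CredentialsPlugin:
    _VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
             "AWS_SESSION_TOKEN", "AWS_DEFAULT_REGION")

    def __init__(self):
        self._creds = {k: os.environ[k] for k in self._VARS if k in os.environ}

    def setup(self, worker):
        # boto3 / s3fs on the worker pick these up with no extra config.
        os.environ.update(self._creds)
```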

PR: #439

Activity

jacobtomlinson (Member) commented on Oct 7, 2024

Usually you would create an IAM instance role and profile that can access S3, then configure workers to have this role via the iam_instance_profile keyword argument.

The GCP equivalent is to create a service account that can access GCS and configure that with the service_account kwarg.

This way you don't have to pass credentials around. Is there a reason why you aren't doing it this way?
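For reference, a sketch of that setup with dask-cloudprovider (the profile and service account names are placeholders, and this fragment needs real cloud credentials to actually run):

```python
from dask_cloudprovider.aws import EC2Cluster
from dask_cloudprovider.gcp import GCPCluster

# AWS: workers get S3 access through an instance profile, so no keys
# need to be shipped to them.
aws_cluster = EC2Cluster(
    iam_instance_profile={"Name": "my-dask-s3-profile"},  # placeholder
)

# GCP: workers run as a service account that can read/write GCS.
gcp_cluster = GCPCluster(
    service_account="dask-workers@my-project.iam.gserviceaccount.com",  # placeholder
)
```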

dbalabka (Contributor, Author) commented on Feb 19, 2025

@jacobtomlinson, sorry for not being active for the last few months because of workload and vacation. Thanks for the question.

You are correct that using a proper service account or IAM role/profile is more secure and preferable for production workloads. However, I have a few scenarios in which dynamically uploading the key is more convenient.

For local development, the recommended way on GCP is to use Application Default Credentials (ADC), acquired with the gcloud auth application-default login command. Previously, I provided changes and a detailed description in #429. ADC is associated with the developer's user account, which is preferable to reuse in workers. Otherwise, we have to create a separate service account key for each developer or automate the creation of such keys.

If Dask is deployed on on-prem Kubernetes, uploading the key during local development is the most convenient way to provide it to workers. Otherwise, we have to keep the keys in Kubernetes Secrets and mount them separately, an approach that is more suitable for production workloads.

jacobtomlinson (Member) commented on Feb 24, 2025

I see. So we could create a plugin for the client which grabs those credentials and propagates them to the workers. Do you have any interest in implementing such a plugin?
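For the ADC file itself, distributed's built-in `UploadFile` worker plugin already covers the file-copy half; a sketch (the scheduler address is a placeholder, the path is the typical Linux ADC location, and the env var would still have to be set on the workers separately):

```python
import os
from distributed import Client
from distributed.diagnostics.plugin import UploadFile

client = Client("tcp://scheduler:8786")  # placeholder address

# Copies the file into each worker's local_directory, for existing and
# future workers alike.
adc = os.path.expanduser("~/.config/gcloud/application_default_credentials.json")
client.register_plugin(UploadFile(adc))
```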

dbalabka (Contributor, Author) commented on Mar 1, 2025

@jacobtomlinson, right. I've submitted PR #439. It contains the source of two separate plugins, for AWS and GCP, that we are already using. Both provide a very convenient way to push the required credentials to workers: a developer simply adds the plugins, and no configuration is needed.

Metadata

Labels

provider/aws/ec2: Cluster provider for AWS EC2 Instances
provider/gcp/vm: Cluster provider for GCP Instances
question: Further information is requested


      Participants

      @dbalabka, @jacobtomlinson

        Plugins that help to pass credentials for S3 and GCS to remote cluster workers · Issue #438 · dask/dask-cloudprovider