
Use DefaultAzureCredential by default for Azure paths #497

Open
@daviewales


Some libraries, such as polars and pandas, have an almost seamless method for interacting with cloud storage paths.

e.g.:

import polars as pl
pl.scan_csv('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'}).collect()

This is nice, because I don't need to import any other libraries, set up credentials or blob clients, etc.
It automatically finds any available credentials in my local environment, presumably with something like DefaultAzureCredential.
This means that when testing locally, I just need to be authenticated with Azure CLI, and everything just works.
I don't even need to manually specify environment variables.
It also means that I can deploy the same code to the server, and it will automatically find the appropriate environment variables to authenticate as a service principal with AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, etc.
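As a rough illustration of the server-side case, the check below lists which service principal variables are still missing from an environment. It is only a sketch: `missing_service_principal_vars` is a hypothetical helper, and `AZURE_TENANT_ID` is not named above; it is included on the assumption that it is the third variable azure-identity's environment-based credential expects alongside the client ID and secret.

```python
import os

# Variables assumed to be needed for client-secret service principal login.
# AZURE_TENANT_ID is an addition here; the text above only names the other two.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]
```

An empty list suggests the environment-based login has what it needs; locally, where Azure CLI login is used instead, the list will usually contain all three names.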

I may have missed something, but it seems that cloudpathlib has not enabled this kind of automatic credential detection with DefaultAzureCredential. Instead, I need to do the following to get an authenticated working CloudPath:

from azure.identity import DefaultAzureCredential
from cloudpathlib import CloudPath, AzureBlobClient

credential = DefaultAzureCredential()
client = AzureBlobClient(account_url="https://mystorageaccount.blob.core.windows.net", credential=credential)

path = CloudPath('az://container/path/to/file.csv', client=client)

Ideally, this setup would happen automatically.
I'm imagining the following future state:

from cloudpathlib import CloudPath
path = CloudPath('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'})
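Until something like that exists, the manual setup can be wrapped in a small helper. This is only a sketch of a workaround, not a proposed cloudpathlib API: `blob_account_url` and `default_azure_path` are hypothetical names, and the URL pattern assumes the public Azure cloud endpoint used in the example above.

```python
def blob_account_url(account_name):
    """Build the blob endpoint URL for a storage account (public cloud assumed)."""
    return f"https://{account_name}.blob.core.windows.net"


def default_azure_path(uri, account_name):
    """Return a CloudPath for `uri`, authenticated with DefaultAzureCredential."""
    # Imported lazily so the URL helper works without the Azure packages installed.
    from azure.identity import DefaultAzureCredential
    from cloudpathlib import AzureBlobClient, CloudPath

    client = AzureBlobClient(
        account_url=blob_account_url(account_name),
        credential=DefaultAzureCredential(),
    )
    return CloudPath(uri, client=client)
```

With this in place, `default_azure_path('az://container/path/to/file.csv', 'mystorageaccount')` would stand in for the four lines of setup shown earlier.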

(There may be a nicer way to specify the account name. I'm just copying the API from polars and pandas here. I kind of wish that it was standard to include the account name in the path somehow, as passing the account name in separately feels clunky to me. It would be nice if we could use az://mystorageaccount/container/...)
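To make the account-in-path idea concrete, here is a sketch of how such a URI could be split. The `az://account/container/blob` layout is purely hypothetical (neither cloudpathlib nor adlfs is known to interpret paths this way), and `split_account_uri` is an illustrative name.

```python
from urllib.parse import urlsplit


def split_account_uri(uri):
    """Split a hypothetical az://account/container/blob URI into its three parts."""
    parts = urlsplit(uri)
    if parts.scheme != "az" or not parts.netloc:
        raise ValueError(f"expected az://account/container/..., got {uri!r}")
    # The first path segment is the container; the rest is the blob name.
    container, _, blob = parts.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"no container in {uri!r}")
    return parts.netloc, container, blob
```

The obvious drawback is ambiguity with today's `az://container/...` paths, which is probably why the account name is passed separately.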

See the documentation for DefaultAzureCredential. (There's a reason it's called Default!)

Note: If you are using fsspec + adlfs, adlfs requires the storage option anon=False to be set to enable DefaultAzureCredential.

For example, when using pandas, you must specify storage_options={'anon': False}.
When using fsspec directly, you need to pass it as follows:

fs = fsspec.filesystem('az', account_name='mystorageaccount', anon=False)

For more details, see:
https://github.com/fsspec/adlfs#setting-credentials
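For the pandas case, the options from the note above can be combined roughly as follows. This is a sketch that assumes pandas and adlfs are installed; `adlfs_storage_options` and `read_azure_csv` are hypothetical helper names.

```python
def adlfs_storage_options(account_name):
    """Storage options that ask adlfs to authenticate instead of connecting anonymously."""
    return {"account_name": account_name, "anon": False}


def read_azure_csv(uri, account_name):
    """Read a CSV from an az:// path via pandas (assumes pandas and adlfs are installed)."""
    import pandas as pd  # imported lazily; only needed when actually reading

    return pd.read_csv(uri, storage_options=adlfs_storage_options(account_name))
```

For example, `read_azure_csv('az://container/path/to/file.csv', 'mystorageaccount')` should pick up whatever credentials DefaultAzureCredential can find.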
