Description
In the `PipelineData` class, there is a `get_env_variable_name()` method that returns the name of the environment variable for this dataset (e.g. `"$AZUREML_DATAREFERENCE_my_pipelinedata"`). This is also the `__str__` implementation for `PipelineData`, so an instance can easily be used in string formatting to pass it as an argument to a pipeline step, even with a custom argument format (such as the one used by hydra.cc, as also mentioned in https://github.com/MicrosoftDocs/azure-docs/issues/66599):
```python
my_pipelinedata = PipelineData("my_pipelinedata", datastore=datastore, is_directory=True)

train_step = PythonScriptStep(
    script_name="train.py",
    arguments=[
        f"dataset.path={my_pipelinedata}"
    ],
    # ...
)
```
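For reference, the pattern `PipelineData` follows can be sketched in plain Python. This is a toy stand-in, not the actual azureml-sdk source; the `AZUREML_DATAREFERENCE_` prefix is assumed from the example value above:

```python
# Toy stand-in illustrating the PipelineData pattern: __str__ delegates to
# get_env_variable_name(), so instances drop straight into f-strings.
class ToyPipelineData:
    # Prefix assumed from the "$AZUREML_DATAREFERENCE_my_pipelinedata" example.
    ENV_PREFIX = "AZUREML_DATAREFERENCE_"

    def __init__(self, name):
        self.name = name

    def get_env_variable_name(self):
        return f"${self.ENV_PREFIX}{self.name}"

    # Reusing the method as __str__ is what makes string formatting "just work".
    __str__ = get_env_variable_name


data = ToyPipelineData("my_pipelinedata")
print(f"dataset.path={data}")  # prints: dataset.path=$AZUREML_DATAREFERENCE_my_pipelinedata
```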
Unfortunately, this is not the case if you want to consume a `Dataset`. The `DatasetConsumptionConfig` class provides neither a `get_env_variable_name()` method nor a custom `__str__()` implementation. So, to use it in string formatting for arguments, you have to construct the name of the environment variable manually, which is only a little extra code but inconsistent with how it works for `PipelineData`:
```python
def as_env_variable(dataset):
    return f"${dataset.name}"


my_dataset = (
    Dataset.get_by_name(workspace, name="my_dataset")
    .as_named_input("my_dataset")
    .as_mount()
)

train_step = PythonScriptStep(
    script_name="train.py",
    arguments=[
        f"dataset1.path={as_env_variable(my_dataset)}",
        f"dataset2.path={my_pipelinedata}"
    ],
    # ...
)
```
So adding a `__str__()` implementation to the `DatasetConsumptionConfig` class, or at least a `get_env_variable_name()` method, would make such code more consistent.
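Until such a method exists, one possible stopgap is to patch a `__str__` onto the class at pipeline-definition time. Sketched below on a stub class rather than the real `DatasetConsumptionConfig`, and assuming the environment variable is simply `$<input name>`, as in the `as_env_variable()` helper above:

```python
# Stub standing in for azureml's DatasetConsumptionConfig, which exposes the
# named input's name via a `name` attribute.
class StubConsumptionConfig:
    def __init__(self, name):
        self.name = name


# Monkey-patch: mirror PipelineData's behaviour by making str() return the
# environment-variable reference ("$<name>" is an assumption, see above).
StubConsumptionConfig.__str__ = lambda self: f"${self.name}"

cfg = StubConsumptionConfig("my_dataset")
print(f"dataset1.path={cfg}")  # prints: dataset1.path=$my_dataset
```

With the real class patched the same way, the `dataset1.path` and `dataset2.path` arguments in the example above could be formatted identically.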