Using ParallelRunConfig and DockerConfig not working #1979

Open
@VincentMcLoughlin

Hello, I am attempting to use ParallelRunStep to run a set of Python scripts that I currently run inside a Docker container. I am trying to deploy this as an AzureML pipeline that reads from the Docker image saved in our ACR. However, the `init()` and `run()` functions in my entry script never seem to be able to access the files I have moved into the Docker container. In particular, when I search for a Hydra config file from `init()` and `run()`, I cannot find it, even when I look for it explicitly. My code is based on the example from this repo: example of parallel run config

Any idea what I am doing wrong? The snippet below shows what I am trying:

```python
import os

from azureml.core import Environment, Experiment
from azureml.core.runconfig import DockerConfiguration, RunConfiguration
from azureml.pipeline.core import Pipeline, StepSequence
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

# ws, acr_name, docker_args, compute_target, my_inputs, output_dir and args
# are assumed to be defined elsewhere in the script.

docker_config = DockerConfiguration(use_docker=True, arguments=docker_args)

# Environment backed by a custom image pulled from our ACR
environment_name = "my-environment"
environment = Environment(environment_name)
base_image_name = os.getenv("ACR_BASE_IMAGE_NAME")
base_image_tag = os.getenv("ACR_IMAGE_TAG")
environment.docker.base_image = f"{base_image_name}:{base_image_tag}"
environment.docker.base_image_registry.address = f"{acr_name}.azurecr.io"
environment.docker.base_image_registry.username = os.getenv("ACR_USER")
environment.docker.base_image_registry.password = os.getenv("ACR_PASSWORD")
# Python dependencies are already installed in the image
environment.python.user_managed_dependencies = True
environment.docker.enabled = True

run_config = RunConfiguration()
run_config.environment = environment
run_config.docker = docker_config

parallel_run_config = ParallelRunConfig(
    source_directory=".",
    entry_script="path/to/docker/script_1.py",
    compute_target=compute_target,
    environment=environment,
    node_count=2,
    error_threshold=10,
    output_action="append_row",
    mini_batch_size=1,
    logging_level="DEBUG",
)
step_parallel = ParallelRunStep(
    name="parallel-step",
    parallel_run_config=parallel_run_config,
    inputs=my_inputs,
    output=output_dir,
    arguments=args,
    allow_reuse=True,
)

# Tried with and without this line; it does not seem to make a difference
step_parallel._runconfig.docker = docker_config

pipeline_steps = StepSequence(steps=[step_parallel])
pipeline_run = Pipeline(workspace=ws, steps=pipeline_steps)

# Submit your pipeline run
submitted_pipeline_run = Experiment(ws, "Azure Pipeline").submit(pipeline_run, regenerate_outputs=True)
```
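
A small diagnostic entry script may help narrow this down. The sketch below is not the actual `script_1.py`; it only follows the ParallelRunStep entry-script convention of defining `init()` and `run(mini_batch)`, and reports the working directory and whether the config file is visible at an absolute path. The path `/app/conf/config.yaml` and the `HYDRA_CONFIG_PATH` environment variable are hypothetical placeholders, as is the helper `describe_environment()`. One possibility worth checking with it is that the scripts run from the uploaded `source_directory` snapshot rather than from the image's `WORKDIR`, in which case relative paths would not point at files baked into the image.

```python
import os
from pathlib import Path

# Hypothetical absolute location of the Hydra config inside the image.
# Replace the default with the path your Dockerfile actually copies it to.
CONFIG_PATH = Path(os.getenv("HYDRA_CONFIG_PATH", "/app/conf/config.yaml"))


def describe_environment():
    """Collect facts about where the script runs and what it can see."""
    cwd = os.getcwd()
    return {
        "cwd": cwd,
        "cwd_entries": sorted(os.listdir(cwd)),
        "config_path": str(CONFIG_PATH),
        "config_exists": CONFIG_PATH.is_file(),
    }


def init():
    """Called once per worker process before any mini-batch is processed."""
    print(describe_environment())


def run(mini_batch):
    """Called per mini-batch; returns one result row per input item."""
    info = describe_environment()
    return [
        f"{item}: cwd={info['cwd']} config_exists={info['config_exists']}"
        for item in mini_batch
    ]
```

With `output_action="append_row"`, the strings returned by `run()` end up in the step output, so the result can be read there as well as in the worker logs.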
