Hello, I am attempting to use ParallelRunStep for a set of Python scripts that I currently run inside a Docker container. I am trying to deploy this with an AzureML pipeline that pulls the Docker image stored in our ACR. However, the init and run functions in my entry script never seem to be able to access the files I copied into the Docker image. In particular, when I search for a Hydra config file from init or run, I cannot find it, even when I look for it by its explicit path.
Any idea what I am doing wrong? The snippet below shows what I am trying; it is based on the example code from this repo: example of parallel run config
```python
import os

from azureml.core import Environment, Experiment
from azureml.core.runconfig import DockerConfiguration, RunConfiguration
from azureml.pipeline.core import Pipeline, StepSequence
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

# docker_args, acr_name, compute_target, my_inputs, output_dir, args and ws
# are defined elsewhere and are not shown in this snippet.
docker_config = DockerConfiguration(use_docker=True, arguments=docker_args)

# Environment based on a custom image pulled from our ACR
environment_name = "my-environment"
environment = Environment(environment_name)
base_image_name = os.getenv("ACR_BASE_IMAGE_NAME")
base_image_tag = os.getenv("ACR_IMAGE_TAG")
environment.docker.base_image = f"{base_image_name}:{base_image_tag}"
environment.docker.base_image_registry.address = f"{acr_name}.azurecr.io"
environment.docker.base_image_registry.username = os.getenv("ACR_USER")
environment.docker.base_image_registry.password = os.getenv("ACR_PASSWORD")
environment.python.user_managed_dependencies = True
environment.docker.enabled = True

run_config = RunConfiguration()
run_config.environment = environment
run_config.docker = docker_config

parallel_run_config = ParallelRunConfig(
    source_directory=".",
    entry_script="path/to/docker/script_1.py",
    compute_target=compute_target,
    environment=environment,
    node_count=2,
    error_threshold=10,
    output_action="append_row",
    mini_batch_size=1,
    logging_level="DEBUG",
)

step_parallel = ParallelRunStep(
    name="parallel-step",
    parallel_run_config=parallel_run_config,
    inputs=my_inputs,
    output=output_dir,
    arguments=args,
    allow_reuse=True,
)
step_parallel._runconfig.docker = docker_config  # Tried with and without, does not seem to make a difference

pipeline_steps = StepSequence(steps=[step_parallel])
pipeline_run = Pipeline(workspace=ws, steps=pipeline_steps)

# Submit your pipeline run
submitted_pipeline_run = Experiment(ws, "Azure Pipeline").submit(pipeline_run, regenerate_outputs=True)
```
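To narrow down where the files end up, a diagnostic entry script along these lines might help. It is only a sketch: the file name `config.yaml` and the root `/app` are hypothetical placeholders for wherever the Hydra config was copied in the image, `find_file` is a helper made up for illustration, and `run` assumes a file dataset so that `mini_batch` is a list of paths. The `init()` and `run(mini_batch)` function names are the ones ParallelRunStep expects in an entry script.

```python
import os

# Hypothetical values: adjust to the file name and image path actually used.
CONFIG_NAME = "config.yaml"
SEARCH_ROOTS = ["/app", os.getcwd()]


def find_file(name, roots):
    """Return every path under the given roots whose file name equals `name`."""
    matches = []
    for root in roots:
        if not os.path.isdir(root):
            continue  # skip roots that do not exist in this container
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                matches.append(os.path.join(dirpath, name))
    return matches


def init():
    # Called once per worker process: log where the script actually runs.
    print("cwd:", os.getcwd())
    print("script dir:", os.path.dirname(os.path.abspath(__file__)))
    print("config matches:", find_file(CONFIG_NAME, SEARCH_ROOTS))


def run(mini_batch):
    # Assumes a file dataset, so mini_batch is a list of file paths.
    # Returns one result entry per input item.
    return [f"processed {item}" for item in mini_batch]
```

With `logging_level='DEBUG'` already set, the printed working directory and matches should show up in the step's worker logs. If the working directory differs from the image's WORKDIR, relative paths to the config may need to be replaced with absolute ones.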