AML fastai example "custom docker image" does not work #1482

Open
@hamelsmu

@keijik @cody-dkdc

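Running the fastai "custom docker image" example does not work. The run fails during job environment preparation because every attempt to pull fastdotai/fastai:latest from Docker Hub (registry-1.docker.io) times out.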

Things I tried:

  • Switching to CPU compute and reducing the script to a single print("hello world"). Same error.
  • Using DockerConfiguration, because I get a warning that docker.enabled = True is deprecated. However, there is no real documentation for it, it is unclear how to use it, and many of your examples don't use it.
    • Can the warning be clarified? When will docker.enabled = True actually stop working? As of right now it still works.
    • Can you provide an end-to-end example of the right way to use DockerConfiguration?
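A minimal sketch of what an end-to-end DockerConfiguration setup might look like with the Azure ML Python SDK v1 (azureml-core). It assumes a workspace config.json, an existing compute cluster, and a folder containing the training script; the function name submit_fastai_job, the cluster name, the folder and the script name are hypothetical placeholders. The image is the fastdotai/fastai:latest image the failing run tried to pull. The idea is that Docker use moves off the environment (env.docker.enabled = True) and onto the run, via DockerConfiguration(use_docker=True) passed as docker_runtime_config. This only changes how the run is configured: if the cluster cannot reach Docker Hub, the pull would presumably still time out as in the logs.

```python
def submit_fastai_job(
    cluster_name: str = "cpu-cluster",         # hypothetical cluster name
    source_directory: str = "fastai-example",  # hypothetical folder holding the script
    script: str = "train.py",                  # hypothetical entry script
    image: str = "fastdotai/fastai:latest",    # image the failing run tried to pull
    experiment_name: str = "fastai-custom-image",
):
    """Run `script` inside a custom Docker image using DockerConfiguration.

    Sketch only: needs azureml-core (SDK v1) and a workspace config.json.
    """
    # SDK imports live inside the function so the definition itself does not
    # require azureml-core to be installed.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
    from azureml.core.compute import ComputeTarget
    from azureml.core.runconfig import DockerConfiguration

    ws = Workspace.from_config()  # reads the workspace details from config.json
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)

    env = Environment("fastai")
    env.docker.base_image = image
    # The image already ships fastai and its dependencies, so ask AML not to
    # build a conda environment on top of it.
    env.python.user_managed_dependencies = True

    # Replaces the deprecated env.docker.enabled = True: Docker is now a
    # run-time setting supplied through the ScriptRunConfig.
    docker_config = DockerConfiguration(use_docker=True)

    src = ScriptRunConfig(
        source_directory=source_directory,
        script=script,
        compute_target=compute_target,
        environment=env,
        docker_runtime_config=docker_config,
    )
    run = Experiment(workspace=ws, name=experiment_name).submit(src)
    run.wait_for_completion(show_output=True)  # streams logs until the run ends
    return run
```

Calling submit_fastai_job(cluster_name="my-cluster") with a real cluster name would submit the run and stream its output to the notebook.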

Requests

  • Can you please fix this example? It is one of the only examples of how to use a custom Docker container.
  • If this notebook cannot be fixed, can you replace it? Otherwise it may cause a lot of confusion for fastai students and for people trying to do this in the wild. cc: @jph00
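Every pull attempt in the logs fails with "Client.Timeout exceeded while awaiting headers" against https://registry-1.docker.io/v2/, even after the agent restarts the Docker daemon, which hints at a network connectivity problem between the node and Docker Hub (for example outbound traffic blocked by a firewall, network security group, proxy or DNS setting; this is a guess the logs do not confirm). A minimal standard-library sketch for checking reachability follows; the function name probe_registry is hypothetical, and running it from a machine on the same network as the cluster (such as a compute instance in the same virtual network) is an assumption about the workspace setup. Any HTTP status, including the 401 that Docker Hub returns to anonymous requests, means the network path is open.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Endpoint from the failing pull in the AML logs; pass another URL for other registries.
DOCKER_HUB_V2 = "https://registry-1.docker.io/v2/"


def probe_registry(url: str = DOCKER_HUB_V2, timeout: float = 15.0) -> Tuple[bool, str]:
    """Return (reachable, detail) for a container registry's /v2/ endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # Must be caught before OSError (HTTPError subclasses URLError, which
        # subclasses OSError). The server answered with an error status, e.g.
        # 401 for anonymous Docker Hub requests, so the network path works.
        return True, f"HTTP {err.code}"
    except OSError as err:
        # URLError (DNS failure, refused connection, TLS problems) and socket
        # timeouts while awaiting response headers land here: not reachable.
        return False, f"{type(err).__name__}: {err}"


if __name__ == "__main__":
    ok, detail = probe_registry()
    print("reachable" if ok else "NOT reachable", "-", detail)
```

A timeout here, mirroring the log, would point at the network rather than at the notebook or the DockerConfiguration setup.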

I've included the logs from my attempted AML Run from this notebook below 👇🏽

2021-05-22T15:06:36Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
2021-05-22T15:06:36Z Starting output-watcher...
2021-05-22T15:06:37Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2021-05-22T15:06:37Z Executing 'Copy ACR Details file' on 10.8.96.89
2021-05-22T15:06:37Z Copy ACR Details file succeeded on 10.8.96.89. Output: 
>>>   
>>>   
2021-05-22T15:06:52Z Running Docker Command attempt 1 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:06:57Z Force Restart Docker Service
2021-05-22T15:06:57Z 
2021-05-22T15:06:57Z Waiting for docker daemon to come up.
2021-05-22T15:06:57Z Docker daemon is active
2021-05-22T15:06:57Z Retry Docker Command...
2021-05-22T15:07:12Z Running Docker Command attempt 2 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:07:21Z Force Restart Docker Service
2021-05-22T15:07:21Z 
2021-05-22T15:07:22Z Waiting for docker daemon to come up.
2021-05-22T15:07:22Z Docker daemon is active
2021-05-22T15:07:22Z Retry Docker Command...
2021-05-22T15:07:37Z Running Docker Command attempt 3 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:07:53Z Force Restart Docker Service
2021-05-22T15:07:53Z 
2021-05-22T15:07:53Z Waiting for docker daemon to come up.
2021-05-22T15:07:53Z Docker daemon is active
2021-05-22T15:07:53Z Retry Docker Command...
2021-05-22T15:08:09Z Running Docker Command attempt 4 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:08:41Z Force Restart Docker Service
2021-05-22T15:08:42Z 
2021-05-22T15:08:42Z Waiting for docker daemon to come up.
2021-05-22T15:08:42Z Docker daemon is active
2021-05-22T15:08:42Z Retry Docker Command...
2021-05-22T15:08:57Z Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:10:02Z Force Restart Docker Service
2021-05-22T15:10:02Z 
2021-05-22T15:10:02Z Waiting for docker daemon to come up.
2021-05-22T15:10:02Z Docker daemon is active
2021-05-22T15:10:02Z Retry Docker Command...
2021-05-22T15:10:17Z Running Docker Command attempt 6 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:12:26Z Force Restart Docker Service
2021-05-22T15:12:26Z 
2021-05-22T15:12:26Z Waiting for docker daemon to come up.
2021-05-22T15:12:26Z Docker daemon is active
2021-05-22T15:12:26Z Retry Docker Command...
2021-05-22T15:12:41Z Running Docker Command attempt 7 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:13:11Z Job environment preparation failed on 10.8.96.89. Output: 
>>>   2021/05/22 15:06:35 Starting App Insight Logger for task:  prepareJobEnvironment
>>>   2021/05/22 15:06:35 Version: 3.0.01597.0004 Branch: 2021-05-17-bing-hotfix Commit: 974f3e4
>>>   2021/05/22 15:06:35 runtime.GOOS linux
>>>   2021/05/22 15:06:35 Checking if '/tmp' exists
>>>   2021/05/22 15:06:35 Reading dyanamic configs
>>>   2021/05/22 15:06:35 Container sas url: https://baiscriptseastusprod.blob.core.windows.net/aihosttools?sv=2018-03-28&sr=c&si=aihosttoolspolicy&sig=gCpFfTbL8hPl%2BzV43hBdfOZC4SuKqZoJraIo10S4%2FYw%3D
>>>   2021/05/22 15:06:35 Failed to read from file /mnt/batch/tasks/startup/wd/az_resource/xdsenv.variable/azsecpack.variables, open /mnt/batch/tasks/startup/wd/az_resource/xdsenv.variable/azsecpack.variables: no such file or directory
>>>   2021/05/22 15:06:35 [in autoUpgradeFromJobNodeSetup] Is Azsecpack installer on host: false. Is Azsecpack enabled: false,
>>>   2021/05/22 15:06:35 Starting Azsecpack installation on machine: bf9f722d45714167beb968edcea13f1600000E#398a6654-997b-47e9-b12b-9515b896b4de#91095667-e119-4555-acea-1826488492f0#ds-tengri-resources-eastus#dds-ml-east#dds-ml
>>>   2021/05/22 15:06:35 Is Azsecpack enabled: false, GetDisableVsatlsscan: true
>>>   2021/05/22 15:06:35 Turning off azsecpack, if it is already running
>>>   2021/05/22 15:06:35 [doTurnOffAzsecpack] output:Unit mdsd.service could not be found.
>>>   ,err:exit status 1.
>>>   2021/05/22 15:06:35 OS patching disabled by dynamic configs. Skipping.
>>>   2021/05/22 15:06:35 Job: AZ_BATCHAI_JOB_NAME does not turn on the DetonationChamber
>>>   2021/05/22 15:06:35 Start to getting gpu count by running nvidia-smi command
>>>   2021/05/22 15:06:35 GPU count found on the node: 0
>>>   2021/05/22 15:06:35 AMLComputeXDSEndpoint:  https://6e64c585-4845-4356-b1e0-a28ca62f252a.workspace.eastus.cert.api.azureml.ms/xdsbatchai
>>>   2021/05/22 15:06:35 AMLComputeXDSApiVersion:  2018-02-01
>>>   2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config
>>>   2021/05/22 15:06:35 This is not a aml-workstation (compute instance), current offer type: amlcompute. Starting identity responder as part of prepareJobEnvironment.
>>>   2021/05/22 15:06:35 Starting identity responder.
>>>   2021/05/22 15:06:35 Starting identity responder.
>>>   2021/05/22 15:06:35 Failed to open file /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.batchai.IdentityResponder.envlist: open /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.batchai.IdentityResponder.envlist: no such file or directory
>>>   2021/05/22 15:06:35 Logfile used for identity responder: /mnt/batch/tasks/workitems/051f9434-a110-4ced-be03-f37876075345/job-1/fastai-custom-image__f2308802-f3fd-4f69-9710-581502704959/IdentityResponderLog-tvmps_63c0616b393d93f50f271aee1053d8f6130f081c9a609118eb8f1295575dc40c_d.txt
>>>   2021/05/22 15:06:35 Logfile used for identity responder: /mnt/batch/tasks/workitems/051f9434-a110-4ced-be03-f37876075345/job-1/fastai-custom-image__f2308802-f3fd-4f69-9710-581502704959/IdentityResponderLog-tvmps_63c0616b393d93f50f271aee1053d8f6130f081c9a609118eb8f1295575dc40c_d.txt
>>>   2021/05/22 15:06:35 Started Identity Responder for job.
>>>   2021/05/22 15:06:35 Started Identity Responder for job.
>>>   2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/wd
>>>   2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/shared
>>>   2021/05/22 15:06:35 From the policy service, the filtering patterns is: , data store is 
>>>   2021/05/22 15:06:35 Mounting job level file systems
>>>   2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts
>>>   2021/05/22 15:06:35 Attempting to read datastore credentials file: /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.amlcompute.datastorecredentials
>>>   2021/05/22 15:06:35 Datastore credentials file not found, skipping.
>>>   2021/05/22 15:06:35 Attempting to read runtime sas tokens file: /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.master.runtimesastokens
>>>   2021/05/22 15:06:35 Runtime sas tokens file not found, skipping.
>>>   2021/05/22 15:06:35 No NFS configured
>>>   2021/05/22 15:06:35 No Azure File Shares configured
>>>   2021/05/22 15:06:35 Mounting blob file systems
>>>   2021/05/22 15:06:35 Blobfuse runtime version 1.3.6
>>>   2021/05/22 15:06:35 Mounting azureml-blobstore-6e64c585-4845-4356-b1e0-a28ca62f252a container from ddsmleast9411768689 account at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>>   2021/05/22 15:06:35 Using Compute Identity to authenticate Blobfuse: false.
>>>   2021/05/22 15:06:35 Using Compute Identity to authenticate Blobfuse: false.
>>>   2021/05/22 15:06:35 Blobfuse cache size set to 11257 MB.
>>>   2021/05/22 15:06:35 Running following command: /bin/bash -c sudo blobfuse /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore --tmp-path=/mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/caches/workspaceblobstore --file-cache-timeout-in-seconds=1000000 --cache-size-mb=11257 -o nonempty -o allow_other --config-file=/mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/configs/workspaceblobstore.cfg --log-level=LOG_WARNING
>>>   2021/05/22 15:06:35 Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>>   2021/05/22 15:06:36 Waiting for blobfs to be mounted at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>>   2021/05/22 15:06:36 Successfully mounted azureml-blobstore-6e64c585-4845-4356-b1e0-a28ca62f252a container from ddsmleast9411768689 account at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>>   2021/05/22 15:06:36 Created run_id directory: /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e
>>>   2021/05/22 15:06:36 No unmanaged file systems configured
>>>   2021/05/22 15:06:36 Start to getting gpu count by running nvidia-smi command
>>>   2021/05/22 15:06:36 From the policy service, the filtering patterns is: , data store is 
>>>   2021/05/22 15:06:36 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e/azureml_compute_logs
>>>   2021/05/22 15:06:36 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e/logs
>>>   2021/05/22 15:06:36 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e/outputs
>>>   2021/05/22 15:06:36 Starting output-watcher...
>>>   2021/05/22 15:06:36 Single file input dataset is enabled.
>>>   2021/05/22 15:06:36 Start to pulling docker image: fastdotai/fastai:latest
>>>   2021/05/22 15:06:36 Start pull docker image: fastdotai
>>>   2021/05/22 15:06:36 Getting credentials for image fastdotai/fastai:latest with url 
>>>   2021/05/22 15:06:36 Container registry is not ACR.
>>>   2021/05/22 15:06:36 Skip getting ACR Credentials from Identity and will be getting it from EMS
>>>   2021/05/22 15:06:36 Getting ACR Credentials from EMS for environment fastai:Autosave_2021-05-22T15:05:19Z_b9284463
>>>   2021/05/22 15:06:36 Requesting XDS for registry details.
>>>   2021/05/22 15:06:36 Attempt 1 of http call to https://6e64c585-4845-4356-b1e0-a28ca62f252a.workspace.eastus.cert.api.azureml.ms/xdsbatchai/hosttoolapi/subscriptions/91095667-e119-4555-acea-1826488492f0/resourceGroups/ds-tengri-resources-eastus/workspaces/dds-ml-east/clusters/dds-ml/nodes/tvmps_63c0616b393d93f50f271aee1053d8f6130f081c9a609118eb8f1295575dc40c_d?api-version=2018-02-01
>>>   2021/05/22 15:06:37 Got container registry details from credentials service for registry address: .
>>>   2021/05/22 15:06:37 Writing ACR Details to file...
>>>   2021/05/22 15:06:37 Copying ACR Details file to worker nodes...
>>>   2021/05/22 15:06:37 Executing 'Copy ACR Details file' on 10.8.96.89
>>>   2021/05/22 15:06:37 Begin executing 'Copy ACR Details file' task on Node
>>>   2021/05/22 15:06:37 'Copy ACR Details file' task Node result: succeeded
>>>   2021/05/22 15:06:37 Copy ACR Details file succeeded on 10.8.96.89. Output: 
>>>   >>>   
>>>   >>>   
>>>   2021/05/22 15:06:37 EncryptedDockerRegistryPassword is empty.
>>>   2021/05/22 15:06:37 EMS returned empty credentials for environment fastai
>>>   2021/05/22 15:06:37 Save docker credentials for image fastdotai/fastai:latest in /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/wd/docker_login_6FE00B6271AD80D6
>>>   2021/05/22 15:06:37 The login info is empty, skipping login to the docker registry.
>>>   2021/05/22 15:06:37 Start run pull docker image command
>>>   2021/05/22 15:06:40 Not exporting to RunHistory as the exporter is either stopped or there is no data.
>>>   Stopped: false
>>>   OriginalData: 18
>>>   FilteredData: 0.
>>>   2021/05/22 15:06:52 Running Docker Command attempt 1 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:06:52 Running Docker Command attempt 1 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:06:57 Force Restart Docker Service
>>>   2021/05/22 15:06:57 Force Restart Docker Service
>>>   2021/05/22 15:06:57 
>>>   2021/05/22 15:06:57 
>>>   2021/05/22 15:06:57 Last 20 lines of Docker daemon log file, fetched after force restart:
>>>    time="2021-05-22T15:06:57.302348600Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:06:57.302735600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:06:57.303901500Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>>   time="2021-05-22T15:06:57.303923800Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>>   time="2021-05-22T15:06:57.303956200Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:06:57.303966600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:06:57.309015500Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>>   time="2021-05-22T15:06:57.311528200Z" level=warning msg="Your kernel does not support swap memory limit"
>>>   time="2021-05-22T15:06:57.311564200Z" level=warning msg="Your kernel does not support cgroup rt period"
>>>   time="2021-05-22T15:06:57.311571500Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>>   time="2021-05-22T15:06:57.311577100Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>>   time="2021-05-22T15:06:57.311582400Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>>   time="2021-05-22T15:06:57.311699200Z" level=info msg="Loading containers: start."
>>>   time="2021-05-22T15:06:57.393926400Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>>   time="2021-05-22T15:06:57.424473200Z" level=info msg="Loading containers: done."
>>>   time="2021-05-22T15:06:57.439214700Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>>   time="2021-05-22T15:06:57.440205600Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>>   time="2021-05-22T15:06:57.440454200Z" level=info msg="Daemon has completed initialization"
>>>   time="2021-05-22T15:06:57.463106300Z" level=info msg="API listen on /var/run/docker.sock"
>>>   Started Docker Application Container Engine.
>>>   
>>>   2021/05/22 15:06:57 Finished restarting docker service if needed
>>>   2021/05/22 15:06:57 Waiting for docker daemon to come up.
>>>   2021/05/22 15:06:57 Waiting for docker daemon to come up.
>>>   2021/05/22 15:06:57 Docker daemon is active
>>>   2021/05/22 15:06:57 Docker daemon is active
>>>   2021/05/22 15:06:57 Retry Docker Command...
>>>   2021/05/22 15:06:57 Retry Docker Command...
>>>   2021/05/22 15:07:12 Running Docker Command attempt 2 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:07:12 Running Docker Command attempt 2 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:07:21 Force Restart Docker Service
>>>   2021/05/22 15:07:21 Force Restart Docker Service
>>>   2021/05/22 15:07:21 
>>>   2021/05/22 15:07:21 
>>>   2021/05/22 15:07:21 Last 20 lines of Docker daemon log file, fetched after force restart:
>>>    time="2021-05-22T15:07:21.719788000Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:07:21.719798300Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:07:21.721117500Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>>   time="2021-05-22T15:07:21.721284300Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>>   time="2021-05-22T15:07:21.721449400Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:07:21.721592000Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:07:21.729531400Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>>   time="2021-05-22T15:07:21.731169400Z" level=warning msg="Your kernel does not support swap memory limit"
>>>   time="2021-05-22T15:07:21.731189800Z" level=warning msg="Your kernel does not support cgroup rt period"
>>>   time="2021-05-22T15:07:21.731196600Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>>   time="2021-05-22T15:07:21.731202200Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>>   time="2021-05-22T15:07:21.731207600Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>>   time="2021-05-22T15:07:21.731574600Z" level=info msg="Loading containers: start."
>>>   time="2021-05-22T15:07:21.820365400Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>>   time="2021-05-22T15:07:21.858103900Z" level=info msg="Loading containers: done."
>>>   time="2021-05-22T15:07:21.874279400Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>>   time="2021-05-22T15:07:21.874631700Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>>   time="2021-05-22T15:07:21.874679200Z" level=info msg="Daemon has completed initialization"
>>>   time="2021-05-22T15:07:21.887387900Z" level=info msg="API listen on /var/run/docker.sock"
>>>   Started Docker Application Container Engine.
>>>   
>>>   2021/05/22 15:07:22 Finished restarting docker service if needed
>>>   2021/05/22 15:07:22 Waiting for docker daemon to come up.
>>>   2021/05/22 15:07:22 Waiting for docker daemon to come up.
>>>   2021/05/22 15:07:22 Docker daemon is active
>>>   2021/05/22 15:07:22 Docker daemon is active
>>>   2021/05/22 15:07:22 Retry Docker Command...
>>>   2021/05/22 15:07:22 Retry Docker Command...
>>>   2021/05/22 15:07:37 Running Docker Command attempt 3 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:07:37 Running Docker Command attempt 3 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:07:53 Force Restart Docker Service
>>>   2021/05/22 15:07:53 Force Restart Docker Service
>>>   2021/05/22 15:07:53 
>>>   2021/05/22 15:07:53 
>>>   2021/05/22 15:07:53 Last 20 lines of Docker daemon log file, fetched after force restart:
>>>    time="2021-05-22T15:07:53.712328900Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:07:53.712338500Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:07:53.714142100Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>>   time="2021-05-22T15:07:53.714241000Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>>   time="2021-05-22T15:07:53.714448200Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:07:53.714533300Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:07:53.722682800Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>>   time="2021-05-22T15:07:53.723940700Z" level=warning msg="Your kernel does not support swap memory limit"
>>>   time="2021-05-22T15:07:53.723958800Z" level=warning msg="Your kernel does not support cgroup rt period"
>>>   time="2021-05-22T15:07:53.723966800Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>>   time="2021-05-22T15:07:53.723972800Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>>   time="2021-05-22T15:07:53.723978700Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>>   time="2021-05-22T15:07:53.724092000Z" level=info msg="Loading containers: start."
>>>   time="2021-05-22T15:07:53.805591800Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>>   time="2021-05-22T15:07:53.843376900Z" level=info msg="Loading containers: done."
>>>   time="2021-05-22T15:07:53.856206300Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>>   time="2021-05-22T15:07:53.856574000Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>>   time="2021-05-22T15:07:53.856712700Z" level=info msg="Daemon has completed initialization"
>>>   Started Docker Application Container Engine.
>>>   time="2021-05-22T15:07:53.871094800Z" level=info msg="API listen on /var/run/docker.sock"
>>>   
>>>   2021/05/22 15:07:53 Finished restarting docker service if needed
>>>   2021/05/22 15:07:53 Waiting for docker daemon to come up.
>>>   2021/05/22 15:07:53 Waiting for docker daemon to come up.
>>>   2021/05/22 15:07:53 Docker daemon is active
>>>   2021/05/22 15:07:53 Docker daemon is active
>>>   2021/05/22 15:07:53 Retry Docker Command...
>>>   2021/05/22 15:07:53 Retry Docker Command...
>>>   2021/05/22 15:08:09 Running Docker Command attempt 4 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:08:09 Running Docker Command attempt 4 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:08:41 Force Restart Docker Service
>>>   2021/05/22 15:08:41 Force Restart Docker Service
>>>   2021/05/22 15:08:42 
>>>   2021/05/22 15:08:42 
>>>   2021/05/22 15:08:42 Last 20 lines of Docker daemon log file, fetched after force restart:
>>>    time="2021-05-22T15:08:42.087658100Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:08:42.087667500Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:08:42.089030500Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>>   time="2021-05-22T15:08:42.089058200Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>>   time="2021-05-22T15:08:42.089076100Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:08:42.089089600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:08:42.098016300Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>>   time="2021-05-22T15:08:42.099754700Z" level=warning msg="Your kernel does not support swap memory limit"
>>>   time="2021-05-22T15:08:42.099778000Z" level=warning msg="Your kernel does not support cgroup rt period"
>>>   time="2021-05-22T15:08:42.099785600Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>>   time="2021-05-22T15:08:42.099791900Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>>   time="2021-05-22T15:08:42.099798100Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>>   time="2021-05-22T15:08:42.099938600Z" level=info msg="Loading containers: start."
>>>   time="2021-05-22T15:08:42.180795500Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>>   time="2021-05-22T15:08:42.212181600Z" level=info msg="Loading containers: done."
>>>   time="2021-05-22T15:08:42.228966500Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>>   time="2021-05-22T15:08:42.229346600Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>>   time="2021-05-22T15:08:42.229412500Z" level=info msg="Daemon has completed initialization"
>>>   time="2021-05-22T15:08:42.242934500Z" level=info msg="API listen on /var/run/docker.sock"
>>>   Started Docker Application Container Engine.
>>>   
>>>   2021/05/22 15:08:42 Finished restarting docker service if needed
>>>   2021/05/22 15:08:42 Waiting for docker daemon to come up.
>>>   2021/05/22 15:08:42 Waiting for docker daemon to come up.
>>>   2021/05/22 15:08:42 Docker daemon is active
>>>   2021/05/22 15:08:42 Docker daemon is active
>>>   2021/05/22 15:08:42 Retry Docker Command...
>>>   2021/05/22 15:08:42 Retry Docker Command...
>>>   2021/05/22 15:08:57 Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:08:57 Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:10:02 Force Restart Docker Service
>>>   2021/05/22 15:10:02 Force Restart Docker Service
>>>   2021/05/22 15:10:02 
>>>   2021/05/22 15:10:02 
>>>   2021/05/22 15:10:02 Last 20 lines of Docker daemon log file, fetched after force restart:
>>>    time="2021-05-22T15:10:02.554674900Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:10:02.554683700Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:10:02.555814300Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>>   time="2021-05-22T15:10:02.556471100Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>>   time="2021-05-22T15:10:02.556488600Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:10:02.556501300Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:10:02.564340600Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>>   time="2021-05-22T15:10:02.565848300Z" level=warning msg="Your kernel does not support swap memory limit"
>>>   time="2021-05-22T15:10:02.565884700Z" level=warning msg="Your kernel does not support cgroup rt period"
>>>   time="2021-05-22T15:10:02.565891600Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>>   time="2021-05-22T15:10:02.565896900Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>>   time="2021-05-22T15:10:02.565903600Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>>   time="2021-05-22T15:10:02.566121800Z" level=info msg="Loading containers: start."
>>>   time="2021-05-22T15:10:02.651460900Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>>   time="2021-05-22T15:10:02.682343000Z" level=info msg="Loading containers: done."
>>>   time="2021-05-22T15:10:02.698387100Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>>   time="2021-05-22T15:10:02.698780400Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>>   time="2021-05-22T15:10:02.698851500Z" level=info msg="Daemon has completed initialization"
>>>   time="2021-05-22T15:10:02.715795400Z" level=info msg="API listen on /var/run/docker.sock"
>>>   Started Docker Application Container Engine.
>>>   
>>>   2021/05/22 15:10:02 Finished restarting docker service if needed
>>>   2021/05/22 15:10:02 Waiting for docker daemon to come up.
>>>   2021/05/22 15:10:02 Waiting for docker daemon to come up.
>>>   2021/05/22 15:10:02 Docker daemon is active
>>>   2021/05/22 15:10:02 Docker daemon is active
>>>   2021/05/22 15:10:02 Retry Docker Command...
>>>   2021/05/22 15:10:02 Retry Docker Command...
>>>   2021/05/22 15:10:17 Running Docker Command attempt 6 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:10:17 Running Docker Command attempt 6 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:12:26 Force Restart Docker Service
>>>   2021/05/22 15:12:26 Force Restart Docker Service
>>>   2021/05/22 15:12:26 
>>>   2021/05/22 15:12:26 
>>>   2021/05/22 15:12:26 Last 20 lines of Docker daemon log file, fetched after force restart:
>>>    time="2021-05-22T15:12:26.277283800Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:12:26.277294100Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:12:26.279050800Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>>   time="2021-05-22T15:12:26.279075400Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>>   time="2021-05-22T15:12:26.279089200Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
>>>   time="2021-05-22T15:12:26.279097600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>>   time="2021-05-22T15:12:26.283940400Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>>   time="2021-05-22T15:12:26.285551700Z" level=warning msg="Your kernel does not support swap memory limit"
>>>   time="2021-05-22T15:12:26.285572600Z" level=warning msg="Your kernel does not support cgroup rt period"
>>>   time="2021-05-22T15:12:26.285579900Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>>   time="2021-05-22T15:12:26.285585600Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>>   time="2021-05-22T15:12:26.285591500Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>>   time="2021-05-22T15:12:26.285715100Z" level=info msg="Loading containers: start."
>>>   time="2021-05-22T15:12:26.364858000Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>>   time="2021-05-22T15:12:26.396268400Z" level=info msg="Loading containers: done."
>>>   time="2021-05-22T15:12:26.412820700Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>>   time="2021-05-22T15:12:26.413208200Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>>   time="2021-05-22T15:12:26.413260000Z" level=info msg="Daemon has completed initialization"
>>>   time="2021-05-22T15:12:26.432611400Z" level=info msg="API listen on /var/run/docker.sock"
>>>   Started Docker Application Container Engine.
>>>   
>>>   2021/05/22 15:12:26 Finished restarting docker service if needed
>>>   2021/05/22 15:12:26 Waiting for docker daemon to come up.
>>>   2021/05/22 15:12:26 Waiting for docker daemon to come up.
>>>   2021/05/22 15:12:26 Docker daemon is active
>>>   2021/05/22 15:12:26 Docker daemon is active
>>>   2021/05/22 15:12:26 Retry Docker Command...
>>>   2021/05/22 15:12:26 Retry Docker Command...
>>>   2021/05/22 15:12:41 Running Docker Command attempt 7 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:12:41 Running Docker Command attempt 7 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>>   2021/05/22 15:12:41 Run docker command to pull public image failed with error: Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   .
>>>   2021/05/22 15:12:41 Docker config dir /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/wd/docker_login_6FE00B6271AD80D6 does not exist, skip removing it
>>>   2021/05/22 15:12:41 Pull docker image time: 6m4.6839193s
>>>   
>>>   2021/05/22 15:12:41 Get credentials or pull docker image failed with err: Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   , skipping start Docker Container
>>>   2021/05/22 15:12:41 Starting Container fail with err Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   
>>>   2021/05/22 15:12:41 Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   
>>>   2021/05/22 15:12:45 Attempt 1 of http call to https://6e64c585-4845-4356-b1e0-a28ca62f252a.workspace.eastus.api.azureml.ms/history/v1.0/private/subscriptions/91095667-e119-4555-acea-1826488492f0/resourceGroups/ds-tengri-resources-eastus/providers/Microsoft.MachineLearningServices/workspaces/DDS-ML-EAST/runs/fastai-custom-image_1621695915_a4bb441e/spans
>>>   2021/05/22 15:13:11 Time Out after 20 second retries for flushing the logs, doing another retry before exiting
>>>   2021/05/22 15:13:11 Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>   
>>>   
2021-05-22T15:13:11Z PostJobNodeHealthCheck
2021-05-22T15:13:11Z Executing 'Post job node health check' on 10.8.96.89
2021-05-22T15:13:41Z Post job node health check succeeded on 10.8.96.89. Output: 
>>>   2021/05/22 15:13:11 Starting App Insight Logger for task:  postJobNodeHealthCheck
>>>   2021/05/22 15:13:11 Version: 3.0.01597.0004 Branch: 2021-05-17-bing-hotfix Commit: 974f3e4
>>>   2021/05/22 15:13:11 Start Post-job node health check
>>>   2021/05/22 15:13:11 PostJobNodeHealthCheck
>>>   2021/05/22 15:13:11 GetDBE: get DBE error
>>>   2021/05/22 15:13:11 No system error was found
>>>   2021/05/22 15:13:11 DBEOutput: 
>>>   2021/05/22 15:13:11 GetOOM: get OOM error
>>>   2021/05/22 15:13:11 No system error was found
>>>   2021/05/22 15:13:11 Skipping NCCL CUDA Error Check because it's not enabled in dynamic config
>>>   2021/05/22 15:13:11 This is a cpu cluster, skipping gpu usage check
>>>   2021/05/22 15:13:11 Not exporting to RunHistory as the exporter is either stopped or there is no data.
>>>   Stopped: false
>>>   OriginalData: 1
>>>   FilteredData: 0.
>>>   2021/05/22 15:13:11 Process Exiting with Code:  0
>>>   2021/05/22 15:13:41 Time Out after 20 second retries for flushing the logs, doing another retry before exiting
>>>   
2021-05-22T15:13:41Z Executing 'JobRelease task' on 10.8.96.89
2021-05-22T15:14:12Z JobRelease task succeeded on 10.8.96.89. Output: 
>>>   2021/05/22 15:13:41 Starting App Insight Logger for task:  jobRelease
>>>   2021/05/22 15:13:41 Version: 3.0.01597.0004 Branch: 2021-05-17-bing-hotfix Commit: 974f3e4
>>>   2021/05/22 15:13:42 Exit since job container is not in running state.
>>>   2021/05/22 15:14:12 Time Out after 20 second retries for flushing the logs, doing another retry before exiting
>>>   2021/05/22 15:14:12 App Insight Client has already been closed
>>>   2021/05/22 15:14:12 Not exporting to RunHistory as the exporter is either stopped or there is no data.
>>>   Stopped: false
>>>   OriginalData: 1
>>>   FilteredData: 0.
>>>   
2021-05-22T15:14:12Z Executing 'Collect error information from workers' on 10.8.96.89
2021-05-22T15:14:12Z Collect error information from workers succeeded on 10.8.96.89. Output: 
>>>   
>>>   
2021-05-22T15:14:12Z Executing 'Job environment clean-up' on 10.8.96.89
2021-05-22T15:14:12Z Removing container fastai-custom-image_1621695915_a4bb441e exited with 1, Error: No such container: fastai-custom-image_1621695915_a4bb441e
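The agent log prints every line twice, so it is easy to lose track of how many distinct pull attempts actually failed. Below is a minimal sketch of a hypothetical helper (`summarize_failed_attempts` is not part of the AzureML SDK or CLI). It collapses the duplicated lines and lists each failed attempt with its timestamp, the URL the Docker daemon tried to reach, and the error reason. It assumes the "Running Docker Command attempt N failed ... Get <url>: <reason>" line format seen in the log above, with either `2021/05/22 15:08:57` or `2021-05-22T15:06:52Z` style timestamps.

```python
import re
from typing import Dict, Tuple

# Matches lines such as (with or without the ">>>   " prefix):
# "2021/05/22 15:12:41 Running Docker Command attempt 7 failed with client timeout err: ...
#  Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: ..."
# Assumption: this is the only format the agent uses for failed pull attempts.
ATTEMPT_RE = re.compile(
    r"(?P<ts>\d{4}[-/]\d{2}[-/]\d{2}[T ]\d{2}:\d{2}:\d{2}Z?) "
    r"Running Docker Command attempt (?P<n>\d+) failed"
    r".*?Get (?P<url>https://\S+?): (?P<reason>.*)$"
)


def summarize_failed_attempts(log_text: str) -> Dict[int, Tuple[str, str, str]]:
    """Return {attempt_number: (timestamp, url, reason)}, one entry per attempt.

    Each line appears twice in the agent log, so only the first occurrence
    of every attempt number is kept. Dict order follows first appearance.
    """
    attempts: Dict[int, Tuple[str, str, str]] = {}
    for line in log_text.splitlines():
        m = ATTEMPT_RE.search(line)
        if m is None:
            continue  # "See documentation..." lines, daemon log lines, etc.
        n = int(m["n"])
        if n not in attempts:
            attempts[n] = (m["ts"], m["url"], m["reason"].strip())
    return attempts
```

Applied to this log, every attempt reports the same Docker Hub endpoint (`https://registry-1.docker.io/v2/`) and the same client-timeout reason. That suggests the job fails while pulling the base image, before the container (and therefore the training script) ever starts. This is consistent with the "skipping start Docker Container" message and with a hello-world script failing the same way.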

Metadata

Assignees: No one assigned
Labels: ADO (Issue is documented on MSFT ADO for internal tracking), ComputeEnvironments (Failures to provision environment), bug (Something isn't working)