Description
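Running the fastai custom Docker example does not work: job environment preparation fails because the pull of `fastdotai/fastai:latest` from Docker Hub times out on every attempt (see the logs below).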
Things I tried:
- Switched to CPU and changed the script to just print "hello world". Same error.
- Tried to use `DockerConfiguration` because I get a warning that `docker.enabled = True` is deprecated, but there is no real documentation for it and it is unclear how to use it (and many of your examples don't use it).
  - Can the warning be clarified? When will it be deprecated? It is not deprecated as of right now.
  - Can you provide an end-to-end example of the right way to use `DockerConfiguration`?
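For reference, here is a minimal sketch of what such an example might look like. It assumes `DockerConfiguration` from `azureml.core.runconfig` is passed to `ScriptRunConfig` through its `docker_runtime_config` parameter, in place of setting `docker.enabled = True` on the environment. The helper name `build_run_config` is made up for illustration, and the source directory and script name in the usage note are placeholders. The sketch has not been run against this workspace, and it would not by itself fix the image pull timeout shown in the logs.

```python
IMAGE = "fastdotai/fastai:latest"  # image name taken from the run logs in this issue


def build_run_config(source_directory, script, compute_target):
    """Build a ScriptRunConfig that runs `script` inside the custom image.

    azureml-core is imported lazily so this module can still be loaded on a
    machine where the SDK is not installed.
    """
    from azureml.core import Environment, ScriptRunConfig
    from azureml.core.runconfig import DockerConfiguration

    env = Environment("fastai")
    env.docker.base_image = IMAGE
    # The image ships its own Python packages, so Azure ML should not
    # build a conda environment on top of it.
    env.python.user_managed_dependencies = True

    # Intended replacement for the deprecated `env.docker.enabled = True`.
    docker_config = DockerConfiguration(use_docker=True)

    return ScriptRunConfig(
        source_directory=source_directory,
        script=script,
        compute_target=compute_target,
        environment=env,
        docker_runtime_config=docker_config,
    )
```

Submitting would then look something like `Experiment(ws, "fastai-custom-image").submit(build_run_config("fastai-example", "train.py", "dds-ml"))`, where `ws` is a `Workspace` object and the directory and script names are placeholders.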
Requests
- Can you please correct this example as this is one of the only examples on how to use a custom Docker container?
- If this notebook cannot be corrected, can you replace it? It might cause a lot of confusion for fastai students or people trying to do this in the wild. cc: @jph00
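The logs further down are long and repeat the same failure many times. As a reading aid, here is a small sketch that pulls the failed `docker pull` attempts and the registry URL out of a log in this format. It assumes the exact line wording shown in this issue ("Running Docker Command attempt N failed with client timeout"). The function name and the shortened sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches the failure lines from the run log, for example:
# "... Running Docker Command attempt 1 failed with client timeout err: ...
#  Get https://registry-1.docker.io/v2/: net/http: request canceled ..."
ATTEMPT_RE = re.compile(
    r"Running Docker Command attempt (?P<attempt>\d+) failed with client timeout"
    r".*?Get (?P<url>https?://\S+?):\s"
)

# Shortened sample in the same format as the logs in this issue.
SAMPLE = (
    "2021-05-22T15:06:52Z Running Docker Command attempt 1 failed with client timeout err: "
    "exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: "
    "net/http: request canceled while waiting for connection\n"
    "2021-05-22T15:06:57Z Retry Docker Command...\n"
    ">>> 2021/05/22 15:06:52 Running Docker Command attempt 1 failed with client timeout err: "
    "exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: "
    "net/http: request canceled while waiting for connection\n"
    "2021-05-22T15:07:12Z Running Docker Command attempt 2 failed with client timeout err: "
    "exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: "
    "net/http: request canceled while waiting for connection\n"
)


def summarize_pull_failures(log_text):
    """Return (sorted attempt numbers, Counter of registry URLs).

    Each attempt is counted once, because the log echoes the same failure
    line again in the '>>>' section.
    """
    attempts = set()
    urls = Counter()
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match is None:
            continue
        attempt = int(match.group("attempt"))
        if attempt not in attempts:
            attempts.add(attempt)
            urls[match.group("url")] += 1
    return sorted(attempts), urls


if __name__ == "__main__":
    failed_attempts, registries = summarize_pull_failures(SAMPLE)
    print("failed attempts:", failed_attempts)
    print("registries:", dict(registries))
```

Every failure in my run points at the same Docker Hub endpoint, so the compute node apparently cannot reach `registry-1.docker.io` at all. That would also explain why changing the script to "hello world" made no difference.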
I've included the logs from my attempted AML Run from this notebook below 👇🏽
2021-05-22T15:06:36Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
2021-05-22T15:06:36Z Starting output-watcher...
2021-05-22T15:06:37Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2021-05-22T15:06:37Z Executing 'Copy ACR Details file' on 10.8.96.89
2021-05-22T15:06:37Z Copy ACR Details file succeeded on 10.8.96.89. Output:
>>>
>>>
2021-05-22T15:06:52Z Running Docker Command attempt 1 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:06:57Z Force Restart Docker Service
2021-05-22T15:06:57Z
2021-05-22T15:06:57Z Waiting for docker daemon to come up.
2021-05-22T15:06:57Z Docker daemon is active
2021-05-22T15:06:57Z Retry Docker Command...
2021-05-22T15:07:12Z Running Docker Command attempt 2 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:07:21Z Force Restart Docker Service
2021-05-22T15:07:21Z
2021-05-22T15:07:22Z Waiting for docker daemon to come up.
2021-05-22T15:07:22Z Docker daemon is active
2021-05-22T15:07:22Z Retry Docker Command...
2021-05-22T15:07:37Z Running Docker Command attempt 3 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:07:53Z Force Restart Docker Service
2021-05-22T15:07:53Z
2021-05-22T15:07:53Z Waiting for docker daemon to come up.
2021-05-22T15:07:53Z Docker daemon is active
2021-05-22T15:07:53Z Retry Docker Command...
2021-05-22T15:08:09Z Running Docker Command attempt 4 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:08:41Z Force Restart Docker Service
2021-05-22T15:08:42Z
2021-05-22T15:08:42Z Waiting for docker daemon to come up.
2021-05-22T15:08:42Z Docker daemon is active
2021-05-22T15:08:42Z Retry Docker Command...
2021-05-22T15:08:57Z Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:10:02Z Force Restart Docker Service
2021-05-22T15:10:02Z
2021-05-22T15:10:02Z Waiting for docker daemon to come up.
2021-05-22T15:10:02Z Docker daemon is active
2021-05-22T15:10:02Z Retry Docker Command...
2021-05-22T15:10:17Z Running Docker Command attempt 6 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:12:26Z Force Restart Docker Service
2021-05-22T15:12:26Z
2021-05-22T15:12:26Z Waiting for docker daemon to come up.
2021-05-22T15:12:26Z Docker daemon is active
2021-05-22T15:12:26Z Retry Docker Command...
2021-05-22T15:12:41Z Running Docker Command attempt 7 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
. See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
2021-05-22T15:13:11Z Job environment preparation failed on 10.8.96.89. Output:
>>> 2021/05/22 15:06:35 Starting App Insight Logger for task: prepareJobEnvironment
>>> 2021/05/22 15:06:35 Version: 3.0.01597.0004 Branch: 2021-05-17-bing-hotfix Commit: 974f3e4
>>> 2021/05/22 15:06:35 runtime.GOOS linux
>>> 2021/05/22 15:06:35 Checking if '/tmp' exists
>>> 2021/05/22 15:06:35 Reading dyanamic configs
>>> 2021/05/22 15:06:35 Container sas url: https://baiscriptseastusprod.blob.core.windows.net/aihosttools?sv=2018-03-28&sr=c&si=aihosttoolspolicy&sig=gCpFfTbL8hPl%2BzV43hBdfOZC4SuKqZoJraIo10S4%2FYw%3D
>>> 2021/05/22 15:06:35 Failed to read from file /mnt/batch/tasks/startup/wd/az_resource/xdsenv.variable/azsecpack.variables, open /mnt/batch/tasks/startup/wd/az_resource/xdsenv.variable/azsecpack.variables: no such file or directory
>>> 2021/05/22 15:06:35 [in autoUpgradeFromJobNodeSetup] Is Azsecpack installer on host: false. Is Azsecpack enabled: false,
>>> 2021/05/22 15:06:35 Starting Azsecpack installation on machine: bf9f722d45714167beb968edcea13f1600000E#398a6654-997b-47e9-b12b-9515b896b4de#91095667-e119-4555-acea-1826488492f0#ds-tengri-resources-eastus#dds-ml-east#dds-ml
>>> 2021/05/22 15:06:35 Is Azsecpack enabled: false, GetDisableVsatlsscan: true
>>> 2021/05/22 15:06:35 Turning off azsecpack, if it is already running
>>> 2021/05/22 15:06:35 [doTurnOffAzsecpack] output:Unit mdsd.service could not be found.
>>> ,err:exit status 1.
>>> 2021/05/22 15:06:35 OS patching disabled by dynamic configs. Skipping.
>>> 2021/05/22 15:06:35 Job: AZ_BATCHAI_JOB_NAME does not turn on the DetonationChamber
>>> 2021/05/22 15:06:35 Start to getting gpu count by running nvidia-smi command
>>> 2021/05/22 15:06:35 GPU count found on the node: 0
>>> 2021/05/22 15:06:35 AMLComputeXDSEndpoint: https://6e64c585-4845-4356-b1e0-a28ca62f252a.workspace.eastus.cert.api.azureml.ms/xdsbatchai
>>> 2021/05/22 15:06:35 AMLComputeXDSApiVersion: 2018-02-01
>>> 2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config
>>> 2021/05/22 15:06:35 This is not a aml-workstation (compute instance), current offer type: amlcompute. Starting identity responder as part of prepareJobEnvironment.
>>> 2021/05/22 15:06:35 Starting identity responder.
>>> 2021/05/22 15:06:35 Starting identity responder.
>>> 2021/05/22 15:06:35 Failed to open file /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.batchai.IdentityResponder.envlist: open /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.batchai.IdentityResponder.envlist: no such file or directory
>>> 2021/05/22 15:06:35 Logfile used for identity responder: /mnt/batch/tasks/workitems/051f9434-a110-4ced-be03-f37876075345/job-1/fastai-custom-image__f2308802-f3fd-4f69-9710-581502704959/IdentityResponderLog-tvmps_63c0616b393d93f50f271aee1053d8f6130f081c9a609118eb8f1295575dc40c_d.txt
>>> 2021/05/22 15:06:35 Logfile used for identity responder: /mnt/batch/tasks/workitems/051f9434-a110-4ced-be03-f37876075345/job-1/fastai-custom-image__f2308802-f3fd-4f69-9710-581502704959/IdentityResponderLog-tvmps_63c0616b393d93f50f271aee1053d8f6130f081c9a609118eb8f1295575dc40c_d.txt
>>> 2021/05/22 15:06:35 Started Identity Responder for job.
>>> 2021/05/22 15:06:35 Started Identity Responder for job.
>>> 2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/wd
>>> 2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/shared
>>> 2021/05/22 15:06:35 From the policy service, the filtering patterns is: , data store is
>>> 2021/05/22 15:06:35 Mounting job level file systems
>>> 2021/05/22 15:06:35 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts
>>> 2021/05/22 15:06:35 Attempting to read datastore credentials file: /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.amlcompute.datastorecredentials
>>> 2021/05/22 15:06:35 Datastore credentials file not found, skipping.
>>> 2021/05/22 15:06:35 Attempting to read runtime sas tokens file: /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/config/.master.runtimesastokens
>>> 2021/05/22 15:06:35 Runtime sas tokens file not found, skipping.
>>> 2021/05/22 15:06:35 No NFS configured
>>> 2021/05/22 15:06:35 No Azure File Shares configured
>>> 2021/05/22 15:06:35 Mounting blob file systems
>>> 2021/05/22 15:06:35 Blobfuse runtime version 1.3.6
>>> 2021/05/22 15:06:35 Mounting azureml-blobstore-6e64c585-4845-4356-b1e0-a28ca62f252a container from ddsmleast9411768689 account at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>> 2021/05/22 15:06:35 Using Compute Identity to authenticate Blobfuse: false.
>>> 2021/05/22 15:06:35 Using Compute Identity to authenticate Blobfuse: false.
>>> 2021/05/22 15:06:35 Blobfuse cache size set to 11257 MB.
>>> 2021/05/22 15:06:35 Running following command: /bin/bash -c sudo blobfuse /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore --tmp-path=/mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/caches/workspaceblobstore --file-cache-timeout-in-seconds=1000000 --cache-size-mb=11257 -o nonempty -o allow_other --config-file=/mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/configs/workspaceblobstore.cfg --log-level=LOG_WARNING
>>> 2021/05/22 15:06:35 Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>> 2021/05/22 15:06:36 Waiting for blobfs to be mounted at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>> 2021/05/22 15:06:36 Successfully mounted azureml-blobstore-6e64c585-4845-4356-b1e0-a28ca62f252a container from ddsmleast9411768689 account at /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore
>>> 2021/05/22 15:06:36 Created run_id directory: /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e
>>> 2021/05/22 15:06:36 No unmanaged file systems configured
>>> 2021/05/22 15:06:36 Start to getting gpu count by running nvidia-smi command
>>> 2021/05/22 15:06:36 From the policy service, the filtering patterns is: , data store is
>>> 2021/05/22 15:06:36 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e/azureml_compute_logs
>>> 2021/05/22 15:06:36 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e/logs
>>> 2021/05/22 15:06:36 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/mounts/workspaceblobstore/azureml/fastai-custom-image_1621695915_a4bb441e/outputs
>>> 2021/05/22 15:06:36 Starting output-watcher...
>>> 2021/05/22 15:06:36 Single file input dataset is enabled.
>>> 2021/05/22 15:06:36 Start to pulling docker image: fastdotai/fastai:latest
>>> 2021/05/22 15:06:36 Start pull docker image: fastdotai
>>> 2021/05/22 15:06:36 Getting credentials for image fastdotai/fastai:latest with url
>>> 2021/05/22 15:06:36 Container registry is not ACR.
>>> 2021/05/22 15:06:36 Skip getting ACR Credentials from Identity and will be getting it from EMS
>>> 2021/05/22 15:06:36 Getting ACR Credentials from EMS for environment fastai:Autosave_2021-05-22T15:05:19Z_b9284463
>>> 2021/05/22 15:06:36 Requesting XDS for registry details.
>>> 2021/05/22 15:06:36 Attempt 1 of http call to https://6e64c585-4845-4356-b1e0-a28ca62f252a.workspace.eastus.cert.api.azureml.ms/xdsbatchai/hosttoolapi/subscriptions/91095667-e119-4555-acea-1826488492f0/resourceGroups/ds-tengri-resources-eastus/workspaces/dds-ml-east/clusters/dds-ml/nodes/tvmps_63c0616b393d93f50f271aee1053d8f6130f081c9a609118eb8f1295575dc40c_d?api-version=2018-02-01
>>> 2021/05/22 15:06:37 Got container registry details from credentials service for registry address: .
>>> 2021/05/22 15:06:37 Writing ACR Details to file...
>>> 2021/05/22 15:06:37 Copying ACR Details file to worker nodes...
>>> 2021/05/22 15:06:37 Executing 'Copy ACR Details file' on 10.8.96.89
>>> 2021/05/22 15:06:37 Begin executing 'Copy ACR Details file' task on Node
>>> 2021/05/22 15:06:37 'Copy ACR Details file' task Node result: succeeded
>>> 2021/05/22 15:06:37 Copy ACR Details file succeeded on 10.8.96.89. Output:
>>> >>>
>>> >>>
>>> 2021/05/22 15:06:37 EncryptedDockerRegistryPassword is empty.
>>> 2021/05/22 15:06:37 EMS returned empty credentials for environment fastai
>>> 2021/05/22 15:06:37 Save docker credentials for image fastdotai/fastai:latest in /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/wd/docker_login_6FE00B6271AD80D6
>>> 2021/05/22 15:06:37 The login info is empty, skipping login to the docker registry.
>>> 2021/05/22 15:06:37 Start run pull docker image command
>>> 2021/05/22 15:06:40 Not exporting to RunHistory as the exporter is either stopped or there is no data.
>>> Stopped: false
>>> OriginalData: 18
>>> FilteredData: 0.
>>> 2021/05/22 15:06:52 Running Docker Command attempt 1 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:06:52 Running Docker Command attempt 1 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:06:57 Force Restart Docker Service
>>> 2021/05/22 15:06:57 Force Restart Docker Service
>>> 2021/05/22 15:06:57
>>> 2021/05/22 15:06:57
>>> 2021/05/22 15:06:57 Last 20 lines of Docker daemon log file, fetched after force restart:
>>> time="2021-05-22T15:06:57.302348600Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:06:57.302735600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:06:57.303901500Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>> time="2021-05-22T15:06:57.303923800Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>> time="2021-05-22T15:06:57.303956200Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:06:57.303966600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:06:57.309015500Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>> time="2021-05-22T15:06:57.311528200Z" level=warning msg="Your kernel does not support swap memory limit"
>>> time="2021-05-22T15:06:57.311564200Z" level=warning msg="Your kernel does not support cgroup rt period"
>>> time="2021-05-22T15:06:57.311571500Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>> time="2021-05-22T15:06:57.311577100Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>> time="2021-05-22T15:06:57.311582400Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>> time="2021-05-22T15:06:57.311699200Z" level=info msg="Loading containers: start."
>>> time="2021-05-22T15:06:57.393926400Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>> time="2021-05-22T15:06:57.424473200Z" level=info msg="Loading containers: done."
>>> time="2021-05-22T15:06:57.439214700Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>> time="2021-05-22T15:06:57.440205600Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>> time="2021-05-22T15:06:57.440454200Z" level=info msg="Daemon has completed initialization"
>>> time="2021-05-22T15:06:57.463106300Z" level=info msg="API listen on /var/run/docker.sock"
>>> Started Docker Application Container Engine.
>>>
>>> 2021/05/22 15:06:57 Finished restarting docker service if needed
>>> 2021/05/22 15:06:57 Waiting for docker daemon to come up.
>>> 2021/05/22 15:06:57 Waiting for docker daemon to come up.
>>> 2021/05/22 15:06:57 Docker daemon is active
>>> 2021/05/22 15:06:57 Docker daemon is active
>>> 2021/05/22 15:06:57 Retry Docker Command...
>>> 2021/05/22 15:06:57 Retry Docker Command...
>>> 2021/05/22 15:07:12 Running Docker Command attempt 2 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:07:12 Running Docker Command attempt 2 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:07:21 Force Restart Docker Service
>>> 2021/05/22 15:07:21 Force Restart Docker Service
>>> 2021/05/22 15:07:21
>>> 2021/05/22 15:07:21
>>> 2021/05/22 15:07:21 Last 20 lines of Docker daemon log file, fetched after force restart:
>>> time="2021-05-22T15:07:21.719788000Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:07:21.719798300Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:07:21.721117500Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>> time="2021-05-22T15:07:21.721284300Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>> time="2021-05-22T15:07:21.721449400Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:07:21.721592000Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:07:21.729531400Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>> time="2021-05-22T15:07:21.731169400Z" level=warning msg="Your kernel does not support swap memory limit"
>>> time="2021-05-22T15:07:21.731189800Z" level=warning msg="Your kernel does not support cgroup rt period"
>>> time="2021-05-22T15:07:21.731196600Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>> time="2021-05-22T15:07:21.731202200Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>> time="2021-05-22T15:07:21.731207600Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>> time="2021-05-22T15:07:21.731574600Z" level=info msg="Loading containers: start."
>>> time="2021-05-22T15:07:21.820365400Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>> time="2021-05-22T15:07:21.858103900Z" level=info msg="Loading containers: done."
>>> time="2021-05-22T15:07:21.874279400Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>> time="2021-05-22T15:07:21.874631700Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>> time="2021-05-22T15:07:21.874679200Z" level=info msg="Daemon has completed initialization"
>>> time="2021-05-22T15:07:21.887387900Z" level=info msg="API listen on /var/run/docker.sock"
>>> Started Docker Application Container Engine.
>>>
>>> 2021/05/22 15:07:22 Finished restarting docker service if needed
>>> 2021/05/22 15:07:22 Waiting for docker daemon to come up.
>>> 2021/05/22 15:07:22 Waiting for docker daemon to come up.
>>> 2021/05/22 15:07:22 Docker daemon is active
>>> 2021/05/22 15:07:22 Docker daemon is active
>>> 2021/05/22 15:07:22 Retry Docker Command...
>>> 2021/05/22 15:07:22 Retry Docker Command...
>>> 2021/05/22 15:07:37 Running Docker Command attempt 3 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:07:37 Running Docker Command attempt 3 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:07:53 Force Restart Docker Service
>>> 2021/05/22 15:07:53 Force Restart Docker Service
>>> 2021/05/22 15:07:53
>>> 2021/05/22 15:07:53
>>> 2021/05/22 15:07:53 Last 20 lines of Docker daemon log file, fetched after force restart:
>>> time="2021-05-22T15:07:53.712328900Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:07:53.712338500Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:07:53.714142100Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>> time="2021-05-22T15:07:53.714241000Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>> time="2021-05-22T15:07:53.714448200Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:07:53.714533300Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:07:53.722682800Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>> time="2021-05-22T15:07:53.723940700Z" level=warning msg="Your kernel does not support swap memory limit"
>>> time="2021-05-22T15:07:53.723958800Z" level=warning msg="Your kernel does not support cgroup rt period"
>>> time="2021-05-22T15:07:53.723966800Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>> time="2021-05-22T15:07:53.723972800Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>> time="2021-05-22T15:07:53.723978700Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>> time="2021-05-22T15:07:53.724092000Z" level=info msg="Loading containers: start."
>>> time="2021-05-22T15:07:53.805591800Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>> time="2021-05-22T15:07:53.843376900Z" level=info msg="Loading containers: done."
>>> time="2021-05-22T15:07:53.856206300Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>> time="2021-05-22T15:07:53.856574000Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>> time="2021-05-22T15:07:53.856712700Z" level=info msg="Daemon has completed initialization"
>>> Started Docker Application Container Engine.
>>> time="2021-05-22T15:07:53.871094800Z" level=info msg="API listen on /var/run/docker.sock"
>>>
>>> 2021/05/22 15:07:53 Finished restarting docker service if needed
>>> 2021/05/22 15:07:53 Waiting for docker daemon to come up.
>>> 2021/05/22 15:07:53 Waiting for docker daemon to come up.
>>> 2021/05/22 15:07:53 Docker daemon is active
>>> 2021/05/22 15:07:53 Docker daemon is active
>>> 2021/05/22 15:07:53 Retry Docker Command...
>>> 2021/05/22 15:07:53 Retry Docker Command...
>>> 2021/05/22 15:08:09 Running Docker Command attempt 4 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:08:09 Running Docker Command attempt 4 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:08:41 Force Restart Docker Service
>>> 2021/05/22 15:08:41 Force Restart Docker Service
>>> 2021/05/22 15:08:42
>>> 2021/05/22 15:08:42
>>> 2021/05/22 15:08:42 Last 20 lines of Docker daemon log file, fetched after force restart:
>>> time="2021-05-22T15:08:42.087658100Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:08:42.087667500Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:08:42.089030500Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>> time="2021-05-22T15:08:42.089058200Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>> time="2021-05-22T15:08:42.089076100Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:08:42.089089600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:08:42.098016300Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>> time="2021-05-22T15:08:42.099754700Z" level=warning msg="Your kernel does not support swap memory limit"
>>> time="2021-05-22T15:08:42.099778000Z" level=warning msg="Your kernel does not support cgroup rt period"
>>> time="2021-05-22T15:08:42.099785600Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>> time="2021-05-22T15:08:42.099791900Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>> time="2021-05-22T15:08:42.099798100Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>> time="2021-05-22T15:08:42.099938600Z" level=info msg="Loading containers: start."
>>> time="2021-05-22T15:08:42.180795500Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>> time="2021-05-22T15:08:42.212181600Z" level=info msg="Loading containers: done."
>>> time="2021-05-22T15:08:42.228966500Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>> time="2021-05-22T15:08:42.229346600Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>> time="2021-05-22T15:08:42.229412500Z" level=info msg="Daemon has completed initialization"
>>> time="2021-05-22T15:08:42.242934500Z" level=info msg="API listen on /var/run/docker.sock"
>>> Started Docker Application Container Engine.
>>>
>>> 2021/05/22 15:08:42 Finished restarting docker service if needed
>>> 2021/05/22 15:08:42 Waiting for docker daemon to come up.
>>> 2021/05/22 15:08:42 Waiting for docker daemon to come up.
>>> 2021/05/22 15:08:42 Docker daemon is active
>>> 2021/05/22 15:08:42 Docker daemon is active
>>> 2021/05/22 15:08:42 Retry Docker Command...
>>> 2021/05/22 15:08:42 Retry Docker Command...
>>> 2021/05/22 15:08:57 Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:08:57 Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:10:02 Force Restart Docker Service
>>> 2021/05/22 15:10:02 Force Restart Docker Service
>>> 2021/05/22 15:10:02
>>> 2021/05/22 15:10:02
>>> 2021/05/22 15:10:02 Last 20 lines of Docker daemon log file, fetched after force restart:
>>> time="2021-05-22T15:10:02.554674900Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:10:02.554683700Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:10:02.555814300Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>> time="2021-05-22T15:10:02.556471100Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>> time="2021-05-22T15:10:02.556488600Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:10:02.556501300Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:10:02.564340600Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>> time="2021-05-22T15:10:02.565848300Z" level=warning msg="Your kernel does not support swap memory limit"
>>> time="2021-05-22T15:10:02.565884700Z" level=warning msg="Your kernel does not support cgroup rt period"
>>> time="2021-05-22T15:10:02.565891600Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>> time="2021-05-22T15:10:02.565896900Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>> time="2021-05-22T15:10:02.565903600Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>> time="2021-05-22T15:10:02.566121800Z" level=info msg="Loading containers: start."
>>> time="2021-05-22T15:10:02.651460900Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>> time="2021-05-22T15:10:02.682343000Z" level=info msg="Loading containers: done."
>>> time="2021-05-22T15:10:02.698387100Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>> time="2021-05-22T15:10:02.698780400Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>> time="2021-05-22T15:10:02.698851500Z" level=info msg="Daemon has completed initialization"
>>> time="2021-05-22T15:10:02.715795400Z" level=info msg="API listen on /var/run/docker.sock"
>>> Started Docker Application Container Engine.
>>>
>>> 2021/05/22 15:10:02 Finished restarting docker service if needed
>>> 2021/05/22 15:10:02 Waiting for docker daemon to come up.
>>> 2021/05/22 15:10:02 Waiting for docker daemon to come up.
>>> 2021/05/22 15:10:02 Docker daemon is active
>>> 2021/05/22 15:10:02 Docker daemon is active
>>> 2021/05/22 15:10:02 Retry Docker Command...
>>> 2021/05/22 15:10:02 Retry Docker Command...
>>> 2021/05/22 15:10:17 Running Docker Command attempt 6 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:10:17 Running Docker Command attempt 6 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:12:26 Force Restart Docker Service
>>> 2021/05/22 15:12:26 Force Restart Docker Service
>>> 2021/05/22 15:12:26
>>> 2021/05/22 15:12:26
>>> 2021/05/22 15:12:26 Last 20 lines of Docker daemon log file, fetched after force restart:
>>> time="2021-05-22T15:12:26.277283800Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:12:26.277294100Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:12:26.279050800Z" level=info msg="parsed scheme: \"unix\"" module=grpc
>>> time="2021-05-22T15:12:26.279075400Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
>>> time="2021-05-22T15:12:26.279089200Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}" module=grpc
>>> time="2021-05-22T15:12:26.279097600Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
>>> time="2021-05-22T15:12:26.283940400Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
>>> time="2021-05-22T15:12:26.285551700Z" level=warning msg="Your kernel does not support swap memory limit"
>>> time="2021-05-22T15:12:26.285572600Z" level=warning msg="Your kernel does not support cgroup rt period"
>>> time="2021-05-22T15:12:26.285579900Z" level=warning msg="Your kernel does not support cgroup rt runtime"
>>> time="2021-05-22T15:12:26.285585600Z" level=warning msg="Your kernel does not support cgroup blkio weight"
>>> time="2021-05-22T15:12:26.285591500Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
>>> time="2021-05-22T15:12:26.285715100Z" level=info msg="Loading containers: start."
>>> time="2021-05-22T15:12:26.364858000Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
>>> time="2021-05-22T15:12:26.396268400Z" level=info msg="Loading containers: done."
>>> time="2021-05-22T15:12:26.412820700Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
>>> time="2021-05-22T15:12:26.413208200Z" level=info msg="Docker daemon" commit=7d75c1d40d88ddef08653dbd611f41df42bdf087 graphdriver(s)=overlay2 version=19.03.14+azure
>>> time="2021-05-22T15:12:26.413260000Z" level=info msg="Daemon has completed initialization"
>>> time="2021-05-22T15:12:26.432611400Z" level=info msg="API listen on /var/run/docker.sock"
>>> Started Docker Application Container Engine.
>>>
>>> 2021/05/22 15:12:26 Finished restarting docker service if needed
>>> 2021/05/22 15:12:26 Waiting for docker daemon to come up.
>>> 2021/05/22 15:12:26 Waiting for docker daemon to come up.
>>> 2021/05/22 15:12:26 Docker daemon is active
>>> 2021/05/22 15:12:26 Docker daemon is active
>>> 2021/05/22 15:12:26 Retry Docker Command...
>>> 2021/05/22 15:12:26 Retry Docker Command...
>>> 2021/05/22 15:12:41 Running Docker Command attempt 7 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:12:41 Running Docker Command attempt 7 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> . See documentation for error details: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-faq#docker-pull-fails-with-error-nethttp-request-canceled-while-waiting-for-connection-clienttimeout-exceeded-while-awaiting-headers
>>> 2021/05/22 15:12:41 Run docker command to pull public image failed with error: Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> .
>>> 2021/05/22 15:12:41 Docker config dir /mnt/batch/tasks/shared/LS_root/jobs/dds-ml-east/azureml/fastai-custom-image_1621695915_a4bb441e/wd/docker_login_6FE00B6271AD80D6 does not exist, skip removing it
>>> 2021/05/22 15:12:41 Pull docker image time: 6m4.6839193s
>>>
>>> 2021/05/22 15:12:41 Get credentials or pull docker image failed with err: Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> , skipping start Docker Container
>>> 2021/05/22 15:12:41 Starting Container fail with err Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>
>>> 2021/05/22 15:12:41 Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>
>>> 2021/05/22 15:12:45 Attempt 1 of http call to https://6e64c585-4845-4356-b1e0-a28ca62f252a.workspace.eastus.api.azureml.ms/history/v1.0/private/subscriptions/91095667-e119-4555-acea-1826488492f0/resourceGroups/ds-tengri-resources-eastus/providers/Microsoft.MachineLearningServices/workspaces/DDS-ML-EAST/runs/fastai-custom-image_1621695915_a4bb441e/spans
>>> 2021/05/22 15:13:11 Time Out after 20 second retries for flushing the logs, doing another retry before exiting
>>> 2021/05/22 15:13:11 Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>>
>>>
2021-05-22T15:13:11Z PostJobNodeHealthCheck
2021-05-22T15:13:11Z Executing 'Post job node health check' on 10.8.96.89
2021-05-22T15:13:41Z Post job node health check succeeded on 10.8.96.89. Output:
>>> 2021/05/22 15:13:11 Starting App Insight Logger for task: postJobNodeHealthCheck
>>> 2021/05/22 15:13:11 Version: 3.0.01597.0004 Branch: 2021-05-17-bing-hotfix Commit: 974f3e4
>>> 2021/05/22 15:13:11 Start Post-job node health check
>>> 2021/05/22 15:13:11 PostJobNodeHealthCheck
>>> 2021/05/22 15:13:11 GetDBE: get DBE error
>>> 2021/05/22 15:13:11 No system error was found
>>> 2021/05/22 15:13:11 DBEOutput:
>>> 2021/05/22 15:13:11 GetOOM: get OOM error
>>> 2021/05/22 15:13:11 No system error was found
>>> 2021/05/22 15:13:11 Skipping NCCL CUDA Error Check because it's not enabled in dynamic config
>>> 2021/05/22 15:13:11 This is a cpu cluster, skipping gpu usage check
>>> 2021/05/22 15:13:11 Not exporting to RunHistory as the exporter is either stopped or there is no data.
>>> Stopped: false
>>> OriginalData: 1
>>> FilteredData: 0.
>>> 2021/05/22 15:13:11 Process Exiting with Code: 0
>>> 2021/05/22 15:13:41 Time Out after 20 second retries for flushing the logs, doing another retry before exiting
>>>
2021-05-22T15:13:41Z Executing 'JobRelease task' on 10.8.96.89
2021-05-22T15:14:12Z JobRelease task succeeded on 10.8.96.89. Output:
>>> 2021/05/22 15:13:41 Starting App Insight Logger for task: jobRelease
>>> 2021/05/22 15:13:41 Version: 3.0.01597.0004 Branch: 2021-05-17-bing-hotfix Commit: 974f3e4
>>> 2021/05/22 15:13:42 Exit since job container is not in running state.
>>> 2021/05/22 15:14:12 Time Out after 20 second retries for flushing the logs, doing another retry before exiting
>>> 2021/05/22 15:14:12 App Insight Client has already been closed
>>> 2021/05/22 15:14:12 Not exporting to RunHistory as the exporter is either stopped or there is no data.
>>> Stopped: false
>>> OriginalData: 1
>>> FilteredData: 0.
>>>
2021-05-22T15:14:12Z Executing 'Collect error information from workers' on 10.8.96.89
2021-05-22T15:14:12Z Collect error information from workers succeeded on 10.8.96.89. Output:
>>>
>>>
2021-05-22T15:14:12Z Executing 'Job environment clean-up' on 10.8.96.89
2021-05-22T15:14:12Z Removing container fastai-custom-image_1621695915_a4bb441e exited with 1, Error: No such container: fastai-custom-image_1621695915_a4bb441e
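
For anyone skimming the logs above: every retry fails the same way, with a client timeout while pulling from `registry-1.docker.io`, so the job never reaches the point of starting the container. Below is a minimal sketch for collapsing the retry noise into a per-registry summary. The helper name `summarize_pull_failures` and the regular expression are my own, written against the log format shown in this issue. They are not part of any AzureML tooling and may need adjusting for other log versions.

```python
import re

# Matches retry lines as they appear in the log above, e.g.
# "... Running Docker Command attempt 5 failed with client timeout err: ...
#  Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: ..."
ATTEMPT_RE = re.compile(
    r"Running Docker Command attempt (\d+) failed with client timeout err: .*?Get (\S+): "
)


def summarize_pull_failures(log_text):
    """Return {registry_url: sorted list of distinct failed attempt numbers}."""
    failures = {}
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            attempt, url = int(match.group(1)), match.group(2)
            # The log prints each retry line twice, so a set removes the repeats.
            failures.setdefault(url, set()).add(attempt)
    return {url: sorted(attempts) for url, attempts in failures.items()}


# Two retry lines copied from the log above (attempt 5 appears twice there).
SAMPLE_LOG = """\
>>> 2021/05/22 15:08:57 Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> 2021/05/22 15:08:57 Running Docker Command attempt 5 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
>>> 2021/05/22 15:10:17 Running Docker Command attempt 6 failed with client timeout err: exit status 1,Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
"""

if __name__ == "__main__":
    print(summarize_pull_failures(SAMPLE_LOG))
```

Running this over the full log should show a single registry URL with a run of consecutive failed attempts, which points at outbound connectivity from the compute node to Docker Hub and not at the training script or the image contents.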