Provides a quick and easy way to get up and running with a DeepRacer training environment in Azure or AWS, using either Azure N-Series Virtual Machines or AWS EC2 Accelerated Computing instances.
DeepRacer-For-Cloud (DRfC) started as an extension of the work done by Alex (https://github.com/alexschultz/deepracer-for-dummies), which in turn is a wrapper around the amazing work done by Chris (https://github.com/crr0004/deepracer). With the introduction of the second-generation DeepRacer console the repository has been split up. This repository contains the scripts needed to run the training, but depends on Docker Hub to provide pre-built Docker images. All the under-the-hood building capabilities have been moved to my DeepRacer Build repository.
The main differences from Alex's work are:
- Runtime S3 storage is set up to fit the connected cloud platform:
  - Azure: The local 'virtual' S3 instance (minio) now uses an Azure Storage Account / Blob Storage as a back-end. This allows access between sessions using e.g. Storage Explorer (https://azure.microsoft.com/en-us/features/storage-explorer/).
  - AWS: Connects directly to a real S3 bucket.
- The Robomaker and Log Analysis containers are extended with the drivers required to let Tensorflow use the GPU. All containers are pre-built and available from Docker Hub.
- Configuration has been reorganized:
  - `custom_files/hyperparameters.json` stores the runtime hyperparameters, which logically belong together with the `model_metadata.json` and `rewards.py` files.
  - `system.env` contains system-wide constants (expected to be configured only at setup).
  - `run.env` contains user session configuration (pretraining, track etc.) as well as information about where to upload your model (S3 bucket and prefix).
  - `docker/.env` remains the home for more static configuration; this is not expected to change between sessions.
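As an illustration of the split between the files, a session-specific `run.env` might look like the fragment below. The variable names shown are examples of typical DRfC-style settings and may differ from the templates generated by your installation; check the generated files for the authoritative names.

```shell
# Illustrative run.env fragment -- names and values are examples only.
DR_WORLD_NAME=reinvent_base               # track to train on
DR_LOCAL_S3_MODEL_PREFIX=rl-deepracer-1   # prefix for this session's model
DR_UPLOAD_S3_BUCKET=my-deepracer-bucket   # bucket to upload the model to
DR_UPLOAD_S3_PREFIX=my-uploaded-model     # prefix within that bucket
```

Session-specific values like these belong in `run.env`, while settings that stay constant across sessions (cloud platform, region, etc.) go in `system.env`.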
DRfC supports a wide set of features to ensure that you can focus on creating the best model:
- User-friendly
- Modes
- Time Trial
- Object Avoidance
- Head-to-Bot
- Training
- Multiple Robomaker instances per Sagemaker (N:1) to improve training progress.
- Multiple training sessions in parallel - each itself being (N:1) if the hardware supports it - to test things out in parallel.
- Connect multiple nodes together (Swarm-mode only) to combine the powers of multiple computers/instances.
- Evaluation
- Evaluate independently from training.
- Save evaluation run to MP4 file in S3.
- Logging
- Training metrics and trace files are stored to S3.
- Optional integration with AWS CloudWatch.
- Optional exposure of Robomaker internal log-files.
- Technology
- Supports both Docker Swarm (used for connecting multiple nodes together) and Docker Compose (used to support OpenGL).
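As a sketch of the multi-node case, the standard Docker CLI is used to join machines into a swarm before the containers are distributed across them; the IP address and token below are placeholders:

```shell
# On the manager node: initialise the swarm (the IP is a placeholder).
docker swarm init --advertise-addr 10.0.0.1

# The init command prints a join command with a token; run it on each
# worker node, e.g.:
docker swarm join --token <worker-token> 10.0.0.1:2377

# Back on the manager: verify that all nodes are part of the swarm.
docker node ls
```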
Full documentation can be found on the Deepracer-for-Cloud GitHub Pages.
- For general support, join the AWS DeepRacer Community. The Community Slack has a channel, #dr-drfc-setup, where the community provides active support.
- Create a GitHub issue if you find an actual code issue, or if the documentation needs updating.