SamBradley2024/carla-data-collection

CARLA Data Collection

Python 3.7+ | License: MIT

A professional, extensible framework for generating and collecting scenario data from the CARLA autonomous driving simulator.
Developed for my supervisor's dataset during my internship at Google DeepMind Research, University of York.



Features

  • Scenario Generation: Easily generate diverse driving scenarios with customizable weather, traffic, and obstructions.
  • YAML Configuration: All scenario parameters are defined in human-readable YAML files.
  • Automated Data Logging: Collects sensor, vehicle, and environment data for each run.
  • Traffic & Weather Simulation: Supports dynamic traffic and weather conditions.
  • Unique Output Management: Automatically creates unique output directories for each run.
  • Extensive Testing: Includes pytest-based unit tests for all major functions.
  • Professional Documentation: Sphinx-generated API docs and usage guides.

Repository Structure

  • carla_data_collection/newscenariogenerator.py — Scenario YAML generator
  • carla_data_collection/runscenariofromyaml.py — Scenario runner
  • configs/ — Example scenario YAMLs
  • data/ — Output data and logs
  • tests/ — Unit tests for all major functions
  • docs/ or _build/html/ — Sphinx-generated documentation
  • .devcontainer/devcontainer.json — VS Code Dev Container config

Quick Start

  1. Clone the repository:

    git clone https://github.com/SamBradley2024/carla-data-collection.git
    cd carla-data-collection
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. (Optional) Download the initial dataset: See Download Initial Dataset.

  4. Run the CARLA server (see Setup).

  5. Generate scenarios:

    python3 carla_data_collection/newscenariogenerator.py
  6. Run a scenario:

    python3 carla_data_collection/runscenariofromyaml.py configs/example_scenario.yaml --data_collection

Requirements

  • Python 3.7+
  • CARLA simulator 0.9.10
  • Python dependencies: carla, numpy, opencv-python, pyyaml, orjson, pygame

Install Python packages with:

pip install -r requirements.txt

Setup

Set up CARLA with Docker on Ubuntu 22.04+. This setup uses a packaged build inside a Docker container with NVIDIA support.

Prerequisites

  • Ubuntu 22.04+ (tested on Ubuntu 24.04)
  • Docker
  • NVIDIA GPU with drivers installed
  • Unreal Engine 4.x (from source, e.g. UE_4.26)

Configuration

Edit and export these paths based on your environment:

export UE4_ROOT=/home/your_user/UnrealEngine_4.26   # Path to your Unreal Engine build
export CARLA_DIR=$HOME/carla                        # Path where CARLA will be cloned

1. Install Docker and NVIDIA Container Toolkit

sudo apt update
sudo apt install -y docker.io curl git

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
  | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-docker2
sudo systemctl restart docker

# Allow Docker GUI access (for CarlaUE4Editor)
xhost +local:docker || true

2. Clone the CARLA Repository

git clone -b dev https://github.com/carla-simulator/carla.git "$CARLA_DIR"

3. Export Unreal Engine Path

export UE4_ROOT=/home/your_user/UnrealEngine_4.26

4. Build CARLA Inside Docker

Run the CARLA container:

cd "$CARLA_DIR"
./Scripts/run_container.sh

Once inside the container, run the following:

cd /workspaces/carla
./Update.sh
make PythonAPI
make CarlaUE4Editor
make build.utils
make package

Packaged build will be available in:

/workspaces/carla/Dist/

Run CARLA

I used a VS Code Dev Container and opened the main carla folder in it after creating the config file:

nano .devcontainer/devcontainer.json

with the following contents:

{
  "name": "CARLA UE4 Dev (jammy)",
  "image": "carla-ue4-jammy-dev",
  "updateRemoteUserUID": false,
  "customizations": {
    "vscode": {
      "settings": {
        "terminal.integrated.shell.linux": "bash"
      },
      "extensions": [
        "ms-vscode.cpptools"
      ]
    }
  },
  "postStartCommand": "bash",
  "runArgs": [
    "--rm",
    "--name", "carla-ue4-jammy-devcontainer",
    "--hostname", "carla-devcontainer",
    "--env", "DISPLAY=${localEnv:DISPLAY}",
    "--volume", "/tmp/.X11-unix:/tmp/.X11-unix",
    "--volume", "/usr/share/vulkan/icd.d/nvidia_icd.json:/usr/share/vulkan/icd.d/nvidia_icd.json",
    "--volume", "${localEnv:UE4_ROOT}:/opt/UE4.26",
    "--gpus", "all",
    "-p", "2000:2000"
  ]
}

Start the packaged CARLA server:

./Dist/CARLA_Shipping_*/LinuxNoEditor/CarlaUE4.sh

Or, from inside the LinuxNoEditor directory, run it headless at the Epic quality level:

./CarlaUE4.sh -RenderOffScreen -quality-level=Epic

Python Client Setup

Clone this repository into carla/PythonAPI. Then, from inside the workspace, run:

sudo apt update
sudo apt install python3-pip -y
pip3 install pyyaml numpy orjson opencv-python pygame

Usage

Running a Scenario from YAML

python3 carla_data_collection/runscenariofromyaml.py configs/example_scenario.yaml --data_collection

Generating a set of scenarios

python3 carla_data_collection/newscenariogenerator.py
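
To run every config in configs/ back to back, the runner command above can be wrapped in a small script. This is only a sketch: build_commands and run_all are hypothetical helpers that are not part of this repository, and the script assumes that every *.yaml file in configs/ is a scenario and that it is started from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Runner script path as used in the Usage section.
RUNNER = "carla_data_collection/runscenariofromyaml.py"


def build_commands(config_dir: str = "configs") -> List[List[str]]:
    """Build one runner command per YAML file, in sorted file-name order."""
    return [
        [sys.executable, RUNNER, str(path), "--data_collection"]
        for path in sorted(Path(config_dir).glob("*.yaml"))
    ]


def run_all(config_dir: str = "configs") -> None:
    """Run the scenarios one after another, stopping at the first failure."""
    for cmd in build_commands(config_dir):
        # check=True raises CalledProcessError if the runner exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling run_all() with the CARLA server already running executes the scenarios sequentially. build_commands() on its own is handy for a dry run, since it only lists what would be executed.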

Example Output

After running a scenario, you will find output data (JSON logs, images) in a uniquely named directory under data/.
Example:

data/
  scenario_20250811_1530/
    frame_00001.json.gz
    frame_00002.json.gz
    ...
    camera_00001.png
    ...
    config.yaml
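
The frame logs are gzip-compressed JSON, so they can be read back with the standard library alone. A minimal sketch, assuming the frame_*.json.gz naming shown above; iter_frames is a hypothetical helper, and nothing is assumed about the fields inside each frame:

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_frames(run_dir: str) -> Iterator[Tuple[str, Any]]:
    """Yield (file name, decoded JSON) for each frame log in a run directory."""
    # Zero-padded frame numbers mean a plain sort gives frame order.
    for path in sorted(Path(run_dir).glob("frame_*.json.gz")):
        # "rt" decompresses and decodes to text in one step.
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            yield path.name, json.load(fh)
```

For example, `for name, frame in iter_frames("data/scenario_20250811_1530"): ...` walks the run shown above one frame at a time without loading the whole run into memory.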

Download Initial Dataset

You can download the compressed dataset (configs and data; ~50 GB compressed, ~500 GB extracted) from the following link:
https://drive.google.com/file/d/1SQbpy4k_7yYwmBDxhFngg4_YWA4a26Hn/view?usp=sharing

Once downloaded, extract it in the root of the project:

tar -xvzf initialdataset.tar.gz

Notes

  • Ensure the CARLA server is running on localhost, port 2000, before running the scripts.
  • Output data and images will be saved in uniquely named output directories under the configured output folder.
  • Modify the YAML configs in configs/ to customize weather, traffic, sensors, and other settings.
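
A quick way to confirm that something is listening on the server port before starting a long run is a plain TCP connection attempt. This is a sketch: carla_port_open is a hypothetical helper, and it only shows that the port accepts connections, not that the listener is actually CARLA.

```python
import socket


def carla_port_open(host: str = "localhost", port: int = 2000,
                    timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

If carla_port_open() returns False, start the server first (see Run CARLA) and check that port 2000 is published from the container.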

Contact

I can be reached by email at [email protected] until 22nd August.


Contributing

  • Please use Black for code formatting.
  • Add type hints and docstrings to all new functions.
  • Use the logging module instead of print statements.
  • Submit pull requests with clear descriptions.

Code Structure

  • carla_data_collection/newscenariogenerator.py — Generates scenario YAMLs and launches runs.
  • carla_data_collection/runscenariofromyaml.py — Runs a scenario from a YAML config.
  • configs/ — Example scenario YAMLs.
  • data/ — Output data and logs.
  • tests/ — Unit tests for all major functions.
  • docs/ or _build/html/ — Sphinx-generated documentation.

Example Scenario YAML

weather:
  rain: 20.0
  fog: 10.0
  cloudiness: 50.0
  sun_altitude_angle: 45.0
  sun_azimuth_angle: 90.0
traffic:
  traffic_needed: true
  use_sumo: false
  number_of_vehicles: 5
sensors:
  lane_distance: 2.0
  vehicle_distance_radius: 50.0
obstructions:
  enabled: false
  tile_percentage: 0
output_dir: data/example_run
max_frames: 100
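
When editing configs by hand, a small sanity check before launching a run can catch typos early. The sketch below works on the already-parsed config (a dict). check_scenario is a hypothetical helper, not the runner's own validation, and its rules are assumptions: the four sections from the example are treated as required, rain, fog and cloudiness are treated as 0-100 intensities like CARLA's weather parameters, and max_frames must be a positive integer.

```python
from typing import Any, Dict, List

# Section and key names are taken from the example config above.
SECTIONS = ("weather", "traffic", "sensors", "obstructions")
PERCENT_KEYS = ("rain", "fog", "cloudiness")


def _is_number(value: Any) -> bool:
    """True for int/float, but not for bool (which is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_scenario(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []
    for name in SECTIONS:
        if not isinstance(cfg.get(name), dict):
            problems.append(f"missing or malformed section: {name}")

    weather = cfg.get("weather")
    if isinstance(weather, dict):
        for key in PERCENT_KEYS:
            value = weather.get(key)
            # Assumed range, see the note above the code.
            if not _is_number(value) or not 0 <= value <= 100:
                problems.append(f"weather.{key} should be a number in [0, 100]")

    max_frames = cfg.get("max_frames")
    if not _is_number(max_frames) or not isinstance(max_frames, int) or max_frames <= 0:
        problems.append("max_frames should be a positive integer")
    return problems
```

With PyYAML (already a dependency), the dict comes from `yaml.safe_load` on the opened config file, and the returned list can be logged or used to abort the run.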

Changelog

See CHANGELOG.md for version history and updates.


License

This project is licensed under the MIT License.
