CARLA Data Collection

A professional, extensible framework for generating and collecting scenario data from the CARLA autonomous driving simulator.
Developed for my supervisor's dataset during my internship at Google DeepMind Research, University of York.

Features

Scenario Generation: Easily generate diverse driving scenarios with customizable weather, traffic, and obstructions.
YAML Configuration: All scenario parameters are defined in human-readable YAML files.
Automated Data Logging: Collects sensor, vehicle, and environment data for each run.
Traffic & Weather Simulation: Supports dynamic traffic and weather conditions.
Unique Output Management: Automatically creates unique output directories for each run.
Extensive Testing: Includes pytest-based unit tests for all major functions.
Professional Documentation: Sphinx-generated API docs and usage guides.
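
The unique output management described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function name `make_unique_run_dir` is hypothetical, and the timestamped naming scheme is only assumed from the example directory name shown under Example Output.

```python
from datetime import datetime
from pathlib import Path


def make_unique_run_dir(base: Path, prefix: str = "scenario") -> Path:
    """Create and return a run directory under base that did not exist before.

    Hypothetical helper: the real naming scheme used by the project may differ.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M")
    candidate = base / f"{prefix}_{stamp}"
    suffix = 1
    while True:
        try:
            # exist_ok=False makes mkdir fail if the name is already taken,
            # so two runs started in the same minute never share a directory.
            candidate.mkdir(parents=True, exist_ok=False)
            return candidate
        except FileExistsError:
            candidate = base / f"{prefix}_{stamp}_{suffix}"
            suffix += 1
```

Letting `mkdir` raise on an existing name, rather than checking `exists()` first, avoids a race when two runs start at nearly the same time.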

Structure of the repository

File Structure:

carla_data_collection/newscenariogenerator.py — Scenario YAML generator
carla_data_collection/runscenariofromyaml.py — Scenario runner
configs/ — Example scenario YAMLs
data/ — Output data and logs
tests/ — Unit tests for all major functions
docs/ or _build/html/ — Sphinx-generated documentation
.devcontainer/devcontainer.json — VS Code Dev Container config

Quick Start

Clone the repository:

git clone https://github.com/yourusername/carla-data-collection.git
cd carla-data-collection

Install Python dependencies:
```
pip install -r requirements.txt
```
(Optional) Download the initial dataset: See Download Initial Dataset.
Run the CARLA server (see Setup).

Generate scenarios:

python3 carla_data_collection/newscenariogenerator.py

Run a scenario:

python3 carla_data_collection/runscenariofromyaml.py configs/example_scenario.yaml --data_collection

Requirements

Python 3.7+
CARLA simulator 0.9.10
Python dependencies: carla, numpy, opencv-python, pyyaml, orjson, pygame

Install Python packages with:

pip install -r requirements.txt

Setup

Set up CARLA with Docker on Ubuntu 22.04+. This setup uses a packaged build inside a Docker container with NVIDIA support.

Prerequisites

Ubuntu 22.04+, tested on Ubuntu 24.04
Docker
NVIDIA GPU with drivers installed
Unreal Engine 4.x (from source, e.g. UE_4.26)

Configuration

Edit and export these paths based on your environment:

export UE4_ROOT=/home/your_user/UnrealEngine_4.26   # Path to your Unreal Engine build
export CARLA_DIR=$HOME/carla                        # Path where CARLA will be cloned

1. Install Docker and NVIDIA Container Toolkit

sudo apt update
sudo apt install -y docker.io curl git

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
  | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-docker2
sudo systemctl restart docker

# Allow Docker GUI access (for CarlaUE4Editor)
xhost +local:docker || true

2. Clone the CARLA Repository

git clone -b dev https://github.com/carla-simulator/carla.git "$CARLA_DIR"

3. Export Unreal Engine Path

export UE4_ROOT=/home/your_user/UnrealEngine_4.26

4. Build CARLA Inside Docker

Run the CARLA container:

cd "$CARLA_DIR"
./Scripts/run_container.sh

Once inside the container, run the following:

cd /workspaces/carla
./Update.sh
make PythonAPI
make CarlaUE4Editor
make build.utils
make package

The packaged build will be available in:

/workspaces/carla/Dist/

Run CARLA

I used a VS Code Dev Container and opened the main carla folder in it after creating the container config:

nano .devcontainer/devcontainer.json

The file contents I used:

{
  "name": "CARLA UE4 Dev (jammy)",
  "image": "carla-ue4-jammy-dev",
  "updateRemoteUserUID": false,
  "customizations": {
    "vscode": {
      "settings": {
        "terminal.integrated.shell.linux": "bash"
      },
      "extensions": [
        "ms-vscode.cpptools"
      ]
    }
  },
  "postStartCommand": "bash",
  "runArgs": [
    "--rm",
    "--name", "carla-ue4-jammy-devcontainer",
    "--hostname", "carla-devcontainer",
    "--env", "DISPLAY=${localEnv:DISPLAY}",
    "--volume", "/tmp/.X11-unix:/tmp/.X11-unix",
    "--volume", "/usr/share/vulkan/icd.d/nvidia_icd.json:/usr/share/vulkan/icd.d/nvidia_icd.json",
    "--volume", "${localEnv:UE4_ROOT}:/opt/UE4.26",
    "--gpus", "all",
    "-p", "2000:2000"
  ]
}

Start the packaged CARLA server:

./Dist/CARLA_Shipping_*/LinuxNoEditor/CarlaUE4.sh

Or run headlessly with epic quality:

./CarlaUE4.sh -RenderOffScreen -quality-level=Epic

Python Client Setup

Clone this repository into carla/PythonAPI. Then, from the workspace, install the client dependencies:

sudo apt update
sudo apt install python3-pip -y
pip3 install pyyaml numpy orjson opencv-python pygame

Usage

Running a Scenario from YAML

python3 carla_data_collection/runscenariofromyaml.py configs/example_scenario.yaml --data_collection
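
To run every YAML file in configs/ one after another, the command above could be wrapped in a small script along these lines. This is only a sketch: `build_command` and `run_all` are hypothetical helper names, and it assumes the runner accepts exactly the arguments shown above and is started from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List

# Path to the runner script, relative to the repository root.
RUNNER = "carla_data_collection/runscenariofromyaml.py"


def build_command(config: Path) -> List[str]:
    """Build the same command line as shown above for one YAML config."""
    return [sys.executable, RUNNER, str(config), "--data_collection"]


def run_all(config_dir: Path) -> Dict[str, int]:
    """Run each *.yaml in config_dir in sorted order; map file name to exit code."""
    results = {}
    for config in sorted(config_dir.glob("*.yaml")):
        # check=False: one failing scenario should not abort the whole batch.
        completed = subprocess.run(build_command(config), check=False)
        results[config.name] = completed.returncode
    return results


if __name__ == "__main__":
    for name, code in run_all(Path("configs")).items():
        print(f"{name}: exit code {code}")
```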

Generating a set of scenarios

python3 carla_data_collection/newscenariogenerator.py
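
For readers curious what scenario generation might involve, here is a rough sketch of sampling randomized scenario dictionaries that mirror the layout in Example Scenario YAML. It is not the generator's actual logic: the function names, the value ranges (0 to 100 for rain, fog, and cloudiness, following CARLA's weather parameter conventions), the vehicle count range, and the `data/generated_NNN` output naming are all assumptions made for illustration.

```python
import random
from typing import Any, Dict, List


def sample_scenario(rng: random.Random, index: int) -> Dict[str, Any]:
    """Sample one scenario dictionary (hypothetical ranges, see lead-in)."""
    return {
        "weather": {
            "rain": round(rng.uniform(0.0, 100.0), 1),
            "fog": round(rng.uniform(0.0, 100.0), 1),
            "cloudiness": round(rng.uniform(0.0, 100.0), 1),
            "sun_altitude_angle": round(rng.uniform(-90.0, 90.0), 1),
            "sun_azimuth_angle": round(rng.uniform(0.0, 360.0), 1),
        },
        "traffic": {
            "traffic_needed": True,
            "use_sumo": False,
            "number_of_vehicles": rng.randint(0, 20),
        },
        "output_dir": f"data/generated_{index:03d}",
        "max_frames": 100,
    }


def generate_scenarios(count: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate count scenarios; the same seed always yields the same set."""
    rng = random.Random(seed)
    return [sample_scenario(rng, i) for i in range(count)]
```

Seeding a dedicated `random.Random` instance keeps a generated set reproducible, which is useful when a dataset has to be regenerated later.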

Example Output

After running a scenario, you will find output data (JSON logs, images) in a uniquely named directory under data/.
Example:

data/
  scenario_20250811_1530/
    frame_00001.json.gz
    frame_00002.json.gz
    ...
    camera_00001.png
    ...
    config.yaml
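
The gzip-compressed frame logs in a run directory like the one above can be read back with the standard library. This is a minimal sketch: `iter_frames` is a hypothetical helper, and it assumes only the `frame_*.json.gz` naming shown in the example, not any particular structure inside each JSON file.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_frames(run_dir: Path) -> Iterator[Tuple[str, Any]]:
    """Yield (file name, decoded JSON) for each frame log, in file-name order."""
    for path in sorted(run_dir.glob("frame_*.json.gz")):
        # "rt" opens the gzip stream in text mode so json.load can read it.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield path.name, json.load(handle)


if __name__ == "__main__":
    for name, frame in iter_frames(Path("data/scenario_20250811_1530")):
        print(name, type(frame).__name__)
```

Because the frame numbers are zero-padded, sorting by file name also sorts by frame order.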

Download Initial Dataset

You can download the zipped dataset (configs and data; ~50 GB compressed, ~500 GB extracted) from the following link:
https://drive.google.com/file/d/1SQbpy4k_7yYwmBDxhFngg4_YWA4a26Hn/view?usp=sharing

Once downloaded, extract it in the root of the project:

tar -xvzf initialdataset.tar.gz

Notes

Ensure CARLA server is running on localhost port 2000 before running scripts.
Output data and images will be saved in uniquely named output directories under the configured output folder.
Modify the YAML configs in configs/ to customize weather, traffic, sensors, and other settings.
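
A quick way to confirm that something is listening on localhost port 2000 before launching a script is a plain TCP connection attempt. The sketch below uses only the standard library; `carla_port_open` is a hypothetical helper name. It checks only that the port accepts connections, not that the listener is actually a CARLA server.

```python
import socket


def carla_port_open(host: str = "localhost", port: int = 2000,
                    timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if carla_port_open():
        print("Port 2000 is accepting connections.")
    else:
        print("Nothing is listening on localhost:2000; start the CARLA server first.")
```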

Contact

I'll be reachable by email at [email protected] until 22nd August.

Contributing

Please use Black for code formatting.
Add type hints and docstrings to all new functions.
Use the logging module instead of print statements.
Submit pull requests with clear descriptions.

Code Structure

carla_data_collection/newscenariogenerator.py — Generates scenario YAMLs and launches runs.
carla_data_collection/runscenariofromyaml.py — Runs a scenario from a YAML config.
configs/ — Example scenario YAMLs.
data/ — Output data and logs.
tests/ — Unit tests for all major functions.
docs/ or _build/html/ — Sphinx-generated documentation.

Example Scenario YAML

weather:
  rain: 20.0
  fog: 10.0
  cloudiness: 50.0
  sun_altitude_angle: 45.0
  sun_azimuth_angle: 90.0
traffic:
  traffic_needed: true
  use_sumo: false
  number_of_vehicles: 5
sensors:
  lane_distance: 2.0
  vehicle_distance_radius: 50.0
obstructions:
  enabled: false
  tile_percentage: 0
output_dir: data/example_run
max_frames: 100
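
Before starting a long run, it can help to sanity-check a config like the one above. The sketch below validates an already-parsed config dictionary (for example the result of `yaml.safe_load` from PyYAML, which is among the listed dependencies). The helper `validate_config` is hypothetical, and the rules it applies (0 to 100 for rain, fog, and cloudiness; a required output_dir; a positive max_frames) are assumptions based on the example, not rules confirmed by the runner.

```python
from typing import Any, Dict, List

# Weather keys assumed to be percentages in the range 0-100.
PERCENT_KEYS = ("rain", "fog", "cloudiness")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    weather = config.get("weather") or {}
    for key in PERCENT_KEYS:
        value = weather.get(key)
        if value is None:
            continue  # missing keys are left to the runner's defaults
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 100.0:
            problems.append(f"weather.{key} should be a number within 0-100")
    vehicles = (config.get("traffic") or {}).get("number_of_vehicles", 0)
    if not isinstance(vehicles, int) or vehicles < 0:
        problems.append("traffic.number_of_vehicles should be a non-negative integer")
    if not config.get("output_dir"):
        problems.append("output_dir is missing")
    max_frames = config.get("max_frames")
    if not isinstance(max_frames, int) or max_frames <= 0:
        problems.append("max_frames should be a positive integer")
    return problems


if __name__ == "__main__":
    example = {
        "weather": {"rain": 20.0, "fog": 10.0, "cloudiness": 50.0},
        "traffic": {"traffic_needed": True, "use_sumo": False, "number_of_vehicles": 5},
        "output_dir": "data/example_run",
        "max_frames": 100,
    }
    print(validate_config(example) or "no problems found")
```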

Changelog

See CHANGELOG.md for version history and updates.

License

This project is licensed under the MIT License.
