A professional, extensible framework for generating and collecting scenario data from the CARLA autonomous driving simulator.
Developed for my supervisor's dataset during my internship at Google DeepMind Research, University of York.
- Features
- Quick Start
- Requirements
- Setup
- Usage
- Example Output
- Download Initial Dataset
- Notes
- Contact
- Contributing
- Code Structure
- Example Scenario YAML
- Changelog
- License
- VS Code Dev Container
- Scenario Generation: Easily generate diverse driving scenarios with customizable weather, traffic, and obstructions.
- YAML Configuration: All scenario parameters are defined in human-readable YAML files.
- Automated Data Logging: Collects sensor, vehicle, and environment data for each run.
- Traffic & Weather Simulation: Supports dynamic traffic and weather conditions.
- Unique Output Management: Automatically creates unique output directories for each run.
- Extensive Testing: Includes pytest-based unit tests for all major functions.
- Professional Documentation: Sphinx-generated API docs and usage guides.
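
The unique output directories mentioned in the feature list could be created along the following lines. This is a minimal sketch, not the project's actual code: the function name `make_run_dir` is hypothetical, the `scenario_YYYYMMDD_HHMM` pattern is inferred from the example output directory name in this README, and the numeric suffix used on a name clash is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(base: Path, now: Optional[datetime] = None) -> Path:
    """Create and return a new run directory under `base` (hypothetical helper)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M")
    candidate = base / f"scenario_{stamp}"
    suffix = 1
    while True:
        try:
            # exist_ok=False makes mkdir fail if another run already took the name
            candidate.mkdir(parents=True, exist_ok=False)
            return candidate
        except FileExistsError:
            # Assumed clash policy: append _1, _2, ... until a free name is found
            candidate = base / f"scenario_{stamp}_{suffix}"
            suffix += 1
```

Letting `mkdir` raise on an existing directory, rather than checking for existence first, avoids a race when two runs start within the same minute.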
File Structure:
- carla_data_collection/newscenariogenerator.py: Scenario YAML generator
- carla_data_collection/runscenariofromyaml.py: Scenario runner
- configs/: Example scenario YAMLs
- data/: Output data and logs
- tests/: Unit tests for all major functions
- docs/ or _build/html/: Sphinx-generated documentation
- .devcontainer/devcontainer.json: VS Code Dev Container config
- Clone the repository:
  git clone https://github.com/yourusername/carla-data-collection.git
  cd carla-data-collection
- Install Python dependencies:
  pip install -r requirements.txt
- (Optional) Download the initial dataset: See Download Initial Dataset.
- Run the CARLA server (see Setup).
- Generate scenarios:
  python3 carla_data_collection/newscenariogenerator.py
- Run a scenario:
  python3 carla_data_collection/runscenariofromyaml.py configs/example_scenario.yaml --data_collection
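
Because the generate and run steps need a running CARLA server, a quick pre-flight check can save a confusing failure. The sketch below only tests whether something accepts TCP connections on the port; it does not verify that the listener is actually CARLA. The function name `carla_port_open` is hypothetical, and the defaults of `localhost` and port 2000 are the values this README uses elsewhere.

```python
import socket


def carla_port_open(host: str = "localhost", port: int = 2000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `carla_port_open()` with no arguments before launching a scenario gives a simple yes/no answer on whether the server port is reachable.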
- Python 3.7+
- CARLA simulator 0.9.10
- Python dependencies:
carla, numpy, opencv-python, pyyaml, orjson, pygame
Install Python packages with:
pip install -r requirements.txt

Set up CARLA with Docker on Ubuntu 22.04+. This setup uses a packaged build inside a Docker container with NVIDIA support.
- Ubuntu 22.04+, tested on Ubuntu 24.04
- Docker
- NVIDIA GPU with drivers installed
- Unreal Engine 4.x (from source, e.g. UE_4.26)
Edit and export these paths based on your environment:
export UE4_ROOT=/home/your_user/UnrealEngine_4.26 # Path to your Unreal Engine build
export CARLA_DIR=$HOME/carla # Path where CARLA will be cloned

sudo apt update
sudo apt install -y docker.io curl git
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
| sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-docker2
sudo systemctl restart docker
# Allow Docker GUI access (for CarlaUE4Editor)
xhost +local:docker || true

git clone -b dev https://github.com/carla-simulator/carla.git "$CARLA_DIR"
export UE4_ROOT=/home/your_user/UnrealEngine_4.26

Run the CARLA container:
cd "$CARLA_DIR"
./Scripts/run_container.sh

Once inside the container, run the following:
cd /workspaces/carla
./Update.sh
make PythonAPI
make CarlaUE4Editor
make build.utils
make package

The packaged build will be available in:
/workspaces/carla/Dist/

I used a VS Code Dev Container and loaded the main carla folder after creating the container config:
nano .devcontainer/devcontainer.json

Then I used this:
{
  "name": "CARLA UE4 Dev (jammy)",
  "image": "carla-ue4-jammy-dev",
  "updateRemoteUserUID": false,
  "customizations": {
    "vscode": {
      "settings": {
        "terminal.integrated.shell.linux": "bash"
      },
      "extensions": [
        "ms-vscode.cpptools"
      ]
    }
  },
  "postStartCommand": "bash",
  "runArgs": [
    "--rm",
    "--name", "carla-ue4-jammy-devcontainer",
    "--hostname", "carla-devcontainer",
    "--env", "DISPLAY=${localEnv:DISPLAY}",
    "--volume", "/tmp/.X11-unix:/tmp/.X11-unix",
    "--volume", "/usr/share/vulkan/icd.d/nvidia_icd.json:/usr/share/vulkan/icd.d/nvidia_icd.json",
    "--volume", "${localEnv:UE4_ROOT}:/opt/UE4.26",
    "--gpus", "all",
    "-p", "2000:2000"
  ]
}

Start the CARLA server from the packaged build:
./Dist/CARLA_Shipping_*/LinuxNoEditor/CarlaUE4.sh

Or, with off-screen rendering at Epic quality:
./CarlaUE4.sh -RenderOffScreen -quality-level=Epic

Clone this repository into carla/PythonAPI. Then, while in the workspace:
sudo apt update
sudo apt install python3-pip -y
pip3 install pyyaml numpy orjson opencv-python pygame

Run a scenario:
python3 carla_data_collection/runscenariofromyaml.py configs/example_scenario.yaml --data_collection

Generate scenarios:
python3 carla_data_collection/newscenariogenerator.py

After running a scenario, you will find output data (JSON logs, images) in a uniquely named directory under data/.
Example:
data/
  scenario_20250811_1530/
    frame_00001.json.gz
    frame_00002.json.gz
    ...
    camera_00001.png
    ...
    config.yaml
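
To inspect a finished run, the gzipped frame logs can be read back with the standard library alone. The sketch below assumes each `frame_*.json.gz` file holds a single JSON document, which the file names suggest but this README does not state; the function name `iter_frames` is hypothetical. It uses the built-in `json` module rather than `orjson` so that it has no third-party dependencies.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_frames(run_dir: Path) -> Iterator[Tuple[str, Any]]:
    """Yield (file name, decoded JSON) for each frame_*.json.gz, sorted by name."""
    for path in sorted(run_dir.glob("frame_*.json.gz")):
        # "rt" opens the gzip stream in text mode so json.load can read it directly
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            yield path.name, json.load(fh)
```

For example, `for name, frame in iter_frames(Path("data/scenario_20250811_1530")): ...` walks the frames in order, since the zero-padded names sort chronologically.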
You can download the zipped dataset (configs and data; ~50 GB zipped, ~500 GB unzipped) from the following link:
https://drive.google.com/file/d/1SQbpy4k_7yYwmBDxhFngg4_YWA4a26Hn/view?usp=sharing
Once downloaded, extract it in the root of the project:
tar -xvzf initialdataset.tar.gz

- Ensure the CARLA server is running on localhost port 2000 before running scripts.
- Output data and images will be saved in uniquely named output directories under the configured output folder.
- Modify the YAML configs in configs/ to customize weather, traffic, sensors, and other settings.
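
Rather than editing each YAML by hand, variants of a base config can be derived programmatically. The sketch below works on plain dictionaries, as one would get from `yaml.safe_load` in the `pyyaml` package listed in the requirements. The function name `with_overrides` is hypothetical, and the keys in the example are copied from the example scenario YAML in this README; whether the scenario runner accepts every combination is not something this sketch checks.

```python
import copy
from typing import Any, Dict


def with_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` merged in, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge section by section so untouched keys in the section survive
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative base config using key names from the example scenario YAML
base = {
    "weather": {"rain": 20.0, "fog": 10.0, "cloudiness": 50.0},
    "traffic": {"traffic_needed": True, "number_of_vehicles": 5},
    "max_frames": 100,
}
rainy = with_overrides(
    base, {"weather": {"rain": 80.0}, "traffic": {"number_of_vehicles": 20}}
)
```

The merged dictionary can then be written to a new file in configs/ with `yaml.safe_dump`, leaving the base config untouched.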
I will be reachable by email at [email protected] until 22nd August.
- Please use Black for code formatting.
- Add type hints and docstrings to all new functions.
- Use the logging module instead of print statements.
- Submit pull requests with clear descriptions.
- carla_data_collection/newscenariogenerator.py: Generates scenario YAMLs and launches runs.
- carla_data_collection/runscenariofromyaml.py: Runs a scenario from a YAML config.
- configs/: Example scenario YAMLs.
- data/: Output data and logs.
- tests/: Unit tests for all major functions.
- docs/ or _build/html/: Sphinx-generated documentation.
weather:
  rain: 20.0
  fog: 10.0
  cloudiness: 50.0
  sun_altitude_angle: 45.0
  sun_azimuth_angle: 90.0
traffic:
  traffic_needed: true
  use_sumo: false
  number_of_vehicles: 5
sensors:
  lane_distance: 2.0
  vehicle_distance_radius: 50.0
obstructions:
  enabled: false
  tile_percentage: 0
output_dir: data/example_run
max_frames: 100

See CHANGELOG.md for version history and updates.
This project is licensed under the MIT License.
