This repository contains all the modules required for the vehicle data uploading pipeline to the EIDF-VM, along with tools to convert uploaded recordings into Segments.ai datasets.
The workflow begins with the Rosbag Uploader, which compresses and transfers ROS bag recordings from the vehicle’s storage to the EIDF-VM.
Once uploaded, the New Rosbag Watchdog monitors the EIDF-VM storage and detects the arrival of new recordings. When a new ROS bag is found, the ROS2 Bag Extractor processes it, exporting its contents as files such as images, point clouds, transformations, and calibration data.
With the exported data ready, the DatasetCreator builds a Segments.ai dataset using both the processed files and the original ROS bag metadata. This stage includes:
- Trajectory Generator – creates a trajectory for each point cloud in the recording and stores the trajectory points in a `.tum` file.
- Data Uploader – transfers the exported files to an S3 bucket for Segments.ai to access.
- S3 Backup Agent – stores a backup of all newly processed ROS bag recordings in an S3 bucket.
This pipeline ensures that data collected from the vehicle is uploaded, processed, backed up, and prepared for use within the Segments.ai platform.
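For reference, the `.tum` trajectory files mentioned above follow the standard TUM trajectory format: one pose per line, consisting of a timestamp, a translation, and a quaternion orientation. A few illustrative lines (all values are made up) might look like:

```
# timestamp tx ty tz qx qy qz qw
1729778133.50 0.000 0.000 0.000 0.0 0.0 0.000 1.000
1729778133.60 0.512 0.031 0.002 0.0 0.0 0.017 0.999
1729778133.70 1.034 0.058 0.003 0.0 0.0 0.034 0.999
```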
Requirements:
- The user has access to the EIDF virtual machine (EIDF-VM) SSD storage `/mnt/vdb/*` and S3
- The user has access to the Segments.ai platform
- The user has defined an SSH alias for the EIDF-VM on the vehicle's server
- Create an SSH alias in `~/.ssh/config` to access the EIDF-VM, if you haven't already. Add these two aliases to the `config` file and make sure `User` points to your EIDF-VM username:

  ```
  Host eidf-gateway
      HostName eidf-gateway.epcc.ed.ac.uk
      User <your_eidf_username>
      IdentityFile ~/.ssh/id_ed25519

  Host eidf-vm
      HostName <eidf_ip>
      User <your_eidf_username>
      IdentityFile ~/.ssh/id_ed25519
      ProxyJump eidf-gateway
  ```

- See EIDF's guide for further reference
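To confirm the alias and jump host are working, a quick check (assuming the aliases above are in place) is to run a command on the VM through the proxy:

```bash
ssh eidf-vm hostname
```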
Set up this repository by defining your access key tokens in a `dataset_keys.env` file. Create the file inside a `keys` directory at the root of this repository:

```bash
cd <your_path>/tartan_forklift
mkdir -p keys && touch ./keys/dataset_keys.env
```

Add the following environment variables to `dataset_keys.env`:
```bash
# EIDF AWS S3
AWS_ACCESS_KEY_ID=my_access_key_id
AWS_SECRET_ACCESS_KEY=my_secret_access_key
AWS_ENDPOINT_URL=my_s3_organisation_url
AWS_BUCKET_NAME=my_bucket_name
AWS_ROSBAG_BACKUP_BUCKET_NAME=my_backup_bucket_name
EIDF_PROJECT_NAME=my_projectxyz
# Segments.ai key
SEGMENTS_API_KEY=my_segment_ai_api_key
```

Both the `dev.sh` and `runtime.sh` scripts will attempt to locate the `dataset_keys.env` file. If the file is missing or incorrectly named, the script will throw an error. File and path names are case-sensitive.
For access credentials, please contact Hector Cruz or Alejandro Bordallo.
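Once the file is populated, one way to sanity-check the S3 credentials (assuming the AWS CLI is installed; the bucket and endpoint values come from your `dataset_keys.env`) is:

```bash
set -a; source ./keys/dataset_keys.env; set +a
aws s3 ls "s3://$AWS_BUCKET_NAME" --endpoint-url "$AWS_ENDPOINT_URL"
```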
To build and run the Docker container interactively, use:
```bash
./runtime.sh -l -p <rosbags_directory> -o <exported_data_directory>
```

where:

- `<rosbags_directory>`: Parent directory containing your ROS bag recordings.
- `<exported_data_directory>`: Parent directory where the data is/will be exported.

The input directories will be mounted at `/opt/ros_ws/rosbags` and `/opt/ros_ws/exported_data` in the container, respectively.
If running in dev mode (`dev.sh`), install the Python modules after starting the container:

```bash
./dev.sh -l -p <rosbags_directory> -o <exported_data_directory>
pip install -e ./scripts
```

- Log in to the EIDF-VM.
- Ensure you have set up the `tartan_forklift` keys as described in the Usage guide.
  - Use the admin S3 account keys and secret from the EIDF Project dashboard.
- Set the `AWS_BUCKET_NAME` and `AWS_ROSBAG_BACKUP_BUCKET_NAME` environment variables in your `dataset_keys.env` as described here.
Start the data_manager:
```bash
cd /your_path/tartan_forklift
./runtime.sh data_manager
```

- Log in to the vehicle's server.
- Default upload sources:
  - `/mnt/sata_ssd_raid/edinburgh` (vehicle server)
  - `/mnt/vdb/data` (EIDF-VM)

  If you need a different location, update `upload_rosbags/upload_config.yaml` (see the sketch below).
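For illustration, the relevant keys in `upload_rosbags/upload_config.yaml` might be set roughly like this (paths are the defaults mentioned above; adjust them to your setup):

```yaml
# Where rosbags are read from on the vehicle server
local_rosbags_directory: /mnt/sata_ssd_raid/edinburgh
# Where compressed rosbags are uploaded to on the EIDF-VM
cloud_upload_directory: /mnt/vdb/data
```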
Activate the Python environment:
```bash
source ~/python_default/bin/activate
```

Start the upload:

```bash
cd ~/tartan_forklift
python3 -m upload_rosbags.upload_rosbags --config ./upload_rosbags/upload_config.yaml --lftp-password <lxo_user_password>
```

When a rosbag recording is fully uploaded, the `data_manager` (on the EIDF-VM) detects it and automatically:
- Creates a dataset in Segments.ai.
- Backs up the rosbag to S3.
Each module is designed to run as a standalone script; pass the `--help` flag to any of them to check its usage and arguments.
This script automates the process of uploading rosbags from the IPAB-RAD autonomous vehicle server to a cloud instance within the EIDF (Edinburgh International Data Facility) infrastructure. It streamlines data collection and transfer by first compressing the rosbags using the MCAP CLI, and then uploading the compressed files. This ensures efficient handling and storage of large datasets generated by vehicle sensors.
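For context, the compression step the script performs corresponds roughly to invoking the MCAP CLI manually, something like the following (flags and output naming are illustrative and may differ from what the script actually does):

```bash
$HOME/mcap compress my_recording_0.mcap -o my_recording_0_compressed.mcap
```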
Dependencies
- Vehicle machine (host)

  - Install Python dependencies:

    ```bash
    pip install colorlog paramiko paramiko_jump
    ```

  - Install MCAP CLI `v0.0.47` for rosbag compression:

    ```bash
    wget -O $HOME/mcap https://github.com/foxglove/mcap/releases/download/releases%2Fmcap-cli%2Fv0.0.47/mcap-linux-amd64
    chmod +x $HOME/mcap
    ```

  - Set up an FTP server by following this guide

- EIDF (cloud)

  - Install `lftp`:

    ```bash
    sudo apt update && sudo apt install lftp
    ```
Usage
To execute the script from the host machine, run:
```bash
python3 -m upload_rosbags.upload_rosbags \
    --config ./upload_rosbags/upload_config.yaml \
    --lftp-password <host_user_password>
# Add the --debug flag to set the DEBUG log level
```

YAML Parameters
The configuration file accepts the following parameters:
- `local_host_user` (str): Username for the host machine.
- `local_hostname` (str): IP address or hostname of the host machine (interface connected to the internet).
- `local_rosbags_directory` (str): Path to the directory on the host machine containing the rosbags.
- `cloud_ssh_alias` (str): SSH alias for the cloud server defined in `~/.ssh/config`. If unset, `cloud_user` and `cloud_hostname` must be provided.
- `cloud_user` (str): Username for the cloud target machine. Ignored if `cloud_ssh_alias` is defined and valid.
- `cloud_hostname` (str): Hostname or IP of the cloud target machine. Ignored if `cloud_ssh_alias` is defined and valid.
- `cloud_upload_directory` (str): Destination directory on the cloud server for uploading compressed files.
- `mcap_bin_path` (str): Full path to the `mcap` CLI binary.
- `mcap_compression_chunk_size` (int): Chunk size in bytes used during MCAP compression.
- `compression_parallel_workers` (int): Number of parallel worker threads for compression.
- `compression_queue_max_size` (int): Maximum number of compressed rosbags allowed in the queue at any time.
See upload_config.yaml for a sample configuration.
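As a rough sketch (not the repository's actual `upload_config.yaml`; all values are placeholders), a configuration using these parameters might look like:

```yaml
local_host_user: lxo
local_hostname: 192.168.1.10
local_rosbags_directory: /mnt/sata_ssd_raid/edinburgh
cloud_ssh_alias: eidf-vm            # if unset, cloud_user and cloud_hostname are required
cloud_user: my_eidf_username
cloud_hostname: 10.0.0.5
cloud_upload_directory: /mnt/vdb/data
mcap_bin_path: /home/lxo/mcap
mcap_compression_chunk_size: 4194304   # 4 MiB
compression_parallel_workers: 4
compression_queue_max_size: 2
```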
Logging
The script logs its actions to a file named <timestamp>_rosbag_upload.log.
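The log file name follows that timestamped pattern; purely as an illustrative sketch (not the script's actual code), a logger combining coloured console output via `colorlog` with such a file could be set up like this:

```python
import logging
from datetime import datetime

import colorlog

# Console handler with coloured level names
console = colorlog.StreamHandler()
console.setFormatter(colorlog.ColoredFormatter("%(log_color)s%(levelname)s%(reset)s %(message)s"))

# File handler named <timestamp>_rosbag_upload.log (timestamp format is illustrative)
log_file = f"{datetime.now().strftime('%Y_%m_%d-%H_%M_%S')}_rosbag_upload.log"
file_handler = logging.FileHandler(log_file)
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("upload_rosbags")
logger.setLevel(logging.INFO)
logger.addHandler(console)
logger.addHandler(file_handler)

logger.info("Upload started")
```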
You can export your rosbags with the following command:
```bash
./runtime.sh bash
cd /opt/ros_ws
ros2 run ros2_bag_exporter bag_exporter --ros-args \
    -p rosbags_directory:=./rosbags/<my_recording_directory> \
    -p output_directory:=./exported_data \
    -p config_file:=./config/av_sensor_export_config.yaml
```

The `config/av_sensor_export_config.yaml` is the default config file that tells `ros2_bag_exporter` which sensor topics and data formats to use.
The exporter will create a directory inside exported_data/. This directory will contain:
- Extracted point clouds (`.pcd`)
- Images (`.jpg`)
- `export_metadata.yaml`
We'll refer to this directory as <data_directory>.
Export the ROSbag recording first and then run:
```bash
./runtime.sh bash
dataset_creator --export_directory ./exported_data/<your_exported_directory> \
    --recording_directory ./rosbags/<your_recording_directory> \
    --dataset_attributes_file ./config/dataset_attributes.json
```

`dataset_creator` runs the following sub-modules in the background, but if you need to run a sub-module individually, here is what to do:
Create a new dataset on the Segments.ai platform if you haven't already. For consistency, name the dataset exactly the same as your exported `<data_directory>` directory. On Segments.ai, datasets follow the format `organisation_name/dataset_name`. Therefore, your full `dataset_name` should be `UniversityOfEdinburgh/<data_directory>_name`, where `UniversityOfEdinburgh` is the organisation name currently in use. This naming convention helps keep your exported data and Segments.ai datasets aligned.
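As a small illustration of the convention (names here are hypothetical):

```bash
data_directory="2024_10_24-14_15_33_haymarket_1"          # your exported <data_directory>
dataset_name="UniversityOfEdinburgh/${data_directory}"    # full Segments.ai dataset name
echo "$dataset_name"   # UniversityOfEdinburgh/2024_10_24-14_15_33_haymarket_1
```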
- Extract the ego trajectory from the ROSbag

  ```bash
  generate_ego_trajectory <my_path_to_rosbag.mcap> <data_directory>
  ```

  A `.tum` file with the same name as your rosbag should appear in `<data_directory>`.
- Upload data to S3

  To upload the extracted data to either the EIDF or Segments.ai AWS S3, run:

  ```bash
  upload_data <dataset_name> <data_directory> eidf
  # Or
  upload_data <dataset_name> <data_directory> segments
  ```

  If no S3 organisation is specified, `eidf` is used by default. After the upload, you should see an `upload_metadata.json` file inside `<data_directory>`.
- Add a multi-sensor sample to Segments.ai

  Run the script:

  ```bash
  add_segmentsai_sample <dataset_name> <sequence_name> <data_directory>
  ```

  where:

  - `<dataset_name>`: The full dataset name on Segments.ai
  - `<sequence_name>`: Desired sequence name for the multi-sensor sample
    - Ensure the sequence name is unique within your dataset; otherwise, the sample will not be uploaded
  - `<data_directory>`: Directory with the extracted rosbags and metadata files
If successful, you will see your new sequence listed in the Samples tab on your dataset page.
```bash
./runtime.sh bash
s3_backup_agent --recordings_list /opt/ros_ws/rosbags/2025_08_12-12_22_49_meadows
```

This script automates the merging of ROS2 bag files using the `ros2 bag convert` command.
The merging is based on a .yaml file that complies with rosbag2 guidelines.
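As a rough sketch, an output definition for `ros2 bag convert` typically specifies at least a `uri` and a storage format; the exact keys supported depend on your rosbag2 version, so treat the following as illustrative and check the rosbag2 "Converting bags" README (see also `rosbag_util/mapping_merge_params.yaml` in this repository):

```yaml
output_bags:
  - uri: mapping          # sub-directory / bag name for the merged output
    storage_id: mcap      # write the merged bag as MCAP
    all: true             # include all topics from the input bags
```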
Before running the Python script, ensure that the Docker container is set up by running the dev.sh script:
```bash
./dev.sh
```

Note: `dev.sh` will automatically mount the `/mnt/vdb/data` directory as the rosbags directory in the container. This is the current default SSD location on the EIDF cloud machine. Use the `-p` flag to specify a different directory.
Once the Docker container is running, you can execute the Python script with the following command:
```bash
merge_rosbags.py --input <input_directory> --config <config_file.yaml> --range <start:end>
```

- `--input <input_directory>`: The directory containing `.mcap` files to be merged.
- `--config <config_file.yaml>`: Path to the YAML configuration file used for the merge.
  - Please refer to the `rosbag_util` directory and the rosbag2 "Converting bags" README for further reference.
- `--range <start:end>` (optional): Range of rosbag split indices to process, in the format `start:end`.
Example:
If we have the following directory structure:
```
└── rosbags
    ├── 2024_10_24-14_15_33_haymarket_1
    │   ├── 2024_10_24-14_15_33_haymarket_1_0.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_1.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_2.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_3.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_4.mcap
    ...
```

We can run the script as:
```bash
merge_rosbags.py --input ./rosbags/2024_10_24-14_15_33_haymarket_1 --range 0:2 --config ./rosbag_util/mapping_merge_params.yaml

# Or omit the range to merge all available rosbags
merge_rosbags.py --input ./rosbags/2024_10_24-14_15_33_haymarket_1 --config ./rosbag_util/mapping_merge_params.yaml
```

The resulting merged `.mcap` file will be saved within the specified directory, under a sub-directory named after the value of the `uri` parameter defined in the YAML file. The file name will be in the format `<current_rosbag_name>_<uri>_<from-to>`. If no range is specified, the name will be `<current_rosbag_name>_<suffix>`.
```
└── rosbags
    ├── 2024_10_24-14_15_33_haymarket_1
    │   ├── 2024_10_24-14_15_33_haymarket_1_0.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_1.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_2.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_3.mcap
    │   ├── 2024_10_24-14_15_33_haymarket_1_4.mcap
    │   └── mapping
    │       ├── 2024_10_24-14_15_33_haymarket_mapping_0-2.mcap
    │       └── 2024_10_24-14_15_33_haymarket_mapping.mcap  # Larger file
    ...
```