Sanctuaria-Gaze


Sanctuaria-Gaze is a multimodal dataset featuring egocentric recordings from 40 visits to four architecturally and culturally significant sanctuaries in Italy. Collected using Meta Project Aria Glasses, the dataset includes synchronized RGB videos, raw gaze data, head motion, and environmental point cloud data, totaling over 4 hours of recordings.


Alongside the data, we provide an open-source framework for automatic detection and analysis of Areas of Interest (AOIs), enabling gaze-based research without manual annotation.


Dataset Overview

The dataset is publicly available at the Hugging Face Hub.

This dataset represents the largest multimodal egocentric dataset focused on religious heritage. Each recording captures a real-world visual exploration of sacred spaces, allowing researchers to analyze human visual attention, spatial navigation, and interaction with architectural and religious elements.


Data Acquisition

Participants were equipped with Meta Project Aria Glasses, which integrate multiple synchronized sensors.
Each video was captured at a resolution of 1408×1408 pixels and a frame rate of 15 fps.

The following sensor streams were recorded:

| Modality | Sensor | Frequency | Description |
|---|---|---|---|
| RGB Video | RGB camera | 15 Hz | Egocentric view of the environment |
| Gaze | Eye-tracking cameras | 30 Hz | Timestamped gaze coordinates for visual attention analysis |
| Depth & SLAM | SLAM cameras | 15 Hz | Depth estimation and 3D scene reconstruction |
| IMU | Accelerometer / Gyroscope | 1000 Hz / 800 Hz | Head motion and inertial measurements |
| Magnetometer | Magnetic field sensor | 10 Hz | Orientation and spatial context |
| Barometer | Pressure sensor | 50 Hz | Altitude and environmental pressure |

Experimental Protocol

Each participant was asked to freely explore one of the churches while wearing the glasses, simulating a natural visit.
They were given complete autonomy over their movements and activities, without specific instructions or tasks.

Before every recording, the glasses were calibrated following the official Project Aria calibration procedure to ensure accurate gaze tracking.
To minimize social or environmental interference, each participant explored the church individually, ensuring undisturbed, natural behavior.

  • Participants: 20
  • Sequences: 40 (10 per sanctuary)
  • Average duration: 6.7 minutes
  • Total duration: 4.47 hours
  • Total frames: 241,355

Analysis Framework

Along with the dataset, we provide an open-source framework for AOI detection and gaze-based analysis.

Features

  • Batch or single-file processing
  • IDT-based scanpath generation
  • Automatic AOI detection with pretrained models
  • Frame extraction and annotation
  • Annotated video generation
  • Configurable command-line interface

Installation

Clone the repository and install dependencies:

git clone https://github.com/aimagelab/Sanctuaria-Gaze.git
cd Sanctuaria-Gaze
pip install -r requirements.txt

Usage

Single File

python annotate.py --idt --verbose path/to/subject_gaze.csv path/to/subject.mp4

Batch Folder

python annotate.py --idt --verbose path/to/folder/

Options

  • --idt : Run IDT scanpath generation
  • --no-extract : Skip frame extraction
  • --no-annotate : Skip annotation
  • --no-video : Skip video creation
  • --idt-dis-threshold FLOAT : Set IDT dispersion threshold (default: 0.05)
  • --idt-dur-threshold INT : Set IDT duration threshold (default: 100)
  • --stop-frame INT : Stop after this frame number
  • --verbose : Enable verbose logging
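
The `--idt-dis-threshold` and `--idt-dur-threshold` options control a dispersion-based fixation detector. As a rough illustration only (not the repository's implementation), a minimal I-DT pass over normalized gaze coordinates might look like the sketch below; it assumes the dispersion threshold is in normalized units and the duration threshold in seconds, which may differ from the tool's actual units:

```python
import numpy as np

def idt_fixations(x, y, t, dis_threshold=0.05, dur_threshold=0.1):
    """Minimal I-DT sketch: classify fixations from gaze samples.

    x, y : normalized gaze coordinates in [0, 1]
    t    : timestamps in seconds
    dis_threshold : max dispersion, (max-min in x) + (max-min in y)
    dur_threshold : minimum fixation duration in seconds
    Returns a list of (start_time, end_time, centroid_x, centroid_y).
    """
    fixations = []
    i, n = 0, len(t)
    while i < n:
        # Grow an initial window spanning at least the minimum duration.
        j = i
        while j < n and t[j] - t[i] < dur_threshold:
            j += 1
        if j >= n:
            break
        dispersion = (x[i:j+1].max() - x[i:j+1].min()) + (y[i:j+1].max() - y[i:j+1].min())
        if dispersion <= dis_threshold:
            # Expand the window while dispersion stays under the threshold.
            while j + 1 < n:
                xs, ys = x[i:j+2], y[i:j+2]
                if (xs.max() - xs.min()) + (ys.max() - ys.min()) > dis_threshold:
                    break
                j += 1
            fixations.append((t[i], t[j], x[i:j+1].mean(), y[i:j+1].mean()))
            i = j + 1
        else:
            i += 1
    return fixations
```

Note that the repository's `--idt-dur-threshold` default of 100 suggests milliseconds or samples rather than seconds; check the tool's output before relying on a particular unit.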

Input Format

The input _gaze.csv file must include the following columns:

  • gaze_timestamp
  • world_index
  • confidence
  • norm_pos_x
  • norm_pos_y

Each row represents a single gaze sample, with normalized gaze positions (norm_pos_x, norm_pos_y) ranging from 0 to 1 and timestamps (gaze_timestamp) in seconds. The confidence column indicates the reliability of each gaze point, and world_index corresponds to the frame index in the associated video.

Object Classes

The list of object names (classes) used for detection must be specified in a plain text file named object_classes.txt.
Each line should contain a single object name. Lines starting with # are treated as comments and ignored.

Example object_classes.txt:

# Religious and architectural objects
altar
crucifix
pews
pulpit
chalice
...

This file should be placed in the project directory.
The detection models (e.g., YOLOv8_World, OWLv2) will automatically load object names from this file at runtime.

Example Demo

Suppose you have the following files in data/:

  • subject1_gaze.csv
  • subject1.mp4
  • subject2_gaze.csv
  • subject2.mp4

To process all pairs in the folder:

python annotate.py --idt data/

To process a single pair with verbose output and custom IDT thresholds:

python annotate.py data/subject1_gaze.csv data/subject1.mp4 --idt --idt-dis-threshold 0.07 --idt-dur-threshold 120 --verbose

Output

  • Annotated CSV files and processed videos are saved in the output directory.
  • Temporary extracted frames and intermediate files are cleaned up automatically.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use the dataset or the tool in your research, please cite:

@article{cartella2025sanctuaria,
	title={Sanctuaria-Gaze: A Multimodal Egocentric Dataset for Human Attention Analysis in Religious Sites},
	author={Cartella, Giuseppe and Cuculo, Vittorio and Cornia, Marcella and Papasidero, Marco and Ruozzi, Federico and Cucchiara, Rita},
	journal={Journal on Computing and Cultural Heritage (JOCCH)},
	year={2025},
	publisher={ACM New York, NY, USA},
	doi={10.1145/3769091}
}

Acknowledgment

This work has been supported by the PNRR project “Italian Strengthening of Esfri RI Resilience (ITSERR)” funded by the European Union - NextGenerationEU (CUP B53C22001770006).

