
Multi-Agent Coordination for Latency-Constrained VLMs: Anomaly Detection in Large-Scale Surveillance

Python 3.10+ PyTorch 2.0+ License: MIT

This repository is the official evaluation and reproducibility bundle for research on multi-agent coordination, cascade early-exit, and latency-aware vision–language models (VLMs) applied to weakly supervised video anomaly detection in large-scale surveillance.

Remote: github.com/tayyabrehman96/Multi-Agent-Coordination-for-Latency-Constrained-VLMs-Anomaly-Detection-in-Large-Scale-Surveillance


Project overview (figures)

Cascaded multi-agent pipeline and dual-stage perception

The system is built around progressive perception: inexpensive models run first, and early exit avoids invoking heavier models on “easy” frames. The Dual-Stage Perception Module (Stages I–II) combines a reconstruction gate (convolutional autoencoder) with YOLOv8-nano object screening. Frames that remain ambiguous after these stages are eligible for Stage III semantic reasoning with a vision–language model (LLaVA-Next–class), using embedding + prototype matching and a cosine-similarity threshold τ_c (with abstention when confidence is low). Event-driven and cyclical agents coordinate ingestion from multiple cameras over a Pub/Sub broker (Redis) in the full deployment narrative; this repository emphasizes offline evaluation and reproducible metric CSVs.

Cascaded multi-agent anomaly detection pipeline (dual-stage perception through VLM and decision layer)
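
To make the Stage III decision rule concrete, here is a minimal sketch of prototype matching with a cosine-similarity threshold and abstention. The function name, argument names, and the default threshold value are illustrative placeholders, not the bundle's actual API.

```python
import numpy as np

def match_prototype(frame_embedding, prototypes, tau_c=0.75):
    """Illustrative Stage III decision: cosine similarity against class prototypes.

    frame_embedding : (d,) CLIP-style embedding of the frame
    prototypes      : dict mapping anomaly label -> (d,) prototype vector
    tau_c           : cosine-similarity threshold; below it the stage abstains
    """
    # Normalize once so dot products are cosine similarities.
    v = frame_embedding / np.linalg.norm(frame_embedding)
    best_label, best_sim = None, -1.0
    for label, proto in prototypes.items():
        sim = float(v @ (proto / np.linalg.norm(proto)))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < tau_c:
        return None, best_sim  # abstain: confidence below tau_c
    return best_label, best_sim
```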

Latency and frame routing

Because the VLM is orders of magnitude slower than the AE and YOLO stages, latency-constrained coordination is central: most traffic should exit early at Stage I or II. Representative profiling illustrates per-stage latency (note the log scale for milliseconds) and the fraction of frames routed to each stage—only a minority should reach the VLM under a tuned policy.

Per-stage latency comparison and frame routing / early-exit distribution
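
As a back-of-the-envelope illustration of why early exit dominates the latency budget, the numbers below are placeholders only; real per-stage latencies and routing fractions come from the E3 profiling run.

```python
# Placeholder latencies and routing fractions -- measure your own with exp_E3.
latency_ms = {"ae": 2.0, "yolo": 12.0, "vlm": 1500.0}   # per-frame cost of each stage
reach      = {"ae": 1.00, "yolo": 0.20, "vlm": 0.05}    # fraction of frames that reach each stage

expected = sum(latency_ms[s] * reach[s] for s in latency_ms)
print(f"expected per-frame latency = {expected:.1f} ms")  # 79.4 ms here, vs 1514 ms if every frame hit the VLM
```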

Operator dashboard (Stages I–II)

The Combined Anomaly Detection Dashboard is an optional operator-facing tool (typically shipped alongside this bundle in a larger monorepo) for browsing large frame corpora. It surfaces Person (YOLO) and Anomaly (Autoencoder) events with reconstruction error, duration in frames, and source paths—suitable for audits and CSV export. It does not replace the paper’s csv/runs/ experiment outputs; see GUIDELINES.md.

Combined anomaly detection dashboard (example run on large frame set)


Project description (overview)

CVCIMP26 is a research-grade evaluation bundle for Multi-Agent Coordination for Latency-Constrained VLMs: Anomaly Detection in Large-Scale Surveillance. It packages runnable Python tooling to reproduce and extend the quantitative experiments associated with the paper: frame-level detection metrics, cascade ablations, latency accounting, cross-dataset transfer characterization, and multi-stream scaling stress tests.

Problem setting

Large-scale surveillance produces high-volume, continuous video. Running a heavy vision–language model (VLM) on every frame is prohibitively slow and expensive. At the same time, weakly supervised anomaly detection on benchmarks such as UCF-Crime requires calibrated, comparable reporting (e.g. AUROC, AP, F1) under realistic operational constraints. This project addresses the gap by combining progressive inference (cheap gates before expensive reasoning), multi-agent coordination (who runs what, when, under a budget), and documented evaluation scripts that emit versioned, provenance-rich CSV outputs.

Objectives

  • Provide a transparent pipeline from benchmark manifests and local frame data to paper-aligned metric CSVs under csv/runs/.
  • Encode the three-stage cascade (reconstruction → object cues → selective VLM semantics) in a way that separates fully runnable components (e.g. autoencoder scoring, YOLO paths where installed) from paper-described Stage III behavior (see honesty notes in MODELS_AND_STACK.md).
  • Support reproducibility: pinned dependency guidance, experiment launchers (exp_E1–exp_E6), and timestamps on every generated result file.
  • Document datasets, licenses, and non-redistribution expectations clearly so users can comply with benchmark terms.

Technical contributions (codebase alignment)

| Theme | How it appears in this repository |
|---|---|
| Cascaded early exit | Ablation and latency scripts quantify stage contributions and timing; Stage I/II paths are exercised in evaluation and dashboard-adjacent workflows described in GUIDELINES.md. |
| Latency-constrained VLM use | Latency benchmarks and documentation reflect selective Stage III invocation rather than dense per-frame VLM scoring. |
| Multi-agent coordination | Scaling and orchestration narratives are supported by multi_agent_benchmark.py / E5 and the paper’s agent model; full broker infrastructure may be external. |
| Semantic stabilization | Described in the manuscript (CLIP-style embeddings, prototypes, abstention); optional Hugging Face stack in requirements.txt. |
| Rigorous metrics | code/metrics.py centralizes AUROC/AP/F1 and related reporting for frame-level studies. |

Evaluation scope (what you can reproduce here)

  • E1 — Frame-level results on UCF-oriented frame subsets (see script and data layout).
  • E2 — Cascade ablation (which stages matter).
  • E3 — Latency profiling across stages (with documented caveats for simulated or partial Stage III).
  • E4 — Cross-dataset behavior (ShanghaiTech, XD-Violence loaders and templates).
  • E5 — Multi-stream / scaling benchmark.
  • E6 — Autoencoder reconstruction diagnostics.

Exact mapping to LaTeX labels and sections: experiments/README.md. Supporting detail: MODELS_AND_STACK.md, GUIDELINES.md, BENCHMARKS.md.

Intended audience

  • Researchers reproducing or comparing against the paper’s experimental protocol.
  • Graduate students learning cascaded VAD + weak-supervision evaluation.
  • Engineers prototyping surveillance analytics under compute budgets, extending manifests and loaders without rewriting metric code.

Non-goals (explicit)

  • Hosting full benchmark videos or proprietary weights in git.
  • Guaranteeing identical numbers across all hardware without your own pinned environment and data snapshot.
  • Replacing official dataset licenses or redistribution policies — users must obtain data legally.

Suggested GitHub repository metadata

Use the following in the repository About field on GitHub if you like:

  • Description: Research code: multi-agent coordination and cascaded gates for latency-aware VLM anomaly detection on UCF-Crime, ShanghaiTech, and XD-Violence — evaluation bundle with provenance CSV outputs.
  • Topics: anomaly-detection video-surveillance weak-supervision vision-language-model pytorch yolov8 ucf-crime shanghai-tech xd-violence multi-agent latency-optimization computer-vision

What this work is about

Surveillance pipelines must balance accuracy, latency, and compute when scenes produce a continuous stream of frames. This project studies a three-stage cascaded detector:

  1. Stage I — Reconstruction gate: A lightweight convolutional autoencoder flags frames whose reconstruction error exceeds a threshold, suppressing obvious normality and reducing downstream load.
  2. Stage II — Object-level screening: A compact detector (YOLOv8-family) provides object-level cues for candidate anomalies.
  3. Stage III — Vision–language reasoning: A VLM (paper: LLaVA-Next–class backbone) performs selective semantic reasoning; CLIP-style text embeddings and prototype matching stabilize category decisions and support abstention when confidence is low.

Multi-agent coordination (event-driven and cyclical agents over a publish–subscribe style design, with a deployment narrative that includes Redis and containers) governs when each stage runs and how partial results are fused under latency constraints.
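
A minimal sketch of the early-exit routing these stages imply is shown below; the thresholds, model handles, and method names are placeholders rather than the repository's actual interfaces.

```python
def route_frame(frame, ae, detector, vlm, tau_recon=0.05, tau_c=0.75):
    """Progressive inference: exit as early as possible, escalate only ambiguous frames."""
    # Stage I: reconstruction gate -- frames reconstructed well exit as normal.
    recon_error = ae.reconstruction_error(frame)
    if recon_error < tau_recon:
        return {"label": "normal", "exit_stage": 1, "score": recon_error}

    # Stage II: object-level screening with a compact detector (YOLOv8-family).
    detections = detector(frame)
    if not detections:
        return {"label": "normal", "exit_stage": 2, "score": recon_error}

    # Stage III: selective VLM semantics with prototype matching and abstention.
    label, sim = vlm.classify(frame, tau_c=tau_c)
    if label is None:
        return {"label": "abstain", "exit_stage": 3, "score": sim}
    return {"label": label, "exit_stage": 3, "score": sim}
```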

The code here focuses on offline evaluation: frame-level AUROC / AP / F1, cascade ablations, per-stage latency, cross-dataset behavior, and multi-stream scaling, aligned with the paper’s experimental section.


Repository layout

This tree is the CVCIMP26 bundle: manifests, experiment launchers, mirrored evaluation scripts, and documented outputs.

| Path | Role |
|---|---|
| code/ | Evaluation scripts (generate_results.py, ablation_study.py, …) |
| code/results_io.py | Writes provenance-aware rows to csv/runs/ |
| code/metrics.py | AUROC, AP, F1, ROC helpers |
| experiments/ | One-command runners exp_E1–exp_E6 (paper-aligned) |
| csv/manifests/ | Benchmark index templates (UCF-Crime, ShanghaiTech, XD-Violence) |
| csv/runs/ | Generated metrics (timestamps, hostname); often gitignored locally |
| csv/reference/ | Literature-only baseline table (not produced by our runs) |
| requirements.txt | Pinned-style dependency list with optional VLM block |
| BENCHMARKS.md, MODELS_AND_STACK.md, GUIDELINES.md | Datasets, model stack, operator FAQ |
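
For orientation, the frame-level metrics that code/metrics.py centralizes can be reproduced with scikit-learn along these lines; this is a generic sketch, not the module's actual signatures.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

def frame_level_metrics(y_true, scores, threshold=0.5):
    """y_true: binary frame labels; scores: per-frame anomaly scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    return {
        "auroc": roc_auc_score(y_true, scores),
        "ap": average_precision_score(y_true, scores),
        "f1": f1_score(y_true, (scores >= threshold).astype(int)),
    }
```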

What we do not ship

To respect licenses, size, and reproducibility norms:

  • Full UCF-Crime, ShanghaiTech Campus, or XD-Violence video archives (download from official sources; see BENCHMARKS.md).
  • Pretrained checkpoints by default (*.pth / large *.pt); obtain or train your own and place them where scripts expect (typically next to Test/ for the bundled AE demos).
  • A full live broker stack for every script: the paper describes the deployed system, while this repository emphasizes offline evaluation (plus optional dashboard tooling when it ships inside the larger monorepo).

Installation

Environment

python -m venv .venv
.venv\Scripts\activate   # Windows; on Linux/macOS use: source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Use a CUDA build of PyTorch from pytorch.org when available.

Optional VLM dependencies

Uncomment the Hugging Face–related lines at the bottom of requirements.txt when you extend Stage III inference.


Data and checkpoints

  1. Download datasets per BENCHMARKS.md.
  2. Fill or generate manifests under csv/manifests/ (see experiments/load_manifest_example.py).
  3. For the default AE frame demo in generate_results.py, place extracted frames under Test/ (repository working directory for experiments — see below) and autoencoder_model.pth where the script can load it.

Running experiments (paper-aligned)

Experiment launchers set the correct working directory and invoke code/. From the directory that contains this bundle:

Standalone clone (this repository only):

python experiments/exp_E1_frame_level_ucf.py

Inside a monorepo where this folder is named CVCIMP26/ under a parent project root (e.g. full CVC26 tree):

python CVCIMP26/experiments/exp_E1_frame_level_ucf.py

| ID | Launcher | Output CSV (csv/runs/) |
|---|---|---|
| E1 | experiments/exp_E1_frame_level_ucf.py | frame_results_ucf.csv |
| E2 | experiments/exp_E2_ablation_cascade.py | ablation_ucf.csv |
| E3 | experiments/exp_E3_latency_profile.py | latency_stages.csv |
| E4 | experiments/exp_E4_cross_dataset.py | cross_dataset.csv |
| E5 | experiments/exp_E5_multiagent_scaling.py | scaling_streams.csv |
| E6 | experiments/exp_E6_ae_training_metrics.py | ae_reconstruction_metrics.csv |
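
To run the whole suite sequentially, a small driver such as the sketch below works from the bundle root (launcher paths are taken from the table above; prefix them with CVCIMP26/ in the monorepo layout).

```python
import subprocess
import sys

LAUNCHERS = [
    "experiments/exp_E1_frame_level_ucf.py",
    "experiments/exp_E2_ablation_cascade.py",
    "experiments/exp_E3_latency_profile.py",
    "experiments/exp_E4_cross_dataset.py",
    "experiments/exp_E5_multiagent_scaling.py",
    "experiments/exp_E6_ae_training_metrics.py",
]

for launcher in LAUNCHERS:
    print(f"=== {launcher} ===")
    subprocess.run([sys.executable, launcher], check=True)  # stop on the first failure
```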

LaTeX label crosswalk: experiments/README.md.


Results workflow and provenance

Each run writes CSV rows with generated_utc (UTC ISO) and hostname, so tables can be traced to a specific machine and time. For paper tables, copy values from csv/runs/ into your manuscript; see csv/paper_tables/README.md.
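
A minimal sketch of how such a provenance-stamped row can be written is shown below; field names beyond generated_utc and hostname are illustrative, and the real writer lives in code/results_io.py.

```python
import csv
import os
import socket
from datetime import datetime, timezone

def append_provenance_row(path, metrics):
    """Append one result row stamped with UTC time and hostname."""
    row = {
        "generated_utc": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        **metrics,  # e.g. {"experiment": "E1", "auroc": ...}
    }
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```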

Dashboard exports (if you use a sibling anomaly_dashboard.py in a larger project) are not the same artifact as csv/runs/ — see the FAQ in GUIDELINES.md.


Documentation map

| Document | Contents |
|---|---|
| GUIDELINES.md | Architecture → data → CSV; dashboard vs experiments |
| MODELS_AND_STACK.md | Stages I–III; what is fully run vs simulated in places |
| BENCHMARKS.md | Dataset statistics and official links |
| experiments/README.md | E1–E6 ↔ paper / LaTeX labels |

Citation

When the camera-ready paper is available, cite it as your primary reference. For benchmark attribution, cite the original datasets (e.g. UCF-Crime: Sultani et al., CVPR 2018). Example placeholder:

@misc{multiagent_vlm_surveillance_2026,
  title        = {Multi-Agent Coordination for Latency-Constrained VLMs: Anomaly Detection in Large-Scale Surveillance},
  howpublished = {GitHub repository},
  year         = {2026},
  note         = {Replace with venue proceedings entry when published}
}

License and contact

This bundle includes a LICENSE file (MIT); change it if your institution requires a different license for the public remote.

Author: Tayyab Rehman — please use GitHub Issues on the public repository for reproducibility questions.


Acknowledgments

Thanks to the creators of UCF-Crime, ShanghaiTech Campus, and XD-Violence, and to the PyTorch, Ultralytics, and Hugging Face communities.
