Multi-Agent Coordination for Latency-Constrained VLMs: Anomaly Detection in Large-Scale Surveillance
This repository is the official evaluation and reproducibility bundle for research on multi-agent coordination, cascade early-exit, and latency-aware vision–language models (VLMs) applied to weakly supervised video anomaly detection in large-scale surveillance.
The system is built around progressive perception: inexpensive models run first, and early exit avoids invoking heavier models on “easy” frames. The Dual-Stage Perception Module (Stages I–II) combines a reconstruction gate (convolutional autoencoder) with YOLOv8-nano object screening. Frames that remain ambiguous after these stages are eligible for Stage III semantic reasoning with a vision–language model (LLaVA-Next–class), using embedding + prototype matching and a cosine-similarity threshold τ_c (with abstention when confidence is low). Event-driven and cyclical agents coordinate ingestion from multiple cameras over a Pub/Sub broker (Redis) in the full deployment narrative; this repository emphasizes offline evaluation and reproducible metric CSVs.
Because the VLM is orders of magnitude slower than the AE and YOLO stages, latency-constrained coordination is central: most traffic should exit early at Stage I or II. Representative profiling illustrates per-stage latency (note the log scale for milliseconds) and the fraction of frames routed to each stage—only a minority should reach the VLM under a tuned policy.
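As a rough sketch, the routing policy described above can be expressed as a per-frame decision function. The threshold names and values below are illustrative assumptions, not the paper's tuned settings:

```python
from dataclasses import dataclass

@dataclass
class FrameScores:
    recon_error: float   # Stage I autoencoder reconstruction error
    object_conf: float   # Stage II detector confidence (e.g. max over boxes)
    vlm_cosine: float    # Stage III cosine similarity to nearest prototype

def route(scores, tau_ae=0.05, tau_obj=0.5, tau_c=0.7):
    """Return (decision, exit_stage) for one frame.

    Thresholds here are placeholders; a tuned policy would pick them so
    that only a minority of frames reach the expensive Stage III VLM.
    """
    if scores.recon_error < tau_ae:
        return "normal", "I"      # cheap gate: obvious normality exits early
    if scores.object_conf < tau_obj:
        return "normal", "II"     # no salient object cues -> exit at Stage II
    if scores.vlm_cosine >= tau_c:
        return "anomaly", "III"   # confident semantic match against a prototype
    return "abstain", "III"       # low confidence -> abstain rather than guess
```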
The Combined Anomaly Detection Dashboard is an optional operator-facing tool (typically shipped alongside this bundle in a larger monorepo) for browsing large frame corpora. It surfaces Person (YOLO) and Anomaly (Autoencoder) events with reconstruction error, duration in frames, and source paths—suitable for audits and CSV export. It does not replace the paper’s csv/runs/ experiment outputs; see GUIDELINES.md.
CVCIMP26 is a research-grade evaluation bundle for Multi-Agent Coordination for Latency-Constrained VLMs: Anomaly Detection in Large-Scale Surveillance. It packages runnable Python tooling to reproduce and extend the quantitative experiments associated with the paper: frame-level detection metrics, cascade ablations, latency accounting, cross-dataset transfer characterization, and multi-stream scaling stress tests.
Large-scale surveillance produces high-volume, continuous video. Running a heavy vision–language model (VLM) on every frame is prohibitively slow and expensive. At the same time, weakly supervised anomaly detection on benchmarks such as UCF-Crime requires calibrated, comparable reporting (e.g. AUROC, AP, F1) under realistic operational constraints. This project addresses the gap by combining progressive inference (cheap gates before expensive reasoning), multi-agent coordination (who runs what, when, under a budget), and documented evaluation scripts that emit versioned, provenance-rich CSV outputs.
- Provide a transparent pipeline from benchmark manifests and local frame data to paper-aligned metric CSVs under csv/runs/.
- Encode the three-stage cascade (reconstruction → object cues → selective VLM semantics) in a way that separates fully runnable components (e.g. autoencoder scoring, YOLO paths where installed) from paper-described Stage III behavior (see honesty notes in MODELS_AND_STACK.md).
- Support reproducibility: pinned dependency guidance, experiment launchers (exp_E1–exp_E6), and timestamps on every generated result file.
- Document datasets, licenses, and non-redistribution expectations clearly so users can comply with benchmark terms.
| Theme | How it appears in this repository |
|---|---|
| Cascaded early exit | Ablation and latency scripts quantify stage contributions and timing; Stage I/II paths are exercised in evaluation and dashboard-adjacent workflows described in GUIDELINES.md. |
| Latency-constrained VLM use | Latency benchmarks and documentation reflect selective Stage III invocation rather than dense per-frame VLM scoring. |
| Multi-agent coordination | Scaling and orchestration narratives are supported by multi_agent_benchmark.py / E5 and the paper’s agent model; full broker infrastructure may be external. |
| Semantic stabilization | Described in the manuscript (CLIP-style embeddings, prototypes, abstention); optional Hugging Face stack in requirements.txt. |
| Rigorous metrics | code/metrics.py centralizes AUROC/AP/F1 and related reporting for frame-level studies. |
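The frame-level metrics that code/metrics.py is described as centralizing can be sketched with scikit-learn; this is a minimal stand-in, not the bundle's actual implementation, and the default threshold is an assumption:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

def frame_level_metrics(y_true, scores, threshold=0.5):
    """Frame-level AUROC / AP / F1 from anomaly scores in [0, 1].

    AUROC and AP are threshold-free ranking metrics; F1 requires
    binarizing the scores, here at an illustrative fixed threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    return {
        "auroc": roc_auc_score(y_true, scores),
        "ap": average_precision_score(y_true, scores),
        "f1": f1_score(y_true, (scores >= threshold).astype(int)),
    }
```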
- E1 — Frame-level results on UCF-oriented frame subsets (see script and data layout).
- E2 — Cascade ablation (which stages matter).
- E3 — Latency profiling across stages (with documented caveats for simulated or partial Stage III).
- E4 — Cross-dataset behavior (ShanghaiTech, XD-Violence loaders and templates).
- E5 — Multi-stream / scaling benchmark.
- E6 — Autoencoder reconstruction diagnostics.
Exact mapping to LaTeX labels and sections: experiments/README.md. Supporting detail: MODELS_AND_STACK.md, GUIDELINES.md, BENCHMARKS.md.
- Researchers reproducing or comparing against the paper’s experimental protocol.
- Graduate students learning cascaded VAD + weak-supervision evaluation.
- Engineers prototyping surveillance analytics under compute budgets, extending manifests and loaders without rewriting metric code.
- Hosting full benchmark videos or proprietary weights in git.
- Guaranteeing identical numbers across all hardware without your own pinned environment and data snapshot.
- Replacing official dataset licenses or redistribution policies — users must obtain data legally.
Use the following in the repository About field on GitHub if you like:
- Description: Research code: multi-agent coordination and cascaded gates for latency-aware VLM anomaly detection on UCF-Crime, ShanghaiTech, and XD-Violence — evaluation bundle with provenance CSV outputs.
- Topics:
anomaly-detection, video-surveillance, weak-supervision, vision-language-model, pytorch, yolov8, ucf-crime, shanghai-tech, xd-violence, multi-agent, latency-optimization, computer-vision
Surveillance pipelines must balance accuracy, latency, and compute when scenes produce a continuous stream of frames. This project studies a three-stage cascaded detector:
- Stage I — Reconstruction gate: A lightweight convolutional autoencoder flags frames whose reconstruction error exceeds a threshold, suppressing obvious normality and reducing downstream load.
- Stage II — Object-level screening: A compact detector (YOLOv8-family) provides object-level cues for candidate anomalies.
- Stage III — Vision–language reasoning: A VLM (paper: LLaVA-Next–class backbone) performs selective semantic reasoning; CLIP-style text embeddings and prototype matching stabilize category decisions and support abstention when confidence is low.
Multi-agent coordination (event-driven and cyclical agents over a publish–subscribe style design, with deployment narrative including Redis and containers) governs when each stage runs and how partial results are fused under latency constraints.
The code here focuses on offline evaluation: frame-level AUROC / AP / F1, cascade ablations, per-stage latency, cross-dataset behavior, and multi-stream scaling, aligned with the paper’s experimental section.
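The Stage I gate above reduces to thresholding a per-frame reconstruction error. A minimal sketch, assuming frames and autoencoder reconstructions are already available as arrays (the real gate runs a convolutional autoencoder; the array layout and threshold are illustrative):

```python
import numpy as np

def reconstruction_gate(frames, recons, tau):
    """Stage I gate: per-frame mean squared reconstruction error vs tau.

    frames, recons: float arrays of shape (N, H, W, C) in [0, 1].
    Returns a boolean mask — True means 'ambiguous, route downstream';
    False means 'obvious normality, exit early'.
    """
    err = ((frames - recons) ** 2).mean(axis=(1, 2, 3))
    return err > tau
```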
This tree is the CVCIMP26 bundle: manifests, experiment launchers, mirrored evaluation scripts, and documented outputs.
| Path | Role |
|---|---|
| code/ | Evaluation scripts (generate_results.py, ablation_study.py, …) |
| code/results_io.py | Writes provenance-aware rows to csv/runs/ |
| code/metrics.py | AUROC, AP, F1, ROC helpers |
| experiments/ | One-command runners exp_E1 … exp_E6 (paper-aligned) |
| csv/manifests/ | Benchmark index templates (UCF-Crime, ShanghaiTech, XD-Violence) |
| csv/runs/ | Generated metrics (timestamps, hostname); often gitignored locally |
| csv/reference/ | Literature-only baseline table (not produced by our runs) |
| requirements.txt | Pinned-style dependency list with optional VLM block |
| BENCHMARKS.md, MODELS_AND_STACK.md, GUIDELINES.md | Datasets, model stack, operator FAQ |
To respect licenses, size, and reproducibility norms:
- Full UCF-Crime, ShanghaiTech Campus, or XD-Violence video archives (download from official sources; see BENCHMARKS.md).
- Pretrained checkpoints by default (*.pth / large *.pt); obtain or train your own and place them where scripts expect (typically next to Test/ for the bundled AE demos).
- A full live broker stack for every script: the paper describes deployment, while this repository emphasizes evaluation and optional dashboard tooling in a larger monorepo if present.
```
python -m venv .venv
.venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```
Use a CUDA build of PyTorch from pytorch.org when available.
Uncomment the Hugging Face–related lines at the bottom of requirements.txt when you extend Stage III inference.
- Download datasets per BENCHMARKS.md.
- Fill or generate manifests under csv/manifests/ (see experiments/load_manifest_example.py).
- For the default AE frame demo in generate_results.py, place extracted frames under Test/ (the repository working directory for experiments; see below) and autoencoder_model.pth where the script can load it.
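A manifest can be read with a few lines of standard-library Python. This is a generic sketch, not the bundled loader: the column names frame_path and label are assumptions, so match them to the actual templates under csv/manifests/:

```python
import csv
from pathlib import Path

def load_manifest(path):
    """Read a frame manifest CSV into (frame_path, label) pairs.

    Assumes columns named 'frame_path' and 'label' (illustrative;
    check the real templates and load_manifest_example.py).
    """
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append((Path(row["frame_path"]), int(row["label"])))
    return rows
```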
Experiment launchers set the correct working directory and invoke code/. From the directory that contains this bundle:
Standalone clone (this repository only):
```
python experiments/exp_E1_frame_level_ucf.py
```
Inside a monorepo where this folder is named CVCIMP26/ under a parent project root (e.g. full CVC26 tree):
```
python CVCIMP26/experiments/exp_E1_frame_level_ucf.py
```
| ID | Launcher | Output CSV (csv/runs/) |
|---|---|---|
| E1 | experiments/exp_E1_frame_level_ucf.py | frame_results_ucf.csv |
| E2 | experiments/exp_E2_ablation_cascade.py | ablation_ucf.csv |
| E3 | experiments/exp_E3_latency_profile.py | latency_stages.csv |
| E4 | experiments/exp_E4_cross_dataset.py | cross_dataset.csv |
| E5 | experiments/exp_E5_multiagent_scaling.py | scaling_streams.csv |
| E6 | experiments/exp_E6_ae_training_metrics.py | ae_reconstruction_metrics.csv |
LaTeX label crosswalk: experiments/README.md.
Each run writes CSV rows with generated_utc (UTC ISO) and hostname, so tables can be traced to a specific machine and time. For paper tables, copy values from csv/runs/ into your manuscript; see csv/paper_tables/README.md.
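The provenance convention above (generated_utc plus hostname on every row) can be sketched with the standard library; this mirrors what results_io.py is described as doing, but the exact column schema here is an illustrative assumption:

```python
import csv
import socket
from datetime import datetime, timezone
from pathlib import Path

def append_run_row(csv_path, metrics):
    """Append one metric row stamped with generated_utc and hostname.

    Writes a header only when the file is new, so repeated runs
    accumulate traceable rows in the same CSV.
    """
    path = Path(csv_path)
    row = {
        "generated_utc": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        **metrics,
    }
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```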
Dashboard exports (if you use a sibling anomaly_dashboard.py in a larger project) are not the same artifact as csv/runs/ — see the FAQ in GUIDELINES.md.
| Document | Contents |
|---|---|
| GUIDELINES.md | Architecture → data → CSV; dashboard vs experiments |
| MODELS_AND_STACK.md | Stages I–III; what is fully run vs simulated in places |
| BENCHMARKS.md | Dataset statistics and official links |
| experiments/README.md | E1–E6 ↔ paper / LaTeX labels |
When the camera-ready paper is available, cite it as your primary reference. For benchmark attribution, cite the original datasets (e.g. UCF-Crime: Sultani et al., CVPR 2018). Example placeholder:
```
@misc{multiagent_vlm_surveillance_2026,
  title        = {Multi-Agent Coordination for Latency-Constrained VLMs: Anomaly Detection in Large-Scale Surveillance},
  howpublished = {GitHub repository},
  year         = {2026},
  note         = {Replace with venue proceedings entry when published}
}
```

This bundle includes a LICENSE file (MIT); change it if your institution requires a different license for the public remote.
Author: Tayyab Rehman — please use GitHub Issues on the public repository for reproducibility questions.
Thanks to the creators of UCF-Crime, ShanghaiTech Campus, and XD-Violence, and to the PyTorch, Ultralytics, and Hugging Face communities.


