[CI] Add inference performance regression tests #1140
Eigensystem merged 14 commits into hao-ai-lab:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request enhances the continuous integration pipeline by adding automated performance regression tests. The goal is to proactively catch performance degradations in video generation models, specifically in generation latency and GPU memory consumption. By establishing device-aware thresholds and logging detailed results, the changes keep critical performance metrics within acceptable bounds and contribute to the overall stability and efficiency of the system.
Activity
Code Review
This pull request introduces automated performance regression tests for the Wan2.1-T2V-1.3B model, measuring generation latency and peak GPU memory. The changes include updates to the Buildkite pipeline, the pr_test.sh script, and a new Modal test file. The new test_inference_performance.py file sets up device-aware thresholds and writes JSON results for trend tracking, which is a valuable addition for monitoring performance over time. The implementation is generally robust, with proper resource cleanup and clear assertion messages.
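The JSON trend tracking described above might look roughly like the following minimal sketch. The function name, file layout, and field names are assumptions for illustration, not the PR's actual schema:

```python
import json
import os
from datetime import datetime, timezone

def write_perf_result(results_dir: str, latency_s: float, peak_mem_mb: float) -> str:
    """Append one timestamped performance record as a standalone JSON file."""
    os.makedirs(results_dir, exist_ok=True)
    now = datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),
        "latency_s": latency_s,
        "peak_mem_mb": peak_mem_mb,
    }
    # One file per run keeps the results directory append-only, which makes
    # trend tracking across CI runs straightforward.
    path = os.path.join(results_dir, now.strftime("%Y%m%dT%H%M%SZ") + ".json")
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return path
```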
| - "pyproject.toml" | ||
| - "docker/Dockerfile.python3.12" | ||
| config: | ||
| command: "timeout 30m .buildkite/scripts/pr_test.sh" |
The command for the "Performance Tests" step includes timeout 30m. A similar timeout (timeout=1800) is also specified in the run_performance_tests function within fastvideo/tests/modal/pr_test.py. It's generally better to have a single source of truth for timeouts to avoid confusion and potential conflicts. Consider removing one of these timeouts or clarifying their intended roles (e.g., Buildkite timeout as a failsafe, Modal timeout as the primary control).
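One hypothetical way to get a single source of truth is to let the Buildkite step export the timeout and have the Modal entrypoint read it, keeping the current 1800s as a fallback. The variable name `PERF_TEST_TIMEOUT_S` is an assumption, not something the PR defines:

```python
import os

# Assumed env var exported by the Buildkite step; the Modal entrypoint
# falls back to the current 1800s default when it is unset.
DEFAULT_PERF_TEST_TIMEOUT_S = 1800

def perf_test_timeout() -> int:
    """Resolve the performance-test timeout from one shared setting."""
    return int(os.environ.get("PERF_TEST_TIMEOUT_S", DEFAULT_PERF_TEST_TIMEOUT_S))
```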
logger = init_logger(__name__)

REQUIRED_GPUS = 2
The REQUIRED_GPUS constant is defined but not used within this file. If this constant is meant to enforce or indicate the number of GPUs required for the test, consider adding a check to ensure the test environment meets this requirement (e.g., assert torch.cuda.device_count() == REQUIRED_GPUS). Otherwise, it can be removed to avoid dead code.
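A minimal sketch of enforcing the constant rather than leaving it as dead code. In the real test the count would come from `torch.cuda.device_count()`; it is injected here as a parameter so the check is unit-testable without a GPU:

```python
REQUIRED_GPUS = 2  # constant from the PR's test file

def gpu_requirement_met(device_count: int, required: int = REQUIRED_GPUS) -> bool:
    """Return True when the environment has enough GPUs for the test.

    device_count would be torch.cuda.device_count() in the actual test;
    it is passed in here so the predicate can be tested anywhere.
    """
    return device_count >= required
```

A test could then `pytest.skip` (or assert) when `gpu_requirement_met(...)` is False, making the requirement explicit instead of implicit.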
results_dir = os.path.join(script_dir, "results")
os.makedirs(results_dir, exist_ok=True)

timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
The timestamp for the filename is generated using strftime("%Y%m%dT%H%M%SZ"), while the timestamp within the JSON results (line 210) uses datetime.now(timezone.utc).isoformat(). For consistency, it would be beneficial to use the same format for both, preferably isoformat() if the filename can handle the characters, or explicitly define a common format string for both uses.
Suggested change:
- timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+ timestamp = datetime.now(timezone.utc).isoformat().replace(":", "-").replace(".", "-")  # ISO 8601 compatible and filename-safe
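The consistency point above can be sketched by deriving both strings from one instant: the ISO form for the JSON payload and a filename-safe variant for the results file. The helper name is illustrative:

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

def make_timestamps(now: Optional[datetime] = None) -> Tuple[str, str]:
    """Derive the JSON timestamp and a filename-safe variant from one instant."""
    now = now or datetime.now(timezone.utc)
    iso = now.isoformat()  # goes into the JSON payload as-is
    # Same instant, with characters some filesystems reject replaced.
    filename_safe = iso.replace(":", "-").replace(".", "-")
    return iso, filename_safe
```

Because both strings come from the same `datetime` object, the filename and the JSON record can never disagree about when the run happened.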
Eigensystem
left a comment
Overall looks good. Maybe you can also record the current performance data and check whether subsequent pull requests will cause a performance drop in CI.
Force-pushed 901a738 to 094c69f
Adds back PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True which was accidentally removed, causing a ~2x performance regression. Thresholds calibrated from baseline (28.3s / 8908MB) at 1.2x.
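The restored setting would be a one-line export in the CI script; exact placement within `.buildkite/scripts/pr_test.sh` is an assumption here. `expandable_segments` lets the CUDA caching allocator grow existing memory segments instead of fragmenting, which is why losing it cost roughly 2x in latency:

```shell
# Re-add the allocator setting that was accidentally removed.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```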
Calibrated L40S thresholds from a baseline run (28.3s avg latency, 8908MB peak memory) and set a 1.2x threshold to check for performance regressions. Results are also written to JSON after each run for tracking, but the baseline needs to be updated manually in the current implementation. Does this look reasonable, or do you have any recommendations for improvement?
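The calibrated check reduces to a simple predicate. The baseline numbers below are the ones quoted in the discussion; the variable and function names are illustrative, not the PR's actual code:

```python
# Baseline figures quoted in the PR discussion (L40S GPU).
BASELINE_LATENCY_S = 28.3
BASELINE_PEAK_MEM_MB = 8908.0
REGRESSION_FACTOR = 1.2  # allow up to 20% drift before failing CI

def within_threshold(measured: float, baseline: float,
                     factor: float = REGRESSION_FACTOR) -> bool:
    """Return True when a measurement stays within factor x baseline."""
    return measured <= baseline * factor
```

With a 1.2x factor, a run at 30.0s passes (limit is 33.96s) while a run at 35.0s fails, which matches the intent of flagging a genuine regression rather than normal run-to-run noise.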
Eigensystem
left a comment
Overall looks pretty good. How should we use the dashboard?
I just saw that vLLM has a dashboard to track and visualize performance over time (perf.vllm.ai). I can change or remove this if it's unnecessary; I don't think SGLang has one.
Maybe remove it from this PR?