feat: Support FastVideo for Video Generation Models #1303
set MODEL_DIR /path/to/Wan2.2-I2V-A14B-Diffusers
set OUT_ROOT /path/to/eval_out/vbvr_wan22_full_highres
set VIDEOS_DIR $OUT_ROOT/videos
set METRICS_DIR $OUT_ROOT/metrics
mkdir -p $VIDEOS_DIR $METRICS_DIR
This part should utilize the cache directory and the submission directory. Using environment variables seems rather inflexible. After examining the repo structure on HuggingFace, one possibility is to set up a cache directory similar to the other video paths. Then, before each download, we can use snapshot_download from HuggingFace, which would return the path to the video directory. As for the metrics directory, I think it could be handled via generate_submission?
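A minimal sketch of the cache-directory idea above, assuming the videos live in a Hugging Face dataset repo. The helper name `resolve_video_dir`, the `subfolder` default, and the injectable `download` argument are hypothetical and not part of this PR; only `huggingface_hub.snapshot_download` with its `repo_id`, `repo_type`, and `cache_dir` parameters is a real API.

```python
import os
from typing import Callable, Optional


def resolve_video_dir(
    repo_id: str,
    cache_dir: str,
    subfolder: str = "videos",  # assumption: hypothetical layout inside the dataset repo
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Fetch (or reuse) a dataset snapshot and return the path to its video folder."""
    if download is None:
        # Imported lazily so the helper can be exercised without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        download = snapshot_download
    # snapshot_download returns the local snapshot folder as a string path.
    snapshot_path = download(repo_id=repo_id, repo_type="dataset", cache_dir=cache_dir)
    return os.path.join(snapshot_path, subfolder)
```

Calling this once before the task builds its video paths would replace the `VIDEOS_DIR` environment variable with a path derived from the cache directory.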
Oh, I just found that I forgot to write the download code. Updated by adding
lmms-eval/lmms_eval/tasks/vbvr/utils.py
Line 86 in cc5267e
Lemme check the model output. I got lazy before and just pointed it to my training folder randomly. Let me fix that now.
It seems that silly Claude set the default output dir to the Hugging Face dir
lmms-eval/lmms_eval/models/chat/fastvideo.py
Lines 73 to 75 in cc5267e
This definitely needs to be changed.
But I’m not entirely sure how to handle generate_submission. I noticed that Bagel outputs to ./logs/bagel_images/<run_id>/ by default, so maybe we should do the same?
Fixed by adding
lmms-eval/lmms_eval/models/chat/fastvideo.py
Lines 81 to 87 in c1019e4
Yeah, I think for now we could use the log dir as the output dir, unless another one is specified in the init args.
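A rough sketch of that default, assuming a layout similar to the Bagel one mentioned above. The function name `resolve_output_dir`, the `fastvideo_videos` folder name, and the `run_id` fallback are hypothetical and are not taken from the PR.

```python
import os
import uuid
from typing import Optional


def resolve_output_dir(
    output_dir: Optional[str] = None,
    log_root: str = "./logs",
    run_id: Optional[str] = None,
) -> str:
    """Use an explicit output_dir if given, else fall back to <log_root>/fastvideo_videos/<run_id>."""
    if output_dir is None:
        # Assumption: a short random id stands in when no run id is supplied.
        run_id = run_id or uuid.uuid4().hex[:8]
        output_dir = os.path.join(log_root, "fastvideo_videos", run_id)
    os.makedirs(output_dir, exist_ok=True)
    return os.path.abspath(output_dir)
```

An `output_dir` passed through the model's init args takes precedence, so users with a dedicated storage location are not forced into the log directory.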
# Resume: if the target mp4 already exists and is non-empty, reuse it.
# Set overwrite=True in model_args to force regeneration.
presults: List[Optional[GenerationResult]] = [None] * len(prepared)
skipped_indices: List[int] = []
if not self.overwrite:
    for i, prep in enumerate(prepared):
        path = prep.get("output_path")
        if path and os.path.isfile(path) and os.path.getsize(path) > 0:
            presults[i] = self._pack_result(os.path.abspath(path))
            skipped_indices.append(i)
    if skipped_indices:
        eval_logger.info(f"FastVideo: resume — reusing {len(skipped_indices)}/{len(prepared)} " f"existing mp4s (set overwrite=True to regenerate)")
Maybe we can use the caching features of lmms-eval instead of hardcoding this here? I am fine with this; I am just wondering whether it is possible.
What I’m actually more concerned about is that right now the videos and the video paths are separated. Videos don’t seem as easy to cache as plain text. But I guess it’s still doable — worst case, it just throws a “video not found” error.
I think if we store the video as a path in the output, the caching logic would work just like it does for plain text? The structure is similar to text, so we can just load it from the db. If that is not the case, we can reload from the previously defined output dir and keep this code block.
ok, then I will do this. I was actually concerned about the scenario where the path still exists but the video has been deleted. My understanding is that our script doesn't check for this.
Yeah, I think our script doesn't check for this. If the storage is not persistent, then maybe we should disable the cache mode or clean the cached data.
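One way to guard against the "path cached, file deleted" case is to validate the path when reading it back. The sketch below uses a plain dict as a stand-in for the cache; `load_cached_video` is a hypothetical helper and does not reflect how the lmms-eval cache is actually implemented.

```python
import os
from typing import Dict, Optional


def load_cached_video(cache: Dict[str, str], doc_key: str) -> Optional[str]:
    """Return the cached video path only if the file still exists and is non-empty."""
    path = cache.get(doc_key)
    if path and os.path.isfile(path) and os.path.getsize(path) > 0:
        return path
    # Stale or missing entry: drop it so the caller regenerates the video.
    cache.pop(doc_key, None)
    return None
```

A `None` return would signal that the video needs to be generated again, which avoids surfacing a "video not found" error later in scoring.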
I made the changes. I'll run it tonight to check whether the results are the same.
Ok thanks, I think once you feel this PR is mostly done, I will approve and merge the PR. Thanks!

Added FastVideo and VBVR
There is a minor issue: the final result-processing step in VBVR is currently single-threaded. Making it multi-threaded would apparently require modifying the main code of the framework. The step takes approximately 10 minutes to run single-threaded; with 32 threads, it can be completed within a minute.
lmms-eval/lmms_eval/tasks/vbvr/utils.py
Lines 191 to 192 in 545ffdf
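For illustration, the per-sample scoring could be fanned out with a thread pool along these lines. This is a generic sketch, not the VBVR code: `score_in_parallel` and `score_fn` are hypothetical names, and threads only help if the per-sample work is I/O-bound or releases the GIL (for example, video decoding in native code).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def score_in_parallel(
    items: Iterable[T],
    score_fn: Callable[[T], R],
    max_workers: int = 32,
) -> List[R]:
    """Apply score_fn to every item using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(score_fn, items))
```

Because the results come back in input order, they can be matched to the original samples by position without extra bookkeeping.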