fix(video): correct keyframe seek timestamp calculation for start_time #6005
base: main
Greptile Overview
Greptile Summary
This PR fixes a critical bug in `VideoFile.keyframes()`.
The Bug
The original implementation used `start_time * time_base` as the seek timestamp.
The Fix
The PR correctly changes the formula to `int(start_time / float(time_base))`.
Additional improvements include:
Code Quality Notes
The core fix is mathematically sound and addresses the reported issue. The code properly handles edge cases like
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant VideoFile
    participant PyAV
    participant Container
    participant VideoStream
    User->>VideoFile: keyframes(start_time=10, end_time=None)
    VideoFile->>VideoFile: open() file
    VideoFile->>PyAV: av.open(file)
    PyAV-->>VideoFile: container
    VideoFile->>Container: get video stream
    Container-->>VideoFile: video stream
    alt start_time > 0 and video.time_base exists
        VideoFile->>VideoFile: seek_timestamp = int(start_time / float(time_base))
        VideoFile->>Container: seek(seek_timestamp, stream=video)
        Container-->>VideoFile: positioned at keyframe
    end
    VideoFile->>VideoStream: skip_frame = "NONKEY"
    VideoFile->>Container: decode(video)
    loop for each keyframe
        Container-->>VideoFile: frame
        alt frame.time < start_time
            VideoFile->>VideoFile: continue (skip frame)
        else if end_time exists and frame.time > end_time
            VideoFile->>VideoFile: break (stop iteration)
        else
            VideoFile->>VideoFile: frame.to_image()
            VideoFile-->>User: yield PIL.Image
        end
    end
```
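The flow in the diagram can be sketched in plain Python. This is a minimal, hypothetical stand-in for `VideoFile.keyframes()`, not Daft's actual code: `seek_timestamp`, `select_in_window`, and `keyframe_images` are illustrative names. It assumes PyAV's conventions that `stream.time_base` is a `Fraction` (seconds per tick) or `None`, and that `container.seek(offset, stream=...)` interprets the offset in that stream's `time_base` units. With `time_base = 1/30000` and `start_time = 10`, dividing gives 300,000 ticks, whereas multiplying gives about 0.00033 ticks, which truncates to 0. Unlike the `float(time_base)` in the diagram, the sketch divides by the `Fraction` directly so the arithmetic stays exact until `int()` truncates.

```python
from fractions import Fraction
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def seek_timestamp(start_time: float, time_base: Optional[Fraction]) -> Optional[int]:
    """Convert a start time in seconds to a seek offset in stream ticks.

    time_base is the length of one tick in seconds, so seconds -> ticks is a
    division. Returns None when no seek is needed, mirroring the diagram's
    "start_time > 0 and video.time_base exists" guard.
    """
    if start_time <= 0 or time_base is None:
        return None
    # Fraction arithmetic is exact, so int() truncates the true tick count.
    return int(Fraction(start_time) / time_base)


def select_in_window(
    frames: Iterable[Tuple[Optional[float], T]],
    start_time: float,
    end_time: Optional[float] = None,
) -> Iterator[T]:
    """Yield payloads whose timestamp falls in [start_time, end_time].

    A seek lands on the keyframe at or before the target, so earlier frames
    are skipped; iteration stops at the first frame past end_time.
    Assumption: frames with no timestamp (None) are passed through.
    """
    for time, payload in frames:
        if time is not None and time < start_time:
            continue
        if end_time is not None and time is not None and time > end_time:
            break
        yield payload


def keyframe_images(path: str, start_time: float = 0.0, end_time: Optional[float] = None):
    """Hypothetical standalone version of the diagram's flow using PyAV."""
    import av  # imported lazily so the helpers above work without PyAV installed

    with av.open(path) as container:
        video = container.streams.video[0]
        offset = seek_timestamp(start_time, video.time_base)
        if offset is not None:
            # With stream= given, PyAV reads the offset in video.time_base units.
            container.seek(offset, stream=video)
        video.codec_context.skip_frame = "NONKEY"  # decode keyframes only
        timed = ((frame.time, frame) for frame in container.decode(video))
        for frame in select_in_window(timed, start_time, end_time):
            yield frame.to_image()
```

Keeping the seek arithmetic and the window filter as pure functions means both can be unit-tested with plain numbers, without a video fixture.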
1 file reviewed, 1 comment
Force-pushed from 5d250c0 to 1d64cd0
Force-pushed from 1d64cd0 to 7bacde6
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #6005      +/-   ##
==========================================
+ Coverage   72.89%   72.91%   +0.01%
==========================================
  Files         971      971
  Lines      125987   125989       +2
==========================================
+ Hits        91842    91859      +17
+ Misses      34145    34130      -15
```
2 files reviewed, 2 comments
```python
def test_keyframes_start_time_skips_early_frames(sample_video_path):
    file = daft.VideoFile(sample_video_path)
    metadata = file.metadata()
    duration = metadata["duration"]
    assert duration is not None

    start_time = duration / 2.0

    all_keyframes = list(file.keyframes())
    later_keyframes = list(file.keyframes(start_time=start_time))

    assert len(later_keyframes) < len(all_keyframes)
```
This test only verifies that fewer frames are returned, but doesn't validate that the returned frames actually start from the correct time. Consider strengthening the test to verify the actual frame times:
```suggestion
def test_keyframes_start_time_skips_early_frames(sample_video_path):
    file = daft.VideoFile(sample_video_path)
    metadata = file.metadata()
    duration = metadata["duration"]
    assert duration is not None
    start_time = duration / 2.0
    all_keyframes = list(file.keyframes())
    later_keyframes = list(file.keyframes(start_time=start_time))
    assert len(later_keyframes) < len(all_keyframes)
    # Verify that returned keyframes actually start after start_time
    # This would require accessing frame metadata, which PIL.Image doesn't have
    # Consider using a lower-level API or storing timestamps to validate
```
The current test passes as long as the count is different, but doesn't verify correctness. Ideally, you'd want to check that the first returned frame is at or after `start_time`.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Path: tests/file/test_video.py
Line: 83:94
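The review notes that `PIL.Image` carries no timestamps, so the returned frames cannot be checked against `start_time` directly. One indirect check, sketched here under the assumption that each keyframe decodes to the same pixels with or without a preceding seek, is that the `start_time` result should be exactly the tail of the full keyframe list. `is_suffix` is a hypothetical helper, not part of Daft.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def is_suffix(
    later: Sequence[T],
    full: Sequence[T],
    key: Callable[[T], object] = lambda item: item,
) -> bool:
    """Return True if `later` matches the last len(later) items of `full`.

    Items are compared through `key`, e.g. `lambda im: im.tobytes()` for
    PIL images, which have no timestamp metadata of their own.
    """
    if len(later) > len(full):
        return False
    # Slice from len(full) - len(later): full[-0:] would return the whole
    # sequence when `later` is empty.
    tail = full[len(full) - len(later):]
    return [key(item) for item in later] == [key(item) for item in tail]
```

In the existing test, this could sit alongside the length assertion as `assert is_suffix(later_keyframes, all_keyframes, key=lambda im: im.tobytes())`. The pixel-equality assumption is reasonable for independently coded keyframes, but it is worth confirming against the fixture video before relying on it.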
Additional Comments (1)
Path: daft/file/video.py
Line: 121:124
Comment:
The `end_time` check has a style inconsistency with the `start_time` check. Line 117 directly uses `frame.time`, but here you assign it to `frame_time` first. Consider refactoring for consistency:
```suggestion
# Check end time if specified
if end_time is not None:
    if frame.time and frame.time > end_time:
        break
```
This makes the code more consistent and slightly cleaner.
Changes Made
`VideoFile.keyframes()` computed an incorrect seek timestamp when `start_time > 0`. The previous implementation used `start_time * time_base`. Because `time_base` is the duration of one stream tick in seconds, converting seconds to ticks requires dividing by it; multiplying instead keeps the effective seek position near 0 for small `time_base` values (for example `1/30000`). As a result, the `start_time` parameter was effectively ignored and keyframes were decoded from the beginning of the video.

This also affected the `daft.functions.video_keyframes()` UDF, which delegates to `VideoFile.keyframes()`. Users expecting to extract keyframes from a later time range (for example, starting at 10 seconds) instead received frames starting close to time 0.
Related Issues
#5949