fix(video): correct keyframe seek timestamp calculation for start_time #6005

huleilei · 2026-01-10T06:57:24Z

Changes Made

VideoFile.keyframes() computed an incorrect seek timestamp when start_time > 0. The previous implementation used start_time * time_base, which keeps the effective seek position near 0 for small time_base values (for example 1/30000). As a result, the start_time parameter was effectively ignored and keyframes were decoded from the beginning of the video.

This also affected the daft.functions.video_keyframes() UDF, which delegates to VideoFile.keyframes(). Users expecting to extract keyframes from a later time range (for example starting at 10 seconds) instead received frames starting close to time 0.

Related Issues

#5949

greptile-apps · 2026-01-10T07:02:48Z

Greptile Overview

Greptile Summary

This PR fixes a critical bug in VideoFile.keyframes() where the start_time parameter was effectively ignored due to incorrect timestamp calculation.

The Bug

The original implementation used seek_timestamp = int(start_time * video.time_base) to calculate the seek position. Since time_base represents seconds per PTS unit (typically values like 1/30000), multiplying by start_time produced timestamps near 0 regardless of the actual start_time value. For example, with start_time=10 seconds and time_base=1/30000, the old formula gave 10 * (1/30000) = 0.000333, which rounds to 0.

The Fix

The PR correctly changes the formula to seek_timestamp = int(start_time / float(video.time_base)), which properly converts seconds to PTS units. With the same example: 10 / (1/30000) = 300000 PTS units - the correct seek position.

Additional improvements include:

Adding stream=video parameter to container.seek() for stream-specific seeking
Adding post-seek frame filtering to skip frames that land before start_time (since seeks may land on earlier keyframes)
Adding test coverage for start_time functionality

Code Quality Notes

The core fix is mathematically sound and addresses the reported issue. The code properly handles edge cases like None time_base values. Minor style inconsistencies exist in the end_time checking logic, and test coverage could be strengthened to verify actual frame timestamps rather than just frame counts.

Confidence Score: 4/5

This PR is safe to merge with minimal risk - it fixes a critical bug with a mathematically correct solution
The core timestamp calculation fix is mathematically sound and directly addresses the reported issue. The implementation properly handles edge cases (None time_base, frame.time checks). Additional safety measures (post-seek filtering, stream parameter) strengthen robustness. Minor style inconsistencies and test coverage gaps exist but don't affect correctness. Score is 4/5 rather than 5/5 due to: (1) test coverage validates counts but not actual frame timestamps, and (2) minor style inconsistency in end_time handling that could be cleaned up.
No files require special attention - both files have been thoroughly reviewed and the changes are straightforward

Important Files Changed

File Analysis

Filename	Score	Overview
daft/file/video.py	4/5	Fixed critical timestamp calculation bug and added post-seek frame filtering; minor style inconsistency in end_time handling
tests/file/test_video.py	3/5	Added three tests for start_time functionality; tests verify basic behavior but lack edge case coverage and precise frame time validation

Sequence Diagram

sequenceDiagram
    participant User
    participant VideoFile
    participant PyAV
    participant Container
    participant VideoStream
    
    User->>VideoFile: keyframes(start_time=10, end_time=None)
    VideoFile->>VideoFile: open() file
    VideoFile->>PyAV: av.open(file)
    PyAV-->>VideoFile: container
    VideoFile->>Container: get video stream
    Container-->>VideoFile: video stream
    
    alt start_time > 0 and video.time_base exists
        VideoFile->>VideoFile: seek_timestamp = int(start_time / float(time_base))
        VideoFile->>Container: seek(seek_timestamp, stream=video)
        Container-->>VideoFile: positioned at keyframe
    end
    
    VideoFile->>VideoStream: skip_frame = "NONKEY"
    VideoFile->>Container: decode(video)
    
    loop for each keyframe
        Container-->>VideoFile: frame
        
        alt frame.time < start_time
            VideoFile->>VideoFile: continue (skip frame)
        else if end_time exists and frame.time > end_time
            VideoFile->>VideoFile: break (stop iteration)
        else
            VideoFile->>VideoFile: frame.to_image()
            VideoFile-->>User: yield PIL.Image
        end
    end

greptile-apps

_{1 file reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

daft/file/video.py

codecov · 2026-01-12T02:55:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.91%. Comparing base (ae14540) to head (7bacde6).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6005      +/-   ##
==========================================
+ Coverage   72.89%   72.91%   +0.01%     
==========================================
  Files         971      971              
  Lines      125987   125989       +2     
==========================================
+ Hits        91842    91859      +17     
+ Misses      34145    34130      -15

Files with missing lines	Coverage Δ
daft/file/video.py	`69.84% <100.00%> (+30.49%)`	⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

greptile-apps

_{2 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-12T03:34:28Z

tests/file/test_video.py

+def test_keyframes_start_time_skips_early_frames(sample_video_path):
+    file = daft.VideoFile(sample_video_path)
+    metadata = file.metadata()
+    duration = metadata["duration"]
+    assert duration is not None
+
+    start_time = duration / 2.0
+
+    all_keyframes = list(file.keyframes())
+    later_keyframes = list(file.keyframes(start_time=start_time))
+
+    assert len(later_keyframes) < len(all_keyframes)


This test only verifies that fewer frames are returned, but doesn't validate that the returned frames actually start from the correct time. Consider strengthening the test to verify the actual frame times:

Suggested change

def test_keyframes_start_time_skips_early_frames(sample_video_path):

file = daft.VideoFile(sample_video_path)

metadata = file.metadata()

duration = metadata["duration"]

assert duration is not None

start_time = duration / 2.0

all_keyframes = list(file.keyframes())

later_keyframes = list(file.keyframes(start_time=start_time))

assert len(later_keyframes) < len(all_keyframes)

def test_keyframes_start_time_skips_early_frames(sample_video_path):

file = daft.VideoFile(sample_video_path)

metadata = file.metadata()

duration = metadata["duration"]

assert duration is not None

start_time = duration / 2.0

all_keyframes = list(file.keyframes())

later_keyframes = list(file.keyframes(start_time=start_time))

assert len(later_keyframes) < len(all_keyframes)

# Verify that returned keyframes actually start after start_time

# This would require accessing frame metadata, which PIL.Image doesn't have

# Consider using a lower-level API or storing timestamps to validate

The current test passes as long as the count is different, but doesn't verify correctness. Ideally, you'd want to check that the first returned frame is at or after start_time.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: tests/file/test_video.py Line: 83:94 Comment: This test only verifies that fewer frames are returned, but doesn't validate that the returned frames actually start from the correct time. Consider strengthening the test to verify the actual frame times: ```suggestion def test_keyframes_start_time_skips_early_frames(sample_video_path): file = daft.VideoFile(sample_video_path) metadata = file.metadata() duration = metadata["duration"] assert duration is not None start_time = duration / 2.0 all_keyframes = list(file.keyframes()) later_keyframes = list(file.keyframes(start_time=start_time)) assert len(later_keyframes) < len(all_keyframes) # Verify that returned keyframes actually start after start_time # This would require accessing frame metadata, which PIL.Image doesn't have # Consider using a lower-level API or storing timestamps to validate ``` The current test passes as long as the count is different, but doesn't verify correctness. Ideally, you'd want to check that the first returned frame is at or after `start_time`. <sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub> How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-01-12T03:34:30Z

Additional Comments (1)

daft/file/video.py
The end_time check has a style inconsistency with the start_time check. Line 117 directly uses frame.time, but here you assign it to frame_time first. Consider refactoring for consistency:

                    # Check end time if specified
                    if end_time is not None:
                        if frame.time and frame.time > end_time:
                            break

This makes the code more consistent and slightly cleaner.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review.
Path: daft/file/video.py
Line: 121:124

Comment:
The `end_time` check has a style inconsistency with the `start_time` check. Line 117 directly uses `frame.time`, but here you assign it to `frame_time` first. Consider refactoring for consistency:

```suggestion
                    # Check end time if specified
                    if end_time is not None:
                        if frame.time and frame.time > end_time:
                            break
```

This makes the code more consistent and slightly cleaner.

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

github-actions bot added the fix label Jan 10, 2026

greptile-apps bot reviewed Jan 10, 2026

View reviewed changes

daft/file/video.py Outdated Show resolved Hide resolved

huleilei force-pushed the hll/fix-video-keyframe-seek branch from 5d250c0 to 1d64cd0 Compare January 10, 2026 07:28

huleilei marked this pull request as draft January 10, 2026 08:05

fix(video): correct keyframe seek timestamp calculation for start_time

7bacde6

huleilei force-pushed the hll/fix-video-keyframe-seek branch from 1d64cd0 to 7bacde6 Compare January 12, 2026 02:08

huleilei marked this pull request as ready for review January 12, 2026 03:29

greptile-apps bot reviewed Jan 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(video): correct keyframe seek timestamp calculation for start_time #6005

fix(video): correct keyframe seek timestamp calculation for start_time #6005

huleilei commented Jan 10, 2026

Uh oh!

greptile-apps bot commented Jan 10, 2026 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

codecov bot commented Jan 12, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Jan 12, 2026

Uh oh!

greptile-apps bot commented Jan 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix(video): correct keyframe seek timestamp calculation for start_time #6005

Are you sure you want to change the base?

fix(video): correct keyframe seek timestamp calculation for start_time #6005

Conversation

huleilei commented Jan 10, 2026

Changes Made

Related Issues

Uh oh!

greptile-apps bot commented Jan 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

The Bug

The Fix

Code Quality Notes

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Jan 12, 2026

Codecov Report

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Jan 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Jan 10, 2026 •

edited

Loading