
Conversation

@ajcasagrande (Contributor) commented Oct 24, 2025

Summary by CodeRabbit

  • New Features

    • Image retrieval endpoint for NIM-based image inference.
    • Video media modality added across dataset flows (single/multi-turn, pools).
    • Audio and video open/encode utilities and media URL/encoding handling.
  • Metrics

    • New image and video metrics: counts, throughput, and per-item latency.
    • Extended metric units for images/videos and new video-only flag.
  • Tests

    • Unit tests added/updated for image retrieval, media encoding, and loaders.

@github-actions github-actions bot added the feat label Oct 24, 2025
@ajcasagrande (Contributor, Author) commented:

@coderabbitai review

@coderabbitai bot commented Oct 24, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot commented Oct 24, 2025

Walkthrough

Adds image/video modality support across the codebase: new metric units and metrics, image retrieval endpoint and response type, media encoding utilities and dataset loader changes to handle video, removal of per-field exclude-if-none serialization, tokenizer/parse-flow adjustments, and accompanying tests and fixtures.

Changes

Cohort / File(s) Summary
Metric Enums & Units
src/aiperf/common/enums/metric_enums.py
Added IMAGE, IMAGES, VIDEO, VIDEOS to GenericMetricUnit; added inverted flag to MetricOverTimeUnitInfo and exposed on MetricOverTimeUnit; added IMAGES_PER_SECOND, MS_PER_IMAGE, VIDEOS_PER_SECOND, MS_PER_VIDEO; added SUPPORTS_VIDEO_ONLY flag.
Endpoint Type
src/aiperf/common/enums/plugin_enums.py
Added EndpointType.IMAGE_RETRIEVAL.
Image Retrieval Endpoint & Tests
src/aiperf/endpoints/nim_image_retrieval.py, src/aiperf/endpoints/__init__.py, tests/endpoints/test_nim_image_retrieval_endpoint*.py
New ImageRetrievalEndpoint registered for IMAGE_RETRIEVAL; implements metadata, format_payload, parse_response; exports added and unit tests for formatting and parsing.
Response Models & Exports
src/aiperf/common/models/record_models.py, src/aiperf/common/models/__init__.py
Added ImageRetrievalResponseData and included it in ParsedResponse unions and package exports; removed exclude_if_none export from models init.
Media Utilities & Exports
src/aiperf/dataset/utils.py, src/aiperf/dataset/__init__.py
Added open_audio, encode_audio, open_video, encode_video (format detection + base64 encoding); re-exported these in package API.
Loader Media Conversion & Mixins
src/aiperf/dataset/loader/mixins.py, src/aiperf/dataset/loader/models.py
Added URL detection, encoding helpers (_is_url, _is_already_encoded, _encode_media_file, _handle_media_content); added video/video(s) fields to models and updated validation to include video modality.
Loaders — Turn Conversion Updates
src/aiperf/dataset/loader/single_turn.py, src/aiperf/dataset/loader/multi_turn.py, src/aiperf/dataset/loader/random_pool.py
Pass videos=media[MediaType.VIDEO] into Turn construction in single-turn, multi-turn, and random-pool conversion flows.
Image & Video Metrics
src/aiperf/metrics/types/image_metrics.py, src/aiperf/metrics/types/video_metrics.py
Added NumImagesMetric, ImageThroughputMetric, ImageLatencyMetric; added NumVideosMetric, VideoThroughputMetric, VideoLatencyMetric with proper units, flags, and dependencies on RequestLatencyMetric.
Test Fixtures & Loader Tests
tests/loaders/conftest.py, tests/loaders/test_single_turn.py, tests/loaders/test_multi_turn.py, tests/loaders/test_random_pool.py
Added fixtures for test_images, create_test_image, create_test_audio, create_test_video; updated tests to expect base64-encoded local media, URL pass-through, and new video/audio behaviors.
Encoding Tests & Error Cases
tests/loaders/test_single_turn.py (new classes)
Added explicit tests for image/audio/video encoding behavior (local file -> encoded, URL passthrough, existing encoded values preserved) and error handling for missing local files.
Serialization / exclude_if_none Removal
src/aiperf/common/models/base_models.py, src/aiperf/common/models/dataset_models.py, src/aiperf/common/models/export_models.py
Removed exclude_if_none decorator and related machinery; simplified AIPerfBaseModel; removed per-field model serializer; removed @exclude_if_none("role") on Turn; changed JSON export models to use BaseModel.
Message JSON Dispatch & ErrorMessage
src/aiperf/common/messages/base_messages.py, src/aiperf/common/messages/command_messages.py
Switched JSON loading to load_json_str; added dynamic _message_type_lookup and __init_subclass__ for Message subclasses; added ErrorMessage; removed use of exclude_if_none decorator and updated command message deserialization to use load_json_str.
Sequence Distribution JSON Parsing
src/aiperf/common/models/sequence_distribution.py
Replaced json.loads with load_json_str/orjson path and updated JSON decode error handling.
Tokenizer / Parser Changes
src/aiperf/records/inference_result_parser.py, src/aiperf/records/record_processor_service.py, src/aiperf/dataset/dataset_manager.py
Renamed configure_tokenizers to _configure_tokenizers; tokenizer setup guarded by endpoint capabilities; process_valid_record now accepts input_token_count, with token counting conditioned on the endpoint; dataset manager tokenization made conditional and JSON serialization changed to exclude_none=True.
Tests Updated for Parser API
tests/parsers/test_usage_passthrough.py
Updated tests to call process_valid_record(record, input_token_count=...).
Removed Tests
tests/test_messages.py
Deleted tests relying on the removed exclude_if_none behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Notes for reviewer:

  • Inspect tokenizer/parse-flow signature changes and all test updates that call the new process_valid_record signature.
  • Verify serialization behavior changes after removing exclude_if_none (message/model JSON roundtrips and existing consumers).
  • Review media encoding helpers and format detection for edge cases (data-URIs, audio format delimiting).
  • Confirm metric unit inversion and unit conversion logic for the new per-image/per-video units.

Poem

🐰 I nibble bytes and hop through frames,
I stash each pixel, count their names.
Videos, images join the run,
Metrics hum and encodings sun.
Hooray — a rabbit’s work is done! 🥕📸

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "feat: add nim image retrieval endpoint support" directly corresponds to the primary change in this pull request: the introduction of the ImageRetrievalEndpoint class in src/aiperf/endpoints/nim_image_retrieval.py. The title is clear, specific, and concise. While the PR includes substantial supporting changes (new metrics, model enhancements, dataset loader updates, and infrastructure refactoring), it appropriately focuses on the primary new feature, letting a developer scanning the commit history quickly identify it.
Docstring Coverage ✅ Passed Docstring coverage is 91.40% which is sufficient. The required threshold is 80.00%.


@coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/aiperf/dataset/loader/random_pool.py (1)

179-192: Add videos to the merge logic.

The _merge_turns method merges texts, images, and audios but omits videos. This inconsistency will cause video data to be lost when turns are merged.

Apply this diff to include videos in the merged turn:

     def _merge_turns(self, turns: list[Turn]) -> Turn:
         """Merge turns into a single turn.
 
         Args:
             turns: A list of turns.
 
         Returns:
             A single turn.
         """
         merged_turn = Turn(
             texts=[text for turn in turns for text in turn.texts],
             images=[image for turn in turns for image in turn.images],
             audios=[audio for turn in turns for audio in turn.audios],
+            videos=[video for turn in turns for video in turn.videos],
         )
         return merged_turn
🧹 Nitpick comments (27)
tests/endpoints/test_nim_image_retrieval_endpoint.py (1)

49-52: Make the failure assertion less brittle and add multi-image coverage.

  • Use a stable substring in the regex to reduce brittleness if wording changes.
  • Add a test for multiple images to ensure list ordering and formatting.

Apply this minimal tweak to the assertion:

-with pytest.raises(
-    ValueError, match="Image Retrieval request requires at least one image"
-):
+with pytest.raises(ValueError, match=r"requires at least one image"):

Optionally add:

def test_format_payload_multiple_images(endpoint, model_endpoint):
    images = [Image(contents=["data:image/png;base64,AAA"]),
              Image(contents=["data:image/png;base64,BBB"])]
    turn = Turn(images=images, model="image-retrieval-model")
    req = RequestInfo(model_endpoint=model_endpoint, turns=[turn])
    payload = endpoint.format_payload(req)
    assert [i["url"] for i in payload["input"]] == [
        "data:image/png;base64,AAA", "data:image/png;base64,BBB"]
tests/loaders/test_multi_turn.py (1)

456-467: Good coverage; consider asserting image ordering for stability.

Add an explicit order check to ensure the two encoded images remain in the provided order during conversion.

 first_turn = conversation.turns[0]
 assert first_turn.texts[0].contents == ["What's this?"]
-assert len(first_turn.images[0].contents) == 1
+assert len(first_turn.images[0].contents) == 1
 # ...
 second_turn = conversation.turns[1]
 assert second_turn.texts[0].contents == ["Follow up"]
-assert len(second_turn.images[0].contents) == 1
+assert len(second_turn.images[0].contents) == 1
+
+# Optional: verify order is preserved by comparing raw contents before/after
+# (placeholders—focus is on positional stability)
+img0 = first_turn.images[0].contents[0]
+img1 = second_turn.images[0].contents[0]
+assert img0 != img1  # sanity check; should represent different inputs

Also applies to: 479-485, 486-492

tests/loaders/test_random_pool.py (2)

251-275: Add an explicit ordering assertion for batched images.

Helps catch accidental reordering during encoding.

 for img_content in turn.images[0].contents:
     assert img_content.startswith("data:image/")
     assert ";base64," in img_content
+assert turn.images[0].contents[0] != turn.images[0].contents[1]

325-375: Good multi-file assertions; mirror the image-encoding checks for both conversations.

You already validate base64 for both; consider asserting that text-image pairs belong to different files (queries vs contexts) by name when available to tighten guarantees.

tests/loaders/conftest.py (2)

89-111: Minor: produce exact sample count for generated audio.

Use endpoint=False to avoid including the end sample twice for short durations.

- t = np.linspace(0, duration, int(sample_rate * duration))
+ t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)

124-166: Narrow exception handling and ensure temp-frame cleanup in video fixture.

Catching Exception masks errors (Ruff BLE001). Also, ensure frames are cleaned even on failure by using TemporaryDirectory.

-    def _create_video(name: str = "test_video.mp4"):
+    def _create_video(name: str = "test_video.mp4"):
         dest_path = tmp_path / name
-
-        # Try using ffmpeg-python if available, otherwise create a minimal MP4
-        try:
-            import tempfile
-
-            import ffmpeg
-            # Create a few simple frames
-            temp_frame_dir = tempfile.mkdtemp(prefix="video_frames_")
-            for i in range(3):
-                img = Image.new("RGB", (64, 64), (i * 80, 0, 0))
-                draw = ImageDraw.Draw(img)
-                draw.text((10, 25), f"F{i}", fill=(255, 255, 255))
-                img.save(f"{temp_frame_dir}/frame_{i:03d}.png")
-            # Use ffmpeg to create video
-            (
-                ffmpeg.input(f"{temp_frame_dir}/frame_%03d.png", framerate=1)
-                .output(str(dest_path), vcodec="libx264", pix_fmt="yuv420p", t=1)
-                .overwrite_output()
-                .run(quiet=True)
-            )
-            for file in Path(temp_frame_dir).glob("*.png"):
-                file.unlink()
-            Path(temp_frame_dir).rmdir()
-        except (ImportError, Exception):
+        # Try using ffmpeg-python if available, otherwise create a minimal MP4
+        try:
+            try:
+                import ffmpeg  # type: ignore
+            except ImportError:
+                ffmpeg = None
+            if ffmpeg:
+                from tempfile import TemporaryDirectory
+                with TemporaryDirectory(prefix="video_frames_") as temp_frame_dir:
+                    for i in range(3):
+                        img = Image.new("RGB", (64, 64), (i * 80, 0, 0))
+                        draw = ImageDraw.Draw(img)
+                        draw.text((10, 25), f"F{i}", fill=(255, 255, 255))
+                        img.save(f"{temp_frame_dir}/frame_{i:03d}.png")
+                    (
+                        ffmpeg.input(f"{temp_frame_dir}/frame_%03d.png", framerate=1)
+                        .output(str(dest_path), vcodec="libx264", pix_fmt="yuv420p", t=1)
+                        .overwrite_output()
+                        .run(quiet=True)
+                    )
+            else:
+                raise RuntimeError("ffmpeg not available")
+        except Exception:
             # Fallback: create a minimal valid MP4 file
             minimal_mp4 = bytes.fromhex(
                 "000000186674797069736f6d0000020069736f6d69736f32617663310000"
                 "0008667265650000002c6d6461740000001c6d6f6f7600000000006d7668"
                 "6400000000000000000000000000000001000000"
             )
             with open(dest_path, "wb") as f:
                 f.write(minimal_mp4)
         return str(dest_path)

If keeping broad except is intentional, add a noqa for BLE001 with a short rationale.

tests/loaders/test_single_turn.py (5)

399-437: Avoid hard-coded asset UUIDs; use the fixture to reduce skips.

Replace the fixed source path with the create_test_image fixture to make this portable and keep the test running across environments.

- def test_convert_local_image_to_base64(self, create_jsonl_file):
+ def test_convert_local_image_to_base64(self, create_jsonl_file, create_test_image):
     """Test that local image files are encoded to base64 data URLs."""
-    test_image = Path("src/aiperf/dataset/generator/assets/source_images/0bfd8fdf-457f-43c8-9253-a2346d37d26a_1024.jpg")
-    if not test_image.exists():
-        pytest.skip("Test image not found")
+    test_image = Path(create_test_image())

Also, narrow the exception in base64 validation:

-    try:
-        base64.b64decode(base64_part)
-    except Exception as e:
+    import binascii
+    try:
+        base64.b64decode(base64_part)
+    except (binascii.Error, ValueError) as e:
         pytest.fail(f"Invalid base64 encoding: {e}")

472-512: Use fixture-driven images instead of hard-coded paths.

Swap the two explicit paths with the test_images fixture to avoid brittle skips.

- test_images = [
-     Path("src/.../source_images/0bfd8fdf-..._1024.jpg"),
-     Path("src/.../source_images/119544eb-..._861.jpg"),
- ]
+ test_images = [Path(p) for _, p in sorted(test_images.items())[:2]]

513-552: Prefer create_test_image for the local component in mixed sources.

Keeps the test self-contained and portable.

- test_image = Path("src/aiperf/dataset/generator/assets/source_images/0bfd8fdf-457f-43c8-9253-a2346d37d26a_1024.jpg")
- if not test_image.exists():
-     pytest.skip("Test image not found")
+ test_image = Path(create_test_image())

596-601: Narrow the exception type in audio base64 validation.

Catching Exception is too broad and hides unrelated bugs.

-    try:
-        base64.b64decode(base64_part)
-    except Exception as e:
+    import binascii
+    try:
+        base64.b64decode(base64_part)
+    except (binascii.Error, ValueError) as e:
         pytest.fail(f"Invalid base64 encoding: {e}")

Note: Audio uses a "wav,<base64>" prefix format, whereas images and videos use full data URLs. Consider aligning the formats or documenting the difference clearly.


665-668: Same here: narrow the exception type for video base64 validation.

-    try:
-        base64.b64decode(base64_part)
-    except Exception as e:
+    import binascii
+    try:
+        base64.b64decode(base64_part)
+    except (binascii.Error, ValueError) as e:
         pytest.fail(f"Invalid base64 encoding: {e}")
src/aiperf/dataset/loader/models.py (1)

56-69: Reduce duplication in validators to prevent drift.

Extract shared helpers for:

  • mutually exclusive scalar vs list per modality
  • at-least-one-modality checks

This keeps SingleTurn and RandomPool in sync as modalities evolve.

Example helper sketch:

def _ensure_exclusive(self, pairs: list[tuple[object, object]], names: list[tuple[str,str]]):
    for (a,b), (an,bn) in zip(pairs, names):
        if a and b:
            raise ValueError(f"{an} and {bn} cannot be set together")

def _has_any(self, fields: list[object]) -> bool:
    return any(bool(f) for f in fields)

Then call with the relevant fields per model. Also consider rejecting empty lists explicitly if passed.

Also applies to: 149-160, 162-178

src/aiperf/dataset/utils.py (1)

150-153: Consider using shorter exception messages or custom exception classes.

Static analysis suggests avoiding long exception messages outside the exception class. While not critical, consider either shortening these messages or creating custom exception classes if this pattern appears frequently.

Also applies to: 197-200

src/aiperf/endpoints/nim_image_retrieval.py (1)

35-35: Consider using shorter exception messages or custom exception classes.

Static analysis suggests avoiding long exception messages outside the exception class. While not critical for functionality, this is a style consideration.

Also applies to: 46-46, 49-49

src/aiperf/dataset/loader/mixins.py (1)

111-111: Consider using shorter exception messages or custom exception classes.

Static analysis suggests avoiding long exception messages outside the exception class. This is a style consideration and not critical for functionality.

Also applies to: 171-171

src/aiperf/common/enums/metric_enums.py (2)

296-301: Guard conversions between inverted and non‑inverted over‑time units.

Tags reflect inversion, but convert_to does not explicitly prevent converting between inverted and non‑inverted units (e.g., IMAGES_PER_SECOND ↔ MS_PER_IMAGE). Make this fail fast with a clear error to avoid accidental misuse.

Apply this diff:

 class MetricOverTimeUnitInfo(BaseMetricUnitInfo):
@@
     def convert_to(self, other_unit: "MetricUnitT", value: int | float) -> float:
@@
-        if isinstance(other_unit, MetricOverTimeUnit | MetricOverTimeUnitInfo):
+        if isinstance(other_unit, MetricOverTimeUnit | MetricOverTimeUnitInfo):
+            # Disallow conversions across inverted orientation to avoid subtle errors.
+            if self.inverted != other_unit.inverted:
+                raise MetricUnitError(
+                    f"Cannot convert between inverted ('{self.tag}') and non-inverted ('{other_unit.tag}') units. "
+                    "Compute the reciprocal metric explicitly."
+                )
             # Chain convert each unit to the other unit.
             value = self.primary_unit.convert_to(other_unit.primary_unit, value)
             value = self.time_unit.convert_to(other_unit.time_unit, value)
             if self.third_unit and other_unit.third_unit:
                 value = self.third_unit.convert_to(other_unit.third_unit, value)
             return value

Also applies to: 315-336


354-371: Naming and inverted configuration look good; consider optional seconds variants.

IMAGES_PER_SECOND/MS_PER_IMAGE and VIDEOS_PER_SECOND/MS_PER_VIDEO are coherent. If consumers need seconds-per-image/video without rounding to ms, consider adding SECONDS_PER_IMAGE and SECONDS_PER_VIDEO for symmetry; otherwise current time-unit conversions on latency metrics suffice.

Confirm whether UI/CSV exporters ever need “s/image” or “s/video” tags directly.

src/aiperf/metrics/types/image_metrics.py (5)

1-10: Import ClassVar to annotate mutable class attributes.

Needed for RUF012 compliance.

-from aiperf.common.enums import MetricFlags
+from typing import ClassVar
+from aiperf.common.enums import MetricFlags

21-35: Count logic ok; silence unused record_metrics.

The summation matches the stated behavior. Silence the unused argument with a del statement in the body to satisfy ARG002 without changing the signature.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> int:
         """Parse the number of images from the record by summing the number of images in each turn."""
+        del record_metrics  # unused
         num_images = sum(
             len(image.contents)
             for turn in record.request.turns
             for image in turn.images
         )
         if num_images == 0:
-            raise NoMetricValue(
-                "Record must have at least one image in at least one turn."
-            )
+            raise NoMetricValue("No images found.")
         return num_images

46-49: Annotate mutable class attribute required_metrics with ClassVar.

Avoids it being treated as an instance attribute.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumImagesMetric.tag,
         RequestLatencyMetric.tag,
     }

71-74: Annotate mutable class attribute required_metrics with ClassVar.

Same as throughput metric.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumImagesMetric.tag,
         RequestLatencyMetric.tag,
     }

76-84: Silence unused record parameter.

Keeps signature while appeasing ARG002.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> float:
         """Parse the image latency from the record by dividing the request latency by the number of images."""
+        del record  # unused
         num_images = record_metrics.get_or_raise(NumImagesMetric)
         request_latency_ms = record_metrics.get_converted_or_raise(
             RequestLatencyMetric, self.unit.time_unit
         )
         return request_latency_ms / num_images
src/aiperf/metrics/types/video_metrics.py (5)

1-10: Import ClassVar for mutable class attribute annotations.

-from aiperf.common.enums import MetricFlags
+from typing import ClassVar
+from aiperf.common.enums import MetricFlags

21-35: Count logic ok; silence unused record_metrics.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> int:
         """Parse the number of videos from the record by summing the number of videos in each turn."""
+        del record_metrics  # unused
         num_videos = sum(
             len(video.contents)
             for turn in record.request.turns
             for video in turn.videos
         )
         if num_videos == 0:
-            raise NoMetricValue(
-                "Record must have at least one video in at least one turn."
-            )
+            raise NoMetricValue("No videos found.")
         return num_videos

45-48: Annotate mutable class attribute required_metrics with ClassVar.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumVideosMetric.tag,
         RequestLatencyMetric.tag,
     }

70-73: Annotate mutable class attribute required_metrics with ClassVar.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumVideosMetric.tag,
         RequestLatencyMetric.tag,
     }

75-83: Silence unused record parameter.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> float:
         """Parse the video latency from the record by dividing the request latency by the number of videos."""
+        del record  # unused
         num_videos = record_metrics.get_or_raise(NumVideosMetric)
         request_latency_ms = record_metrics.get_converted_or_raise(
             RequestLatencyMetric, self.unit.time_unit
         )
         return request_latency_ms / num_videos
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ddb6b4 and b44390a.

📒 Files selected for processing (21)
  • src/aiperf/common/enums/metric_enums.py (6 hunks)
  • src/aiperf/common/enums/plugin_enums.py (1 hunks)
  • src/aiperf/common/models/__init__.py (2 hunks)
  • src/aiperf/common/models/record_models.py (2 hunks)
  • src/aiperf/dataset/__init__.py (2 hunks)
  • src/aiperf/dataset/loader/mixins.py (4 hunks)
  • src/aiperf/dataset/loader/models.py (9 hunks)
  • src/aiperf/dataset/loader/multi_turn.py (1 hunks)
  • src/aiperf/dataset/loader/random_pool.py (1 hunks)
  • src/aiperf/dataset/loader/single_turn.py (1 hunks)
  • src/aiperf/dataset/utils.py (2 hunks)
  • src/aiperf/endpoints/__init__.py (2 hunks)
  • src/aiperf/endpoints/nim_image_retrieval.py (1 hunks)
  • src/aiperf/metrics/types/image_metrics.py (1 hunks)
  • src/aiperf/metrics/types/video_metrics.py (1 hunks)
  • tests/endpoints/test_nim_image_retrieval_endpoint.py (1 hunks)
  • tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py (1 hunks)
  • tests/loaders/conftest.py (2 hunks)
  • tests/loaders/test_multi_turn.py (2 hunks)
  • tests/loaders/test_random_pool.py (5 hunks)
  • tests/loaders/test_single_turn.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
src/aiperf/dataset/utils.py

150-153: Avoid specifying long messages outside the exception class

(TRY003)


197-200: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/metrics/types/image_metrics.py

22-22: Unused method argument: record_metrics

(ARG002)


31-33: Avoid specifying long messages outside the exception class

(TRY003)


46-49: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


52-52: Unused method argument: record

(ARG002)


71-74: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


77-77: Unused method argument: record

(ARG002)

src/aiperf/dataset/loader/mixins.py

111-111: Avoid specifying long messages outside the exception class

(TRY003)


171-171: Avoid specifying long messages outside the exception class

(TRY003)

tests/loaders/conftest.py

153-153: Do not catch blind exception: Exception

(BLE001)

src/aiperf/dataset/loader/models.py

66-66: Avoid specifying long messages outside the exception class

(TRY003)


159-159: Avoid specifying long messages outside the exception class

(TRY003)

tests/loaders/test_single_turn.py

434-434: Do not catch blind exception: Exception

(BLE001)


599-599: Do not catch blind exception: Exception

(BLE001)


667-667: Do not catch blind exception: Exception

(BLE001)

src/aiperf/endpoints/nim_image_retrieval.py

35-35: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


49-49: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/metrics/types/video_metrics.py

22-22: Unused method argument: record_metrics

(ARG002)


31-33: Avoid specifying long messages outside the exception class

(TRY003)


45-48: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


51-51: Unused method argument: record

(ARG002)


70-73: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


76-76: Unused method argument: record

(ARG002)

🔇 Additional comments (27)
src/aiperf/common/models/__init__.py (1)

72-72: LGTM! ImageRetrievalResponseData properly exported.

The new response data class is correctly imported and exported following the same pattern as other response data types.

Also applies to: 144-144

src/aiperf/common/models/record_models.py (2)

602-612: LGTM! ImageRetrievalResponseData follows established pattern.

The new response data class is well-structured and consistent with similar non-text response types (EmbeddingResponseData, RankingsResponseData).


623-623: LGTM! ParsedResponse union updated correctly.

ImageRetrievalResponseData properly added to the SerializeAsAny union type.

src/aiperf/dataset/loader/multi_turn.py (1)

142-142: LGTM! Video modality support added consistently.

The videos field is correctly passed to the Turn constructor, following the same pattern as texts, images, and audios.

src/aiperf/dataset/loader/random_pool.py (1)

167-167: LGTM! Video modality support added.

The videos field is correctly passed to the Turn constructor, consistent with the pattern for other modalities.

src/aiperf/common/enums/plugin_enums.py (1)

30-30: LGTM! IMAGE_RETRIEVAL endpoint type added.

The new endpoint type follows the established pattern and naming convention for other endpoint types.

src/aiperf/endpoints/__init__.py (1)

7-9: LGTM! ImageRetrievalEndpoint properly exported.

The new endpoint is correctly imported and exported, following the same pattern as other endpoint implementations.

Also applies to: 28-28

src/aiperf/dataset/loader/single_turn.py (1)

113-113: LGTM! Video modality support added consistently.

The videos field is correctly passed to the Turn constructor, following the same pattern as other modalities.

tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py (3)

19-36: LGTM! Endpoint fixture properly configured.

The fixture correctly sets up an ImageRetrievalEndpoint with appropriate mocking for the transport layer.


38-68: LGTM! Basic parse response test is comprehensive.

The test validates the complete parsing flow including timestamp preservation, response type verification, and data structure integrity.


70-77: LGTM! Invalid response handling tested.

The test properly verifies that None is returned for invalid/empty responses.

tests/endpoints/test_nim_image_retrieval_endpoint.py (1)

31-43: Happy path looks solid.

Asserting a single image_url item and echoing the data URL is correct for the NIM payload.

tests/loaders/test_random_pool.py (1)

223-250: LGTM for multimodal conversion assertions.

Data URL checks for image and passthrough for audio URL are appropriate.

tests/loaders/test_single_turn.py (1)

310-326: URL passthrough assertions look correct.

Good separation: local files are encoded elsewhere; remote URLs pass through as-is.

src/aiperf/dataset/loader/models.py (1)

42-46: Video modality support is correctly integrated.

Fields and validators mirror existing modalities; docstrings updated accordingly.

Also applies to: 65-67, 75-85, 143-147, 158-160, 166-176

src/aiperf/dataset/utils.py (2)

127-159: Verify type consistency between open_audio return value and encode_audio parameter.

The function returns audio_format.value (a string), but encode_audio expects format: AudioFormat (an enum). This type mismatch could cause confusion and may fail static type checking.

Consider either:

  1. Changing the return type to tuple[bytes, AudioFormat] and returning the enum, or
  2. Updating encode_audio to accept str instead of AudioFormat

Apply this diff to return the enum for consistency:

-    return audio_bytes, audio_format.value
+    return audio_bytes, audio_format

And update the docstring:

     Returns:
-        A tuple of (audio_bytes, format) where format is 'wav' or 'mp3'.
+        A tuple of (audio_bytes, format) where format is an AudioFormat enum.

176-206: Verify type consistency between open_video return value and encode_video parameter.

Similar to open_audio, this function returns video_format.value (a string), but encode_video expects format: VideoFormat (an enum). This creates a type mismatch.

Apply this diff to return the enum for consistency:

-    return video_bytes, video_format.value
+    return video_bytes, video_format

And update the docstring:

     Returns:
-        A tuple of (video_bytes, format) where format is VideoFormat.MP4.
+        A tuple of (video_bytes, format) where format is a VideoFormat enum.
src/aiperf/dataset/__init__.py (1)

40-51: LGTM!

The new audio and video utilities are correctly imported and exported. The public API surface expansion is clean and consistent with existing patterns.

Also applies to: 53-92

src/aiperf/endpoints/nim_image_retrieval.py (2)

23-30: LGTM!

The metadata configuration is appropriate for an image retrieval endpoint.


65-83: LGTM!

The response parsing handles missing JSON and missing data fields appropriately with debug logging. Returning None for unparseable responses appears to be the established pattern in this codebase.

src/aiperf/dataset/loader/mixins.py (4)

47-89: LGTM!

The extended media conversion logic correctly handles video alongside image and audio, with appropriate encoding for local files. The singular and plural field handling is consistent.


91-114: LGTM!

The URL validation logic is robust, correctly handling valid URLs, non-URLs, and raising errors for malformed URLs with only scheme or netloc. This prevents subtle bugs.
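
For reference, a minimal sketch of such a check (an assumed shape, not the actual aiperf implementation; it presumes data URLs have already been filtered out earlier in _handle_media_content):

from urllib.parse import urlparse

def _is_url(value: str) -> bool:
    parsed = urlparse(value)
    if parsed.scheme and parsed.netloc:
        return True  # well-formed remote URL, e.g. "https://example.com/a.png"
    if parsed.scheme or parsed.netloc:
        # only one component present: fail fast with a clear error
        raise ValueError(f"Malformed URL: {value!r}")
    return False  # plain string, e.g. a local file path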


144-171: Verify compatibility with utils.py type signatures.

This method calls utils.open_audio and utils.open_video which currently return string format values, but then passes those to utils.encode_audio and utils.encode_video which expect enum types. This works at runtime because the encode functions incorrectly call .lower() on the parameter without .value, but the type signatures are inconsistent.

Ensure the type signature fixes suggested for utils.py are applied consistently, so that:

  • open_audio and open_video return enums
  • encode_audio and encode_video accept enums and call .value.lower()

Or alternatively:

  • All functions use strings consistently

173-202: LGTM!

The media content handling logic is well-structured, checking for already-encoded content (including data URLs) before checking for remote URLs, then treating remaining content as local files. The ordering is correct and prevents data URLs from being misidentified as remote URLs.
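
A condensed sketch of that dispatch order (the helper names come from this PR; the bodies are illustrative only):

def _handle_media_content(self, content: str, media_type: str) -> str:
    if self._is_already_encoded(content, media_type):
        return content  # data URL or "wav,<base64>" passes through untouched
    if self._is_url(content):
        return content  # remote URL passes through un-encoded
    return self._encode_media_file(content, media_type)  # local file -> base64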

src/aiperf/common/enums/metric_enums.py (3)

191-195: Units added are consistent and clear.

IMAGE/IMAGES/VIDEO/VIDEOS naming aligns with existing pattern and tag casing. No issues.


393-397: Good API: expose inverted on the enum.

Surface mirrors info cleanly; helps callers avoid peeking into info.


678-680: Video-only flag addition is sensible and non-breaking.

Bit position continues sequence; no overlap.

@ajcasagrande ajcasagrande self-assigned this Oct 24, 2025
@ajcasagrande force-pushed the ajc/img-ret branch 2 times, most recently from 3583a74 to de3f73f on November 1, 2025 at 14:00
@ajcasagrande (Contributor, Author) commented:

@coderabbitai review

@coderabbitai bot commented Nov 1, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
src/aiperf/common/models/export_models.py (1)

13-13: Consider consistency in base model usage.

JsonMetricResult and JsonExportData now inherit from BaseModel while other export models in this file (TelemetrySummary, GpuSummary, EndpointData, TelemetryExportData) still inherit from AIPerfBaseModel. This creates an inconsistency in the class hierarchy.

While not a functional issue (both work correctly), consider standardizing on one base class for all export models to maintain architectural consistency.

Also applies to: 70-70

tests/endpoints/test_nim_image_retrieval_endpoint.py (1)

31-52: Good test coverage for basic scenarios.

The tests properly validate the basic payload formatting and error handling. The structure is clean and follows pytest conventions.

Consider adding tests for:

  • Multiple images in a single request
  • Batched image inputs
  • Edge cases like empty BASE64 strings or malformed data URLs

These would provide more comprehensive coverage of the format_payload method's behavior.

tests/loaders/test_single_turn.py (1)

396-705: Excellent comprehensive encoding test coverage.

The new test classes thoroughly validate image, audio, and video encoding behaviors across multiple scenarios including local files, URLs, data URLs, and mixed sources. The test structure is well-organized and follows consistent patterns.

Consider catching more specific exceptions in the base64 validation blocks (lines 434, 599, 667) instead of broad Exception. Use binascii.Error, which base64.b64decode raises for invalid input (remember to add import binascii at the top of the module):

-        except Exception as e:
-            pytest.fail(f"Invalid base64 encoding: {e}")
+        except binascii.Error as e:
+            pytest.fail(f"Invalid base64 encoding: {e}")

This addresses the static analysis hints and improves error specificity.

src/aiperf/dataset/utils.py (1)

156-157: Use file_path variable consistently.

Line 156 uses filename (the string parameter) instead of file_path (the Path object constructed on line 140). While both work here, using file_path would be more consistent with the pattern established in open_video and maintain consistency within the function.

Apply this diff for consistency:

     # Read file bytes
-    with open(filename, "rb") as f:
+    with open(file_path, "rb") as f:
         audio_bytes = f.read()
src/aiperf/dataset/loader/mixins.py (1)

15-18: Update docstring to include video.

The class docstring on line 17 mentions "text, image, and audio" but doesn't include video, which is now supported. For consistency with the implementation and other docstrings in the file, video should be added.

Apply this diff:

 class MediaConversionMixin:
     """Mixin providing shared media conversion functionality for dataset loaders.
-    It is used to construct text, image, and audio data from a CustomDatasetT object.
+    It is used to construct text, image, audio, and video data from a CustomDatasetT object.
     """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b44390a and de3f73f.

📒 Files selected for processing (33)
  • src/aiperf/common/enums/metric_enums.py (6 hunks)
  • src/aiperf/common/enums/plugin_enums.py (1 hunks)
  • src/aiperf/common/messages/base_messages.py (3 hunks)
  • src/aiperf/common/messages/command_messages.py (3 hunks)
  • src/aiperf/common/models/__init__.py (2 hunks)
  • src/aiperf/common/models/base_models.py (1 hunks)
  • src/aiperf/common/models/dataset_models.py (1 hunks)
  • src/aiperf/common/models/export_models.py (2 hunks)
  • src/aiperf/common/models/record_models.py (2 hunks)
  • src/aiperf/common/models/sequence_distribution.py (2 hunks)
  • src/aiperf/dataset/__init__.py (2 hunks)
  • src/aiperf/dataset/dataset_manager.py (3 hunks)
  • src/aiperf/dataset/generator/prompt.py (1 hunks)
  • src/aiperf/dataset/loader/mixins.py (4 hunks)
  • src/aiperf/dataset/loader/models.py (9 hunks)
  • src/aiperf/dataset/loader/multi_turn.py (1 hunks)
  • src/aiperf/dataset/loader/random_pool.py (1 hunks)
  • src/aiperf/dataset/loader/single_turn.py (1 hunks)
  • src/aiperf/dataset/utils.py (2 hunks)
  • src/aiperf/endpoints/__init__.py (2 hunks)
  • src/aiperf/endpoints/nim_image_retrieval.py (1 hunks)
  • src/aiperf/metrics/types/image_metrics.py (1 hunks)
  • src/aiperf/metrics/types/video_metrics.py (1 hunks)
  • src/aiperf/records/inference_result_parser.py (6 hunks)
  • src/aiperf/records/record_processor_service.py (1 hunks)
  • tests/endpoints/test_nim_image_retrieval_endpoint.py (1 hunks)
  • tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py (1 hunks)
  • tests/loaders/conftest.py (2 hunks)
  • tests/loaders/test_multi_turn.py (2 hunks)
  • tests/loaders/test_random_pool.py (5 hunks)
  • tests/loaders/test_single_turn.py (6 hunks)
  • tests/parsers/test_usage_passthrough.py (4 hunks)
  • tests/test_messages.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/test_messages.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/aiperf/dataset/loader/random_pool.py
  • tests/loaders/test_multi_turn.py
  • src/aiperf/dataset/__init__.py
  • tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py
  • src/aiperf/common/models/record_models.py
  • src/aiperf/endpoints/__init__.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-15T03:24:10.758Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 359
File: aiperf/metrics/types/time_to_first_output_metric.py:0-0
Timestamp: 2025-10-15T03:24:10.758Z
Learning: In TimeToFirstOutputMetric and similar metrics, invalid timestamp scenarios (where response timestamps precede request start) are automatically caught by the base class validation through the record.valid property, which checks that start_perf_ns < end_perf_ns. This validation happens in _require_valid_record before _parse_record is called, so explicit timestamp validation in _parse_record may be redundant.

Applied to files:

  • src/aiperf/metrics/types/image_metrics.py
  • src/aiperf/metrics/types/video_metrics.py
📚 Learning: 2025-10-03T21:15:21.536Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 325
File: aiperf/metrics/metric_dicts.py:83-86
Timestamp: 2025-10-03T21:15:21.536Z
Learning: The `MetricFlags.missing_flags(flags)` method returns False if the metric has ANY of the provided flags, and True if it has NONE of them. Therefore, `not missing_flags(MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL)` correctly evaluates to True when either EXPERIMENTAL or INTERNAL flag is present.

Applied to files:

  • src/aiperf/common/enums/metric_enums.py
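
A minimal Flag sketch matching those semantics (illustrative only, not the aiperf enum):

from enum import Flag, auto

class MetricFlags(Flag):
    NONE = 0
    EXPERIMENTAL = auto()
    INTERNAL = auto()

    def missing_flags(self, flags: "MetricFlags") -> bool:
        # True only when self has NONE of the provided flags
        return not (self & flags)

assert not MetricFlags.EXPERIMENTAL.missing_flags(
    MetricFlags.EXPERIMENTAL | MetricFlags.INTERNAL
)
assert MetricFlags.NONE.missing_flags(MetricFlags.EXPERIMENTAL)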
📚 Learning: 2025-10-24T04:50:21.306Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 394
File: src/aiperf/dataset/loader/mixins.py:116-142
Timestamp: 2025-10-24T04:50:21.306Z
Learning: In the aiperf codebase, enums that extend `CaseInsensitiveStrEnum` (defined in src/aiperf/common/enums/base_enums.py) can be directly compared with strings due to a custom `__eq__` implementation. For example, `"wav" in [AudioFormat.WAV, AudioFormat.MP3]` will work correctly without needing to access `.value` on the enum members.

Applied to files:

  • src/aiperf/dataset/loader/mixins.py
  • src/aiperf/dataset/utils.py
📚 Learning: 2025-10-24T04:50:20.836Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 394
File: src/aiperf/dataset/utils.py:209-220
Timestamp: 2025-10-24T04:50:20.836Z
Learning: In the aiperf codebase, enums that inherit from CaseInsensitiveStrEnum (located in src/aiperf/common/enums/base_enums.py) extend both str and Enum, making enum instances actual string objects. This means string methods like .lower(), .upper(), etc. can be called directly on the enum instances (e.g., `format.lower()`) without needing to access the value property first (e.g., `format.value.lower()`). Examples include AudioFormat and VideoFormat enums.

Applied to files:

  • src/aiperf/dataset/loader/mixins.py
  • src/aiperf/dataset/utils.py
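
A toy reconstruction of the pattern both learnings describe (hypothetical code, not the actual class in src/aiperf/common/enums/base_enums.py):

from enum import Enum

class CaseInsensitiveStrEnum(str, Enum):
    def __eq__(self, other: object) -> bool:
        if isinstance(other, str):
            return self.value.lower() == other.lower()
        return NotImplemented

    __hash__ = str.__hash__  # restore hashability after overriding __eq__

class AudioFormat(CaseInsensitiveStrEnum):
    WAV = "wav"
    MP3 = "mp3"

assert AudioFormat.WAV.lower() == "wav"  # members are real str objects
assert "wav" in [AudioFormat.WAV, AudioFormat.MP3]  # __eq__ handles plain strings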
🪛 Ruff (0.14.2)
tests/loaders/test_single_turn.py

434-434: Do not catch blind exception: Exception

(BLE001)


599-599: Do not catch blind exception: Exception

(BLE001)


667-667: Do not catch blind exception: Exception

(BLE001)

src/aiperf/metrics/types/image_metrics.py

22-22: Unused method argument: record_metrics

(ARG002)


31-33: Avoid specifying long messages outside the exception class

(TRY003)


46-49: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


52-52: Unused method argument: record

(ARG002)


71-74: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


77-77: Unused method argument: record

(ARG002)

tests/loaders/conftest.py

182-182: Do not catch blind exception: Exception

(BLE001)

src/aiperf/records/inference_result_parser.py

107-107: Do not catch blind exception: Exception

(BLE001)

src/aiperf/common/models/sequence_distribution.py

344-344: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/endpoints/nim_image_retrieval.py

35-35: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Avoid specifying long messages outside the exception class

(TRY003)


59-62: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/metrics/types/video_metrics.py

22-22: Unused method argument: record_metrics

(ARG002)


31-33: Avoid specifying long messages outside the exception class

(TRY003)


45-48: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


51-51: Unused method argument: record

(ARG002)


70-73: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


76-76: Unused method argument: record

(ARG002)

src/aiperf/dataset/loader/models.py

66-66: Avoid specifying long messages outside the exception class

(TRY003)


159-159: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/dataset/loader/mixins.py

111-111: Avoid specifying long messages outside the exception class

(TRY003)


171-171: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/dataset/utils.py

150-153: Avoid specifying long messages outside the exception class

(TRY003)


197-200: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (47)
src/aiperf/common/models/sequence_distribution.py (1)

342-344: Verification confirms code is correct.

All dependencies and implementations are properly in place:

  • orjson~=3.10.18 is declared in pyproject.toml
  • load_json_str utility correctly uses orjson.loads() and raises orjson.JSONDecodeError
  • Exception handling at lines 342–344 properly catches the exception type that load_json_str will raise and converts it to ValueError
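
A minimal sketch of that pattern, assuming load_json_str is a thin wrapper over orjson.loads (the actual aiperf helper may do more):

import orjson

def load_json_str(json_str: str | bytes):
    # orjson.loads accepts str or bytes and raises orjson.JSONDecodeError
    # (a ValueError subclass) on malformed input
    return orjson.loads(json_str)

def parse_distribution_spec(json_str: str) -> dict:
    try:
        return load_json_str(json_str)
    except orjson.JSONDecodeError as e:
        raise ValueError(f"Invalid distribution JSON: {e}") from e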
src/aiperf/dataset/generator/prompt.py (1)

47-48: Inconsistent tokenizer None-handling between type hint and runtime check.

The added and tokenizer condition on line 47 checks for a None/falsy tokenizer to prevent calling _initialize_corpus(), but the type hint on line 34 declares tokenizer: Tokenizer (not optional). This creates an inconsistency:

  • No test cases or callers pass None for tokenizer
  • BaseComposer enforces tokenizer as required (line 24)
  • Methods like _generate_prompt() (line 129) and _generate_cached_prompt() (line 186) assume self.tokenizer is always available and lack None guards
  • If tokenizer is required per the type contract, the and tokenizer check is unnecessary
  • If tokenizer can be None, the type hint should be updated to Tokenizer | None and defensive checks should be added throughout the class

Resolve the contradiction by either:

  1. Removing the check if tokenizer is always required, or
  2. Updating type hints to Tokenizer | None and adding comprehensive None-handling throughout the class if optional tokenizer is intended
src/aiperf/records/record_processor_service.py (1)

116-116: Tokenizer configure hook matches parser rename

The command handler now calls the updated _configure_tokenizers() coroutine, keeping the profile configure flow in sync with the parser changes. ✔️

src/aiperf/records/inference_result_parser.py (1)

49-189: Nice defensive tokenizer handling

Skipping tokenizer initialization when the endpoint neither tokenizes input nor emits tokens, and threading input_token_count through both success and error paths, gives a tidy perf win without sacrificing metrics fidelity. Looks solid.

tests/parsers/test_usage_passthrough.py (1)

101-217: Tests keep pace with the new API

Updated test calls now pass the explicit input_token_count, so coverage continues to validate the usage passthrough contract without regressions. ✅

src/aiperf/dataset/dataset_manager.py (2)

87-101: LGTM! Conditional tokenizer configuration aligns with multimodal support.

The logic correctly skips tokenizer setup for endpoints that don't tokenize input (e.g., image retrieval), while preserving it for text-based endpoints and the MOONCAKE_TRACE dataset type.
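
A sketch of the guard being described; the capability flags (tokenizes_input, produces_tokens) and _load_tokenizer are illustrative names, not the actual aiperf API:

def _configure_tokenizers(self) -> None:
    # Skip tokenizer setup for endpoints that never touch tokens,
    # e.g. image retrieval; text-based endpoints still get one.
    if not (self.endpoint.tokenizes_input or self.endpoint.produces_tokens):
        return
    self.tokenizer = self._load_tokenizer()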


181-181: Verify the serialization behavior change from exclude_unset to exclude_none.

This change affects which fields appear in the generated inputs.json:

  • exclude_unset=True: omits fields not explicitly set (even if they have default values)
  • exclude_none=True: omits only None-valued fields but includes defaulted fields

For multimodal payloads with optional fields (images, videos, audios), this change ensures fields with default values are included. Verify that downstream consumers handle the updated payload structure correctly.
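
A quick pydantic illustration of the difference (Turn here is a stand-in model, not the aiperf one):

from pydantic import BaseModel

class Turn(BaseModel):
    text: str
    images: list[str] = []
    role: str | None = None

turn = Turn(text="hi")
print(turn.model_dump_json(exclude_unset=True))  # {"text":"hi"} (defaults omitted)
print(turn.model_dump_json(exclude_none=True))   # {"text":"hi","images":[]} (only None dropped)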

tests/loaders/conftest.py (1)

31-195: LGTM! Test fixtures provide robust media asset generation.

The fixtures properly:

  • Preserve file extensions to avoid MIME type mismatches (addressing past review feedback)
  • Create synthetic assets when sources are unavailable, ensuring tests run independently
  • Support multiple media formats (images, audio, video) for multimodal testing

The broad exception catch at line 182 is appropriate for a test fixture fallback—it gracefully degrades to a minimal MP4 when ffmpeg is unavailable or fails.

tests/loaders/test_random_pool.py (3)

223-250: LGTM! Test correctly uses fixture-generated images and validates encoding.

The test now properly:

  • Uses create_test_image fixture for reproducible test data
  • Validates that images are base64-encoded as data URIs
  • Maintains the URL format for audio (not encoded)

251-274: LGTM! Batched image test correctly validates encoding.

The test properly validates that all batched images are base64-encoded as data URIs, consistent with the multimodal data handling changes.


325-375: LGTM! Multi-file test correctly uses fixtures and validates encoding.

The test properly uses test_images fixture for all image inputs and validates base64 encoding for both turns, aligning with the updated data model.

src/aiperf/common/enums/metric_enums.py (5)

191-194: LGTM! New generic metric units for image/video modalities.

The additions properly extend the unit system to support image and video metrics, consistent with existing patterns for TOKENS and REQUESTS.


290-313: LGTM! Inverted flag enables proper metric tagging.

The inverted field correctly supports metrics where time is the numerator (e.g., "ms/image" instead of "images/ms"), with proper tag generation logic.
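
A small sketch of how such a flag can drive tag generation (a hypothetical dataclass, not the aiperf MetricOverTimeUnitInfo):

from dataclasses import dataclass

@dataclass(frozen=True)
class OverTimeUnitInfo:
    primary_tag: str  # e.g. "images"
    time_tag: str     # e.g. "sec" or "ms"
    inverted: bool = False

    @property
    def tag(self) -> str:
        # throughput orientation: count/time; inverted (latency): time/count
        if self.inverted:
            return f"{self.time_tag}/{self.primary_tag}"
        return f"{self.primary_tag}/{self.time_tag}"

assert OverTimeUnitInfo("images", "sec").tag == "images/sec"
assert OverTimeUnitInfo("image", "ms", inverted=True).tag == "ms/image"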


354-371: LGTM! Image and video metric units with proper inversion.

The new metric units properly define:

  • Throughput variants (IMAGES_PER_SECOND, VIDEOS_PER_SECOND) with standard ordering
  • Latency variants (MS_PER_IMAGE, MS_PER_VIDEO) with inverted=True for correct tagging

393-396: LGTM! Cached property exposes inverted flag.

The cached property correctly exposes the inverted field from the info object, maintaining consistency with other cached properties in the class.


678-679: LGTM! New flag for video-specific metrics.

The SUPPORTS_VIDEO_ONLY flag follows the established pattern for modality-specific metrics (like SUPPORTS_IMAGE_ONLY and SUPPORTS_AUDIO_ONLY).

src/aiperf/metrics/types/image_metrics.py (3)

12-34: LGTM! NumImagesMetric correctly counts and validates images.

The metric properly:

  • Sums all image contents across all turns
  • Raises NoMetricValue when no images are present, preventing downstream division errors

37-59: LGTM! ImageThroughputMetric correctly computes throughput.

The implementation properly:

  • Retrieves dependent metrics (num_images, request_latency)
  • Performs unit conversion for consistent time units
  • Division by zero is prevented by get_or_raise truthy check (which raises NoMetricValue for zero values)

Based on learnings and past review discussions, the zero-latency guard is unnecessary.


62-84: LGTM! ImageLatencyMetric correctly computes per-image latency.

The metric properly computes latency per image with appropriate unit conversion. Division safety is ensured by upstream validation (NumImagesMetric raises when zero, and get_or_raise catches zero latency).

src/aiperf/metrics/types/video_metrics.py (3)

12-34: LGTM! NumVideosMetric mirrors image metric pattern.

The metric correctly counts videos across all turns and raises NoMetricValue when zero, preventing downstream division errors. Implementation is consistent with NumImagesMetric.


37-58: LGTM! VideoThroughputMetric correctly computes throughput.

The implementation properly retrieves dependencies and performs unit conversion. Division safety is ensured by the same mechanisms as ImageThroughputMetric (truthy check in get_or_raise and upstream validation).

Based on learnings and past review discussions, explicit zero guards are unnecessary.


61-83: LGTM! VideoLatencyMetric mirrors image latency pattern.

The metric correctly computes per-video latency with appropriate unit conversion, consistent with ImageLatencyMetric.

src/aiperf/common/enums/plugin_enums.py (1)

35-35: LGTM! New endpoint type for image retrieval support.

The addition properly extends the EndpointType enum to support the new image retrieval endpoint introduced in this PR.

src/aiperf/dataset/loader/single_turn.py (1)

113-113: LGTM! Video modality support added to single-turn loader.

The addition properly extends the Turn constructor to include video data, consistent with the pattern for images and audios. This aligns with the broader multimodal support changes across all loaders.

src/aiperf/dataset/loader/multi_turn.py (1)

142-142: LGTM! Video support properly integrated.

The addition of videos=media[MediaType.VIDEO] correctly mirrors the existing pattern for text, images, and audio, ensuring consistent multimodal support across all media types.

src/aiperf/common/models/__init__.py (1)

71-71: LGTM! Public API correctly updated.

The addition of ImageRetrievalResponseData to both the imports and __all__ properly exposes the new response data type for the image retrieval endpoint feature.

Also applies to: 147-147

src/aiperf/common/messages/command_messages.py (1)

81-81: LGTM! Improved JSON deserialization.

The switch from json.loads to load_json_str provides better consistency across the codebase and likely adds improved error handling or logging capabilities.

Also applies to: 140-140

src/aiperf/common/models/dataset_models.py (1)

9-9: The review comment is based on incorrect assumptions about the codebase.

After thoroughly searching the codebase, exclude_if_none does not exist—neither as a decorator utility in base_models.py nor as usage anywhere in dataset models. The import at line 9 (from aiperf.common.models.base_models import AIPerfBaseModel) only brings in the base model class, which has no custom serialization utilities.

Since exclude_if_none was never defined or used, removing its import cannot and does not change serialization behavior as claimed. The changes are safe and incur no serialization side effects.

Likely an incorrect or invalid review comment.

src/aiperf/common/models/base_models.py (1)

6-14: Review comment is incorrect — base model methods don't exist in current codebase.

The review comment references removal of exclude_if_none and _serialize_model methods, but neither exists in the current codebase. A search across all Python files returns no matches for these identifiers. The AIPerfBaseModel shown in the snippet is already in its simplified state with only ConfigDict(arbitrary_types_allowed=True, extra="allow").

Serialization behavior is properly controlled at the call site through explicit parameters:

  • Consumers requiring None exclusion use exclude_none=True explicitly (dataset_manager.py, base_messages.py, mock server utils)
  • The JSON exporter uses exclude_unset=True (not exclude_none), preserving explicitly-set None values
  • The buffered JSONL writer explicitly uses exclude_none=False to preserve None values

ZMQ clients transmit messages with model_dump_json() without exclusion parameters, and all test cases pass. This defensive architecture—flexible base model with explicit serialization control at call sites—ensures compatibility without requiring base-level None-exclusion logic.
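A quick pydantic illustration of the distinction the bullets draw (model and values invented for the example):

```python
from pydantic import BaseModel

class Sample(BaseModel):
    a: int = 1
    b: int | None = None

s = Sample(b=None)                            # b explicitly set to None
print(s.model_dump_json())                    # {"a":1,"b":null}
print(s.model_dump_json(exclude_none=True))   # {"a":1}
print(s.model_dump_json(exclude_unset=True))  # {"b":null} (a was never set)
```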

Likely an incorrect or invalid review comment.

src/aiperf/dataset/utils.py (3)

12-12: LGTM!

The imports are correctly expanded to include AudioFormat and VideoFormat, supporting the new audio and video handling utilities.


162-173: LGTM!

The audio encoding logic correctly produces the "format,base64_data" format as documented. The use of format.lower() on the enum is valid based on the CaseInsensitiveStrEnum implementation.


209-220: LGTM!

The video encoding logic correctly produces a data URL in the format "data:video/{format};base64,{data}", consistent with the pattern used for images. The use of format.lower() is valid based on the CaseInsensitiveStrEnum implementation.
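The two output shapes side by side, as a minimal sketch; the function names match the comments above, but the bodies are illustrative rather than the library code, and the format parameter is simplified to a plain string:

```python
import base64

def encode_audio(audio_bytes: bytes, fmt: str) -> str:
    # "format,base64_data", e.g. "wav,UklGRi..."
    return f"{fmt.lower()},{base64.b64encode(audio_bytes).decode('utf-8')}"

def encode_video(video_bytes: bytes, fmt: str) -> str:
    # Data URL, e.g. "data:video/mp4;base64,AAAA..."
    b64 = base64.b64encode(video_bytes).decode("utf-8")
    return f"data:video/{fmt.lower()};base64,{b64}"
```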

src/aiperf/dataset/loader/models.py (3)

9-9: LGTM!

The Video import is correctly added to support the new video modality fields.


42-87: LGTM!

The video field additions to SingleTurn are consistent with the existing audio and image field patterns. The validators correctly enforce mutual exclusivity between video and videos, and ensure at least one modality (including video) is provided.


143-178: LGTM!

The video field additions to RandomPool mirror the SingleTurn changes and maintain consistency across the codebase. The validators correctly enforce the same mutual exclusivity and at-least-one-modality rules.
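A minimal pydantic sketch of the mutual-exclusivity rule described above; the field subset and error text are illustrative:

```python
from pydantic import BaseModel, model_validator

class SingleTurnSketch(BaseModel):
    video: str | None = None
    videos: list[str] | None = None

    @model_validator(mode="after")
    def _video_fields_are_exclusive(self):
        if self.video is not None and self.videos is not None:
            raise ValueError("Provide either 'video' or 'videos', not both.")
        return self
```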

src/aiperf/endpoints/nim_image_retrieval.py (3)

19-30: LGTM!

The endpoint registration and metadata are correctly configured. The endpoint path /v1/infer and supports_images=True flag are appropriate for an image retrieval endpoint.


32-68: LGTM!

The payload formatting logic correctly validates and processes all images. The implementation:

  1. Validates at least one image exists (line 45)
  2. Builds the payload from all images with non-empty contents (lines 48-56)
  3. Ensures at least one valid image content was found (lines 58-62)

This comprehensive validation addresses the concern from previous reviews about checking all images, not just the first one.
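Sketched in isolation, the validate-then-build flow looks roughly like this; the payload key names are placeholders, since the NIM schema is not quoted in the thread, and the input is simplified to a list of content strings:

```python
def format_payload(images: list[str | None]) -> dict:
    if not images:
        raise ValueError("Image retrieval requires at least one image.")
    contents = [img for img in images if img]  # keep only non-empty contents
    if not contents:
        raise ValueError("No non-empty image contents found.")
    return {"input": [{"type": "image_url", "url": c} for c in contents]}
```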


70-88: LGTM!

The response parsing correctly handles edge cases (missing JSON or data field) by logging and returning None, while successfully parsed responses are wrapped in ParsedResponse with the appropriate ImageRetrievalResponseData.
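And the defensive parse, reduced to its skeleton; stdlib json stands in for the project's loader, and the real method wraps the result in ParsedResponse with ImageRetrievalResponseData:

```python
import json

def parse_response(raw_text: str):
    try:
        body = json.loads(raw_text)
    except json.JSONDecodeError:
        return None  # the real code logs the failure first
    data = body.get("data")
    if data is None:
        return None  # a missing "data" field is treated the same way
    return data
```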

src/aiperf/dataset/loader/mixins.py (4)

47-89: LGTM!

The updates to _convert_to_media_objects correctly extend media handling to include video alongside image and audio. The logic consistently handles both singular and plural fields by encoding local media files via _handle_media_content.


91-114: LGTM!

The URL validation logic correctly identifies valid URLs (with both scheme and netloc), raises an error for malformed URLs (with only one component), and returns False for non-URLs. This provides clear feedback for invalid input.
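The rule reduces to a few lines with urllib.parse (a sketch; the real helper's signature and error message may differ):

```python
from urllib.parse import urlparse

def _is_url(value: str) -> bool:
    parsed = urlparse(value)
    if parsed.scheme and parsed.netloc:
        return True  # e.g. "https://example.com/cat.png"
    if parsed.scheme or parsed.netloc:
        raise ValueError(f"Malformed URL: {value!r}")  # only one component
    return False  # plain local path, e.g. "images/cat.png"
```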


116-142: LGTM!

The encoding detection correctly identifies:

  • Image/video: data URL format (scheme == "data")
  • Audio: "format,base64" format with valid audio format prefix

The enum comparison on lines 136-139 works correctly because CaseInsensitiveStrEnum supports direct string comparison. Based on learnings.
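Reduced to a sketch (the audio format set here is an illustrative subset, not aiperf's AudioFormat enum, and media_type is simplified to a string):

```python
from urllib.parse import urlparse

AUDIO_FORMATS = {"wav", "mp3"}  # illustrative subset

def _is_already_encoded(media_type: str, value: str) -> bool:
    if media_type in ("image", "video"):
        return urlparse(value).scheme == "data"  # data URL check
    if media_type == "audio":
        prefix = value.split(",", 1)[0].lower()  # "wav,<base64>" -> "wav"
        return prefix in AUDIO_FORMATS
    return False
```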


173-202: LGTM!

The media content handler correctly implements a three-tier check:

  1. Already encoded (including data URLs) → return as-is
  2. Regular URLs → return as-is
  3. Local file paths → encode to base64

The ordering is important and correct: checking _is_already_encoded before _is_url ensures data URLs (which have a "data" scheme) are treated as already encoded rather than as regular URLs.
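Composing the two helpers sketched above gives the three-tier dispatch. Note how running the encoded check first keeps data URLs (scheme "data", no netloc) away from _is_url's malformed-URL error; _encode_media_file is stubbed here as a hypothetical:

```python
def _encode_media_file(media_type: str, path: str) -> str:
    raise NotImplementedError  # would read the file and base64-encode it

def _handle_media_content(media_type: str, content: str) -> str:
    if _is_already_encoded(media_type, content):  # tier 1: pass through as-is
        return content
    if _is_url(content):                          # tier 2: remote URL, as-is
        return content
    return _encode_media_file(media_type, content)  # tier 3: local file
```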

src/aiperf/common/messages/base_messages.py (5)

28-37: LGTM!

The automatic subclass registration via __init_subclass__ is a clean approach that eliminates manual registration boilerplate. The check on line 34 ensures only concrete message classes with a defined message_type are registered in the lookup table.
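A minimal non-pydantic sketch of the hook; the real Message is a pydantic model, so the details differ:

```python
class Message:
    _message_types: dict[str, type["Message"]] = {}
    message_type: str | None = None  # concrete subclasses set this

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if getattr(cls, "message_type", None):  # skip abstract intermediates
            Message._message_types[cls.message_type] = cls

class Ping(Message):
    message_type = "ping"  # registered automatically on class creation

assert Message._message_types["ping"] is Ping
```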


60-73: LGTM!

The updated from_json method correctly uses the lookup table for dynamic dispatch, with proper validation for missing or unknown message types. The use of load_json_str and model_validate follows best practices.


76-86: LGTM!

The optimized from_json_with_type correctly leverages the lookup table and uses model_validate_json directly, avoiding the intermediate JSON parsing step for better performance.
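Building on the registry sketch above, the two lookup paths reduce to roughly the following; stdlib json stands in for load_json_str, and bare construction stands in for pydantic's model_validate / model_validate_json:

```python
import json

def from_json(json_str: str) -> "Message":
    data = json.loads(json_str)          # generic path: parse to a dict first
    msg_type = data.get("message_type")
    if msg_type is None:
        raise ValueError("message JSON is missing 'message_type'")
    msg_cls = Message._message_types.get(msg_type)
    if msg_cls is None:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return msg_cls()  # the real code validates `data` into the model

def from_json_with_type(json_str: str, msg_type: str) -> "Message":
    # Fast path: the caller already knows the type, so the real code hands
    # the raw string straight to model_validate_json, skipping the dict step.
    return Message._message_types[msg_type]()
```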


88-89: LGTM!

The explicit exclude_none=True in __str__ provides clearer control over JSON serialization behavior compared to the previous decorator-based approach. This ensures None fields are omitted from the serialized output.


101-106: LGTM!

The new ErrorMessage class correctly extends Message and follows the pattern for concrete message types. It will be automatically registered in the lookup table via the __init_subclass__ hook.

Comment on lines 40 to 43
import orjson

from aiperf.common.utils import load_json_str

⚠️ Potential issue | 🟠 Major

Abstraction leakage: importing orjson only to catch its exception.

The code imports orjson directly (line 40) but uses load_json_str() utility (line 42) for parsing. This creates tight coupling because you're catching orjson.JSONDecodeError (line 343), which exposes the implementation detail that load_json_str uses orjson internally. If load_json_str changes its JSON library, this code will break.

Consider one of these approaches:

Option 1 (Recommended): Use orjson directly

-from aiperf.common.utils import load_json_str
-
-            data = load_json_str(json_str)
+            data = orjson.loads(json_str)

Option 2: Catch a more general exception (orjson.JSONDecodeError subclasses ValueError, so this still catches parse failures without importing orjson)

-        except orjson.JSONDecodeError as e:
+        except ValueError as e:

Option 3: Have load_json_str wrap exceptions in a library-specific type

This would require changes to the utility function to raise a custom exception type instead of letting orjson exceptions propagate.
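Option 3 might look roughly like this; JSONParseError is a hypothetical name:

```python
import orjson

class JSONParseError(ValueError):
    """Library-agnostic parse failure raised by load_json_str (hypothetical)."""

def load_json_str(json_str: str):
    try:
        return orjson.loads(json_str)
    except orjson.JSONDecodeError as e:
        raise JSONParseError(str(e)) from e
```

Callers would then catch JSONParseError without ever importing orjson themselves.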

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/aiperf/common/models/sequence_distribution.py around lines 40–42, do not
import orjson just to catch its JSONDecodeError (leaks implementation of
load_json_str); instead remove the orjson import and change the exception
handling where load_json_str is called to catch a more general /
implementation-agnostic error (e.g., ValueError or json.JSONDecodeError from the
stdlib, or simply Exception if load_json_str can raise varied parsing errors),
or alternatively update load_json_str to raise a custom parsing exception and
catch that here—pick one approach and make the corresponding import and
except-clause change so this module no longer depends on orjson directly.

Comment on lines +163 to +169
elif media_type == MediaType.AUDIO:
audio_bytes, audio_format = utils.open_audio(content)
return utils.encode_audio(audio_bytes, audio_format)

elif media_type == MediaType.VIDEO:
video_bytes, video_format = utils.open_video(content)
return utils.encode_video(video_bytes, video_format)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify type compatibility between open and encode functions.

Lines 164-165 and 168-169 appear to have type mismatches:

  • utils.open_audio returns tuple[bytes, str] (line 159 in utils.py returns audio_format.value)
  • utils.encode_audio expects format: AudioFormat (enum, not string)
  • Similarly, utils.open_video returns a string format but utils.encode_video expects VideoFormat enum

This could cause type checking errors or runtime issues if the encode functions don't handle string inputs gracefully.

Run the following script to check the actual return types and usage:


🏁 Script executed:

```shell
#!/bin/bash
# Check the return statements and type annotations for open_* functions
echo "=== open_audio return type ==="
rg -n "def open_audio" --type=py -A 35 | grep -E "(def open_audio|return |-> )"

echo -e "\n=== open_video return type ==="
rg -n "def open_video" --type=py -A 35 | grep -E "(def open_video|return |-> )"

echo -e "\n=== encode_audio signature ==="
rg -n "def encode_audio" --type=py -A 5

echo -e "\n=== encode_video signature ==="
rg -n "def encode_video" --type=py -A 5
```

Length of output: 1526



---

🏁 Script executed:

```shell
#!/bin/bash
# Check how format parameter is used in encode_audio and encode_video
echo "=== encode_audio implementation ==="
rg -n "def encode_audio" --type=py -A 20

echo -e "\n=== encode_video implementation ==="
rg -n "def encode_video" --type=py -A 20

Length of output: 2232


Fix type annotation mismatches in utils.py.

The functions are annotated inconsistently with their actual behavior:

  1. open_video (line 176): Annotation claims tuple[bytes, VideoFormat] but returns video_format.value (a string)
  2. encode_audio (line 162): Annotated as format: AudioFormat but receives string from open_audio
  3. encode_video (line 209): Annotated as format: VideoFormat but receives string from open_video

While the code works at runtime (strings have a .lower() method), the incorrect annotations create confusion and would break any caller or type checker that relied on them. Correct the return type of open_video to tuple[bytes, str] and update the encode_audio/encode_video parameter types to str.
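The corrected signatures, as suggested (bodies elided; parameter names are guesses from the surrounding discussion):

```python
def open_video(filename: str) -> tuple[bytes, str]: ...
def encode_audio(audio_bytes: bytes, format: str) -> str: ...
def encode_video(video_bytes: bytes, format: str) -> str: ...
```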

🤖 Prompt for AI Agents
In src/aiperf/dataset/loader/mixins.py around lines 163-169, the calls to
utils.open_audio/open_video and utils.encode_audio/encode_video reveal
mismatched type annotations in src/aiperf/dataset/loader/utils.py: change
open_video's return annotation from tuple[bytes, VideoFormat] to tuple[bytes,
str], and change encode_audio and encode_video parameter annotations for format
from AudioFormat/VideoFormat to str so the annotations match the actual
returned/consumed string values.

@ajcasagrande ajcasagrande marked this pull request as draft November 3, 2025 15:48