
Refactor generation of reports in clients #1969

Draft

vpetrovicTT wants to merge 41 commits into dev from vpetrovic/feature/1807-generate-report-in-clients

Conversation

@vpetrovicTT vpetrovicTT (Collaborator) commented Feb 2, 2026

Currently, all media clients (audio_client.py, image_client.py, cnn_client.py, tts_client.py, embedding_client.py) implement their own _generate_report() methods with significant code duplication. Refactor report generation into a scalable and reusable mechanism.

@vpetrovicTT vpetrovicTT changed the title Refactor generate report in clients Refactor generation of reports in clients Feb 2, 2026
@fivanovicTT fivanovicTT changed the base branch from main to dev February 2, 2026 10:47
github-actions bot (Contributor) commented Feb 2, 2026

✅ Test Coverage Report

Coverage of Changed Lines

| Metric | Value |
| --- | --- |
| Coverage % | |
| Threshold | 50% |
| Status | ✅ PASSED |

💡 This checks coverage of newly added/modified lines only, not total codebase coverage.

github-actions bot (Contributor) commented Feb 2, 2026

✅ Test Results - PASSED

Summary

| Component | Total | Passed | Skipped | Failed | Status |
| --- | --- | --- | --- | --- | --- |
| tt-inference-server | 385 | 385 | 0 | 0 | |
| tt-media-server | 464 | 464 | 0 | 0 | |
| Overall | 849 | 849 | 0 | 0 | |

Details

  • Python Version: 3.10
  • Workflow: Test Gate
  • Commit: 687179a
  • Run ID: 21898176042

🎉 All tests passed! This PR is ready for review.

@vpetrovicTT vpetrovicTT linked an issue Feb 2, 2026 that may be closed by this pull request
@fivanovicTT fivanovicTT (Collaborator) left a comment


I like the idea of having metrics_utils since we can use it across all clients.

However, I think we can improve upon this refactoring. Here are my suggestions:

metrics_utils.py

We can extend BaseTestStatus with an abstract method / interface so that each status class (e.g. ImageGenerationTestStatus) can declare which metrics it provides. The image client will have different metrics than the audio client, for example.

Bonus points: we can have a single method in BaseTestStatus:

from abc import abstractmethod
from typing import Dict

@abstractmethod
def get_metrics(self) -> Dict[str, float]:
    """Return the available metrics for this status type.

    Returns a dict with keys like 'elapsed', 'ttft', 'rtr', 'tsu', etc.
    Only includes metrics that have values (not None).
    """
    ...

Each subclass implements it, returning only non-None metrics.
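
For illustration, a minimal sketch of what a subclass implementation could look like, assuming the status object stores its measurements as plain attributes (the attribute names here are hypothetical):

class ImageGenerationTestStatus(BaseTestStatus):
    def get_metrics(self) -> Dict[str, float]:
        # Hypothetical attributes; the real class may track different fields.
        candidates = {
            "elapsed": self.elapsed,
            "ttft": self.ttft,
        }
        # Report only the metrics that were actually measured.
        return {name: value for name, value in candidates.items() if value is not None}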

report generation

While looking at the codebase, I saw that we should also extract the eval report generation functionality. With that in mind, I think it would be really good to separate report generation from base_strategy_interface.py (you can create a new file, report_utils.py).

We will have these 2 methods:

_generate_benchmark_report(status_list, task_type, extra_data=None)
_generate_eval_report(status_list, task_type, extra_data=None)

where each client would only provide (see the sketch below):

  • task_type (a string like "image", "audio", etc.)
  • an extra_data dict for client-specific fields
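
To make the idea concrete, here is a minimal sketch of what report_utils.py could look like, assuming the get_metrics() interface from above; the aggregation (simple averaging) and the report shape are assumptions, not a final API:

from typing import Dict, List, Optional


def _aggregate_metrics(status_list) -> Dict[str, float]:
    # Collect every metric reported via get_metrics() and average it
    # across all statuses that provided a value.
    collected: Dict[str, List[float]] = {}
    for status in status_list:
        for name, value in status.get_metrics().items():
            collected.setdefault(name, []).append(value)
    return {name: sum(values) / len(values) for name, values in collected.items()}


def _generate_benchmark_report(status_list, task_type: str,
                               extra_data: Optional[Dict] = None) -> Dict:
    report = {"task_type": task_type, "metrics": _aggregate_metrics(status_list)}
    report.update(extra_data or {})
    return report


def _generate_eval_report(status_list, task_type: str,
                          extra_data: Optional[Dict] = None) -> Dict:
    # Same shape as the benchmark report; eval-specific fields arrive via extra_data.
    report = {"task_type": task_type, "metrics": _aggregate_metrics(status_list)}
    report.update(extra_data or {})
    return report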

Bonus points: we should apply Dependency Inversion and use, for example, a ReportContext object to pass the necessary data to ReportGenerator.
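
As a sketch of that idea (the field names are taken from the client example below; everything else is an assumption):

from dataclasses import dataclass


@dataclass
class ReportContext:
    """Everything ReportGenerator needs, so it depends on this small
    abstraction instead of on each client's config object."""
    model_name: str
    device_name: str
    output_path: str
    model_id: str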

With these changes we get testability (each component is easy to test in isolation), single responsibility, loose coupling, and reusability.

@fivanovicTT fivanovicTT (Collaborator) left a comment


General comments and improvements:

  1. Let's keep base_strategy_interface.py clean -> don't introduce report-related methods that are not part of that class. We are breaking single responsibility here and bloating the class (those methods belong in a report-related class).

I think you can just make ReportGenerator available to clients and let them handle the rest.

  2. We can have simpler clients with something like this:
class TtsClientStrategy(BaseMediaStrategy):
    def run_benchmark(self, num_calls: int) -> list[TtsTestStatus]:
        # ... run benchmark ...
        status_list = self._run_tts_benchmark(num_calls)
        
        # Client builds its own extras and calls ReportGenerator directly
        context = ReportContext(
            model_name=self.config.model_spec.model_name,
            device_name=self.config.device_name,
            output_path=self.config.output_path,
            model_id=self.config.model_spec.model_id,
        )
        extras = {"ttft": calculate_ttft(status_list) / 1000, ...}
        self.report_generator.generate_benchmark_report(context, status_list, "tts", extras)
        
        return status_list
  3. Could you think of a way to simplify metrics_utils.py even further? It seems to me there is too much granularity, and it could be simplified more. If TestStatus._METRIC_ATTRS already defines what fields exist and get_metrics() returns them, why do we need separate calculate_ttft() and calculate_rtr() functions? A sketch follows below.
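
As an illustration of that question, here is a sketch of how one generic helper could replace the per-metric functions, again assuming the get_metrics() interface from above (the averaging behavior is an assumption):

from typing import List


def calculate_metric(status_list: List["BaseTestStatus"], name: str) -> float:
    """Average a named metric across all statuses that report it.

    Would replace dedicated helpers, e.g. calculate_metric(status_list, "ttft")
    instead of calculate_ttft(status_list).
    """
    values = [
        metrics[name]
        for status in status_list
        if (metrics := status.get_metrics()) and name in metrics
    ]
    if not values:
        raise ValueError(f"No status reported metric {name!r}")
    return sum(values) / len(values)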

Let's see how we can improve the current implementation. Keep up the great work! 🚀



Development

Successfully merging this pull request may close these issues:

  • Tech Debt - _generate_report in clients
