Refactor generation of reports in clients#1969
Conversation
✅ Test Coverage ReportCoverage of Changed Lines
|
✅ Test Results - PASSEDSummary
Details
🎉 All tests passed! This PR is ready for review. |
fivanovicTT
left a comment
There was a problem hiding this comment.
I like the idea of having metrics_utils since we can utilize these across all clients.
However, I think we can improve upon this refactoring. Here are my suggestions:
metric_utils.py
We can extend BaseTestStatus to have an abstract method / interface so that each status class (eg: ImageGenerationTestStatus) can declare which metrics it provides. Image client will have different methods than audio client for example
Bonus points: we can have a single method in BaseTestStatus:
def get_metrics(self) -> Dict[str, float]:
"""Return available metrics for this status type.
Returns dict with keys like 'elapsed', 'ttft', 'rtr', 'tsu', etc.
Only includes metrics that have values (not None).
"""
pass
Each subclass implements it, returning only non-None metrics
report generation
While looking at the codebase, I saw that also we should extract eval report generation functionality. Having this also in mind, I think it would be really good to separate this from base_strategy_interface.py (you can create a new file report_utils.py)
We will have these 2 methods:
_generate_benchmark_report(status_list, task_type, extra_data=None)
_generate_eval_report(status_list, task_type, extra_data=None)
where each client would only provide:
task_type (string like "image", "audio", etc.)
extra_benchmarks dict for client-specific fields
Bonus points: We should utilize Dependency Inversion and use ReportContext object for example to pass necessary data to ReportGenerator
With these changes we will have: testability (easy to test each component), single responsibility, loose coupling and reusability
fivanovicTT
left a comment
There was a problem hiding this comment.
General comments and improvements:
- Let's keep
base_strategy_interface.pyclean -> don't introduce report related methods which are not part of that class. We are breaking single responsibility here and bloating the class (they should be in report related class).
I think you can just make ReportGenerator available to clients and let them handle the rest.
- We can have simpler clients with something like this:
class TtsClientStrategy(BaseMediaStrategy):
def run_benchmark(self, num_calls: int) -> list[TtsTestStatus]:
# ... run benchmark ...
status_list = self._run_tts_benchmark(num_calls)
# Client builds its own extras and calls ReportGenerator directly
context = ReportContext(
model_name=self.config.model_spec.model_name,
device_name=self.config.device_name,
output_path=self.config.output_path,
model_id=self.config.model_spec.model_id,
)
extras = {"ttft": calculate_ttft(status_list) / 1000, ...}
self.report_generator.generate_benchmark_report(context, status_list, "tts", extras)
return status_list
- Could you think of way to even more simplify
metrics_utils.py- seems to me there are too many granularity and it could be even more simplified? IfTestStatus._METRIC_ATTRSalready defines what fields exist and get_metrics() returns them, why do we need separatecalculate_ttft(),calculate_rtr()functions?
Let's see how can we improve current implementation. Keep up the great work! 🚀
Currently, all media clients (audio_client.py, image_client.py, cnn_client.py, tts_client.py, embedding_client.py) implement their own _generate_report() methods with significant code duplication. Refactor report generation into a scalable and reusable mechanism.