ITEP-32416 Add FP16 inference with feature flag #233
Conversation
Looks great, minor comments
```diff
         # Use ascending order sorting to retrieve the oldest matching document
-        return self.get_one(extra_filter=query, earliest=True)
+        models = self.get_all(extra_filter=query)
```
Comment at L426 is no longer applicable
```python
logger.warning(f"{primary_precision} model requested but not found. Falling back to {fallback_precision}.")
fallback_model = next((model for model in models if fallback_precision in model.precision), None)
if fallback_model:
    return fallback_model
```
Suggested change:
```diff
         return fallback_model
+    logger.warning(f"Fallback {fallback_precision} model also not found.")
```
```python
# If we get here, we have matched_docs but none with the expected precisions
return ID()
```
Suggested change:
```diff
     # If we get here, we have matched_docs but none with the expected precisions
+    logger.warning(f"Fallback {fallback_precision} model also not found.")
     return ID()
```
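Taken together, these suggestions describe an FP16-first selection with an FP32 fallback and a warning logged for each miss. The sketch below is a minimal, standalone illustration of that flow; the helper name, the `ID` stand-in, and the attribute names (`precision`, `id_`) are assumptions for the example, not the PR's actual code.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ID:
    """Stand-in for the project's identifier type so the sketch runs on its own."""
    value: str = ""


def select_model_id_for_inference(models, primary_precision="FP16", fallback_precision="FP32"):
    """Pick the primary-precision model, fall back to the other precision, else return an empty ID."""
    primary_model = next((m for m in models if primary_precision in m.precision), None)
    if primary_model is not None:
        return primary_model.id_

    logger.warning(f"{primary_precision} model requested but not found. Falling back to {fallback_precision}.")
    fallback_model = next((m for m in models if fallback_precision in m.precision), None)
    if fallback_model is not None:
        return fallback_model.id_

    # Matched documents exist, but none carry one of the expected precisions.
    logger.warning(f"Fallback {fallback_precision} model also not found.")
    return ID()
```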
Pull Request Overview
This PR adds a feature flag to toggle FP16 inference and updates model creation, selection logic, and tests to support an FP16-first pipeline with FP32 fallback.
- Introduces `FEATURE_FLAG_FP16_INFERENCE` and uses it in `prepare_train` and `ModelRepo` to choose the precision order.
- Renames `mo_fp32_with_xai` → `mo_with_xai` across code, fixtures, and tests.
- Updates `ModelRepo.get_latest_model_for_inference*` to fetch all matching precisions and implement FP16/FP32 fallback.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
interactive_ai/workflows/geti_domain/train/job/tasks/prepare_and_train/train_helpers.py | Use feature flag to set FP16 or FP32 for XAI model |
interactive_ai/workflows/geti_domain/train/job/tasks/evaluate_and_infer/evaluate_and_infer.py | Swap mo_fp32_with_xai references to mo_with_xai |
interactive_ai/workflows/geti_domain/common/jobs_common/features/feature_flag_provider.py | Add FEATURE_FLAG_FP16_INFERENCE enum entry |
interactive_ai/workflows/geti_domain/common/jobs_common_extras/mlflow/utils/train_output_models.py | Rename mo_fp32_with_xai → mo_with_xai in IDs/parse |
interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py | Update inference query to include both precisions and implement fallback logic |
Tests and fixtures (multiple files) | Rename fields/tests for mo_with_xai and cover both flag states |
Comments suppressed due to low confidence (2)

interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py:450
- The aggregation pipeline lacks a `$sort` stage to ensure the latest model is returned first; this can lead to selecting an older model ID when multiple precisions exist. Add sorting by version or `_id` before projecting.

```python
matched_docs = list(self.aggregate_read(aggr_pipeline))
```
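For illustration, the sort could sit between the match and projection stages, roughly as below. This is a sketch only: the pipeline is assumed to be a plain list of MongoDB aggregation stages, and the `$match`/`$project` contents shown here are placeholders rather than the repository's actual query.

```python
# Inside the repo method: sort newest-first before projecting so the first
# matched document corresponds to the latest model version.
aggr_pipeline = [
    {"$match": query},                         # placeholder match stage
    {"$sort": {"version": -1, "_id": -1}},     # latest version (then newest _id) first
    {"$project": {"_id": 1, "precision": 1}},  # placeholder projection
]
matched_docs = list(self.aggregate_read(aggr_pipeline))
```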
interactive_ai/libs/iai_core_py/tests/repos/test_model_repo.py:431
- [nitpick] Update this docstring to reflect the new FP16-first behavior when the feature flag is enabled, e.g. mention M6_FP16 as expected under `fp16-enabled`.

```
The latest model for inference is M4 (the first one generated after the base model).
```
```diff
@@ -420,15 +423,34 @@ def get_latest_model_for_inference(
             base_model_id=base_model_id, model_status_filter=model_status_filter
         )

         # Use ascending order sorting to retrieve the oldest matching document
-        return self.get_one(extra_filter=query, earliest=True)
+        models = self.get_all(extra_filter=query)
```
`get_all` returns all matching models in unspecified order, so `next(...)` may pick an older FP16 model instead of the latest; recommend sorting by version or `_id` descending (or using a limited aggregate with sort) before selecting the primary or fallback precision.
Suggested change:
```diff
-models = self.get_all(extra_filter=query)
+# Ensure models are sorted by version (or _id) in descending order
+models = self.get_all(extra_filter=query).sort([("version", DESCENDING), ("_id", DESCENDING)])
```
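If `get_all` returns a plain Python list rather than a cursor, the same intent can also be expressed by sorting in Python before the `next(...)` lookups. The snippet below is a sketch under that assumption; the `version` and `id_` attributes are presumed to exist on the model objects.

```python
# Order models newest-first, then continue with the primary/fallback precision lookups.
models = self.get_all(extra_filter=query)
models.sort(key=lambda model: (model.version, str(model.id_)), reverse=True)

primary_model = next((m for m in models if primary_precision in m.precision), None)
```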
📝 Description
This PR introduces `FEATURE_FLAG_FP16_INFERENCE` to control which model precision is used for inference operations. The change allows for more efficient resource utilization while maintaining backward compatibility.

Changes:
- Add `FEATURE_FLAG_FP16_INFERENCE` to control model precision selection

Details
When enabled, the system will:
- set FP16 (instead of FP32) precision for the `mo_with_xai` model produced during training (illustrated in the sketch after this section)
- select the FP16 model for inference, falling back to FP32 when no FP16 model is available

This change optimizes resource utilization by deploying more compact FP16 models that consume less memory and storage while maintaining inference performance.
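As a rough illustration of the flag-gated precision choice (the real logic lives in `prepare_train` and `ModelRepo`), the helper below is purely hypothetical; the function name and the string precision values are assumptions made for the sake of the example.

```python
def resolve_xai_model_precision(fp16_inference_enabled: bool) -> str:
    """Pick the precision of the model exported with XAI, based on the feature flag."""
    return "FP16" if fp16_inference_enabled else "FP32"


# e.g. with FEATURE_FLAG_FP16_INFERENCE enabled:
assert resolve_xai_model_precision(fp16_inference_enabled=True) == "FP16"
# and with the flag disabled, behavior stays as before:
assert resolve_xai_model_precision(fp16_inference_enabled=False) == "FP32"
```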
JIRA: ITEP-32416 ITEP-66504 ITEP-66505
✨ Type of Change
Select the type of change your PR introduces:
🧪 Testing Scenarios
Describe how the changes were tested and how reviewers can test them too:
✅ Checklist
Before submitting the PR, ensure the following: