ITEP-32416 Add FP16 inference with feature flag #233

Draft
itallix wants to merge 3 commits into main

Conversation

@itallix itallix commented May 16, 2025

📝 Description

This PR introduces FEATURE_FLAG_FP16_INFERENCE to control which model precision is used for inference operations. The change allows for more efficient resource utilization while maintaining backward compatibility.

Changes:

  • Added new feature flag FEATURE_FLAG_FP16_INFERENCE to control model precision selection (see the enum sketch after this list)
  • Updated model selection logic to prioritize models based on feature flag setting
  • Implemented fallback mechanism between FP16 and FP32 models
  • Ensured appropriate XAI-enabled models are exported based on selected precision
  • Added tests covering both feature flag states
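As a rough illustration of the first item (not the exact implementation), the flag would be declared next to the existing entries in feature_flag_provider.py. The FeatureFlag class name and the env-based check below are assumptions; only the FEATURE_FLAG_FP16_INFERENCE name comes from this PR.

import os
from enum import Enum

class FeatureFlag(Enum):
    # ... existing flags ...
    FEATURE_FLAG_FP16_INFERENCE = "FEATURE_FLAG_FP16_INFERENCE"

def is_enabled(flag: FeatureFlag) -> bool:
    # Hypothetical lookup: treat the flag as enabled when its env var is "true"
    return os.environ.get(flag.name, "false").lower() == "true"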

Details

When enabled, the system will:

  • Use FP16 models for inference operations
  • Fall back to FP32 models if FP16 isn't available, and vice versa (see the selection sketch below)
  • Export appropriate XAI-enabled models according to selected precision

This change optimizes resource utilization by deploying more compact FP16 models that consume less memory and storage while maintaining inference performance.
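A minimal sketch of the selection order described above, following the fallback pattern visible in the review snippets further down; the function name and the model.precision attribute are illustrative assumptions, not the exact ModelRepo code.

def pick_inference_model(models, fp16_enabled: bool):
    # Precision preference is driven by FEATURE_FLAG_FP16_INFERENCE
    primary, fallback = ("FP16", "FP32") if fp16_enabled else ("FP32", "FP16")
    # Try the preferred precision first
    model = next((m for m in models if primary in m.precision), None)
    if model is not None:
        return model
    # Otherwise fall back to the other precision (may still be None if neither exists)
    return next((m for m in models if fallback in m.precision), None)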

JIRA: ITEP-32416, ITEP-66504, ITEP-66505

✨ Type of Change

Select the type of change your PR introduces:

  • 🐞 Bug fix – Non-breaking change which fixes an issue
  • 🚀 New feature – Non-breaking change which adds functionality
  • 🔨 Refactor – Non-breaking change which refactors the code base
  • 💥 Breaking change – Changes that break existing functionality
  • 📚 Documentation update
  • 🔒 Security update
  • 🧪 Tests

🧪 Testing Scenarios

Describe how the changes were tested and how reviewers can test them too:

  • ✅ Tested manually
  • 🤖 Run automated end-to-end tests

✅ Checklist

Before submitting the PR, ensure the following:

  • 🔍 PR title is clear and descriptive
  • 📝 For internal contributors: If applicable, include the JIRA ticket number (e.g., ITEP-123456) in the PR title. Do not include full URLs
  • 💬 I have commented my code, especially in hard-to-understand areas
  • 📄 I have made corresponding changes to the documentation
  • ✅ I have added tests that prove my fix is effective or my feature works

@itallix itallix requested a review from a team as a code owner May 16, 2025 11:40
@itallix itallix marked this pull request as draft May 16, 2025 11:41
@github-actions github-actions bot added the IAI Interactive AI backend label May 16, 2025

@leoll2 leoll2 left a comment


Looks great, minor comments

Comment on lines 426 to 427
  # Use ascending order sorting to retrieve the oldest matching document
- return self.get_one(extra_filter=query, earliest=True)
+ models = self.get_all(extra_filter=query)

Comment at L426 is no longer applicable

logger.warning(f"{primary_precision} model requested but not found. Falling back to {fallback_precision}.")
fallback_model = next((model for model in models if fallback_precision in model.precision), None)
if fallback_model:
    return fallback_model

Suggested change
      return fallback_model
+ logger.warning(f"Fallback {fallback_precision} model also not found.")

Comment on lines 489 to 490
# If we get here, we have matched_docs but none with the expected precisions
return ID()

Suggested change
  # If we get here, we have matched_docs but none with the expected precisions
+ logger.warning(f"Fallback {fallback_precision} model also not found.")
  return ID()

@itallix itallix requested a review from Copilot May 16, 2025 15:01

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a feature flag to toggle FP16 inference and updates model creation, selection logic, and tests to support an FP16-first pipeline with FP32 fallback.

  • Introduces FEATURE_FLAG_FP16_INFERENCE and uses it in prepare_train and ModelRepo to choose precision order.
  • Renames mo_fp32_with_xai to mo_with_xai across code, fixtures, and tests.
  • Updates ModelRepo.get_latest_model_for_inference* to fetch all matching precisions and implement FP16/FP32 fallback.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Summary per file:
  • interactive_ai/workflows/geti_domain/train/job/tasks/prepare_and_train/train_helpers.py: Use feature flag to set FP16 or FP32 for XAI model
  • interactive_ai/workflows/geti_domain/train/job/tasks/evaluate_and_infer/evaluate_and_infer.py: Swap mo_fp32_with_xai references to mo_with_xai
  • interactive_ai/workflows/geti_domain/common/jobs_common/features/feature_flag_provider.py: Add FEATURE_FLAG_FP16_INFERENCE enum entry
  • interactive_ai/workflows/geti_domain/common/jobs_common_extras/mlflow/utils/train_output_models.py: Rename mo_fp32_with_xai to mo_with_xai in IDs/parse
  • interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py: Update inference query to include both precisions and implement fallback logic
  • Tests and fixtures (multiple files): Rename fields/tests for mo_with_xai and cover both flag states
Comments suppressed due to low confidence (2)

interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py:450

  • Aggregation pipeline lacks a $sort stage to ensure the latest model is returned first; this can lead to selecting an older model ID when multiple precisions exist—add sorting by version or _id before projecting.
matched_docs = list(self.aggregate_read(aggr_pipeline))

interactive_ai/libs/iai_core_py/tests/repos/test_model_repo.py:431

  • [nitpick] Update this docstring to reflect the new FP16-first behavior when the feature flag is enabled, e.g. mention M6_FP16 as expected under fp16-enabled.
The latest model for inference is M4 (the first one generated after the base model).

@@ -420,15 +423,34 @@ def get_latest_model_for_inference(
      base_model_id=base_model_id, model_status_filter=model_status_filter
  )

  # Use ascending order sorting to retrieve the oldest matching document
- return self.get_one(extra_filter=query, earliest=True)
+ models = self.get_all(extra_filter=query)

Copilot AI May 16, 2025


get_all returns all matching models in unspecified order, so next(...) may pick an older FP16 model instead of the latest; recommend sorting by version or _id descending (or using a limited aggregate with sort) before selecting primary or fallback precision.

Suggested change
- models = self.get_all(extra_filter=query)
+ # Ensure models are sorted by version (or _id) in descending order
+ models = self.get_all(extra_filter=query).sort([("version", DESCENDING), ("_id", DESCENDING)])
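If get_all returns a plain in-memory list rather than a cursor (an assumption; the chained .sort(...) above is cursor-style), the same ordering could be achieved with a standard Python sort before the precision lookup, for example:

# Assumed attribute name (version); sort newest first so the precision
# lookup below picks the latest matching model rather than an arbitrary one
models = self.get_all(extra_filter=query)
models.sort(key=lambda model: model.version, reverse=True)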

