- 
                Notifications
    
You must be signed in to change notification settings  - Fork 328
 
fix: removes checking model directly for embedding dimensions #5445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Simplifies test implementation by removing runtime model validation checks. The PR removes code that was instantiating embedding models in CI and verifying that AutoConfig.hidden_size matches the actual embedding dimensions from model.get_sentence_embedding_dimension().
Major changes:
- Relocated test file from 
tests/ai/test_sentence_transformers.pytotests/ai/transformers/test_transformers_text_embedder.py - Removed 
IS_CIimport and conditional logic for running models in CI - Removed runtime validation that instantiated models and checked actual embedding dimensions
 - Removed actual text embedding tests (
embedder.embed_text(test_texts)) - Simplified 
test_sentence_transformers_text_embedder_otherto only verify descriptor metadata without model instantiation 
Confidence Score: 4/5
- This PR is safe to merge with minor concerns about test coverage reduction
 - The change removes runtime validation but trusts that 
AutoConfig.hidden_sizeaccurately reflects embedding dimensions. While this simplifies tests and reduces CI time, it removes validation that the descriptor's dimensions match actual model behavior. The PR description mentions this is the "responsibility of the descriptor" which is architecturally correct, but reduces test coverage for catching mismatches between AutoConfig metadata and actual embedding dimensions - No files require special attention - this is a straightforward test simplification
 
Important Files Changed
File Analysis
| Filename | Score | Overview | 
|---|---|---|
| tests/ai/transformers/test_transformers_text_embedder.py | 5/5 | Moved test file from tests/ai/ to tests/ai/transformers/ and removed runtime validation tests that checked model dimensions against actual embeddings | 
Sequence Diagram
sequenceDiagram
    participant Test as Test Suite
    participant Provider as TransformersProvider
    participant Descriptor as TextEmbedderDescriptor
    participant AutoConfig as AutoConfig (HuggingFace)
    participant Model as SentenceTransformer Model
    Note over Test,Model: Before (Removed Code)
    Test->>Provider: get_text_embedder(model_name)
    Provider->>Descriptor: Create descriptor
    Test->>Descriptor: get_dimensions()
    Descriptor->>AutoConfig: from_pretrained(model).hidden_size
    AutoConfig-->>Descriptor: hidden_size value
    Descriptor-->>Test: EmbeddingDimensions(hidden_size)
    Test->>Descriptor: instantiate()
    Descriptor->>Model: Load SentenceTransformer
    Model-->>Descriptor: model instance
    Descriptor-->>Test: embedder
    Test->>Model: get_sentence_embedding_dimension()
    Model-->>Test: true_dimensions
    Note over Test: Assert descriptor dims == true dims
    Test->>Model: embed_text(["Hello", "Bye"])
    Model-->>Test: embeddings
    Note over Test: Verify embedding lengths
    Note over Test,AutoConfig: After (Current Code)
    Test->>Provider: get_text_embedder(model_name)
    Provider->>Descriptor: Create descriptor
    Test->>Descriptor: get_dimensions()
    Descriptor->>AutoConfig: from_pretrained(model).hidden_size
    AutoConfig-->>Descriptor: hidden_size value
    Descriptor-->>Test: EmbeddingDimensions(hidden_size)
    Note over Test: Assert descriptor dims == expected dims
    Note over Test,Model: No model instantiation or embedding tests
    1 file reviewed, no comments
          Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@            Coverage Diff             @@
##             main    #5445      +/-   ##
==========================================
- Coverage   71.48%   71.47%   -0.02%     
==========================================
  Files         996      996              
  Lines      126405   126405              
==========================================
- Hits        90363    90349      -14     
- Misses      36042    36056      +14     🚀 New features to boost your workflow:
  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Changes Made
Don't check the model directly for embedding dimensions, this is the responsibility of the descriptor and fix the refactor.
Related Issues
https://dist-data.slack.com/archives/C052CA6Q9N1/p1761328915396179
Checklist
docs/mkdocs.ymlnavigation