-
Notifications
You must be signed in to change notification settings - Fork 115
feat: DIA-2067: Add message type counters in MessageBuilder #369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces message type counters in the MessageBuilder functionality and updates tests and usage reporting accordingly. Key changes include:
- Adding "message_counts" to expected results in tests.
- Adjusting LLM usage functions to include message type counts.
- Updating dependencies and refactoring message parsing code.
Reviewed Changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_stream_inference.py | Added "message_counts" field to the expected output dictionary. |
| tests/test_multiimage_support.py | Updated test assertions and added message trimming test logic. |
| tests/test_llm.py | Updated expected cost values and "message_counts" in test outputs. |
| pyproject.toml | Added mypy dependency. |
| adala/utils/parse.py | Refactored message parsing models and removed the split function. |
| adala/utils/llm_utils.py | Incorporated message counters into usage data and refactored LLM calls. |
Files not reviewed (2)
- .cursor/rules/common-rules.mdc: Language not supported
- .cursorignore: Language not supported
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #369 +/- ##
==========================================
+ Coverage 69.07% 70.23% +1.15%
==========================================
Files 49 51 +2
Lines 2862 2980 +118
==========================================
+ Hits 1977 2093 +116
- Misses 885 887 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements message type counter support in the MessageBuilder and updates tests and related utilities to track message counts. Key changes include:
- Adding a "_message_counts" field to usage data in multiple test files and LLM utility functions.
- Updating tests in various modules to assert the presence and correctness of message counts.
- Removing legacy functions (e.g. split_message_into_chunks) and updating return types in MessagesBuilder.
Reviewed Changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_stream_inference.py | Adds a "_message_counts" entry to the expected test output |
| tests/test_multiimage_support.py | Adds a new test to validate message trimming and confirms message counts |
| tests/test_llm_utils.py | Adds tests for message type counting |
| tests/test_llm.py | Updates expected test outputs; removes split_message_into_chunks test |
| tests/conftest.py | Updates VCR configuration to filter additional query parameters |
| pyproject.toml | Upgrades pytest and adds mypy for code quality checks |
| adala/utils/token_counter.py | Introduces a new token counter implementation using LiteLLM |
| adala/utils/parse.py | Removes legacy message chunk splitting code, streamlining template parsing |
| adala/utils/llm_utils.py | Modifies usage dict creation to include message counts and adjusts message extraction from MessagesBuilder |
Files not reviewed (2)
- .cursor/rules/common-rules.mdc: Language not supported
- .cursorignore: Language not supported
Comments suppressed due to low confidence (4)
tests/test_llm.py:148
- Removal of the test_split_message_into_chunks function may reduce test coverage for the message chunk splitting functionality. Consider retaining or replacing these tests if the functionality is still required.
def test_split_message_into_chunks():
adala/utils/llm_utils.py:351
- The usage of messages_builder.get_messages(payload).messages assumes that get_messages now returns an object with a 'messages' attribute. Ensure that this change is consistent throughout the codebase and that the return type is properly documented.
messages = messages_builder.get_messages(payload).messages
adala/utils/llm_utils.py:420
- The change to access a 'messages' attribute from get_messages(payload) needs consistency checks across all consumers of this method. Please verify that all related code and documentation are updated accordingly.
messages = messages_builder.get_messages(payload).messages
adala/utils/llm_utils.py:492
- Ensure that returning an object with a 'messages' property from get_messages(payload) is the intended design, and that all invocations are updated consistently to prevent potential runtime errors.
messages = messages_builder.get_messages(payload).messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please no abstract classes here :( keep it to the minimum layers of abstraction necessary, here you can just use the litellm token count functions directly https://docs.litellm.ai/docs/completion/token_usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to build the codebase using SOLID principles to resolve potential dependency inversion problems. Specifically, every time you design a new token system, a method of tokenizing, or rely on different ways of counting tokens, you end up with a tightly coupled dependency on litellm. I envision a more sophisticated token management scenario in the future; therefore, decoupling and abstracting this component is essential, as well as in general following the standards of code development.
matt-bernstein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove cursorrules from this PR so we can review it separately?
No description provided.