Add intent extraction test suite and fix priority extraction #181
anfredette wants to merge 2 commits into llm-d-incubation:main
Conversation
Add comprehensive test suite for LLM intent extraction covering all 9 use cases, GPU/model extraction, priority enforcement, user counts, conversation history, and UI button strings. Tests require Ollama and are run explicitly via `make test-intent`. Two smoke tests run as part of `make test-integration`. Assisted-by: Claude <noreply@anthropic.com> Signed-off-by: Andre Fredette <afredette@redhat.com>
anfredette force-pushed from 49e3c52 to b452605
Priority extraction fix:
- LLM now returns *_mentioned booleans alongside *_priority values. Post-processing trusts a priority only when _mentioned=true and resets it to "medium" otherwise. This prevents the LLM (qwen2.5:7b) from inferring priorities from use-case type rather than explicit user statements. The SLO profiles already handle use-case-appropriate targets.

Prompt rewrite (for smaller LLMs):
- Replace the verbose prose prompt with a short, directive format using ordered pattern-matching rules. The prompt is now self-contained (schema embedded inline), so the INTENT_EXTRACTION_SCHEMA constant and the schema_description parameter on extract_structured_data() are removed.
- Remove experience_class, complexity_priority, and additional_context from the LLM prompt. experience_class is inferred deterministically from use_case in post-processing; complexity_priority and additional_context were never consumed downstream.

Post-processing hardening:
- Case-insensitive normalization for domain_specialization, experience_class, and *_mentioned booleans (handles string "True").
- Lowercase use_case before alias/fuzzy lookup so mixed-case LLM responses like "Text_Summarization" are handled correctly.
- Add logger.warning when an unrecognized use_case cannot be resolved by the alias map or fuzzy match.
- Apply priority value aliases (e.g. "very_high" -> "high") before validation.
- Remove stale complexity_priority from the test helper _base_intent().
- Add 4 unit tests for case-insensitive normalization.

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Andre Fredette <afredette@redhat.com>
anfredette force-pushed from b452605 to ad8c9a7
anfredette (Author):
@amito I suggest you test this with the use cases that were giving you trouble. Better yet, we can add them to the test cases now or in a later PR.
Add 39-scenario test suite for LLM intent extraction covering all 9 use cases, GPU/model extraction, priority enforcement, user counts, conversation history, and UI button strings. Run explicitly via make test-intent (requires Ollama); 2 smoke tests run as part of make test-integration.
Rewrite the intent extraction prompt as a short, directive format with ordered pattern-matching rules, replacing the verbose prose prompt. Remove experience_class, complexity_priority, and additional_context from the prompt; these fields were either inferred deterministically in post-processing or never consumed downstream. Make the prompt self-contained (schema embedded inline), removing the INTENT_EXTRACTION_SCHEMA constant and the schema_description parameter from extract_structured_data().
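For illustration, a directive prompt in the spirit of the rewrite described above might look like the sketch below. The rule wording, field names, and inline schema here are assumptions, not the PR's actual prompt text.

```python
# Hypothetical sketch of a short, directive prompt with ordered
# pattern-matching rules and the schema embedded inline (self-contained,
# so no separate schema constant or schema_description parameter is needed).
INTENT_PROMPT = """Extract the user's deployment intent. Apply rules in order:
1. If the message names a use case (chatbot, summarization, code generation, ...),
   set use_case to it.
2. Set <topic>_mentioned=true ONLY if the user explicitly discusses that topic.
3. Set <topic>_priority ONLY when <topic>_mentioned=true; otherwise use "medium".

Return JSON matching exactly this schema:
{"use_case": str,
 "latency_priority": "low|medium|high", "latency_mentioned": bool,
 "cost_priority": "low|medium|high", "cost_mentioned": bool}
"""
```

Embedding the schema inline keeps the prompt and its expected output format in one place, which tends to help smaller models follow the contract.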
Fix unreliable priority extraction: the LLM (qwen2.5:7b) was inferring priorities from use case type rather than from explicit user statements, causing incorrect scoring weights. The LLM now returns *_mentioned booleans alongside *_priority values, and post-processing resets priority to "medium" when the topic was not explicitly mentioned.
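The reset step described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the helper name and the topic list are assumptions, while the *_priority/*_mentioned field pairing and the "medium" default come from the PR description.

```python
DEFAULT_PRIORITY = "medium"
PRIORITY_TOPICS = ["latency", "throughput", "cost"]  # assumed topic list

def reset_unmentioned_priorities(intent: dict) -> dict:
    """Trust a *_priority value only when its *_mentioned flag is true."""
    for topic in PRIORITY_TOPICS:
        if not intent.get(f"{topic}_mentioned", False):
            # The LLM inferred this priority from the use-case type, not from
            # an explicit user statement; fall back to the neutral default.
            intent[f"{topic}_priority"] = DEFAULT_PRIORITY
    return intent
```

For example, an intent with latency_priority="high" but latency_mentioned=false would come out of this step with latency_priority="medium".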
Harden post-processing: case-insensitive normalization for domain_specialization, experience_class, use_case, and *_mentioned booleans; priority value aliases (e.g. "very_high" -> "high") applied before validation; logger.warning when an unrecognized use_case cannot be resolved by alias map or fuzzy match. Add 4 unit tests for case-insensitive normalization.
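The normalization steps above could be sketched like this. The helper names and the exact alias map are assumptions; the "very_high" -> "high" alias, the string-boolean handling, and the applied-before-validation ordering come from the PR description.

```python
PRIORITY_ALIASES = {"very_high": "high", "very_low": "low"}  # assumed alias map
VALID_PRIORITIES = {"low", "medium", "high"}

def normalize_bool(value) -> bool:
    """Handle string booleans like "True"/"FALSE" sometimes returned by LLMs."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)

def normalize_priority(value) -> str:
    """Case-insensitive priority normalization with aliases applied first."""
    v = str(value).strip().lower()
    v = PRIORITY_ALIASES.get(v, v)  # apply aliases before validation
    return v if v in VALID_PRIORITIES else "medium"
```

Normalizing before validation means mixed-case or aliased LLM output ("Very_High") passes cleanly instead of being rejected and silently defaulted.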