Hyperion: Add verification checker for Consistency Check Service #12290
…ndent-verification-loop
@khinevich Test coverage could not be fully measured because some tests failed. Please check the workflow logs for details.
…tion-loop' of github.com:ls1intum/Artemis into feature/hyperion/consistency-check-independent-verification-loop
@khinevich Test coverage has been automatically updated in the PR description.
End-to-End Test Results
Test Strategy: Two-phase execution
Overall: ❌ Phase 2 (remaining tests) failed
Pull request overview
Adds an independent LLM-based verification pass after the structural + semantic Hyperion consistency checks to reduce false positives, deduplicate overlaps, and improve issue quality before returning results to instructors.
Changes:
- Add a new verification prompt (consistency_verification.st) describing a 4-phase cleanup pipeline and a unified JSON output schema.
- Extend HyperionConsistencyCheckService to call the verifier, parse unified structured output, and fall back to pre-verification results on failure.
- Reduce consistency-check logging verbosity (info → debug) for per-issue output.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/main/resources/prompts/hyperion/consistency_verification.st | New verifier prompt defining false-positive removal, deduplication, and output schema. |
| src/main/java/de/tum/cit/aet/artemis/hyperion/service/HyperionConsistencyCheckService.java | Orchestrates the new verification call, introduces unified schema parsing, and adds fallback behavior. |
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough
Adds a verifier LLM call to Hyperion's consistency check: combined structural+semantic issues are serialized with an injected ObjectMapper and sent to a verification prompt; verifier output is parsed into unified issues and mapped to internal DTOs, with a fallback to combined issues and adjusted logging/observability.
Sequence Diagram
sequenceDiagram
participant Checker as Consistency Checker
participant OM as ObjectMapper
participant Verifier as Verifier LLM
participant Schema as StructuredOutputSchema
Checker->>Checker: Combine structural + semantic issues
Checker->>OM: serialize(combinedIssues) -> detected_issues_json
OM-->>Checker: detected_issues_json
Checker->>Verifier: send prompt + detected_issues_json
activate Verifier
Verifier->>Verifier: Phase 1: remove false positives
Verifier->>Verifier: Phase 2: deduplicate & merge locations
Verifier->>Verifier: Phase 3: verify/correct ranges, descriptions, categories
Verifier->>Schema: emit UnifiedConsistencyIssues JSON
Schema-->>Verifier: parsed unified schema
deactivate Verifier
Verifier-->>Checker: verification result (unified issues) or null/error
alt verification succeeded
Checker->>Checker: map unified issues -> ConsistencyIssue DTOs
Checker-->>Caller: return verified, deduplicated issues
else verification failed/null
Checker-->>Caller: return combinedIssues (fallback)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/java/de/tum/cit/aet/artemis/hyperion/service/HyperionConsistencyCheckService.java`:
- Around line 190-192: The code currently serializes combinedIssues to a bare
JSON array via
HyperionConsistencyCheckService.OBJECT_MAPPER.writeValueAsString(combinedIssues.stream().map(this::mapConsistencyIssueToDto).toList())
and puts it under verificationInput.put("detected_issues_json", issuesJson);
instead wrap the serialized payload in an object with key "issues" so the
verifier and the UnifiedConsistencyIssues parser receive { "issues": [...] } —
e.g., build a wrapper Map or DTO containing "issues" mapped to the list produced
by mapConsistencyIssueToDto and serialize that (or put that wrapper object into
verificationInput as detected_issues_json) instead of the naked array.
- Around line 187-196: The verifier is invoked even when combinedIssues is
empty, exposing full context and allowing synthesis of new issues; change the
logic in HyperionConsistencyCheckService so that if combinedIssues.isEmpty() it
directly sets issueDTOs = List.of() (using mapConsistencyIssueToDto only when
needed) and skips creating issuesJson, populating verificationInput, and calling
runVerificationCheck; ensure the log still reflects that zero issues were
present and only call runVerificationCheck when combinedIssues is non-empty.
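The two suggestions above could be sketched roughly as follows. This is a minimal, JDK-only illustration and not the service's actual code: `VerificationGuardSketch`, `IssueDto`, `wrap`, and `verifyIfNeeded` are hypothetical names, the verifier is modeled as a plain `Function` instead of `runVerificationCheck`, and JSON serialization is left out so that only the payload shape and the empty-list guard are shown.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the Artemis code base. */
final class VerificationGuardSketch {

    /** Simplified stand-in for the issue DTO (assumption: the real DTO has more fields). */
    record IssueDto(String category, String severity, String description) {}

    /** Wraps the issues so the serialized payload has the shape {"issues": [...]}. */
    static Map<String, List<IssueDto>> wrap(List<IssueDto> issues) {
        return Map.of("issues", List.copyOf(issues));
    }

    /**
     * Skips the verifier entirely when nothing was detected; otherwise
     * hands the wrapped payload to the verifier and returns its result.
     */
    static List<IssueDto> verifyIfNeeded(List<IssueDto> combined,
            Function<Map<String, List<IssueDto>>, List<IssueDto>> verifier) {
        if (combined.isEmpty()) {
            // No LLM call: there is nothing to verify or deduplicate.
            return List.of();
        }
        return verifier.apply(wrap(combined));
    }
}
```

With this shape the verifier only ever sees an object with an `issues` key, and an empty checker result never reaches it.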
In `@src/main/resources/prompts/hyperion/consistency_verification.st`:
- Around line 132-153: The JSON example in consistency_verification.st uses
TypeScript-style union syntax (e.g., "HIGH" | "MEDIUM" | "LOW" and category
values joined with |) which makes the example invalid JSON and breaks downstream
parsing; update the example JSON to use concrete sample values (e.g.,
"severity": "HIGH", "category": "METHOD_RETURN_TYPE_MISMATCH", and one concrete
entry in each "type" field) and move the allowed enum sets for "severity",
"category", and "type" out of the JSON block into plain prose immediately below
the example, listing the valid values for each field so the schema is clear
without embedding non-JSON notation.
Claudia-Anthropica left a comment
@khinevich The verification pass is a clean addition to the pipeline and the benchmark numbers look solid (precision from 0.61 to 0.90 is a big win). A few things to fix inline though: the ObjectMapper should be injected via Spring, the JSON format sent to the verifier doesn't match the expected output schema, and there's a wasted LLM call when both checkers find nothing. Also, validate-pr-title CI is failing — the title needs the backtick format (e.g. `Hyperion`: Add independent verification pass to consistency check pipeline).
…ionConsistencyCheckService
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/java/de/tum/cit/aet/artemis/hyperion/service/HyperionConsistencyCheckService.java`:
- Line 192: The code currently serializes combinedIssues through
mapConsistencyIssueToDto (which fills defaults) into issuesJson and thus mutates
checker output before verification; instead, serialize a verifier-specific
payload that preserves raw checker fields (allowing null severity/category) into
detected_issues_json by introducing/using a method like
mapConsistencyIssueToVerifierDto or serializeRawConsistencyIssues that does not
apply MEDIUM/METHOD_PARAMETER_MISMATCH defaults, and keep
mapConsistencyIssueToDto only for the final API response after
verification/fallback.
- Around line 320-332: The model class UnifiedConsistencyIssues initializes the
issues field to an empty list which prevents Jackson from distinguishing a
missing JSON property from an intentionally empty array; update the
UnifiedConsistencyIssues definition to remove the default assignment (make the
issues field nullable / omit List.of() default) so deserialization yields null
when the JSON property is missing, and adjust any usages (e.g.,
runVerificationCheck and the analogous check around lines 444-448) to treat a
null issues as a failure (return null or trigger the fallback) rather than an
empty list; ensure the field has appropriate `@JsonProperty` / `@Nullable`
annotations if used in your codebase to preserve schema expectations.
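The null-versus-empty distinction from the second finding might look like the sketch below. It assumes the parsed model leaves `issues` as null when the property is missing, which is what removing the `List.of()` default is meant to achieve; `NullVersusEmptySketch`, `UnifiedIssues`, and `resolve` are hypothetical names, and issues are reduced to strings for brevity.

```java
import java.util.List;

/** Hypothetical sketch of the fallback decision after parsing the verifier output. */
final class NullVersusEmptySketch {

    /** Stand-in for the parsed verifier output; issues is deliberately nullable. */
    record UnifiedIssues(List<String> issues) {}

    /**
     * A null response or a null issues field is treated as a failed verification,
     * so the pre-verification issues are kept. An empty list is a valid answer
     * meaning the verifier rejected every reported issue.
     */
    static List<String> resolve(UnifiedIssues parsed, List<String> fallback) {
        if (parsed == null || parsed.issues() == null) {
            return fallback;
        }
        return parsed.issues();
    }
}
```

Keeping the two cases apart means a malformed verifier response cannot silently discard every issue the checkers found.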
In `@src/test/java/de/tum/cit/aet/artemis/hyperion/service/HyperionConsistencyCheckServiceTest.java`:
- Line 99: The test checkConsistency_tracksTokenUsageAndCosts() now exercises
three model calls, so update the token-usage assertions or stub the verifier
separately: either change the assertions that inspect resp.tokens() to expect
prompt=300L, completion=150L, total=450L (reflecting three
DefaultUsage(100,50,150) calls), or add a separate stub for the verifier call so
the aggregate remains the previous two-call totals; locate the test method
checkConsistency_tracksTokenUsageAndCosts() and the DefaultUsage(...) stub to
apply the change.
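The arithmetic behind the expected totals can be checked with a small sketch. It assumes every model call reports the same stubbed usage of 100 prompt, 50 completion, and 150 total tokens, as in the finding above; `TokenUsageSketch`, `Usage`, and `aggregate` are hypothetical names and not the test's real types.

```java
import java.util.List;

/** Hypothetical sketch: summing per-call token usage across model calls. */
final class TokenUsageSketch {

    /** Simplified usage triple (assumption: mirrors prompt/completion/total counts). */
    record Usage(long prompt, long completion, long total) {
        Usage plus(Usage other) {
            return new Usage(prompt + other.prompt, completion + other.completion, total + other.total);
        }
    }

    /** Adds up the usage of all calls, starting from zero. */
    static Usage aggregate(List<Usage> calls) {
        return calls.stream().reduce(new Usage(0, 0, 0), Usage::plus);
    }
}
```

Two calls of (100, 50, 150) add up to (200, 100, 300); adding the verifier as a third identical call gives (300, 150, 450).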
Claudia-Anthropica left a comment
@khinevich All feedback addressed — ObjectMapper is now injected via Spring, and the issues JSON is properly wrapped in the {"issues": [...]} object. Nice work. One small heads-up: the token-usage test (checkConsistency_tracksTokenUsageAndCosts) still asserts 200/100/300 tokens (two calls), but with the verification step there are now three model calls, so those assertions may need to become 300/150/450. CI failures here look unrelated (DB timeouts, no client changes).
Summary
Adds an independent verification pass to the Hyperion consistency check pipeline. After the structural and semantic checkers run in parallel, a third LLM call reviews
their combined output — removing false positives, deduplicating overlapping issues reported by both checkers, and sharpening surviving issues (line numbers, descriptions,
categories). The result is a significantly cleaner and more actionable issue list for instructors.
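As a rough illustration of the flow described above, the sketch below runs two checkers concurrently, passes their combined output to a verifier, and keeps the combined list when the verifier fails or returns nothing. It is a JDK-only approximation under stated assumptions: `VerificationPipelineSketch` and `check` are hypothetical names, issues are reduced to strings, and the checkers and verifier are plain functional interfaces instead of LLM calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a check-then-verify pipeline with fallback. */
final class VerificationPipelineSketch {

    static List<String> check(Supplier<List<String>> structural, Supplier<List<String>> semantic,
            UnaryOperator<List<String>> verifier) {
        // Start both checkers concurrently, then wait for both results.
        // A checker failure is not handled here and would propagate from join().
        CompletableFuture<List<String>> structuralFuture = CompletableFuture.supplyAsync(structural);
        CompletableFuture<List<String>> semanticFuture = CompletableFuture.supplyAsync(semantic);
        List<String> combined = new ArrayList<>(structuralFuture.join());
        combined.addAll(semanticFuture.join());

        try {
            // The verifier receives an immutable copy of the combined issues.
            List<String> verified = verifier.apply(List.copyOf(combined));
            // A null result counts as a failed verification.
            return verified != null ? verified : combined;
        }
        catch (RuntimeException e) {
            // Fallback: return the pre-verification issues unchanged.
            return combined;
        }
    }
}
```

The fallback keeps the previous two-checker behavior intact, so a verifier outage degrades result quality without failing the whole check.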
Checklist
General
Server
Motivation and Context
The existing two-checker approach (structural + semantic) produces a high rate of false positives: pedagogical stubs, intentional template incompleteness, and
near-duplicate issues reported independently by both checkers all surface as apparent inconsistencies. This noise reduces trust in the tool and forces instructors to
manually triage results before acting on them.
Description
Introduces a post-processing verification step in HyperionConsistencyCheckService:
- Adds a new verification prompt (consistency_verification.st).
- The prompt defines four phases: false-positive removal, deduplication (merging overlapping issues from both checkers), issue improvement (correcting line numbers, sharpening descriptions, fixing miscategorised issues), and self-verification.
- A UnifiedConsistencyIssues structured output schema covers all six issue categories and is used exclusively by the verifier.

Before / After comparison:
Aggregate Results
Steps for Testing
Prerequisites:
Testserver States
You can manage test servers using Helios. Check environment statuses in the environment list. To deploy to a test server, go to the CI/CD page, find your PR or branch, and trigger the deployment.
Review Progress
Performance Review
Code Review
Manual Tests
Test Coverage
Warning: Server tests failed. Coverage could not be fully measured. Please check the workflow logs.
Last updated: 2026-03-21 19:04:41 UTC