
Auto-fill thesis abstract from uploaded PDFs#1127

Merged
krusche merged 8 commits into develop from feature/auto-extract-abstract on Jun 21, 2026

Conversation

@krusche krusche commented Jun 20, 2026

Summary

Automatically fills a thesis's abstract from the uploaded PDF, so the field is populated consistently instead of relying on students to copy it by hand (today it's manual, and the dashboard even nags when it's blank).

  • On proposal upload (proposal phase) and thesis-document upload (writing phase, file type THESIS), the server extracts the abstract from the PDF.
  • Extraction is deterministic and in-process via iText (already a dependency): there are no external/LLM calls, so unpublished thesis text never leaves the system and the output is the PDF's exact words. It locates the Abstract/Summary heading, bounds it at the next section, rebuilds paragraphs from vertical gaps, and rejoins line-end hyphenation (com-/prehensive → comprehensive).
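To make the de-hyphenation rule concrete, here is a minimal Java sketch. It is not the PR's `AbstractExtractor`; `Dehyphenator` and `join` are hypothetical names, and it assumes the lines of one paragraph arrive in reading order with hyphen encodings already normalized to a plain `-`. A line-end hyphen is dropped only when the next line starts with a lowercase letter and is kept before an uppercase letter or digit, mirroring the keep-hyphen case listed under AbstractExtractorTest.

```java
import java.util.List;

/**
 * Hypothetical sketch (not the PR's AbstractExtractor): joins the text lines of one
 * paragraph, undoing line-end hyphenation. Assumes lines are in reading order and
 * that hyphen encodings have already been normalized to '-'.
 */
final class Dehyphenator {

    private Dehyphenator() {
    }

    static String join(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                continue; // skip blank lines inside the paragraph
            }
            if (out.length() == 0) {
                out.append(line);
                continue;
            }
            int last = out.length() - 1;
            // a line-end hyphen directly after a letter, e.g. "com-"
            boolean hyphenated = out.charAt(last) == '-'
                    && last > 0 && Character.isLetter(out.charAt(last - 1));
            if (hyphenated && Character.isLowerCase(line.charAt(0))) {
                out.deleteCharAt(last); // "com-" + "prehensive" -> "comprehensive"
            } else if (!hyphenated) {
                out.append(' '); // an ordinary line break becomes one space
            }
            // otherwise the hyphen is kept before an uppercase letter or digit ("COVID-" + "19")
            out.append(line);
        }
        return out.toString();
    }
}
```

For example, joining `A com-` and `prehensive study` yields `A comprehensive study`, while `COVID-` and `19 cases` yields `COVID-19 cases`.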

Demo

End-to-end in the real UI (student account).

1. A blank abstract is auto-filled silently: uploading the thesis PDF populates the empty field with the extracted text, with line-end hyphenation rejoined (com-/prehensive → comprehensive). Before the upload the field is empty (01-before-upload-empty-abstract.png).

2. An existing abstract is never overwritten silently — when the abstract already has content, a new upload opens a confirmation modal (so it can't be overlooked) showing the extracted text vs. the current one, with Use extracted abstract / Keep current. If the extracted text matches the current abstract, no modal is shown.

3. Confirming replaces the abstract with the extracted version; Keep current (or closing the modal) leaves it untouched.

How "100% correct" is honored

We can't always locate the abstract reliably across varied templates, so correctness is enforced by a confidence gate — we never store a wrong abstract:

| Outcome | Behavior |
|---|---|
| Confident (clear heading + boundary + sane length) | Auto-fill the abstract |
| Uncertain | Stage a suggestion; a confirmation modal appears after upload (Use extracted abstract / Keep current) |
| None | Leave it blank (manual entry, as before) |
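A sketch of how such a three-way gate could be expressed, assuming a word-count range serves as the "sane length" check. `ConfidenceGate`, `classify`, and the `MIN_WORDS`/`MAX_WORDS` values are illustrative assumptions, not the PR's thresholds or API.

```java
/** Outcome of an extraction attempt (hypothetical mirror of CONFIDENT/UNCERTAIN/NONE). */
enum Confidence { CONFIDENT, UNCERTAIN, NONE }

/** Hypothetical confidence gate; the word-count bounds are illustrative, not the PR's values. */
final class ConfidenceGate {

    static final int MIN_WORDS = 50;  // assumed lower bound for a plausible abstract
    static final int MAX_WORDS = 600; // assumed upper bound; longer text suggests over-capture

    private ConfidenceGate() {
    }

    static Confidence classify(boolean headingFound, boolean boundaryFound, String text) {
        if (!headingFound || text == null || text.isBlank()) {
            return Confidence.NONE; // leave the field blank for manual entry
        }
        int words = text.strip().split("\\s+").length;
        boolean saneLength = words >= MIN_WORDS && words <= MAX_WORDS;
        // auto-fill only when every signal agrees; anything else is merely suggested
        return boundaryFound && saneLength ? Confidence.CONFIDENT : Confidence.UNCERTAIN;
    }
}
```

Requiring all three signals before auto-filling is what keeps a partial match (no boundary, or an implausible length) in the suggestion path instead of the stored field.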

Overwrite rules track the abstract's source: a blank abstract is auto-filled silently; anything that would replace existing text is confirmed via the modal, so a human-edited abstract is never overwritten without explicit confirmation. An extraction matching the current abstract is ignored. Extraction is fully guarded — an unreadable / image-only / encrypted PDF is a no-op and never breaks the upload.
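The overwrite rules reduce to a small decision function. This sketch models only the three abstract-related fields on a hypothetical `ThesisAbstract` holder, and the exact-string comparison used to detect a matching extraction is an assumption (the PR may normalize differently). `AutoFillRules.apply` is an illustrative name.

```java
/** Hypothetical mirror of the MANUAL/EXTRACTED source values. */
enum AbstractSource { MANUAL, EXTRACTED }

/** Hypothetical mirror of the extractor's confidence levels. */
enum ExtractionConfidence { CONFIDENT, UNCERTAIN, NONE }

/** Hypothetical holder for the three abstract-related thesis fields. */
final class ThesisAbstract {
    String abstractField;
    AbstractSource abstractSource;
    String abstractSuggestion;
}

final class AutoFillRules {

    private AutoFillRules() {
    }

    static void apply(ThesisAbstract thesis, ExtractionConfidence confidence, String extractedHtml) {
        if (confidence == ExtractionConfidence.NONE || extractedHtml == null || extractedHtml.isBlank()) {
            return; // nothing usable: manual entry, as before
        }
        String current = thesis.abstractField == null ? "" : thesis.abstractField.strip();
        if (current.equals(extractedHtml.strip())) {
            return; // same text as stored: no prompt (assumed exact comparison)
        }
        if (current.isEmpty() && confidence == ExtractionConfidence.CONFIDENT) {
            // blank abstract + confident extraction: fill silently
            thesis.abstractField = extractedHtml;
            thesis.abstractSource = AbstractSource.EXTRACTED;
            thesis.abstractSuggestion = null;
        } else {
            // replacing existing text, or an uncertain result: stage it for the modal
            thesis.abstractSuggestion = extractedHtml;
        }
    }
}
```

Only the first branch ever writes `abstractField`, and only when it is empty, so existing text can change solely through an explicit confirmation in the modal.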

Changes

Server

  • core/utility/AbstractExtractor — the PDF→abstract extractor (heading/boundary detection, paragraph rebuild, de-hyphenation).
  • thesis/service/AbstractAutoFillService — fill / suggest / overwrite rules + guarded process.
  • New abstract_source + abstract_suggestion columns (migration 40_abstract_extraction.sql); ThesisAbstractSource enum.
  • Wired into ThesisService.uploadProposal / uploadThesisFile; updateThesisInfo marks MANUAL and clears the suggestion; new POST …/abstract-suggestion/accept and …/dismiss endpoints; ThesisDto exposes the fields; anonymization clears the suggestion.
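A minimal sketch of what the accept and dismiss operations do to those fields, following the semantics described in this PR: accept promotes the suggestion into the abstract and marks the source EXTRACTED; dismiss keeps the current text, marks it MANUAL, and clears the suggestion. `SuggestionThesis`, `SuggestionActions`, and the `SecurityException` standing in for an HTTP 403 are hypothetical, and treating accept without a staged suggestion as a no-op is an assumption.

```java
import java.util.Set;
import java.util.UUID;

enum SuggestionSource { MANUAL, EXTRACTED }

/** Hypothetical stand-in for the thesis entity: only the fields these operations touch. */
final class SuggestionThesis {
    final Set<UUID> studentIds;
    String abstractField;
    SuggestionSource abstractSource;
    String abstractSuggestion;

    SuggestionThesis(Set<UUID> studentIds) {
        this.studentIds = studentIds;
    }
}

final class SuggestionActions {

    private SuggestionActions() {
    }

    // Stand-in for the student-only access check (supervisors/examiners are not in studentIds).
    private static void requireStudent(SuggestionThesis thesis, UUID userId) {
        if (!thesis.studentIds.contains(userId)) {
            throw new SecurityException("only the thesis's students may resolve the suggestion");
        }
    }

    /** Accept: promote the suggestion into the abstract and mark it EXTRACTED. */
    static void accept(SuggestionThesis thesis, UUID userId) {
        requireStudent(thesis, userId);
        if (thesis.abstractSuggestion == null) {
            return; // assumed: nothing staged means nothing to accept
        }
        thesis.abstractField = thesis.abstractSuggestion;
        thesis.abstractSource = SuggestionSource.EXTRACTED;
        thesis.abstractSuggestion = null;
    }

    /** Dismiss: keep the current text and mark it MANUAL, so a later upload re-suggests instead of overwriting. */
    static void dismiss(SuggestionThesis thesis, UUID userId) {
        requireStudent(thesis, userId);
        thesis.abstractSource = SuggestionSource.MANUAL;
        thesis.abstractSuggestion = null;
    }
}
```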

Client

  • AbstractSuggestionModal in ThesisInfoSection (Use extracted abstract / Keep current), opened automatically after an upload that would replace the abstract; IThesis gains abstractSource / abstractSuggestion; confident fills of a blank abstract surface via the existing thesis refresh.

Tests (TDD)

  • AbstractExtractorTest (6): builds PDFs in-process at exact coordinates and covers confident extraction with de-hyphenation + paragraph split + boundary, English-only headings (German Zusammenfassung → none), over-length → uncertain, no boundary → uncertain, and a hyphen kept before an uppercase letter or digit.
  • AbstractAutoFillServiceTest (7) — every fill/suggest/overwrite branch + guarded no-op on bad bytes.
  • ThesisControllerTest (+3) — accept/dismiss endpoints + access control (real DB + Liquibase).
  • Full server suite green (880); client build / tsc / eslint clean.

Notes / out of scope

English headings only; German/configurable headings, OCR for scanned PDFs, and async processing are possible follow-ups.
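Because only English headings are recognized, heading detection can be sketched as a pair of patterns over text lines. This is a simplified, text-only illustration: the PR's extractor also uses font size and layout (the German boundary, for instance, sits at body size), whereas `AbstractLocator`, its patterns, and the boundary word list here are assumptions. A German-only `Zusammenfassung` heading yields no match, and a missing boundary is reported so the caller can downgrade to uncertain.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/**
 * Hypothetical, text-only sketch of heading and boundary detection. Unlike the PR's
 * extractor it ignores font size and page layout.
 */
final class AbstractLocator {

    /** Line range of the abstract body, and whether a closing section heading was found. */
    record Span(int start, int endExclusive, boolean boundaryFound) {
    }

    // English heading only, optionally section-numbered: "Abstract", "Summary", "1. Abstract"
    private static final Pattern HEADING = Pattern.compile(
            "(?:\\d+\\.?\\s+)?(?:abstract|summary)", Pattern.CASE_INSENSITIVE);

    // Assumed boundary headings: the German abstract headings and a (numbered) Introduction
    private static final Pattern BOUNDARY = Pattern.compile(
            "(?:\\d+\\.?\\s+)?(?:zusammenfassung|kurzfassung|introduction)", Pattern.CASE_INSENSITIVE);

    private AbstractLocator() {
    }

    static Optional<Span> locate(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            if (!HEADING.matcher(lines.get(i).strip()).matches()) {
                continue;
            }
            for (int j = i + 1; j < lines.size(); j++) {
                if (BOUNDARY.matcher(lines.get(j).strip()).matches()) {
                    return Optional.of(new Span(i + 1, j, true));
                }
            }
            // heading but no boundary: the caller should treat this as uncertain at best
            return Optional.of(new Span(i + 1, lines.size(), false));
        }
        return Optional.empty(); // no English heading (e.g. only "Zusammenfassung"): NONE
    }
}
```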

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Automatically extract thesis abstracts from uploaded PDF documents
    • Modal prompt allows students to review and accept extracted abstract suggestions
    • System intelligently preserves manually-entered abstracts while offering extracted alternatives

When a student uploads a proposal (proposal phase) or the thesis document
(writing phase, file type THESIS), the server extracts the abstract from the
PDF and fills Thesis.abstractField. Extraction is deterministic and in-process
via iText: locate the Abstract/Summary heading, bound it at the next section,
rebuild paragraphs from vertical gaps, and rejoin line-end hyphenation.

Correctness is gated by confidence so a wrong abstract is never stored:
- CONFIDENT  -> auto-fill the field (refreshing a previously auto-filled value),
               but never overwrite a human-edited abstract (tracked via the new
               abstract_source column).
- UNCERTAIN  -> store an editable suggestion (Use this / Edit / Dismiss banner).
- NONE       -> leave the field blank (manual entry, as before).

Extraction runs synchronously but is fully guarded: any failure (unreadable /
image-only / encrypted PDF) is a no-op and never breaks the upload.
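The guard can be sketched as a try/catch around the extractor that logs and returns instead of propagating. This uses `java.util.logging` as a stand-in for the project's logger; as the later review comment notes, passing the exception object rather than `e.getMessage()` keeps the stack trace (SLF4J likewise treats a trailing throwable argument as the exception). `GuardedExtraction.process` and its functional parameters are hypothetical.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a guarded extraction step that can never fail the upload. */
final class GuardedExtraction {

    private static final Logger LOG = Logger.getLogger(GuardedExtraction.class.getName());

    private GuardedExtraction() {
    }

    /**
     * Runs the extractor on the uploaded bytes and hands the result on. Any exception
     * (unreadable, image-only or encrypted PDF) is logged and swallowed.
     *
     * @return true if extraction completed, false if it was skipped after a failure
     */
    static boolean process(String thesisId, byte[] pdfBytes,
            Function<byte[], String> extractor, Consumer<String> onResult) {
        try {
            onResult.accept(extractor.apply(pdfBytes));
            return true;
        } catch (Exception e) {
            // pass the exception itself so the stack trace is logged, not just its message
            LOG.log(Level.WARNING, "Abstract extraction failed for thesis " + thesisId, e);
            return false;
        }
    }
}
```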

Server: AbstractExtractor (core/utility), AbstractAutoFillService, new
abstract_source / abstract_suggestion columns (migration 40), accept/dismiss
suggestion endpoints; anonymization clears the suggestion.
Client: AbstractSuggestionBanner in ThesisInfoSection; IThesis fields;
manual saves mark the abstract MANUAL and clear the suggestion.

Tests (TDD): AbstractExtractor (6), AbstractAutoFillService (7), and
ThesisController accept/dismiss integration (3). Full server suite + client
build/tsc/lint green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 20, 2026 15:07

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

coderabbitai Bot commented Jun 20, 2026

Warning

Review limit reached

@krusche, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes and 14 seconds. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 76a6d006-e23d-44be-afe4-03676ada66ee

📥 Commits

Reviewing files that changed from the base of the PR and between 500c60e and 78c7b4f.

📒 Files selected for processing (3)
  • client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/components/AbstractSuggestionModal/AbstractSuggestionModal.tsx
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillService.java
  • server/src/test/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillServiceTest.java

Walkthrough

Adds end-to-end PDF abstract auto-extraction: a new iText-based AbstractExtractor utility parses thesis PDFs and classifies results as CONFIDENT/UNCERTAIN/NONE. A new AbstractAutoFillService auto-fills or stages suggestions on file upload. Two new REST endpoints let students accept or dismiss suggestions. New DB columns, DTO fields, and a React modal complete the feature.

Changes

Abstract Auto-Extraction Feature

Layer / File(s) Summary
Data contracts: enum, entity, DTO, DB migration, client types
server/.../thesis/constants/ThesisAbstractSource.java, server/.../thesis/entity/Thesis.java, server/.../thesis/dto/ThesisDto.java, server/.../db/changelog/changes/40_abstract_extraction.sql, server/.../db/changelog/db.changelog-master.xml, client/src/thesis/requests/responses/thesis.ts
ThesisAbstractSource enum adds MANUAL/EXTRACTED; Thesis entity and ThesisDto record gain abstractSource and abstractSuggestion fields; Liquibase migration adds the two columns with a check constraint; IThesis client interface gains matching optional fields.
PDF abstract extractor (AbstractExtractor)
server/.../core/utility/AbstractExtractor.java
New 436-line stateless utility that scans up to MAX_PAGES, locates an English abstract heading, finds a section boundary, excludes footnote lines, reconstructs paragraphs with dehyphenation, serializes to escaped HTML, and returns a Result(Confidence, html).
AbstractAutoFillService and ThesisService upload integration
server/.../thesis/service/AbstractAutoFillService.java, server/.../thesis/service/ThesisService.java
New Spring service AbstractAutoFillService.process reads MultipartFile bytes, calls AbstractExtractor, and swallows exceptions. apply auto-fills abstractField for CONFIDENT results on blank abstracts; otherwise stages extracted HTML as abstractSuggestion. ThesisService injects the service and triggers it after proposal and THESIS-type uploads.
Accept/dismiss REST endpoints
server/.../thesis/controller/ThesisController.java, server/.../thesis/service/ThesisService.java, server/.../thesis/service/ThesisAnonymizationService.java
Two POST endpoints (/abstract-suggestion/accept, /abstract-suggestion/dismiss) enforce student-only access; acceptAbstractSuggestion promotes the suggestion into abstractField and sets source to EXTRACTED; dismissAbstractSuggestion sets source to MANUAL and clears the suggestion. Anonymization additionally nulls abstractSuggestion.
AbstractSuggestionModal component and ThesisInfoSection wiring
client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/components/AbstractSuggestionModal/AbstractSuggestionModal.tsx, client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/ThesisInfoSection.tsx
New Mantine Modal shows the extracted suggestion and optionally the current abstract in read-only editors with Keep current/Use extracted abstract buttons. ThesisInfoSection adds useThesisUpdateAction hooks for accept/dismiss, computes showSuggestion visibility, and renders the modal inside the accordion panel.
AbstractExtractor unit, benchmark, and corpus tests
server/src/test/.../core/utility/AbstractExtractorTest.java, server/src/test/.../core/utility/AbstractExtractionBenchmarkTest.java, server/src/test/.../core/utility/AbstractExtractorCorpusTest.java
AbstractExtractorTest builds one-page and multi-page iText PDFs to assert CONFIDENT/UNCERTAIN/NONE outcomes across heading variants, hyphen handling, footnote exclusion, and multi-page windows. AbstractExtractionBenchmarkTest adds a @TestFactory of synthetic layout scenarios. AbstractExtractorCorpusTest adds an optional directory-driven harness emitting a markdown extraction report.
Service, controller, and e2e tests
server/src/test/.../thesis/service/AbstractAutoFillServiceTest.java, server/src/test/.../thesis/service/ThesisServiceTest.java, server/src/test/.../thesis/controller/ThesisControllerTest.java, client/e2e/helpers.ts, client/e2e/thesis/abstract-extraction.spec.ts
AbstractAutoFillServiceTest verifies all confidence/source combinations and process with real and invalid PDFs. ThesisServiceTest adds the AbstractAutoFillService mock. ThesisControllerTest tests accept, dismiss, and 403 scenarios. Playwright e2e test uploads a generated PDF, confirms silent fill, edits manually, re-uploads to trigger the modal, accepts, and asserts the rejoined text renders correctly.
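The walkthrough says the extractor serializes its result "to escaped HTML". The sketch below assumes one `<p>` element per paragraph, the same shape as the `<p>Existing abstract.</p>` value in the controller test, and escapes the five HTML-significant characters. `AbstractHtml` is a hypothetical name, not a class from the PR.

```java
import java.util.List;

/** Hypothetical sketch: paragraphs -> escaped HTML, one <p> per non-blank paragraph. */
final class AbstractHtml {

    private AbstractHtml() {
    }

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                case '\'' -> sb.append("&#39;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    static String toHtml(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (String p : paragraphs) {
            if (!p.isBlank()) {
                sb.append("<p>").append(escape(p.strip())).append("</p>");
            }
        }
        return sb.toString();
    }
}
```

For example, `toHtml(List.of("A < B & C", "Second."))` returns `<p>A &lt; B &amp; C</p><p>Second.</p>`. Escaping on the server means PDF text containing markup-like characters can never be interpreted as HTML by the rich-text editor.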

Sequence Diagram

sequenceDiagram
  actor Student
  participant ThesisInfoSection
  participant ThesisController
  participant AbstractAutoFillService
  participant AbstractExtractor
  participant ThesisRepository

  rect rgba(70, 130, 180, 0.5)
    note over ThesisController,ThesisRepository: File upload triggers extraction
    Student->>ThesisController: POST /upload (thesis PDF)
    ThesisController->>ThesisRepository: save file
    ThesisController->>AbstractAutoFillService: process(thesis, file)
    AbstractAutoFillService->>AbstractExtractor: extract(pdfBytes)
    AbstractExtractor-->>AbstractAutoFillService: Result(CONFIDENT, html)
    AbstractAutoFillService->>ThesisRepository: save thesis (abstractSuggestion or abstractField set)
  end

  rect rgba(60, 179, 113, 0.5)
    note over Student,ThesisRepository: Student resolves suggestion
    ThesisController-->>ThesisInfoSection: ThesisDto (abstractSuggestion present)
    ThesisInfoSection->>Student: show AbstractSuggestionModal
    Student->>ThesisInfoSection: click "Use extracted abstract"
    ThesisInfoSection->>ThesisController: POST /abstract-suggestion/accept
    ThesisController->>ThesisRepository: save (abstractField = suggestion, EXTRACTED, suggestion cleared)
    ThesisController-->>ThesisInfoSection: updated ThesisDto
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • Claudia-Anthropica
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main feature: automatic population of thesis abstracts from uploaded PDF documents. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (3)
client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/components/AbstractSuggestionBanner/AbstractSuggestionBanner.tsx (1)

49-49: Use a named export instead of default export.

This component currently uses a default export, but the project guidelines prefer named exports for TS/React modules to improve consistency and maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/components/AbstractSuggestionBanner/AbstractSuggestionBanner.tsx`
at line 49, Convert the default export of AbstractSuggestionBanner to a named
export. Change the export statement from `export default
AbstractSuggestionBanner` to `export { AbstractSuggestionBanner }` or use named
export syntax directly on the component declaration. Additionally, update any
import statements throughout the codebase that import this component to use the
named import syntax (e.g., `import { AbstractSuggestionBanner }`) instead of the
default import syntax.

Source: Coding guidelines

server/src/test/java/de/tum/cit/aet/thesis/thesis/controller/ThesisControllerTest.java (1)

353-369: ⚡ Quick win

Dismiss endpoint test is missing the abstractSource invariant

Please assert that dismiss sets abstractSource to MANUAL; otherwise regressions in overwrite-protection behavior won’t be caught.

Proposed test assertion
 		Thesis updated = thesisRepository.findById(thesisId).orElseThrow();
 		assertThat(updated.getAbstractSuggestion()).isNull();
 		assertThat(updated.getAbstractField()).isEqualTo("<p>Existing abstract.</p>");
+		assertThat(updated.getAbstractSource()).isEqualTo(ThesisAbstractSource.MANUAL);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@server/src/test/java/de/tum/cit/aet/thesis/thesis/controller/ThesisControllerTest.java`
around lines 353 - 369, The dismissAbstractSuggestion_Success test method is
missing an assertion to verify that the abstractSource field is set to MANUAL
after dismissing the abstract suggestion. Add an assertion after the existing
assertions (after checking abstractField and abstractSuggestion on the updated
thesis object) to verify that updated.getAbstractSource() is set to the MANUAL
enum value, ensuring the overwrite-protection behavior is properly validated.
server/src/test/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillServiceTest.java (1)

3-5: Replace JUnit assertions with AssertJ in test class

This test uses assertEquals and assertNull from JUnit; project guidelines require AssertJ's assertThat for consistency across all test code.

Update imports and all assertions across lines 3–4, 42–45, 53–56, 64–67, 76–79, 88–91, 101–104, and 114–117.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@server/src/test/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillServiceTest.java`
around lines 3 - 5, Replace the JUnit assertion imports (assertEquals and
assertNull) with AssertJ's assertThat import from
org.assertj.core.api.Assertions. Then update all assertion calls throughout the
test class: convert each assertEquals(expected, actual) call to
assertThat(actual).isEqualTo(expected) and convert each assertNull(value) call
to assertThat(value).isNull(). This applies to all assertion statements across
the AbstractAutoFillServiceTest class to ensure consistency with project
guidelines.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/ThesisInfoSection.tsx`:
- Around line 84-87: In the onEditSuggestion function, the setAbstractText call
is assigning thesis.abstractSuggestion directly without handling that it is
optional in IThesis. Apply a string fallback using the nullish coalescing
operator by changing the setAbstractText call to use thesis.abstractSuggestion
?? '' to ensure a safe empty string value is always set when the suggestion is
undefined or null.

In
`@server/src/main/java/de/tum/cit/aet/thesis/thesis/controller/ThesisController.java`:
- Around line 296-299: The hasStudentAccess method used in the access control
checks is too permissive and grants access to supervisors and examiners as well.
Replace the hasStudentAccess check in the ThesisController endpoints (including
the abstract suggestion management endpoint around line 296-299 and the other
affected endpoint around line 318-320) with a strict student-role membership
verification that only allows users who are enrolled as students in the thesis,
explicitly excluding supervisor and examiner roles.

In `@server/src/main/java/de/tum/cit/aet/thesis/thesis/dto/ThesisDto.java`:
- Around line 116-117: The `thesis.getAbstractSuggestion()` field on line 117 is
always included in the serialization regardless of the access context, which
exposes pending extracted text outside of student-only flows. Gate this field by
checking the `studentAccess` variable/parameter: include
`thesis.getAbstractSuggestion()` only when `studentAccess` is true, otherwise
return `null` for this field in the DTO mapping.

In
`@server/src/main/java/de/tum/cit/aet/thesis/thesis/service/ThesisService.java`:
- Around line 424-429: The dismissAbstractSuggestion method clears the
abstractSuggestion field but does not clear the abstractSource field. If
abstractSource remains set to EXTRACTED, subsequent uploads can still
automatically overwrite the abstract, violating the dismiss semantics. In
addition to calling thesis.setAbstractSuggestion(null), also clear the
abstractSource field by setting it to null (or the appropriate cleared state) to
prevent future automatic overwrites of the abstract.
- Around line 407-408: Remove the `@Transactional` annotation from both the
acceptAbstractSuggestion() and dismissAbstractSuggestion() methods in the
ThesisService class. The project guideline requires relying on Spring Data's
per-repository-call transactions at the repository layer instead of managing
transactions at the service layer. Ensure both methods are designed to be
idempotent so that individual repository calls within each method can safely
manage their own transactional boundaries.

In `@server/src/main/resources/db/changelog/changes/40_abstract_extraction.sql`:
- Around line 9-13: The abstract_source column in the theses table lacks a
database-level CHECK constraint to enforce valid enum values. Add a CHECK
constraint to the ALTER TABLE statement that adds the abstract_source column to
ensure only 'MANUAL' or 'EXTRACTED' values can be inserted into this column.
This prevents invalid values from being inserted outside of JPA and causing read
failures when the application tries to deserialize the persisted enum value.

In
`@server/src/test/java/de/tum/cit/aet/thesis/core/utility/AbstractExtractorTest.java`:
- Around line 3-4: The AbstractExtractorTest class currently uses JUnit
assertions (assertEquals, assertTrue) instead of the required AssertJ
assertions. Replace the static imports from org.junit.jupiter.api.Assertions
with static imports from org.assertj.core.api.Assertions, specifically importing
assertThat. Then replace all assertEquals calls with
assertThat(...).isEqualTo(...) syntax, all assertTrue calls with
assertThat(...).isTrue() syntax, and any assertFalse calls with
assertThat(...).isFalse() syntax throughout the test class. This applies to all
assertion statements in the methods that test the extractor functionality.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 472427c9-ceb8-494c-8f9d-a02c0630a437

📥 Commits

Reviewing files that changed from the base of the PR and between 54673e4 and b5dba10.

📒 Files selected for processing (17)
  • client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/ThesisInfoSection.tsx
  • client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/components/AbstractSuggestionBanner/AbstractSuggestionBanner.tsx
  • client/src/thesis/requests/responses/thesis.ts
  • server/src/main/java/de/tum/cit/aet/thesis/core/utility/AbstractExtractor.java
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/constants/ThesisAbstractSource.java
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/controller/ThesisController.java
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/dto/ThesisDto.java
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/entity/Thesis.java
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillService.java
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/service/ThesisAnonymizationService.java
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/service/ThesisService.java
  • server/src/main/resources/db/changelog/changes/40_abstract_extraction.sql
  • server/src/main/resources/db/changelog/db.changelog-master.xml
  • server/src/test/java/de/tum/cit/aet/thesis/core/utility/AbstractExtractorTest.java
  • server/src/test/java/de/tum/cit/aet/thesis/thesis/controller/ThesisControllerTest.java
  • server/src/test/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillServiceTest.java
  • server/src/test/java/de/tum/cit/aet/thesis/thesis/service/ThesisServiceTest.java

Comment thread server/src/main/java/de/tum/cit/aet/thesis/thesis/dto/ThesisDto.java Outdated
Comment thread server/src/main/java/de/tum/cit/aet/thesis/thesis/service/ThesisService.java Outdated
Adds a Playwright test that uploads a structured thesis PDF as the student and
verifies the abstract is populated with the extracted, de-hyphenated text
(accepting the suggestion banner when the abstract is non-empty). Adds a
createAbstractTestPdfBuffer() helper that builds a valid PDF with an Abstract
heading, a line-end hyphenation, and an Introduction boundary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Edit suggestion prefills from abstractSuggestion (not abstractText)
- ThesisDto only exposes abstractSuggestion to student-access viewers
- Drop @Transactional from accept/dismiss suggestion service methods
- Dismiss marks source MANUAL so a later upload re-suggests, not overwrites
- Add CHECK constraint on abstract_source (MANUAL, EXTRACTED)
- Convert new extractor/auto-fill tests to AssertJ

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Q9bHyAtxRLqMJzFayq8sYR
Validated against a 50-PDF corpus of real proposals and theses (kept out of
the repo). Theses went from 0 usable extractions to 22/25 confident; overall
46/50 confident, the rest correctly downgraded to a suggestion or left blank.

- Scan the first 12 pages, not 5: theses place the abstract around page 6
  (title, declaration, acknowledgements, then abstract), so a 5-page window
  never saw it.
- Normalize hyphen encodings: some thesis fonts decode their hyphen glyph as
  U+FFFD, and LaTeX uses soft hyphens (U+00AD) at line breaks; both are folded
  to a plain hyphen so de-hyphenation rejoins words correctly.
- Stop at the German "Zusammenfassung"/"Kurzfassung" heading: it follows the
  English abstract at body point size, so the font-size rule alone missed it
  and the abstract over-captured into German.
- Drop footnote definition lines (those starting with a superscript marker).
- Match section-numbered abstract headings such as "1. Abstract".
- Split paragraphs on a first-line indent, not only on a vertical gap.
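The hyphen folding and the two paragraph-split signals can be sketched as follows. `Line`, the 1.8× font-size gap factor, and the 8 pt indent threshold are assumptions, and y is taken as the distance from the top of the page (PDF user space actually grows upward, so a real implementation would flip the sign). Folding every U+FFFD rather than only glyph-decoded hyphens is this sketch's simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; Line, the gap factor and the indent threshold are assumptions. */
final class ParagraphSplitter {

    /** One text line: y = distance from the page top, x = left edge, both in points. */
    record Line(String text, double y, double x, double fontSize) {
    }

    static final double GAP_FACTOR = 1.8; // assumed: a gap above 1.8x font size starts a paragraph
    static final double INDENT_PT = 8.0;  // assumed: more than 8 pt right of the margin is an indent

    private ParagraphSplitter() {
    }

    /** Folds soft hyphens (U+00AD) and replacement characters (U+FFFD) into a plain '-'. */
    static String normalizeHyphens(String s) {
        return s.replace('\u00AD', '-').replace('\uFFFD', '-');
    }

    static List<List<String>> split(List<Line> lines) {
        List<List<String>> paragraphs = new ArrayList<>();
        if (lines.isEmpty()) {
            return paragraphs;
        }
        double margin = lines.stream().mapToDouble(Line::x).min().orElse(0);
        List<String> current = new ArrayList<>();
        Line prev = null;
        for (Line line : lines) {
            boolean gap = prev != null && line.y() - prev.y() > GAP_FACTOR * line.fontSize();
            boolean indent = prev != null && line.x() - margin > INDENT_PT;
            if ((gap || indent) && !current.isEmpty()) {
                paragraphs.add(current); // close the previous paragraph
                current = new ArrayList<>();
            }
            current.add(normalizeHyphens(line.text()));
            prev = line;
        }
        paragraphs.add(current);
        return paragraphs;
    }
}
```

Each resulting paragraph's lines can then be rejoined with the line-end de-hyphenation rule before serialization.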

Adds AbstractExtractorCorpusTest, a CI-safe evaluation harness that runs the
extractor over a local corpus (ABSTRACT_CORPUS_DIR) and writes a report; it
self-skips when the env var is unset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Q9bHyAtxRLqMJzFayq8sYR
A committable, fully synthetic benchmark that reproduces every layout pattern
found in a corpus of real proposals and theses (no real student content):
abstract on page six behind thesis front matter, line-end de-hyphenation, a
same-size German "Zusammenfassung" boundary, footnote definition lines,
section-numbered headings, indented multi-paragraph splitting, over-length and
missing abstracts. Surfaced as named scenarios via a JUnit TestFactory so each
pattern reports independently. The real corpus stays local (gitignored) as a
validation oracle and is never committed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Q9bHyAtxRLqMJzFayq8sYR
A new upload now surfaces the extracted abstract in a confirmation modal
(Use extracted abstract / Keep current) directly after upload, so a student
can't overlook it, instead of an easily-missed inline banner.

- A blank abstract is still auto-filled silently when extraction is confident.
- Anything that would replace existing abstract text — or any uncertain
  extraction — is staged as a suggestion and confirmed via the modal; closing
  the modal keeps the current abstract.
- An extraction that matches the current abstract is ignored (no prompt).
- Server apply() reworked accordingly; AbstractSuggestionModal replaces
  AbstractSuggestionBanner; e2e covers auto-fill + modal confirm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Q9bHyAtxRLqMJzFayq8sYR
CodeQL flagged the regex tag-strip in AbstractSuggestionModal as incomplete
sanitization. The value is only used to decide whether to show the current
abstract preview (it is never rendered), but the modal now parses the HTML and
reads its text content instead, which is robust and not flagged. The parsed
document is detached and inert.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Q9bHyAtxRLqMJzFayq8sYR

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/main/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillService.java (1)

33-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include the exception object in warning logs.

At Line 34, logging only e.getMessage() drops stack trace context, which makes extraction failures harder to diagnose in production.

Proposed fix
-		} catch (Exception e) {
-			log.warn("Abstract extraction failed for thesis {}: {}", thesis.getId(), e.getMessage());
+		} catch (Exception e) {
+			log.warn("Abstract extraction failed for thesis {}", thesis.getId(), e);
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@server/src/main/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillService.java`
around lines 33-35: in the catch block of the exception handler within
AbstractAutoFillService.java, modify the log.warn call to include the full
exception object instead of just e.getMessage(). Replace e.getMessage() with the
exception object e itself as the last parameter to the log.warn call. This will
ensure the complete stack trace is logged, providing better diagnostic
information for debugging extraction failures in production.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/components/AbstractSuggestionModal/AbstractSuggestionModal.tsx`:
- Around lines 25-30: The Modal component's onClose prop on the
AbstractSuggestionModal is calling onDeny unconditionally, which allows users to
dismiss the modal while a suggestion action is in flight. To fix this, modify
the onClose handler to only execute onDeny when the loading state is false. This
prevents conflicting operations by blocking the close action during pending API
requests. Apply the same fix to the other Modal instance mentioned in the file
at lines 50-56.

In
`@server/src/main/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillService.java`:
- Around lines 52-60: The AbstractAutoFillService method has two early return
paths (when confidence is NONE and when extracted text is blank or identical to
current abstract) that leave stale abstractSuggestion data intact, potentially
displaying outdated modals. Before each early return statement in these no-op
branches, clear the abstractSuggestion field by setting it to null to ensure the
UI state stays consistent with the "no change, no prompt" behavior.
Specifically, add abstractSuggestion clearing logic before the first return at
the confidence NONE check and before the second return at the isBlank/sameText
validation check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4532f049-5214-4d44-84f1-8323b8281fe9

📥 Commits

Reviewing files that changed from the base of the PR and between 4d99474 and 500c60e.

📒 Files selected for processing (6)
  • client/e2e/thesis/abstract-extraction.spec.ts
  • client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/ThesisInfoSection.tsx
  • client/src/thesis/pages/ThesisPage/components/ThesisInfoSection/components/AbstractSuggestionModal/AbstractSuggestionModal.tsx
  • server/src/main/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillService.java
  • server/src/test/java/de/tum/cit/aet/thesis/core/utility/AbstractExtractionBenchmarkTest.java
  • server/src/test/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillServiceTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • client/e2e/thesis/abstract-extraction.spec.ts
  • server/src/test/java/de/tum/cit/aet/thesis/thesis/service/AbstractAutoFillServiceTest.java

- AbstractAutoFillService: clear abstractSuggestion on the no-op paths (NONE,
  or extraction equal to the current abstract) so a stale suggestion from an
  earlier upload can't keep an outdated modal alive.
- AbstractSuggestionModal: don't let a close-based deny (Escape / close button)
  fire while an accept/deny request is in flight, avoiding conflicting POSTs.
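
The stale-suggestion fix amounts to making every path through the update assign `abstractSuggestion`, so a no-op upload clears a pending suggestion instead of leaving it in place. The following is a hypothetical sketch: the `SuggestionState` class and its `apply` signature are illustrative, not the PR's entity or service. A `null` extraction stands in for confidence `NONE`, and the remaining confidence levels are reduced to a boolean.

```java
/** Hypothetical holder for the two fields the fix touches; not the PR's Thesis entity. */
public class SuggestionState {
    String abstractText;        // current abstract (plain text assumed)
    String abstractSuggestion;  // pending suggestion shown in the modal, or null

    /** @param extracted extracted abstract, or null when nothing was found (NONE) */
    void apply(String extracted, boolean confident) {
        String current = abstractText == null ? "" : abstractText.strip();
        String candidate = extracted == null ? "" : extracted.strip();

        // No-op paths (nothing extracted, or same text as now): clear any stale suggestion.
        if (candidate.isEmpty() || candidate.equals(current)) {
            abstractSuggestion = null;
            return;
        }
        // Blank abstract and confident extraction: fill silently, nothing to confirm.
        if (current.isEmpty() && confident) {
            abstractText = candidate;
            abstractSuggestion = null;
            return;
        }
        // Would replace existing text, or extraction is uncertain: stage for the modal.
        abstractSuggestion = candidate;
    }
}
```

With this shape, uploading a PDF whose abstract matches the current one after an earlier upload had staged a different suggestion leaves no suggestion behind, so the outdated modal cannot reappear.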

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Q9bHyAtxRLqMJzFayq8sYR
@krusche krusche merged commit 07e5336 into develop Jun 21, 2026
10 of 11 checks passed
@krusche krusche deleted the feature/auto-extract-abstract branch June 21, 2026 11:50