Commit 5541dd6

Flesh out RAG system (#6197)
# Description of Changes

Flesh out the RAG system and connect it to the PDF Question Agent so it can respond to questions about PDFs of an extremely large size.

I'd expect lots more work will need to be done to finish off the RAG system to really be what we need, but this should be a reasonable start which will let us connect it to tools and have the ingestion mostly handled automatically. I'm leaving file deletion and proper file ID management to be done in a future PR. We also need to consider whether all tools should retrieve content exclusively via RAG, or whether it's beneficial to have tools sometimes fetch the direct content and other times fetch it from RAG.

A diagram of the expected interaction is as follows:

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant FE as Frontend<br/>(ChatPanel)
    participant J as Java<br/>(AiWorkflowService)
    participant O as Engine:<br/>OrchestratorAgent
    participant QA as Engine:<br/>PdfQuestionAgent
    participant RAG as Engine:<br/>RagService + SqliteVecStore
    participant V as VoyageAI<br/>(embeddings)
    participant L as LLM<br/>(Claude / etc.)

    U->>FE: types "Summarise this PDF"<br/>(PDF already uploaded)
    FE->>J: POST /api/v1/ai/orchestrate/stream<br/>multipart: fileInputs[], userMessage
    Note over J: ByteHashFileIdStrategy<br/>id = sha256(bytes)[:16]
    J->>O: POST /api/v1/orchestrator<br/>{ files:[{id,name}], userMessage }
    O->>L: route via fast model
    L-->>O: delegate_pdf_question
    O->>QA: PdfQuestionRequest
    loop for each file
        QA->>RAG: has_collection(file.id)
        RAG-->>QA: false
    end
    QA-->>O: NeedIngestResponse(files_to_ingest)
    O-->>J: { outcome:"need_ingest", filesToIngest:[...] }

    Note over J: onNeedIngest
    loop per file
        J->>J: PDFBox: extract page text
        J->>O: POST /api/v1/rag/documents<br/>(long-running timeout)
        O->>RAG: chunk + stage documents
        O->>V: embed_documents (batches of 256)
        V-->>O: embeddings
        O->>RAG: add_documents
        O-->>J: { chunks_indexed: N }
    end

    Note over J: retry with resumeWith=pdf_question
    J->>O: POST /api/v1/orchestrator
    Note over O: fast-path to PdfQuestionAgent
    O->>QA: PdfQuestionRequest
    Note over QA: build RagCapability<br/>pinned to file IDs
    QA->>L: run(prompt) with search_knowledge tool
    loop up to max_searches
        L->>QA: search_knowledge(query)
        QA->>V: embed_query
        V-->>QA: query vector
        QA->>RAG: search(vector, collections=[file.id])
        RAG-->>QA: top-k chunks
        QA-->>L: formatted chunks
    end
    Note over QA: once budget spent,<br/>prepare() hides the tool
    L-->>QA: PdfQuestionAnswerResponse
    QA-->>O: answer
    O-->>J: { outcome:"answer", answer, evidence }
    J-->>FE: SSE "result"
    FE->>U: assistant bubble
```
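The `need_ingest` round trip in the diagram can be sketched from the Java side roughly as follows. This is an illustration under stated assumptions, not the code in this commit: `FileRef`, `OrchestratorReply`, `Engine`, `FakeEngine` and `IngestLoopSketch` are hypothetical stand-ins, the engine is faked in memory, and retrying exactly once is a simplification.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the real engine client and response types.
record FileRef(String id, String name) {}

record OrchestratorReply(String outcome, List<FileRef> filesToIngest, String answer) {}

interface Engine {
    OrchestratorReply orchestrate(List<FileRef> files, String userMessage, String resumeWith);

    int ingest(FileRef file); // returns the number of chunks indexed
}

final class IngestLoopSketch {

    // Ask the engine; if it reports need_ingest, ingest the listed files and retry once.
    static String ask(Engine engine, List<FileRef> files, String userMessage) {
        OrchestratorReply reply = engine.orchestrate(files, userMessage, null);
        if ("need_ingest".equals(reply.outcome())) {
            for (FileRef file : reply.filesToIngest()) {
                engine.ingest(file);
            }
            // The diagram retries with resumeWith=pdf_question so the engine can fast-path.
            reply = engine.orchestrate(files, userMessage, "pdf_question");
        }
        if (!"answer".equals(reply.outcome())) {
            throw new IllegalStateException("Unexpected outcome: " + reply.outcome());
        }
        return reply.answer();
    }

    // In-memory fake engine: a file counts as ingested once its id is in the set.
    static final class FakeEngine implements Engine {
        final Set<String> collections = new HashSet<>();

        @Override
        public OrchestratorReply orchestrate(
                List<FileRef> files, String userMessage, String resumeWith) {
            List<FileRef> missing = new ArrayList<>();
            for (FileRef f : files) {
                if (!collections.contains(f.id())) {
                    missing.add(f);
                }
            }
            if (!missing.isEmpty()) {
                return new OrchestratorReply("need_ingest", missing, null);
            }
            return new OrchestratorReply("answer", List.of(), "answer for: " + userMessage);
        }

        @Override
        public int ingest(FileRef file) {
            collections.add(file.id());
            return 1;
        }
    }

    public static void main(String[] args) {
        FakeEngine engine = new FakeEngine();
        List<FileRef> files = List.of(new FileRef("abc123", "book.pdf"));
        System.out.println(ask(engine, files, "Summarise this PDF"));
    }
}
```

Keeping the retry on the Java side matches the diagram's split of responsibilities: the engine only reports which collections are missing, and Java owns text extraction and the ingest call.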
1 parent 5605062 commit 5541dd6

48 files changed

Lines changed: 1057 additions & 524 deletions

app/common/src/main/java/stirling/software/common/model/ApplicationProperties.java

Lines changed: 7 additions & 0 deletions
@@ -237,6 +237,13 @@ public static class AiEngine {
         private boolean enabled = false;
         private String url = "http://localhost:5001";
         private int timeoutSeconds = 120;
+
+        /**
+         * Longer timeout for heavy operations like RAG ingestion, which embeds the whole document
+         * and can take multiple minutes for large books. Applied per-call when the caller
+         * explicitly requests it via {@code AiEngineClient.postLongRunning}.
+         */
+        private int longRunningTimeoutSeconds = 600;
     }

     @Data
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+package stirling.software.proprietary.model.api.ai;
+
+import io.swagger.v3.oas.annotations.media.Schema;
+
+import lombok.AllArgsConstructor;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+/**
+ * A file supplied to the AI engine, identified by a stable opaque id plus a display name.
+ *
+ * <p>Values MUST match {@code AiFile} in {@code engine/src/stirling/contracts/common.py}.
+ */
+@Data
+@NoArgsConstructor
+@AllArgsConstructor
+@Schema(description = "File reference sent to the AI engine")
+public class AiFile {
+
+    @Schema(
+            description =
+                    "Opaque, stable identifier. Owned by Java; used as the RAG collection key.")
+    private String id;
+
+    @Schema(description = "Original filename, used by agents in user-facing prompts and responses.")
+    private String name;
+}
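The description's diagram notes that Java derives the file id as `sha256(bytes)[:16]` via `ByteHashFileIdStrategy`. A minimal sketch of that kind of content-hash id follows, assuming the slice means the first 16 hex characters of the digest (the source does not say whether it counts hex characters or bytes). `ByteHashIdSketch` and `idFor` are illustrative names, not the class from this commit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ByteHashIdSketch {

    // Derive a stable id from file content: SHA-256, hex-encoded, first 16 characters.
    // Assumption: "[:16]" refers to hex characters, i.e. the first 8 bytes of the digest.
    static String idFor(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 16);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        // Identical bytes always map to the same id, regardless of filename.
        System.out.println(idFor(content));
    }
}
```

Because the id depends only on the bytes, re-uploading the same PDF under a different name would map to the same RAG collection, which fits the "opaque, stable identifier" wording on `AiFile.id`.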
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+package stirling.software.proprietary.model.api.ai;
+
+import java.util.List;
+
+import lombok.AllArgsConstructor;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+/**
+ * Body for {@code POST /api/v1/rag/documents} on the AI engine. Sent by Java when the engine
+ * reports {@code need_ingest} and the requested document's extracted content must be stored before
+ * the workflow can continue.
+ */
+@Data
+@NoArgsConstructor
+@AllArgsConstructor
+public class AiRagIngestRequest {
+
+    private String documentId;
+
+    private String source;
+
+    private List<AiRagPageText> pageText;
+}
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+package stirling.software.proprietary.model.api.ai;
+
+import lombok.AllArgsConstructor;
+import lombok.Data;
+import lombok.NoArgsConstructor;
+
+/** A single page of extracted text for RAG ingest requests. */
+@Data
+@NoArgsConstructor
+@AllArgsConstructor
+public class AiRagPageText {
+
+    private int pageNumber;
+
+    private String text;
+}
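To show how the two ingest DTOs fit together, here is a rough sketch of assembling an ingest body from already-extracted page strings. Everything in it is an assumption for illustration: `PageText` and `IngestRequest` are plain-record equivalents of `AiRagPageText` and `AiRagIngestRequest` (the real classes use Lombok), page numbers are taken to be 1-based, blank pages are skipped, and `documentId`/`source` are filled with the file id and filename. The commit does not confirm any of these choices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record equivalents of AiRagPageText and AiRagIngestRequest.
record PageText(int pageNumber, String text) {}

record IngestRequest(String documentId, String source, List<PageText> pageText) {}

final class IngestRequestSketch {

    // Pair each extracted page string with a 1-based page number, skipping blank pages.
    static IngestRequest build(String documentId, String source, List<String> pages) {
        List<PageText> pageText = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            String text = pages.get(i);
            if (text != null && !text.isBlank()) {
                pageText.add(new PageText(i + 1, text));
            }
        }
        return new IngestRequest(documentId, source, pageText);
    }

    public static void main(String[] args) {
        IngestRequest request =
                build("ba7816bf8f01cfea", "book.pdf", List.of("Chapter 1", "   ", "Chapter 2"));
        // The blank second page is dropped; the third page keeps page number 3.
        System.out.println(request);
    }
}
```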

app/proprietary/src/main/java/stirling/software/proprietary/model/api/ai/AiWorkflowEditPlan.java

Lines changed: 0 additions & 35 deletions
This file was deleted.

app/proprietary/src/main/java/stirling/software/proprietary/model/api/ai/AiWorkflowFileRequest.java

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@
 @Schema(description = "Per-file content extraction request from the AI engine")
 public class AiWorkflowFileRequest {

-    @Schema(description = "Original filename of the requested file", example = "contract.pdf")
-    private String fileName;
+    @Schema(description = "The file the engine wants content extracted for")
+    private AiFile file;

     @Schema(description = "Specific 1-based page numbers to extract from this file")
     private List<Integer> pageNumbers = new ArrayList<>();

app/proprietary/src/main/java/stirling/software/proprietary/model/api/ai/AiWorkflowOutcome.java

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ public enum AiWorkflowOutcome {
     ANSWER("answer"),
     NOT_FOUND("not_found"),
     NEED_CONTENT("need_content"),
+    NEED_INGEST("need_ingest"),
     PLAN("plan"),
     NEED_CLARIFICATION("need_clarification"),
     CANNOT_DO("cannot_do"),

app/proprietary/src/main/java/stirling/software/proprietary/model/api/ai/AiWorkflowResponse.java

Lines changed: 6 additions & 7 deletions
@@ -73,6 +73,12 @@ public class AiWorkflowResponse {
     @Schema(description = "Per-file text extraction requests from the AI engine")
     private List<AiWorkflowFileRequest> files = new ArrayList<>();

+    @Schema(
+            description =
+                    "Files the AI engine requires to be ingested into RAG before it can continue"
+                            + " the workflow. Populated on need_ingest outcomes.")
+    private List<AiFile> filesToIngest = new ArrayList<>();
+
     @Schema(description = "Maximum number of pages the AI engine wants text extracted from")
     private Integer maxPages;

@@ -89,11 +95,4 @@ public class AiWorkflowResponse {
                             + " body or via the X-Stirling-Tool-Report header. May be null for tools"
                             + " that produce only a file.")
     private JsonNode report;
-
-    @Schema(
-            description =
-                    "Optional plan attached to an answer outcome. When non-null on outcome=ANSWER,"
-                            + " run the plan steps before delivering the answer; the resumed call"
-                            + " produces the real answer.")
-    private AiWorkflowEditPlan editPlan;
 }

app/proprietary/src/main/java/stirling/software/proprietary/service/AiEngineClient.java

Lines changed: 18 additions & 2 deletions
@@ -43,20 +43,36 @@ public AiEngineClient(ApplicationProperties applicationProperties) {

     public String post(String path, String jsonBody) throws IOException {
         ApplicationProperties.AiEngine config = applicationProperties.getAiEngine();
+        return postWithTimeout(path, jsonBody, Duration.ofSeconds(config.getTimeoutSeconds()));
+    }
+
+    /**
+     * POST with an explicit per-call timeout, for heavy operations (e.g. RAG ingestion of a large
+     * document) that legitimately take longer than the default timeout.
+     */
+    public String postLongRunning(String path, String jsonBody) throws IOException {
+        ApplicationProperties.AiEngine config = applicationProperties.getAiEngine();
+        return postWithTimeout(
+                path, jsonBody, Duration.ofSeconds(config.getLongRunningTimeoutSeconds()));
+    }
+
+    private String postWithTimeout(String path, String jsonBody, Duration timeout)
+            throws IOException {
+        ApplicationProperties.AiEngine config = applicationProperties.getAiEngine();
         if (!config.isEnabled()) {
             throw new ResponseStatusException(
                     HttpStatus.SERVICE_UNAVAILABLE, "AI engine is not enabled");
         }

         String url = config.getUrl().stripTrailing() + path;
-        log.debug("Proxying AI engine request to {}", url);
+        log.debug("Proxying AI engine request to {} (timeout {}s)", url, timeout.toSeconds());

         HttpRequest request =
                 HttpRequest.newBuilder()
                         .uri(URI.create(url))
                         .header("Content-Type", "application/json")
                         .header("Accept", "application/json")
-                        .timeout(Duration.ofSeconds(config.getTimeoutSeconds()))
+                        .timeout(timeout)
                         .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                         .build();
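The per-call timeout pattern in this diff can be tried in isolation with the JDK's `java.net.http.HttpRequest` builder. The sketch below only builds requests and sends nothing. `TimeoutSketch` and `buildPost` are illustrative names, and the two durations mirror the `timeoutSeconds = 120` and `longRunningTimeoutSeconds = 600` defaults shown in `ApplicationProperties`.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class TimeoutSketch {

    // Mirror the defaults shown in ApplicationProperties.AiEngine.
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(120);
    static final Duration LONG_RUNNING_TIMEOUT = Duration.ofSeconds(600);

    // Build a JSON POST whose timeout is chosen by the caller rather than fixed globally.
    static HttpRequest buildPost(String baseUrl, String path, String jsonBody, Duration timeout) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + path))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .timeout(timeout)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String base = "http://localhost:5001";
        HttpRequest normal = buildPost(base, "/api/v1/orchestrator", "{}", DEFAULT_TIMEOUT);
        HttpRequest ingest = buildPost(base, "/api/v1/rag/documents", "{}", LONG_RUNNING_TIMEOUT);
        // HttpRequest.timeout() returns the Optional<Duration> set on the builder.
        System.out.println(normal.timeout().orElseThrow().toSeconds());
        System.out.println(ingest.timeout().orElseThrow().toSeconds());
    }
}
```

Setting the timeout on the request rather than on the shared `HttpClient` is what lets ordinary calls keep the short default while ingestion opts into the longer one.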