harness: Add extract/chunk observability artifacts #1585
Open
jioffe502 wants to merge 1 commit into NVIDIA:main
- wire extract/chunk dump paths through harness command and results
- add lightweight JSONL snapshot helper with optional durable mirror
- expand focused harness and batch tests for observability contract

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
Description
This adds harness-visible extract and chunk artifact dumps with the smallest viable wiring on top of current retriever behavior. It keeps LanceDB/recall semantics intact and avoids broader schema/runtime changes.
- Adds harness config and command plumbing for extract/chunk artifact paths and records them in results.json (including the ingest_errors.json path)
- Adds a lightweight JSONL snapshot helper and wires dumps at two seams only: post-extract and pre-embed
- Expands focused harness/batch tests for config validation, artifact path reporting, and snapshot stage wiring
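To make the second bullet concrete, here is a minimal sketch of what a JSONL snapshot helper with an optional durable mirror could look like. It is not the helper from this PR: the function name `write_jsonl_snapshot`, its signature, and the one-shard-per-stage file naming are all assumptions made for illustration.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Iterable, Optional


def write_jsonl_snapshot(
    records: Iterable[dict[str, Any]],
    out_dir: Path,
    stage: str,
    archive_dir: Optional[Path] = None,
) -> Path:
    """Write records as a single JSONL shard and optionally mirror it.

    Hypothetical helper: the name, arguments, and shard naming are
    assumptions, not the API added by this PR.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    shard = out_dir / f"{stage}.jsonl"
    with shard.open("w", encoding="utf-8") as fh:
        for record in records:
            # default=str stringifies values json cannot encode (paths, timestamps).
            fh.write(json.dumps(record, default=str) + "\n")
    if archive_dir is not None:
        # Durable mirror: copy the shard under the archive root, keeping the
        # name of the directory it was written to.
        mirror = archive_dir / out_dir.name
        mirror.mkdir(parents=True, exist_ok=True)
        shutil.copy2(shard, mirror / shard.name)
    return shard
```

Calling a helper like this once after extraction and once before embedding would match the "two seams only" wiring described above, without touching what is written to LanceDB.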
Observability artifacts (extracts/chunks)
This change adds optional harness observability artifacts for semantic inspection and debugging:
- JSONL snapshots of post-extract records (extract_artifacts_dir) and pre-embed chunks (chunk_manifest_dir)
- ingest_errors_filepath captured in results.json
- Optional durable mirror when observability_archive_dir is configured

Usage (harness config):

write_extract_artifacts: true
write_chunk_manifest: true
observability_archive_dir: null (or durable root path)

This is intentionally minimal wiring on top of existing behavior; no LanceDB schema migration or broader runtime changes are introduced.
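As a rough illustration of how these three settings might be validated and turned into artifact paths, consider the sketch below. The `ObservabilityConfig` class, the `resolve_artifact_paths` function, and the `extracts` / `chunks` subdirectory names are hypothetical; only the three config keys and the two `*_dir` names come from this description.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ObservabilityConfig:
    """Hypothetical container for the three harness settings listed above."""

    write_extract_artifacts: bool = False
    write_chunk_manifest: bool = False
    observability_archive_dir: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed validation rule: the two flags must be real booleans.
        for name in ("write_extract_artifacts", "write_chunk_manifest"):
            if not isinstance(getattr(self, name), bool):
                raise ValueError(f"{name} must be a boolean")


def resolve_artifact_paths(
    cfg: ObservabilityConfig, run_dir: Path
) -> dict[str, Optional[str]]:
    """Map enabled flags to directories under the run's artifact directory.

    The subdirectory names are assumptions made for this sketch.
    """
    return {
        "extract_artifacts_dir": str(run_dir / "extracts")
        if cfg.write_extract_artifacts
        else None,
        "chunk_manifest_dir": str(run_dir / "chunks")
        if cfg.write_chunk_manifest
        else None,
        "observability_archive_dir": cfg.observability_archive_dir,
    }
```

Leaving a path as `None` when its flag is off keeps the dumps opt-in, which fits the stated goal of not changing existing behavior by default.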
How to run and inspect artifacts
Artifacts are written under:
nemo_retriever/artifacts/<run_name>_<timestamp>/

Check paths in results.json:
Then inspect shards:
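The original commands for these two steps are not reproduced here. One possible way to do both from Python is sketched below, assuming that results.json exposes `extract_artifacts_dir` and `chunk_manifest_dir` as top-level keys and that shards are `*.jsonl` files inside those directories. Both are assumptions, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl_records(shard_dir: Path) -> Iterator[dict[str, Any]]:
    """Yield every record from the *.jsonl shards in a directory."""
    for shard in sorted(shard_dir.glob("*.jsonl")):
        with shard.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # skip blank lines
                    yield json.loads(line)


def summarize_run(run_dir: Path) -> dict[str, int]:
    """Count records behind each artifact path recorded in results.json.

    Assumes the paths are stored as top-level keys; adjust the lookup if
    the real results.json nests them differently.
    """
    results = json.loads((run_dir / "results.json").read_text(encoding="utf-8"))
    counts: dict[str, int] = {}
    for key in ("extract_artifacts_dir", "chunk_manifest_dir"):
        path = results.get(key)
        if path:
            counts[key] = sum(1 for _ in iter_jsonl_records(Path(path)))
    return counts
```

Pointing `summarize_run` at a run directory under nemo_retriever/artifacts/ gives a quick record count for each dump; iterating `iter_jsonl_records` directly is the way to read individual extract or chunk records.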
Checklist