RPOPC-1347: Add UUID tracking fields to Chronicler schema #58

Merged
grdumas merged 3 commits into main from feat/RPOPC-1347-uuid-fields
Jun 29, 2026
Conversation

@grdumas grdumas commented Jun 29, 2026

Collaborator

Summary

Add three UUID fields (project_uuid, run_uuid, result_uuid) to the Metadata dataclass in Chronicler schema to support Zathras UUID-based tracking for better result correlation across different indices and systems.

Acceptance Criteria

  • Add three UUID fields to the Metadata dataclass: project_uuid, run_uuid, result_uuid
  • All UUID fields are Optional[str] for backward compatibility
  • UUIDs are included in to_dict() method output when present
  • UUIDs are excluded from calculate_content_hash() (they're identifiers, not content)
  • Verify schema serialization/deserialization with UUIDs present and absent
  • Confirm backward compatibility with documents lacking UUID fields
  • Update test_schema.py with UUID test cases

Changes

  • Added project_uuid, run_uuid, result_uuid fields to Metadata dataclass (all Optional[str])
  • Updated calculate_content_hash() to exclude UUIDs from hash calculation
  • Added comprehensive unit tests for UUID serialization and backward compatibility
  • Updated docstring to document UUID exclusion from content hash

Testing

  • Unit tests added for UUID serialization (present and absent)
  • Unit tests added for UUID exclusion from content hash
  • All 43 schema tests passing
  • Backward compatibility verified (existing tests still pass)

Technical Notes

  • UUIDs use Optional[str] type for backward compatibility
  • Existing to_dict() method already handles Optional fields correctly
  • UUIDs are excluded from content hash because they are identifiers for tracking/linking, not part of the actual test result content
  • This ensures duplicate detection is based on test data, not tracking metadata
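
The field additions described above can be sketched as follows. This is a minimal illustration, not the actual Chronicler code: the real Metadata dataclass has more fields than shown, and the `to_dict()` body here is an assumption that only mirrors the stated behavior (UUIDs appear in the output when set and are omitted when absent).

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class Metadata:
    """Hypothetical stand-in for the Chronicler Metadata dataclass."""

    document_id: str  # placeholder for the existing fields
    project_uuid: Optional[str] = None
    run_uuid: Optional[str] = None
    result_uuid: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Assumed behavior: unset optional fields are dropped, so documents
        # created without UUIDs serialize exactly as they did before.
        return {key: value for key, value in asdict(self).items() if value is not None}


legacy = Metadata(document_id="doc-1")
tracked = Metadata(document_id="doc-1", run_uuid="123e4567-e89b-12d3-a456-426614174000")

print(legacy.to_dict())   # no UUID keys
print(tracked.to_dict())  # includes run_uuid only
```

Defaulting each UUID to None means older call sites that never pass these fields keep working unchanged.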

Related

Agent VM and others added 2 commits June 29, 2026 18:56
- Add failing tests for project_uuid, run_uuid, result_uuid fields
- Implement three optional UUID fields in Metadata dataclass
- Verify serialization includes UUIDs when present
- Verify backward compatibility when UUIDs are None

Part of RPOPC-1347. Supports Zathras UUID-based tracking for better
result correlation across different indices and systems.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add failing test for UUID exclusion from content hash
- Implement UUID exclusion in calculate_content_hash()
- Update docstring to document UUID exclusion

UUIDs are identifiers for tracking/linking, not part of the actual
test result content. Excluding them ensures duplicate detection is
based on test data, not tracking metadata.

Part of RPOPC-1347.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai Bot commented Jun 29, 2026

Review Change Stack

Warning

Review limit reached

@grdumas, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 42 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer to the docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: c53d6d56-c9a6-4612-a774-2ad29afe42d7

📥 Commits

Reviewing files that changed from the base of the PR and between a823b10 and 122482a.

📒 Files selected for processing (2)
  • src/chronicler/schema.py
  • tests/test_schema.py
📝 Walkthrough

Walkthrough

Three optional UUID fields (project_uuid, run_uuid, result_uuid) are added to Metadata. ZathrasDocument.calculate_content_hash is updated to strip these fields before SHA256 computation. Tests verify serialization presence/absence and hash stability across differing UUIDs.

Changes

UUID Fields and Hash Exclusion

| Layer / File(s) | Summary |
|---|---|
| Metadata UUID fields and hash exclusion logic (src/chronicler/schema.py) | Adds project_uuid, run_uuid, result_uuid as optional fields on Metadata; updates calculate_content_hash docstring and removes these fields from doc_dict['metadata'] before JSON serialization and SHA256 hashing. |
| Tests for UUID serialization and hash stability (tests/test_schema.py) | Adds to_dict() tests asserting UUID fields serialize when set and are omitted when absent; adds calculate_content_hash test confirming differing UUID values produce identical hashes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly matches the main change: adding UUID tracking fields to the Chronicler schema. |
| Description check | ✅ Passed | The description is directly about the schema UUID fields and hash behavior changed in this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/chronicler/schema.py`:
- Line 397: The hash contract in calculate_content_hash and the related schema
logic is inconsistent because UUIDs are only removed when
exclude_processing_timestamp is true. Update the UUID exclusion path in
src/chronicler/schema.py so project_uuid, run_uuid, and result_uuid are always
omitted from the hashed content, regardless of the timestamp flag, and keep the
timestamp handling separate from UUID filtering. Verify the normalization used
by calculate_content_hash matches the documented behavior and the constants or
helpers that build the content payload.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: b4135823-2f91-4250-86f9-1aa908b5bb7e

📥 Commits

Reviewing files that changed from the base of the PR and between 1e1ae48 and a823b10.

📒 Files selected for processing (2)
  • src/chronicler/schema.py
  • tests/test_schema.py

Comment thread src/chronicler/schema.py

@grdumas grdumas left a comment

Collaborator Author

PR Review: RPOPC-1347: Add UUID tracking fields to Chronicler schema

Summary

PR adds three optional UUID fields (project_uuid, run_uuid, result_uuid) to the Metadata dataclass for Zathras integration. Implementation is correct with proper backward compatibility and content hash exclusion. No blocking issues found.

Critical Issues (MUST FIX)

None found.

Major Issues (SHOULD FIX)

None found.

Minor Issues (NICE TO HAVE)

None found.

Nitpicks (OPTIONAL)

None found.

Positive Notes

  • Proper use of Optional[str] for backward compatibility with existing documents
  • Correct exclusion of UUIDs from content hash calculation (identifiers vs content)
  • Comprehensive test coverage including serialization, backward compatibility, and hash exclusion
  • Clear documentation in docstring explaining why UUIDs are excluded from hash
  • Test assertions verify both presence and absence of UUID fields
  • Consistent with existing field patterns (pcp_archive_url follows same Optional approach)

Overall Assessment

  • Status: APPROVE
  • Reasoning: Clean implementation with appropriate type safety, backward compatibility, and test coverage. UUIDs are correctly treated as opaque identifiers rather than validated content, which is appropriate for external tracking metadata from Zathras.
  • Next Steps: Ready to merge

Reviewed by: Claude Sonnet 4.5 via automated code review

@grdumas grdumas self-assigned this Jun 29, 2026
@grdumas grdumas added the enhancement New feature or request label Jun 29, 2026

Restructured calculate_content_hash() to always exclude UUIDs,
document_id, and pcp_archive_url regardless of the
exclude_processing_timestamp parameter. This ensures the
implementation matches the documented contract.

Previously, UUIDs were only excluded when exclude_processing_timestamp
was True, which violated the docstring promise that "UUIDs are always
excluded" from the hash.

- Moved UUID/document_id/pcp_archive_url removal outside conditional
- Kept timestamp removal conditional as intended
- Added test verifying UUIDs excluded with timestamp flag False

Addresses review feedback on PR #58.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

grdumas commented Jun 29, 2026

Collaborator Author

PR Update: Addressed Review Feedback

What was done

  1. Fixed UUID exclusion logic (commit 122482a)
    • Restructured calculate_content_hash() to separate timestamp handling from UUID/document_id/pcp_archive_url exclusion
    • UUIDs, document_id, and pcp_archive_url are now always excluded from hash calculation, regardless of the exclude_processing_timestamp parameter
    • Previously these fields were only excluded when exclude_processing_timestamp=True, which violated the documented contract

Why this approach

The original implementation had an inconsistency: the docstring promised "UUIDs are always excluded" from the hash, but the code only excluded them conditionally. This created a contract violation where calling calculate_content_hash(exclude_processing_timestamp=False) would include UUIDs in the hash.

The fix separates two distinct concerns:

  • Timestamp exclusion: Conditional (controlled by the exclude_processing_timestamp flag)
  • Identifier exclusion: Unconditional (document_id, pcp_archive_url, and UUIDs are always omitted)

This makes the code match its documentation and provides more predictable behavior - identifiers never affect the content hash, while timestamps can optionally be included.
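
The separation of the two concerns might look roughly like the sketch below. It is written as a free function over a plain dict rather than the real ZathrasDocument method, and several details are assumptions: the name `content_hash`, the `processing_timestamp` key, and the placement of document_id and pcp_archive_url inside the metadata section are not confirmed by this PR.

```python
import copy
import hashlib
import json
from typing import Any, Dict

# Identifier fields named in this PR; always stripped before hashing.
IDENTIFIER_FIELDS = (
    "document_id",
    "pcp_archive_url",
    "project_uuid",
    "run_uuid",
    "result_uuid",
)


def content_hash(doc_dict: Dict[str, Any], exclude_processing_timestamp: bool = True) -> str:
    """Hash a document dict, ignoring identifiers (hypothetical helper)."""
    payload = copy.deepcopy(doc_dict)  # never mutate the caller's document
    metadata = payload.get("metadata", {})

    # Unconditional: identifiers never affect the content hash.
    for name in IDENTIFIER_FIELDS:
        metadata.pop(name, None)

    # Conditional: the timestamp is dropped only when the flag is set.
    # "processing_timestamp" is an assumed key name.
    if exclude_processing_timestamp:
        metadata.pop("processing_timestamp", None)

    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


doc_a = {
    "metadata": {"run_uuid": "a", "processing_timestamp": "2026-06-29T18:56:00Z"},
    "results": {"score": 1.0},
}
doc_b = {
    "metadata": {"run_uuid": "b", "processing_timestamp": "2026-06-29T18:56:00Z"},
    "results": {"score": 1.0},
}
doc_c = {
    "metadata": {"run_uuid": "a", "processing_timestamp": "2026-06-30T09:00:00Z"},
    "results": {"score": 1.0},
}

# Differing UUIDs never change the hash, even with the timestamp included.
assert content_hash(doc_a, exclude_processing_timestamp=False) == content_hash(
    doc_b, exclude_processing_timestamp=False
)
```

Serializing with `sort_keys=True` keeps the hash independent of key order, which matters once fields are removed from a copied dict.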

Verification

Added test test_calculate_content_hash_excludes_uuids_regardless_of_timestamp_flag to verify UUIDs are excluded even when exclude_processing_timestamp=False:

```python
# Same content, different UUIDs, timestamp flag False - UUIDs still excluded
hash1 = doc1.calculate_content_hash(exclude_processing_timestamp=False)
hash2 = doc2.calculate_content_hash(exclude_processing_timestamp=False)
assert hash1 == hash2  # ✅ Passes
```

All 44 schema tests pass.

The PR is now ready for re-review.


Responded by: Claude Sonnet 4.5 via automated workflow

@grdumas grdumas left a comment

Collaborator Author

PR Review: RPOPC-1347: Add UUID tracking fields to Chronicler schema (Updated)

Summary

Updated PR adds three optional UUID fields (project_uuid, run_uuid, result_uuid) to the Metadata dataclass for Zathras integration. Implementation includes a correctness fix for calculate_content_hash() that makes document_id and pcp_archive_url exclusion independent of the timestamp flag, matching the documented behavior. No blocking issues found.

Critical Issues (MUST FIX)

None found.

Major Issues (SHOULD FIX)

None found.

Minor Issues (NICE TO HAVE)

  • File: src/chronicler/schema.py:412-425
    Consideration: The refactored hash calculation logic is a behavior change for callers using exclude_processing_timestamp=False. Previously, document_id and pcp_archive_url were only excluded when the flag was True. Now they're always excluded.
    Impact: This is actually a bug fix (implementation now matches docstring), but it's a breaking change for the edge case where someone passes False. The only known caller (base_processor.py:143) uses the default True, so production impact is minimal.
    Suggestion: Consider documenting this behavior change in the PR description or commit message for future reference. The existing commit message for 122482a already does this well.
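
To make the scope of this behavior change concrete, the sketch below compares a reconstruction of the described pre-fix logic with the post-fix logic. Both branches are inferred from the descriptions in this thread, not copied from schema.py; `sketch_hash`, its `legacy` switch, and the `processing_timestamp` key are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict

IDENTIFIERS = ("document_id", "pcp_archive_url", "project_uuid", "run_uuid", "result_uuid")


def sketch_hash(
    doc: Dict[str, Any],
    exclude_processing_timestamp: bool = True,
    legacy: bool = False,
) -> str:
    """Hash a document dict using either the pre-fix or post-fix rules (sketch)."""
    metadata = dict(doc.get("metadata", {}))  # shallow copy, the input is untouched

    # Pre-fix (legacy=True): identifiers were removed only when the flag was set.
    # Post-fix (legacy=False): identifiers are always removed.
    if exclude_processing_timestamp or not legacy:
        for name in IDENTIFIERS:
            metadata.pop(name, None)

    # Timestamp removal is conditional in both versions (assumed key name).
    if exclude_processing_timestamp:
        metadata.pop("processing_timestamp", None)

    payload = {**doc, "metadata": metadata}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


doc = {
    "metadata": {"document_id": "doc-1", "processing_timestamp": "2026-06-29T18:56:00Z"},
    "results": {"score": 1.0},
}

# Default flag (True): old and new logic agree, so hashes from default callers are unaffected.
assert sketch_hash(doc, True, legacy=True) == sketch_hash(doc, True, legacy=False)
# Flag False: the hashes diverge, which is the edge case noted above.
assert sketch_hash(doc, False, legacy=True) != sketch_hash(doc, False, legacy=False)
```

Under these assumptions, only hashes computed with exclude_processing_timestamp=False on documents that carry identifier fields would differ before and after commit 122482a.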

Nitpicks (OPTIONAL)

None found.

Positive Notes

  • Excellent test coverage including the edge case with exclude_processing_timestamp=False
  • Correct identification and fix of the document_id/pcp_archive_url exclusion bug
  • Clear commit message in 122482a explaining the rationale for the refactoring
  • Comments in the code clearly explain the "always remove" vs "conditional remove" distinction
  • Test name "test_calculate_content_hash_excludes_uuids_regardless_of_timestamp_flag" is descriptive
  • Implementation now correctly matches the documented contract in the docstring

Overall Assessment

  • Status: APPROVE
  • Reasoning: Clean implementation with appropriate bug fixes. The hash calculation refactoring corrects a long-standing bug where the implementation didn't match the documented behavior. UUID fields are properly added with backward compatibility, comprehensive test coverage, and correct hash exclusion logic. The behavioral change for exclude_processing_timestamp=False is actually a correctness improvement.
  • Next Steps: Ready to merge

Reviewed by: Claude Sonnet 4.5 via automated code review

@grdumas grdumas merged commit e2b25ca into main Jun 29, 2026
2 checks passed

Labels

enhancement New feature or request

1 participant