Skip to content

refactor: replace pipeline trace with separate inputs and outputs#1392

Open
jakewheeler wants to merge 12 commits into
mainfrom
jw/1168
Open

refactor: replace pipeline trace with separate inputs and outputs#1392
jakewheeler wants to merge 12 commits into
mainfrom
jw/1168

Conversation

@jakewheeler

@jakewheeler jakewheeler commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

🔀 PULL REQUEST

Important

I reviewed the augmentation piece of refining and decided it's fine as it is. I focused on refactoring the input/outputs of the pipeline instead.

💡 Summary

This PR removes the concept of RefinementTrace, which was an object that was partially built up to be used as an input, built up the rest of the way by the refinement process, and then provided back as an output. This can be somewhat confusing at times since it's difficult to tell if a value has been "collected" yet or not.

In place of RefinementTrace there is now a dedicated input (RefinementContext) and dedicated outputs (RefinementDocuments, RefinementMetrics, and RefinementReport).

With these changes, it should now be very clear what entry point function refine_for_condition requires as an input and what information is guaranteed to you by its output.

⛑️ What does this change?

This is primarily just a refactor but does remove a few properties from the previous trace object that didn't provide value.

Please take a look at the updated snapshots to see which properties were removed and ensure you agree with my decisions here. Anything I removed was due to us already being able to acquire this info in another way.

🔗 Related Issue

Fixes #1168

🧪 How to test

  • Refining continues to work as it did before
    • This includes both from Lambda and within the app
  • Automated tests continue to pass

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

🔒 Security Scan Results

⚠️ Found 15 vulnerabilities

Severity Total
🟠 High 12
🟡 Medium 3

📦 refiner-app

No vulnerabilities found

📦 refiner-lambda

Severity Count
🟠 High 4

📦 refiner-ops

Severity Count
🟠 High 8
🟡 Medium 3

View detailed results: Security tab
Last updated: 2026-06-22 18:59:50 UTC

# NOTE:
# STAGE 2: REFINEMENT EXECUTION
# =============================================================================
class RefinementException(Exception):

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better place to put this? It is not used anywhere else.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only other thought is to put it in app/core/exceptions.py (which we should probably make a decision to either use more consistently or remove entirely)

# begins
# * callers (testing.py, lambda_function.py) set this when
# they resolve the RSG -> grouper mapping before calling
if trace.canonical_url is None:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context requires canonical_url so this is no longer possible.

@jakewheeler jakewheeler changed the title WIP: refactor pipeline refactor: replace pipeline trace with separate inputs and outputs Jun 22, 2026
@jakewheeler jakewheeler marked this pull request as ready for review June 22, 2026 19:03
@fzhao99 fzhao99 self-assigned this Jun 22, 2026
The metrics calculated during the refinement process.
"""

eicr: RefinementMetricsEicr

@fzhao99 fzhao99 Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you say a little more about the value of bundling the eICR metrics into a wrapper like this is over just having a flat object?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other two objects make a lot of sense to me though: think we've broken the trace object up in a pretty intuitive way

)


def log_refinement_summary(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth having this function still to just call it once in the lines above it? Seems like it's more common to have the orchestration code just call logger.info directly in other places

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TASK] Refactor augmentation context class into a separate module

2 participants