refactor: replace pipeline trace with separate inputs and outputs #1392
jakewheeler wants to merge 12 commits
🔒 Security Scan Results
| Severity | Total |
|---|---|
| 🟠 High | 12 |
| 🟡 Medium | 3 |
📦 refiner-app
✅ No vulnerabilities found
📦 refiner-lambda
| Severity | Count |
|---|---|
| 🟠 High | 4 |
📦 refiner-ops
| Severity | Count |
|---|---|
| 🟠 High | 8 |
| 🟡 Medium | 3 |
View detailed results: Security tab
Last updated: 2026-06-22 18:59:50 UTC
# NOTE:
# STAGE 2: REFINEMENT EXECUTION
# =============================================================================
class RefinementException(Exception):
Is there a better place to put this? It is not used anywhere else.
only other thought is to put it in app/core/exceptions.py (which we should probably make a decision to either use more consistently or remove entirely)
# begins
# * callers (testing.py, lambda_function.py) set this when
# they resolve the RSG -> grouper mapping before calling
if trace.canonical_url is None:
context requires canonical_url so this is no longer possible.
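To illustrate why the check becomes unreachable, here is a minimal sketch, assuming the new context is something like a dataclass with `canonical_url` as a required field. Only the name `canonical_url` comes from this thread; `TraceSketch`, `ContextSketch`, and both helper functions are hypothetical stand-ins, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceSketch:
    """Hypothetical stand-in for the old trace: the field starts out unset."""

    canonical_url: Optional[str] = None


@dataclass(frozen=True)
class ContextSketch:
    """Hypothetical stand-in for the new context: the field has no default."""

    canonical_url: str


def url_from_trace(trace: TraceSketch) -> str:
    # The old shape needs a runtime guard because the value may not be set yet.
    if trace.canonical_url is None:
        raise ValueError("canonical_url was never set on the trace")
    return trace.canonical_url


def url_from_context(context: ContextSketch) -> str:
    # No guard here: omitting the field raises TypeError at construction,
    # and a type checker flags an explicit None.
    return context.canonical_url
```

With the second shape, a missing value fails where the object is built rather than somewhere inside the pipeline.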
The metrics calculated during the refinement process.
"""

eicr: RefinementMetricsEicr
could you say a little more about the value of bundling the eICR metrics into a wrapper like this over just having a flat object?
the other two objects make a lot of sense to me though: I think we've broken the trace object up in a pretty intuitive way
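For reference, the two shapes being compared might look roughly like this. It is only a sketch: the `eicr: RefinementMetricsEicr` attribute comes from the diff above, while every field name inside the classes and the `Nested`/`Flat` class names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefinementMetricsEicr:
    # Hypothetical placeholder fields; the real ones are not shown in this thread.
    original_size_bytes: int
    refined_size_bytes: int


@dataclass(frozen=True)
class RefinementMetricsNested:
    """Wrapper shape: eICR metrics live under their own attribute."""

    eicr: RefinementMetricsEicr


@dataclass(frozen=True)
class RefinementMetricsFlat:
    """Flat shape: the same values sit directly on the metrics object."""

    eicr_original_size_bytes: int
    eicr_refined_size_bytes: int


nested = RefinementMetricsNested(eicr=RefinementMetricsEicr(1000, 400))
flat = RefinementMetricsFlat(1000, 400)

# Both shapes carry the same numbers; only the access path differs.
assert nested.eicr.refined_size_bytes == flat.eicr_refined_size_bytes
```

The wrapper leaves room for a sibling group of metrics later without renaming fields, while the flat object is shorter to read when there is only ever one group.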
)

def log_refinement_summary(
is it still worth having this function if it's only called once, in the lines above it? Seems like it's more common to have the orchestration code just call logger.info directly in other places
🔀 PULL REQUEST
Important
I reviewed the augmentation piece of refining and decided it's fine as it is. I focused on refactoring the input/outputs of the pipeline instead.
💡 Summary
This PR removes the concept of `RefinementTrace`, which was an object that was partially built up to be used as an input, built up the rest of the way by the refinement process, and then provided back as an output. This can be somewhat confusing at times since it's difficult to tell if a value has been "collected" yet or not.
In place of `RefinementTrace` there is now a dedicated input (`RefinementContext`) and dedicated outputs (`RefinementDocuments`, `RefinementMetrics`, and `RefinementReport`).
With these changes, it should now be very clear what entry point function `refine_for_condition` requires as an input and what information is guaranteed to you by its output.
⛑️ What does this change?
This is primarily just a refactor but does remove a few properties from the previous `trace` object that didn't provide value.
Please take a look at the updated snapshots to see which properties were removed and ensure you agree with my decisions here. Anything I removed was due to us already being able to acquire this info in another way.
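As a rough illustration of the input/output split described in the summary, the sketch below shows one way the pieces could fit together. The class names and `refine_for_condition` come from this PR, and `canonical_url` and `eicr` appear in the review threads; every other field, the tuple return shape, and the whitespace-stripping "refinement" are assumptions made only so the example runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefinementContext:
    """Dedicated input: everything is known before refinement starts."""

    canonical_url: str
    eicr_xml: str  # hypothetical field


@dataclass(frozen=True)
class RefinementDocuments:
    refined_eicr: str  # hypothetical field


@dataclass(frozen=True)
class RefinementMetricsEicr:
    original_length: int  # hypothetical field
    refined_length: int  # hypothetical field


@dataclass(frozen=True)
class RefinementMetrics:
    eicr: RefinementMetricsEicr


@dataclass(frozen=True)
class RefinementReport:
    canonical_url: str  # hypothetical field


def refine_for_condition(
    context: RefinementContext,
) -> tuple[RefinementDocuments, RefinementMetrics, RefinementReport]:
    # Placeholder transformation standing in for the real refinement logic.
    refined = context.eicr_xml.strip()
    documents = RefinementDocuments(refined_eicr=refined)
    metrics = RefinementMetrics(
        eicr=RefinementMetricsEicr(
            original_length=len(context.eicr_xml),
            refined_length=len(refined),
        )
    )
    report = RefinementReport(canonical_url=context.canonical_url)
    # Every output is fully populated on return; nothing is filled in later.
    return documents, metrics, report


context = RefinementContext(
    canonical_url="https://example.org/condition", eicr_xml="  <x/>  "
)
documents, metrics, report = refine_for_condition(context)
```

Because the input and the outputs are separate immutable objects, a caller never has to ask whether a given value has been "collected" yet.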
🔗 Related Issue
Fixes #1168
🧪 How to test