feat(api): cursor-based pagination for GET /flows/{flow_id}/runs (#322)#463
feat(api): cursor-based pagination for GET /flows/{flow_id}/runs (#322)#463GunaPalanivel wants to merge 2 commits intoNetflix:masterfrom
Conversation
Introduce RFC 5988 cursor-based pagination using run_number as the cursor field. Responses include Link, X-Total-Count, and X-Pagination-Limit headers. Legacy _limit=0 behavior is preserved for backward compatibility. - Add get_all_runs_paginated() with run_number DESC ordering - Add count_records() for efficient X-Total-Count without full fetch - Harden input validation: reject negative _limit and non-positive _after - Validate column names in count_records against self.keys - Add 5-case integration test covering multi-page iteration, legacy opt-out, invalid params, and empty flows - Make init_db gracefully fall back to direct table check when goose binary is unavailable (supports local development without Docker) Fixes Netflix#322
|
@romain-intel could you trigger CI when you get a chance? All integration tests pass locally (8 passed, 0 regressions). |
| ) | ||
| return response | ||
|
|
||
| async def count_records(self, filter_dict=None) -> int: |
There was a problem hiding this comment.
any performance concerns with the COUNT approach for providing pagination data? there is a plan to introduce filtering based on tags as well, which would require matching values in the JSONB tags column.
There was a problem hiding this comment.
In this PR X-Total-Count uses a COUNT(*) scoped to flow_id, which is acceptable for the current query shape. For future tag-based JSONB filtering, I agree we should revisit with a dedicated strategy (likely GIN index plus possibly different total-count behavior under tag filters). I documented this trade-off in code and kept it out of scope for this PR.
There was a problem hiding this comment.
Is it not necessary to add pagination to anything besides runs?
There was a problem hiding this comment.
I scoped this PR to runs because runs are the unbounded-growth endpoint and the one currently at risk for oversized responses. Steps/tasks have different bounded patterns. I kept the implementation reusable so adding pagination to other endpoints in follow-up PRs is straightforward.
| after_run_number = None | ||
|
|
||
| # --- Legacy opt-out (_limit=0): unbounded query, no headers ------- | ||
| if limit == 0: |
There was a problem hiding this comment.
how do we ensure backwards compatibility here? it seems the limit is never 0 if the client does not explicitly set it to the value.
There was a problem hiding this comment.
You are right. _limit=0 is not a true backward-compat path unless explicitly sent. I updated behavior so compatibility is preserved when _limit is omitted (legacy unbounded path). I removed _limit=0 opt-out and now require positive _limit when pagination is requested; tests were updated accordingly.
|
Nice approach. Using run_number as the cursor key is actually cleaner for ordering stability than ts_epoch since it's a monotonic PK with no tie risk. I went with ts_epoch on the GSoC fork (saikonen/metaflow-service PR #9) because it's available on all six table types (flows, steps, tasks, artifacts, metadata all have ts_epoch but not all have a sequential PK equivalent). Tradeoff is needing a compound cursor for tiebreaking which I raised in Issue #11 on the GSoC fork. Curious if you're planning to extend beyond runs to the other endpoints? |
Description
500error encountered onflows/<flow_id>/runsrequests due to size of payload #322This PR adds cursor-based pagination to
GET /flows/{flow_id}/runsusingrun_numberas the cursor.Response body shape is unchanged (still a flat array of runs).
When pagination is enabled, metadata is returned through headers:
Linkwithrel="next"(RFC 5988 style)X-Total-CountX-Pagination-LimitBackward Compatibility
Backward compatibility is preserved by default:
_limitis omitted, endpoint keeps legacy behavior and returns all runs without pagination headers._limitis provided, it must be a positive integer and pagination is applied._limit=0is now treated as invalid (400) to avoid ambiguous semantics.Also,
_afteris only valid when_limitis provided.Problem
/flows/{flow_id}/runscurrently returns all runs in one unbounded response.For large flows this can lead to:
Design
Cursor Choice
run_numberis monotonic and stable for pagination.run_number DESC(newest first)Linkis emitted only when current page is full (possible next page)Count Behavior and Trade-off
X-Total-Countis produced with a separateCOUNT(*)query scoped byflow_id.X-Total-Countis best-effort metadata, not a strict transactional guarantee under concurrent writes.Validation and Errors
Returns
400for invalid pagination inputs:_limit_limit <= 0_after_after <= 0_afterprovided without_limitTests
Integration coverage includes:
_limit-> legacy unbounded behavior, no pagination headers_limit=2-> multi-page traversal withLinknext and no duplicates400(including_limit=0)200with empty arrayScope
This PR is intentionally scoped to runs endpoint pagination only.
Runs are the endpoint with unbounded growth and immediate payload-risk; other resources can be handled in follow-ups if needed.
AI Tool Usage
AI assistance was used during implementation.
All changes were reviewed manually, validated locally, and adjusted based on reviewer feedback.