feat(api): cursor-based pagination for GET /flows/{flow_id}/runs (#322) by GunaPalanivel · Pull Request #463 · Netflix/metaflow-service

GunaPalanivel · 2026-02-20T22:09:59Z

Description

Fixes #322: 500 error encountered on flows/<flow_id>/runs requests due to size of payload.

This PR adds cursor-based pagination to GET /flows/{flow_id}/runs using run_number as the cursor.

Response body shape is unchanged (still a flat array of runs).
When pagination is enabled, metadata is returned through headers:

Link with rel="next" (RFC 5988 style)
X-Total-Count
X-Pagination-Limit
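
A client can follow these headers without touching the response body. The sketch below pulls the rel="next" target out of a Link header value. It is a minimal illustration, not code from this PR: the helper name next_url and the example URL are assumptions, and the exact format of the Link value emitted by the service may differ.

```python
import re
from typing import Optional

# One link-value: <target> followed by zero or more ;param=value pairs.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, quoted or unquoted.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def next_url(link_header: Optional[str]) -> Optional[str]:
    """Return the target of the rel="next" link, or None if there is none.

    Hypothetical client-side helper; not part of metaflow-service.
    """
    if not link_header:
        return None
    for target, params in _LINK_VALUE.findall(link_header):
        rel = _REL_PARAM.search(params)
        # A rel value may hold several space-separated relation types.
        if rel and "next" in rel.group(1).lower().split():
            return target
    return None
```

A missing Link header (or one without rel="next") means the client has reached the last page.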

Backward Compatibility

Backward compatibility is preserved by default:

If _limit is omitted, the endpoint keeps its legacy behavior and returns all runs without pagination headers.
If _limit is provided, it must be a positive integer, and pagination is applied.
_limit=0 is now treated as invalid (400) to avoid ambiguous semantics.

Also, _after is only valid when _limit is provided.

Problem

/flows/{flow_id}/runs currently returns all runs in one unbounded response.
For large flows this can lead to:

API Gateway payload-size failures
high latency
unnecessary memory pressure
no way to consume results incrementally

Design

Cursor Choice

run_number is monotonic and stable for pagination.

order is run_number DESC (newest first)
next cursor is the last run_number from the current page
Link is emitted only when the current page is full, since only then can a next page exist
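
The cursor rules above can be sketched over an in-memory list of run numbers. This is an illustration only: page_of_runs and walk are hypothetical names, and the sketch assumes that _after selects runs with run_number strictly below the cursor, which is what a newest-first order with a last-item cursor implies.

```python
from typing import List, Optional, Tuple


def page_of_runs(
    run_numbers: List[int], limit: int, after: Optional[int] = None
) -> Tuple[List[int], Optional[int]]:
    """Return (page, next_cursor) for a newest-first keyset page."""
    ordered = sorted(run_numbers, reverse=True)       # run_number DESC
    if after is not None:
        ordered = [n for n in ordered if n < after]   # strictly older than the cursor
    page = ordered[:limit]
    # A next cursor is offered only when the page is full.
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor


def walk(run_numbers: List[int], limit: int) -> List[List[int]]:
    """Follow cursors until none is offered, collecting every page."""
    pages, cursor = [], None
    while True:
        page, cursor = page_of_runs(run_numbers, limit, cursor)
        pages.append(page)
        if cursor is None:
            return pages
```

One consequence of the full-page rule: when the number of runs is an exact multiple of the limit, the last full page still advertises a next page, and the client makes one extra request that comes back empty.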

Count Behavior and Trade-off

X-Total-Count is produced with a separate COUNT(*) query scoped by flow_id.

For the current query shape, this is acceptable and index-backed.
For future tag-based JSONB filters, we will need to revisit the strategy (for example, a GIN index and/or different count behavior when tag filters are active).

X-Total-Count is best-effort metadata, not a strict transactional guarantee under concurrent writes.
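
A rough sketch of how such a count query could be assembled, including the column-name check against a known key list that the commit message mentions. Everything here is an assumption for illustration: build_count_query, ALLOWED_COLUMNS, the table name, and the %s placeholder style are not taken from the PR diff.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-in for the table's known column names (self.keys in the PR).
ALLOWED_COLUMNS = ("flow_id", "run_number", "user_name")


def build_count_query(
    table: str, filter_dict: Optional[Dict[str, Any]] = None
) -> Tuple[str, List[Any]]:
    """Build a parameterized COUNT(*) query from equality filters.

    `table` must be a trusted constant; only values are passed as parameters.
    """
    filter_dict = filter_dict or {}
    unknown = [col for col in filter_dict if col not in ALLOWED_COLUMNS]
    if unknown:
        # Column names are interpolated into SQL, so reject anything unexpected.
        raise ValueError(f"unknown filter columns: {unknown}")
    where = " AND ".join(f"{col} = %s" for col in filter_dict)
    sql = f"SELECT COUNT(*) FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql, list(filter_dict.values())
```

Note that the count is scoped by flow_id only and ignores the cursor, so it reports the size of the whole result set rather than the number of remaining rows.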

Validation and Errors

Returns 400 for invalid pagination inputs:

non-integer _limit
_limit <= 0
non-integer _after
_after <= 0
_after provided without _limit
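
The rules above, together with the omitted-_limit legacy path, can be expressed as a small parser. This is a sketch under assumptions: parse_pagination and PaginationError are hypothetical names, and the real handler may structure its checks and error messages differently.

```python
import re
from typing import Mapping, Optional, Tuple

_INTEGER = re.compile(r"-?[0-9]+")


class PaginationError(ValueError):
    """Raised for invalid pagination input; a handler would map this to HTTP 400."""


def _positive_int(name: str, raw: str) -> int:
    if not _INTEGER.fullmatch(raw):
        raise PaginationError(f"{name} must be an integer")
    value = int(raw)
    if value <= 0:
        raise PaginationError(f"{name} must be a positive integer")
    return value


def parse_pagination(query: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (limit, after); (None, None) selects the legacy unbounded path."""
    raw_limit = query.get("_limit")
    raw_after = query.get("_after")
    if raw_limit is None:
        if raw_after is not None:
            raise PaginationError("_after requires _limit")
        return None, None
    limit = _positive_int("_limit", raw_limit)
    after = _positive_int("_after", raw_after) if raw_after is not None else None
    return limit, after
```

Returning (None, None) for an empty query keeps the legacy decision in one place: the caller paginates only when limit is not None.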

Tests

Integration coverage includes:

omitted _limit -> legacy unbounded behavior, no pagination headers
_limit=2 -> multi-page traversal with Link next and no duplicates
ordering remains newest-first
invalid params return 400 (including _limit=0)
non-existent flow returns 200 with empty array

Scope

This PR is intentionally scoped to runs endpoint pagination only.
Runs are the endpoint with unbounded growth and immediate payload risk; other resources can be handled in follow-ups if needed.

AI Tool Usage

AI assistance was used during implementation.
All changes were reviewed manually, validated locally, and adjusted based on reviewer feedback.

Introduce RFC 5988 cursor-based pagination using run_number as the cursor field. Responses include Link, X-Total-Count, and X-Pagination-Limit headers. Legacy _limit=0 behavior is preserved for backward compatibility.

 - Add get_all_runs_paginated() with run_number DESC ordering
 - Add count_records() for efficient X-Total-Count without full fetch
 - Harden input validation: reject negative _limit and non-positive _after
 - Validate column names in count_records against self.keys
 - Add 5-case integration test covering multi-page iteration, legacy opt-out, invalid params, and empty flows
 - Make init_db gracefully fall back to direct table check when goose binary is unavailable (supports local development without Docker)

Fixes Netflix#322

GunaPalanivel · 2026-02-24T06:41:29Z

@romain-intel could you trigger CI when you get a chance?

All integration tests pass locally (8 passed, 0 regressions).
Happy to address any review feedback.

saikonen · 2026-03-16T17:07:45Z

        )
        return response

+    async def count_records(self, filter_dict=None) -> int:


any performance concerns with the COUNT approach for providing pagination data? there is a plan to introduce filtering based on tags as well, which would require matching values in the JSONB tags column.

GunaPalanivel · 2026-03-17

In this PR, X-Total-Count uses a COUNT(*) scoped to flow_id, which is acceptable for the current query shape. For future tag-based JSONB filtering, I agree we should revisit this with a dedicated strategy (likely a GIN index, plus possibly different total-count behavior under tag filters). I documented this trade-off in code and kept it out of scope for this PR.

saikonen · 2026-03-16T17:16:02Z

Is it not necessary to add pagination to anything besides runs?

GunaPalanivel · 2026-03-17

I scoped this PR to runs because runs are the unbounded-growth endpoint and the one currently at risk for oversized responses. Steps/tasks have different bounded patterns. I kept the implementation reusable so adding pagination to other endpoints in follow-up PRs is straightforward.

saikonen · 2026-03-16T17:20:27Z

+            after_run_number = None
+
+        # --- Legacy opt-out (_limit=0): unbounded query, no headers -------
+        if limit == 0:


how do we ensure backwards compatibility here? it seems the limit is never 0 if the client does not explicitly set it to the value.

GunaPalanivel · 2026-03-17

You are right. _limit=0 is not a true backward-compat path unless explicitly sent. I updated the behavior so compatibility is preserved when _limit is omitted (legacy unbounded path). I removed the _limit=0 opt-out and now require a positive _limit when pagination is requested; tests were updated accordingly.

Aryan95614 · 2026-03-24T04:50:23Z

Nice approach. Using run_number as the cursor key is actually cleaner for ordering stability than ts_epoch since it's a monotonic PK with no tie risk. I went with ts_epoch on the GSoC fork (saikonen/metaflow-service PR #9) because it's available on all six table types (flows, steps, tasks, artifacts, metadata all have ts_epoch but not all have a sequential PK equivalent). Tradeoff is needing a compound cursor for tiebreaking which I raised in Issue #11 on the GSoC fork. Curious if you're planning to extend beyond runs to the other endpoints?
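
The tie risk mentioned above can be shown on toy data. The sketch below compares a cursor on ts_epoch alone with a compound (ts_epoch, row id) cursor. It is illustrative only: the function names and the use of a row id as the tiebreaker are assumptions, not the design of either PR. In PostgreSQL the compound filter could be written as a row-value comparison such as (ts_epoch, id) < (%s, %s) with ORDER BY ts_epoch DESC, id DESC.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (ts_epoch, row_id); row_id is a hypothetical tiebreaker


def page_ts_only(rows: List[Row], limit: int, after_ts: Optional[int] = None):
    """Cursor on ts_epoch alone: rows sharing the cursor timestamp are skipped."""
    ordered = sorted(rows, reverse=True)
    if after_ts is not None:
        ordered = [r for r in ordered if r[0] < after_ts]
    page = ordered[:limit]
    return page, (page[-1][0] if len(page) == limit else None)


def page_compound(rows: List[Row], limit: int, after: Optional[Row] = None):
    """Compound cursor: tuple comparison breaks ties on row_id."""
    ordered = sorted(rows, reverse=True)  # ts_epoch DESC, then row_id DESC
    if after is not None:
        ordered = [r for r in ordered if r < after]
    page = ordered[:limit]
    return page, (page[-1] if len(page) == limit else None)


def collect(page_fn, rows: List[Row], limit: int) -> List[Row]:
    """Follow cursors to the end and return every row that was served."""
    seen, cursor = [], None
    while True:
        page, cursor = page_fn(rows, limit, cursor)
        seen.extend(page)
        if cursor is None:
            return seen
```

With three rows sharing one timestamp and a limit of 2, the ts_epoch-only walk never serves the third tied row, while the compound walk serves all of them. A unique, monotonic key such as run_number avoids the problem without a second cursor field.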

GunaPalanivel mentioned this pull request on Feb 23, 2026, in issue #322 (500 error encountered on flows/<flow_id>/runs requests due to size of payload), which is open.

saikonen reviewed Mar 16, 2026

Fix run pagination compatibility semantics and clarify count trade-offs

650e4cf

GunaPalanivel wants to merge 2 commits into Netflix:master from GunaPalanivel:fix/issue-322-paginate-flow-runs


