Endpoint to get latest step instance views across runs for a given workflow instance by anjujha · Pull Request #204 · Netflix/maestro

anjujha · 2026-04-08T23:05:19Z

Pull Request type

Bugfix
Feature
Refactoring (no functional changes, no api changes)
Build related changes (Please run ./gradlew build --write-locks to refresh dependencies)
Other (please describe):

NOTE: Please remember to run ./gradlew spotlessApply to fix any format violations.

Changes in this PR

Add endpoint to get latest step instance views across all runs for a workflow instance

Adds GET /{workflowId}/instances/{workflowInstanceId}/steps which returns the most recent step attempt per step across all runs, useful for workflows restarted from failure where steps ran in different runs.

Why is this needed

This endpoint makes it easy to get a snapshot of the current state of all steps in a workflow instance, regardless of how many times it has been restarted.
Currently, to understand the final state of each step you need to know the latest run ID and call the run-specific /runs/{runId}/steps endpoint. But for restarted workflows, different steps may have completed in different runs: some steps succeed early and are skipped in subsequent runs.
The GET /{workflowId}/instances/{workflowInstanceId}/steps endpoint added in this PR solves this by querying across all runs and returning the most recent attempt per step, giving a complete and accurate view of the workflow instance's step states in a single call.
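A minimal sketch of how a client might build the request for the new endpoint, using only the JDK's java.net.http classes. The class and method names, the http://localhost:8080 base URL, and the demo-workflow ID are illustrative assumptions. The /api/v3/workflows prefix is assumed from the controller-level mapping shown in the review diff, and the workflow ID is assumed to be URL-safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of this PR.
final class LatestStepsRequestSketch {
  // Assumed prefix, taken from the controller-level mapping "/api/v3/workflows".
  private static final String BASE_PATH = "/api/v3/workflows";

  // Builds (but does not send) a GET request for the latest step views of one workflow instance.
  static HttpRequest latestStepsRequest(String baseUrl, String workflowId, long workflowInstanceId) {
    // Assumes workflowId contains only URL-safe characters; no escaping is done here.
    URI uri =
        URI.create(
            baseUrl + BASE_PATH + "/" + workflowId + "/instances/" + workflowInstanceId + "/steps");
    return HttpRequest.newBuilder(uri).header("Accept", "application/json").GET().build();
  }

  public static void main(String[] args) {
    // Host, port, and workflow ID are placeholders.
    HttpRequest request = latestStepsRequest("http://localhost:8080", "demo-workflow", 1L);
    System.out.println(request.method() + " " + request.uri());
  }
}
```

Against a running server, the request could be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The sketch stops at building the request so that it runs without a server.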

Example

Say you have a workflow with 3 steps: step-a, step-b, step-c.

Run 1 — all 3 steps ran, but step-c failed:

step	run	status
step-a	1	SUCCEEDED
step-b	1	SUCCEEDED
step-c	1	FAILED

Run 2 — restarted from failure, only step-c ran again:

step	run	status
step-c	2	SUCCEEDED

GET /instances/1/runs/2/steps (existing) — incomplete, only shows steps from run 2:

step	run	status
step-c	2	SUCCEEDED

GET /instances/1/steps (new) — complete picture, latest attempt per step across all runs:

step	run	status
step-a	1	SUCCEEDED
step-b	1	SUCCEEDED
step-c	2	SUCCEEDED
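The selection rule behind this table can be sketched in plain Java. This is an in-memory illustration only: the PR performs the selection in SQL, and the StepAttempt record, its field names, and the attempt IDs below are assumptions made for the example. Newer means a higher run ID, with the attempt ID as a tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

final class LatestStepViews {
  // Hypothetical, simplified stand-in for a step instance row.
  record StepAttempt(String stepId, long runId, long attemptId, String status) {}

  // Orders attempts from oldest to newest: by run ID first, then by attempt ID.
  private static final Comparator<StepAttempt> BY_RECENCY =
      Comparator.comparingLong(StepAttempt::runId).thenComparingLong(StepAttempt::attemptId);

  // Keeps only the newest attempt per step ID, across all runs.
  static List<StepAttempt> latestPerStep(List<StepAttempt> attempts) {
    // TreeMap is used only to make the output order deterministic (sorted by step ID).
    Map<String, StepAttempt> latest = new TreeMap<>();
    for (StepAttempt attempt : attempts) {
      latest.merge(attempt.stepId(), attempt, BinaryOperator.maxBy(BY_RECENCY));
    }
    return new ArrayList<>(latest.values());
  }

  public static void main(String[] args) {
    List<StepAttempt> allRuns =
        List.of(
            new StepAttempt("step-a", 1, 1, "SUCCEEDED"),
            new StepAttempt("step-b", 1, 1, "SUCCEEDED"),
            new StepAttempt("step-c", 1, 1, "FAILED"),
            new StepAttempt("step-c", 2, 1, "SUCCEEDED"));
    // Prints step-a and step-b from run 1, and step-c from run 2.
    for (StepAttempt attempt : latestPerStep(allRuns)) {
      System.out.println(attempt.stepId() + "\t" + attempt.runId() + "\t" + attempt.status());
    }
  }
}
```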

Testing

DAO (MaestroStepInstanceDaoTest):

testGetAllStepInstanceViews: simulates a restart-from-failure scenario in which run 1 has two steps (job1, job2) and run 2 only re-ran job1. Verifies that job1 is returned from run 2 and job2 from run 1, i.e. the most recent attempt per step is correctly selected across runs.

Controller (StepInstanceControllerTest):

testGetAllStepInstanceViews — verifies the DAO is called with correct arguments and the result is sorted by stepInstanceId.

Locally:

Also tested locally by spinning up the server, creating a workflow, triggering multiple runs, and verifying that the endpoint returns the expected results.

akashdw · 2026-04-09T05:10:09Z

+      INNER_RANK_QUERY_ALL_FIELD_WITH
+          + ", ROW_NUMBER() OVER (PARTITION BY step_id ORDER BY workflow_run_id DESC, step_attempt_id DESC) AS rank"
+          + GET_STEP_FIELD_QUERY_FROM
+          + ") SELECT * FROM inner_ranked WHERE rank=1";


If you have any benchmarks or EXPLAIN / query plan results, could you share those as well?

I have pasted the query plan below

QUERY PLAN

Subquery Scan on inner_ranked (cost=56.59..57.05 rows=1 width=1707) (actual time=1.327..1.486 rows=211 loops=1)
  Filter: (inner_ranked.rank = 1)
  -> WindowAgg (cost=56.59..56.88 rows=13 width=1707) (actual time=1.326..1.471 rows=211 loops=1)
       Run Condition: (row_number() OVER (?) <= 1)
       -> Sort (cost=56.59..56.62 rows=13 width=1699) (actual time=1.320..1.329 rows=211 loops=1)
            Sort Key: maestro_step_instance.step_id COLLATE "C", maestro_step_instance.workflow_run_id DESC, maestro_step_instance.step_attempt_id DESC
            Sort Method: quicksort Memory: 410kB
            -> Index Scan using maestro_step_instance_pkey on maestro_step_instance (cost=0.42..56.35 rows=13 width=1699) (actual time=0.028..0.254 rows=211 loops=1)
                 Index Cond: ((workflow_id = '<redacted>_large_demo'::text) AND (workflow_instance_id = 1))
Planning Time: 0.346 ms
Execution Time: 1.567 ms
(11 rows)

The query always hits the primary key index on (workflow_id, workflow_instance_id).

akashdw · 2026-04-09T05:20:09Z

    value = "/api/v3/workflows",
    produces = MediaType.APPLICATION_JSON_VALUE,
    consumes = MediaType.APPLICATION_JSON_VALUE)
+@SuppressWarnings("PMD.AvoidDuplicateLiterals")


Can you clarify why this is needed?

I added this because otherwise we would have to introduce constants for 'workflowId' and 'workflowInstanceId' on lines 98 and 99 below:
@Valid @NotNull @PathVariable("workflowId") String workflowId

Previously this file had a few such strings, but with my new endpoint it crossed the PMD threshold.

A similar pattern is used in other controllers:

maestro/maestro-server/src/main/java/com/netflix/maestro/server/controllers/WorkflowController.java

Line 74 in 65c2d8a

@SuppressWarnings("PMD.AvoidDuplicateLiterals")

maestro/maestro-server/src/main/java/com/netflix/maestro/server/controllers/WorkflowActionController.java

Line 52 in 65c2d8a

@SuppressWarnings("PMD.AvoidDuplicateLiterals")

maestro/maestro-server/src/main/java/com/netflix/maestro/server/controllers/WorkflowInstanceActionController.java

Line 41 in 65c2d8a

@SuppressWarnings("PMD.AvoidDuplicateLiterals")
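For comparison, the alternative to the suppression (one constant per path variable name) could look roughly like the sketch below. It is illustrative only: the PathVariable annotation here is a local stand-in for Spring's so the snippet compiles without Spring, and the class, method, and constant names are hypothetical rather than taken from the Maestro code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Local stand-in for Spring's @PathVariable so this sketch compiles without Spring.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface PathVariable {
  String value();
}

// Hypothetical controller shape, not the class touched by this PR.
final class PathVariableConstantsSketch {
  // Each name is written once; static final Strings initialized from literals
  // are compile-time constants, so they are legal as annotation values.
  static final String WORKFLOW_ID = "workflowId";
  static final String WORKFLOW_INSTANCE_ID = "workflowInstanceId";

  String stepsForRun(
      @PathVariable(WORKFLOW_ID) String workflowId,
      @PathVariable(WORKFLOW_INSTANCE_ID) long workflowInstanceId,
      @PathVariable("workflowRunId") long workflowRunId) {
    return workflowId + "/" + workflowInstanceId + "/" + workflowRunId;
  }

  String latestSteps(
      @PathVariable(WORKFLOW_ID) String workflowId,
      @PathVariable(WORKFLOW_INSTANCE_ID) long workflowInstanceId) {
    return workflowId + "/" + workflowInstanceId;
  }

  // Reads back the name bound to the parameter at paramIndex of the named method.
  static String boundName(String methodName, int paramIndex) {
    for (Method method : PathVariableConstantsSketch.class.getDeclaredMethods()) {
      if (method.getName().equals(methodName)) {
        return method.getParameters()[paramIndex].getAnnotation(PathVariable.class).value();
      }
    }
    throw new IllegalArgumentException("no such method: " + methodName);
  }

  public static void main(String[] args) {
    System.out.println(boundName("latestSteps", 0));
    System.out.println(boundName("latestSteps", 1));
  }
}
```

The constants remove the duplicated literals but add a level of indirection to every handler signature. The suppression keeps the annotations readable and matches what the other controllers already do.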

rdeepak2002

lgtm!

praneethy91

LGTM

anjujha marked this pull request as ready for review April 8, 2026 23:06

akashdw reviewed Apr 9, 2026

akashdw approved these changes Apr 9, 2026

rdeepak2002 approved these changes Apr 9, 2026

anjujha merged commit 0150f2a into Netflix:main Apr 9, 2026
1 check passed

praneethy91 reviewed Apr 10, 2026

anjujha merged 1 commit into Netflix:main from anjujha:anju/get-workflow-instance-steps

4 participants
