Skip to content

[PERF] Optimize /workflowDefs page: paginate and search latest-versions API #781

@akhilpathivada

Description

@akhilpathivada

Motivation

The UI /workflowDefs page currently calls GET /metadata/workflow which returns all versions of all workflows. This causes slow response times, high memory usage, and longer page load times. The UI only needs the latest version of each workflow.

The GET /metadata/workflow/latest-versions API already exists and returns only the latest versions, but it's also unpaginated. We propose:

  1. Redirect UI to use /latest-versions instead of /workflow
  2. Add pagination support to /latest-versions to handle large-scale deployments efficiently
  3. Add server-side field-level search to support the existing search bar with database-backed filtering

Proposed Solution

Modify the existing /metadata/workflow/latest-versions endpoint to support pagination and field-level search, and redirect the UI to use it.

Since /metadata/workflow/latest-versions is currently unused (only appears in tests), we can safely enhance it without breaking changes.

Key Components

1. Optional Interface (PaginatedMetadataDAO)

  • Define in core module as opt-in capability
  • Methods:
    • searchWorkflowDefsLatestVersions(int start, int size)
    • searchWorkflowDefsLatestVersions(int start, int size, String filterField, String filterValue)
  • Returns: SearchResult<WorkflowDef> with total count and paginated/filtered results

2. Service Layer with Fallback

if (metadataDAO instanceof PaginatedMetadataDAO) {
    // Use database-level pagination and filtering
} else {
    // Fallback: in-memory pagination (no server-side filtering)
    // UI handles client-side filtering for non-Postgres backends
}

** 3. Input Validation**

ALLOWED_FILTER_FIELDS whitelist in MetadataServiceImpl prevents arbitrary field access: name, description, ownerEmail, version, schemaVersion, timeoutPolicy, timeoutSeconds, restartable, workflowStatusListenerEnabled

4. Enhanced REST Endpoint

GET /metadata/workflow/latest-versions?start={start}&size={size}&filterField={field}&filterValue={value}

Parameter Required Default Description
start No 0 Starting index (0-based)
size No 100 Results per page
filterField No Workflow definition field to filter on
filterValue No Case-insensitive substring to match

Response:

{
  "totalHits": 5000,
  "results": [
    { "name": "workflow1", "version": 3, ... },
    { "name": "workflow2", "version": 2, ... }
  ]
}

Changes from current:

  • Return type: List → SearchResult
  • Adds pagination parameters
  • Adds field-level search parameters
  • Returns total count for pagination controls

5. Implementation Plan

  • Phase 1: PostgreSQL (native SQL LIMIT/OFFSET, ILIKE for search)
  • Phase 2: Other backends opt-in as needed (same pattern) - can take this later

6. UI Migration

  • Change useWorkflowDefs() to call /metadata/workflow/latest-versions with pagination and filter params
  • Re-enable search bar for server-side filtered search with 300ms debounce
  • Disable client-side sorting when paginating server-side (avoids misleading single-page sorts)
  • Retain client-side filtering as fallback for older backends

7. Backward Compatibility

  • All existing endpoints unchanged
  • No breaking changes — when no parameters are provided, returns all latest versions wrapped in SearchResult
  • Old UI + new backend → works (UI ignores pagination)
  • New UI + old backend → works (UI falls back to client-side filtering)
  • Non-Postgres backends → unaffected (in-memory pagination, no server-side search)

Why This Approach?

Why modify /latest-versions instead of creating new endpoint?

  • Endpoint exists but is unused (only in tests)
  • Cleaner than having multiple similar endpoints
  • Clear intent - "latest versions" with pagination and search

Why optional interface pattern?

  • Zero impact on other DAO implementations
  • Graceful degradation with in-memory fallback
  • Backends opt-in based on capabilities

Why SearchResult<WorkflowDef> instead of List<WorkflowDef>?

  • Need to return total count for pagination controls (e.g., "Page 1 of 50")
  • Follows existing pattern: /executions page uses the same approach
  • /workflow/search returns SearchResult<Workflow> with server-side pagination
  • Consistent UX across the application

Why filterField/filterValue instead of a single query param?

  • Matches the existing search bar UI pattern (field selector + substring input)
  • Avoids collision with existing /workflow/search API conventions (query for structured search, freeText for unstructured)
  • Allows ALLOWED_FILTER_FIELDS whitelist validation per-field

Why json_data::jsonb cast?

  • The meta_workflow_def table stores only name, version, latest_version, created_on, modified_on as real columns
  • All other fields (description, ownerEmail, timeoutPolicy, etc.) live inside a text-typed json_data column
  • The ->> operator requires jsonb, so an explicit ::jsonb cast is needed

Why not IndexDAO?
At this point, Metadata lives in persistence layer, not index layer. Direct pagination at persistence is simpler and more efficient.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions