Feature: Resumable Workflows with Task Output Persistence #31

@nehmetohmedb

Problem Statement

When multi-step AI workflows fail partway through execution, users must restart from the beginning. This results in:

  • Loss of completed work and progress
  • Unnecessary costs from re-running successful tasks
  • Wasted time repeating already-completed operations
  • Frustration when long-running workflows fail near completion

User Stories

As a user, I want to:

  • Resume failed workflows from the last successful checkpoint
  • See which tasks completed successfully before failure
  • Review outputs from completed tasks
  • Choose whether to resume, re-run specific tasks, or start fresh
  • Understand what external data has changed since the failure
  • Make informed decisions about resume safety

Functional Requirements

Core Resume Capability

  • System preserves outputs from each completed task
  • Failed executions can be resumed from any completed checkpoint
  • Previous task outputs are available as context for remaining tasks
  • Parent-child execution relationships are maintained for audit trail
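
A minimal sketch of how the core resume behaviour could fit together, assuming an in-memory store. Every name here (`CheckpointStore`, `Execution`, `run_workflow`, `parent_id`) is hypothetical and not part of CrewAI or the existing ExecutionTrace system. Tasks are modelled as plain functions that receive the outputs of earlier tasks as context, and the sketch only resumes from the last completed task rather than from an arbitrary checkpoint.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical task signature: receives earlier outputs, returns its own output.
TaskFn = Callable[[Dict[str, str]], str]


@dataclass
class Execution:
    id: str
    parent_id: Optional[str] = None  # links a resumed run to the original
    outputs: Dict[str, str] = field(default_factory=dict)
    status: str = "running"


class CheckpointStore:
    """In-memory stand-in for a persistent store of task outputs."""

    def __init__(self) -> None:
        self.executions: Dict[str, Execution] = {}

    def create(self, parent_id: Optional[str] = None) -> Execution:
        execution = Execution(id=uuid.uuid4().hex, parent_id=parent_id)
        if parent_id is not None:
            # Carry over the outputs of tasks the parent run already completed.
            execution.outputs = dict(self.executions[parent_id].outputs)
        self.executions[execution.id] = execution
        return execution


def run_workflow(
    store: CheckpointStore,
    tasks: List[Tuple[str, TaskFn]],
    resume_from: Optional[str] = None,
) -> Execution:
    execution = store.create(parent_id=resume_from)
    for name, task in tasks:
        if name in execution.outputs:
            continue  # already completed in the parent run, so skip it
        try:
            # Previous outputs are passed in as context for the remaining tasks.
            execution.outputs[name] = task(dict(execution.outputs))
        except Exception:
            execution.status = "failed"
            return execution
    execution.status = "completed"
    return execution
```

Calling `run_workflow(store, tasks, resume_from=failed.id)` creates a child execution that reuses the parent's outputs and runs only the tasks that had not completed.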

Data Consistency Protection

  • System detects changes in external dependencies (vector searches, databases, APIs)
  • Risk assessment shows impact of changes on workflow consistency
  • Multiple resume strategies offered based on risk level
  • Clear visual indicators of resume safety (green/yellow/orange/red)
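
One possible way to detect changes is to record a fingerprint of each external dependency's result when a task completes and compare it at resume time. The sketch below assumes snapshots are JSON-serialisable. The function names and the thresholds that map to the four colors are illustrative assumptions, not values specified in this issue.

```python
import hashlib
import json
from typing import Any, Dict, List


def fingerprint(snapshot: Any) -> str:
    """Stable hash of a JSON-serialisable dependency snapshot."""
    payload = json.dumps(snapshot, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(recorded: Dict[str, str], current: Dict[str, Any]) -> List[str]:
    """Names of dependencies that are missing now or whose fingerprint differs."""
    return sorted(
        name
        for name, old in recorded.items()
        if name not in current or fingerprint(current[name]) != old
    )


def risk_level(changed: List[str], total: int) -> str:
    """Map the share of changed dependencies to a color (thresholds are assumed)."""
    if not changed:
        return "green"
    ratio = len(changed) / total
    if ratio <= 0.25:
        return "yellow"
    if ratio <= 0.5:
        return "orange"
    return "red"
```

A real risk assessment would probably weight dependencies by how many remaining tasks consume them. The simple ratio here only illustrates the mapping from detected changes to a color.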

User Controls

  • "Resume" button on failed/stopped executions
  • Resume analysis dialog showing:
    • Completed vs remaining tasks
    • Detected changes in external systems
    • Risk assessment and recommendations
    • Available resume strategies
  • Option to re-run only affected tasks
  • Force resume option for advanced users
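
The "re-run only affected tasks" option could be computed as sketched below, assuming each task declares which external systems it reads and which earlier tasks it consumes. The function name `tasks_to_rerun` and the `uses`/`upstream` mappings are hypothetical. A completed task is selected if it touches a changed dependency or depends on a task that is itself selected.

```python
from typing import Dict, List, Set


def tasks_to_rerun(
    order: List[str],
    uses: Dict[str, Set[str]],
    upstream: Dict[str, Set[str]],
    completed: Set[str],
    changed: Set[str],
) -> List[str]:
    """Completed tasks whose outputs may be stale, in execution order.

    `order` is assumed to be a valid topological order of the workflow.
    """
    rerun: Set[str] = set()
    for task in order:
        if task not in completed:
            continue  # not yet run; it will execute on resume anyway
        touches_changed = bool(uses.get(task, set()) & changed)
        depends_on_rerun = bool(upstream.get(task, set()) & rerun)
        if touches_changed or depends_on_rerun:
            rerun.add(task)
    return [task for task in order if task in rerun]
```

An empty result corresponds to a plain resume, and a result that covers every completed task is effectively a fresh start.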

Acceptance Criteria

  • Task outputs are automatically preserved during execution
  • Failed executions display a "Resume" action
  • Resume analysis correctly identifies data consistency risks
  • Users can view completed task outputs before resuming
  • System provides clear risk assessment with color coding
  • Multiple resume strategies are available based on risk
  • Resumed workflows skip already-completed tasks
  • Previous outputs are injected as context for remaining tasks
  • Audit trail links resumed execution to original
  • Old checkpoint data is cleaned up after configurable retention period
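
The retention criterion could be checked with a small helper like the one below. This is a sketch under the assumption that each checkpoint stores a timezone-aware creation timestamp; `expired_checkpoints` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expired_checkpoints(
    created_at: Dict[str, datetime],
    retention: timedelta,
    now: datetime,
) -> List[str]:
    """IDs of checkpoints older than the configured retention period."""
    cutoff = now - retention
    return sorted(cid for cid, ts in created_at.items() if ts < cutoff)


# Example: with a 7-day retention period, only the 10-day-old checkpoint expires.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
checkpoints = {
    "old": now - timedelta(days=10),
    "recent": now - timedelta(days=2),
}
to_delete = expired_checkpoints(checkpoints, timedelta(days=7), now)
```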

Success Metrics

  • 60-80% reduction in repeated task executions
  • Cost savings from avoided LLM API calls
  • Improved workflow completion rate
  • User satisfaction with resume experience
  • Reduced time to recover from failures

Out of Scope

  • Modifying the CrewAI core library
  • Real-time workflow state synchronization
  • Automatic resume without user intervention
  • Cross-version workflow compatibility

Dependencies

  • Existing ExecutionTrace system for output capture
  • TaskStatus tracking for execution state
  • Vector search and database dependency tracking
  • Frontend dialog components for user interaction

Priority

High - Directly impacts user productivity and platform costs

Labels

enhancement, user-experience, execution, resilience, cost-optimization

Estimated Impact

  • User Benefit: High - Saves time and reduces frustration
  • Cost Benefit: High - Reduces unnecessary LLM API calls
  • Technical Complexity: Medium - Builds on existing infrastructure
  • Risk: Low-Medium - With proper safety checks

Additional Context

This feature is particularly valuable for:

  • Long-running analytical workflows
  • Multi-stage data processing pipelines
  • Workflows with expensive LLM operations
  • Development and testing iterations
  • Production workflows with transient failure points

The implementation should prioritize safety and transparency, ensuring users understand the implications of resuming with potentially changed data while still providing flexibility for different use cases.
