@safoinme safoinme commented Sep 8, 2025

Describe changes

This PR delivers a focused, pragmatic refactor to make serving reliable and easy to reason about for a beta release. It simplifies capture configuration to a single typed Capture, unifies the execution path, introduces memory-only isolation, and hardens the realtime runtime with bounded resources and better shutdown behavior.

Summary

  • Collapse capture to a single typed API: Capture(memory_only, code, logs, metadata, visualizations, metrics).
  • Canonical capture fields on deployments; StepLauncher reads only those (no env/dict overrides).
  • Serving request parameters are merged safely (allowlist + light validation + size caps); dropped parameters are logged as warnings.
  • Memory-only serving mode: truly no runs/artifacts/log writes; in-process handoff with per-request isolation.
  • Realtime runtime: bounded queue, safe cache sweep, circuit-breaker maintained, improved shutdown and metrics.
  • Defensive artifact writes: validation and minimal retries/backoff; fail fast on partial responses.
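
As a rough illustration of the single typed capture API, a minimal sketch follows. Only the six field names come from this PR; the boolean types, the default values, and the `ensure_capture` helper are assumptions made for the example and are not the actual ZenML implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """Illustrative stand-in for the typed capture config.

    Field names follow the PR summary; types and defaults are assumed.
    """

    memory_only: bool = False
    code: bool = True
    logs: bool = True
    metadata: bool = True
    visualizations: bool = True
    metrics: bool = True


def ensure_capture(value: object) -> Capture:
    """Hypothetical guard: accept only a Capture, never a dict or string."""
    if not isinstance(value, Capture):
        raise TypeError(
            f"capture must be a Capture instance, got {type(value).__name__}"
        )
    return value


# A memory-only serving configuration expressed with the typed object.
serving_capture = ensure_capture(Capture(memory_only=True))
```

Making the dataclass frozen is a design choice of this sketch: it keeps a compiled capture configuration from being mutated per request.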

Below is a high-level architecture and request flow for clarity.

+------------------+              +------------------------+
|  HTTP Request    |------------->|   Serving Context?     |
+------------------+              +------------------------+
                                       |            |
                                       | Yes        | No
                                       v            v
                              +----------------+  +------------------------+
                              | memory_only?   |  |  Batch Runtime         |
                              +----------------+  |  (DefaultStepRuntime)  |
                                  |      |        +------------------------+
                                  |Yes   |No                    |
                                  v      v                      v
                      +------------------------+        +-------------------+
                      | MemoryStepRuntime      |        |    StepRunner     |
                      | (in-process handoff)   |        +-------------------+
                      +------------------------+                   |
                                  |                               v
                                  v                        +-------------------+
                         +-------------------+            |     Response      |
                         |     StepRunner    |            +-------------------+
                         +-------------------+
                                  |
                                  v
                         +-------------------+
                         |     Response      |
                         +-------------------+
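
The branching in the diagram above can be sketched as a small selection function. The runtime names are taken from this PR; the `RuntimeKind` enum and the `select_runtime` function are hypothetical and only mirror the decision order shown in the diagram.

```python
from enum import Enum


class RuntimeKind(Enum):
    """Runtime names as used in this PR; the enum itself is illustrative."""

    DEFAULT = "DefaultStepRuntime"
    REALTIME = "RealtimeStepRuntime"
    MEMORY = "MemoryStepRuntime"


def select_runtime(in_serving_context: bool, memory_only: bool) -> RuntimeKind:
    """Pick a runtime following the diagram's decision order."""
    if not in_serving_context:
        # Outside serving, memory_only is ignored (the PR notes a warning).
        return RuntimeKind.DEFAULT
    if memory_only:
        # Serving + memory_only: in-process handoff, no DB/FS writes.
        return RuntimeKind.MEMORY
    # Serving without memory_only: realtime path with background publishing.
    return RuntimeKind.REALTIME
```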

Realtime (non-memory) path detail:

+------------------------+
| RealtimeStepRuntime    |
| (bounded queue +       |
|  background publisher) |
+------------------------+
            |
            v
     +-------------+      (publish events may be queued; if the queue is full,
     |  StepRunner |       they are processed inline as a safe fallback)
     +-------------+
            |
            v
      +-----------+
      | Response  |
      +-----------+
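
The queue-with-inline-fallback behavior described above can be sketched with the standard library `queue` module. The `BoundedPublisher` class and its method names are hypothetical; only the bounded queue, the 1024 default, and the inline fallback on a full queue come from this PR.

```python
import queue
from typing import Callable, List

Event = Callable[[], None]


class BoundedPublisher:
    """Illustrative publisher: enqueue when possible, run inline when full."""

    def __init__(self, maxsize: int = 1024) -> None:
        self._queue: "queue.Queue[Event]" = queue.Queue(maxsize=maxsize)
        self.inline_count = 0

    def publish(self, event: Event) -> None:
        try:
            # Never block the request thread waiting for queue space.
            self._queue.put_nowait(event)
        except queue.Full:
            # Backpressure: the caller pays for the event right away.
            self.inline_count += 1
            event()

    def drain(self) -> int:
        """Run all queued events; a background worker would loop on this."""
        processed = 0
        while True:
            try:
                event = self._queue.get_nowait()
            except queue.Empty:
                return processed
            event()
            processed += 1


seen: List[int] = []
publisher = BoundedPublisher(maxsize=2)
for i in range(3):
    # Bind i as a default so each event records its own index.
    publisher.publish(lambda i=i: seen.append(i))
```

With a capacity of two, the third event cannot be queued and runs inline, so it is recorded before the two queued events are drained.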

Motivation

  • Ensure serving is fast (async by default) and memory-only mode never touches DB/FS.
  • Prevent cross-request contamination in memory-only; bound resource usage under load.
  • Provide clear logs and metrics for diagnosis; pave the way for production hardening.

Key Behavioral Changes

  • Pipeline code uses a single Capture type; dicts/strings disallowed in code paths.
  • Serving merges request parameters only from a declared allowlist; oversized/mismatched params are dropped with warnings.
  • Memory-only serving executes fully in-process (no runs/artifacts), with explicit logs; step logs disabled to avoid FS writes.
  • Realtime runtime backgrounds publishing with a bounded queue; if the queue is full, events are processed inline (backpressure).
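
The allowlist merge described above could look roughly like the following. This is a sketch under stated assumptions, not the PR's `_validate_and_merge_request_params`: the function name, the 1024-byte cap, the use of JSON length as the size measure, and the shape of the allowlist (parameter name mapped to expected type) are all hypothetical.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

MAX_PARAM_BYTES = 1024  # assumed cap; the PR does not state the real value


def merge_request_params(
    defaults: Dict[str, Any],
    request: Dict[str, Any],
    allowlist: Dict[str, type],
) -> Dict[str, Any]:
    """Merge request params over defaults, dropping anything unsafe."""
    merged = dict(defaults)
    for name, value in request.items():
        expected = allowlist.get(name)
        if expected is None:
            logger.warning("Dropping parameter %r: not in allowlist", name)
            continue
        size = len(json.dumps(value, default=str).encode("utf-8"))
        if size > MAX_PARAM_BYTES:
            logger.warning("Dropping parameter %r: exceeds size cap", name)
            continue
        try:
            # Light coercion: keep matching types, otherwise try to convert.
            merged[name] = value if isinstance(value, expected) else expected(value)
        except (TypeError, ValueError):
            logger.warning(
                "Dropping parameter %r: cannot coerce to %s",
                name,
                expected.__name__,
            )
    return merged
```

A dropped parameter leaves the declared default in place, so a bad request value degrades to the compiled configuration rather than failing the call.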

File-Level Changes (Selected)

  • Capture & Compiler

    • src/zenml/capture/config.py: Single Capture dataclass; removed BatchCapture/RealtimeCapture/CapturePolicy.
    • src/zenml/config/compiler.py: Normalizes typed capture into canonical deployment fields.
    • src/zenml/models/v2/core/pipeline_deployment.py: Adds canonical capture fields to deployment models.
    • src/zenml/zen_stores/schemas/pipeline_deployment_schemas.py: Adds DB columns for canonical capture fields.
  • Orchestrator

    • src/zenml/orchestrators/step_launcher.py:
      • Uses canonical fields and serving context.
      • Adds _validate_and_merge_request_params (allowlist + type coercion + size caps).
      • Disables logs in memory-only; avoids FS cleanup for memory:// URIs.
    • src/zenml/orchestrators/run_entity_manager.py: In-memory step_run stub with minimal config (enable_*, substitutions).
    • src/zenml/orchestrators/utils.py: Serving context helpers and docstrings; removed request-level override plumbing.
  • Execution Runtimes

    • src/zenml/execution/step_runtime.py:
      • MemoryStepRuntime: instance-scoped store/locks; per-run cleanup; no globals.
      • DefaultStepRuntime.store_output_artifacts: defensive batch create (retries/backoff), response count validation; TODO for server-side atomicity.
    • src/zenml/execution/realtime_runtime.py:
      • Bounded queue (maxsize=1024), inline fallback on Full.
      • Safe cache sweep (snapshot + safe pop, small time budget).
      • Shutdown logs final metrics and warns on non-graceful termination; TODOs for thread-pool or async migration and metrics export.
  • Serving Service & Docs

    • src/zenml/deployers/serving/service.py: Serving context handling; parameter exposure; cleanup.
    • docs/book/serving/*: Updated to single Capture, serving async default, memory-only warning/behavior.
    • examples/serving/README.md: Updated to reflect new serving model; memory-only usage.
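
The "snapshot + safe pop, small time budget" cache sweep listed for realtime_runtime.py can be illustrated as follows. The function name, the cache layout (key mapped to a stored-at timestamp and a value), and the 5 ms default budget are assumptions for this sketch; the real implementation may differ.

```python
import time
from typing import Any, Dict, Tuple


def sweep_cache(
    cache: Dict[str, Tuple[float, Any]],
    ttl_seconds: float,
    max_entries: int,
    budget_seconds: float = 0.005,
) -> int:
    """Evict expired entries, then the oldest ones beyond max_entries.

    Returns the number of entries removed by this call.
    """
    now = time.monotonic()
    deadline = now + budget_seconds
    removed = 0
    # Snapshot first so the loop never iterates the live dict; oldest first.
    snapshot = sorted(cache.items(), key=lambda item: item[1][0])
    for key, (stored_at, _value) in snapshot:
        if time.monotonic() > deadline:
            break  # out of budget; the next sweep continues the work
        expired = now - stored_at > ttl_seconds
        over_capacity = len(cache) > max_entries
        if expired or over_capacity:
            # pop with a default tolerates keys removed by another thread.
            if cache.pop(key, None) is not None:
                removed += 1
    return removed
```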

Configuration & Tuning

  • Serving mode is inferred by context (batch vs. serving). No per-request capture overrides.
  • Realtime runtime tuning via env:
    • ZENML_RT_CACHE_TTL_SECONDS (default 60), ZENML_RT_CACHE_MAX_ENTRIES (default 256)
    • ZENML_RT_ERR_REPORT_INTERVAL (default 15), circuit breaker envs unchanged
  • Memory-only: ignored outside serving with a warning.
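
For reference, reading the tuning variables above with their stated defaults might look like this. The variable names and defaults are from this PR; the `read_int_env` helper and its choice to fall back to the default on invalid or non-positive values are assumptions.

```python
import os
from typing import Mapping


def read_int_env(
    name: str, default: int, env: Mapping[str, str] = os.environ
) -> int:
    """Read a positive integer setting, falling back to the default."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # assumed policy: ignore unparsable values
    return value if value > 0 else default


CACHE_TTL_SECONDS = read_int_env("ZENML_RT_CACHE_TTL_SECONDS", 60)
CACHE_MAX_ENTRIES = read_int_env("ZENML_RT_CACHE_MAX_ENTRIES", 256)
ERR_REPORT_INTERVAL = read_int_env("ZENML_RT_ERR_REPORT_INTERVAL", 15)
```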

Testing & Validation

  • Unit
    • Request parameter validation: allowlist, size caps, type coercion.
    • Memory runtime isolation: per-instance store; no cross-contamination.
    • Realtime runtime: queue Full → inline fallback; race-free cache sweep; shutdown metrics.
    • Defensive artifact writes: retries/backoff; mismatch detection.
  • Manual
    • Memory-only serving: no /runs or /artifact_versions calls; explicit log: [Memory-only] … in-process handoff (no runs/artifacts).
    • Serving async default: responses return immediately; background updates proceed.

Request flow by case:

Client  -->  Server:  POST /invoke (JSON parameters)
Server         :  Merge params (allowlist / type-coercion / size caps)

Case A (Serving + memory_only):
  Server  -->  MemoryStepRuntime : in-process handoff, no DB/FS writes
  MemoryStepRuntime --> StepRunner : execute step(s)
  StepRunner --> Server : outputs
  Server  --> Client : Response (immediate)

Case B (Serving, async default):
  Server  -->  RealtimeStepRuntime : enqueue publish events
  RealtimeStepRuntime --> (Worker) : drain bounded queue in background
  RealtimeStepRuntime --> StepRunner : execute step(s)
  StepRunner --> Server : outputs
  Server  --> Client : Response (non-blocking; background publish continues)

Case C (Batch, outside serving):
  Server  -->  DefaultStepRuntime : blocking publish
  DefaultStepRuntime --> StepRunner : execute step(s)
  StepRunner --> Server : outputs
  Server  --> Client : Response (after publish)
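
The per-instance isolation that the memory runtime tests check can be sketched as an instance-scoped store with its own lock and per-run cleanup. The `InMemoryStepStore` class, its key layout, and its method names are hypothetical; the PR only states that the store and locks are instance-scoped, with per-run cleanup and no globals.

```python
import threading
from typing import Any, Dict, Tuple

Key = Tuple[str, str, str]  # (run_id, step_name, output_name), assumed layout


class InMemoryStepStore:
    """Illustrative store: all state lives on the instance, never on the module."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[Key, Any] = {}

    def put(self, run_id: str, step_name: str, output_name: str, value: Any) -> None:
        with self._lock:
            self._values[(run_id, step_name, output_name)] = value

    def get(self, run_id: str, step_name: str, output_name: str) -> Any:
        with self._lock:
            return self._values[(run_id, step_name, output_name)]

    def cleanup(self, run_id: str) -> int:
        """Drop every value recorded for one run; return how many were removed."""
        with self._lock:
            stale = [key for key in self._values if key[0] == run_id]
            for key in stale:
                del self._values[key]
            return len(stale)
```

Because nothing is shared at module level, two requests that each create their own store cannot observe each other's outputs.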

Risk & Mitigations

  • Request param merge: now restricted by allowlist/size/type; unknowns dropped with warnings.
  • Memory-only: per-request isolation and no FS/DB writes; logs disabled to avoid side effects.
  • Realtime: bounded queue with inline fallback; circuit breaker remains in place.
  • Artifact writes: fail fast rather than proceed with partial results (client-side). Server-side atomic semantics to follow.
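
The fail-fast artifact write policy above can be sketched as a retry wrapper that validates the response count. Everything here is an assumption for illustration: the function name, the use of `ConnectionError` as the transient error, three attempts, and exponential backoff starting at 0.1 s. It is not the code in `DefaultStepRuntime.store_output_artifacts`.

```python
import time
from typing import Callable, List, Optional, Sequence, TypeVar

Req = TypeVar("Req")
Resp = TypeVar("Resp")


def batch_create_with_retry(
    create_batch: Callable[[Sequence[Req]], List[Resp]],
    requests: Sequence[Req],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Resp]:
    """Retry transient failures; fail fast on a partial response."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            responses = create_batch(requests)
        except ConnectionError as exc:  # assumed transient error type
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2**attempt))  # exponential backoff
            continue
        if len(responses) != len(requests):
            # A partial result is not retried: surface it immediately.
            raise RuntimeError(
                f"expected {len(requests)} responses, got {len(responses)}"
            )
        return responses
    raise RuntimeError(
        f"batch create failed after {max_attempts} attempts"
    ) from last_error
```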

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
  • IMPORTANT: I made sure that my changes are reflected properly in the following resources:
    • ZenML Docs
    • Dashboard: Needs to be communicated to the frontend team.
    • Templates: Might need adjustments (that are not reflected in the template tests) in case of non-breaking changes and deprecations.
    • Projects: Depending on the version dependencies, different projects might get affected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

This commit introduces a new capture configuration system that simplifies the handling of capture modes in ZenML. The `Capture` class is now used to define capture settings, allowing for more explicit and typed configurations. The pipeline decorator and runtime management have been updated to support this new structure, enhancing clarity and usability.

Additionally, the `MemoryStepRuntime` and `RealtimeStepRuntime` classes have been improved to better manage runtime states and error reporting, including the implementation of a circuit breaker for resilience under load.
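
For readers unfamiliar with the pattern, a generic circuit breaker can be sketched as below. This is the textbook pattern, not the breaker in `RealtimeStepRuntime`; the class, its thresholds, and the injected clock are all assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Generic breaker sketch; thresholds and names are assumptions."""

    def __init__(
        self,
        failure_threshold: int = 3,
        open_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._open_seconds = open_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if calls may go through the protected path."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._open_seconds:
            # Cool-down elapsed: close the breaker and allow traffic again.
            self._failures = 0
            self._opened_at = None
            return True
        return False

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None
```

Injecting the clock keeps the open and close transitions testable without sleeping.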

This refactor aims to streamline the serving architecture and improve the overall performance and maintainability of the codebase.

This commit introduces significant enhancements to the ZenML serving architecture, focusing on a unified capture configuration and memory-only execution mode. The `Capture` class has been simplified to a single typed API, streamlining the capture process and eliminating confusion around capture modes.

Key changes include the introduction of memory-only serving, which ensures no database or filesystem writes occur, and the implementation of a robust realtime runtime with improved resource management and error handling. Additionally, request parameter validation has been enhanced to ensure safe merging and type coercion, while logging and metrics have been refined for better observability.

These updates aim to provide a more efficient and user-friendly experience for serving pipelines, paving the way for future enhancements and production readiness.

This commit introduces a new Alembic migration that creates the `pipeline_endpoint` table and modifies the `pipeline_deployment` table to include additional capture-related columns. The new schema supports enhanced capture configurations for pipeline endpoints, improving the overall functionality and flexibility of the ZenML framework.

The migration includes the following changes:
- Creation of the `pipeline_endpoint` table with relevant fields and constraints.
- Addition of capture-related columns to the `pipeline_deployment` table for the canonical capture fields, such as the memory-only flag, logs, and metrics.

This update lays the groundwork for improved pipeline management and monitoring capabilities.

This commit updates the weather pipeline example to include a memory-only capture configuration using the `Capture` class. This enhancement allows for improved management of pipeline execution without persisting data to a database or filesystem.

Additionally, the `run_entity_manager.py` file has been modified to utilize `field(default_factory=...)` for better initialization of dataclass fields, ensuring that default values are generated correctly. The `step_launcher.py` file has also been updated to handle memory-only stubs gracefully during execution interruptions.
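
The reason `field(default_factory=...)` matters is that dataclasses reject a mutable default such as `{}`, and the factory gives every instance its own fresh object. A minimal sketch, with a hypothetical `StepRunStub` whose field names are illustrative only:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StepRunStub:
    """Hypothetical in-memory stub; field names are illustrative only."""

    name: str
    # Writing `substitutions: Dict[str, str] = {}` raises ValueError at class
    # creation; default_factory builds a new dict per instance instead.
    substitutions: Dict[str, str] = field(default_factory=dict)


first = StepRunStub(name="load")
second = StepRunStub(name="train")
first.substitutions["date"] = "2025-09-08"
```

Mutating one stub's `substitutions` leaves the other stub's dict empty, which is the behavior a per-request stub needs.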

These changes contribute to a more robust and flexible pipeline serving architecture, aligning with recent refactors in the ZenML framework.

This commit introduces a new `DefaultStepRuntime` class, consolidating the runtime logic previously scattered across multiple files. The `MemoryStepRuntime` and `DefaultStepRuntime` have been separated into their respective files, enhancing modularity and maintainability.

Additionally, the `weather_pipeline.py` example has been updated to ensure proper execution of the pipeline. These changes aim to streamline the step execution process and improve the overall structure of the ZenML codebase, aligning with recent architectural enhancements.

This commit introduces a new document, `beta_todos.md`, outlining a comprehensive checklist for post-beta hardening efforts. The checklist includes key areas such as serving runtime, artifact write semantics, request parameter schema, monitoring, and resource management. Each section details specific tasks aimed at enhancing production readiness and scalability.

This addition serves as a roadmap for future improvements and ensures that all necessary steps are documented for achieving a robust and reliable deployment of the ZenML framework.

coderabbitai bot commented Sep 8, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions bot commented Sep 8, 2025

🔍 Broken Links Report

Summary

  • 📁 Files with broken links: 1
  • 🔗 Total broken links: 1
  • 📄 Broken markdown links: 1
  • 🖼️ Broken image links: 0
  • ⚠️ Broken reference placeholders: 0

Details

File        | Link Type | Link Text                           | Broken Path
book/toc.md | 📄        | "Pipeline Serving Capture Policies" | how-to/serving/capture-policies.md
📂 Full file paths
  • /home/runner/work/zenml/zenml/scripts/../docs/book/toc.md

github-actions bot added the labels internal (To filter out internal PRs and issues) and enhancement (New feature or request) on Sep 8, 2025
Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action: Warn
Severity: Critical
Alert: [email protected] has a Critical CVE.

CVE: GHSA-3863-2447-669p transformers has a Deserialization of Untrusted Data vulnerability (CRITICAL)

Affected versions: < 4.36.0

Patched version: 4.36.0

From: ?pypi/[email protected]


Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at [email protected].

Suggestion: Remove or replace dependencies that include known critical CVEs. Consumers can use dependency overrides or npm audit fix --force to remove vulnerable dependencies.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore pypi/[email protected]. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.


Base automatically changed from feature/served-pipelines to develop September 26, 2025 13:50