Pipeline Requirements Management and Dependency Conflicts #545

@Maleya

Description

The current pipeline requirements installation system installs all dependencies globally as root without isolation, leading to potential dependency conflicts, security vulnerabilities, and overall system instability.

Current Behavior

As far as I can see, pipeline requirements are handled through two mechanisms:

  1. Frontmatter requirements - specified in pipeline docstrings (e.g., requirements: langfuse<3.0.0)
  2. Global requirements file - via PIPELINES_REQUIREMENTS_PATH environment variable

Both methods install packages globally using:

subprocess.check_call([sys.executable, "-m", "pip", "install", req])

This call is reached from:

  • Pipeline loading: main.py:load_module_from_path()
  • Frontmatter parsing: main.py:install_frontmatter_requirements()
  • Setup script: start.sh:install_frontmatter_requirements()

Issues

This leads to several potentially serious issues:

  • Multiple pipelines can specify conflicting version requirements
    • Example: Pipeline A requires requests==2.25.0, Pipeline B requires requests>=2.30.0
    • No conflict resolution mechanism exists
    • Last installed version "wins", potentially breaking other pipelines
  • All packages installed with root privileges
    • No sandboxing or permission restrictions
  • All dependencies installed in global Python environment
    • Potential conflicts with base system requirements
    • Installation order affects final environment state
    • Same pipeline configuration can result in different environments

Reproduction Steps

This should be fairly easy to reproduce. For instance:

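  1. Create two pipelines with conflicting requirements (for example, one with requests==2.25.0 and one with requests>=2.30.0)
  2. Install both pipelines
  3. Observe that only the version installed last is present in the environment, so the other pipeline's requirement is no longer satisfied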
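
To make step 1 concrete, here is a minimal sketch that writes two pipeline files whose docstring frontmatter pins conflicting versions of requests. The file names, the `title:` field, the `PIPELINE_TEMPLATE` constant and the `write_conflicting_pipelines` helper are all made up for illustration; only the `requirements:` line follows the frontmatter example quoted above.

```python
from pathlib import Path

# Hypothetical minimal pipeline file: only the docstring frontmatter matters here.
# The "title:" field is an assumption; "requirements:" follows the example in this issue.
PIPELINE_TEMPLATE = '''"""
title: {title}
requirements: {requirement}
"""

# Pipeline body omitted; only the frontmatter is relevant for this reproduction.
'''


def write_conflicting_pipelines(directory):
    """Write two pipeline files with incompatible requests pins and return their paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    specs = {
        "pipeline_a.py": "requests==2.25.0",
        "pipeline_b.py": "requests>=2.30.0",
    }
    paths = []
    for filename, requirement in specs.items():
        path = directory / filename
        content = PIPELINE_TEMPLATE.format(title=path.stem, requirement=requirement)
        path.write_text(content, encoding="utf-8")
        paths.append(path)
    return paths
```

After both files have been loaded by the server, something like `importlib.metadata.version("requests")` run in the server's interpreter should report a single version, which cannot satisfy both pins at once.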

Expected Behavior

  • Pipelines should have isolated dependency environments
  • Dependency conflicts should be detected and reported
  • System should remain stable regardless of pipeline requirements
  • Security boundaries should prevent unauthorized system modifications
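
As a rough idea of what "detected and reported" could look like, the sketch below groups requirement strings by package and flags every package that different pipelines constrain differently. It is deliberately simplistic and uses only the standard library: it does not prove that two constraints are unsatisfiable, it ignores extras, environment markers and URLs, and the `find_potential_conflicts` name and its input shape are assumptions, not existing code.

```python
import re
from collections import defaultdict

# Splits "requests>=2.30.0" into a package name and the remaining specifier text.
# Simplified on purpose: extras, markers and direct URLs are not handled.
_REQ_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def normalize(name):
    """Normalize a package name so that "My_Package" and "my-package" compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_potential_conflicts(pipeline_requirements):
    """Return {package: {specifier: [pipelines]}} for packages with differing specifiers.

    pipeline_requirements maps a pipeline name to its list of requirement strings.
    """
    seen = defaultdict(dict)
    for pipeline, requirements in pipeline_requirements.items():
        for requirement in requirements:
            match = _REQ_RE.match(requirement)
            if not match:
                continue  # skip blank lines, comments and anything unparseable
            name = normalize(match.group(1))
            specifier = match.group(2).replace(" ", "")
            seen[name].setdefault(specifier, []).append(pipeline)
    # A package is only worth reporting if more than one distinct specifier was seen.
    return {name: specs for name, specs in seen.items() if len(specs) > 1}
```

With the example from this issue, `find_potential_conflicts({"pipeline_a": ["requests==2.25.0"], "pipeline_b": ["requests>=2.30.0"]})` reports `requests` with both specifiers. A real implementation would probably hand the specifiers to the `packaging` library (`packaging.specifiers.SpecifierSet`) or to pip's resolver instead of comparing strings.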

Suggested changes

At the very least, each pipeline should have its own virtual environment (option 1) to avoid conflicts. Options 2 to 5 are possible further changes:

  1. Virtual Environment Isolation: Each pipeline gets its own virtual environment
  2. Container-based Isolation: Use containers for pipeline execution
  3. Dependency Caching: Shared cache for common dependencies
  4. Package Verification: Verify package integrity and sources
  5. Resource Limits: Implement installation timeouts and size limits
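
A minimal sketch of option 1, with a small nod to option 5, using only the standard library `venv` and `subprocess` modules. The helper names, the one-directory-per-pipeline layout and the 600 second default timeout are assumptions for illustration, not a proposal for the final design.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir):
    """Return the path of the Python interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_pipeline_venv(base_dir, pipeline_name, with_pip=True):
    """Create (if missing) a dedicated virtual environment for one pipeline.

    Returns the path to that environment's interpreter.
    """
    env_dir = Path(base_dir) / pipeline_name
    python = venv_python(env_dir)
    if not python.exists():
        builder = venv.EnvBuilder(with_pip=with_pip, symlinks=(os.name != "nt"))
        builder.create(env_dir)
    return python


def install_requirements(python, requirements, timeout=600):
    """Install requirements with the given interpreter's pip, bounded by a timeout.

    Raises subprocess.CalledProcessError on a failed install and
    subprocess.TimeoutExpired if the install runs longer than `timeout` seconds.
    """
    if not requirements:
        return
    cmd = [str(python), "-m", "pip", "install", *requirements]
    subprocess.run(cmd, check=True, timeout=timeout)
```

Usage would be along the lines of `python = ensure_pipeline_venv("/app/venvs", "pipeline_a")` followed by `install_requirements(python, ["requests==2.25.0"])`, where the base directory is a made-up path. Note that this only isolates the installation: the pipeline itself would also have to run under that interpreter (for example as a subprocess), and a virtual environment on its own does nothing about installing as root.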
