Open
Description
The current pipeline requirements installation system installs all dependencies globally as root without isolation, leading to potential dependency conflicts, security vulnerabilities, and overall system instability.
Current Behavior
As far as I can see, pipeline requirements are handled through two mechanisms:
- Frontmatter requirements: specified in pipeline docstrings (e.g., requirements: langfuse<3.0.0)
- Global requirements file: via the PIPELINES_REQUIREMENTS_PATH environment variable
Both methods install packages globally using:
subprocess.check_call([sys.executable, "-m", "pip", "install", req])
Via:
- Pipeline loading: main.py:load_module_from_path()
- Frontmatter parsing: main.py:install_frontmatter_requirements()
- Setup script: start.sh:install_frontmatter_requirements()
Issues
This leads to a number of potentially serious issues.
- Multiple pipelines can specify conflicting version requirements
- Example: Pipeline A requires requests==2.25.0, Pipeline B requires requests>=2.30.0
- No conflict resolution mechanism exists
- Last installed version "wins", potentially breaking other pipelines
- All packages installed with root privileges
- No sandboxing or permission restrictions
- All dependencies installed in global Python environment
- Potential conflicts with base system requirements
- Installation order affects final environment state
- Same pipeline configuration can result in different environments
Reproduction Steps
This should be fairly easy to reproduce. For instance:
- Create two pipelines with conflicting requirements
- Install both pipelines
- Observe that only the last installed version is available
Expected Behavior
- Pipelines should have isolated dependency environments
- Dependency conflicts should be detected and reported
- System should remain stable regardless of pipeline requirements
- Security boundaries should prevent unauthorized system modifications
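As a rough illustration of the "detected and reported" point, the sketch below flags packages that different pipelines request with different version specifiers. It is a minimal sketch and not the project's code: the function names and the input mapping (pipeline name to list of requirement strings) are hypothetical, it handles only simple "name + specifier" strings like the ones in this issue, and it reports potential conflicts without checking whether the specifiers could actually be satisfied together.

```python
import re
from collections import defaultdict

# Splits a simple requirement such as "requests>=2.30.0" into a name and the
# remaining specifier text. Extras, markers and URLs are not handled here.
_REQ_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def split_requirement(req: str) -> tuple[str, str]:
    """Return (normalized package name, specifier string)."""
    match = _REQ_RE.match(req)
    if match is None:
        raise ValueError(f"unrecognized requirement: {req!r}")
    name, spec = match.groups()
    # Lowercase and collapse runs of "-", "_" and "." so that
    # "Lang_Fuse" and "lang-fuse" are treated as the same package.
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    return normalized, "".join(spec.split())


def find_potential_conflicts(
    pipelines: dict[str, list[str]],
) -> dict[str, dict[str, list[str]]]:
    """Map package -> {specifier: [pipelines]} where specifiers differ."""
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for pipeline, requirements in pipelines.items():
        for req in requirements:
            name, spec = split_requirement(req)
            seen[name][spec].append(pipeline)
    # Keep only packages requested with more than one distinct specifier.
    return {name: dict(specs) for name, specs in seen.items() if len(specs) > 1}


if __name__ == "__main__":
    report = find_potential_conflicts(
        {
            "pipeline_a": ["requests==2.25.0"],
            "pipeline_b": ["requests>=2.30.0", "langfuse<3.0.0"],
        }
    )
    for package, specs in report.items():
        print(f"potential conflict on {package}: {specs}")
```

A check like this could run before any installation and refuse to proceed, or at least log a warning, when the report is non-empty. A real implementation would probably want a proper specifier parser instead of string comparison.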
Suggested changes
At the very least, each pipeline should have its own virtual environment (the first item below) to avoid conflicts. Further changes could include:
- Virtual Environment Isolation: Each pipeline gets its own virtual environment
- Container-based Isolation: Use containers for pipeline execution
- Dependency Caching: Shared cache for common dependencies
- Package Verification: Verify package integrity and sources
- Resource Limits: Implement installation timeouts and size limits
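The first and last items could look roughly like the sketch below, which uses the standard library venv module to give each pipeline its own environment and applies a timeout to the pip call. This is a minimal sketch under stated assumptions, not a proposed patch: the helper names, the one-directory-per-pipeline layout, and the 300 second default timeout are all hypothetical.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_dir_for(base_dir: Path, pipeline_id: str) -> Path:
    """Hypothetical layout: one environment directory per pipeline."""
    return base_dir / pipeline_id


def venv_python(venv_dir: Path) -> Path:
    """Interpreter path inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def build_install_command(python: Path, requirements: list[str]) -> list[str]:
    """Build a pip call bound to the venv's interpreter, not sys.executable."""
    return [str(python), "-m", "pip", "install", "--no-input", *requirements]


def install_isolated(
    base_dir: Path,
    pipeline_id: str,
    requirements: list[str],
    timeout_s: float = 300.0,  # assumed default, not taken from the project
) -> Path:
    """Create the pipeline's venv if missing and install into it."""
    env_dir = venv_dir_for(base_dir, pipeline_id)
    if not venv_python(env_dir).exists():
        venv.create(env_dir, with_pip=True)
    if requirements:
        # Raises subprocess.TimeoutExpired if pip runs longer than timeout_s
        # and subprocess.CalledProcessError if pip exits with an error.
        subprocess.run(
            build_install_command(venv_python(env_dir), requirements),
            check=True,
            timeout=timeout_s,
        )
    return env_dir
```

Installing into a separate environment only helps if the pipeline is also executed with that environment's interpreter (for example in a subprocess), since packages in another venv are not importable from the main process. That is a larger change than swapping the install command, and it does not by itself address the root-privilege concern.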