YT Framework

PyPI | Docs | DeepWiki | Examples


Overview

Python helpers and conventions for YTsaurus pipelines: YAML config, ordered stages under stages/, dev mode that mirrors many prod behaviors on disk, and prod mode that uploads src/ bundles to the cluster.

Architecture

  • Pipeline — loads config, builds the YT client, walks enabled_stages.
  • Stage — one BaseStage subclass plus config.yaml (and optional src/ for jobs).
  • Operations — map, vanilla, map-reduce/reduce, YQL via the client, S3 helpers, sorts, etc.
  • Configuration — OmegaConf-backed YAML; secrets in configs/secrets.env.
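The Pipeline/Stage split can be pictured as a toy model. Stages are looked up by name and run in the order `enabled_stages` lists them. Each one receives a debug object and returns it, mirroring `run(self, debug)` in the Quick start. Everything below (`DebugContext`, `run_enabled_stages`, `make_stage`) is a hypothetical stand-in, not yt-framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DebugContext:
    """Hypothetical state object passed from stage to stage (not yt-framework's)."""
    visited: List[str] = field(default_factory=list)


# In this toy model a "stage" is just a function: context in, context out.
StageFn = Callable[[DebugContext], DebugContext]


def run_enabled_stages(
    registry: Dict[str, StageFn], enabled_stages: List[str]
) -> DebugContext:
    """Run the named stages in config order, threading one context through."""
    unknown = [name for name in enabled_stages if name not in registry]
    if unknown:
        raise KeyError(f"unknown stages: {unknown}")
    debug = DebugContext()
    for name in enabled_stages:
        debug = registry[name](debug)
    return debug


def make_stage(name: str) -> StageFn:
    """Build a stage that records its own name, then hands the context on."""
    def run(debug: DebugContext) -> DebugContext:
        debug.visited.append(name)
        return debug
    return run


registry = {name: make_stage(name) for name in ("extract", "transform", "load")}
result = run_enabled_stages(registry, ["transform", "extract"])
print(result.visited)  # ['transform', 'extract']
```

In this sketch, a stage missing from `enabled_stages` (here `load`) never runs. Failing fast on unknown names keeps a typo in the config from being silently skipped.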

What ships in the box

  • Stage discovery (DefaultPipeline) from the filesystem layout.
  • A dev/prod switch that keeps the same code paths wherever possible.
  • Map, vanilla, YQL helpers, S3 listing/download patterns, table helpers, checkpoint upload wiring.
  • Optional custom Docker images, tokenizer tarballs, and multi-operation stages.
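Filesystem-based stage discovery can be sketched with `pathlib`. The sketch assumes each stage lives at `stages/<name>/stage.py`, as in the Quick start layout. `discover_stages` and `select_enabled` are hypothetical helpers. They only illustrate the idea, and `DefaultPipeline` may apply different rules.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def discover_stages(root: Path) -> Dict[str, Path]:
    """Map stage name to stages/<name>/stage.py for every stage under root."""
    return {
        path.parent.name: path
        for path in sorted((root / "stages").glob("*/stage.py"))
    }


def select_enabled(discovered: Dict[str, Path], enabled: List[str]) -> List[Path]:
    """Return stage files in enabled_stages order; fail loudly on unknown names."""
    missing = [name for name in enabled if name not in discovered]
    if missing:
        raise FileNotFoundError(f"no stages/<name>/stage.py for: {missing}")
    return [discovered[name] for name in enabled]


# Build a throwaway layout, then discover and order its stages.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("my_stage", "other_stage"):
        stage_dir = root / "stages" / name
        stage_dir.mkdir(parents=True)
        (stage_dir / "stage.py").write_text("# placeholder stage\n")
    found = discover_stages(root)
    ordered = select_enabled(found, ["other_stage", "my_stage"])

print(sorted(found))                     # ['my_stage', 'other_stage']
print([p.parent.name for p in ordered])  # ['other_stage', 'my_stage']
```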

Installation

For Users

Install from PyPI into any Python 3.11+ environment (system Python, a virtualenv, or a Conda env):

pip install yt-framework
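To confirm which version pip installed, you can use the standard library's `importlib.metadata`. This sketch only assumes the distribution name `yt-framework` from the command above. The `installed_version` helper is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("yt-framework"))  # a version string, or None if not installed
```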

For Developers and Contributors

Recommended: one Conda environment for tests, formatting, pre-commit, and local documentation builds (avoids reinstalling tooling for each task):

git clone https://github.com/GregoryKogan/yt-framework.git
cd yt-framework
conda create -n yt-framework python=3.11
conda activate yt-framework
pip install -e ".[dev,docs]"

Use conda-forge as the channel when creating the env if that matches your setup (conda create -n yt-framework python=3.11 -c conda-forge).

Alternative: pip only — install in editable mode from source:

git clone https://github.com/GregoryKogan/yt-framework.git
cd yt-framework
pip install -e .

For development with testing tools (without the docs extra):

pip install -e ".[dev]"

For local Sphinx builds without the full dev extra, use pip install -e ".[docs]".

See CONTRIBUTING.md for the full development setup and the Installation Guide for prerequisites.

Quick start

Create the layout, an entrypoint, and a stage with its pipeline config, then run the pipeline.

  1. Layout

    mkdir my_pipeline && cd my_pipeline
    mkdir -p stages/my_stage configs
  2. pipeline.py

    from yt_framework.core.pipeline import DefaultPipeline
    
    if __name__ == "__main__":
        DefaultPipeline.main()
  3. Stage + config

    # stages/my_stage/stage.py
    from yt_framework.core.stage import BaseStage
    
    class MyStage(BaseStage):
        def run(self, debug):
            self.logger.info("Hello from YT Framework!")
            return debug

    # configs/config.yaml
    stages:
      enabled_stages:
        - my_stage
    
    pipeline:
      mode: "dev"  # Use "dev" for local development
  4. Run

    python pipeline.py
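You can also create the quick-start files in one step with a stdlib-only sketch. It writes the same layout with the same contents as steps 1 to 3. The `scaffold` helper and the constant names are hypothetical and are not part of yt-framework. The demo writes into a temporary directory. To create the real project, call `scaffold(Path("my_pipeline"))` instead.

```python
import tempfile
from pathlib import Path
from typing import List

# File contents copied from the Quick start steps.
PIPELINE_PY = '''from yt_framework.core.pipeline import DefaultPipeline

if __name__ == "__main__":
    DefaultPipeline.main()
'''

STAGE_PY = '''from yt_framework.core.stage import BaseStage

class MyStage(BaseStage):
    def run(self, debug):
        self.logger.info("Hello from YT Framework!")
        return debug
'''

CONFIG_YAML = '''stages:
  enabled_stages:
    - my_stage

pipeline:
  mode: "dev"  # Use "dev" for local development
'''


def scaffold(root: Path) -> List[Path]:
    """Create the quick-start layout under root and return the files written."""
    files = {
        root / "pipeline.py": PIPELINE_PY,
        root / "stages" / "my_stage" / "stage.py": STAGE_PY,
        root / "configs" / "config.yaml": CONFIG_YAML,
    }
    for path, text in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    return list(files)


# Preview the layout without touching the working directory.
with tempfile.TemporaryDirectory() as tmp:
    created = [p.relative_to(tmp).as_posix() for p in scaffold(Path(tmp))]
print(created)  # ['pipeline.py', 'stages/my_stage/stage.py', 'configs/config.yaml']
```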

Next steps: the Docs quick start (writing a table), the examples/ directory, and the Pipelines and stages guide.

Examples

examples/ holds runnable trees; each folder has a README with scope and commands.

Requirements

Prerequisites

  • Python 3.11+
  • A YT proxy and token, when running with pipeline.mode: prod

YT Cluster Requirements

When running pipelines in production mode, code from ytjobs executes on YT cluster nodes. The cluster's Docker image (default or custom) must include:

  • Python 3.11+
  • ytsaurus-client >= 0.13.0 (for checkpoint operations)
  • boto3 == 1.35.99 (for S3 operations)
  • botocore == 1.35.99 (auto-installed with boto3)
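To check a cluster image against these requirements, you can run a stdlib-only sketch inside the container. For example, save it as a hypothetical `check_image.py` and run `docker run --rm <image> python check_image.py`. The sketch assumes the PyPI distribution names `ytsaurus-client`, `boto3`, and `botocore`. It compares only the leading numeric version components and ignores suffixes such as `rc1`. `check_image` is a hypothetical helper, not part of yt-framework.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional, Tuple

# Pins copied from the list above: distribution -> (operator, version).
REQUIREMENTS: Dict[str, Tuple[str, str]] = {
    "ytsaurus-client": (">=", "0.13.0"),
    "boto3": ("==", "1.35.99"),
    "botocore": ("==", "1.35.99"),
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_key(text: str, width: int = 4) -> Tuple[int, ...]:
    """Turn '1.35.99' into (1, 35, 99, 0); stops at the first non-numeric part."""
    parts: List[int] = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts + [0] * (width - len(parts)))


def check_image(
    get_version: Callable[[str], Optional[str]] = installed_version,
    python_version: Tuple[int, ...] = tuple(sys.version_info[:2]),
) -> List[str]:
    """Return human-readable problems; an empty list means the image looks OK."""
    problems: List[str] = []
    if python_version < (3, 11):
        shown = ".".join(map(str, python_version))
        problems.append(f"Python {shown} is older than 3.11")
    for dist, (op, wanted) in REQUIREMENTS.items():
        found = get_version(dist)
        if found is None:
            problems.append(f"{dist}: not installed")
            continue
        have, need = version_key(found), version_key(wanted)
        ok = have >= need if op == ">=" else have == need
        if not ok:
            problems.append(f"{dist}: found {found}, need {op}{wanted}")
    return problems


if __name__ == "__main__":
    issues = check_image()
    print("image looks OK" if not issues else "\n".join(issues))
```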

If the default image does not meet these requirements, build a custom Docker image. Background: Cluster requirements.

Documentation

Getting help

Contributing

See CONTRIBUTING.md.