Refactor/g5 pipeline executors#183

Open
rsteele5 wants to merge 3 commits into g5 from refactor/g5-pipeline-executors

@rsteele5
Collaborator

Goals:

  • See Refactor: Pipeline Executors #177 for develop branch version
  • Consolidate pipeline executor functionality (Extract, Classification, and [LLM] Indexing)
    • Reducing code duplication
    • Making pipeline-related parsing and execution functions consistent (and modular)
  • Reorganize certain executors into functional packages
    • NOTE: this only affects PipelineExecutor and DocumentLLMPipelineExecutor.
    • PipelineExecutor has become a base class and is no longer intended for direct instantiation.
    • DocumentLLMPipelineExecutor is now LLMIndexerExecutor
    • MetaMergeExecutor has been derived from the merge_metadata function and is no longer in PipelineExecutor
  • Convert all lingering executor imports to absolute imports
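
For reviewers, a minimal sketch of what "base class, not intended for direct instantiation" could look like in Python. This is an illustration only: the use of `abc`, the `run` and `execute` method names, and the `"features"` request key are assumptions made for the example, not the code in this PR.

```python
from abc import ABC, abstractmethod


class PipelineExecutor(ABC):
    """Hypothetical base class holding logic shared by all pipeline executors."""

    def run(self, request: dict) -> dict:
        # Shared step: read the feature list once, then delegate to the subclass.
        # The "features" key is an assumed request shape.
        features = request.get("features", [])
        return self.execute(features)

    @abstractmethod
    def execute(self, features: list) -> dict:
        """Pipeline-specific work, implemented by each concrete executor."""


class LLMIndexerExecutor(PipelineExecutor):
    """Stand-in for the renamed DocumentLLMPipelineExecutor."""

    def execute(self, features: list) -> dict:
        return {"executor": "llm_indexer", "feature_count": len(features)}


result = LLMIndexerExecutor().run({"features": [{"name": "ocr", "type": "extract"}]})
```

With an abstract method in place, calling `PipelineExecutor()` directly raises `TypeError`, which makes the "base class only" intent enforceable rather than a convention.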

Additional Requested Feature:

  • Add a force OCR regeneration feature to the TextExtractionExecutor
    • This executor attempts to parse a "force" attribute from an "ocr" "extract" feature in the request's feature list. If true, any existing or cached OCR will be ignored.
    • Example: {"name":"ocr","type":"extract","force":true}
  • Check and update the documentation. See guide and ask the team.
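
The force flag described above can be sketched as follows. The helper names `ocr_force_requested` and `load_ocr`, and the idea of passing the cached result and the OCR call as arguments, are hypothetical and only illustrate the described behavior; the actual parsing in TextExtractionExecutor may differ.

```python
from typing import Callable, Optional


def ocr_force_requested(features: list) -> bool:
    """Return True if the first "ocr" "extract" feature carries force: true."""
    for feature in features:
        if feature.get("name") == "ocr" and feature.get("type") == "extract":
            # Only a literal JSON true counts; missing or non-boolean values do not.
            return feature.get("force") is True
    return False


def load_ocr(features: list, cached: Optional[dict], run_ocr: Callable[[], dict]) -> dict:
    """Reuse cached OCR unless it is missing or regeneration is forced."""
    if cached is not None and not ocr_force_requested(features):
        return cached
    return run_ocr()


features = [{"name": "ocr", "type": "extract", "force": True}]
fresh = load_ocr(features, cached={"source": "cache"}, run_ocr=lambda: {"source": "fresh"})
```

In this sketch, a request without the `force` attribute keeps the existing behavior, so cached OCR is reused whenever it is available.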

@rsteele5 rsteele5 force-pushed the refactor/g5-pipeline-executors branch from 4daa1be to 50b6cf7 Compare October 3, 2025 21:49
rsteele added 3 commits October 14, 2025 14:59
* Consolidate pipeline related functions
** moved store_metadata pipeline function to base_pipeline.py
** All pipelines generate their root_asset_dir from the MARIE_CACHE dir
* Removed classification processing from llm_pipeline.py
* Add OCR enabled flag to pipeline config for classification and llm pipelines (default True)
* generalized pipeline component enabled status
* move meta_merge_executor from merge to meta package
* fix imports
* Feature example:{"name":"ocr","type":"extract","force":true}
* meta_merge_executor use request_util.get_payload_features to parse payload features
@rsteele5 rsteele5 force-pushed the refactor/g5-pipeline-executors branch from 50b6cf7 to 72d01ee Compare October 14, 2025 19:59