This pipeline automates ingesting documentation into a data science pipeline (the doc ingestion pipeline) whenever changes are detected in the `collections` directory. It uses GitHub webhooks to detect pushes to the `main` branch and runs a series of tasks to process the documentation.
- Name: `shared-workspace`
- Storage: 3Gi
- Access Mode: ReadWriteOnce
- Provisioner: AWS EBS
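A minimal PersistentVolumeClaim matching the workspace above might look like this (the storage class name is an assumption; any AWS EBS-backed class works):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  # Assumption: gp3 is the cluster's AWS EBS storage class.
  storageClassName: gp3
```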
- Name: `doc-ingestion-listener`
- Listens for GitHub push events via a webhook
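The EventListener can be sketched as follows (the service account name is an assumption; OpenShift Pipelines typically provides a `pipeline` service account):

```yaml
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: doc-ingestion-listener
spec:
  serviceAccountName: pipeline  # assumption: default OpenShift Pipelines SA
  triggers:
    - triggerRef: doc-ingestion-trigger
```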
- Name: `doc-ingestion-trigger`
- Includes a CEL interceptor that filters for:
  - Push events to the `main` branch
  - Changes (modified, added, or removed) in the `collections` directory
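The CEL filter described above can be expressed roughly like this; the binding and template names are hypothetical, and the exact expression depends on which GitHub payload fields are matched:

```yaml
apiVersion: triggers.tekton.dev/v1beta1
kind: Trigger
metadata:
  name: doc-ingestion-trigger
spec:
  interceptors:
    - ref:
        name: cel
      params:
        # Match pushes to main that touch anything under collections/.
        - name: filter
          value: >-
            body.ref == 'refs/heads/main' &&
            body.commits.exists(c,
              c.modified.exists(f, f.startsWith('collections/')) ||
              c.added.exists(f, f.startsWith('collections/')) ||
              c.removed.exists(f, f.startsWith('collections/')))
  bindings:
    - ref: doc-ingestion-binding   # hypothetical name
  template:
    ref: doc-ingestion-template    # hypothetical name
```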
- Name: `doc-ingestion`
- Parameters:
  - `APPLICATION_NAME`: Repository name
  - `GIT_URL`: Git repository URL
  - `GIT_BRANCH`: Git branch name (default: `main`)
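The pipeline's parameter declarations might be sketched as:

```yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: doc-ingestion
spec:
  params:
    - name: APPLICATION_NAME
      type: string
      description: Repository name
    - name: GIT_URL
      type: string
      description: Git repository URL
    - name: GIT_BRANCH
      type: string
      description: Git branch name
      default: main
  workspaces:
    - name: shared-workspace
```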
- `fetch-ds-pipeline-repository`
  - Clones the Git repository to a shared workspace
  - Uses the OpenShift Pipelines `git-clone` task
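Inside the pipeline, the clone step might reference the `git-clone` task like this (how the task is resolved varies between OpenShift Pipelines versions, so treat the `taskRef` details as an assumption):

```yaml
  tasks:
    - name: fetch-ds-pipeline-repository
      taskRef:
        # Assumption: git-clone is available as a cluster-installed task;
        # newer OpenShift Pipelines versions resolve it via a resolver instead.
        name: git-clone
        kind: ClusterTask
      params:
        - name: url
          value: $(params.GIT_URL)
        - name: revision
          value: $(params.GIT_BRANCH)
      workspaces:
        - name: output
          workspace: shared-workspace
```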
- `execute-doc-ingestion-pipeline`
  - Executes the data science pipeline for document ingestion
  - Uses a custom task that connects to Data Science Pipelines via the Kubernetes API
  - Creates a run in the Data Science Pipelines environment
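One way to sketch the custom task is a step that uses the pod's service account token to call the Data Science Pipelines (KFP-compatible) REST API; the image, the service host, and the pipeline ID placeholder below are all assumptions, not the repository's actual implementation:

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: execute-doc-ingestion-pipeline
spec:
  params:
    - name: APPLICATION_NAME
      type: string
  steps:
    - name: create-run
      image: registry.access.redhat.com/ubi9/ubi-minimal  # assumption
      script: |
        #!/bin/sh
        # Hypothetical sketch: create a run through the DSPA's KFP v2 REST API.
        TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
        curl -sk -X POST \
          -H "Authorization: Bearer ${TOKEN}" \
          -H "Content-Type: application/json" \
          -d '{"display_name": "$(params.APPLICATION_NAME)-ingestion",
               "pipeline_version_reference": {"pipeline_id": "<pipeline-id>"}}' \
          https://ds-pipeline-dspa:8443/apis/v2beta1/runs
```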
- `doc-process-status`
  - Monitors the status of the data science pipeline run
  - Streams logs from the `system-container-impl` pod
  - Reports success or failure based on the pod status
  - Times out after 10 hours
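The monitoring task and its 10-hour timeout could be wired into the pipeline roughly like this (the pod label selector is a placeholder; it depends on how the Data Science Pipelines run labels its pods):

```yaml
  tasks:
    - name: doc-process-status
      timeout: "10h"
      taskSpec:
        steps:
          - name: stream-logs
            image: quay.io/openshift/origin-cli:latest  # assumption: any image with oc
            script: |
              #!/bin/sh
              set -e
              # Find the run pod (selector is a placeholder), stream its logs,
              # then report success or failure from the pod phase.
              POD=$(oc get pods -l <run-label-selector> -o name | head -n 1)
              oc logs -f "${POD}" -c system-container-impl
              PHASE=$(oc get "${POD}" -o jsonpath='{.status.phase}')
              [ "${PHASE}" = "Succeeded" ]
```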
1. A push to the `main` branch with changes in the `collections` directory triggers the pipeline.
2. The repository is cloned to the shared workspace.
3. The document ingestion data science pipeline is executed.
4. Logs are streamed and the run status is monitored until completion.
The EventListener is exposed via an OpenShift Route with TLS edge termination.
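A matching Route sketch (Tekton creates the listener Service with an `el-` prefix and an `http-listener` port name; the redirect policy is an assumption):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: doc-ingestion-listener
spec:
  to:
    kind: Service
    name: el-doc-ingestion-listener
  port:
    targetPort: http-listener
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect  # assumption
```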
- OpenShift cluster with Tekton Pipelines (OpenShift Pipelines) installed
- A Data Science Pipelines Application deployed in the same namespace
- GitHub repository configured with a webhook pointing to the EventListener route