Add Pipeline design #21

Status: Open. Wants to merge 3 commits into base: develop
274 changes: 271 additions & 3 deletions doc/design.md
@@ -101,7 +101,7 @@ spec:
We hope Fluid users could represent it by the following line.

```python
-skaffold_git = fluid.Git(
+skaffold_git = fluid.git_resource(

Please also update the description below: "Please be aware that the call to fluid.Git doesn't include the name".

Why not use fluid.git for simplicity?

Collaborator Author

> Why not use fluid.git for simplicity?

This just follows the current implementation. I think we can update the API and design in another PR if needed.

Collaborator

I remember that Tekton has a limited number of pre-defined resource types and git is one of them. I would suggest we keep it Git rather than git_resource, because git_resource is not a full name; git_pipeline_resource is. But git_pipeline_resource is too long. It seems reasonable to use a short name Git for one of a few pre-defined types.

    revision="master",
    url="https://github.com/GoogleContainerTools/skaffold")
```
@@ -125,7 +125,7 @@ spec:
We hope Fluid users could represent the above YAML file by the following line.

```python
-skaffold_image_leeroy_web = fluid.Image(
+skaffold_image_leeroy_web = fluid.image_resource(

Collaborator

Similarly, "image" is one of a few pre-defined resource types. How about we keep the name Image?

    url="gcr.io/wangkuiyi/leeroy-web")
```

@@ -142,7 +142,7 @@ According to the [document](https://github.com/tektoncd/pipeline/blob/master/doc
The following example from the [Tekton tutorial](https://github.com/tektoncd/pipeline/blob/master/docs/tutorial.md#task-inputs-and-outputs) takes an input resource, an output resource, and two input parameters.

```yaml
-goapiVersion: tekton.dev/v1alpha1
+apiVersion: tekton.dev/v1alpha1

Collaborator

Thanks for pointing this out!

kind: Task
metadata:
  name: build-docker-image-from-git-source
@@ -184,5 +184,273 @@ spec:
We hope Fluid users could represent it by the following Python/Fluid code.

```python
@fluid.task
def build_docker_image_from_git_source(
        docker_source: "input,git",
        built_image: "output,image",
        path_to_dockerfile="/workspace/docker-source/Dockerfile",
        path_to_context="/workspace/docker-source"):
    '''Define a Tekton Task that builds a Docker image from a Git repo'''
    couler.step(image="gcr.io/kaniko-project/executor:v0.14.0",
                cmd=["/kaniko/executor"],
                args=[f"--dockerfile={path_to_dockerfile}",
                      f"--destination={built_image.url}",
                      f"--context={path_to_context}"],
                env={"DOCKER_CONFIG": "/tekton/home/.docker/"})
```
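
One way `@fluid.task` could recover the Tekton inputs, outputs, and params from the signature above is to inspect the parameter annotations. The following is a minimal sketch under that assumption: `task_spec` and the dict layout are hypothetical and not part of Fluid, parameter names are kept in snake_case rather than mapped to Tekton's camelCase, and the function body (the steps) is ignored.

```python
import inspect


def task_spec(func):
    """Build a Task-like dict from a function signature (hypothetical helper)."""
    inputs, outputs, params = [], [], []
    for name, p in inspect.signature(func).parameters.items():
        if isinstance(p.annotation, str):
            # Annotations look like "input,git" or "output, image".
            direction, rtype = [s.strip() for s in p.annotation.split(",")]
            entry = {"name": name.replace("_", "-"), "type": rtype}
            (inputs if direction == "input" else outputs).append(entry)
        else:
            # Assumption: unannotated arguments with defaults are string params.
            params.append({"name": name, "type": "string", "default": p.default})
    return {
        "kind": "Task",
        "metadata": {"name": func.__name__.replace("_", "-")},
        "spec": {"params": params,
                 "resources": {"inputs": inputs, "outputs": outputs}},
    }


def build_docker_image_from_git_source(
        docker_source: "input,git",
        built_image: "output,image",
        path_to_dockerfile="/workspace/docker-source/Dockerfile",
        path_to_context="/workspace/docker-source"):
    pass  # steps omitted in this sketch


spec = task_spec(build_docker_image_from_git_source)
print(spec["metadata"]["name"])
print(spec["spec"]["resources"])
```

Keeping the resource direction and type in the annotation means the Python signature stays the single source of truth for the generated Task.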

### Pipeline

A `Pipeline` object is like a function declaration, according to the [definition](https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md).

Collaborator

In the above text, we stated that a Task is like a function. Here we state the same with Pipeline. What is the difference between these two types of "functions"?


A `Pipeline` in Tekton defines an ordered series of Tasks. Users can specify whether
the output of a `Task` is used as an input for the next `Task` by using the `from` property on `PipelineResources`.

The following example comes from [Tekton's tutorial](https://github.com/tektoncd/pipeline/blob/master/docs/tutorial.md#creating-and-running-a-pipeline):

``` yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline

Collaborator

What is the lifecycle of a pipeline object?

Collaborator Author

A Pipeline object is like a function definition in Python; it contains the Pipeline Tasks.
A PipelineRun object invokes those Pipeline Tasks according to their dependencies. Some lifecycle information can be found in the Task status.

metadata:
  name: tutorial-pipeline
spec:
  resources:
    - name: source-repo
      type: git
    - name: web-image
      type: image
  tasks:
    - name: build-skaffold-web
      taskRef:
        name: build-docker-image-from-git-source
      params:
        - name: pathToDockerFile
          value: Dockerfile
        - name: pathToContext
          value: /workspace/docker-source/examples/microservices/leeroy-web #configure: may change according to your source
      resources:
        inputs:
          - name: docker-source
            resource: source-repo
        outputs:
          - name: builtImage
            resource: web-image
    - name: deploy-web
      taskRef:
        name: deploy-using-kubectl
      resources:
        inputs:
          - name: source
            resource: source-repo
          - name: image
            resource: web-image
            from:
              - build-skaffold-web
      params:
        - name: path
          value: /workspace/source/examples/microservices/leeroy-web/kubernetes/deployment.yaml #configure: may change according to your source
        - name: yamlPathToImage
          value: "spec.template.spec.containers[0].image"
```

The above `Pipeline` is referencing a `Task` called `deploy-using-kubectl` defined as follows:

Collaborator

is referencing => refers to


``` yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: deploy-using-kubectl
spec:
  params:
    - name: path
      type: string
      description: Path to the manifest to apply
    - name: yamlPathToImage
      type: string
      description: |
        The path to the image to replace in the yaml manifest (arg to yq)
  resources:
    inputs:
      - name: source
        type: git
      - name: image
        type: image
  steps:
    - name: replace-image
      image: mikefarah/yq
      command: ["yq"]
      args:
        - "w"
        - "-i"
        - "$(params.path)"
        - "$(params.yamlPathToImage)"
        - "$(resources.inputs.image.url)"
    - name: run-kubectl
      image: lachlanevenson/k8s-kubectl
      command: ["kubectl"]
      args:
        - "apply"
        - "-f"
        - "$(params.path)"
```

We hope Fluid users can write the following program to express the above YAML file:

``` python
@fluid.task
def deploy_using_kubectl(
        source_repo: "input, git",
        web_image: "input,image",
        path="/workspace/source/examples/microservices/leeroy-web/kubernetes/deployment.yaml",
        yaml_path_to_image="spec.template.spec.containers[0].image"):
    fluid.step(image="mikefarah/yq",
               command=["yq"],
               args=["w",
                     "-i",
                     f"{path}",
                     f"{yaml_path_to_image}",
                     f"{web_image.url}"])
    fluid.step(image="lachlanevenson/k8s-kubectl",
               command=["kubectl"],
               args=["apply", "-f", f"{path}"])

@fluid.pipeline
def tutorial(source_repo: "resource,git", web_image: "resource,image"):
    build_skaffold_web = build_docker_image_from_git_source(source_repo, web_image)

    deploy_web = deploy_using_kubectl(source_repo, web_image)
    deploy_web.web_image.from(build_skaffold_web)

How do we define a dependency without using input/output?

Collaborator Author

@Yancey0623 Mar 27, 2020

You can use the runAfter keyword. I added a section, "Pipeline with DAG", that introduces how to construct the DAG.

Collaborator

This deploy_web.web_image.from syntax looks confusing. Python programmers do not do this with function parameters.

I am afraid that this weird design might come from the fact that a Pipeline is NOT similar to a function definition.

Collaborator

After reading more about Pipeline, I see it describes a DAG of tasks, where edges are data dependencies between tasks.

Consider the following example from the Tekton tutorial https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#from:

- name: build-app
  taskRef:
    name: build-push
  resources:
    outputs:
      - name: image
        resource: my-image
- name: deploy-app
  taskRef:
    name: deploy-kubectl
  resources:
    inputs:
      - name: image
        resource: my-image
        from:
          - build-app

In programming-language terms, this is simply nested function calls:

deploy_kubectl(image=build_push(my_image))

It seems that what we expect users to write is

@fluid.pipeline
def build_and_deploy(my_image):
    deploy_kubectl(image=build_push(my_image))

where @fluid.pipeline should dry-run the function build_and_deploy to analyze the function dependencies, which here is deploy_kubectl.image <- build_push, and generate the YAML definition of the Pipeline object.

I am not sure whether the above suggestion is correct, or how reasonable it is. It has been a while since I last used Tekton.

```
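
The dry-run idea raised in the review thread above can be sketched as follows. This is only an illustration under stated assumptions: the `task` and `pipeline` decorators, the `TaskCall` record, and the returned list layout are hypothetical stand-ins and not Fluid's implementation. Each task call records itself instead of running, and an argument that is itself a recorded call becomes a `from` edge.

```python
import inspect

_calls = []  # task calls recorded during one dry run


class TaskCall:
    """Record of one task invocation inside a pipeline (hypothetical)."""

    def __init__(self, task_name, inputs):
        self.task_name = task_name
        self.inputs = inputs  # argument name -> resource name or upstream TaskCall


def task(func):
    """Replace the task body with a recorder; nothing is executed."""
    def record(**kwargs):
        call = TaskCall(func.__name__, kwargs)
        _calls.append(call)
        return call
    return record


def pipeline(func):
    """Dry-run func and return its tasks together with their `from` edges."""
    def build():
        _calls.clear()
        names = list(inspect.signature(func).parameters)
        func(**{n: n for n in names})  # resource names act as placeholders
        tasks = []
        for call in _calls:
            deps = {arg: value.task_name
                    for arg, value in call.inputs.items()
                    if isinstance(value, TaskCall)}
            tasks.append({"name": call.task_name, "from": deps})
        return tasks
    return build


@task
def build_push(image): ...


@task
def deploy_kubectl(image): ...


@pipeline
def build_and_deploy(my_image):
    deploy_kubectl(image=build_push(image=my_image))


print(build_and_deploy())
```

Because Python evaluates arguments before the enclosing call, `build_push` is recorded before `deploy_kubectl`, so the recorded order already respects the data dependency.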

### Pipeline with DAG

The `Pipeline Tasks` in a `Pipeline` can be connected as a Directed Acyclic Graph (DAG). Each of the Tasks is a node, which can be
connected with an edge by:

- `from`: clauses on the PipelineResources needed by a Task.
- `runAfter`: clauses on the Pipeline Tasks.

The following example of a `Pipeline spec` comes from the Tekton Pipeline tutorials:

Collaborator

Which tutorial? We need a URL here.


``` yaml
- name: lint-repo

Collaborator

This doesn't look like a complete Kubernetes YAML. What is its kind?

  taskRef:
    name: pylint
  resources:
    inputs:
      - name: workspace
        resource: my-repo
- name: test-app
  taskRef:
    name: make-test
  resources:
    inputs:
      - name: workspace
        resource: my-repo
- name: build-app
  taskRef:
    name: kaniko-build-app
  runAfter:
    - test-app
  resources:
    inputs:
      - name: workspace
        resource: my-repo
    outputs:
      - name: image
        resource: my-app-image
- name: build-frontend
  taskRef:
    name: kaniko-build-frontend
  runAfter:
    - test-app
  resources:
    inputs:
      - name: workspace
        resource: my-repo
    outputs:
      - name: image
        resource: my-frontend-image
- name: deploy-all
  taskRef:
    name: deploy-kubectl
  resources:
    inputs:
      - name: my-app-image
        resource: my-app-image
        from:
          - build-app
      - name: my-frontend-image
        resource: my-frontend-image
        from:
          - build-frontend
```

This will result in the following execution graph:

``` text

Collaborator

No whitespace between ``` and text

        |            |
        v            v
     test-app    lint-repo
    /        \
   v          v
build-app  build-frontend
   \          /
    v        v
    deploy-all
```
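
The execution graph above can be reproduced by treating every `runAfter` and `from` clause as an edge and sorting the tasks topologically. The following is a minimal sketch, assuming Python 3.9+ for the standard `graphlib` module. The edge dictionaries are copied by hand from the YAML in this section rather than parsed from it.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9

# Task -> tasks it must wait for, taken from the `runAfter` clauses.
run_after = {"build-app": {"test-app"}, "build-frontend": {"test-app"}}
# Task -> tasks whose output resources it consumes, taken from the `from` clauses.
from_edges = {"deploy-all": {"build-app", "build-frontend"}}
all_tasks = ["lint-repo", "test-app", "build-app", "build-frontend", "deploy-all"]

# Merge both kinds of edges into one predecessor map.
graph = {name: set() for name in all_tasks}
for edges in (run_after, from_edges):
    for name, deps in edges.items():
        graph[name] |= deps

# Group tasks into levels that may run in parallel.
sorter = TopologicalSorter(graph)
sorter.prepare()
levels = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    levels.append(ready)
    sorter.done(*ready)

print(levels)
```

Each inner list is one row of the diagram: tasks in the same list have no dependency on each other.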

We hope users can write the following Fluid program to construct the above DAG:

``` python
@fluid.pipeline
def dag(my_repo: "resource,git", my_app_image: "resource,image", my_frontend_image: "resource,image"):
    lint_repo = pylint(my_repo)
    test_app = make_test(my_repo)

    build_app = kaniko_build_app(my_repo, my_app_image)
    build_app.run_after(test_app)

    build_frontend = kaniko_build_frontend(my_repo, my_frontend_image)
    build_frontend.run_after(test_app)

    deploy_all = deploy_kubectl(my_app_image, my_frontend_image)
    deploy_all.inputs.my_app_image.from(build_app)
    deploy_all.inputs.my_frontend_image.from(build_frontend)
```

### PipelineRun

A PipelineRun object is like a function invocation.

A PipelineRun object defines a call to a Pipeline. The following is a PipelineRun example from [Tekton's tutorial](https://github.com/tektoncd/pipeline/blob/master/docs/tutorial.md#creating-and-running-a-pipeline):

``` yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: tutorial-pipeline-run-1
spec:
  serviceAccountName: tutorial-service
  pipelineRef:
    name: tutorial-pipeline
  resources:
    - name: source-repo
      resourceRef:
        name: skaffold-git
    - name: web-image
      resourceRef:
        name: skaffold-image-leeroy-web
```

We hope Fluid users write the following program:

``` python
skaffold_git = fluid.git_resource(
    revision="master",
    url="https://github.com/GoogleContainerTools/skaffold")
skaffold_image_leeroy_web = fluid.image_resource(
    url="gcr.io/wangkuiyi/leeroy-web")

tutorial(skaffold_git, skaffold_image_leeroy_web)
```
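
To make the "function invocation" analogy concrete, a call like the one above could expand into a manifest shaped like the PipelineRun YAML in this section. The following is a sketch only: `pipeline_run` is a hypothetical helper, not a Fluid API, and it takes resource names as plain strings rather than resource objects.

```python
import json


def pipeline_run(pipeline_name, run_name, service_account, **resources):
    """Build a PipelineRun manifest as a dict (hypothetical helper)."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "PipelineRun",
        "metadata": {"name": run_name},
        "spec": {
            "serviceAccountName": service_account,
            "pipelineRef": {"name": pipeline_name},
            # Each keyword argument binds a pipeline resource to a named resource.
            "resources": [
                {"name": param.replace("_", "-"), "resourceRef": {"name": ref}}
                for param, ref in resources.items()
            ],
        },
    }


manifest = pipeline_run(
    "tutorial-pipeline", "tutorial-pipeline-run-1", "tutorial-service",
    source_repo="skaffold-git", web_image="skaffold-image-leeroy-web")
print(json.dumps(manifest, indent=2))
```

The keyword arguments play the role of actual parameters, and `pipelineRef` plays the role of the function being called.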