Conversation
```shell
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}";
python -m pip install huggingface_hub;
hf auth login --token "${HF_TOKEN}";
hf download "${HF_MODEL_ID}" --local-dir "/cache/${MODEL_PATH}"
```
Does this skip the model download if it is already present in the path? I believe this is the entire idea of using the PVC to reuse models locally ... right?
I believe `hf download` detects whether or not the model has already been downloaded. Furthermore, `mkdir -p` accepts folders that already exist, so I think this is good.
`hf download` is intelligent and will skip the work. In the current implementation, the PVC is created locally and used only once; each parallel experiment runs in a different namespace. In principle, we can define a PVC over an existing PV that holds only one copy of the model. If we do this, we probably want to add a task before the parallel stage to download the model.
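A minimal sketch of such a pre-download task body, reusing the variables from the snippet above (the emptiness check and the consistent use of `MOUNT_PATH` are assumptions, not part of the PR):

```shell
# Download the model once, before the parallel experiments start.
# Skip the download entirely when the directory is already populated,
# although `hf download` would itself reuse previously fetched files.
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}"
if [ -z "$(ls -A "${MOUNT_PATH}/${MODEL_PATH}")" ]; then
  hf download "${HF_MODEL_ID}" --local-dir "${MOUNT_PATH}/${MODEL_PATH}"
fi
```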
```
@@ -0,0 +1,40 @@
apiVersion: v2
```
Is this chart needed?
Can we simply use the inference-perf container as part of the StepAction, and configure it so that the step directly sends the inference request?
This approach eliminates the chart, along with the need for a separate harness pod (the Tekton step will run as part of some pod anyway).
The step has now been updated to use the llm-d-benchmark image directly. It isn't as clean as using the inference-perf image directly would be; however, it should be possible to modify it to be easier to use.
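For illustration, a `StepAction` along these lines might look like the sketch below; the image reference and script body are assumptions, not the PR's actual definition:

```yaml
apiVersion: tekton.dev/v1beta1
kind: StepAction
metadata:
  name: run-workload
spec:
  image: ghcr.io/llm-d/llm-d-benchmark:latest  # assumed image reference
  script: |
    #!/usr/bin/env bash
    set -euo pipefail
    # ... invoke the benchmark harness against the inference endpoint here ...
    echo "✅ workload completed"
```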
```shell
echo "✅ workload completed"
```

```yaml
- name: upload-results
```
This section needs to be completed.
Let us start by implementing HTTP(S)-based results upload to S3-compatible storage.
See the example here: https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=storage-connectivity-cloud-object
This has been done; the results folder is tarred and uploaded to an S3-compatible bucket.
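A minimal sketch of that flow, assuming the AWS CLI is available in the step image and that `RESULTS_DIR`, `S3_BUCKET`, and `S3_ENDPOINT_URL` are provided (these variable names are hypothetical, not taken from the PR):

```shell
# Tar the results folder and push the archive to an S3-compatible endpoint.
tar -czf results.tar.gz -C "${RESULTS_DIR}" .
aws s3 cp results.tar.gz "s3://${S3_BUCKET}/results.tar.gz" \
  --endpoint-url "${S3_ENDPOINT_URL}"
```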
```
@@ -0,0 +1,150 @@
inferenceExtension:
```
I am guessing the user would be creating these values.yaml files for the experiment pipeline?
Today, the pipeline takes the location (URL) as input. It can take a stringified values file as well. When sweeping through values, the values file is overridden using `--set`.
An alternative is to use a (YAML) description of the desired environment to generate the values files. This seems to assume we can express it more simply than the values files do today. It is not clear to me that this is the case.
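For example, a sweep step might override a single key from the base values file like this (the release name, chart path, and overridden key are hypothetical):

```shell
# Install one sweep point, overriding one value from the base values file.
helm install exp-replicas-2 ./gaie-chart \
  -f gaie-values.yaml \
  --set inferenceExtension.replicas=2
```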
A _matrix_ based `Task` can be unrolled into multiple tasks to reduce the parallelism. The utility script `utility/transform-pr-parallel.py` does this as follows:

1. Unroll a single parameter into one `Task` per value. Each resulting Task defines a matrix over the remaining parameters.
Curious what the "unrolled" output looks like here.
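A hypothetical before/after, with invented task and parameter names (the actual pipeline's parameters may differ):

```yaml
# Before: a single pipeline task fans out over the full matrix.
- name: run-experiment
  taskRef:
    name: experiment-task  # hypothetical Task
  matrix:
    params:
      - name: replicas
        values: ["1", "2"]
      - name: qps
        values: ["10", "50"]
# After unrolling "replicas": one pipeline task per value, each still
# defining a matrix over the remaining parameter(s).
- name: run-experiment-replicas-1
  taskRef:
    name: experiment-task
  params:
    - name: replicas
      value: "1"
  matrix:
    params:
      - name: qps
        values: ["10", "50"]
- name: run-experiment-replicas-2
  taskRef:
    name: experiment-task
  params:
    - name: replicas
      value: "2"
  matrix:
    params:
      - name: qps
        values: ["10", "50"]
```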
1. Create a namespace where the Tekton pipeline will execute.

```shell
export $NAMESPACE=your_namespace
```

Suggested change:

```shell
export NAMESPACE=your_namespace
```
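(Presumably the namespace itself is then created with something like the following; this command is an assumption, not quoted from the README.)

```shell
kubectl create namespace "${NAMESPACE}"
```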
- the namespace (where the PipelineRun executes)
- s3 details: secret name, bucket name and endpoint URL

Run by creating the PipelineRun:
This appears as a one-liner after rendering. It may need to be un-indented.
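Presumably something along these lines (the file name is taken from the pipeline run yaml referenced elsewhere in this thread; the exact command is an assumption):

```shell
kubectl create -f pipeline/pipelinerun-matrix.yaml -n "${NAMESPACE}"
```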
```shell
kubectl apply -f pipeline/stepactions.yaml
```

Suggested change:

```shell
cd tekton-poc
kubectl apply -f pipeline/stepactions.yaml
```
This proof of concept currently implements a variation of the inference-scheduling [scenario](https://github.com/llm-d/llm-d-benchmark/blob/main/scenarios/guides/inference-scheduling.sh)/[experiment](https://github.com/llm-d/llm-d-benchmark/blob/main/experiments/inference-scheduling.yaml).

Suggested change (append):

To change the Inference Scheduling configs for the experiment, update `tekton-poc/examples/inference-scheduling/gaie-values.yaml`, then `git push` to your fork, and supply the new URL to `inference-scheduling/` for the `experimentBaseUrl` value in the [pipeline run yaml](./pipeline/pipelinerun-matrix.yaml#L46).
This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the stale label.
See `tekton-poc/README.md`.