Add Triton Inference Server Support #34252

SaumilPatel03 · 2025-03-11T16:51:11Z

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

github-actions · 2025-03-11T18:40:34Z

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label python.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

jrmccluskey

Taking code from the JSON handler is fine, but leaving it unadapted to the Triton inference use case isn't going to work. Please write some unit tests and an integration test (for the latter, I can help you get resources stood up in apache-beam-testing to run against).

jrmccluskey · 2025-03-13T20:21:47Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

+    def _retrieve_endpoint(
+      self, endpoint_id: str,
+      location: str,
+      is_private: bool) -> aiplatform.Endpoint:
+      """Retrieves an AI Platform endpoint and queries it for liveness/deployed
+      models.
+
+      Args:
+        endpoint_id: the numerical ID of the Vertex AI endpoint to retrieve.
+        is_private: a boolean indicating if the Vertex AI endpoint is a private
+          endpoint
+      Returns:
+        An aiplatform.Endpoint object
+      Raises:
+        ValueError: if endpoint is inactive or has no models deployed to it.
+      """
+      if is_private:
+        endpoint: aiplatform.Endpoint = aiplatform.PrivateEndpoint(
+            endpoint_name=endpoint_id, location=location)
+        LOGGER.debug("Treating endpoint %s as private", endpoint_id)
+      else:
+        endpoint = aiplatform.Endpoint(
+            endpoint_name=endpoint_id, location=location)
+        LOGGER.debug("Treating endpoint %s as public", endpoint_id)
+
+      try:
+        mod_list = endpoint.list_models()
+      except Exception as e:
+        raise ValueError(
+            "Failed to contact endpoint %s, got exception: %s", endpoint_id, e)
+
+      if len(mod_list) == 0:
+        raise ValueError("Endpoint %s has no models deployed to it.", endpoint_id)
+
+      return endpoint


Do Triton endpoints function correctly when retrieved this way?

jrmccluskey · 2025-03-13T20:23:29Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

+        self.region = region
+        self.endpoint_name = endpoint_name
+        self.endpoint_url = f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{region}/endpoints/{endpoint_name}:predict"
+        self.is_private = private


Are there distinctions between public and private Triton endpoints?

jrmccluskey · 2025-03-13T20:24:18Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

+    def run_inference(
+        self,
+        batch: Sequence[Any],
+        model: aiplatform.Endpoint,


This does not align with usage: an endpoint object is not the model name.

@jrmccluskey Can you explain why the model parameter should not be an aiplatform.Endpoint? Since load_model returns an Endpoint object, it seems logical to use it for Vertex AI's raw_predict method (e.g., with Triton).

raw_predict isn't called on an endpoint object; it goes through a PredictionServiceClient (https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions#raw-predict-request). That matters here because Triton forces you to use the raw_predict API (https://cloud.google.com/vertex-ai/docs/predictions/using-nvidia-triton#deploy_the_model_to_endpoint).

You're still deploying the model to a Vertex endpoint, but that object's abstraction in the SDK is not useful here.
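The distinction drawn in this thread can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's code: the helper names (`endpoint_resource_path`, `build_triton_payload`, `raw_predict_triton`) and the tensor name used in the usage note are hypothetical, and the payload layout assumes the KServe v2 JSON inference protocol that Triton accepts. The client call follows the pattern in the linked raw-predict documentation and requires the google-cloud-aiplatform package, so it is imported lazily and has not been checked against a live Triton deployment.

```python
import json
from typing import Any, Sequence


def endpoint_resource_path(
    project: str, location: str, endpoint_id: str) -> str:
  """Builds the fully qualified Vertex AI endpoint resource name."""
  return f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"


def build_triton_payload(
    name: str, shape: Sequence[int], datatype: str,
    data: Sequence[Any]) -> bytes:
  """Serializes one input tensor as a KServe v2 style JSON request body."""
  body = {
      "inputs": [{
          "name": name,
          "shape": list(shape),
          "datatype": datatype,
          "data": list(data),
      }]
  }
  return json.dumps(body).encode("utf-8")


def raw_predict_triton(
    project: str, location: str, endpoint_id: str, payload: bytes) -> dict:
  """Sends the payload through PredictionServiceClient.raw_predict.

  Assumption: mirrors the linked raw-predict documentation; no Endpoint
  object is involved, only the endpoint's resource name.
  """
  # Imported lazily so the helpers above work without the GCP client installed.
  from google.api import httpbody_pb2
  from google.cloud import aiplatform_v1

  client = aiplatform_v1.PredictionServiceClient(
      client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"})
  request = aiplatform_v1.RawPredictRequest(
      endpoint=endpoint_resource_path(project, location, endpoint_id),
      http_body=httpbody_pb2.HttpBody(
          data=payload, content_type="application/json"))
  response = client.raw_predict(request=request)
  return json.loads(response.data)
```

For example, `build_triton_payload("input__0", [1, 2], "FP32", [0.5, 1.5])` produces the request body, and only `raw_predict_triton` needs credentials and network access.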

SaumilPatel03 · 2025-03-16T18:10:48Z

@jrmccluskey Thank you for your response. I have written some unit tests for the Triton inference server and have changed the inference code, following nvidia-triton-custom-container-prediction.ipynb.
Can you give me some resources for writing the integration test?

jrmccluskey

If I ask questions or point out issues, resolving them without a comment explaining the code is not good practice.

chamikaramj · 2025-03-28T18:11:44Z

@SaumilPatel03 any updates?

SaumilPatel03 · 2025-03-29T05:24:11Z

@chamikaramj I am a bit preoccupied right now, but I'll go ahead and convert this PR to draft in the meantime.

github-actions · 2025-04-26T12:14:16Z

Reminder, please take a look at this PR: @jrmccluskey

github-actions · 2025-04-30T12:14:48Z

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm for label python.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

github-actions · 2025-05-07T12:15:37Z

Reminder, please take a look at this PR: @damccorm

damccorm · 2025-05-08T18:27:53Z

R: @jrmccluskey

Assigning to Jack since he started to take a look. That said, it looks like there are many failing precommits. @SaumilPatel03, please take a look at those.

github-actions · 2025-05-08T18:29:15Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

jrmccluskey

Please address the comments and fix the precommit errors. I'm also not particularly confident that the code as written works with the actual service.

jrmccluskey · 2025-05-08T18:32:08Z

sdks/python/apache_beam/ml/inference/test_triton_model_handler.py

+import unittest
+from unittest.mock import patch, MagicMock, ANY, call
+import json
+from google.cloud import aiplatform
+from apache_beam.ml.inference.vertex_ai_inference import VertexAITritonModelHandler
+from apache_beam.ml.inference import utils 
+from apache_beam.ml.inference.base import PredictionResult
+import numpy as np
+import base64 


The import order is wrong. The linting/formatting checks should list the correct order, but for reference you should be importing in at least two distinct blocks: native Python imports first, then third-party imports. Imports should also be in alphabetical order within each block.

jrmccluskey · 2025-05-08T18:32:58Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

+import numpy as np
+MSEC_TO_SEC = 1000
 from apache_beam.ml.inference.base import RemoteModelHandler


MSEC_TO_SEC should not be defined in the import block
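One way to lay this out, as a minimal sketch: all imports come first, then module-level constants. The `msecs_to_secs` helper is hypothetical and only shows the constant in use; it is not taken from the PR.

```python
import logging

# Module-level constants belong after the last import, not between imports.
MSEC_TO_SEC = 1000

LOGGER = logging.getLogger(__name__)


def msecs_to_secs(msecs: float) -> float:
  """Hypothetical helper: converts milliseconds to seconds."""
  return msecs / MSEC_TO_SEC
```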

jrmccluskey · 2025-05-08T18:33:30Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

 import logging
 from collections.abc import Iterable
 from collections.abc import Mapping
 from collections.abc import Sequence
 from typing import Any
 from typing import Optional
+from typing import Dict


Use the built-in dict type for hints instead of typing.Dict.
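A short sketch of the suggested style, assuming Python 3.9+ where built-in collections accept type parameters (PEP 585). The function `merge_inference_args` is a hypothetical example, not code from the PR.

```python
from typing import Any


def merge_inference_args(
    defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
  """Hypothetical helper annotated with the built-in dict generic.

  Values in overrides take precedence over values in defaults.
  """
  return {**defaults, **overrides}
```

With this form the `from typing import Dict` line added in the diff is no longer needed.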

jrmccluskey · 2025-05-08T18:40:10Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

+    def run_inference(
+        self,
+        batch: Sequence[Any],
+        model: aiplatform.Endpoint,


raw_predict isn't called on an endpoint object; it goes through a PredictionServiceClient (https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions#raw-predict-request). That matters here because Triton forces you to use the raw_predict API (https://cloud.google.com/vertex-ai/docs/predictions/using-nvidia-triton#deploy_the_model_to_endpoint).

jrmccluskey · 2025-05-08T18:41:01Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

+    def run_inference(
+        self,
+        batch: Sequence[Any],
+        model: aiplatform.Endpoint,


You're still deploying the model to a Vertex endpoint, but that object's abstraction in the SDK is not useful here.

jrmccluskey · 2025-05-08T18:42:58Z

sdks/python/apache_beam/ml/inference/vertex_ai_inference.py

+            aiplatform.Endpoint object.
+        """
+        return self.endpoint
+
    def _retrieve_endpoint(


I cannot find any discussion of public versus private Triton endpoints, but as I've said before, the aiplatform.Endpoint classes aren't what you should be using anyway.

Add Triton Inference Server Support

33225ce

github-actions bot added the python label Mar 11, 2025

github-actions bot added the Next Action: Reviewers label Mar 11, 2025

jrmccluskey requested changes Mar 12, 2025

jrmccluskey added Next Action: Author and removed Next Action: Reviewers labels Mar 12, 2025

Update vertex_ai_inference.py

3ee8cf6

github-actions bot added Next Action: Reviewers and removed Next Action: Author labels Mar 13, 2025

SaumilPatel03 added 2 commits March 13, 2025 22:19

Update vertex_ai_inference.py

dcd470d

Update vertex_ai_inference.py

281df71

SaumilPatel03 requested a review from jrmccluskey March 13, 2025 20:10

jrmccluskey requested changes Mar 13, 2025

jrmccluskey added Next Action: Author and removed Next Action: Reviewers labels Mar 14, 2025

Update vertex_ai_inference.py

a7b6518

github-actions bot added Next Action: Reviewers and removed Next Action: Author labels Mar 15, 2025

SaumilPatel03 added 2 commits March 16, 2025 12:45

Added unit test for Triton server

0aca56a

Write unit test for Triton server

e64a490

SaumilPatel03 requested a review from jrmccluskey March 16, 2025 18:11

jrmccluskey requested changes Mar 18, 2025

jrmccluskey added Next Action: Author and removed Next Action: Reviewers labels Mar 18, 2025

SaumilPatel03 marked this pull request as draft March 29, 2025 05:24

github-actions bot removed the Next Action: Author label Mar 29, 2025

github-actions bot added the Next Action: Reviewers label Mar 29, 2025

jrmccluskey added Next Action: Author and removed Next Action: Reviewers labels Apr 2, 2025

Update vertex_ai_inference.py

970f2f1

github-actions bot added Next Action: Reviewers and removed Next Action: Author labels Apr 6, 2025

SaumilPatel03 marked this pull request as ready for review April 9, 2025 19:28

SaumilPatel03 and others added 2 commits April 13, 2025 16:30

Merge branch 'master' into Issue31173

011982e

Updated the BaseModel to RemoteModelHandler

06b804d

SaumilPatel03 requested a review from jrmccluskey April 18, 2025 20:16

github-actions bot added the slow-review label Apr 26, 2025

github-actions bot removed the slow-review label Apr 30, 2025

github-actions bot added the slow-review label May 7, 2025

jrmccluskey added Next Action: Author and removed Next Action: Reviewers slow-review labels May 8, 2025

jrmccluskey requested changes May 8, 2025

SaumilPatel03 closed this May 19, 2025
